Shortening URLs in AEM

Juan Ayala, October 4, 2021

Let’s face it, AEM URLs are ugly! Shortening them has been a requirement in every single AEM Sites implementation I have worked on. 

Why do clients want shorter, optimized URLs? First of all, less complicated URLs are more user-friendly. They are easier for site visitors to type, clearer to read, so a visitor is confident they are going to a relevant page, and less prone to error when copied, pasted, or retyped. From an SEO perspective, Google and other search engines can crawl a site more easily when the site structure is clear (subfolders are used well) and the URLs are easy to read, without a ton of unnecessary symbols or parameters.

Here is a typical AEM URL:
http://www.mysite.com/content/mysite/en/mysection/mypage.html

Instead of that, we would shorten the URL to:
http://www.mysite.com/en/mysection/mypage.html

Or assume en is the default:
http://www.mysite.com/mysection/mypage.html

Or remove the extension:
http://www.mysite.com/mysection/mypage/

Unfortunately, there is no single configuration or feature that shortens URLs. Instead, several things come together. Let's cover some terms first.

Outgoing URLs (aka mapping)

Outgoing URLs are the ones rendered and sent to the browser. They can exist in the HTML markup as anchor hrefs, form actions, or image sources. Before the AEM publisher sends back the requested page, it needs to call ResourceResolver.map(String resourcePath) on all of the long URLs to map them to their short form.

Incoming URLs (aka resolving)

When the user clicks on one of those short URLs, the browser starts a request that goes to the dispatcher first. Then it gets proxied to the AEM publisher. The publisher needs to call ResourceResolver.resolve(String absPath) to resolve that URL back to its long form.

Project Archetype

Now that we know the difference between mapping and resolving, let's take a look at the AEM project archetype. I love the project archetype. It is an excellent source of best practice examples that is constantly updated. Over the past few years, the configurations to handle short URLs have been added. I will point them out to you.

Let's create a new AEMaaCS boilerplate project.


mvn -B archetype:generate \
  -D archetypeGroupId=com.adobe.aem \
  -D archetypeArtifactId=aem-project-archetype \
  -D archetypeVersion=30 \
  -D appTitle="My Site" \
  -D appId="mysite" \
  -D groupId="com.mysite"

Apache Sling Resource Resolver Factory

The resolver factory is what does the mapping and resolving. Open up the Felix configuration console and locate the “Apache Sling Resource Resolver Factory”. For our purposes, the property of interest is URL Mappings.

Read the description. The format is <internalPrefix><op><externalPrefix> and the operators are:

  • < outgoing, meaning mapping
  • > incoming, meaning resolving
  • : bidirectional, meaning for both
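To make the entry format concrete, here is a minimal Java sketch that splits one URL Mappings entry into its three parts. The class UrlMappingEntry and its methods are hypothetical names made up for this illustration. They are not part of Sling, and Sling's own parsing may differ in its details.

```java
// Hypothetical helper for illustration only; not a Sling class.
final class UrlMappingEntry {
    enum Direction { OUTGOING, INCOMING, BIDIRECTIONAL }

    final String internal;   // the <internalPrefix> part
    final String external;   // the <externalPrefix> part
    final Direction direction;

    private UrlMappingEntry(String internal, Direction direction, String external) {
        this.internal = internal;
        this.direction = direction;
        this.external = external;
    }

    // Splits at the first operator character. Assumption of this sketch: the
    // internal prefix itself contains none of '<', '>' or ':'.
    static UrlMappingEntry parse(String entry) {
        for (int i = 0; i < entry.length(); i++) {
            char c = entry.charAt(i);
            if (c == '<' || c == '>' || c == ':') {
                Direction d = c == '<' ? Direction.OUTGOING
                        : c == '>' ? Direction.INCOMING
                        : Direction.BIDIRECTIONAL;
                return new UrlMappingEntry(entry.substring(0, i), d, entry.substring(i + 1));
            }
        }
        throw new IllegalArgumentException("No operator (<, > or :) found in: " + entry);
    }

    // Outgoing and bidirectional entries take part in mapping.
    boolean appliesToMap() { return direction != Direction.INCOMING; }

    // Incoming and bidirectional entries take part in resolving.
    boolean appliesToResolve() { return direction != Direction.OUTGOING; }

    public static void main(String[] args) {
        UrlMappingEntry e = parse("/content/mysite/</");
        // prints "/content/mysite/ OUTGOING /"
        System.out.println(e.internal + " " + e.direction + " " + e.external);
    }
}
```

Read this way, an entry such as “/content/mysite/</” only affects mapping, while “/:/” takes part in both mapping and resolving.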

Out of the box, the configuration is “/:/”, which has no real effect. Now let's take a look at the config file generated by the archetype. It is in the JSON format supported by AEMaaCS. You can translate it to its sling:OsgiConfig counterpart if you are working with AEM 6.x.

/apps/mysite/osgiconfig/config.publish/org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl.cfg.json

{
  "resource.resolver.mapping": [
    "/content/mysite/</", "/:/"
  ]
}

There are two things to note:

  • It is in the config.publish run mode
  • It adds a single outgoing config

You will find these rules and be able to test mapping and resolving on the resolver console. The URL Mappings property is for simple use cases. The Mapping Location that points to /etc/map is for more robust Sling Mappings configuration.

The mapping above would have the effect of mapping /content/mysite/mypage.html → /mypage.html. Let’s say we have two requirements: extensionless URLs, and en as the default language, so that the following mappings occur:

  • /content/mysite/en.html → /
  • /content/mysite/en/homepage.html → /homepage/
  • /content/mysite/es.html → /es/
  • /content/mysite/es/homepage.html → /es/homepage/

Then we can configure the following mappings. The first one takes care of the en language root. The second one takes care of anything under en. And the third takes care of all the other language nodes.

{
  "resource.resolver.mapping": [
    "^/content/mysite/en\\.html</",
    "^/content/mysite/en/(.+)\\.html</$1/",
    "^/content/mysite/(.+)\\.html</$1/",
    "/:/"
  ]
}
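As a sanity check on the three outgoing rules, here is a small Java sketch that emulates them with java.util.regex, using the /content/mysite root from the expected mappings listed earlier. The class OutgoingRuleCheck is a hypothetical name, and the "first matching rule wins" behavior is an assumption of this sketch rather than a statement about how Sling orders its entries. Use the resolver console to confirm what your instance really does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: emulates the outgoing rules outside of AEM.
final class OutgoingRuleCheck {

    // Each row is {internal pattern, external replacement}, in config order.
    private static final String[][] RULES = {
        {"^/content/mysite/en\\.html", "/"},          // the en language root
        {"^/content/mysite/en/(.+)\\.html", "/$1/"},  // anything under en
        {"^/content/mysite/(.+)\\.html", "/$1/"},     // all other languages
    };

    // Applies the first rule whose pattern matches; unmatched paths pass through.
    static String map(String path) {
        for (String[] rule : RULES) {
            Matcher m = Pattern.compile(rule[0]).matcher(path);
            if (m.find()) {
                return m.replaceFirst(rule[1]);
            }
        }
        return path;
    }

    public static void main(String[] args) {
        String[] samples = {
            "/content/mysite/en.html",
            "/content/mysite/en/homepage.html",
            "/content/mysite/es.html",
            "/content/mysite/es/homepage.html",
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + map(sample));
        }
    }
}
```

The order matters in this sketch: if the generic third rule came first, /content/mysite/en/homepage.html would map to /en/homepage/ instead of /homepage/.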

Day CQ Link Checker Transformer

We learned that the archetype configures the resource resolver to map URLs. The link checker is what calls ResourceResolver.map(String resourcePath).

The link checker is part of the default rewriter pipeline. The list of configured pipelines is at http://localhost:4503/system/console/status-slingrewriter. Locate the default configuration. The very first transformer is the linkchecker.

Now go back to the configuration manager and locate the “Day CQ Link Checker Transformer”.

You have nothing to configure here. The out-of-the-box transformer will rewrite links on some of the common HTML tags and attributes, e.g. <a href="...">. It is also important to understand that the link checker has two roles: checking and rewriting. It is the rewrite feature that applies the outgoing resolver mapping to our links.

One use case for turning off the link checker’s rewrite feature is when you want to put in place your own transformer. Or you can use an open-source one like the Resource Resolver Mapping Rewriter.

The Link Checker Transformer configuration has a handy Strip HTML Extension option that should work if you want extensionless URLs. If you write your own custom transformer, you will need to take care of that yourself. And if you are using the ACS Commons rewriter mentioned above, you will have to use URL Mappings like the ones shown earlier.

Challenge #1 - Rewriting URLs in JSON Models

The SAX pipeline transformers are great for HTML attributes and elements. What about the .model.json generated by Sling Model Exporters? There you will use ResourceResolver.map(String resourcePath) at the model level. But take a look at Sling Filters, particularly the ACS Commons Contextual Content Variables feature. They wrote a filter, ContentVariableJsonFilter.java, that replaces tokens within the JSON content. You could adapt it to target known links or link formats. You can also take a look at our custom solution for Content Fragment models here.
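To illustrate the "target known links" idea, here is a minimal, self-contained Java sketch that rewrites link-like string values inside a JSON payload. JsonLinkRewriter is a hypothetical class, not the ACS Commons filter. The mapper parameter stands in for ResourceResolver::map, and the /content/mysite prefix and .html suffix are assumptions about what your links look like. A real filter would also need to wrap the response and cope with escaped slashes.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: rewrites internal page links found in JSON string values.
final class JsonLinkRewriter {

    // Quoted strings that start with the site root and end in .html.
    // Limitation: does not handle JSON that escapes slashes as "\/".
    private static final Pattern LINK =
            Pattern.compile("\"(/content/mysite/[^\"]*\\.html)\"");

    static String rewrite(String json, UnaryOperator<String> mapper) {
        Matcher m = LINK.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String mapped = mapper.apply(m.group(1));
            // quoteReplacement keeps '$' and '\' in the mapped URL literal.
            m.appendReplacement(out, Matcher.quoteReplacement("\"" + mapped + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"title\":\"Home\",\"link\":\"/content/mysite/en/homepage.html\"}";
        // Stand-in mapper; inside AEM this could be resourceResolver::map.
        UnaryOperator<String> mapper = p -> p.replaceFirst("^/content/mysite", "");
        // prints {"title":"Home","link":"/en/homepage.html"}
        System.out.println(rewrite(json, mapper));
    }
}
```

Passing the mapper in as a function keeps the sketch testable outside of AEM and leaves the actual mapping rules in the resolver configuration.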

Dispatcher

Incoming URLs are going to hit your dispatcher before they make it to the publisher. You can configure the resolver factory to resolve incoming URLs. But the archetype has configured the dispatcher to do it with the rewrite module.

/mysite/dispatcher/src/conf.d/rewrites/rewrite.rules

Include conf.d/rewrites/default_rewrite.rules

# rewrite for root redirect
RewriteRule ^/?$ /content/${CONTENT_FOLDER_NAME}/us/en.html [PT,L]

RewriteCond %{REQUEST_URI} !^/apps
RewriteCond %{REQUEST_URI} !^/bin
RewriteCond %{REQUEST_URI} !^/content
RewriteCond %{REQUEST_URI} !^/etc
RewriteCond %{REQUEST_URI} !^/home
RewriteCond %{REQUEST_URI} !^/libs
RewriteCond %{REQUEST_URI} !^/saml_login
RewriteCond %{REQUEST_URI} !^/system
RewriteCond %{REQUEST_URI} !^/tmp
RewriteCond %{REQUEST_URI} !^/var
RewriteCond %{REQUEST_URI} (.html|.jpe?g|.png|.svg)$
RewriteRule ^/(.*)$ /content/${CONTENT_FOLDER_NAME}/$1 [PT,L]

The first RewriteRule maps the site root / to the English page /content/${CONTENT_FOLDER_NAME}/us/en.html.

The last RewriteRule prepends /content/${CONTENT_FOLDER_NAME} to every request (assuming it is for an HTML, JPEG, PNG, or SVG file and that it is not within one of those system paths).

In both cases, the flags [PT,L] get applied. PT (pass-through) instructs the rewrite engine to take the new path and send it back to AEM. L (last) instructs the engine to stop any further processing.
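To reason about which requests those rules touch, here is a rough Java model of the same decision logic. It is not dispatcher configuration, and IncomingRewriteSketch is a hypothetical name. The contentFolder argument plays the role of ${CONTENT_FOLDER_NAME}, and the sketch assumes the request URI always starts with a slash.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: models the rewrite rules' decisions in plain Java.
final class IncomingRewriteSketch {

    // Paths starting with these prefixes are left alone (the negated RewriteConds).
    private static final List<String> EXCLUDED = List.of(
            "/apps", "/bin", "/content", "/etc", "/home",
            "/libs", "/saml_login", "/system", "/tmp", "/var");

    // Same intent as the final RewriteCond. The dots are escaped here, which
    // is slightly stricter than the rule as written.
    private static final Pattern REWRITABLE = Pattern.compile("\\.(html|jpe?g|png|svg)$");

    static String rewrite(String uri, String contentFolder) {
        // Root request: serve the default language page.
        if (uri.isEmpty() || uri.equals("/")) {
            return "/content/" + contentFolder + "/us/en.html";
        }
        for (String prefix : EXCLUDED) {
            if (uri.startsWith(prefix)) {
                return uri;
            }
        }
        // Short URL with a known extension: prepend the content root.
        if (REWRITABLE.matcher(uri).find()) {
            return "/content/" + contentFolder + uri;
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/", "mysite"));
        System.out.println(rewrite("/us/en/mypage.html", "mysite"));
        System.out.println(rewrite("/us/en/mypage", "mysite"));
    }
}
```

Note the last call: a path without one of the listed extensions passes through unchanged, so an extensionless short URL never reaches the content root with these rules alone.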


Challenge #2 - Extensionless URLs

The archetype doesn’t set up extensionless URLs. In the rewrite rules above, you will need to append “.html” to the incoming paths. Can you rewrite the rewrite rule? With AEMaaCS, setting up a dispatcher is easy. Check out how to set up the local dispatcher tools.
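Before touching the rewrite rule itself, it can help to pin down the path translation it has to perform. The Java sketch below models only that translation, under the assumption that extensionless URLs end in a trailing slash, as in the earlier examples. ExtensionlessRewriteSketch and contentRoot are hypothetical names, and the system path exclusions and any default-language shortcut are left out.

```java
// Illustration only: the translation an extensionless rewrite rule must do.
final class ExtensionlessRewriteSketch {

    static String rewrite(String uri, String contentRoot) {
        // Only handle extensionless paths such as /es/homepage/ here.
        // The bare root "/" is left to the root rule.
        if (uri.length() > 1 && uri.endsWith("/")) {
            String withoutSlash = uri.substring(0, uri.length() - 1);
            return contentRoot + withoutSlash + ".html";
        }
        return uri;
    }

    public static void main(String[] args) {
        // prints /content/mysite/es/homepage.html
        System.out.println(rewrite("/es/homepage/", "/content/mysite"));
    }
}
```

Whatever rule you end up writing for the dispatcher should produce the same before and after pairs as this function.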

Conclusion

When it comes to rewriting URLs the general rule of thumb is don’t do it at the HTL or Sling Model level. Let the transformers and mappings do it for you.

Shortening happens between the dispatcher and the publisher. Very rarely have I seen it configured on the author.

The archetype does the simplest of configurations. You can: 

  • Configure the resource resolver to map outgoing URLs from long to short form before sending them to the browser.
  • Configure the dispatcher to map incoming URLs from short to long form before passing them to AEM.

Juan Ayala

Juan Ayala is a Technical Architect at 3|SHARE. He boasts three Adobe Experience Manager certifications: Dev/Ops Engineer, Sites Architect and Sites Developer. His favorite thing about being on the 3|SHARE team is that he is able to do relevant, cutting-edge work that keeps him engaged. Outside of work, Juan enjoys DIY, motorcycling and rescuing dogs from the pound.