What are JCR queries?

AEM stores data in Jackrabbit Oak, a Java Content Repository (JCR) implementation. JCR queries allow searching the repository and retrieving selected nodes.

There are three query languages supported and one of them is JCR-SQL2. It is similar to SQL used in relative database management systems and - similarly to SQL - it is easy to write a query that is time and resource consuming. However, it is often possible to rewrite a query to make it much faster.

Tool – explain query

Diagnosis tool - explain query

As with any optimisation task, we need to diagnose a problem first, before we find a solution. A very helpful tool is AEM Diagnosis Tool, especially its Explain query part. It is available in AEM by default and can be found under the path (AEM version 6.3 or below):

http://<host>:<port>/libs/granite/operations/content/diagnosis/tool.html/granite_queryperformance

or (AEM version 6.4 or above):

http://<host>:<port>/libs/granite/operations/content/diagnosistools/queryPerformance.html

Let's take a look at a sample query:

SELECT * FROM [nt:base] AS s WHERE jcr_title ='Equipment'

This query is so inefficient that most probably it will not be finished. An attempt to run it in the CRX DE console will end up with the following message:

The query read or traversed more than 100000 nodes. To avoid affecting other tasks, processing was stopped.

After clicking on Explain the diagnosis tool displays the query explanation.

Query Explanation

Indexes Used

No indexes were used.
This is a traversal query.

Execution Plan

[nt:base] as [s] /* traverse "*" where [s].[jcr_title] = 'Equipment' */

Logs

Parsing JCR-SQL2 statement: explain SELECT * FROM [nt:base] AS s WHERE jcr_title ='Equipment'
Literal used
Attempting optimisation
cost using filter Filter(query=explain SELECT * FROM [nt:base] AS s WHERE jcr_title ='Equipment', path=*, property=[jcr_title=[Equipment]])
cost for reference is Infinity
cost for property is Infinity
cost for nodeType is Infinity
Evaluating plan with index definition Lucene Index : /oak:index/slingeventJob
Evaluating plan with index definition Lucene Index : /oak:index/versionStoreIndex
Evaluating plan with index definition Lucene Index : /oak:index/cqTagLucene
Evaluating plan with index definition Lucene Index : /oak:index/workflowDataLucene
Evaluating plan with index definition Lucene Index : /oak:index/socialLucene
Evaluating plan with index definition Lucene Index : /oak:index/authorizables
Evaluating plan with index definition Lucene Index : /oak:index/ntBaseLucene
Evaluating plan with index definition Lucene Index : /oak:index/lucene
Evaluating plan with index definition Lucene Index : /oak:index/commerceLucene
Evaluating plan with index definition Lucene Index : /oak:index/cqProjectLucene
Evaluating plan with index definition Lucene Index : /oak:index/cqMobileAppLucene
Ignoring index [/oak:index/damAssetLucene] which is not working correctly since 3.840 h ,77 indexing cycles, accessed 1038 times
Evaluating plan with index definition Lucene Index : /oak:index/cqPageLucene
cost for lucene-property is Infinity
cost for aggregate lucene is Infinity
cost for solr is Infinity
cost for traverse is 538724.0
Traversal query (query without index): explain SELECT * FROM [nt:base] AS s WHERE jcr_title ='Equipment'; consider creating an index
No alternatives found. Query: select [s].[jcr:primaryType] as [s.jcr:primaryType] from [nt:base] as [s] where [s].[jcr_title] = 'Equipment'

The most important parts are at the beginning of the query explanation: Indexes used and Execution plan. The message No indexes were used. This is a traversal query is usually not a good news and most probably means a performance problem.

Traversing vs. Indexes

Traversing is a straightforward but ineffective method of searching the JCR tree. It is reading the whole tree or its sub-tree node by node. AEM has a configurable limit on the number of nodes that can be visited in a single query run. By default it is set to 100,000 nodes. It can be changed in System Console > OSGi Configuration > Apache Jackrabbit Query Engine Settings Service. The parameter name is queryLimitReads.

Should the traversal limit be changed? Well, there are cases when traversing is a reasonable solution. For example, you may need to perform some operation on all nodes from a particular path. It can be done programmatically, recursively reading nodes and their children, but effectively it would be the same as a traversal query. In such cases the traversal limit may be too low and may need to be increased, at least temporarily.

Nevertheless, queries usually execute much faster when they make use of indexes and avoid traversing. Indexes are special data structures that are maintained together with the database and allow a quick access to nodes based on their properties.

There are some predefined indexes but administrators can define their own when necessary.

One of the most useful indexes is Lucene full-text. To use it we need to replace a property = value or LIKE clause with CONTAINS. For example:

SELECT * FROM [nt:base] AS s WHERE CONTAINS(jcr_title, 'Equipment')

Now, the query execution is different:

Query Explanation

Indexes Used

lucene(/oak:index/lucene)

Execution Plan

[nt:base] as [s] /* lucene:lucene(/oak:index/lucene) full:jcr_title:equipment ft:(jcr_title:"Equipment") where contains([s].[jcr_title], 'Equipment') */

Query explanation - Lucene index

Such a query is very efficient and it is unlikely to hit the traversal limit in this case. It may be even 100 times faster than the LIKE counterpart.

It is worth noting that the CONTAINS operator works slightly different than equality or LIKE. CONTAINS finds nodes that contain a specified word or words in a selected property - ignoring punctuation or word order. We can restrict the query to find only nodes with the specified words in the given order, putting them in double quotation marks. If we need more restrictions, there is nothing wrong in adding a more specific condition with AND operator.

For example, the following query is both efficient and strict:

SELECT * FROM [nt:base] AS s WHERE ISDESCENDANTNODE([/content/we-retail]) AND CONTAINS(s.*, 't shirt')  AND [jcr:title] like '%T Shirt%'

It yields the same results as the following, slow query:

SELECT * FROM [nt:base] AS s WHERE ISDESCENDANTNODE([/content/we-retail]) AND [jcr:title] like '%T Shirt%'

There is also one drawback of using the Lucene index: it is asynchronous. It means that the most recent changes in the repository are not guaranteed to be reflected in the index. In some cases it may be unacceptable and then only specific, synchronous indexes can be used.

Other improvements

There are other methods of improving query performance. Some of them are general rules, described in the Adobe Experience Manager documentation (Best practices, Troubleshooting). Among others:

  • Constrain the scope of searching to as specific location as possible with ISDESCENDANTNODE.
  • Use a specific node type instead of a general one in the FROM clause. For example, if you are looking for pages only, use: SELECT * FROM [cq:Page] (...) instead of: SELECT * FROM [nt:base] (...).
  • Avoid queries in components. If you know the exact path to a node, it is not necessary to query for it.
  • Create specific indexes when necessary.

Summary – quick tips

The way you write queries in JCR can have significant impact on the performance. Queries that make use of indexes are usually much faster than the ones that require traversing, ie. reading all nodes one by one.

Here are some general tips on writing queries.

  • Whenever possible, use CONTAINS instead of LIKE or '='. It should be your default choice when you need to find a specific text.
  • If a query seems to be slow, check its execution plan using Diagnosis Tool.
  • If you need more specific restrictions, use both CONTAINS and LIKE/equals.
  • Use a specific location.
  • Use a specific node type.