An anti-pattern causing the “CPU time limit exceeded” error in batch Apex

I recently came across a “CPU time limit exceeded” error thrown from an Apex class that implements the Batchable interface. It is not a typical one caused by excessive Apex logic according to the following observations:

  • There is no stack trace or error details.
  • The error is consistently hit on the first batch of the job.
  • Debug logs in the start method suggest that it runs successfully consuming little CPU time.
  • Debug logs are not even printed at the beginning of the execute method.

This made me think that there is something wrong in between the start and the execute methods. But this is where I have no visibility nor control.

Typically the start method of a Batchable class just queries all records that need to be processed in the system and would not spend much CPU time processing any business logic, whilst the execute method focuses on executing the actual business logic on a single batch of records. The mechanism that chunks all queried records into batches are not exposed at all. But what excessive logic could it possibly run in terms of chunking the data?

I examined the SOQL query in the start method as that is the only thing I can think of. It queries an object with IDs that exist in a subset of child objects. Something like this:

public Database.QueryLocator start(Database.BatchableContext BC) {
    String query = 'select Id from Parent__c '
        where Id in (select ParentId from Child__c)';
    return Database.getQueryLocator(query);
}

For illustrative purposes, I used Parent__c to denote the parent object and Child__c to denote the child object. In the actual org, there are about 40,000 parent records and 40,000 child records. The inner query in the where clause looks alarming. It tries to query 40,000 records in the system to evaluate the where clause of the outer query. The “Id in a set of Ids” condition actually needs to concatenate 40,000 records’ IDs into a set. This is where the 10 seconds CPU time could be spent. Running the following statement proves that concatenating 40,000 IDs consumes roughly 10 seconds CPU time (sometimes faster):

Id anId = '0032000001Dqw31';
String s = '';
for (Integer i = 0; i < 40000; i++) {
    s += anId + ',';
}
Integer cpu = Limits.getCpuTime();
System.debug('>>> cpu: ' + cpu);

What happened (although not documented) was that the SOQL query was executed without the where clause being actually evaluated in the start method so as to promptly return a query locator. Then there came the implicit Apex logic, between the start and execute methods, to evaluate the where clause and retrieve the actual records in chunks.

In summary, the anti-pattern to avoid here is doing an inner query on an object with large data volume in the where clause of the SOQL in the start method. The simple correction is to focus on querying a single object of large data volume in the start method and use filtering logic in the execute method to get the desired records that need to be processed.

Published by Chun Wu

When nothing is certain, anything is possible.

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: