I recently came across a “CPU time limit exceeded” error thrown from an Apex class that implements the Batchable interface. The following observations suggest it is not the typical error caused by excessive Apex logic:
- There is no stack trace or error details.
- The error is consistently hit on the first batch of the job.
- Debug logs in the start method suggest that it runs successfully, consuming little CPU time.
- Debug logs are not even printed at the beginning of the execute method.
This made me think that something goes wrong between the start and the execute methods. But this is where I have neither visibility nor control.
Typically, the start method of a Batchable class just queries all the records that need to be processed and does not spend much CPU time on business logic, whilst the execute method runs the actual business logic on a single batch of records. The mechanism that chunks all the queried records into batches is not exposed at all. But what excessive logic could it possibly run just to chunk the data?
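For context, a bare-bones Batchable class looks roughly like this (ExampleBatch and the Account query are purely illustrative):

public class ExampleBatch implements Database.Batchable<sObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Return a query locator covering every record the job should process.
        return Database.getQueryLocator('select Id from Account');
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // Business logic on one batch (by default up to 200 records).
        // How the queried records get chunked into these batches is handled
        // entirely by the platform.
    }

    public void finish(Database.BatchableContext bc) {
        // Optional post-processing after all batches have run.
    }
}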
I examined the SOQL query in the start method, as that was the only thing left I could think of. It queries an object for records whose IDs exist in a set of child records. Something like this:
public Database.QueryLocator start(Database.BatchableContext BC) {
    String query = 'select Id from Parent__c ' +
        'where Id in (select ParentId from Child__c)';
    return Database.getQueryLocator(query);
}
For illustrative purposes, I used Parent__c to denote the parent object and Child__c to denote the child object. In the actual org there are about 40,000 parent records and 40,000 child records. The inner query in the where clause looks alarming: it has to query 40,000 records just to evaluate the where clause of the outer query. The “Id in a set of Ids” condition effectively needs to concatenate the IDs of 40,000 records into a set, and this is where the 10 seconds of CPU time could be spent. Running the following snippet shows that concatenating 40,000 IDs consumes roughly 10 seconds of CPU time (sometimes less):
// Simulate concatenating 40,000 record IDs into one comma-separated string.
Id anId = '0032000001Dqw31';
String s = '';
for (Integer i = 0; i < 40000; i++) {
    s += anId + ',';
}
// Report how much CPU time the concatenation consumed.
Integer cpu = Limits.getCpuTime();
System.debug('>>> cpu: ' + cpu);
What happened (although not documented) was that the SOQL query was executed in the start method without the where clause actually being evaluated, so as to promptly return a query locator. Then came the implicit Apex logic, between the start and execute methods, to evaluate the where clause and retrieve the actual records in chunks, and that is where the CPU time was burned.
In summary, the anti-pattern to avoid here is an inner query on a large-data-volume object inside the where clause of the SOQL in the start method. The simple correction is to query only a single large-data-volume object in the start method and apply the filtering logic in the execute method to narrow each batch down to the records that actually need to be processed.
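Here is a minimal sketch of that correction, assuming the same illustrative Parent__c and Child__c objects and the ParentId lookup field from the query above (the class name CorrectedBatch is made up):

public class CorrectedBatch implements Database.Batchable<sObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Query a single object with no inner query, so the query locator
        // can be chunked cheaply between start and execute.
        return Database.getQueryLocator('select Id from Parent__c');
    }

    public void execute(Database.BatchableContext bc, List<Parent__c> scope) {
        // Apply the filtering logic here, against one batch at a time.
        Set<Id> parentsWithChildren = new Set<Id>();
        for (Child__c child : [select ParentId from Child__c where ParentId in :scope]) {
            parentsWithChildren.add(child.ParentId);
        }
        for (Parent__c parent : scope) {
            if (parentsWithChildren.contains(parent.Id)) {
                // ... actual business logic on the desired records ...
            }
        }
    }

    public void finish(Database.BatchableContext bc) {}
}

The job is kicked off with Database.executeBatch(new CorrectedBatch()) as usual; because the start query now touches only a single object, the implicit chunking no longer has to concatenate 40,000 IDs, and the per-batch SOQL in execute only ever filters against a couple of hundred parent records at a time.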