Secret of SQL Trace Duration Column
Why would a trace of long-running queries not show all queries that exceeded the specified duration filter? We have a server-side SQL Trace that includes RPC:Completed and SQL:BatchCompleted events with a filter on Duration >= 100000. Nearly all of the queries on this busy OLTP server run in under this 100 millisecond threshold so any that appear in the trace are candidates for root cause analysis and/or performance tuning opportunities.
After an application experienced query timeouts, the DBA looked at the trace data to corroborate the problem. Surprisingly, he found no long-running queries in the trace from the application that experienced the timeouts even though the application’s error log clearly showed detail of the problem (query text, duration, start time, etc.). The trace did show, however, that there were hundreds of other long-running queries from different applications during the problem timeframe. We later determined those queries were blocked by a large UPDATE query against a critical table that was inadvertently run during this busy period.
So why didn’t the trace include all of the long-running queries? The reason is because the SQL Trace event duration doesn’t include the time a request was queued while awaiting a worker thread. Remember that the server was under considerable stress at the time due to the severe blocking episode. Most of the worker threads were in use by blocked queries and new requests were queued awaiting a worker to free up (a DMV query on the DAC connection will show this queuing: “SELECT scheduler_id, work_queue_count FROM sys.dm_os_schedulers;”). Technically, those queued requests had not started. As worker threads became available, queries were dequeued and completed quickly. These weren’t included in the trace because the duration was under the 100ms duration filter. The duration reflected the time it took to actually run the query but didn’t include the time queued waiting for a worker thread.
The important point here is that duration is not end-to-end response time. Duration of RPC:Completed and SQL:BatchCompleted events doesn’t include time before a worker thread is assigned nor does it include the time required to return the last result buffer to the client. In other words, duration only includes time after the worker thread is assigned until the last buffer is filled. But be aware that duration does include the time need to return intermediate result set buffers back to the client, which is a factor when large query results are returned. Clients that are slow in consuming results sets can increase the duration value reported by the trace “completed” events.