The client's environment consisted of the following SQL Server instances and their associated databases:
Instances | AG1 | AG2 | AG3 | AG4 | AG5 | AG6 | AG7 | AG8 | Total |
Number of databases | 27 | 16 | 18 | 1 | 28 | 97 | 2 | 230 | 419 |
The cluster servers had the following specifications:
Please note that enabling hyperthreading inflates the logical processor count without providing more physical hardware resources. This can mislead SQL Server into believing that it has more CPU resources than it actually does. As a result, too many threads can be scheduled based on the inflated logical processor count, while the actual physical cores struggle to keep up.
The monitoring system repeatedly flagged an alert: “Free thread count is too low” on SQL Server instance AG8. It clearly indicated that the instance with 230 databases in AG was approaching its maximum capacity for worker threads that were essential for performing various tasks, including processing SQL queries.
Worker threads in AGs are critical for synchronizing data between the primary and secondary nodes. Here is how they work:
Additionally, an active AG reserves around 40 worker threads for its internal operations, which reduces the available pool for other tasks. These operations include managing transaction metadata, maintaining replica connections, and monitoring AG health status. In contrast, an idle AG uses zero worker threads. SQL Server will release a worker thread if it remains inactive for 15 seconds.
When analyzing worker threads in an AG, it is important to determine their maximum default capacity, the minimum required number, and their current usage
The following T-SQL script helps to monitor the current state of worker threads in SQL Server, including:
The default maximum worker threads on a server can be calculated using the following formula:
Max Worker Threads = 512 + ((Logical CPUs − 4) × 16)
In my client's case, the maximum default equals to 832.
512 + ((24 − 4) × 16) = 832
One way to verify this in SQL Server Management Studio (SSMS) is by running the following query:
After accounting for the 40 worker threads reserved by the AG the result would be 792:
832 −40 =792
The minimum number of worker threads required for an AG can be determined by the following formula:
Min Worker Threads AG = D × (LCWT + (LSWT × SRC) + MHWT)
Where:
For example, AG8 requires a minimum of 461 worker threads:
230 × (1 + (1 × 1) + 1) = 461
After subtracting the minimum required threads for AG8, only 331 threads remain available for all other operations:
792 – 461 = 331
While increasing the maximum worker threads might seem like a quick fix at first, it doesn't address the root cause. For instance, raising the maximum worker threads from 832 to 1832 in AG8 would simply add 1,000 more tasks to handle into the CPU's processing queue. Regardless of how much we increase the maximum worker threads, the capacity will always be constrained by the underlying hardware resources. First, we need to understand the SQL Server environment, the workload and assess the available hardware resources.
In a single-instance environment with adequate workloads, 331 threads might be sufficient. However, with 8 instances running on the same machine, resource contention becomes a critical concern. If other instances also consume significant numbers of threads, the overall performance can deteriorate rapidly.
It is important to consider the threads required for other operations such as query executions, backups, integrity checks, index maintenance etc. These activities can quickly exhaust the available worker threads, leading to connection timeouts in SSMS and performance issues. The error message my client encountered is a symptom of an overloaded system, due to the high number of databases in AG8, multiple instances and heavy workloads.
Although Microsoft does not impose a strict limit on the number of AGs or databases, it has only tested configurations with up to 10 AGs and 100 databases. Beyond these numbers, the system's hardware capacity and workload become the determining factors. In our case, with 8 AGs and 419 databases, the system was clearly under strain. We decided to take immediate action and implement the following temporary workaround as well as plan for a long-term solution.
Temporary workaround
Long-term solution
To address the issue of worker threads deficiency, it's essential to look beyond increasing the max worker thread setting. A thorough analysis of the workload, careful management of resources, and possibly even restructuring the whole SQL environment and the number of databases in each AG are necessary steps. Ultimately, the goal is to ensure that the system remains within its hardware limits while maintaining optimal performance.
I hope this was insightful 💚 Don't hesitate to reach out if you have any questions!