Clustered Hash Group By algorithm

In some cases, values in the grouping columns of the input table are clustered, so that similar values appear close together. For example, if a table contains a column that is always set to the current date, all rows with a single date are relatively close within the table. The Clustered Hash Group By algorithm exploits this clustering.

The optimizer may use Clustered Hash Group By when grouping tables that are significantly larger than the available memory. In particular, it is effective when the HAVING predicate returns only a small proportion of rows.

The Clustered Hash Group By algorithm can lead to significant wasted work on the part of the optimizer if it is chosen in an environment where data is updated concurrently with query execution. Clustered Hash Group By is therefore most appropriate for OLAP workloads characterized by occasional batch-style updates and read-based queries. Set the OPTIMIZATION_WORKLOAD option to OLAP to indicate to the optimizer that it should include the Clustered Hash Group By algorithm in the possibilities it investigates.

For more information, see .