In today’s example we are going to see how to solve the following error using Count Distinct and Window Functions. It is an error that we have had with some queries and we want to share it with you. Remember that if you need additional help, do not hesitate to contact us .
The error starts from the following alert message.
Preparing the example
To reach the above conclusion and solve it, let’s first build a scenario. We are going to work with the Adventure Works DB ( https://docs.microsoft.com/es-es/sql/samples/adventureworks-install-configure?view=sql-server-ver15&tabs=ssms )
Let’s use the Product and SalesOrderDetail tables, both in the SalesLT schema. Each row of order details is part of an order and is related to a product included in the order. The product has a category and a color.
We will use the following query to start:
SELECT salesorderid , Count ( * ) AS ItemsPerOrder , Sum ( unitprice * orderqty ) AS Total FROM saleslt . product p INNER JOIN saleslt . salesorderdetail s ON p. productid = s . productivity GROUP BY salesorderid
This query returns the count of items in each order and the total value of the order.
Let’s add a few more calculations to the query, none of them challenging:
SELECT salesorderid , Count ( * ) AS ItemsPerOrder , Sum ( unitprice * orderqty ) AS Total , Count ( DISTINCT productcategoryid ) CategoriesPerOrder , Count ( DISTINCT color ) ColorPerOrder FROM saleslt . product p INNER JOIN saleslt . salesorderdetail s ON p. productid = s . productivity GROUP BY salesorderid
Identifying the problem of Count Distinct and Window Functions
Now, let’s imagine that, along with this information, we would also like to know how many different colors per category there are in this order.
The group of only has SalesOrderId. Because of that, our first natural conclusion is to try a window partition, like this:
SELECT SalesOrderID ,
Count ( * ) AS ItemsPerOrder ,
Sum ( unitPrice * OrderQty ) AS total ,
Count ( DISTINCT productcategoryid ) CategoriesPerOrder ,
Count ( DISTINCT of color ) ColorPerOrder ,
Count ( DISTINCT of color )
OVER (
partition BY productcategoryid ) ColorPerCategory
DE saleslt. product p
INNER JOIN saleslt . salesorderdetail s
ON p. productid = s . productivity
GROUP BY salesorderid
Our problem starts with this query. Count Distinct does not support window partitioning , we need to find a different way to achieve the same result
Solution Planning
We are counting the rows, so we can use DENSE_RANK to achieve the same result, extracting the last value last, we can use a MAX for that. This works similarly to the distinct count in that all matched, records with the same value, receive the same rank value, so the largest value will be the same as the distinct count.
There are two ranking functions: RANK and DENSE_RANK . The difference is how they treat ties.
RANK: After a tie, the tally skips the number of tied items, leaving a gap.
DENSE_RANK: No jump after a tie, count continues sequentially
The following query is an example of the difference:
SELECT ProductID , color , Range ( ) OVER ( ORDER BY color)[rango] , DENSE_RANK ( ) OVER ( ORDER BY color) [DENSE_RANK] DE saleslt . product WHERE color IS NOT NULL
SELECT salesorderid , Count ( * ) AS ItemsPerOrder , Sum ( unitprice * orderqty ) AS Total , Count ( DISTINCT productcategoryid ) CategoriesPerOrder , Count ( DISTINCT color ) ColorPerOrder , Dense_rank ( ) OVER ( partition BY productcategoryid ORDER BY color ) ColorPerCategory FROM saleslt . product p INNER JOIN saleslt . salesorderdetail s ON p. productid = s . productivity GROUP BY salesorderid
However, the result is not what we would expect:
Solving the Solution
The first step to solving the problem is to add more fields to the group. Of course this will affect the whole result, it will not be what we really expect. The query will be like this:
SELECT SalesOrderID , productcategoryid , Count ( * ) AS ItemsPerOrder , Sum ( unitPrice * OrderQty ) AS total , 1 Color Per Order , DENSE_RANK ( ) OVER ( partition BY SalesOrderID , productcategoryid ORDER BY color ) ColorPerCategory DE saleslt . product p INNER JOIN saleslt .salesorderdetail s ON p. productid = s . productivity GROUP BY salesorderid , productcategoryid , Colour
There are two interesting changes in the calculation:
- CategoriesPerOrder was removed because “group by” is a few levels below this calculation.
- ColorPerOrder is a fixed value, because we are grouping by color.
We need to do more calculations on the result of this query, the best solution for this is to use CTE – Common Table Expressions .
2nd consultation level
The 2nd level of calculations will be aggregating the data by ProductCategoryId , removing one of the aggregation levels.
Count Distinct and Window Functions
;WITH ranking AS (SELECT salesorderid, productcategoryid, Count(*) AS ItemsPerOrder, Sum(unitprice * orderqty) AS Total, Count(DISTINCT color) ColorPerOrder, Dense_rank() OVER ( partition BY salesorderid, productcategoryid ORDER BY color) ColorPerCategory FROM saleslt.product p INNER JOIN saleslt.salesorderdetail s ON p.productid = s.productid GROUP BY salesorderid, productcategoryid, Colour) SELECT salesorderid, Sum(itemsperorder) ItemsPerOrder, Sum(total) Total, 1 CategoriesPerOrder, Sum(colorperorder) ColorPerOrder, Max(colorpercategory) ColorPerCategory FROM ranking GROUP BY salesorderid, productcategoryid
The query calculations are defined by how the aggregations were performed in the first query:
- ItemsPerOrder: We do a SUM of the COUNT results, this will add the different counts.
- Total: A simple SUM over the SUM already done.
- CategoriesPerOrder – can have a fixed number of 1, since we are still adding by category
- ColorPerOrder: We make a SUM over the already existing COUNT of the previous query
- ColorPerCategory: After doing DENSE_RANK , we now need to extract the MAX value to have the same effect as COUNT DISTINCT
3rd Consultation Level
In the third step we reduce the aggregation, achieving our final result, the aggregation by SalesOrderID :
;WITH constant AS (SELECT salesorderid, productcategoryid, Count(*) AS ItemsPerOrder, Sum(unitprice * orderqty) AS Total, Count(DISTINCT color) ColorPerOrder, Dense_rank() OVER ( partition BY salesorderid, productcategoryid ORDER BY color) ColorPerCategory FROM saleslt.product p INNER JOIN saleslt.salesorderdetail s ON p.productid = s.productid GROUP BY salesorderid, productcategoryid, Colour), cte2 AS (SELECT salesorderid, Sum(itemsperorder) ItemsPerOrder, Sum(total) Total, 1 CategoriesPerOrder, Sum(colorperorder) ColorPerOrder, Max(colorpercategory) ColorPerCategory FROM account GROUP BY salesorderid, productcategoryid) SELECT salesorderid, Sum(itemsperorder) ItemsPerOrder, Sum(total) Total, Sum(categoriesperorder) CategoriesPerOrder, Sum(colorperorder) ColorPerOrder, Sum(colorpercategory) ColorPerCategory FROM account GROUP BY salesorderid
Once again, the calculations are based on the previous queries. Some of them are the same as the second query, adding more rows. However, there are some different calculations:
- CategoriesPerOrder: Converts to SUM to achieve the result we’d like.
- ColorPerCategory – Converts to SUM, adding all distinct count results from each category.
The execution plan
The execution plan generated by this query is not as bad as we might imagine. This query could benefit from additional indexes and improved JOIN, but other than that the plan looks pretty good.
The join is done using the ProductId field, so an index into the SalesOrderDetail table by ProductId and covering the additional fields used will help the query
We can create the index in the declaration:
CREATE INDEX inorderdetail
ON saleslt.salesorderdetail (productid)
include (orderqty, unitprice)
What is interesting to note in this query plan is the SORT, which now takes 50% of the query. This does not mean that the execution time of the SORT has changed, it means that the execution time for the entire query was reduced and the SORT became a higher percentage of the total execution time.
We hope you will be useful. If you don’t want to miss our tickets. Join our newsletter monthly, to be up to date with all our publications. You will only receive one email per month.
Greetings.