June 11, 2017

SELECT All HAVING Non-Distinct Composite Column with T-SQL

Disclaimer: I imported this post from WordPress to Jekyll on 9/2/23. Forgive the misformatting until I get a chance to clean it up.

Requirement

This week, the cross-functional team that is testing the application I have been building discovered a data issue where a table that should only contain one record for each ([Sku], [GroupCode]) pair had more than one in some cases. This table was not designed to have ([Sku], [GroupCode]) as a composite key (there is an [Id] identity column as a key), allowing this issue to occur.

Below is a sample of what the table [Prices] could have contained.

Id Sku GroupCode Price
1 1 1 1.00
2 1 1 2.00
3 1 2 1.00
4 2 1 1.00
5 2 1 2.00
6 2 2 1.00

To diagnose why this issue (such as records where [Id] IN (1,2)) was happening, I needed a query to select only the records that contained more than one ([Sku], [GroupCode]) composite.

A Solution

I did not have a lot of time to test my Google-Fu for this one, so I received assistance from a BI developer who sits across the hall from me. She suggested a query like below.

SELECT COUNT([Sku] + [GroupCode]) AS 'Count',
[Sku],
[GroupCode]
FROM [Prices]
GROUP BY [Sku], [GroupCode]
HAVING COUNT([Sku] + [GroupCode]) > 1

I had used the HAVING keyword before, but I had not used it extensively enough to think to apply it to this problem off the top of my head. The difference between HAVING and WHERE is quite subtle; but, for this situation, I think HAVING is required since I wanted to select duplicate composite columns after the GROUP BY has already been applied.

Furthermore, I was interested in treating [Sku] and [GroupCode] like a single composite column. So, concatenating the two columns fulfills this requirement. The query above, however, adds the two columns rather than concatenates because the columns are INTs. So, I cast the INTs to VARCHARs below.

SELECT COUNT(CAST([Sku] AS VARCHAR(20)) + CAST([GroupCode] AS VARCHAR(20))) AS 'Count',
[Sku],
[GroupCode]
FROM [Prices]
GROUP BY [Sku], [GroupCode]
HAVING COUNT(CAST([Sku] AS VARCHAR(20)) + CAST([GroupCode] AS VARCHAR(20))) > 1

The result set from the sample [Prices] table is below and is what I needed to diagnose further why the issue was occurring.

Count Sku GroupCode
2 1 1
2 2 1