This week, the cross-functional team that is testing the application I have been building discovered a data issue where a table that should only contain one record for each (
[GroupCode]) pair had more than one in some cases. This table was not designed to have (
[GroupCode]) as a composite key (there is an
[Id] identity column as a key), allowing this issue to occur.
Below is a sample of what the table
[Prices] could have contained.
To diagnose why this issue (such as records where
[Id] IN (1,2)) was happening, I needed a query to select only the records that contained more than one (
I did not have a lot of time to test my Google-Fu for this one, so I received assistance from a BI developer who sits across the hall from me. She suggested a query like below.
SELECT COUNT([Sku] + [GroupCode]) AS 'Count', [Sku], [GroupCode] FROM [Prices] GROUP BY [Sku], [GroupCode] HAVING COUNT([Sku] + [GroupCode]) > 1
I had used the
HAVING keyword before, but I had not used it extensively enough to think to apply it to this problem off the top of my head. The difference between
WHERE is quite subtle; but, for this situation, I think
HAVING is required since I wanted to select duplicate composite columns after the
GROUP BY has already been applied.
Furthermore, I was interested in treating
[GroupCode] like a single composite column. So, concatenating the two columns fulfills this requirement. The query above, however, adds the two columns rather than concatenates because the columns are
INTs. So, I cast the
SELECT COUNT(CAST([Sku] AS VARCHAR(20)) + CAST([GroupCode] AS VARCHAR(20))) AS 'Count', [Sku], [GroupCode] FROM [Prices] GROUP BY [Sku], [GroupCode] HAVING COUNT(CAST([Sku] AS VARCHAR(20)) + CAST([GroupCode] AS VARCHAR(20))) > 1
The result set from the sample
[Prices] table is below and is what I needed to diagnose further why the issue was occurring.