Discussion:
While this can be slow to run as it usually has to read in the entire table, then sort it, then find duplicates, as an occasional run it is useful.
So please don't use it in a KPI! But do use it in a debugging, problem solving report.
Why is this hard for beginners?
If all you cared about was finding duplicates, you could enter
SELECT assetname
FROM asset
WHERE assetName like 'C%'
GROUP BY assetname
HAVING COUNT(*) > 1This will return you a set showing the duplicates. But it will only show you the value of the duplicate column(s), not any other columns, and usually you need to know more about each column to do anything useful with the information of the duplicates. Usually, but not always. If just knowing which values are duplicates, then the above is perfect and you can stop at easy.
But usually you want some context, some of the other values. The 1st 'natural' thought most people have is to try this:
-- note: This code sample does NOT work, see explaination
SELECT assetname, assetID
FROM asset
WHERE assetName like 'C%'
GROUP BY assetname
HAVING COUNT(*) > 1And SQL will return the error:
Which is pretty obvious what the problem is, but doesn't naturally lead to a solution.
Example of duplicates with context
Here's an example, looking for duplicate names in assets. It is perfectly acceptable to have duplicate names in assets, but sometimes it may cause problems.
After the example, there is a template that you can edit and use, or you can edit and use the example of course.
SELECT a.assetName, a.assetpk, a.assetid
FROM asset a
INNER JOIN (
SELECT assetname
FROM asset
GROUP BY assetname
HAVING COUNT(*) > 1
) d ON a.assetname = d.assetname
WHERE assetName like "CKX-%" -- so only show duplicates that start with CKX-
ORDER BY a.assetname;
Template
SELECT -- a.<column1>, a.<column2> etc..,
FROM <table> a
INNER JOIN (
SELECT == <column(s) to look for dups>
FROM <table>
GROUP BY -- <same column(s)
HAVING COUNT(*) > 1
) d ON a.<columnFordup> = d.<columnFordup> -- both should be the same since they are the same table
-- WHERE <condition>
ORDER BY a.assetname;