MySQL: Find Duplicate Entries in a Table

Here is a little background for this issue. The database used to run on ancient copy of MySQL until very recently we upgraded it to 5.0. Not without some headaches of migrating the database, but it worked. Common problem we had with this database was duplicate entries in printed reports. Why did these happen? Mostly because of user error combined with a lack of foreign key constraints.

The two MyIsam tables that were causing the issue were: report and proforma. Both were used for tracking documents through the review process. Report was the actual field audit report that would be submitted, reviewed and sent to the clients, while proforma was the documentation of expenses that our employees were supposed to submit for each assignment. Both would store dates, comments, notes and other info regarding said documents. The proforma had a 1-1 relationship with report (each proforma was tied to a single report entry). But since the old MySQL version did not support Foreign Key constraints this was not enforced (or rather only enforced by the PHP front end).

However users would continuously find ways to submit multiple proformas for the same report – for example by clicking the “submit” button 7 times. Other fun trick for re-submitting proforma was to send it in as TBA (to be annouced). Upon seeing the TBA the clerical staff in the office would manually associate it with appropriate report, usually without checking if another copy is already in the database. And when we locked that down, they would just continuously send bug reports about it forcing us to sort this out at the DB level.

What happened if the non-existent FK constraint was violated? One of the main reporting sections of the site used a complex join across these two tables. If there were two or more proformas per report, that report would show up on the list multiple times. What do people do when they see duplicates on the list? They start deleting them. Problem is – these were not real duplicates, but the same entry repeated several times. Deleting one copy would hose all of them, causing yet more support request for restoring the entry from the nightly DB dump.

Fun times. So after we switched to a DB engine that was actually developed in this century I immediately switched the tables to InnoDB and put a freaking foreign key constraint on that relationship. This way if someone finds a new loophole in the PHP code they will end up seeing a nice MySQL error instead of creating new duplicates.

But guess what – I just got more bug reports about duplicate entries. Apparently the FK constraint is not retroactive, and MySQL will gleefully allow duplicate keys to exist in your FK column when you apply the constraint. I figured that it would start trashing about and keep giving me errors if that was the case, but no – all it cared about was an index on the key column. So I was left with a task to track down all the duplicate entries in the foreign key column in my proforma table. How do I do this? This problem actually got me scratching my head for a bit, until I had a sudden epiphany:

SELECT 
	id, report_id, count(*)
FROM
	proforma
HAVING
	count(*)>1

I found around 17 separate instances of the foreign key (here report_id) being repeated anywhere from 2 to 8 times. All of these were for older entries that no longer show up on the first page of results, so naturally no one noticed. Still, it shows that the problem was more prevalent than we initially suspected.

What I’m really trying to say here is this: use database constraints to enforce table relationships. This is the only proper and effective way to do it. Trying to enforce constraints in software is just asking for trouble as you create multiple points of failure. Different parts of the code will update any given table at different times, and all of them must check this. Failing to implement proper checks in every piece of code that touches that table may lead to duplicate entries in FK columns and similar undesired side effects. A simple constraint on a column will stop this from happening much more effectively, with much less effort.

[tags]mysql, sql, database, database design, foreign key, key, constraint, php[/tags]

4 Responses to MySQL: Find Duplicate Entries in a Table

Ricardo says:

September 19, 2007 at 9:52 pm

Isn’t it also the case of having multiple of the same proforma with different ids?

In that case, you would need to identify columns that would have the value and compare them, like this:

SELECT
id, report_id, title, date
FROM
proforma p1, proforma p2
where
p1.title = p2.title
and p1.date = p2.date
and p1.report_id = p2.report_id
and p1.id p2.id

Since you said this is a 1-1 relationship, there shouldn’t be no results from this query.

Reply | Quote
Luke Maciak says:

September 20, 2007 at 10:44 pm

Actually, this is a whole separate problem since all proformas have unique filenames associated with them. So checking for duplicate entries submitted under wrong report ID is always a wild card search.

And this is actually one of the problems that is a 100% end user problem, and can be easily fixed by the office staff without involving the IT.

Reply | Quote
Nadab says:

July 22, 2008 at 8:08 am

hi all, Few days back i was looking for mysql administrative tool. i came across sqlyog which has super cool features like data sync, migration from various like access, sql server with much more powerful tools. I was very impressed with the GUI and also their multiple databases connections. Just have a look into it !!!

Reply | Quote

Derek

says:

March 4, 2009 at 2:14 am

I think the code should be:

SELECT 
	id, report_id, count(*)
FROM
	proforma
GROUP BY
	id, report_id
HAVING
	count(*)>1

MySQL: Find Duplicate Entries in a Table

4 Responses to MySQL: Find Duplicate Entries in a Table

Leave a Reply Cancel reply

Subscribe

Social

Categories

Archives

Interesting Blogs

My Software Projects

People I Know

Web Comics I Read

Meta

MySQL: Find Duplicate Entries in a Table

4 Responses to MySQL: Find Duplicate Entries in a Table

Leave a Reply Cancel reply

Subscribe

Social

Categories

Tags

Archives

Interesting Blogs

My Software Projects

People I Know

Web Comics I Read

Meta