How to Handle Duplicate Records in MySQL

Arjun Panwar
14y
22.8k
0
0

Article

Prevent Duplicates in a Table

Here we can use a primary key or unique Index on a table with appropriate fields to stop duplicate records.

Example : The following table contains no such index or primary key, so it would allow duplicate records for first_name and last_name.

Syntax :

CREATE TABLE persons
(
first_name CHAR(20),
last_name CHAR(20),
sex CHAR(10)
);

Here we have to prevent multiple records with the same first and last name values from being created in this table, add a primary key to its definition. When we do this, it's also necessary to declare the indexed columns to be not null, because a primary key does not allow NULL values.

Syntax : CREATE TABLE persons
(
first_name varchar(20) NOT NULL,
last_name varchar(20) NOT NULL,
sex varchar(10),
PRIMARY KEY (last_name, first_name)
);

mysql> INSERT IGNORE INTO persons (last_name, first_name)VALUES( 'arjun', 'singh');

mysql> INSERT IGNORE INTO persons (last_name, first_name)VALUES( 'arjun', 'singh');

mysql > select * from persons;

img 1.gif

mysql> REPLACE INTO persons (last_name, first_name)VALUES( 'Anuj', 'Kumar');

mysql> REPLACE INTO persons (last_name, first_name)VALUES( 'Anuj', 'Kumar');

mysql> select * from persons ;

img 2.gif

Counting and Identifying Duplicates

Synatx :CREATE TABLE students
(
last_name CHAR(20) NOT NULL,
first_name CHAR(20) NOT NULL,
street varchar(30) NOT NULL
);
mysql > select * from students;

mysql> SELECT COUNT(*) AS rows FROM students;

mysql> SELECT COUNT(DISTINCT last_name, first_name) AS 'distinct names' FROM students;

coun distinct.gif

Removing Duplicates Using Table Replacement

One way to eliminate duplicates from a table is to select its unique records into a new table that has the same structure. Then replace the original table with the new one. If a row is considered to duplicate another only if the entire row is the same, we can use SELECT DISTINCT to select the unique rows.

mysql> CREATE TABLE top SELECT DISTINCT * FROM students;

mysql> SELECT * FROM top ORDER BY last_name, first_name;

new 1.gif

This method is used in the absence of an index (though it might be slow for large tables), and for tables that contain duplicate NULL values, it will remove those duplicates.

Note : This method considers the rows for Wallace Baxter that have slightly different street values to be distinct.

If duplicates are defined only with respect to a subset of the columns in the table, create a new table that has a unique index first, then select rows into it using INSERT IGNORE.

Synatx :

CREATE TABLE top (
last_name CHAR(40) NOT NULL,
first_name CHAR(40) NOT NULL,
street CHAR(40) NOT NULL,
PRIMARY KEY (last_name, first_name));

mysql> INSERT IGNORE INTO top SELECT * FROM students;

mysql> SELECT * FROM top ORDER BY last_name, first_name;

new 2.gif

The index prevents records with duplicate key values from being inserted into 'top' and ignore tells MySQL not to stop with an error if a duplicate is found. This method is that if the indexed columns can contain NULL values, we must use a unique index rather than a primary key in which case the index will not remove duplicate null keys.

After creating the new table 'top' that contains unique rows, use it to replace the original students table. The effective result is that students no longer will contain duplicates:

mysql> DROP TABLE students;
mysql> ALTER TABLE top RENAME TO students;

Remove duplicates using only a MySQL query

I have a table with the following columns and I want to remove duplicates on the url_addr column using a MySQL query.

only query.gif

Then we can use the multiple-table DELETE syntax as follows which will delete duplicate entries, leaving only the first url based on url_id

only query 2.gif

Resources

Here are some useful related resources: