Prevent Duplicates in a Table
Here we can use a primary key or unique
Index on a table with appropriate fields to stop duplicate records.
Example : The following table contains no such index or primary key, so it
would allow duplicate records for first_name and last_name.
Syntax :
CREATE TABLE persons
(
first_name CHAR(20),
last_name CHAR(20),
sex CHAR(10)
);
Here we have to prevent multiple records with the same first and last name values from being
created in this table, add a primary key to its definition. When we do this,
it's also necessary to declare the indexed columns to be not null, because a primary key does not allow NULL values.
Syntax : CREATE TABLE persons
(
first_name varchar(20) NOT NULL,
last_name varchar(20) NOT NULL,
sex varchar(10),
PRIMARY KEY (last_name, first_name)
);
mysql> INSERT IGNORE INTO persons (last_name,
first_name)VALUES( 'arjun', 'singh');
mysql> INSERT IGNORE INTO persons (last_name, first_name)VALUES( 'arjun', 'singh');
mysql > select * from persons;
mysql> REPLACE INTO persons (last_name, first_name)VALUES( 'Anuj', 'Kumar');
mysql> REPLACE INTO persons (last_name, first_name)VALUES( 'Anuj', 'Kumar');
mysql> select * from persons ;
Counting and Identifying Duplicates
Synatx :CREATE TABLE students
(
last_name CHAR(20) NOT NULL,
first_name CHAR(20) NOT NULL,
street varchar(30) NOT NULL
);
mysql > select * from students;
mysql> SELECT COUNT(*) AS rows FROM students;
mysql> SELECT COUNT(DISTINCT last_name, first_name) AS 'distinct names' FROM
students;
Removing
Duplicates Using Table Replacement
One way to eliminate duplicates from a table is to select its unique
records into a new table that has the same structure. Then replace the original
table with the new one. If a row is considered to duplicate another only if the
entire row is the same, we can use SELECT DISTINCT to select the unique rows.
mysql> CREATE TABLE top SELECT DISTINCT * FROM students;
mysql> SELECT * FROM top ORDER BY last_name, first_name;
This method is used in the absence of an index (though it might be slow for large
tables), and for tables that contain duplicate NULL values, it will remove those
duplicates.
Note : This method considers the rows for Wallace Baxter that
have slightly different street values to be distinct.
If duplicates are defined only with respect to a subset of the columns in the
table, create a new table that has a unique index first, then select rows into
it using INSERT IGNORE.
Synatx :
CREATE TABLE top (
last_name CHAR(40) NOT NULL,
first_name CHAR(40) NOT NULL,
street CHAR(40) NOT NULL,
PRIMARY KEY (last_name, first_name));
mysql> INSERT IGNORE INTO top SELECT * FROM students;
mysql> SELECT * FROM top ORDER BY last_name, first_name;
The index prevents records with
duplicate key values from being inserted into 'top' and ignore tells MySQL not to
stop with an error if a duplicate is found. This method is
that if the indexed columns can contain NULL
values, we must use a unique index rather than a primary key in which
case the index will not remove duplicate null
keys.
After creating the new table 'top'
that contains unique rows, use it to replace the original students table. The
effective result is that students
no longer will contain duplicates:
mysql> DROP TABLE students;
mysql> ALTER TABLE top RENAME TO students;
Remove duplicates using only a MySQL query
I have a table with the following
columns and I want to remove duplicates on the url_addr
column using a MySQL query.
Then we can use the multiple-table DELETE
syntax as follows which will delete duplicate entries, leaving only the first url
based on url_id
Resources
Here are some useful related resources: