The process of minimizing data redundancy uses a series of steps. It is called normal forms and is called normalization. The total normalization process includes 8 normal forms.
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Domain/Key Normal Form(DKNF)
- Sixth Normal Form(6NF)
First Normal Form (1NF)
A relation is said to be in the first normal form if and only if a primary key is defined for the relation and all non-key attributes are dependent on key attributes. The attributes that are part of the primary key are called key attributes and all the remaining attributes are called non-key attributes. As an example, observe the following relation that contains information regarding various projects and employees working in those projects.
To convert the preceding table to 1NF, a primary key must be defined for the table and all non-key attributes must be dependent on key attributes. In this table, it is not possible to define a primary key on a single column since every column can contain duplicate values. Hence we can define a primary key on the combination of PID and EMPID Columns. Now by giving the values of PID and EMPID columns, we can retrieve the values of every other column in the table. Hence the table is said to be in 1NF. Finally the table is represented as follows after defining a primary key on PID and EMPID columns.
(PID, EMPID) -> Pname, Ename, Job, Charge/Hour, Hours Worked
Second Normal Form (2NF)
A relation is said to be in 2NF if and only if the relation is in first normal form and no partial dependencies exist in the relation. If a primary key is defined on more than one attribute and a non-key attribute is completely dependent on only one key attribute then that dependency is called a partial dependency. The preceding table, after conversion to 1NF, contains a PID and EMPID combination as the primary key. But to get PNAME there is no need to provide the EMPID. Hence PNAME is partially dependent on key attributes. In the same way, to get ENAME, JOB, CHARGE/HOUR and HOURS WORKED, there is no need to provide the PID. Hence these attributes are also partially dependent on key attributes. The preceding relation is not in 2NF.
To convert the preceding relation to 2NF, we need to divide the table as follows.
(PID) |
Pname |
(EMPID) |
Ename, Job, Charge/Hour |
(PID, EMPID) |
Hours Worked |
Now all three relations are said to be in 2NF, since they doesn't contain partial dependencies.
Third Normal Form (3NF)
A relation is said to be in 3NF, if and only if the relation is in 2NF and no transient dependencies exist in the relation. If a non-key attribute depends on another non-key attribute, then that dependency is called a transient dependency. In the preceding three tables obtained by converting the table of 1NF to 2NF, the second table contains a CHARGE/HOUR column that was actually dependent on JOB but not on EMPID. To convert this table that was not in 3NF to 3NF, we need to divide the table as follows.
(PID) |
Pname |
(EMPID) |
Ename, Job |
(JOB) |
Charge/Hour |
(PID, EMPID) |
Hours Worked |
Now all the four tables are in 3NF, since they don't contain transient dependencies.
As the normalization steps proceed to higher normal forms, no tables in the database will increase. When no tables increase, the database performance will be reduced since we need to combine the data from more than one table. This is the reason why database designers are restricted to 3NF.
De-Normalization
Converting the tables from their current normal form to their previous normal form is called de-normalization.