An index is a data structure that enables data to be found quickly. Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book. We can create an index on one or more columns (maximum 16 columns).
We can compare the index of a table with an index of a book. An index contains a sorted list of info about the contents of a book. When we need to search for a topic then we go through the index and search in the index info. Such that the index saves time and provides quick results.
Mainly an index increases the speed of data retrieval but it also increases the overhead of the data modification, like Insert, delete and update of the data. Now we consider some important points about indexes.
Why do we need an Index in a table?
Suppose we have a table of records of employees that contain a million records and we want to search for a specific id of an employee. A normal search system will search for the employee id in each row. This process is called “Table Scan”. A table without a clustered-index is called a “heap table”. This process can require a large amount of time if the records are at the end of the table.
For overcoming this problem we can also use indexes. An index arranges the data of the table in a sorted manner in the form of a B-Tree (Hash Tables and R-Trees are also used).
Types of Indexes
There are the following two main index types:
- Clustered Index
- Non-Clustered Index
First we create a table as in the following:
- CREATE TABLE [dbo].[Employee_Detail](
- [Emp_IId] [int] NOT NULL,
- [Emp_Name] [nvarchar](50) NOT NULL,
- [Emp_Age] [int] NOT NULL,
- [Emp_Salary] [int] NOT NULL
- ) ON [PRIMARY]
-
- GO
Now insert some values into the table as in the following:
- Insert into Employee_Detail Values(100,'Pankaj',20,25000)
- Insert into Employee_Detail Values(101,'Rahul',19,28000)
- Insert into Employee_Detail Values(102,'Sanjeev',21,32000)
- Insert into Employee_Detail Values(103,'Divyanshu',22,26000)
- Insert into Employee_Detail Values(104,'Omi',26,27000)
- Insert into Employee_Detail Values(105,'Sandeep',28,24000)
- Insert into Employee_Detail Values(106,'Neeraj',24,35000)
- Insert into Employee_Detail Values(107,'Narendra',29,45000)
Now the table will look as in the following.
We will now consider both types of indexes.
Clustered Index
A Clustered Index is a special type of index that reorders the way records in the table are physically stored. Therefore the table can have only one Clustered Index and this is usually made on the primary key. The leaf nodes of a Clustered Index contains the data pages. It is like a dictionary, where all words are sorted in alphabetical order in the entire book. Since it alters the physical storage of the table, only one Clustered Index can be created per table.
We can create a Clustered Index using one of the following two ways.
When we create a Primary Key on a table then the system creates a Clustered Index automatically.
- Alter table Employee_Detail
- Add Constraint Primarykey_Constraint Primary Key(Emp_IId)
- Using "Create Clustered Index" Query
We can also create an index using a "Create Clustered Index" query . The following is a sample.
- Create Clustered Index My_ClusteredIndex
- on Employee_Detail(Emp_IId)
As we said above, that the system creates a B-Tree when we create a Clustered Index. The system also creates a B-Tree. We will now examine how a Clustered Index works.
If we create a B-tree for our table then it will look like this:
Image Reference Google
A B-Tree contains one head node. In our case node 103 is our head node. The left side value of each node is always less than the node and the right side value is always greater than the node.
Let us examine some examples.
- select * from employee_Detail where Emp_Iid=103
In a normal case the system will perform four comparisons, the first for 100, the second for 101 and the third for 103 and in the fourth comparison it will find the desired result.
Using an index it only does a single comparison because 103 is the head node of the B-Tree.
- select * from employee_Detail where Emp_Iid=106
In a normal case the system will do 7 comparisons from 100 to 105 then in the 7th comparison it will find the desired result.
Using an index it only does 2 comparisons. First we look for the root node value of the root node which is 103 so we move to the next node towards the right side then we find the desired result.
After considering the preceding examples we can say that using an index we can increase the speed of data retrieval.
Let us consider one more example.
- select * from employee_Detail where Emp_Iid=100
In a normal case the system will do a single comparison because it will find 100 at the beginning of the table.
Using an index the system first checks the root node and it will find the value of the root node is equal to 103. It will then go to the left side of the root node. The value of the next node is 101. That is greater than 100 so the system again moves towards the left side. Now the system finds desired result.
In this case the index takes a little bit more time.
Conclusion
Clustered Indexes are useful when the amount of data is large in the table. When the amount of data is small then it not useful since it will take a large amount of time do a non-Indexed query.
Image Reference Google
In the preceding example we can see that the leaf node of the B-Tree is combination of Clustering Key and rows of the data. So in the case of a Clustered Index each node contains a key and row data. When any type searching query is done then the system will compare the key value of the node. If it finds the desired key value then it will return the row data of that node.
Advantages of Clustered Indexes
- Quickly returns a large range of data
- Fast for presorted results
- Physical ordering of the data of the column
- A Clustered Index can improve the performance of data retrieval
Disadvantages of Clustered Indexes
- Frequently updated data increases the overhead of rearranging the B-tree.
- Only one Clustered Index per table.
- If the size of the table is small then it is not effective.
Non-Clustered Index
A Non-Clustered Index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table's rows containing the data. A Non-Clustered Index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a Non-Clustered Index does not consist of the data pages. Instead, the leaf nodes contain index rows.
A Non-Clustered Index also uses a B-Tree. Each node of the B-Tree contains a key value and a pointer. The pointer holds the reference of the table row. This means that with a Non-Clustered Index there is extra work required to follow that pointer to the row in the table to retrieve any other desired values.
A table contains one or more than one Non-Clustered Indexes.
The following describes the creation of a non-clustered and non-unique index for a single column.
- Create Nonclustered Index NonClusteredIndex
- on Employee_Detail(Emp_Name)
The following describes the creation of a non-clustered and non-unique index for multiple columns.
- Create Nonclustered Index NonClusteredIndex
- on Employee_Detail(Emp_Name,Emp_Age)
The following describes the creation of a non-clustered and unique index for a single column.
- Create unique Nonclustered Index NonClusteredIndex
- on Employee_Detail(Emp_Name)
The following describes the creation of a non-clustered and unique index for multiple columns.
- Create unique Nonclustered Index NonClusteredIndex
- on Employee_Detail(Emp_Name,Emp_Age)
A non-unique Non-Clustered Index allows us to insert duplicate value into an indexed column but a unique Non-Clustered Index doesn't allow us to enter duplicate values into a column.
The following are some important points about Non-Clustered Indexes:
- Clustered Indexes are only the logical order of the index. In other words, a Clustered Index contains a key and the value of the row but a Non-Clustered Index only contains a key and a pointer to the row.
- If a Clustered Index exists on the table, the Non-Clustered Index instead of Row_Id as the key is used for the Clustered Index's key as a row locator.
- It is only useful when the column contains repeated data.
- For indexes covering multiple columns, the order of the columns in the index is important. The best practice is to use the column with the lowest cardinality first and the column with the highest cardinality last. Recall that cardinality means the number of distinct values for that column.
Difference B/w Clustered and Non-Clustered Index
- A Clustered Index is a physical ordering of the data but a Non-Clustered Index is a logical ordering of the data (more like a pointer to the data).
- A table has only one Clustered Index but a table can have one or more Non-Clustered Indexes.
- Clustered Indexes are used when the data is unique but a Non-Clustered Index can work with clustered and Non-Clustered Indexes.
- Data in a Clustered Index is always sorted but for Non-Clustered Indexes that is not necessary. We can see it in the following example.
Image Reference GoogleLet us see an important scenario.
What will be happen if we already have a Primary Key for the table and we add a new Clustered Index to the table?
Answer:
In the preceding example we can see that we have a primary key in the table. Now if we add a new Clustered Index to the table then it will throw an error.
According to this error we cannot create more than one Clustered Index on a table.
What will happen if we already have a Clustered Index Key in the table and we add a Primary Key to the table?
Answer:
The preceding image shows that we have a Clustered Index in the table. Now we add a primary key.
- Alter table Employee_Detail
- add Constraint Priamy_Key primary Key(Emp_Name)
OutputAccording to the preceding image if we already have a Clustered Index in the table and we are adding a Primary Key to the table then this Primary Key will generate a Non-Clustered Index.