Theory Of Hashing And Hash Tables

Atul Kumar
Jun 03, 2016
7.3k
0
0
Article

In this article, you will learn what Hash Tables are, and why and where we use them. Hash Table is another kind of data structure to place data elements in a specific arrangement, which is easy to read and write.

To learn Hash Tables effectively, we need to discuss the following aspects of Hash Tables.

What is Hashing?
What are Hash Tables and why we use them?
Collision handling while using Hash Tables?
Where we Hash Tables?
Difference between Hash Tables, binary search tree and approach to choose between the two?

What is hashing?

Hash Tables work on the concept of Hashing. Hashing is a process of converting the value from a string space to integer space or an index value or a string, that has a length of fixed size. Hashing is performed by hash functions. Two common hash methods are folding method and cyclic shift, which gives you index for a given key, to be used in Hash Tables. Another category of hash methods are cryptographic hash functions like MD5 and SHA, which gives you fixed size string for a given text, which are used for the security purposes. We are not going to discuss cryptographic hash methods here.

What are Hash Tables and why we use them?

Hash Table is a collection of items in terms of arrays, having special type of an index. An ordinary collection or a table, we can access data quickly, if we know the index of an item because indexes are already sorted by nature.

In the process of hashing a table or collection, we first decide which field or a column is going to be the key field, which will act as an index of Hash Tables. We put that key in some hash function, which gives us an index, which is called as hash value or hash index. Therefore, we place our data at that particular hash index in the table. Each data item in collection is directly mapped to some index ,because we have key and key is an index itself, when a key is processed by hash function. The same thing happens, when we read data from the table.

Therefore, hash is equally efficiently, while reading or writing data.

hash table

Hash function is a mapping from the input space to the integer space that defines the indices of the array.

Problem: Now, it is possible that there is a same hash index in the process of assigning index slots for the several keys.

Example:

hf(k1) = 12
hf(k2) = 12

In other words, collision may happen. Then, how will you decide, where to store the data value for that particular key.

So, here comes the need of collision handling.

Collision handling

There are different strategies available which resolve this issue. Three main strategies are:

Chaining
Linear Probing - Open addressing technique
Double Hashing - Open addressing technique

Chaining

Chaining technique replaces an array index slot with the linked list, if more than one value points to the same index. Here, is a graphical illustration.

Here, hash function(hf) is generating same index (index 3) for the two values. Therefore, both these keys will point to index 3, which will be replaced by linked list and index three will point to the head of linked list. The same algorithm is applied, when we retrieve the values from Hash Tables. This technique is used in managed languages like Hash Tables and hashmap in Java.

Open addressing schemes: This technique is used when there are more buckets (index slots), available than the actual values. Two techniques used in this scheme are as follows:

Linear Probe

Here, if the hash function is generating the same hash index value, then, it will look for another empty slot available to store the data.

Example:

if hf (kiran) = 4 is occupied then it will look for hf (komal) + 1 = 5

It will keep repeating this, until it can found in the next available empty index.

Linear Probe

Double Hashing

In this, two different hashing functions are used. If the first hash function hf (key) is occupied, then the second hash function hf1 (key) is used to increment the result of first hash function. E.g.

hf (key) = 4 occupied then
hf (key) + hf1 (key) = 8 will be used.

Open addressing scheme is relatively slow, because it would have to iterate through lot of buckets until it finds the empty one.

Where do we use Hash Tables?

We use Hash Tables to store arrays; having key value pairs in programming languages such as Dictionaries, Hash Tables in C# and Hash Tables and Hashmap in Java. Using this, you can quickly search an item with a given key, without scanning whole collection of Hash Tables array.
One can use Hash Tables in the temporary tables in a relational database system like SQL Server.

What is the difference between Hash Tables and binary search tree and how do we choose between the two?

Hash Table is a linear and unordered data structure whereas binary search tree is nonlinear and sorted. Lookups and insertion in the binary tree is relatively slow, compared to Hash Tables. Use Hash Tables if you are using large unordered collections of data. Use binary search tree, if you have small collection of ordered data, which will be faster.

In other cases, if you have small collection which is already sorted, there will not be extra memory consumption to sort and store sorted data in the memory, in case we use binary search.