Why do we study data structure?
In real life, when we have to deal with multiple things, objects, or items we want them to be placed or arranged in a good manner. Nobody ever likes things scattered all around or placed randomly, especially when objects are of a different kind or not of the same type. And yes, objects are not always the same in real life. They can be of any type, size or any quantity. We arrange those things in such a way that it is easier to place and find something we want. So the same terminology is being followed in the study of data structure. In electronics computing systems, we refer to these things as data objects. Here is the difference between data structure and no data structure applied.
Data is referred to as a collection of multiple items and data structure is how we store and access those multiple data items in different well-known formats which are termed as different data structure types. In other words, data structure is the arrangement or placement of multiple data items in electronic memory space. There are different arrangements or ways we can store and access data depending upon the requirement which ensures that we can write and access data efficiently and in a timely manner.
Data structure is a study which comes into the picture when we have to deal with collections of multiple items in terms of different data types such as numbers or strings at large. Data structure study is incomplete without taking the following things into consideration. These are:
- What is data structure?
- What are different data types?
- What is abstraction?
- What are different types of data structure?
- What is the relationship between database and data structure?
- What data structures do databases use?
What data structure is?
Data Structure refers to the actual implementation of the data type and offers a way of storing data in an efficient manner. Data structure also maintains the relationship between different data elements.
What are different data types?
Multiple data items or groups of data items in data structures can be of different types which we call data types.
Different data types are:
- Primitive Data Types - These are the basic data types which are built into any computing system. These are integer, float, string, boolean etc.
- Composite Data Types - These are compositions of different types of data like struct, union, arrays which we study in C or C++ programming language and record in database.
- Abstract Data Types - To study this we need to know what an abstraction is.
What is Abstraction?
In simple words, abstraction is just to know what the system or entity needs to do without knowing how to do it. What to do is an abstraction which is accomplished by writing how to do it. What to do can be done by multiple what to dos.
For instance, a user class has a Login method which we can refer to as what to do. That Login method can be written using different databases, languages or business logics which we can refer as how to do.
So abstraction is just operations which can be performed on certain data or collections of multiple data items.
Abstract data types are:
List, Stack, Queue, Set, Tree, Graph etc. There are much more. We can perform insert, remove, push, pop operations on these data types.
As stated above What to do can be done by multiple what to dos. So we can implement these data types using different implementations of composite data types such as arrays, structs, and unions.
Different types of data structures?
As there are multiple ways we can arrange or place our multiple data elements. These ways or arrangements can be categorized in two terms which are,
- Linear Data Structure
- Non Linear Data Structure
Linear data structure
It is where you can reach or traverse multiple data items sequentially. If you want to reach a certain data element you have to traverse elements one by one. Only one data element can be accessed at once. So these are arrays, linked lists, hash tables, stacks, queues etc.
We can think of above linear structures as following in a simplified manner to easily remember. You can think of above boxes as a person or object.
Array is multiple persons standing beside each other facing towards one direction. A person can reach to any empty slot and stand there or come out from there randomly without affecting others.
Queue is multiple persons standing behind each other in one direction surrounded by left and right walls with no space to pass one another. A person can stand only at the end of a queue and come out only at the start of queue. A person who has entered last will come out last. This is LILO (last in last out).
Stack is a collection of books stacked in a box which can be opened from one side only. One book is placed on another. One can reach to a desired book only after taking out all books placed on the top of desired book.
Linked list is something like a social network of people in which one person knows other persons. If A person knows B and B knows C but not vice-versa then it a singly linked list. If A knows person B and B knows about A, B person knows C and C knows B then it will become doubly linked list.
We generally know about stack, queue, and array data structures but hash tables are quite tricky. So I am going to explain the concept of hash table.
Hashtable
Hashtable is a collection of items in terms of arrays having special type of index. In an ordinary collection or table we can access data quickly if we know the index of an item because indexes are already sorted by nature.
In the process of hashing a table or collection, we first decide which field or column is going to be the key field which will act as an index of a hashtable. Then we put that key in some hash function which gives us an index which is called as hash value or hash index. So we place our data at that particular hash index in the table. Each data item in the collection is directly mapped to some index because we have key and key is index itself when key is processed by hash function. The same thing happens when we read data from table.
So hash is equally efficient while reading or writing data.
Hash function is a mapping from the input space to the integer space that defines the indices of the array.
Non Linear data structure
We can think of it as non-sequential arrangement where one data element can be attached to other multiple data items. So these are trees, graphs etc.
Tree data structure can be further divided into following categories,
- Binary Tree
- Binary Search Tree
- Heap Tree (Min-Tree and Max-Tree)
- B-Tree
Tree data structure consists of multiple nodes. A node is an entity which consists of value and pointer to its child nodes. Every parent node knows where its children are.
Binary Tree
One data element (parent element) can have only two nodes I.e left node and right node.
Binary Search Tree: Special form of binary tree in which left node is less than or equal to parent data node and right node is greater than or equal to parent data node. This rule follows to the subtree of child nodes as well. Thisis meant to say that all nodes in the left subtree should be less than or equal to parent data node and all nodes to the right of parent data node should be greater than or equal to. These are for faster data access.
What is the difference between Binary search and Binary search tree?
Binary search is a way we break our sorted or ordered data and traverse through it to find a data element and Binary search tree is the stored structured data in memory in the form of nodes. We can also apply binary search on ordered arrays to search an element.
Binary Heap Tree
Heap tree is where root or parent key is compared with its children in the following ways.
Every node in min heap will be greater than or equal to its left and right child nodes.
Every node in max heap will be less than or equal to its left and right child nodes.
B-Tree
B-tree consists of parent node which has more than two child elements as compared to binary tree in which parent node has only two children. This is mainly used in relational database systems.
Graphs
Graph adds another level of complexity in tree data structure. Node can be connected to any other node to create different relationships. In this nodes are called vertices and connection between vertices are edges.
- A, B, C, D, E are vertices.
- AB, AD, DC and BC are double headed edges (bidirectional).
- AE and CE are single headed edges.
Graph is a network of objects connected with each other and every single connection forms a path.
Graph structure is used in networks, maps, social networking connections and algorithms to find the shortest distance between two objects and is the most common application of graph.
What is the relationship between database and data structure?
To save multiple data items we generally normalize data and store in database in form of rows and columns in table which is called relational database. These are SQL, MySQL, Oracle and others. What we see in our tables are arrays of records and we can easily think that database uses array or list data structure to store items but this is not true. What we see is the only representation of actual data stored in hard drive or memory by our database.
Database is nothing but a tool which stores our data items in different data structures, say lists or trees. Database tools takes an abstraction of multiple underlying data structures to store and read our data. Database uses many data structures depending upon the problem we want to address. For example when we use clustered indexing in table, database saves our actual data in B-Tree data structure.
What data structures does a database use?
B-Tree for indexing.
Hash tables for storing metadata; I.e., information regarding the tables, views, and stored procedures.
Trees for execution plans i the form of different types of nodes.
Some other data structures are used for parsing, query optimization, query execution, concurrency.
So everything explained in this article is the tip of the iceberg of data structure. The idea to explain is to understand the things in the big picture and find out how are we applying and using data structure terminology in our daily life and understand essential components.