What are JOIN Operators
 
 A join operator is a type of an algorithm that is used by the SQL Server  Optimizer chooses in order to implement logical joins between two sets of data.  Selection of algorithm different scenarios like the requested query, available  indexes, statistics and number of estimated rows in each data set. Today, we read  about available JOIN operator types in SQL Server (Nested Loops, Hash and Merge  Joins), we also read about their complexity.
 
 Firstly, we create two table and after that we applied join operator on both  tables.
 
 Student_Information Table
 
 ![Student Information]()
 
 Branch_Info Table
 
 ![Table]()
 
 Nested Loops
 
 Nested loops is the simplest operator that works on bunch of data. The nested  loops join, also called nested iteration, uses one join input as the outer input  table (shown as the top input in the graphical execution plan) and one as the  inner (bottom) input table. The outer loop consumes the outer input table row by  row. The inner loop, executed for each outer row, searches for matching rows in  the inner input table.
 
 Example
 
- SELECT * FROM dbo.Branch_InfoAS BRA   
- INNERJOIN  
- dbo.Student_Information AS STU  
- ON  
- BRA.Branch_Id=STU.Branch_Id  
Resulting execution plan for above query:  
![Resulting Execution]() 
  In above execution plan the operator on the top right is called the outer  input (Student_Information) and the one just below it is called the inner  input (Branch_Info). The outer input maintained a “Index Scan” label and inner  input maintained an “
Index Seek”.  
Index Scan
  Since a scan touches every row in the table whether or not it qualifies, the  cost is proportional to the total number of rows in the table. Thus, a scan is  an efficient strategy if the table is small or if most of the rows qualify for  the predicate.  
![Index Scan]() 
  In above index scan we can see that outer input only executed once and generated  12 rows.  
 Index Seek
  Since a seek only touches rows that qualify and pages that contain these  qualifying rows, the cost is proportional to the number of qualifying rows and  pages rather than to the total number of rows in the table.  
![Index Seek]() 
  In above index seek we can see that number of execution is 12 that verify the inner input executed for each row from outer input. 
 Conclusion is that clustered index scan we see as the outer input is executed  once to get all the relevant records, and the Non-Clustered index seek we see  below is executed for each record from the outer input. We can see that  execution plan contain two nested loops one for outer and inner input. Second  nested loop is maintained for RID Lookup.  
What is RID Lookup  ![RID Lookup]() 
  RID Lookup is a bookmark lookup on a heap using a supplied row identifier (RID).  The Argument column contains the bookmark label used to look up the row in the  table and the name of the table in which the row is looked up. A RID Lookup is a  lookup into a heap table using a Row ID. The Row ID is included in a  non-clustered index in order to find the rest of a table's data in the heap  table, since a heap table is a table without a clustered index and is sorted  unordered a Row ID is required for the correlation.   
Complexity of Nested Loop
  Suppose “
N” is number of rows from outer input and “
M” is total  number of rows in inner Input, then complexity will be 
O(Nlog(M)).  
When to use
  Used usually when one table is significantly small. If one join input is small  (fewer than 10 rows) and the other join input is fairly large and indexed on its  join columns, an index nested loops join is the fastest join operation because  they require the least I/O and the fewest comparisons.  
MERGE Join
  The Merge Join provides an output that is generated by joining two sorted data  sets using a FULL, LEFT, or INNER join. Merge Join requires that both inputs be  sorted and that the joined columns have matching meta-data. It means we can’t  join a column that has a numeric data type with a column that has character data  type. We both column are string type then length of the column in the second  input must be less than or equal to the length of the column in the first input  with which it is merged. The merge join operation may be either a regular or a  many-to-many operation. A many-to-many merge join uses a temporary table to  store rows. If there are duplicate values from each input, one of the inputs  will have to rewind to the start of the duplicates as each duplicate from the  other input is processed.  
Example: - SELECT * FROM dbo.Branch_Info AS BRA   
- INNERJOIN  
- dbo.Student_InformationAS STU  
- ON  
- BRA.Branch_Id=STU.Branch_Id  
- OPTION (MERGEJOIN)  
![Execution Plan]() 
  ![Merge Join]() 
  We can see that execution plan for Merge Join is completely different from the  execution plan of Nested Loop. The following operations are performed in above  execution plan.
 
  	- Nested loop is performed, in which Branch_Info work as outer input and  	“RID Lookup” work as inner input for Nested loop.
- Sort operation is performed for “Student_Information” table.
- Merge JOIN is performed on the data that is obtained from the step1 and  	step2.
Advantages of Merge Join Over Nested Loop:
  	- Both input operators in Merge Join are executed only once.
 
 
- The Merge Join simultaneously reads a row from each input and compares  	them using the join key. If there’s a match found then they are returned  	else the row with the smaller value can be discarded because, since both  	inputs are sorted, the discarded row will not match any other row on the  	other set of data.
Complexity of Merge Join
 
 The maximum cost of a Merge Join is the sum of both inputs, so complexity of  merge join can be defined as O(N+M).
 
 When to use
 
 If the two join inputs are not small but are sorted on their join column (for  example, if they were obtained by scanning sorted indexes), a merge join is the  fastest join operation. Merge join is excellent for very large tables.
 
 HASH Match
 
 The Hash Match physical operator builds a hash table by computing a hash value  for each row from its build input. It’s the one operator chosen when the  scenario doesn’t favor in any of the other join types or tables are not properly  sorted, and/or there are no indexes.
 
 Example:
 
- SELECT * FROM dbo.Branch_Info AS BRA   
- INNERJOIN  
- dbo.Student_Information AS STU  
- ON  
- BRA.Branch_Id=STU.Branch_Id  
- OPTION (HASHJOIN)  
 
 ![Built Input]() 
  Hash join use the first (top) input to build the hash table and the second  (bottom) input to probe the hash table.  
![Hash Match]() 
  Hash join use two important terms “
Hashing Function” and “
Hash table”.  
Hashing Function
  Hashing function is a function that takes one or more values and converts them  to a single symbolic value usually in numeric. If we provide the same value as  input to has function we will always get the same symbolic value as output. Hash  function is one way function that means we can convert back symbolic value to  original value.  
Hash Table
  Hash table is a data structure that divides all rows into equal-sized buckets,  where each “
bucket” is represented by a hash value. 
 For the distinct or aggregate operators, use the input to build the hash table  (removing duplicates and computing any aggregate expressions). When the hash  table is built, scan the table and output all entries.  
How HASH Match Work  SQL SERVER first choose the smaller table as built input and create a hash table  in memory. If there is not enough memory for the hash table, SQL Server will use  physical disk space in TEMPDB. After the hash table is created, now SQL Server will get the data from probe input and compare it to hash table using the hash  function and return the matched rows.  
Complexity of HASH Match
  Complexity for HASH Match is defined as: 
O(N*hc+M*hm+J)
  Where:  
N: Smaller data set 
M: Larger data set 
Hc: Complexity of the hash table 
Hm: Complexity of the hash match function 
J: Complexity addition for the dynamic calculation and creation of the  hash function.  
When to use
  Hash joins can efficiently process large, unsorted, non indexed inputs.  
Conclusion
  If we use an efficient Join Operator according the data tables, it can increase  the performance query else it can decrease the performance. So first examine the  size data in tables, after that select an appropriate Join operator.