I have explained windowing in SQL Server in my previous article and I am continuing with the same discussion in this article. In this article I will explain the Over clause in more detail. If you have not studied my first article then please go through http://www.c-sharpcorner.com/UploadFile/78607b/over-clause-in-sql-server-windows-on-data-part-1/ this link and then continue with this article.
I will extend my last article and add one more record to the StudentMarks table. The Insert query for that is:
- insert into StudentMarks values(1,3,10)
I have added a new record that indicates that now I have added two records for the same subject, 3 for the same student 1. It means there are two records for the same subject and the same student. I have done so because I want to explain the difference between Rows and the Range clause.
Difference between Rows and Range Clause
A Rows clause limits the record for the current row in the window but the Range clause considers all the rows in windows for the current row. (Here window means the set of rows that are created by the over clause.) I will explain this with the example. I will create a query that sums the total marks of a single student with rows and range clauses.
- Select
- StudentId,SubjectId,MArks,
- SumRows = Sum(MArks) Over (Order by SubjectId Rows Unbounded Preceding),
- SumRange = Sum(MArks) Over (Order by SubjectId Range Unbounded Preceding)
- From StudentMarks
- where StudentId=1
As you can observe from the preceding data, the records are the same when the subject ids are different (the first two records) but are different (SumRows and SumRange column) when subjects are the same (third and fourth rows). This is because Rows consider the Sum row-wise whereas Range sums the records for the entire subject ones. It works with all the rows with the same subject id for a single student.
Note: The meaning of the Unbounded Preceding is starting from the first row in the partition and Unbounded Following means the last record in the partition.
Understanding First_Value, Last_Value, Lag and Lead Functions in SQL Server
To understand the use of these functions we will create a new table and insert some data into it. I am creating a new table because with a specific example it becomes easy to understand how these functions work. The scripts for the tables and data are:
- CREATE TABLE [dbo].[Order]
- (
- [OrderDate] DATE NOT NULL,
- [ProductId] INT NULL,
- [Quantity] INT NULL
- )
And the insert script for this table is:
- INSERT INTO [dbo].[Order] ([OrderDate], [ProductId], [Quantity]) VALUES (N'2011-03-18', 142, 74)
- INSERT INTO [dbo].[Order] ([OrderDate], [ProductId], [Quantity]) VALUES (N'2011-04-11', 123, 95)
- INSERT INTO [dbo].[Order] ([OrderDate], [ProductId], [Quantity]) VALUES (N'2011-04-12', 101, 38)
- INSERT INTO [dbo].[Order] ([OrderDate], [ProductId], [Quantity]) VALUES (N'2011-05-30', 101, 28)
- INSERT INTO [dbo].[Order] ([OrderDate], [ProductId], [Quantity]) VALUES (N'2011-05-21', 123, 57)
- INSERT INTO [dbo].[Order] ([OrderDate], [ProductId], [Quantity]) VALUES (N'2011-07-28', 101, 60)
This table is storing the Order date and Product Id and its Quantity sold.
Now I will write a query that gives us the details of the Order table along with the dates on which the Product sale was a maximum and minimum and with some more details.
- Select OrderDate,ProductId,Quantity,
- LowestOn = FIRST_VALUE(OrderDate) Over (Partition By ProductId Order by Quantity),
- HighestOn = LAST_VALUE(OrderDate) Over (Partition By ProductId Order by Quantity
- Rows Between Unbounded Preceding and Unbounded Following),
- ProductPrevOn = Lag(OrderDate,1) Over (Partition By ProductId Order by OrderDate),
- ProductNextOn = Lead(OrderDate,1) Over (Partition By ProductId Order by OrderDate)
- from dbo.[Order]
And the output of the preceding query is:
Note: All the functions used in the query above need an order by clause with Over clause.
First_Value Function
This function returns the date on which the product sale was the lowest.
Last_Value Function
This function returns the date on which the product sale was the highest. With this function we have used Rows Between Unbounded Preceding and Unbounded Following that means that the highest date should come between the lowest sale in the window and the highest sale in the window. This is required otherwise the correct result may not be obtained. This is because as we progress with every record considering current row and all the rows before that the current row will be the last so that's why we require specification of a Rows Unbounded clause with a Last_Value function that allows us to find the date between the lowest and highest values.
Lag Function
This function allows us to access the values from the previous records. If no records are found then it returns null. As you can see in the output window the value of ProductPrevOn is Null for the first record since there are no records prior to the first one.
Lead Function
This function allows us to access the values from the next records. If no records are found then it returns null. As you can see in the output window the value of ProductNextOn is Null for the third record since there are no records next to it for the same product.
Understanding Row Number, Rank, Dense Rank Functions
SQL Server has provided us some various functions that isn't regularly used in our coding but some of them may prove to be very useful for us. Some of those functions are:
- Row_Number
This function is used to provide the sequential row number for a given result set. It is very useful in situations where in our result set we do not have a sequential id column.
- Rank
This function also returns a sequential number depending on partition by but it returns the same rank if the column on which it is computed contains the same value. Then it skips the rank and continues with the next rank value for the following rows.
- Dense_Rank
It returns the rank of rows within the partition of a result set, without any gaps in the ranking. We can observe from the following example that after giving 5 as the dense rank for 2 rows it returns 6 as the next rank whereas the Rank function returns 7 by skipping 6.
The SQL Query that shows all the functions is given below:
- Select RowNumber=Row_Number() Over(Order by Marks),
- DenseRank=DENSE_RANK() Over(Order by Marks),
- RowRank=RANK() Over(Order by Marks),
- * from StudentMarks
The output of the above query is: