Windowing in SQL Server is done by the over clause that was introduced in SQL Server 2005. Windowing of data in SQL Server or the window function is applied to a set of rows (partitioned data based upon some column known as a window) to rank or aggregate values in that window or partition set. Windowing basically creates a window of records for every record. That window is then used for making computations. In SQL Server 2012 various changes have been done to that and I will explain those in my article. I am explaining this concept in two parts. I will first start with how the Over clause works.
For understanding the Over Clause we will start by creating some examples. For that we will first create a table and populate it with data.
- CREATE TABLE [dbo].[StudentMarks]
- (
- [StudentId] INT NOT NULL,
- [SubjectId] INT NOT NULL,
- [Marks] INT NOT NULL
- )
-
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (1, 1, 50)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (1, 2, 80)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (1, 3, 70)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (2, 1, 90)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (2, 2, 60)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (2, 3, 50)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (3, 1, 20)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (3, 2, 30)
- INSERT INTO [dbo].[StudentMarks] ([StudentId], [SubjectId], [Marks]) VALUES (3, 3, 40)
For understanding the over clause I will write a query to provide us the Highest, Lowest and Average among all the marks of all the students.
- Select
- StudentId, SubjectId, MARKS,
- TotalSubjects= Count(*) Over (),
- Average = Avg(Marks) Over (),
- Maximum = Max(Marks) Over (),
- Minimum = Min(Marks) Over (),
- Total = Sum(Marks) Over ()
- From StudentMarks
Note: the Over clause allows the use of aggregate functions without using a group by clause.
The result of the query above is:
The preceding query will return the Average, Maximum, Minimum and Sum of Marks among all the records as you can see above. Behind the scenes a window is created for every record and a computation is done on that window.
Now if we want to get all these aggregations for each Student then we need to write a different query. For that we need to partition the data based upon Student Id.
- Select
- StudentId, SubjectId, MARKS,
- TotalSubjects= Count(*) Over (Partition By StudentId),
- Average = Avg(Marks) Over (Partition By StudentId),
- Maximum = Max(Marks) Over (Partition By StudentId),
- Minimum = Min(Marks) Over (Partition By StudentId),
- Total = Sum(Marks) Over (Partition By StudentId)
- From StudentMarks
The output for the preceding query is:
As you can see above, the average, maximum, minimum and total is done based upon student id, in other words we are getting Average, Maximum, Minimum and total for every student in his/her subjects.
Now if we want to get running aggregations, that means aggregations based upon each and every record, then we need to use the order by clause in the over function. In SQL Server 2012 it is possible to use the order by clause in the over function that was not permitted in earlier versions of SQL Server. So the query for getting running aggregations will be:
- Select
- StudentId, SubjectId, MARKS,
- TotalSubjects= Count(*) Over (Partition By StudentId) ,
- Average = Avg(Marks) Over (Partition By StudentId Order By SubjectID),
- Maximum = Max(Marks) Over (Partition By StudentId Order By SubjectID),
- Minimum = Min(Marks) Over (Partition By StudentId Order By SubjectID),
- Total = Sum(Marks) Over (Partition By StudentId Order By SubjectID)
- From StudentMarks
And the output is given as:
As you can see in the preceding, the average, maximum, minimum and total columns are changing based upon every record. When the first record was processed then that was the only record in the window so the average, max, min and total is the same. Then when the second record is processed then there are two records in the window and the average, max, min and total are calculated based upon two records and this process continues for all the records for the same student.
Sliding Aggregations based upon conditions
Now if we want to restrict our computations based upon some rows then we will use Rows and a Preceding clause in the over function. For example I want that the computation (average, maximum, minimum and total) of records should be done based upon the current row and one row above that then we will write the following query.
- Select
- StudentId, SubjectId, MARKS,
- TotalSubjects= Count(*) Over (Partition By StudentId),
- Average = Avg(Marks) Over (Partition By StudentId Order By SubjectID Rows Between 1 Preceding and Current Row),
- Maximum = Max(Marks) Over (Partition By StudentId Order By SubjectID Rows 1 Preceding),
- Minimum = Min(Marks) Over (Partition By StudentId Order By SubjectID Rows 1 Preceding),
- Total = Sum(Marks) Over (Partition By StudentId Order By SubjectID Rows 1 Preceding)
- From StudentMarks
In the preceding query I have used Rows Between 1 and Preceding and Current Row which means that the computation should be done between the current row and 1 row above that. Writing Rows between 1 and Preceding and the current row is similar to Rows 1 Preceding. The output of the preceding query will be:
I hope this article has clarified some concepts of the over clause. I will explain some more concepts on the same topic in my next article.