Quantcast
Channel: sql – Tallan
Viewing all articles
Browse latest Browse all 16

The SQL Server OVER Clause

$
0
0

One useful feature in SQL Server is something called the OVER clause. The OVER clause has been available since SQL Server 2005 and allows you to perform window functions over a set of data.  SQL Server provides two types of window functions: aggregate window functions and ranking window functions.  I will explain what aggregate and ranking window functions are below.  The Adventure Works sample database for SQL Server 2008 R2 will be used for all examples. This database models a retail store that sells biking products. Some may want to review the schema for Adventure works, located here, for better understanding.

Aggregate Window Functions

If you have used SQL’s GROUP BY clause, then you are probably familiar with aggregates.  Aggregate functions allow you to perform a calculation over a set of data records. Each product sold by Adventure Works belongs to a subcategory which in turn belongs to a category.  For example, “Mountain-500 Silver, 48″ belongs to the “Road Bikes” subcategory and the “Bikes” category.  Adventure Works sells 4 categories of product: Clothing, Bikes, Accessories, and Components.  Imagine you wanted to see the total sales for each product category.  You can use the following query:

SELECT C.Name AS Category,
       SUM(D.LineTotal) AS TotalSales
FROM Production.Product P
  INNER JOIN Production.ProductSubcategory S
    ON S.ProductCategoryID = P.ProductSubcategoryID
  INNER JOIN Production.ProductCategory C
    ON C.ProductCategoryID = S.ProductCategoryID
  INNER JOIN Sales.SalesOrderDetail D
    ON D.ProductID = P.ProductID
GROUP BY C.Name

The query uses a GROUP BY clause to split the results into groups by Category Name.  It the performs the SUM aggregate function over the LineTotal for each group.  The final result set has a single row for each group.  The results are as follows:

Result1

As I said before, a GROUP BY clause forces the result set to only contain a single row for each group.  Sometimes you may want to preserve the original rows instead of returning only a row for each group.  Adventure Works splits Sales Orders up into individual line items called Order Details.  Each Order Detail that makes up an Sales Order contains the sales information for a single product.  You can take advantage of the OVER clause and aggregate window functions to find the percentage of the order’s total sale price that each Order Detail takes up.  Look at the following query:

SELECT SalesOrderID,
       ProductID,
       LineTotal AS ProductLineTotal,
       SUM(LineTotal) OVER (PARTITION BY SalesOrderID) AS OrderTotal,
       LineTotal / (SUM(LineTotal) OVER (PARTITION BY SalesOrderID)) * 100 AS SalePercentage
FROM Sales.SalesOrderDetail
ORDER BY SalesOrderID, ProductID

The syntax of an aggregate window function is [AggregateFunction] OVER ([PartitionClause]).  In this example, the OVER clause is used to partition the results by SalesOrderId and then perform a SUM over LineTotal just like we did in the previous query with the GROUP BY, the main difference being that the grouping and the aggregation is done without affecting the rows returned in the result set.  Instead of just returning a single row for each group, the OVER clause allows you to return each individual SalesOrderDetail row while still taking advantage of the power of aggregate functions.  This allows us to find the sales percentage of a product within an order.  Some of the results of the query are below:

Result2

Ranking Window Functions

Ranking window functions allow you to assign a number to each row in the result set.  The number gives an “order” to the rows based on the type of ranking function used.  SQL Server provides four ranking functions:

  • ROW_NUMBER
  • RANK
  • DENSE_RANK
  • NTILE

I will discuss ROW_NUMBER and RANK next.

The ROW_NUMBER Function

The ROW_NUMBER function assigns a value to each row in the result set based on its position in the ordered set.  This technique is often used to implement paging in database queries.  For example, imagine that we wanted to retrieve Products from the database ordered by Name in increments of ten at a time.  We could use the following query:

SELECT P.ProductID,
       P.ProductName
FROM
(
    SELECT ProductID,
           Name AS ProductName,
           ROW_NUMBER() OVER (ORDER BY Name) AS RowNum
    FROM Production.Product
) P
WHERE P.RowNum BETWEEN 1 AND 10
ORDER BY P.RowNum

This query utilizes the OVER clause, the ROW_NUMBER function, and inner queries to retrieve the first ten products ordered by Name.  The OVER clause this time uses the syntax [RankingFunction] OVER ([PartitionClause] [OrderByClause]).  You can optional provide the partition clause which will cause the row numbering to be restarted at 1 for each group.  In the above example, we are only using the ORDER BY clause.  This causes each row in the result set to be assigned a number based on the ordering by name.  This is contained within an inner query, and the RowNum column can then be used to limit the results based on the row number.  Above, we are simply returning rows where the row number is between 1 and 10.  To retrieve the next page of results we would issue the same query with 1 replaced by 11 and with 10 replaced by 20.  Of course, in reality we would programmatically set these numbers rather than hard-coding them.  The results of the query are below:

Result3

The RANK Function

One of the other ranking window functions is called RANK.  RANK, like ROW_NUMBER, will assign a number to each row in the result set.  The difference between RANK and ROW_NUMBER is how they number values that are equal to each other.  For example, if we are ranking sales people by how many sales they made, some sales people might have made the same number of sales.  If Alan made 6 sales, Bernie made 3 sales, Carl made 4 sales, and Dan made 4 sales, then they would be ranked as follows: Alan(1), Carl(2), Dan(2), Bernie(4).

Notice that Carl and Dan are both assigned 2 as their rank.  This is because the both made the same number of sales.  Also notice that the rank of 3 was skipped since there were two people with the rank of 2.  If you wish to not skip a number, then use DENSE_RANK instead.

An example of using RANK appears below along with the results:

SELECT S.FirstName,
       S.Sales,
       RANK() OVER (ORDER BY S.Sales DESC) AS SalesRank
FROM
(
	SELECT C.FirstName, COUNT(O.SalesOrderID) AS Sales
	FROM Sales.SalesPerson SP
	  INNER JOIN HumanResources.Employee E
		ON E.EmployeeID = SP.SalesPersonID
	  INNER JOIN Person.Contact C
		ON C.ContactID = E.ContactID
	  INNER JOIN Sales.SalesOrderHeader O
		ON O.SalesPersonID = SP.SalesPersonID
	GROUP BY SP.SalesPersonID, C.FirstName
) S
ORDER BY SalesRank

Result4

Conclusion

The OVER clause  and window functions available in SQL Server can provide a lot of value if you know how to use them.  I can’t tell you how many times I have seen team members struggling with a query where the OVER clause would be a perfect fit, but they did not even know of its existence.  The OVER clause is a valuable tool that should be a part of every developer’s SQL arsenal.


Viewing all articles
Browse latest Browse all 16

Trending Articles