MSSQL RANK(), DENSE_RANK(), ROW_NUMBER() (reposted)

Some of the handiest features introduced in SQL Server 2005 were the ranking functions: ROW_NUMBER(), RANK(), and DENSE_RANK(). For anyone who hasn’t been introduced to these syntactic gems, here’s a quick rundown (if you’re already very familiar with these functions, feel free to read through, or skip right down to “There’s No Such Thing as a Free Ride” below).

OK – so the general syntax for any one of these commands is more or less the same:

ROW_NUMBER() OVER ([<partition_by_clause>] <order_by_clause>) 
RANK() OVER ([<partition_by_clause>] <order_by_clause>) 
DENSE_RANK() OVER ([<partition_by_clause>] <order_by_clause>)

So the PARTITION BY clause is optional, but the ORDER BY clause is required. A non-partitioned and then a partitioned ROW_NUMBER() call are shown below:

ROW_NUMBER() OVER (ORDER BY TotalDue DESC) 
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY TotalDue DESC)

The difference between the three functions is best explained using an example. Here’s the data I’m using for this example, in case you want to follow the bouncing ball at home ;-)

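CREATE TABLE OrderRanking
   (
   OrderID INT IDENTITY(1,1) NOT NULL,
   CustomerID INT,
   OrderTotal decimal(15,2)
   )

-- UNION (rather than UNION ALL) removes the repeated (2, 500) row, so 8 rows are inserted.
-- The OrderID values cited later (6/3 and 1/5/8) correspond to the rows arriving
-- sorted by CustomerID, then OrderTotal; that order is not guaranteed.
INSERT OrderRanking (CustomerID, OrderTotal)
SELECT 1, 1000
UNION
SELECT 1, 500
UNION
SELECT 1, 650
UNION
SELECT 1, 3000
UNION
SELECT 2, 1000
UNION
SELECT 2, 2000
UNION
SELECT 2, 500
UNION
SELECT 2, 500
UNION
SELECT 3, 500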

I’ll use the following (admittedly ugly) query to demonstrate the difference between each function:

SELECT  *,
        ROW_NUMBER() OVER (ORDER BY OrderTotal DESC) AS RN,
        ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderTotal DESC) AS RNP,
        RANK() OVER (ORDER BY OrderTotal DESC) AS R,
        RANK() OVER (PARTITION BY CustomerID ORDER BY OrderTotal DESC) AS RP,
        DENSE_RANK() OVER (ORDER BY OrderTotal DESC) AS DR,
        DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY OrderTotal DESC) AS DRP
FROM    OrderRanking
ORDER BY OrderTotal DESC

Excuse the terrible aliases. Anything longer and the code snippets and output in this blog entry get really, really ugly. When we run the query, this is what we get:

So from the example above, we can see that:

ROW_NUMBER() assigns sequential numbers to the rows within each partition of a result set (an unpartitioned result set simply has a single partition), based upon the order specified in the ORDER BY clause. If you look carefully, you’ll see that the values in column RN are based upon a simple sort of OrderTotal, while the values in column RNP (Row_Number partitioned) are first partitioned or “grouped” by CustomerID, and then numbered by OrderTotal, with the row number resetting on change of customer.
Contrary to popular belief, RANK() does not sort rows based upon how bad they smell. RANK() does much the same thing as ROW_NUMBER(), only it acknowledges ties in the columns specified in the ORDER BY clause, and assigns them the same rank. Where a tie occurs (as was the case for orders 6/3, and 1/5/8), the numbers that would otherwise have been “used up” are skipped, and numbering resumes at the next available number. As you can see, RANK() leaves a gap whenever there is a tie.
DENSE_RANK() doesn’t like gaps. It’s more of an Abercrombie & Fitch kind of function (ba-dum-ching!). Ohhhhh…that was terrible. My sense of humour may give me up for Lent. You might follow it. Anyway… DENSE_RANK() “fills in the gaps”. After a tie, it continues with the very next number, so instead of 1, 2, 3, 3, 5 you get 1, 2, 3, 3, 4.
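A compact way to check the tie handling described above is to run all three functions over a small inline data set. This is only a sketch: the five totals are illustrative values (not the OrderRanking rows), and the derived table alias t is a name introduced here so the query runs without creating anything.

```sql
-- Illustrative values only: five totals with one tie (1000 appears twice).
SELECT  t.OrderTotal,
        ROW_NUMBER() OVER (ORDER BY t.OrderTotal DESC) AS RN,  -- always unique, even inside a tie
        RANK()       OVER (ORDER BY t.OrderTotal DESC) AS R,   -- tied rows share a rank, then a gap follows
        DENSE_RANK() OVER (ORDER BY t.OrderTotal DESC) AS DR   -- tied rows share a rank, no gap
FROM   (SELECT 3000 AS OrderTotal
        UNION ALL SELECT 2000
        UNION ALL SELECT 1000
        UNION ALL SELECT 1000
        UNION ALL SELECT 650) AS t
ORDER BY t.OrderTotal DESC, RN;

-- OrderTotal  RN  R  DR
-- 3000        1   1  1
-- 2000        2   2  2
-- 1000        3   3  3
-- 1000        4   3  3
-- 650         5   5  4
```

Which of the two tied rows receives RN 3 is arbitrary, because the ORDER BY inside OVER() gives the engine nothing to break the tie with. Adding a unique column to that ORDER BY makes the numbering repeatable.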

There’s No Such Thing as a Free Ride

Ranking functions are not only useful for simple ranking – they’re also great for solving complex problems. In fact, once you get to know them, you’ll find that you’re using them for waaaaaay more than just ranking. In anything from splitting strings to deleting duplicates, ranking functions are the cat’s meow.
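As one illustration of the “deleting duplicates” use, here is a minimal sketch. The temp table #OrderDupes and its rows are hypothetical, made up for this example. The idea is to number the rows within each group of identical values and delete everything numbered above 1.

```sql
-- Hypothetical scratch table; (2, 500) is loaded twice on purpose.
CREATE TABLE #OrderDupes
   (
   OrderID    INT IDENTITY(1,1) NOT NULL,
   CustomerID INT,
   OrderTotal decimal(15,2)
   );

INSERT #OrderDupes (CustomerID, OrderTotal)
SELECT 1, 1000
UNION ALL SELECT 2, 500
UNION ALL SELECT 2, 500
UNION ALL SELECT 3, 500;

-- Number the rows within each (CustomerID, OrderTotal) group;
-- the lowest OrderID in each group gets RowNumber = 1.
WITH Numbered AS
   (
   SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderTotal
                             ORDER BY OrderID) AS RowNumber
   FROM   #OrderDupes
   )
DELETE FROM Numbered
WHERE  RowNumber > 1;   -- removes the repeats, keeps one row per group

-- Three rows remain: (1, 1000), (2, 500), (3, 500).
SELECT CustomerID, OrderTotal
FROM   #OrderDupes
ORDER BY CustomerID, OrderTotal;

DROP TABLE #OrderDupes;
```

Deleting through the common table expression works here because it reads from a single table, so each numbered row maps back to exactly one underlying row.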

But just like most things in ~~life~~ SQL Server, less is more, when you can get away with it. For example, let’s say that you need to get the top order (by TotalDue) for each Customer in AdventureWorks (AdventureWorks2008 in my examples below). You can definitely use ROW_NUMBER (or any of the other ranking functions, for that matter) to do this:

SELECT soh.*
FROM   (SELECT CustomerID, SalesOrderID, TotalDue,
               ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY TotalDue DESC) AS RowNumber
        FROM   Sales.SalesOrderHeader) AS soh
WHERE  soh.RowNumber = 1

The WHERE soh.RowNumber = 1 restricts our results to the top order for each customer. Lovely. And the really beautiful thing about this is, if you need the top 2 orders for each customer, or 3, or 4, or x, all you need to do is replace the = 1 with <=2 (for example), and you’re good to go. Now, with that in mind, let’s look at this query:

SELECT soh.CustomerID, soh.TotalDue
FROM   Sales.SalesOrderHeader soh
JOIN   (SELECT   CustomerID, MAX(TotalDue) AS MaxTotalDue
        FROM     Sales.SalesOrderHeader
        GROUP BY CustomerID) AS ttls ON  soh.CustomerID = ttls.CustomerID
                                     AND soh.TotalDue   = ttls.MaxTotalDue

If you plug this bad boy in, and run it, you might be surprised by the outcome. It’s actually about half as expensive as the Row_Number solution – but why? Well, as you may or may not know, sorts can be very, very expensive in SQL Server. If we’re only fetching the highest $ sales order for a given customer, the MAX solution does it without a sort, whereas the ROW_NUMBER solution needs to sort (the ORDER BY clause is mandatory, remember).
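One way to compare the two approaches on your own data is to switch on I/O and timing statistics and run the queries back to back. This is a sketch of a measurement approach, not necessarily how the figure quoted above was obtained. As an assumption made here, the ROW_NUMBER query is trimmed to the same two output columns as the MAX query so the comparison is like for like.

```sql
SET STATISTICS IO ON;     -- report logical and physical reads per table
SET STATISTICS TIME ON;   -- report CPU and elapsed time per statement

-- Query A: ranking function
SELECT soh.CustomerID, soh.TotalDue
FROM   (SELECT CustomerID, TotalDue,
               ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY TotalDue DESC) AS RowNumber
        FROM   Sales.SalesOrderHeader) AS soh
WHERE  soh.RowNumber = 1;

-- Query B: aggregate joined back to the base table
SELECT soh.CustomerID, soh.TotalDue
FROM   Sales.SalesOrderHeader soh
JOIN   (SELECT   CustomerID, MAX(TotalDue) AS MaxTotalDue
        FROM     Sales.SalesOrderHeader
        GROUP BY CustomerID) AS ttls ON  soh.CustomerID = ttls.CustomerID
                                     AND soh.TotalDue   = ttls.MaxTotalDue;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The statistics appear on the Messages tab in Management Studio. Turning on the actual execution plan for the same batch shows whether a Sort operator appears in either query, which depends on the indexes available.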

But there are some caveats to the MAX solution. For one, it is not quite the same query: if a customer has two orders tied for the highest TotalDue, the join returns both of them, while RowNumber = 1 returns exactly one. More notably, how in the world can we get the top 5 orders for each customer? Well…the short answer is, we can’t. We need to change the query up, and in doing so, we’re once again going to incur a sort. Once we get beyond a query that the MAX or MIN tricks can satisfy (for instance, if we need to fetch the top 5 orders for each customer), we may as well take advantage of the ease of coding and the improved readability of the ROW_NUMBER solution. If we want a solution for the “top 5” problem without invoking a ranking function, we’re going to end up with something like this:

SELECT soh.CustomerID, soh.TotalDue
FROM   Sales.SalesOrderHeader soh
WHERE  soh.SalesOrderID IN
       (SELECT   TOP 5 SalesOrderID
        FROM     Sales.SalesOrderHeader soh2
        WHERE    soh2.CustomerID = soh.CustomerID
        ORDER BY TotalDue DESC)

Which in this case is a very, very crappy alternative to a ranking function. Not only is it uglier, but the query plan isn’t nearly as efficient, and the execution times were consistently about 20% longer in my tests.

Now that said, my tests are against a single data set only, and based upon the nature of your data, your mileage may vary. As a general rule, I would use aggregate functions if I’m only looking for the highest or lowest data point in a series, and a ranking function for anything that can’t be solved by simple aggregation.
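For the second case, the top-N-per-customer pattern from earlier can be written with a variable so the cutoff lives in one place. This is a sketch only: @TopN is a name introduced here for illustration, and the SalesOrderID tiebreaker is an addition, used so that orders with equal TotalDue are numbered the same way on every run.

```sql
DECLARE @TopN INT;
SET @TopN = 2;   -- hypothetical cutoff: the top 2 orders per customer

SELECT soh.CustomerID, soh.SalesOrderID, soh.TotalDue, soh.RowNumber
FROM   (SELECT CustomerID, SalesOrderID, TotalDue,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY TotalDue DESC, SalesOrderID) AS RowNumber
        FROM   Sales.SalesOrderHeader) AS soh
WHERE  soh.RowNumber <= @TopN
ORDER BY soh.CustomerID, soh.RowNumber;
```

If every order tied at the cutoff should be kept, drop the SalesOrderID tiebreaker and use RANK() in place of ROW_NUMBER(). A customer can then return more than @TopN rows.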

11 comments:

hatic said...: Hi,
How can I convert these two rows to access sql?

dense_rank() over(partition by field1 order by field2) as name1,

row_number() over(partition by fld1, (dense_rank() over(partition by fldnm1 order by fldnm2)) order by fld2) as name2

thanks in advance; July 21, 2009 at 11:38 AM
Aaron Alton said...: Hi hatic,

Sorry - I don't know Access SQL syntax from a hole in the wall. You can give the Microsoft Access Newsgroups a try though. Good luck!; July 21, 2009 at 11:55 PM
Brad Schulz said...: Hi Aaron...

For the TOP 5 orders per customer, the following CROSS APPLY method is a more efficient approach than the SalesOrderID IN method. The query plan is roughly half the cost. (Excuse the formatting... it may not come through):

SELECT CustomerID,TotalDue
FROM (SELECT DISTINCT CustomerID
FROM Sales.SalesOrderHeader) soh
CROSS APPLY (SELECT TOP 5 TotalDue
FROM Sales.SalesOrderHeader soh2
WHERE soh2.CustomerID=soh.CustomerID
ORDER BY TotalDue DESC) F1

That being said, though, the ROW_NUMBER() approach is the way to go (no question) for this type of thing.

--Brad; August 12, 2009 at 7:19 PM
Brian Tkatch said...: I've been using CROSS APPLY for only one record at a time. I forgot/didn't realize it can return an entire set.; August 13, 2009 at 7:54 AM
huruy said...: Thank you so much.; March 30, 2010 at 11:29 PM
Thirumal Reddy said...: Excellent Article ThankQ so much for posting this article; August 26, 2010 at 7:09 AM
Somy said...: Excellent post, thanks for the article.; May 19, 2011 at 11:11 AM
Abixel said...: Hi Great article. This tutorial saved me a lot in trying to write my own custom ranking function. By simply following your example I had to substitute row_number() with rank() in three of my stored procedures and voila!!!, my problem of detecting ties in records was solved. Keep up the good work. If you were near I would have bought you a cup of coffee.; November 28, 2011 at 4:46 PM
Abixel said...: Thanks, your article saved me a lot of time that I would have spent trying to write a custom ranking function. Substituted the ROW_NUMBER() function with RANK() function in three of my stored procedures and voila!!!!, ties are being smoothly detected. Would have bought you a cup of coffee if you were near. We need people in the world like you.; November 28, 2011 at 4:51 PM
K said...: I know this is a bit late ...

But I'm curious about the queries you tested in the 'no free ride' section. The 'MAX' query is actually sensibly less-expensive, namely because everything but the inner subquery is extraneous. I wonder what your performance test results would be like if the outer query also returned an additional column from the SalesOrderHeader table. I have a hunch that they'll be different and I think that's what I've observed before.; May 30, 2013 at 6:07 PM
salil patil said...: Very Helpful :)
Thank You; June 5, 2013 at 9:23 AM
