The Performance of the T-SQL Window Functions

Window Functions in SQL greatly simplify a whole range of financial and statistical aggregations on sets of data. Because there is less SQL on the page, it is easy to assume that the performance is better too: but is it? Dwain gets out the test harness to investigate.

SQL has always had aggregate functions such as SUM() and MAX(); without them, many routine SQL queries would be impossible. The ANSI SQL:2003 standard introduced window functions to deal with a range of processing tasks that require aggregating over a partition of a set, and ANSI SQL:2008 extended them. In SQL 2005, Microsoft introduced the first of this class of functions in two flavors, Ranking and Aggregates, and released further functions in subsequent versions.

When were Window Functions introduced in SQL Server?

Let’s take a quick look at an abridged history of the implementation of window functions in SQL Server.

SQL Version | Function Type | Functions                                      | Remarks
------------|---------------|------------------------------------------------|--------
2005        | Ranking       | ROW_NUMBER, DENSE_RANK, RANK, NTILE            | Supports PARTITION BY (optional) and ORDER BY (required) in the OVER clause.
2005        | Aggregates    | SUM, AVG, COUNT, MIN, MAX                      | Requires PARTITION BY in the OVER clause.
2008        | Ranking       | ROW_NUMBER, DENSE_RANK, RANK, NTILE            | No relevant changes to supported functionality.
2008        | Aggregates    | SUM, AVG, COUNT, MIN, MAX                      | No relevant changes to supported functionality.
2012        | Ranking       | ROW_NUMBER, DENSE_RANK, RANK, NTILE            | No relevant changes to supported functionality.
2012        | Aggregates    | SUM, AVG, COUNT, MIN, MAX                      | Window frame capability (ROWS/RANGE) was added.
2012        | Analytic      | CUME_DIST, LEAD, FIRST_VALUE, PERCENTILE_CONT, LAG, PERCENTILE_DISC, LAST_VALUE, PERCENT_RANK | Added in this version.

There are plenty of excellent articles that explain the workings of the SQL window aggregate functions; Joe Celko and Wayne Sheffield, for example, have both written good introductions.

They’re useful, but do they perform well?

I’ve seen very few discussions of the performance characteristics of these functions, so that is what we’ll attempt here.  We will assume that you have a basic understanding of how both SQL aggregate and window aggregate functions work.  If you are unsure, or they are new to you, you should probably read Joe Celko’s or Wayne Sheffield’s articles first.

Before window functions existed, we had to produce quite complex code to do those tasks for which window functions were designed.  Nowadays, we find window functions so useful and flexible that their use has become ubiquitous in SQL.

Although there is no doubt that the window functions add richness to the SQL language, greatly simplifying the syntax of the queries they appear in, we’re still left with the nagging doubt as to whether they are as fast as the older methods.  They’re more easily maintained, but are they faster? This is what we want to find out. We haven’t the space to look at every window function in detail, so we’ll limit our testing to just a few, and offer some references where you can look for additional comparisons.

Sample Data and a Simple Example

The simple table and sample data we’ll use for this example are the same as those I’ve published in a Simple Talk article in the past.  You’ll see the reference when we get around to discussing PERCENTILE_CONT.
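Something along these lines will do as a minimal sketch; the table and column names here (#Test, ID, N) are illustrative assumptions, not necessarily the exact schema from the original article:

    -- Simple test table: ID is the partitioning column, N the value to aggregate
    CREATE TABLE #Test
    (
        ID INT NOT NULL
        ,N INT NOT NULL
    );

    INSERT INTO #Test (ID, N)
    VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5)
        ,(2, 1), (2, 2), (2, 3), (2, 4);

    -- Window aggregate: append the COUNT of each partition to every row
    SELECT ID, N
        ,cnt = COUNT(*) OVER (PARTITION BY ID)
    FROM #Test
    --WHERE ID = 1      -- the WHERE clause referred to below
    ;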

If we run this code with the WHERE clause uncommented, it produces a results set showing how the COUNT of the rows within the partition is appended as a column to each row.

Even though they’re named the same, the window aggregates operate differently from the original aggregate functions: the result is added as a column across the entire row set, rather than condensing the set to one row per partition as a GROUP BY does.  With just a little extra effort on the coding side, it is possible to use the original SQL aggregate functions to mimic what the window aggregates do.

In SQL 2000 and earlier, the equivalent result could be obtained using the following code pattern:
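A sketch of that pre-2005 pattern, using the assumed #Test table from above: aggregate in a derived table with GROUP BY, then join the result back to the detail rows.

    SELECT a.ID, a.N, b.cnt
    FROM #Test a
    INNER JOIN
    (
        -- Traditional aggregate, condensed to one row per ID...
        SELECT ID, cnt = COUNT(*)
        FROM #Test
        GROUP BY ID
    ) b
    ON a.ID = b.ID;     -- ...then spread back across the detail rows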

And when SQL 2005 came along, this could also be done with a CROSS APPLY:
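Again as a sketch against the assumed #Test table:

    SELECT a.ID, a.N, b.cnt
    FROM #Test a
    CROSS APPLY
    (
        -- Correlated aggregate evaluated for each outer row's ID
        SELECT cnt = COUNT(*)
        FROM #Test b
        WHERE b.ID = a.ID
    ) b;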

When I compared the SQL 2000 pattern and the CROSS APPLY pattern available in SQL 2005, I noted no discernible performance difference.  Since I like CROSS APPLY and I’m the writer, ( … and I’m the editor: Ed) our initial comparison will be between the window aggregate COUNT and the SQL 2005 CROSS APPLY pattern.

The Performance Test Harness

If we TRUNCATE the data we put into our table initially, we can add a large number of rows to it using the following test harness.
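The harness below is a minimal sketch of the idea, assuming the #Test table from above; the method of populating the rows, the 100-partition split and the commented-out index are my illustrative assumptions, not the article's exact code.

    TRUNCATE TABLE #Test;

    -- Populate 1,000,000 rows spread across 100 partitions
    WITH Tally (n) AS
    (
        SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    )
    INSERT INTO #Test (ID, N)
    SELECT n % 100, ABS(CHECKSUM(NEWID())) % 1000
    FROM Tally;

    -- The "suggested INDEX" used for the with-index test runs
    --CREATE INDEX ix_Test ON #Test (ID) INCLUDE (N);

    DECLARE @ID INT, @Cnt INT;
    SET STATISTICS TIME ON;

    -- Window aggregate version (results shunted into local variables)
    SELECT @ID = ID, @Cnt = COUNT(*) OVER (PARTITION BY ID)
    FROM #Test;

    -- Traditional (CROSS APPLY) version
    SELECT @ID = a.ID, @Cnt = b.cnt
    FROM #Test a
    CROSS APPLY
    (
        SELECT cnt = COUNT(*)
        FROM #Test b
        WHERE b.ID = a.ID
    ) b;

    SET STATISTICS TIME OFF;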

Our initial test at 1 million rows, shunting the results into a local variable, generated the timing results for COUNT that appear in the first column of the summary tables below.

You can already see that this particular window aggregate seemingly cannot be classified as “high-performance” SQL.   But we don’t want to generalize, so let’s test each of the window aggregate functions vs. its corresponding traditional pattern and graphically demonstrate the performance results.

Performance of Traditional Aggregates vs. Window Aggregates

Since it is easy to replace COUNT(*) with XXX(N) (where XXX=SUM, AVG, MIN and MAX) in each of the two queries we’re testing, we can easily generate queries that compare each aggregate function against its corresponding window aggregate.  Resource files (described at the end) are provided if you want to see each of these queries or run these performance tests yourself.

We’ll also include a test case with all of the aggregates in one query, to determine whether the cost is cumulative or not.  Those two queries look like this.
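Something like the following, sketched under the same assumptions as before:

    -- All window aggregates over the same partition in one query
    SELECT ID, N
        ,cnt  = COUNT(*) OVER (PARTITION BY ID)
        ,sumN = SUM(N)   OVER (PARTITION BY ID)
        ,avgN = AVG(N)   OVER (PARTITION BY ID)
        ,minN = MIN(N)   OVER (PARTITION BY ID)
        ,maxN = MAX(N)   OVER (PARTITION BY ID)
    FROM #Test;

    -- The traditional equivalent: one CROSS APPLY computes all five aggregates
    SELECT a.ID, a.N, b.cnt, b.sumN, b.avgN, b.minN, b.maxN
    FROM #Test a
    CROSS APPLY
    (
        SELECT cnt = COUNT(*), sumN = SUM(N), avgN = AVG(N)
            ,minN = MIN(N), maxN = MAX(N)
        FROM #Test b
        WHERE b.ID = a.ID
    ) b;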

The histograms show the comparative elapsed times for each of the aggregates, with the last column showing the case of all aggregates applied to the same partition within one query, with and without the suggested INDEX.

[Histogram: elapsed times, window vs. traditional aggregates, with the suggested INDEX]

[Histogram: elapsed times, window vs. traditional aggregates, without the INDEX]

Many of these cases perform better without the index than with it.

This table shows the CPU and elapsed time, indicating the “cost” of each window aggregate compared to the traditional aggregation method.

With INDEX

Measure      | Query                      | COUNT | SUM  | AVG  | MIN  | MAX  | ALL
-------------|----------------------------|-------|------|------|------|------|-----
CPU (ms)     | Window Aggregate (WA)      | 3039  | 3136 | 3167 | 2901 | 3028 | 3556
CPU (ms)     | Traditional Aggregate (TA) | 390   | 499  | 531  | 453  | 452  | 858
CPU (ms)     | Cost                       | 679%  | 528% | 496% | 540% | 570% | 314%
Elapsed (ms) | Window Aggregate (WA)      | 2208  | 2260 | 2419 | 2236 | 2233 | 2676
Elapsed (ms) | Traditional Aggregate (TA) | 385   | 491  | 525  | 465  | 457  | 855
Elapsed (ms) | Cost                       | 474%  | 360% | 361% | 381% | 389% | 213%

Without INDEX

Measure      | Query                      | COUNT | SUM  | AVG  | MIN  | MAX  | ALL
-------------|----------------------------|-------|------|------|------|------|-----
CPU (ms)     | Window Aggregate (WA)      | 4945  | 5226 | 5130 | 5257 | 5008 | 6287
CPU (ms)     | Traditional Aggregate (TA) | 1141  | 1357 | 1391 | 1200 | 1247 | 1529
CPU (ms)     | Cost                       | 333%  | 285% | 269% | 338% | 302% | 311%
Elapsed (ms) | Window Aggregate (WA)      | 1933  | 1967 | 1921 | 1663 | 1823 | 2050
Elapsed (ms) | Traditional Aggregate (TA) | 441   | 440  | 453  | 416  | 423  | 648
Elapsed (ms) | Cost                       | 338%  | 347% | 324% | 300% | 331% | 216%

Cost in all cases is calculated as: 100*(WA-TA)/TA

I’m not sure where you, my valued readers, stand, but this seems a pretty steep price to pay for streamlining the query syntax!  These timing results were obtained in SQL 2008 but, sadly, there was no perceptible improvement when running the equivalent test in SQL 2012.  One would have hoped that Microsoft had noticed, or been informed of, this lag in performance and done something to improve it.

Redemption in the Ranking Functions

When I first started using the window ranking functions in SQL 2005, I was very impressed.  All of them solve some very sticky querying predicaments, and they do so with excellent speed.  In fact, I’d be hard-pressed to come up with any SQL 2000 alternatives that offer even reasonable performance (without using temporary tables), so I won’t even try.

Kudos to Microsoft on this one!

SQL 2012: Enter the Window Frame for the Aggregate Functions

In past versions of SQL Server, a few problems posed very difficult cases for producing a high-performing query.  The most notable of these are:

  • Calculating running totals
  • Calculating aggregates over a rolling (moving) window

Both of these problems are now quite easily solved by the SQL 2012 window frame approach.  Rather than go through all of the details here, I’ll point you to the articles referenced below, where you can take a look at these problems yourself, and summarize the results you can expect.

  1. Both problems can be solved using a SQL 2012 window frame applied to the window aggregate SUM function (a minimal sketch follows this list).
  2. You can expect the window frame approach to be faster than any triangular or semi-triangular JOIN approach (these are explained in the two linked articles above).
  3. Neither problem can be solved faster with a window frame than by using what is known as the Quirky Update (QU).  The article by SQL MVP Jeff Moden on the technique describes in great detail all the rules needed to implement the QU correctly.  My article on calculating values within a rolling window shows how it is much faster than the window frame approach for that particular problem.  Microsoft Certified Master Wayne Sheffield showed how the QU is faster than a SQL 2012 window frame for the running totals problem in his blog Running totals in “Denali” CTP3.  Even though that study was done in a pre-release of SQL 2012, it is doubtful that the performance characteristics have changed.
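To give a flavor of the window frame syntax, here is a minimal sketch of a running total, assuming a hypothetical dbo.Transactions table (AccountID, TranDate, Amount); it is not code from any of the articles referenced above:

    SELECT AccountID, TranDate, Amount
        ,RunningTotal = SUM(Amount) OVER
         (
            PARTITION BY AccountID
            ORDER BY TranDate
            -- ROWS rather than the default RANGE: RANGE uses an on-disk
            -- spool and is considerably slower
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         )
    FROM dbo.Transactions;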

Overall, the SQL 2012 window frame is a welcome addition to the SQL window functions.  Its performance is better than that of most methods and, given that its behavior is fully documented (unlike the QU), it is a prime choice for these problems, unless you absolutely must squeeze out the highest possible performance and you’re willing to live with the “undocumented behavior” of the QU approach.

SQL 2012: And Then There are the Analytic Functions

The analytic functions in SQL 2012 offer unprecedented new functionality that is extremely useful in statistical analysis.  Just the fact that they exist will save many developers of scientific, statistical and engineering applications a lot of headaches trying to replicate the same functionality in complex SQL code.  Of course, I’m sure CLR versions exist, but there will be cases where they can’t be used due to development constraints.  Overall, I have to give kudos to Microsoft for these as well.

But in the world of SQL, it always depends.  Sometimes there are alternatives that may be just enough faster that you’d want to consider using them even though they may look more complicated.

PERCENTILE_CONT

A case in point is the PERCENTILE_CONT analytic function.  Its most common usage is to calculate the median of a statistical sample, and with the PARTITION BY clause it can do so against partitioned sets.  Let’s compare a couple of queries that calculate the median for this case, using the same sample data that we created before.
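The two queries below sketch the comparison, assuming the #Test table from earlier; the second is one traditional ROW_NUMBER-based median pattern, not necessarily the exact "complex median query" whose timings appear in the table below.

    -- Query 1 (SQL 2012): median per partition via PERCENTILE_CONT
    SELECT DISTINCT ID
        ,Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY N)
                  OVER (PARTITION BY ID)
    FROM #Test;

    -- Query 2 (SQL 2005+): pick the middle row(s) of each partition by
    -- ROW_NUMBER and average them (one row for odd counts, two for even)
    SELECT a.ID, Median = AVG(1.0 * a.N)
    FROM
    (
        SELECT ID, N
            ,rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY N)
        FROM #Test
    ) a
    INNER JOIN
    (
        SELECT ID, cnt = COUNT(*)
        FROM #Test
        GROUP BY ID
    ) b
    ON a.ID = b.ID
    WHERE a.rn IN ((b.cnt + 1) / 2, (b.cnt + 2) / 2)
    GROUP BY a.ID;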

While the second query may look extraordinarily complex, the results when we run it are much better than those for PERCENTILE_CONT.

 

MEDIAN                         | With INDEX: CPU (ms) | With INDEX: Elapsed (ms) | Without INDEX: CPU (ms) | Without INDEX: Elapsed (ms)
-------------------------------|----------------------|--------------------------|-------------------------|----------------------------
SQL 2012: PERCENTILE_CONT      | 6520                 | 5842                     | 13385                   | 3786
SQL 2008: Complex Median Query | 2340                 | 2327                     | 5477                    | 1897
Cost                           | 179%                 | 151%                     | 144%                    | 100%

And there are even (much) faster solutions available for calculating the median in cases where the suggested index is present.  The test harness provided with this article is described in the resources section at the end.

On the other hand, PERCENTILE_CONT does a lot more than just calculate the median.  We’ll let you explore Microsoft’s documentation for the function to find out more.

LAG and LEAD Applied to Finding Gaps in Sequence Numbers

Recently, Simple Talk published an article of mine entitled The SQL of Gaps and Islands in Sequences.  In that article I compared three “traditional” solutions for the gaps problem with one of my own design.  We’ll now take the test harness provided there for one million rows and modify it (see the description of the resource files at the end) to test the SQL 2012 methods, in which LAG and LEAD can be applied to identify gaps.  Often you’ll see one or the other of these solutions suggested; here we’ll show both.
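As a sketch of the two SQL 2012 queries, assuming a table #Numbers with a SeqNo column containing gaps (illustrative names, not the harness's exact schema):

    -- Using LEAD: a gap starts just after SeqNo whenever the next
    -- sequence number is more than one greater
    SELECT GapStart = SeqNo + 1
        ,GapEnd = NextSeqNo - 1
    FROM
    (
        SELECT SeqNo, NextSeqNo = LEAD(SeqNo) OVER (ORDER BY SeqNo)
        FROM #Numbers
    ) a
    WHERE NextSeqNo - SeqNo > 1;

    -- Using LAG: a gap ends just before SeqNo whenever the prior
    -- sequence number is more than one less
    SELECT GapStart = PriorSeqNo + 1
        ,GapEnd = SeqNo - 1
    FROM
    (
        SELECT SeqNo, PriorSeqNo = LAG(SeqNo) OVER (ORDER BY SeqNo)
        FROM #Numbers
    ) a
    WHERE SeqNo - PriorSeqNo > 1;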

Once again, these solutions are nice and concise, but let’s see how they perform in our test harness.

GAPS                                     | CPU (ms) | Elapsed (ms)
-----------------------------------------|----------|-------------
Gaps Solution #1 from SQL MVP Deep Dives | 780      | 782
Gaps Solution #2 from SQL MVP Deep Dives | 4665     | 1451
Gaps Solution #3 from SQL MVP Deep Dives | 1233     | 1219
CROSS APPLY VALUES – Islands to Gaps     | 421      | 462
SQL 2012 LEAD                            | 1747     | 1740
SQL 2012 LAG                             | 1357     | 1363

In this case, all of the traditional solutions outperformed the SQL 2012 LEAD function in elapsed time, and all but one outperformed LAG.  Only one traditional solution consumed more CPU time.  My conclusion is that, while LAG (or LEAD) may improve the clarity of your code, opting for one of the well-documented traditional approaches will solve the gaps problem more efficiently, particularly against very large row sets.

Wayne Sheffield also blogged about some of the existing high-performance approaches to gaps in SQL 2012 Performance Test: Gap Detection.  In that post, he compares the performance of LEAD against a couple of competing high-performance methods that run in earlier versions of SQL.  His test also showed LEAD to be much slower (a cost of about 250%, according to my earlier calculation method) than the other approaches he tested.  LAG would be expected to have approximately the same performance characteristics (although in my test harness it appears to be slightly better).

Conclusions

Just as new doesn’t always mean better, simpler doesn’t always translate into better either, particularly where performance is concerned.  Ultimately you must be the judge of whether simpler query forms are more desirable from a maintenance perspective than a better-performing solution.

As far as the SQL window ranking functions are concerned, there is probably no better approach available in earlier versions of SQL.  The window aggregate functions, on the other hand, can be much improved upon with respect to performance by relying on a slightly less simple code pattern based on standard aggregate functions.

As for the SQL 2012 enhancements to the window aggregates (the window frame), they’re probably a good alternative for problems such as running totals if you don’t like, or don’t understand, how the Quirky Update works.  LEAD and LAG show promise, but there remain a few traditional methods that outperform them for problems like finding gaps in sequence numbers.

All of the new SQL 2012 analytic window functions provide unprecedented flexibility.  But using PERCENTILE_CONT to calculate the median can be improved upon, from a performance perspective, by more traditional, albeit more complicated, code patterns.

In the end, using the latest query forms may offer a shorter path to getting your code working, because the queries tend to be simpler.  This article is simply a suggestion that you consider looking into and testing other code patterns in cases where performance is really important to you.

The resource files provided with this article are:

  • Window Aggregate Functions Test Harness.sql – This test harness is initially set up to create a 1,000,000 row test table and then display the timing results for window aggregates vs. corresponding traditional aggregations, while suppressing the actual query results by assigning to a local variable.  Comments in the beginning will guide you if you want to just display results for the simple test data shown in the initial example.
  • Median Test Harness.sql – This test harness is initially set up to create a 1,000,000 row (approximate) test table and then display the timing results for the SQL 2012 PERCENTILE_CONT vs. an alternative median solution, while suppressing the actual query results by assigning to a local variable.  Comments in the beginning will guide you if you want to just display results for the simple test data shown in the initial example.
  • Gaps Test Harness.sql – This test harness is initially set up to create a 1,000,000 row (approximate) test table filled with gaps and then test 4 “traditional” gaps solutions vs. the SQL 2012 LEAD and LAG functions, while suppressing the actual query results by assigning to a local variable.  Comments in the beginning will guide you if you want to just display results for the simple test data shown in the initial example.