Just a simple solution to show a set-based answer to the FizzBuzz question.
If you are not familiar with the FizzBuzz programming exercise, it loosely goes like this: Write a program that displays the numbers from 1 to 100. If the number is a multiple of three, print "Fizz" instead of the number. If the number is a multiple of five, print "Buzz" instead. For numbers that are multiples of both three AND five, print "FizzBuzz".
Most programmers immediately turn to a loop: start a counter, check whether the current number is divisible as outlined above, print the result, increment the counter, and move on to the next number.
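For contrast, here's a minimal T-SQL sketch of that loop-based approach. It's my illustration of the pattern just described, not code from any particular solution; the variable name @i is arbitrary.

```sql
-- The procedural approach: a counter, a WHILE loop, and one PRINT per iteration.
DECLARE @i INT = 1;

WHILE @i <= 100
BEGIN
    PRINT CASE
              WHEN @i % 15 = 0 THEN 'FizzBuzz'   -- divisible by both 3 and 5
              WHEN @i % 3  = 0 THEN 'Fizz'
              WHEN @i % 5  = 0 THEN 'Buzz'
              ELSE CAST(@i AS VARCHAR(3))
          END;

    SET @i += 1;   -- up the counter and move to the next number
END;
```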
In my example, I'm using the master.dbo.spt_values table which holds the range of values we need for this exercise. I normally create date and number tables for just such uses on database servers so as not to rely on system tables.
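A minimal sketch of what a set-based version can look like; the original post's exact query isn't reproduced here. It assumes that the rows of master.dbo.spt_values with type = 'P' hold consecutive integers covering 1 through 100 (on current versions they run from 0 to 2047). That table is undocumented, which is exactly why a dedicated numbers table is the safer habit. The temp table #FizzBuzz is just a hypothetical holding place for the result.

```sql
-- Set-based FizzBuzz: one statement produces all 100 rows, no loop or counter.
-- Assumption: master.dbo.spt_values rows with type = 'P' include the integers 1-100.
SELECT v.number AS Number,
       CASE
           WHEN v.number % 15 = 0 THEN 'FizzBuzz'   -- multiple of both 3 and 5
           WHEN v.number % 3  = 0 THEN 'Fizz'
           WHEN v.number % 5  = 0 THEN 'Buzz'
           ELSE CAST(v.number AS VARCHAR(3))
       END AS Result
INTO #FizzBuzz
FROM master.dbo.spt_values AS v
WHERE v.type = 'P'
  AND v.number BETWEEN 1 AND 100;

SELECT Number, Result
FROM #FizzBuzz
ORDER BY Number;
```

Testing % 15 first matters: CASE returns the first branch that matches, and 15 is the least common multiple of 3 and 5, so checking % 3 first would label 15, 30, 45, and so on as "Fizz".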
Aaron Bertrand has a good performance comparison of numbers tables, date tables, and common table expressions (CTEs). His Generate a Sequence Without Loops Part I and Part II are good articles to read if you want to dive further into this topic, and Part III of the series compares numbers and date tables with CTEs.
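If you'd rather not depend on a system table at all, here's a minimal sketch of building a numbers table from a CTE with no loop: ten rows cross-joined with ten rows yields 100, and ROW_NUMBER() numbers them. This stacked-CTE pattern is one common technique of the kind those articles compare, not a reproduction of Aaron's code; #Numbers stands in for what would normally be a permanent dbo.Numbers table.

```sql
-- Hypothetical numbers table; on a real server this would be a permanent dbo.Numbers.
CREATE TABLE #Numbers (Number INT NOT NULL PRIMARY KEY CLUSTERED);

-- Ten rows, cross-joined with itself, gives 100 rows; ROW_NUMBER turns them into 1..100.
WITH Ten AS (
    SELECT n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS t(n)
),
Numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM Ten AS a
    CROSS JOIN Ten AS b
)
INSERT INTO #Numbers (Number)
SELECT rn
FROM Numbered;
```

Each additional CROSS JOIN of Ten multiplies the range by ten.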
Friday, May 22, 2015
Tuesday, April 28, 2015
Do what? Why?
A customer recently approached me with a simple request: Can we please add an index to a table? Seems their extraction query was taking quite a long time to run.
This is one of those pivotal moments where you can be a good DBA and do what’s asked, or be a better DBA and dig into what they are asking to make sure they get what they need (and most times, they don’t really know what they need).
I asked the client for two things: the column/table they wanted the index created on, and the query they were using to extract the data. They returned this information quickly, and I dove in to see how I could help. Turns out, if I'd done what the client asked for, they would have been no better off than if I'd done nothing, and the newly added index would have made inserts and updates to the table slower.
The query had a few joins, as well as some filters. The column they wanted to add the index on was a DATETIME data type and their query was something like this (shortened and sanitized for confidentiality):
My reply to the customer was a request for a sample of the date parameter they were using as the filter. They sent back: '20150420'.
OK, so now let's go down a quick date-format sidetrack. SQL Server isn't as picky about date formats as people think. Here are some examples of what the query engine will understand as dates. The image on the left is a simple query, written three ways, where the only difference is the date format I'm sending in. The image on the right shows part of the three execution plans. All the plans are the same, and when I hover over any of the Clustered Index Scans where the filter is being applied, I don't see any conversions, only the predicate that finds the rows matching the filter.
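Here's a small, self-contained sketch along the same lines. The three literal formats below are my picks, not necessarily the ones in the screenshots, and #DateDemo is a hypothetical stand-in table. One caveat worth knowing: the unseparated 'yyyymmdd' form and ISO 8601 with the T separator are interpreted the same regardless of session settings, while a month name depends on SET LANGUAGE, and a plain '2015-04-20' compared with a DATETIME depends on SET DATEFORMAT.

```sql
-- Hypothetical demo table with a DATETIME column, like the customer's TimePeriod.
CREATE TABLE #DateDemo (TimePeriod DATETIME NOT NULL);

INSERT INTO #DateDemo (TimePeriod)
VALUES ('20150419 23:59:59'), ('20150420'), ('20150421 08:30');

-- The same filter written three ways. Each literal is a constant, so it is
-- converted to DATETIME once; the column itself is left untouched.
SELECT COUNT(*) AS Rows_yyyymmdd  FROM #DateDemo WHERE TimePeriod >= '20150420';            -- unseparated format
SELECT COUNT(*) AS Rows_iso8601   FROM #DateDemo WHERE TimePeriod >= '2015-04-20T00:00:00'; -- ISO 8601 with the T
SELECT COUNT(*) AS Rows_monthname FROM #DateDemo WHERE TimePeriod >= 'April 20, 2015';      -- depends on SET LANGUAGE
```

Under us_english, all three queries return 2.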
So let's see what happens to our execution plan when I add the requested index and the customer keeps using their query with the conversion on the DATETIME column in the WHERE clause.
First, the plan without the index. Exactly what we'd expect: SQL Server is performing a table scan. (Yes, there is no clustered index on this table. If there were, we'd see a clustered index scan.) Now I add the index on the DATETIME column and rerun the same query. Here's the new execution plan after adding the index (I could run the same experiment with a clustered index on the table, and we'd see a clustered index scan. Remember, the clustered index IS the table.):
Still a table scan. Because of the CONVERT on the column, SQL Server has to convert every row's value before it can apply the filter, so the new index is never used. Next, I changed the filter from this:
WHERE CONVERT(VARCHAR(8),a.TimePeriod,112) >= @DateParameter
to this:
WHERE a.TimePeriod >= @DateParameter
We can do this because I've determined that SQL Server interprets the value they send in as the parameter ('20150420') as the correct data type. Keep in mind I haven't altered anything else; the index already existed before I ran the second query. The third query's execution plan contains this for the specified filter:
That's what we want: the index is being used!
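To reproduce the whole experiment, here's a minimal sketch under stated assumptions: #Extract, its columns, the 100,000 generated rows, and IX_Extract_TimePeriod are all hypothetical stand-ins for the customer's table, and the index INCLUDEs Amount so both queries are covered by it. Run it with the actual execution plan enabled (Ctrl+M in SSMS) and compare the two SELECTs; exact plan shapes depend on your version, data volume, and statistics.

```sql
-- Hypothetical heap standing in for the customer's table.
CREATE TABLE #Extract (
    TimePeriod DATETIME NOT NULL,
    Amount     MONEY    NOT NULL
);

-- Load 100,000 rows, one every 86 seconds starting 2015-01-15 (about 100 days of data).
WITH Ten AS (
    SELECT n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS t(n)
),
Numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM Ten AS a
    CROSS JOIN Ten AS b
    CROSS JOIN Ten AS c
    CROSS JOIN Ten AS d
    CROSS JOIN Ten AS e
)
INSERT INTO #Extract (TimePeriod, Amount)
SELECT DATEADD(SECOND, CAST(rn * 86 AS INT), '20150115'), rn % 100
FROM Numbered;

-- The index the customer asked for (INCLUDE added here so both queries are covered).
CREATE NONCLUSTERED INDEX IX_Extract_TimePeriod ON #Extract (TimePeriod) INCLUDE (Amount);

DECLARE @DateParameter VARCHAR(8) = '20150420';

-- Non-SARGable: every row's TimePeriod is converted to a string before the
-- comparison, so expect a scan rather than a seek.
SELECT COUNT(*) AS RowsFound, SUM(a.Amount) AS Total
FROM #Extract AS a
WHERE CONVERT(VARCHAR(8), a.TimePeriod, 112) >= @DateParameter;

-- SARGable: the parameter is implicitly converted to DATETIME once and the
-- bare column can be matched against the index, so expect an Index Seek.
SELECT COUNT(*) AS RowsFound, SUM(a.Amount) AS Total
FROM #Extract AS a
WHERE a.TimePeriod >= @DateParameter;
```

Both queries return the same totals; only the access path differs.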
Now if the customer truly needed some date manipulation (DATEADD, DATEDIFF) on the TimePeriod column, I would rack my brain for an algebraically equivalent statement that moves the function off the table's column and onto the parameter. That removes the burden of converting every row's value before determining whether it qualifies for the WHERE clause.
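A few examples of that algebra, as a minimal sketch: #Extract, @Cutoff, and the sample rows are hypothetical. In each pair the two queries return the same rows, but only the second keeps TimePeriod bare so an index on it can be seeked.

```sql
-- Hypothetical table and index.
CREATE TABLE #Extract (TimePeriod DATETIME NOT NULL);
CREATE INDEX IX_Extract_TimePeriod ON #Extract (TimePeriod);

INSERT INTO #Extract (TimePeriod)
VALUES ('20150301'), ('20150321 12:00'), ('20150322'), ('20150410 09:15'), ('20150420');

DECLARE @Cutoff DATETIME = '20150420';

-- 1) Function on the column: DATEADD is evaluated for every row.
SELECT TimePeriod FROM #Extract WHERE DATEADD(DAY, 30, TimePeriod) >= @Cutoff;
-- Same predicate solved for the column: subtract 30 days from the parameter instead.
SELECT TimePeriod FROM #Extract WHERE TimePeriod >= DATEADD(DAY, -30, @Cutoff);

-- 2) DATEDIFF(DAY, ...) counts midnight boundaries crossed between the two values.
SELECT TimePeriod FROM #Extract WHERE DATEDIFF(DAY, TimePeriod, @Cutoff) <= 7;
-- Equivalent range: everything from midnight seven days before the cutoff's date.
SELECT TimePeriod FROM #Extract
WHERE TimePeriod >= DATEADD(DAY, -7, CAST(CAST(@Cutoff AS DATE) AS DATETIME));
```

The first pair returns four rows and the second pair returns one. The DATEDIFF rewrite hinges on DATEDIFF counting day boundaries rather than 24-hour periods, which is why the range starts at midnight rather than exactly 7 × 24 hours before @Cutoff.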
Here is a good answer on Stack Overflow to a similar task, and excellent options for moving the function away from the column can be found here.
This is an example of making a filter SARGable. SARGable is short for Search ARGument-able. A filter (or search) is SARGable if a relational database can use an index to find the results. When you wrap a column in a function (user-defined or system), you generally render the index on that column useless for seeking, because the index stores the original values, not the converted ones.
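A concrete pair, as a minimal sketch with a hypothetical #Extract table and sample rows: YEAR() on the column versus a range on the bare column. Both queries return the same three rows.

```sql
-- Hypothetical table and index.
CREATE TABLE #Extract (TimePeriod DATETIME NOT NULL);
CREATE INDEX IX_Extract_TimePeriod ON #Extract (TimePeriod);

INSERT INTO #Extract (TimePeriod)
VALUES ('20141231 23:59:59'), ('20150101'), ('20150615 10:00'), ('20151231 23:59:59'), ('20160101');

-- Non-SARGable: the index holds TimePeriod values, not YEAR(TimePeriod) values.
SELECT TimePeriod FROM #Extract WHERE YEAR(TimePeriod) = 2015;

-- SARGable: a half-open range on the stored values.
SELECT TimePeriod FROM #Extract
WHERE TimePeriod >= '20150101'
  AND TimePeriod <  '20160101';
```

The half-open range (>= start, < next start) is the safe pattern for DATETIME, which is only accurate to about 3.33 ms: a literal such as '20151231 23:59:59.999' rounds up to midnight on January 1, 2016.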
If you still aren't getting it, I like to use this analogy:
LEN(LastName) >= 12 ?
It's like a phone book that's ordered by last name, first name. Now someone wants you to find all the people whose last name is 12 or more characters long. Yes, the book is ordered by last name, but you still have to read every single last name to determine whether it's at least 12 characters long. So ordering the phone book by last name, first name didn't help you at all. SQL Server behaves the same way: it knows the index won't help, so it doesn't even use it. It just scans every single row (or every single row of the clustered index) and evaluates the function against every value to determine whether it meets the filter criteria.
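The same analogy as a minimal T-SQL sketch; #PhoneBook and its rows are hypothetical. The clustered primary key orders the rows by last name, first name, just like the phone book. For contrast, the second query searches on a leading prefix of the last name, which the ordering does help with.

```sql
-- Hypothetical phone book, ordered (clustered) by last name, first name.
CREATE TABLE #PhoneBook (
    LastName  VARCHAR(50) NOT NULL,
    FirstName VARCHAR(50) NOT NULL,
    PRIMARY KEY CLUSTERED (LastName, FirstName)
);

INSERT INTO #PhoneBook (LastName, FirstName)
VALUES ('Smith', 'Ann'), ('Montgomery-Ward', 'Bob'), ('MacDonald', 'Cal'),
       ('Vanderbilt-Jones', 'Dee'), ('Macintosh', 'Eve');

-- Like reading every entry: LEN() has to be computed for each last name.
SELECT LastName, FirstName FROM #PhoneBook WHERE LEN(LastName) >= 12;

-- Like flipping straight to the "Mac" pages: a constant leading prefix can seek on the index.
SELECT LastName, FirstName FROM #PhoneBook WHERE LastName LIKE 'Mac%';
```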
There are plenty of posts on this topic; I like the way Paul Randal explains it.
I suggested the customer rewrite the query and drop the CONVERT function. In this case, the result was a 90% decrease in query time.