Tuesday, April 28, 2015

Index all the things

Do what? Why?
No, don't do that. Step away from the keyboard.

A customer recently approached me with a simple request: Can we please add an index to a table? Seems their extraction query was taking quite a long time to run. 
This is one of those pivotal moments where you can be a good DBA and do what’s asked, or be a better DBA and dig into what they are asking to make sure they get what they need (and most times, they don’t really know what they need).

I asked the client for two things: the column/table they want the index created on, and the query they are using to extract the data. They returned this information quickly, and I dove in to see how I could help. Turns out, if I’d done what the client asked for they would be no better off than had I done nothing, and would have made inserts or updates to the table with the newly added index slower.
The query had a few joins, as well as some filters. The column they wanted to add the index on was a DATETIME data type and their query was something like this (shortened and sanitized for confidentiality):

My replay back to the customer was asking for a sample of the date parameter they were using as the filter. They returned back: '20150420'.

Warning: Side track

Ok, so now let's go down a quick date format side track. SQL isn't as picky about date formats as people think. Here are some examples of what the query engine will understand as dates. The image on the left is a simple query, written three ways, and the only difference is what I'm sending in as the date format. The image on the right is part of the three execution plans, all the plans are the same, and when I hover over any of the Clustered Index Scans where the filter is being applied, I don't see any conversions, only the predicate of finding the rows that match the filter.

Back to our original post topic, filters and indexing...

So let's see what happens to our execution plan if I add the index requested, and the customer continues to use their query with the conversion on the DATETIME column in the where clause.
Exactly what we'd expect, SQL Server is performing a table scan. (Yes, there is no clustered index on this table. If there were, we'd see a clustered index scan.) Ok, now is where I add the index on the DATETIME field and rerun the same query. Here's the new execution after I add the index (I could do the same experiment having a clustered index on the table, and we'd see the clustered index scan. Remember, the clustered index IS the table.):

Well that looks eerily familiar. So after adding the index it's still doing a table scan. Ok, so now let me run the customer's query minus the conversion (a system defined function of CONVERT) on the column in the where clause, so essentially I'm altering their WHERE clause from:

WHERE CONVERT(VARCHAR(8),a.TimePeriod,112) >= @DateParameter


WHERE a.TimePeriod = @DateParameter

We can do this because I've determined the data they are sending in as the parameter is interpreted by SQL Server as the correct data type. Keep in mind I have not altered anything else, the index existed before I ran the second query. Third query execution plan contains this for the specified filter:
That's what we want, we can see that the index is being used! 

Now if the customer truly needed some date manipulation (DATEADD, DATEDIFF) on the TimePeriod column, I would rack my brain for the algebra equivalent of a statement that would move the function to the parameter and off of the column on the table. This would take the burden of converting each value before determining if it qualified for the where clause. 

Here is a good answer on Stack Overview of a similar task, and excellent options of moving the function away from the column, can be found here.

This is an example of making a filter SARGable. SARGable is short for Search Argumentable. A filter (or search) is argumentable if a relational database can use an index to find the results. When you put a function (user or system) on a column, it renders your index useless because the index value is the original value, not a converted value.

If you still aren't getting it, I like to use this analogy:
LEN(LastName) >= 12 ?

It’s like a phone book where it's ordered by last name, first name. Now someone wants you to search for all the people with the length of their last name greater than 12 characters. Yes, you have them ordered by last name, but you have to read every single last name to determine if it’s at least 12 characters long. So ordering the phone book by last name, first name didn't help you at all and, in terms of SQL Server, it knows the index won’t help and doesn't even use it, it just scans every single row (or every single row of the clustered index) and converts every single value to determine if it meets the filter criteria. 

There are plenty of posts regarding this topic, I like the way Paul Randal explains it.

The suggestion was given to the customer to re-write the query and drop the convert function. In this case the performance outcome was a 90% decrease in query time.