Tuesday, September 16, 2014

Dynamically Updating Statistics on all Databases, With Sweet Potato Fries.

Sorry, I lied about the sweet potato fries. But this script will allow you to update statistics with options. Still interested? 

This is one of the steps of my customized maintenance plan to update statistics on all databases on a server. 

Overview of statistics in SQL Server, sample sizes and auto update information can be found on many sites. I like Grant Fritchey's article, SQL Server Statistics Questions We Were Too Shy to Ask.

Also the algorithm for determining the sampling rate to use for auto update statistics can be found here: SQL Server Statistics: Explained

Auto Update Stats Algorithm: 
Auto update stats fires once roughly 500 + 20% of the table's rows have changed. An improved, dynamic threshold is also available (via trace flag 2371, introduced in SQL Server 2008 R2 SP1): SQRT(1000 * table rows), which kicks in much sooner on large tables and is much better.
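To see why the dynamic threshold matters on large tables, you can compare the two formulas side by side. This is just an illustrative query, not part of the maintenance script:

```sql
-- Comparing the two auto-update thresholds for a range of table sizes.
-- Classic rule: 500 + 20% of rows; dynamic rule: SQRT(1000 * rows).
DECLARE @Rows TABLE (RowCnt BIGINT);
INSERT INTO @Rows (RowCnt) VALUES (10000), (1000000), (100000000);

SELECT RowCnt,
       500 + (RowCnt * 0.20) AS ClassicThreshold,
       SQRT(1000.0 * RowCnt) AS DynamicThreshold
FROM @Rows;
-- At 100,000,000 rows the classic rule needs ~20,000,500 changed rows
-- before stats auto-update, while the dynamic rule fires at ~316,228.
```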

When it fires, it uses the default sampling rate, which is calculated as follows:
1)      If the table is smaller than 8 MB, statistics are updated with a full scan.
2)      If the table is larger than 8 MB, it follows an algorithm that reduces the sampling rate as the number of rows in the table increases, to make sure we are not scanning too much data. The rate is not a fixed value, and the algorithm is not linear either; it is under the control of the optimizer.
Example: a table with 1,000,000 rows might be sampled at roughly 30%, but when the row count grows to 8,000,000 the rate drops to roughly 10%. These sampling rates are not under the DBA's control; the optimizer decides.
Customize these aspects, to fit your needs:
  • I have time in my maintenance window to perform FULLSCAN instead of the standard sample, but you can change this to a sample percent size, or drop the "WITH FULLSCAN" in the script to use SQL Server standard sample size.
  • I have defined a statistic as being "out of date" if (RowModCtr / RowCnt) × 100 is greater than or equal to 2, i.e. roughly 2% of the rows have changed, according to the "guesstimate" in RowModCtr. I say guesstimate because, on SQL 2005 and later, RowModCtr is not a true count of changes to the rows (see Books Online).
  • I also update statistics, regardless of how many changes to the table have occurred, for stats older than 30 days. You can change that time frame.
Disclaimer: this script is based on one found on DBA Stack Exchange as it most closely fit my needs.
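The core of the approach can be sketched like this. It is a simplified, hypothetical version of the logic described above, for a single database (the real script loops over every database on the server), using the deprecated-but-handy sys.sysindexes view that exposes RowModCtr and RowCnt:

```sql
-- Simplified sketch: find stats that are stale per the rules above,
-- then rebuild each one WITH FULLSCAN. Object names are illustrative.
SELECT  OBJECT_NAME(i.id)         AS TableName,
        i.name                    AS StatName,
        i.rowmodctr, i.rowcnt,
        STATS_DATE(i.id, i.indid) AS LastUpdated
INTO    #StaleStats
FROM    sys.sysindexes AS i
WHERE   i.rowcnt > 0
  AND ( (i.rowmodctr * 100.0) / i.rowcnt >= 2                           -- >= 2% modified
        OR STATS_DATE(i.id, i.indid) < DATEADD(DAY, -30, GETDATE()) );  -- > 30 days old

-- Build and run one UPDATE STATISTICS ... WITH FULLSCAN per stale statistic.
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = STUFF((SELECT ';' + 'UPDATE STATISTICS '
                            + QUOTENAME(TableName) + ' ' + QUOTENAME(StatName)
                            + ' WITH FULLSCAN'
                     FROM #StaleStats
                     FOR XML PATH('')), 1, 1, '');
EXEC sys.sp_executesql @sql;
```

Swap `WITH FULLSCAN` for `WITH SAMPLE 25 PERCENT` (or drop it entirely) to match whatever maintenance window you have.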

Saturday, July 12, 2014

When should I use a function?

I was asked recently by a colleague: When should I use a function? I wanted to refresh my memory on the subject and give a high level, but informative, response. Here goes...

Functions let you encapsulate code and reuse complex statements. UDFs (user-defined functions) in SQL Server come in two flavors: scalar (returning a single value) and table-valued functions (TVFs). You can think of user-defined scalar functions just like the system built-in functions (SUBSTRING, GETDATE, CHARINDEX): they return one value and, when applied to multiple rows, SQL Server executes the function once for every row in the result set.
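For example, here is a trivial scalar UDF (a hypothetical name and table, purely to illustrate the per-row execution model):

```sql
-- A scalar UDF is invoked once per row, just like a built-in function.
CREATE FUNCTION dbo.ufn_FullName (@First NVARCHAR(50), @Last NVARCHAR(50))
RETURNS NVARCHAR(101)
AS
BEGIN
    RETURN @First + N' ' + @Last;
END;
GO

-- Executed once for every row in the result set:
SELECT dbo.ufn_FullName(FirstName, LastName) AS FullName
FROM dbo.Person;
```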

Table valued functions can be similar, in functionality (see what I did there?), to a stored procedure. Rather than executing the stored procedure to get a result, you can select from them, join them to other tables, and generally use them anywhere you would use a table. Awesome, right?

Don’t go changing all your Stored Procs to TVFs just yet...

If the TVF contains a single statement (called an inline TVF), the optimizer treats it much like a view: the underlying objects appear in the execution plan. If the TVF contains multiple statements, the optimizer does not reference the underlying objects and treats the result like a table variable: it cannot retrieve statistics, so it guesses the TVF will return 1 row (the new cardinality estimator in SQL Server 2014 raised that fixed guess to 100), which can cause huge performance issues.
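The syntactic difference is easy to spot: an inline TVF is a single RETURN (SELECT ...), while a multi-statement TVF declares a return table variable and has a BEGIN...END body. Hypothetical objects for illustration:

```sql
-- Inline TVF: one statement, expanded into the calling plan like a view.
CREATE FUNCTION dbo.itvf_OrdersByCustomer (@CustomerID INT)
RETURNS TABLE
AS
RETURN (SELECT OrderID, OrderDate, Total
        FROM dbo.Orders
        WHERE CustomerID = @CustomerID);
GO

-- Multi-statement TVF: rows land in a table variable first, so the
-- optimizer sees no statistics (estimate of 1 row, or 100 in 2014).
CREATE FUNCTION dbo.mstvf_OrdersByCustomer (@CustomerID INT)
RETURNS @Result TABLE (OrderID INT, OrderDate DATE, Total MONEY)
AS
BEGIN
    INSERT INTO @Result
    SELECT OrderID, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
    RETURN;
END;
```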

The best (and, IMO, only) time to use a TVF is as an inline TVF, which is normally a single statement but can be a complex single statement incorporating CTEs. A multi-statement TVF is only useful and performant if it is always expected to return a few rows (or, in 2014, right around 100 rows), since the optimizer makes that assumption at compile time.

One item to take note of with multi-statement TVFs: the outer filter is applied after the TVF has executed. Ingest that statement for a minute. If you are calling the function in a join and filtering the original query significantly, that filter is applied only after all the values have been retrieved from the TVF, which can obviously waste valuable resources and time. (Inline TVFs are expanded into the calling plan, so the optimizer can push predicates into them.)
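In other words (hypothetical objects again), the WHERE clause below does nothing to reduce the work the multi-statement TVF does internally; it only discards rows after the function's table variable has been fully populated:

```sql
-- The TVF materializes its entire result first; the filter runs afterwards.
SELECT t.OrderID, t.Total
FROM dbo.mstvf_AllOrders() AS t       -- hypothetical multi-statement TVF
WHERE t.OrderDate >= '2014-01-01';

-- Passing the filter in as a parameter lets the function restrict its own work:
SELECT t.OrderID, t.Total
FROM dbo.mstvf_OrdersSince('2014-01-01') AS t;
```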

I believe it is best to think of an inline TVF as a filtered view and, like all other things, to test for optimization and speed. Remember to use SET STATISTICS TIME ON and SET STATISTICS IO ON, because execution plans often treat multi-statement and scalar functions as black boxes: they won't show the execution counts against the underlying objects, and can give you very misleading information. Gail Shaw's brief discussion goes into more detail: http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/
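Measuring is just a matter of wrapping the call (the function name here is a hypothetical inline TVF):

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT t.OrderID, t.Total
FROM dbo.itvf_OrdersByCustomer(42) AS t;  -- hypothetical inline TVF

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
-- Check the Messages tab for logical reads and CPU/elapsed time,
-- which the graphical plan may hide for function calls.
```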

The article below is a deeper dive, and the best article I found on SQL functions. Jeremiah also provides excellent examples of turning non-performant functions into optimized inline functions and CROSS APPLY statements. https://www.simple-talk.com/sql/t-sql-programming/sql-server-functions-the-basics/

Saturday, May 17, 2014

Back to Basics TSQL: Declaring multiple variables in one line versus individual lines.

I acknowledge that declaring multiple variables in one statement saves typing lines of code, and could save you a small (very small) amount of space in the database (if you add up thousands of stored procedures declaring variables in one statement versus on multiple lines).

I know I go against the grain of code-style preferences compared to the developers I've come across, as I prefer declaring variables on their own lines. The main reason is so I don't have to search through what can be a long list of variables to find something. Preference. The End. 

I did want to determine how SQL Server processes the differences, internally. If there was a valid performance boost to declaring these in one line, then I could be swayed to change my preference. 

I ran the following statements on my local instance of SQL Server 2014 (Developer) and watched SQL Profiler for the "behind the scenes" details:

Extremely simple example:

Statement 1: declaring multiple variables all on one line. [Profiler output screenshot]

Statement 2: declaring one variable per line. [Profiler output screenshot]
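For reference, the two styles look roughly like this (reconstructed with placeholder names, since the originals were screenshots):

```sql
-- Statement 1: multiple variables in one DECLARE.
DECLARE @FirstName NVARCHAR(50) = N'John',
        @LastName  NVARCHAR(50) = N'Doe',
        @Age       INT          = 30;

-- Statement 2: one DECLARE per variable.
DECLARE @FirstName2 NVARCHAR(50) = N'John';
DECLARE @LastName2  NVARCHAR(50) = N'Doe';
DECLARE @Age2       INT          = 30;
```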

So, at least from my 5 minutes of testing, it appears as if SQL Server deems both statements as one batch.

Which coding practice should you follow? Whichever one feels right to you, because as far as performance goes, SQL Server does not seem to treat them differently.

Monday, March 3, 2014

SSAS - When processing cube(s), receive "Login failed for user 'DOMAIN\COMPUTERNAME$'"

Pretty simple, but writing this up because I'm diving back into SSAS for some performance testing. This information is for SQL Server 2012, but will be similar for previous versions.

You've created a new SQL Server database that your cubes will pull their data from, as well as a new SSAS database. When you create a data source, the Impersonation Information tab defaults to "Use the service account". Normally I would create a network user for this, but since I installed locally and am just running tests, I went with the default impersonation configuration.

Open your computer's services to determine the name of the service account. I do this by right-clicking My Computer -> Manage and expanding Services. For SQL Server 2012, look for this entry: 

SQL Server Analysis Services (if you have multiple versions installed, choose the version you are using; in my case, 2012). I can see the Log On As account is set to: NT Service\MSOLAP$SQL2012

Go to SSMS and expand Security > Logins. If you do not see the above login, create a new one and make sure it is mapped to the database the data is being pulled from into your cube: the one you created for the data source. Give the user read access to that database by adding the db_datareader role. Click OK to save.
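The same steps in T-SQL, if you prefer a script (substitute your own database name; the login matches the service account found above):

```sql
USE [master];
CREATE LOGIN [NT Service\MSOLAP$SQL2012] FROM WINDOWS;
GO

USE [YourSourceDatabase];   -- the database your SSAS data source points at
CREATE USER [NT Service\MSOLAP$SQL2012] FOR LOGIN [NT Service\MSOLAP$SQL2012];
ALTER ROLE db_datareader ADD MEMBER [NT Service\MSOLAP$SQL2012];
GO
```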

Try to process your cube; you should be golden now.