Saturday, July 12, 2014

When should I use a function?

I was asked recently by a colleague: When should I use a function? I wanted to refresh my memory on the subject and give a high level, but informative, response. Here goes...

Functions can allow you to encapsulate code and reuse complex statements. UDFs (user-defined functions) in SQL Server consist of scalar (return a single value) and table valued functions. You can think of user defined scalar functions just like system built-in functions (SUBSTRING, GETDATE, CHARINDEX) where they will return one value and, when applied to multiple rows, SQL Server will execute the function once for every row in the result set.

Table valued functions can be similar, in functionality (see what I did there?), to a stored procedure. Rather than executing the stored procedure to get a result, you can select from them, join them to other tables, and generally use them anywhere you would use a table. Awesome, right?

Don’t go changing all your Stored Procs to TVFs just yet...

If the TVF contains a single statement (called inline) then the optimizer will treat it similar to a view, in that it will reference the underlying objects in the execution plan. If the TVF contains multiple statements, the optimizer does not reference the underlying objects, and it treats the statements similar to a table variable, in that it cannot retrieve statistics and guesses the resulting row count of the TVF to be 1 (in SQL Server 2014 the new cardinality estimator increased the estimated row count to 100), which can cause huge performance issues.

The best (and only, IMO) time to use TVF would be creating an inline TVF which is normally a single statement, but can be a complex single statement incorporating CTE’s. A multi-statement TVF would only be useful and performant if it is always expected to return a few rows (or in 2014, right around 100 rows) as the optimizer will make this assumption at compile time.

One item to take note of with even better performing inline TVFs, the outer filter is applied after the TVF is executed. Ingest that statement for a minute. If you are calling the function on a join and filtering the original query significantly, this filter will be applied after you retrieve all the values from the TVF, which can obviously be a waste of valuable resources and time.

I believe it is best to think of inline TVF as a filtered view and, like all other things, test for optimizations and speed, remembering to use SET STATISTICS TIME ON and SET STATISTICS IO ON as execution plans often times treat multi-valued and scalar functions like black boxes and won’t show the execution counts on underlying objects, as well as give you very misleading information. Gail Shaw’s brief discussion goes into more detail: http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/

The article below is a deeper dive, and the best article, I found on SQL functions. Jeremiah also provides excellent examples of turning non-performant functions into optimized in-line functions and cross apply statements. https://www.simple-talk.com/sql/t-sql-programming/sql-server-functions-the-basics/