Monday, January 27, 2020

Partitioned View and Pesky Error: UNION ALL view is not updatable because a partitioning column was not found

This particular situation occurred with the following architecture: we have multiple tables with check constraints on a column (a column containing the year in this case) and a partitioned view which unions all the underlying tables, and is updatable. We've been running updates and inserts on the partitioned view ever since it was implemented.

I recently ran a VS TFS Build to the production server. The change added a new column to the underlying tables, but did not alter the primary key or constraints on those tables. Suddenly we were getting the below error on any attempt to insert or update on the view:

UNION ALL view is not updatable because a partitioning column was not found

After much research, testing and banging my head on my keyboard, I eventually found the error was due to the underlying check constraints being no longer trusted. Seems like some hiccup occurred during the publish/build.

The error message was technically correct: a partitioning column was not found, not because the column didn't exist, but because the check constraints were no longer trusted.

Borrowed this bit of code I found on Brent Ozar's website to find untrusted check constraints on my database:
SELECT '[' + s.name + '].[' + o.name + '].[' + i.name + ']' AS keyname
FROM sys.check_constraints i
INNER JOIN sys.objects o ON i.parent_object_id = o.object_id
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE i.is_not_trusted = 1
  AND i.is_not_for_replication = 0
  AND i.is_disabled = 0;
And then issued this fix, on all untrusted check constraints, to check them.
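The fix in question is re-checking each constraint WITH CHECK, which revalidates the existing rows and marks the constraint trusted again. A sketch with placeholder table and constraint names:

```sql
-- Revalidate an untrusted check constraint so the optimizer trusts it again.
-- The first CHECK is the WITH CHECK option (validate existing rows);
-- the second CHECK CONSTRAINT re-enables the named constraint.
ALTER TABLE dbo.SalesByYear2019
    WITH CHECK CHECK CONSTRAINT CK_SalesByYear2019_Year;
```

Run one of these for each constraint returned by the query above.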
Everything is back to normal now. Phew.

Friday, May 22, 2015

A Set Based FizzBuzz Answer using T-SQL

Just a simple solution to show a set-based answer to the FizzBuzz question.

If you are not familiar with the programming scenario of FizzBuzz, it is loosely the following: write a program that displays the numbers from 1 to 100. If the number is a multiple of three, print "Fizz" instead of the number. If the number is a multiple of five, print "Buzz". For numbers which are multiples of both three AND five, print "FizzBuzz".

Most programmers immediately reach for a loop: use a counter, increment it each iteration, determine whether the number is divisible as outlined above, and move to the next number.

In my example, I'm using the master.dbo.spt_values table which holds the range of values we need for this exercise. I normally create date and number tables for just such uses on database servers so as not to rely on system tables.
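A sketch of the set-based answer using that table (alias and output names are mine):

```sql
-- Set-based FizzBuzz: no loop, one SELECT over a numbers source.
SELECT CASE
           WHEN v.number % 15 = 0 THEN 'FizzBuzz'  -- multiple of 3 AND 5
           WHEN v.number % 3  = 0 THEN 'Fizz'
           WHEN v.number % 5  = 0 THEN 'Buzz'
           ELSE CAST(v.number AS VARCHAR(3))
       END AS FizzBuzz
FROM master.dbo.spt_values AS v
WHERE v.type = 'P'                     -- the 'P' rows hold a contiguous 0-2047 sequence
  AND v.number BETWEEN 1 AND 100
ORDER BY v.number;
```

Swap in your own numbers table for spt_values and the query is unchanged.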

Aaron Bertrand has a good performance comparison with using numbers tables, date tables and common table expressions (CTE) usage. His Generate a Sequence Without Loops Part I and Part II are good articles to read if you want to dive further into this topic. His Part III of the series dives into comparing the use of number and date tables, as well as using CTEs.

Tuesday, April 28, 2015

Index all the things

Do what? Why?
No, don't do that. Step away from the keyboard.

A customer recently approached me with a simple request: Can we please add an index to a table? Seems their extraction query was taking quite a long time to run. 
This is one of those pivotal moments where you can be a good DBA and do what’s asked, or be a better DBA and dig into what they are asking to make sure they get what they need (and most times, they don’t really know what they need).

I asked the client for two things: the column/table they want the index created on, and the query they are using to extract the data. They returned this information quickly, and I dove in to see how I could help. Turns out, if I'd done what the client asked for, they would be no better off than had I done nothing, and inserts and updates to the table with the newly added index would have become slower.
The query had a few joins, as well as some filters. The column they wanted to add the index on was a DATETIME data type and their query was something like this (shortened and sanitized for confidentiality):
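The sanitized query didn't survive this copy; a hypothetical stand-in with the same shape (all names invented) would be:

```sql
-- Hypothetical stand-in for the customer's query (names invented);
-- note the CONVERT wrapped around the indexed DATETIME column.
DECLARE @DateParameter VARCHAR(8) = '20150420';

SELECT a.SomeKey, b.SomeValue
FROM dbo.ActivityLog AS a
JOIN dbo.LookupTable AS b ON b.SomeKey = a.SomeKey
WHERE CONVERT(VARCHAR(8), a.TimePeriod, 112) >= @DateParameter;
```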

My reply back to the customer asked for a sample of the date parameter they were using as the filter. They returned: '20150420'.

Warning: Side track

Ok, so now let's go down a quick date format side track. SQL isn't as picky about date formats as people think. Here are some examples of what the query engine will understand as dates. The image on the left is a simple query, written three ways, and the only difference is what I'm sending in as the date format. The image on the right is part of the three execution plans, all the plans are the same, and when I hover over any of the Clustered Index Scans where the filter is being applied, I don't see any conversions, only the predicate of finding the rows that match the filter.
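The screenshots aren't reproduced here, but the three equivalent filters looked roughly like this (table and column names are placeholders):

```sql
-- Three date literal formats SQL Server parses for a DATETIME column;
-- all three produce the same plan with no conversion on the column.
SELECT * FROM dbo.ActivityLog WHERE TimePeriod >= '20150420';        -- unseparated
SELECT * FROM dbo.ActivityLog WHERE TimePeriod >= '2015-04-20';      -- separated
SELECT * FROM dbo.ActivityLog WHERE TimePeriod >= 'April 20, 2015';  -- verbose
```

One caveat worth knowing: for the DATETIME type, the separated 'yyyy-mm-dd' form can be reinterpreted under some language/DATEFORMAT settings, while the unseparated 'yyyymmdd' form is always read the same way.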

Back to our original post topic, filters and indexing...

So let's see what happens to our execution plan if I add the index requested, and the customer continues to use their query with the conversion on the DATETIME column in the where clause.
Exactly what we'd expect: SQL Server is performing a table scan. (Yes, there is no clustered index on this table. If there were, we'd see a clustered index scan.) Ok, now is where I add the index on the DATETIME field and rerun the same query. Here's the new execution plan after I add the index (I could run the same experiment with a clustered index on the table, and we'd see a clustered index scan. Remember, the clustered index IS the table.):

Well, that looks eerily familiar. So after adding the index it's still doing a table scan. Ok, so now let me run the customer's query minus the conversion (the system-defined function CONVERT) on the column in the WHERE clause, so essentially I'm altering their WHERE clause from:

WHERE CONVERT(VARCHAR(8),a.TimePeriod,112) >= @DateParameter

to:

WHERE a.TimePeriod >= @DateParameter

We can do this because I've determined that the data they are sending in as the parameter is interpreted by SQL Server as the correct data type. Keep in mind I have not altered anything else; the index existed before I ran the second query. The third query's execution plan contains this for the specified filter:
That's what we want, we can see that the index is being used! 

Now if the customer truly needed some date manipulation (DATEADD, DATEDIFF) on the TimePeriod column, I would rack my brain for the algebra equivalent of a statement that would move the function to the parameter and off of the column on the table. This would take the burden of converting each value before determining if it qualified for the where clause. 
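For example, the CONVERT above can usually be moved off the column by converting the parameter once and comparing the raw column, keeping the predicate index-friendly (names are placeholders):

```sql
-- Instead of: WHERE CONVERT(VARCHAR(8), a.TimePeriod, 112) >= @DateParameter,
-- compute the boundary once from the parameter and compare the raw column,
-- which keeps the predicate SARGable.
DECLARE @DateParameter VARCHAR(8) = '20150420';
DECLARE @FromDate DATETIME = CONVERT(DATETIME, @DateParameter, 112);

SELECT a.*
FROM dbo.ActivityLog AS a
WHERE a.TimePeriod >= @FromDate;   -- an index on TimePeriod can now seek
```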

A good answer on Stack Overflow to a similar question, with excellent options for moving the function away from the column, can be found here.

This is an example of making a filter SARGable. SARGable is short for Search ARGument-able: a filter (or search) is argument-able if the relational database can use an index to find the results. When you put a function (user-defined or system) on a column, it renders your index useless, because the index stores the original value, not the converted value.

If you still aren't getting it, I like to use this analogy:
LEN(LastName) >= 12 ?

It’s like a phone book where it's ordered by last name, first name. Now someone wants you to search for all the people with the length of their last name greater than 12 characters. Yes, you have them ordered by last name, but you have to read every single last name to determine if it’s at least 12 characters long. So ordering the phone book by last name, first name didn't help you at all and, in terms of SQL Server, it knows the index won’t help and doesn't even use it, it just scans every single row (or every single row of the clustered index) and converts every single value to determine if it meets the filter criteria. 

There are plenty of posts regarding this topic, I like the way Paul Randal explains it.

The suggestion was given to the customer to re-write the query and drop the convert function. In this case the performance outcome was a 90% decrease in query time.

Tuesday, September 16, 2014

Dynamically Updating Statistics on all Databases, With Sweet Potato Fries.

Sorry, I lied about the sweet potato fries. But this script will allow you to update statistics with options. Still interested? 

This is one of the steps of my customized maintenance plan to update statistics on all databases on a server. 

Overview of statistics in SQL Server, sample sizes and auto update information can be found on many sites. I like Grant Fritchey's article, SQL Server Statistics Questions We Were Too Shy to Ask.

Also the algorithm for determining the sampling rate to use for auto update statistics can be found here: SQL Server Statistics: Explained

Auto Update stats Algorithm: 
So auto update stats fires after 500 + 20% of the table's rows have changed. SQL Server 2012 offers an improved threshold of SQRT(1000 * table rows) (enabled via trace flag 2371), which is much better for large tables.

When it fires, it uses the default sampling rate, calculated as follows:
1) If the table is under 8 MB, the statistics are updated with a full scan.
2) If the table is over 8 MB, an algorithm reduces the sampling rate as the number of rows in the table increases, to make sure we are not scanning too much data. This is not a fixed value but is under the control of the optimizer, and it is not a linear algorithm either.
Example: at 1,000,000 rows it might use a sampling rate of 30%, but when the row count increases to 8,000,000 it might reduce the sampling rate to 10%. These sampling rates are not under the DBA's control; the optimizer decides.
Customize these aspects, to fit your needs:
  • I have time in my maintenance window to perform FULLSCAN instead of the standard sample, but you can change this to a sample percent size, or drop the "WITH FULLSCAN" in the script to use SQL Server standard sample size.
  • I have defined a statistic as being "out of date" if 100 * RowModCtr / RowCnt is greater than or equal to 2, i.e., roughly 2% of the row count has changed according to the "guesstimate" of RowModCtr. I say guesstimate because RowModCtr is not a true count of changes to the rows if you are using SQL 2005 or later (Books Online).
  • I also update statistics, regardless of how many changes to the table have occurred, for stats older than 30 days. You can change that time frame.
Disclaimer: this script is based on one found on DBA Stack Exchange as it most closely fit my needs.
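The script isn't reproduced in this copy, but the core of the pass looks roughly like this: a simplified, single-database sketch using the sys.dm_db_stats_properties DMF (available from 2008 R2 SP2 / 2012 SP1) in place of the legacy rowmodctr column; object names are generated, thresholds match the bullets above:

```sql
-- Simplified sketch of the statistics pass described above: update stats
-- that are >= 2% modified or older than 30 days, WITH FULLSCAN.
DECLARE @sql NVARCHAR(MAX);
DECLARE stats_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT 'UPDATE STATISTICS '
           + QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name)
           + ' ' + QUOTENAME(s.name) + ' WITH FULLSCAN;'
    FROM sys.stats AS s
    JOIN sys.objects AS o ON o.object_id = s.object_id
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE o.is_ms_shipped = 0
      AND ( sp.modification_counter * 100.0 >= sp.rows * 2      -- ~2% changed
            OR sp.last_updated < DATEADD(DAY, -30, GETDATE()) );-- or 30+ days old

OPEN stats_cur;
FETCH NEXT FROM stats_cur INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM stats_cur INTO @sql;
END
CLOSE stats_cur;
DEALLOCATE stats_cur;
```

Wrap this in a loop over sys.databases (or sp_MSforeachdb) to cover the whole server.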

Saturday, July 12, 2014

When should I use a function?

I was asked recently by a colleague: When should I use a function? I wanted to refresh my memory on the subject and give a high level, but informative, response. Here goes...

Functions can allow you to encapsulate code and reuse complex statements. UDFs (user-defined functions) in SQL Server consist of scalar (return a single value) and table valued functions. You can think of user defined scalar functions just like system built-in functions (SUBSTRING, GETDATE, CHARINDEX) where they will return one value and, when applied to multiple rows, SQL Server will execute the function once for every row in the result set.

Table valued functions can be similar, in functionality (see what I did there?), to a stored procedure. Rather than executing the stored procedure to get a result, you can select from them, join them to other tables, and generally use them anywhere you would use a table. Awesome, right?

Don’t go changing all your Stored Procs to TVFs just yet...

If the TVF contains a single statement (called inline) then the optimizer will treat it similar to a view, in that it will reference the underlying objects in the execution plan. If the TVF contains multiple statements, the optimizer does not reference the underlying objects, and it treats the statements similar to a table variable, in that it cannot retrieve statistics and guesses the resulting row count of the TVF to be 1 (in SQL Server 2014 the new cardinality estimator increased the estimated row count to 100), which can cause huge performance issues.

The best (and, IMO, only) time to use a TVF is creating an inline TVF, which is normally a single statement but can be a complex single statement incorporating CTEs. A multi-statement TVF is only useful and performant if it is always expected to return a few rows (or, in 2014, right around 100 rows), as the optimizer makes this assumption at compile time.
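To make the distinction concrete, here is a minimal pair with an invented Orders table: the same logic as an inline TVF and as a multi-statement TVF.

```sql
-- Inline TVF: a single SELECT; the optimizer expands it like a view
-- and uses statistics on the underlying table.
CREATE FUNCTION dbo.OrdersByCustomer_Inline (@CustomerId INT)
RETURNS TABLE
AS
RETURN
    SELECT o.OrderId, o.OrderDate, o.Total
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId;
GO

-- Multi-statement TVF: the body is opaque to the optimizer,
-- which guesses 1 row (100 in SQL Server 2014).
CREATE FUNCTION dbo.OrdersByCustomer_Multi (@CustomerId INT)
RETURNS @result TABLE (OrderId INT, OrderDate DATETIME, Total MONEY)
AS
BEGIN
    INSERT INTO @result (OrderId, OrderDate, Total)
    SELECT o.OrderId, o.OrderDate, o.Total
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId;
    RETURN;
END
GO
```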

One item to take note of, even with the better-performing inline TVFs: the outer filter is applied after the TVF is executed. Ingest that statement for a minute. If you are calling the function in a join and the outer query's filter is significant, that filter is applied after you retrieve all the values from the TVF, which can obviously waste valuable resources and time.

I believe it is best to think of an inline TVF as a filtered view and, like all other things, test for optimization and speed, remembering to use SET STATISTICS TIME ON and SET STATISTICS IO ON, as execution plans oftentimes treat multi-statement and scalar functions like black boxes, won't show the execution counts on underlying objects, and can give you very misleading information. Gail Shaw's brief discussion goes into more detail:

The article below is a deeper dive, and the best article, I found on SQL functions. Jeremiah also provides excellent examples of turning non-performant functions into optimized in-line functions and cross apply statements.

Saturday, May 17, 2014

Back to Basics TSQL: Declaring multiple variables in one line versus individual lines.

I acknowledge that declaring multiple variables in one statement saves typing lines of code, and could save you a small (very small) amount of space in the database (if you add up thousands of stored procedures declaring variables in one statement versus on multiple lines).

I know I go against the grain for code style preferences compared to developers I've come across, as I prefer declaring variables on their own lines. The main reason is so I don't have to search through, what can be, a long list of variables to find something. Preference. The End. 

I did want to determine how SQL Server processes the differences, internally. If there was a valid performance boost to declaring these in one line, then I could be swayed to change my preference. 

I ran the following statements on my local version of SQL Server 2014 (Developer) and watched SQL Profiler for details of "behind the scenes":

Extremely simple example:
Statement 1: declaring multiple variables all on one line.
Statement 1 Profiler Output:

Statement 2: declaring one variable per line.
Statement 2 Profiler Output:
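The screenshots didn't survive this copy, but the two statements were along these lines (variable names are mine):

```sql
-- Statement 1: multiple variables in one DECLARE
DECLARE @a INT = 1, @b INT = 2, @c VARCHAR(10) = 'three';

-- Statement 2: one variable per DECLARE
DECLARE @x INT = 1;
DECLARE @y INT = 2;
DECLARE @z VARCHAR(10) = 'three';
```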

So, at least from my 5 minutes of testing, it appears as if SQL Server deems both statements as one batch.

Which coding practice should you follow? Whichever one feels right to you, because as far as performance goes, SQL Server seems not to treat them differently.

Monday, March 3, 2014

SSAS - When processing cube(s), receive "Login failed for user 'DOMAIN\COMPUTERNAME$'"

Pretty simple, but writing this up because I'm diving back into SSAS for some performance testing. This information is for SQL Server 2012, but will be similar for previous versions.

You've created a new SQL Server database where your cubes will pull their data from, as well as a new SSAS database. When you create a data source, the impersonation information tab is set to the default, "Use the service account". Normally I would create a network user for this, but since I'm installing locally and running tests, I went with the standard impersonation configuration.

Open your computer services to determine the name of the service account. I do this by right clicking on My Computer -> Manage, and expanding Services. For SQL Server 2012, look for this entry: 

SQL Server Analysis Services (if you have multiple versions installed, choose the version you are using, in my case 2012). I can see the Log On As is set to: NT Service\MSOLAP$SQL2012

Go to SSMS and expand the Security tab. If you do not see the above login, create a new one and make sure it is mapped to the database the data is being pulled from into your cube: the one you created for the data source. Make sure the user has read access to the database by adding the db_datareader role. Hit OK to save.
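The same setup can be scripted instead of clicked through; a sketch, using the service account name from the Log On As value above and a placeholder database name:

```sql
-- Create the SSAS service account login and give it read access
-- to the cube's source database.
USE [master];
CREATE LOGIN [NT Service\MSOLAP$SQL2012] FROM WINDOWS;
GO
USE [CubeSourceDb];   -- placeholder: your data source database
CREATE USER [NT Service\MSOLAP$SQL2012] FOR LOGIN [NT Service\MSOLAP$SQL2012];
ALTER ROLE db_datareader ADD MEMBER [NT Service\MSOLAP$SQL2012];
GO
```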

Try to process your cube, you should be golden now.

Thursday, December 19, 2013

Unofficial benchmark results comparing SSIS to Talend

As a follow up to my previous post, here is the output of my various tests comparing the extraction only processing rates between SSIS and talend.

Flat to Flat
I created a flat file (.csv) with 10M rows with the following fields (sizes to indicate approx size of expected data)
Id - Bigint
OtherId - Int
SomeOtherNumber - Int
RandomName - VarChar(250)
AnotherNumber - Int
BitField - true or false
EmailAddress - VarChar(250)

Three runs to verify consistency were done, results are below:

SSIS: Average duration 44.5 seconds
Talend: Average duration 1 minute 9 seconds

Large Flat File to SQL Server
Same flat file used above, but loaded into SQL Server 2012 located on my laptop. So Flat File -> SQL Server 2012, with no transformations just straight data columns to data fields. Again, three runs to verify consistency, results are as follows:

SSIS: Average duration 59.6 seconds
Talend: Average duration 6 minutes 56 seconds
(nope, that's not a typo, that's nearly 7 minutes)

Many Files to SQL Server
Next I attempted ingesting from a folder containing 3 subfolders containing 79 files, which contained a total of 2,187,842 records. Three runs, results are below:

SSIS: Average duration 22.8 seconds
Talend: Average duration 42.2 seconds

This is very unofficial benchmark testing, trying to determine whether we should start focusing on Talend as an ETL tool over SSIS. As Talend boasts over 450 connectors out of the box and is platform independent, it was worth a look. I did not expect SSIS to outperform in all scenarios. I did expect better integration and speed with MSFT products (Flat File to SQL Server), which is what SSIS was designed for, and there its performance met my expectations.

This is not an indication that Talend is or may be a poor solution, but in our environment, SSIS is the clear winner so far.
To be completely fair, and if I'd had time, I would have installed Talend on Linux and run similar tests, but we are mostly a MSFT shop so I went with what we have access to locally.

Hardware/Software configurations:
Dell Precision M4700, 32GB Ram, Win7 64-bit
[Flat file storage, Talend and SSIS installed on laptop]
Virtual Server with 32GB Ram, Server 2012 64-bit
[SQL Server 2012]
Talend Open Studio - Data Integration v 5.4.1
I'm being pretty vague, as the title states: unofficial benchmark.

I attempted to ONLY have minimal processes running on both, etc. I also ran tests without connecting to SQL Server, i.e., local flat file to flat file ETL, small files, large files, etc., trying to vary the scenarios we may have at work. 

Friday, December 6, 2013

SSIS Package to extract data from Hortonworks to SQL Server

I am in the midst of comparing a few different architectural scenarios, one of which led me to test the functionality of extracting data from Hortonworks (running on a local VirtualBox) using an SSIS package and the Hive ODBC Driver 1.2.

 To get you up to speed on high level architecture options I am considering, here is an overview:

The goal is to be able to ingest nearly any data source, join it with internal metadata, aggregate and expose it to our users via our application (web based) and/or export slices back into any format customers may require. To determine the best ETL solution for our needs, we are comparing talend and SQL Server SSIS. This was after some thorough research into other solutions as well, but for our particular needs, these two options seemed viable.

I like that the talend Data Integration tool comes with connectors to over 400 different data types, but to use it in an enterprise setting, with shared source control, we'd need to license it correctly, which becomes costly as it's on a per-user subscription basis. The other alternative is to utilize tools we are already paying for with our SQL Server licensing: Integration Services, which I've been using since SQL Server 2005 (and before 2005 it was DTS).

If you haven't already played with Hortonworks (which can be done on Windows if you need, easiest way is to download their Sandbox on VirtualBox), I thoroughly encourage you to do so. Their tutorials are extremely easy to follow. Technical disclosure: all of these pieces I'm testing initially are running on my laptop on Windows 7 Pro (64-bit) with 32GB RAM. I set the VirtualBox to use 8GB RAM. The goal would be to initially test the functionality and then create a working prototype to benchmark and fully test to make a final decision.

One of the tutorials resulted in installing the Hortonworks ODBC 1.2 Driver (I installed the 32-bit version for this test) to pull data into Excel. I then uploaded a test file into Hortonworks HCatalog, which consisted of a 16-column, approx. 48K-row .csv file for my initial dataset.

I configured SSIS with the ODBC connection previously created, and an OLE DB Connection to a local SQL Server 2012 database installation. Because I installed the 32-bit ODBC driver, I needed to update the debug configuration and set the Run64BitRuntime to False:

SSIS Data Flow task successfully ran, fairly quickly considering all the objects are locally on the same machine which is far from an optimal solution:

Next I'll be attempting to do a similar test in talend. I'll post the results of that one soon. 

Sunday, July 7, 2013

Columnstore Indexes: the best thing since cake (and I love cake!)

Most times with software upgrades comes a few "oh, that's great, I think I may be able to utilize that" moments going through my mind. When I read and dove deeper into SQL Server 2012's new feature, Columnstore Indexes, my heart actually started to race. It takes a true love of data warehouses to feel this way, but it's as if SQL Server answered some of my questions and frustrations over the past few years. These frustrations grew from dealing with TBs worth of data and how normal row indexes (normal clustered and non-clustered indexes are row indexes) were simply not optimized for querying VLDBs.

From msdn online, here is a summary of the difference: 

"An xVelocity memory optimized columnstore index, groups and stores data for each column and then joins all the columns to complete the whole index. This differs from traditional indexes which group and store data for each row and then join all the rows to complete the whole index."

The biggest difference is that data is grouped and stored one column at a time. The benefits are: because only the columns needed are read, this results in less disk reads, better compression, improved buffer pool usage (reducing I/O), and utilizes batch processing which reduces CPU.

Wah? Yes! Wah? I'll say it again, YES! And yes, it works with table partitions. Of course, there are logical limitations, the most important being that the table must be read-only. But for normal data warehouse architecture, data is not transactional and is loaded at certain times, most normally once or maybe a few times a day. So dropping and recreating indexes is something most data warehouse engineers are familiar with, in detail. Or you can always implement partitions and then switch, among other options, to get around the read-only requirement. There are other limitations, which I encourage you to explore, as well as some gotchas.
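Creating one is a single statement; a sketch with invented table and column names:

```sql
-- SQL Server 2012 nonclustered columnstore index on a fact table.
-- While it exists, the table is read-only; drop it before loads,
-- or use partition switching, then recreate it.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (SaleDate, StoreId, ProductId, Quantity, Amount);
```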

A fantastic, and thorough walk through of this new, exciting, feature in SQL Server 2012 can be found here: What's new in SQL Server 2012: Using the new Columnstore Index, it's a YouTube video by Kevin S. Goff.

My hat's off to the SQL Server development team, warms my heart that you addressed the growing needs of us data warehouse minions.

Friday, June 21, 2013

SharePoint 2013 - allow others to edit a discussion (not just reply)

One of many SharePoint tips...

Some sites may want more collaboration, the ability to edit discussion topics, etc. I would highly recommend turning on version control for the list you modify in order to see who changed what. Here are the steps:

You need to be in the Site Owner group to make these changes. If you don't have the appropriate permissions, contact your SharePoint administrator to  assist.

Go to Site Contents, click on the discussion list name and in the ribbon click on List, then List Settings.

Once in List Settings, click on Versioning Settings and make sure Create a version each time you edit is set to Yes. I typically do not require content approval for submitted items, but this is an option. It would not allow anything to show until one member of the Approval Members group approves the content. 

Once you click "OK" on the Versioning Settings screen, go into Advanced Settings.

Under the Create and Edit access, select the "Create and edit all items" instead of the "...created by the user" like so, then be sure to click the OK button.

Next, we're going to alter the Site Members contribute rights. Click on the Gear in the upper right hand corner and select Site Settings, then under Users and Permissions, select Site permissions. Check the box next to the [Site Name] Members name, and the "Edit User Permissions" option on the ribbon will become available.

Click the Edit User Permissions, the default rights should be set to only Contribute. Check the boxes next to Edit and Approve like so:

Then click the "OK" button. Now have a site member test the ability to "Edit" a discussion by finding a discussion topic, clicking the ellipses, and seeing the "Edit" ability like so:

Hope this helps. 

Tuesday, June 18, 2013

SharePoint 2013 - How to make the discussion edit window wider without altering the Master Pages.

One of many SharePoint tips...

When creating or editing a discussion, the text field for the body of the discussion is way too narrow. Can it be changed?

There are a few ways; the easiest is to implement jQuery in a web part on the page. That way no one is mucking with the templates in SharePoint, which can be overwritten in patches, etc.

This following way would need to be done at each site or sub-site as the editform.aspx is unique (copied from the site collection level once a site is created). Which means implementing this at one site will not affect this form on any other site.

Note: You need design rights or site ownership rights in order to perform this modification.

Go to the site you would like to implement this on.
Go to a discussion, click the Ellipses (...) and then the Edit choice and you should see a layout similar to this where the "Body" textarea is fairly narrow.

You'll want to edit the page this form displays in, which is why I've highlighted the gear in the upper right hand corner, click on it and then select "Edit Page":

Now you will want to click the "Add a Web Part", we're going to be adding a Content Editor

After you click "Add a Web Part", chose Media and Content, then Content Editor and click "Add":

Now you will see your page again with your newly added Content Editor web part. Click the "Click here to add new content":

It will then place your cursor in the Content Editor, but what we want to do is edit the HTML, so select the Edit Source button as shown here:

In the HTML Source editing window, type the following code:
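The original snippet isn't preserved in this copy of the post; a minimal stand-in, assuming the Body field picks up the stock SharePoint 2013 form CSS classes (the selectors here are a guess and may need adjusting for your page), would be:

```html
<style type="text/css">
/* Hypothetical: widen the discussion Body rich text field.
   Selectors assume default SharePoint 2013 edit form markup. */
.ms-formbody, .ms-rtestate-field {
    width: 700px !important;
}
</style>
```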

Your window should now look like this:

Click the "OK" button at the bottom of the HTML Source window. Then you are back on the edit page, click the "Page" on the ribbon and select "Stop Editing" as shown here:

It will redirect you to the discussion list, so go back into a discussion, click the Ellipses and Edit, and now your text areas should be wider like this:

Happy Customizing.

Friday, August 17, 2012

Checking VPN logs... CiscoSecure ACS v4.0

I was asked to create a "simple time clock" front end that integrates with our users' HR data (downloaded nightly in a data pull I built from HRB). One of the potential pitfalls I pointed out to the person requesting the interface and data objects was that, since most of our technical staff have VPN access turned on by default, they could potentially clock in or out from home.

So I started my search to determine whether a user was on our network from their desk or via VPN. My network engineer pointed me to the interface showing the logs. I saw how simple the output was and believed it was just reading from a text file. Sure 'nuf, I found the text file locally on the domain controllers. There are probably multiple ways to determine how someone is connected, but I couldn't come up with any off the top of my head after a brief pow-wow with my network engineer, so this is the direction I went.

I found the domain controllers that held the logs (.csv files) for passed authentications, located in ProgramFiles\CiscoSecure ACS v4.0\Logs\Passed Authentications; the files were named Passed Authentications Active.csv. Ahhh, data, data, data...

When a user logs into the simple time clock application, I check these log files for the user name and text "Remote Access (VPN)" to determine if it's been more than a certain amount of time, say an hour, to attempt to verify this web app is not being accessed by someone connected via VPN.

An over simplified schema of how I solved this problem:

If you need more details, let me know.

Tuesday, July 24, 2012

SSRS Prompting for login when deploying reports or data sources (SSRS 2008)

While migrating SSRS 2005 reports to a new SSRS 2008 server, I had to recreate the project in Visual Studio, add the data source and the .rdl files because my original project files became corrupted (long story).

I saved a few .rdl files to the new folder, created a new and updated .rds (shared data source) and attempted to deploy to the new server, to a subfolder from the main reportserver folder.

I was repeatedly prompted for my login. After some futzing around, I stripped all the sub folder items out of the project properties and found I could deploy to the main reportserver folder, so I thought I'd take a screen shot of the correct configuration for future reference.

What I had been doing was appending the sub folder name after /reportserver/ in the target server URL, when in reality you need to put the sub folder in the "TargetReportFolder" and, if you want your data source under that subfolder as well, update the "TargetDataSourceFolder" to the subfolder/data sources URL as shown here.

Hope it helps.

Thursday, March 15, 2012

SQL DBA Best Practices - Why separate tempdb?

I've had discussions with server/system architects, attempting to justify not only separating the database and log files, but also moving tempdb to separate disks.

This is one of those that more applies to SQL Server 2008 and beyond.

Did you know that tempdb not only stores temporary tables, but SQL Server also utilizes it for grouping and sorting operations, cursors, the version store supporting snapshot isolation level, and overflow for table variables? 

Hopefully, knowing that these other objects can be heavily utilized, even if you don't believe your databases use a large amount of temporary objects, will arm you with ammunition to vote for more, and different, physical disks for tempdb.

The cost of the disks is dwarfed by the speed benefit you will gain, especially in data warehouse situations where some BI solutions create very large temp tables to join on before returning results.
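If the argument lands, the move itself is a couple of ALTER DATABASE statements plus a service restart; a sketch, with placeholder drive paths and sizes:

```sql
-- Point tempdb's data and log files at dedicated disks;
-- the change takes effect after the next SQL Server restart.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf', SIZE = 8GB);
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'L:\TempDB\templog.ldf', SIZE = 2GB);
```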

Thursday, November 10, 2011

TSQL - Refresh all orphaned users of a database after a restore

I've used this script for years, not sure where I pilfered it from but putting this here for safekeeping.
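The script itself didn't survive this copy, but the classic pattern (a sketch, not necessarily the pilfered original) loops over orphaned users and re-links each to the same-named login:

```sql
-- Re-link every orphaned SQL user in the current database to the
-- server login of the same name (assumes such logins exist).
DECLARE @user SYSNAME, @sql NVARCHAR(MAX);
DECLARE orphan_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT dp.name
    FROM sys.database_principals AS dp
    LEFT JOIN sys.server_principals AS sp ON sp.sid = dp.sid
    WHERE dp.type = 'S'          -- SQL users
      AND dp.principal_id > 4    -- skip dbo, guest, sys, INFORMATION_SCHEMA
      AND sp.sid IS NULL;        -- no login with a matching SID: orphaned

OPEN orphan_cur;
FETCH NEXT FROM orphan_cur INTO @user;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER USER ' + QUOTENAME(@user)
             + N' WITH LOGIN = ' + QUOTENAME(@user) + N';';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM orphan_cur INTO @user;
END
CLOSE orphan_cur;
DEALLOCATE orphan_cur;
```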

Thursday, July 7, 2011

Lovin' the OUTPUT clause...

Change is good, even if it appears big and scary at first…

Ever find a MUCH easier way of doing things and just fell in LOVE with it? It was that way for me once I discovered table valued functions in SQL Server. From that moment on I saw everything as a function “hill” to climb. My happiest moment: turning a query which took over 90 seconds down to 3 seconds because of this discovery and a subsequent re-architecture of all database calls.

Fast forward to… now… the OUTPUT clause. I know, I know, been around since SQL 2005 but until you wrap your head around it, create a new database or have time to re-architect something old, you can’t really get the implications until you incorporate it in a new database design.

One call to a database to update or delete can not only save your data, but OUTPUT the data you want to save in another table for, oh I don’t know, change management, historical, CYA implications.

Simple usage:

DELETE FROM [table]
OUTPUT deleted.* INTO [tablearchive]
FROM [table1] t1
JOIN [table] ON t1.ID = [table].table1_ID
WHERE t1.ID = 123

Double duty – deleting (or updating or inserting for that matter) and saving whatever details from that statement into another table. Saving yourself either an additional call from the front end, or at the very least, an additional SQL statement.

Just keeping it simple.