I drafted a document and submitted it to a company in an attempt to help eliminate some very common timeout issues that plague their internal software processes. Since I don’t have anyone I know who is a SQL guru, I had to rely on what I already know. I’m posting it here in hopes that it will either help someone or I can get a SQL guru to tell me I’m a dolt. To give you an idea of the magnitude of the issue the database is 300GB and running on a single disk drive with millions of records coming in each day.
Some of you might read this and ask, “Why not suggest a SAN?” and that would be a great question. SANs are expensive and everything is hosted in house. A SAN would make most sense in this situation but due to cost it was a suggestion that was frowned upon. I chose the following recommendations as ‘stage 1′ tasks. I had planned to draft up stage 2 and 3 which moved into index optimizations and then query optimizations.
I chose to start by solving the hardware bottle necks because even if you spend an hour shaving 3 minutes off of a long running query, with the amount of data being housed it would only really be a band-aid and a cost that wouldn’t give a very good ROI. Starting as close to the metal as possible will bubble up performance increases to the upper layers automatically and would have a far wider touch that focusing on specific queries.
Goal
The goal of these recommendations is to significantly reduce or eliminate SQL timeouts by decreasing the execution times of queries and increasing overall performance of the database.
Database Configuration
The biggest bottleneck of any server will always be the physical disk drive(s). The recommendations in this section are based on scaling out instead of scaling up.
Problem – Single physical drive serving all IO requests
The databases are sharing a single physical drive for all SQL data files. This means that every IO read (table, index) and IO write (data, transaction logs) is served by the same drive causing longer queue times for all requests.
Solution: Moving transaction logs to dedicated physical drive
Moving transaction logs to a dedicated physical drive will allow SQL to send all transaction log IO reads and IO writes to the dedicated log drive while increasing the number of read requests that the data drive can serve. Both are done in parallel.
Result: Faster inserts and updates. Increased read throughput from data drive.
Solution: Moving tempdb to a dedicated physical drive
SQL uses the tempdb for many things including storing results of joins. By moving the tempdb to a dedicated physical drive, SQL can send both IO reads and IO writes to be handled by the tempdb drive thus freeing the data drive from serving these requests.
Result: Faster joins, improved performance while working with temp tables and increased read throughput from the data drive.
Solution: Moving non-clustered indexes to dedicated physical drive
Non clustered index pages are not the same as the data pages (as with clustered indexes) so each index seek requires 2 IO reads (one to find the index and the other to find the data row the index points to). By moving the non-clustered indexes to separate file groups on a dedicated physical drive, the required IO reads required by the non clustered indexes can be split between the data drive and the index drive.
Result: Increased performance for non clustered index lookups, reduced MDF file size.
Problem: Historical data
Historical data (data that is no longer relevant to the current time context) is important for statistical and performance reporting over wide time spans but requires storage. With data increasing every hour, the overhead of maintaining the historical data grows significantly and causes longer execution times for queries.
Solution: Archive historical data
By moving historical data out of the production environment, the overhead of storage, indexing and IO reads drops significantly and increases performance on clustered indexes.
- Separate server – Moving historical data to a dedicated, separate server will reduce the overhead of storage and some or most historical reporting and processing, thus reducing load on the production server.
- Separate table and file group – Keeping historical data in a separate table, isolated from the production tables will still require storage, but will no longer be a burden on production processing. Moving the historical data table into a file group and storing on a physical drive other than where production data is stored will improve performance of queries and reporting.
Result: Reduces number of rows in very large tables which will increase overall query performance, reduced IO workload, smaller index files, faster rebuilding of clustered indexes
Alternative Solution: Table partition
An alternative to moving historical data to separate storage would be to partition very large tables into different file groups and moving the file groups containing older data to a dedicated physical drive.
Result: Improved read performance against large tables
Microsoft MVP, MCTS, PostSharp MVP,
what about a RAID setup to improve write performance for the database?
You should never run a database server (at least a production server) without some type of redundancy. RAID 10 is the recommended RAID setup for running a SQL server. However, allowing SQL to parallelize the read and write operations will increase performance overall. A RAID array may contain multiple drives but it’s still controlled by a single I/O controller. Incorporating a second and dividing up the work should be the goal.
This scenario requires evaluating your needs and your hardware.
Hey Dustin,
I would advise you to find some decent profiling software for your server and start there. The cost would probably be far less than what you’re budgeting for the hardware upgrades. While you are correct that the hardware does seem to be limiting, you don’t really know that your server is bottle-necking (is that a verb? :) ) on I/O unless you have a profiler that can tell you.
In a few of my last jobs the SQL that was written was done by developers who really had no decent experience (or desire) with SQL. In one particular project I replaced the current sql with about 4 lines of optimized code and dropped an end-of-day batch routine from 4 hours to 5 minutes. This particular system ran the Commissary inventory for a single jail in Orange County. After my fix we rolled it out to all 5 facilities. With the 5 hour process they were running before they simply didn’t have the time to roll out the system for the entire department.
So, I would really recommend the profiler, and if that can pinpoint some pain areas then start with that. Otherwise you’re just shooting in the dark. You may aim well, but it’s still shooting. :)
Good luck!
Hi, very good article, where can we find your continuity articles (draft 2 & draft 3). I am waiting for those.
Please let me know the same.
Thanks.