DBA Skills For Developers

Allen James
9y
44.2k
0
23

Article

Introduction

In many smaller companies or departments, there is no dedicated DBA. In these cases, it falls on *someone* in the office to come to grips with the blackbox that is the database server, keep it running, and dive in when the things go wrong. The objective of this article is to give non data-focused developers a quick roundup of useful information and knowledge that will enable them to both solve some routine issues that come up and also help prepare for some in advance. This is not intended to be a very in-depth article, rather it's a quick read over coffee break that should give the reader some useful starting points. I would be delighted if any more experienced DBAs out there would like to comment and help expand the list. This article focuses specifically on MS SQL Server but the concepts should be useful for most relational database systems. I have also attached some scripts as a download that I use to gather the information on a system that you may find useful.

Introduction

Core concerns

A lot of projects start small, and over time, the complexity increases. What was a good decision to start with in the original design of a database or system may now be sub-optimal. Rather than putting solid fixes in place, for the sake of speed of development and 'getting things out the door', it is not uncommon to code around issues without digging-deep to find the problem and solve it. The following few items will give the new developer DBA a head start in getting and keeping on top of things.

The three core concerns of the developer DBA should be System stability, Data Integrity/quality, and Speed. With this in mind, we will look at some of the options available to assist in these areas.

Getting to know the database

One of the key things you need to know about your database is its purpose and architecture. Hopefully, there is someone on the team who can give you a guided tour, if not (and regardless!), there are numerous tools, both, commercial and free, you can use (out of the box, I normally have both Visual Studio and MS SQL management studio in my toolkit at least). To start with, here are some ways to get a quick handle on things:

Confirm a good backup and restore system is in place
clearly, with no database, there is no DBA job to do! ... the most important first step I always take is ensure there is a reliable backup system in place and that a restore of the database has been properly tested.
Identify where the data is / which tables have the most data / how fast it churns
I normally run a series of scripts that give me information about the state of the database, and I use this to both inform myself on how things are and to indicate any areas that I need to investigate further or enquire about. Finding out which tables have the most data, which tables change data most often, which tables have potential index issues / where indexes take up more space than the actual data, etc.
Find out the relationships between the tables
Examining the links between data tables can give insight to the design of the database, the importance of certain tables, and dependencies.
Find out how data comes in / goes out (data acquisition/reporting).
Data can come in in various ways, identify those routes and look for potential points of failure, brittleness and anything that could put an undue strain on the system over time.
Investigate how/why data is transformed/changed from its original state.
Some systems take data in and simply drop it in place into a series of tables and models, others manipulate the data out of the way. Find out where these transformation areas are and what they do. Frequently they can hold clues down the line when things go wrong. One place that transforms and 'creative happenings' can take place is buried inside triggers. A trigger placed against the Insert of data in one table can result in the delete or update perhaps from another seemingly unrelated table except for some long forgotten business rule. If you are trying to track down what has caused a change, then the SQL-Search tool I mention later in the article will assist. If you are specifically hunting for triggers then check out the solutions here.
Start to document from day one.
If you have inherited an undocumented or sparsely documented system, then make it your job to document your findings as you go along. It will make life easier not only for you in the long run, but also anyone coming behind you.
Use inbuilt reports.
MS SQL Management studio comes with a number of ready-to-go system reports that will help you to both get to know what you are dealing with, and help you to monitor changes over time. Access the reports by selecting the database, right-clicking, and running the report you need.

Make it robust

It is critical when working with the data to write robust code. Some things you should be aware of are transactions, error trapping and logs. Once I have a handle on the structure and architecture of a database, my next stop is generally the stored procedures. Stored procedures as you most likely know are bundles of pre-written database scripts that can be called on demand. When writing code for the server in C# or the FrontEnd in JavaScript we are careful to code defensively, have appropriate error handling in place, and log appropriately when things go wrong. The database stored procedure is no different, and as we generally have less interaction with users than say a front-end of a website or the error dialog in a WinForms application, I would argue that we need to be extra careful in our SQL code to ensure we handle things robustly and gracefully. There are three things I look for in a stored procedure:

If appropriate, is the script enclosed with a transaction block? ... I generally use transaction blocks where the data process I am doing is an all or nothing situation - that is, it must all work perfectly, or else we need to roll-back (literally) to the state we were in before we ran the script.

This screenshot shows a selection of data, a change, and then a rollback with the original data unaffected.

More information about transactions here.
Where there is a danger of a failure of the script through data transformation errors, data mis-match or availability etc, then we need to know what happened. In these cases I use a Try/Catch block in a similar manner we would in a procedural language such as C#. The following example forces an error by attempting an insert of a primary key that already exists.
1. BEGIN TRANSACTION; BEGIN TRY -- Generate a constraint violation error. DELETE FROM Production.Product WHERE ProductID = 980; END TRY BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber ,ERROR_SEVERITY() AS ErrorSeverity ,ERROR_STATE() AS ErrorState ,ERROR_PROCEDURE() AS ErrorProcedure ,ERROR_LINE() AS ErrorLine ,ERROR_MESSAGE() AS ErrorMessage; -- log the error! IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; END CATCH; IF @@TRANCOUNT > 0 COMMIT TRANSACTION; GO
An example error output (for an insert problem) is as follows,

Severity: 16 LineNum: 2 ErrorMessage: Cannot insert explicit value for identity column in table 'Customers' when IDENTITY_INSERT is set to OFF.

More information about try/catch in tsql here. The catch is useful as it can tell you the line number of your script it has a problem with, deliver an error message, and also give you an indication of the level of severity of the problem. More information about tsql error severity codes here.
When things go wrong, we need to record them. You can use the inbuilt MS SQL error logging system (raise error with log), but that requries permissions you may not always have (eg: hosted servers), so mostly I have an error log table I insert into myself.

Finally, you can of course (and I mostly do), combine all of the above to give yourself a better chance of not getting a phonecall at 3am! (yes, I've been there...)

This example from msdn illustrates:
1. BEGIN TRANSACTION; BEGIN TRY -- Generate a constraint violation error. DELETE FROM Production.Product WHERE ProductID = 980; END TRY BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber ,ERROR_SEVERITY() AS ErrorSeverity ,ERROR_STATE() AS ErrorState ,ERROR_PROCEDURE() AS ErrorProcedure ,ERROR_LINE() AS ErrorLine ,ERROR_MESSAGE() AS ErrorMessage; -- log the error! IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; END CATCH; IF @@TRANCOUNT > 0 COMMIT TRANSACTION; GO
nb: as with most safety harnesses, there is a cost to using error trapping, transactions etc - in most cases it's fine, but always better to read up on it and know the implications before you implement.

Speed issues

There can be many things at play when you have speed issues with a database. Slow downs can be caused by large volumes of data, inefficient processing/scripts, bad or no indexing, and OS/hardware (both raw power, and system configuration). Before looking deeply, and after looking at general things like disk-space, cpu, etc., my first main port of call (in the database) is usually indexes - I am always happy when a carefully placed index gives a query a new lease of life. You do have to be very careful however - adding an index is not always the solution, but often seems to be the default answer ... beware of having too many indexes as these can cause performance issues - more on that in a later article. Pinal Dave has a lovely set of scripts on his website that let you locate both redundant indexes, and also alert you to possible missing indexes. Download Pinal's scripts from here. There is also another obscurity to be aware of, that can, depending on data types, cause issues with both speed and storage space. That is using nVarChar by default, where standard VarChar should be used instead. If you are not going to be storing extended character sets in your database, then there is usually no need to use nVarChar. At the start of a project it may seem like a good idea, but after it grows for a while you may regret it - think it through! There is a script in the SQL Scripts bundle attached to this article that helps convert all nVarchar to Varchar in a database for you.

Working with Indexes

Understand how to use indexes correctly
I think most readers will know what indexes are, but I am always surprised at how they are not used optimally to get the best result by most developers I meet. There is an excellent book I highly recommend by Marcus Winand called 'SQL Performance Explained' - here is one of his insights for example (take the test!):

Question: Is the following SQL good or bad practice from a performance perspective?
Searching for all rows of the year 2012:
1. CREATE INDEX tbl_idx ON tbl (date_column); SELECT text, date_column FROM tbl WHERE datepart(yyyy, date_column) = 2012;
Answer: it's bad practice.
Reason: Wrapping the table column in a function renders the index useless for this query.
Write queries for continuous periods as explicit range condition:
1. SELECT text, date_column FROM tbl WHERE date_column >= CAST('2012-01-01' AS DATE) AND date_column < CAST('2013-01-01' AS DATE)
Use SQL Management Studio index hints
The index hints functionality in MS SQL Management Studio is an extremly useful tool in helping identify missing indexes that may speed things up. I put together a sample database of 100k records with parent/child dummy data, with no indexes. When we run a simple join query between the two tables, if we select 'display execution plan', then Studio will display a hint where it thinks there are missing indexes:

Digging deeper

# When the problem is not with the database

Before getting tied up in knots with deep analysis I always check all the way along the pipeline from data query to data output (this could be the screen, to the user, or printing on a report). In web-apps for example, I find it is not uncommon to find that the data is moving along fine ... *until it gets to the server-side C# code*, and that's where the bottleneck is. Let's look at an example I have seen quite a few times.

Let's say we have a one to many relationship between a customer and their monthly sales. If we wanted to get a listing of customer sales for a month, we might do this in SQL:

Select C.CustomerName, S.TotalSaleValue from Customers C join Sales S on S.CustomerID = C.CustomerID where (S.SalesDate >= X) and (S.SalesDate <= Y)

Nice and clean (we could also use the 'between' keyword of course for the date criteria).
What I have seen however, is variations of this pattern in C#, Java, etc..

DatabaseEntities ent = new DatabaseEntities(); var Customers = ent.Customers.all(): foreach(var customer in Customers) { var Sales = ent.Sales.where(s => s.CustomerID == customer.CustomerID ....<etc> ) }

What this translates to on the backend database is one query to get the list of customers, and then for each customer, another query to get their individual sales list. Highly inefficient and of course slower. In this case the remedy is to use a proper join in C# that results in a single efficient query being generated and sent to the server.

The point here is that we should not always assume that the problem with a slow database is in the database itself - there may be reason to think it may be elsewhere - it is our job to eliminate these kinds of possibilities.

# When extracting the data simply takes too long,

Pre-stage/de-normalize
Sometimes, adding indexes and optimizing queries can only go so far. The problem is that the data may be very heavily normalized, and pulling it together from a multitude of locations packed heavily with deep data simply takes as long as it takes. In these cases, the best option may be to 'pre-stage' your data or de-normalize it. If your query for example supplies data that feeds a live pivot table, and this is currently fed by a series of intricate joins/views/stored procedures, then a good solution may be to create a stored procedure that gathers the data needed say overnight and puts it into a wide de-normalized table that can be read instantly in one sweep by the pivot table data provider. This type of concept is what underpins Data Warehouses and Data marts. This type of strategy can be used to great effect for summary type reports/dashboards like quarterly sales, archived metrics, etc. A final comment on pre-staging is that for speed, you may save some precious time not only by pre-staging the data, but also the format. By this I mean, if you are sending data over the net to a webpage or mobile app using JSON, then why not have the data stored ready in its JSON structure ready to go - is there really a need once the data has been collated to transform it on the fly on request? ... I have for example used MongoDB to great effect simply to pre-stage data in native JSON format for web-based reports, by feeding it data from SQL on a nightly basis or when discreet data changes.
Run don't wait...
There are certain types of stored procedures that take time to run and theres not a lot you can do about it. However, if you call these procedures from your .net client and *wait* .... you may experience a timeout exception. As far as the user is concerned, the system is broken. As far as the database is concerned, its still doing its job. In cases like this, it may be better to kick off the stored procedure and poll back to see if its done rather than wait around. This type of situation can occur in a system where when the script was designed, it ran quickly, but over time, with increased data, it takes longer and longer. Sure, you need to get to the bottom of that, but in the meantime, you can use the SqlCommand.BeginExecuteNonQuery()method. This kicks off the procedure in a background thread and your code can get on with doing something else and not scare the poor user!

Importing data

MS SQL Server includes some excellent inbuilt tools to enable you to import data, be that text files, database files, csv or XML. This topic can get quite broad so I have written a seperate article how to import data from csv and xml into MS SQL server.

Tools

DBCC
'Database console commands' (DBCC) is an incredibly useful tool to know about. You can use this to check integrity of data on a database or table level, reindex tables, and carry out an array of general maintenance tasks. Here are some highlights from a query window in Management Studio:
DBCC CheckDB: Checks the logical and physical integrity of a database
DBCC CheckTable ('TableName'): Checks integrity of the structure and pages of a table or view
DBCC ShrinkDatabase ('DatabaseName'): Compresses the physical size of the database if possible
Who!?
There are two inbuilt system stored procedures that can be very useful. These are SP_WHO and its newer sibling,SP_WHO2. These scripts give you quite detailed information on current SQL connections, the different applications that have been used to connect to the server (with those connections), the type of work the connection is carrying out, if there are any processes blocking execution of others, etc. When trouble starts, this is one of the first ports of call for me to get a snapshot of whats going on.

usage: exec sp_who2

The output gives very useful insight such as who/what is connected, what kind of resources they are consuming, current action/status, and, if the 'BlkBy' has an entry - this tells you if one process is being blocked by another.
SQL Server Profiler
This application is bundled with SQL Management Studio and allows you to put a live trace on your database. Filters in the application allow you to fine tune what you want to observe if neccessary. It is extremly useful for seeing what's actually being generated by EF/Linq C# code for example.
RedGate SQL Search
This free tool is something you may use on a daily basis - well worth a download, and no email address required!

Its functionality is simple - enter a search term, and it looks for that in your database, across tables, triggers, stored procedures, indexes, etc. You get an inline view of the object and can click to go straight to it in Object Explorer. #Invaluable #Free #ThanksRedGate!
Brent Ozar
The kind chappies at Brent Ozar have put together a set of free scripts and eBooks etc that are incredibly useful. Scripts include utilities for server/database health-check, index analysis, figuring out which queries are causing the biggest performance problems and more. The pack also includes a useful sanity checklist of things to run through when you have a server down situation. Get them here.
Server monitor
Usually it's a good thing to have a headstart if something goes wrong before you get a call from a customer. It's worth checking out (go bingle!) different tools to monitor your servers and send you an email/text message if something wrong is detected.

Wrapup

Getting to grips with a database and becoming a part-time data wrangler can be very rewarding once you have the correct toolkit and base knowledge in place. Hopefully the notes in this article help get Developers who have inherited the (mini) DBA job up and running. I have attached a few scripts to the article that you may find useful.