This article assumes you have sufficient hands-on experience with ADO.NET and a sufficient understanding of the ADO.NET architecture. If you don't, then I suggest you go through MSDN for those topics first. In this article we explain how to improve the performance and scalability of ADO.NET data access.
If you are an experienced software developer then you know the importance of databases and of a good data access mechanism to the performance and scalability of an application. A good data access mechanism prevents countless headaches in production environments and improves response times too.
Reminder of basic concepts
The following quickly revisits a few basic concepts to brush up our ADO.NET skills. If you currently work with these topics in your organization then you may skip ahead.
What is ADO.NET?
ADO.NET is a set of classes in the .NET class library used to talk to various databases and other data sources.
ADO.NET relies on data providers to provide access to the underlying data source. Each data provider exposes a set of objects that can be used to manage a connection, retrieve data and update data. The core objects are:
- Connection
- Reader
- Adapter
- Command
Now, depending on the underlying database, the concrete object differs. For example, if we use SQL Server as our backend then to make a connection we need to use a SqlConnection object, whereas it is MySqlConnection and OracleConnection for MySQL and Oracle respectively. I hope you are familiar with the above ADO.NET components and have used them in your daily work.
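As a quick sketch, only the concrete types change between providers; the usage pattern stays the same. The connection strings below are placeholders, and the MySQL provider ships as a separate package.

```csharp
using System.Data.SqlClient;      // SQL Server provider
// using MySql.Data.MySqlClient;  // MySQL provider (separate NuGet package)

class ProviderDemo
{
    static void Main()
    {
        // SQL Server: placeholder connection string
        var sqlConnection = new SqlConnection(
            "Server=.;Database=Shop;Integrated Security=true;");

        // MySQL would look almost identical:
        // var mySqlConnection = new MySqlConnection(
        //     "Server=localhost;Database=shop;Uid=app;Pwd=...;");
    }
}
```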
Abstracting Data Access
ADO.NET is designed around a set of generic interfaces that abstract the underlying data processing functionality. We can code directly against those interfaces to abstract our data access layer, minimizing the impact of changing the type of data source our application uses. This kind of data access is very helpful when the customer can choose among databases.
The core interfaces provided by ADO.NET are found in the System.Data namespace:
- IDbConnection: an interface for managing a database connection.
- IDbCommand: an interface for running a SQL command.
- IDbTransaction: an interface for managing a transaction.
- IDataReader: an interface for reading data returned by a command.
- IDataAdapter: an interface for channeling data to and from a data source.
So, those are the interfaces we can use to implement platform-independent data access. Now, let's look at a few causes of slow database operations.
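A minimal sketch of coding against the generic interfaces: the helper below depends only on System.Data types, so the caller decides which concrete provider (SqlConnection, MySqlConnection and so on) to pass in. The table name is assumed to come from trusted code, not user input.

```csharp
using System;
using System.Data;

static class GenericDataAccess
{
    // Works with any ADO.NET provider because it only uses the interfaces.
    public static int CountRows(IDbConnection connection, string table)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            // Assumption: 'table' is a trusted, validated identifier.
            command.CommandText = "SELECT COUNT(*) FROM " + table;
            connection.Open();
            try
            {
                return Convert.ToInt32(command.ExecuteScalar());
            }
            finally
            {
                connection.Close();   // return the connection promptly
            }
        }
    }
}
```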
Why is my application slow?
There might be a thousand reasons for that. In this article we narrow them down to the issues related to database operations. The following are the main issues that can adversely affect the performance and scalability of data access in applications.
Inefficient query
Sometimes we do not put a proper condition on a query. The result of an insufficient condition is unnecessary data access and a slow response. For example, we sometimes write "SELECT * FROM TABLE", which returns every row of the table. We can narrow the query by specifying column names and a condition. Even if you want to fetch all records, don't use '*' to do it; specify the column names of the table one by one.
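A sketch of a narrowed, parameterized query: Products and its columns are hypothetical, and @category is a placeholder parameter supplied by the caller.

```csharp
using System;
using System.Data.SqlClient;

class NarrowQueryDemo
{
    static void Main()
    {
        var connectionString = "Server=.;Database=Shop;Integrated Security=true;";
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            // Only the columns we need, filtered by a condition
            "SELECT Id, Name, Price FROM Products WHERE CategoryId = @category",
            connection))
        {
            command.Parameters.AddWithValue("@category", 5);
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader["Id"], reader["Name"]);
            }
        }
    }
}
```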
No Indexes
Never create a table without an index. If there is no index or key on a table then every search becomes a table scan, so to improve search performance we should definitely create an index. Now, which column should be indexed? The full answer is beyond the scope of this article, so it's enough to say: choose the proper column and data type for the index.
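For illustration, here is a sketch of creating a non-clustered index on a hypothetical Orders table. In practice this DDL usually lives in a migration or deployment script rather than application code.

```csharp
using System.Data.SqlClient;

class IndexDemo
{
    static void Main()
    {
        var connectionString = "Server=.;Database=Shop;Integrated Security=true;";
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            // Speeds up searches that filter on CustomerId
            "CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON Orders (CustomerId)",
            connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```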
Don't forget to enable connection pooling
As we know, a connection object is precious and costly to create, and by nature we handle precious and valuable things with care. The same is true for a connection object: we should not keep any connection open after our database operation. The golden rule for connection management is to open the connection as late as possible and close it as early as possible; in other words, don't hold a connection open for a long time. And keep in mind that connection objects are reusable. The concept of connection pooling is that, after the operation finishes, the connection object is stored in a pool and can be reused for other database operations.
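A sketch of pooling-friendly usage: pooling is on by default for SqlConnection, and keeping the connection string identical lets ADO.NET reuse pooled connections. The connection string and Max Pool Size value here are placeholders.

```csharp
using System.Data.SqlClient;

class PoolingDemo
{
    static void Main()
    {
        // The same string every time, so all callers share one pool
        var connectionString =
            "Server=.;Database=Shop;Integrated Security=true;Max Pool Size=100;";

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();   // taken from the pool if one is available
            // ... perform the database operation ...
        }                        // Dispose() returns the connection to the pool
    }
}
```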
Transaction misuse
If you select the wrong type of transaction management then you may add latency to each operation. Besides that, if you keep a transaction open for a long time then it may block other transactions and can cause a deadlock.
Don't forget to use a proper isolation level in order to avoid dirty reads.
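A sketch of a short transaction with an explicit isolation level: ReadCommitted avoids dirty reads without the blocking cost of Serializable. The Accounts table and the amounts are hypothetical.

```csharp
using System.Data;
using System.Data.SqlClient;

class TransactionDemo
{
    static void Main()
    {
        var connectionString = "Server=.;Database=Shop;Integrated Security=true;";
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            // Explicit isolation level; keep the transaction as short as possible
            using (SqlTransaction transaction =
                connection.BeginTransaction(IsolationLevel.ReadCommitted))
            {
                try
                {
                    using (var command = new SqlCommand(
                        "UPDATE Accounts SET Balance = Balance - @amount WHERE Id = @id",
                        connection, transaction))
                    {
                        command.Parameters.AddWithValue("@amount", 100m);
                        command.Parameters.AddWithValue("@id", 1);
                        command.ExecuteNonQuery();
                    }
                    transaction.Commit();
                }
                catch
                {
                    transaction.Rollback();   // undo on any failure
                    throw;
                }
            }
        }
    }
}
```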
Overly normalized table
Normalization is good, but over-normalizing is bad. Avoid over-normalizing a table, to reduce join operations; as we know, a join is a very time-consuming operation.
So those are the basic mistakes we can make during the development of an application. Now let's go through a few guidelines for a proper ADO.NET implementation.
Design consideration
To ensure that a data access application is optimized for performance, there are a few concepts we need to keep in mind during implementation.
Design your data access layer based on how the data is used
Yes, before choosing any data access layer, first analyze the type of application. The current market offers many third-party data access layers (such as Entity Framework, NHibernate and many more), but before adopting one blindly, determine whether it is really necessary. Those layers are nothing but an abstraction over the database, and at the end of the day everything turns into ADO.NET code.
The data access logic layer should expose data and functionality based on how the data is used, and abstract the complexity of the underlying data store.
Do not arbitrarily map objects to tables and columns, and avoid deep object hierarchies. For example, if we want to display a subset of data and if the design retrieves an entire object to be processed then we are fetching unnecessary data from the database.
Cache data to reduce unnecessary work
Caching data can reduce the load on the database server. By caching, we avoid the overhead of connecting to the database again and again to perform the same operation. The question is when to cache data. Here are a few scenarios where caching helps in an application:
- If the data is used application-wide and shared by all users, for example a list of products then we can cache the data to improve performance.
- Try to cache data as near as possible to the user. The best form of data to cache is usually determined by the form that your clients require the data to be in. Try to reduce the number of times that you need to transfer data.
- Consider the mechanism that you will use to clear old items from the cache and what is the best expiration time for your application.
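The scenarios above can be sketched with System.Runtime.Caching, caching a shared product list with an absolute expiration so stale entries are cleared automatically. Product and LoadProductsFromDb are hypothetical application types; the 10-minute expiration is an assumed policy.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

class Product { public int Id; public string Name; }

static class ProductCache
{
    public static List<Product> GetProducts()
    {
        ObjectCache cache = MemoryCache.Default;
        var products = cache["products"] as List<Product>;
        if (products == null)
        {
            products = LoadProductsFromDb();   // hits the database only on a miss
            cache.Set("products", products, new CacheItemPolicy
            {
                // Expire after 10 minutes so changes eventually show up
                AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10)
            });
        }
        return products;
    }

    // Hypothetical data access call; would use ADO.NET in a real application
    static List<Product> LoadProductsFromDb() => new List<Product>();
}
```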
Use a service account to improve performance
If security is the first concern then use Windows Authentication; if not, use a fixed service account for the database connection and avoid impersonation. The fixed service account is typically the process account of the application. By using a fixed service account and a consistent connection string, we ensure that connection pooling works efficiently and the same pooled connections are shared across multiple clients.
Acquire Late, Release Early
Keep this golden rule for database connections and transactions: open a database connection right when you need it, and close it as soon as you are finished with it. Do not open it early or keep it open across unrelated operations. The same is true for locks: acquire locks late, and release them early.
Dispose after finishing
If a class implements the IDisposable interface then we can call the Dispose() method on its object. Most database objects can be disposed by calling Dispose(). Always dispose of them once the operation is finished; better still, use a using block to do it automatically.
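A sketch of using blocks ensuring Dispose() runs even if an exception is thrown mid-operation; the Products table is hypothetical.

```csharp
using System;
using System.Data.SqlClient;

class DisposeDemo
{
    static void Main()
    {
        var connectionString = "Server=.;Database=Shop;Integrated Security=true;";
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Name FROM Products", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader.GetString(0));
            }   // reader disposed here
        }       // command and connection disposed here, even on exceptions
    }
}
```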
Use a Stored Procedure rather than an embedded query
The ultimate goal here is to store an execution plan in the database server. If we use a Stored Procedure then SQL Server can cache and reuse the execution plan in memory, whereas an ad hoc embedded query may not get a reusable plan. If a Stored Procedure is not possible then the alternative is a parameterized query. The following are a few more advantages of Stored Procedures:
- Logic separation: we can separate all our database operation logic from the application using Stored Procedures.
- Reduced network traffic: using a Stored Procedure it is possible to reduce network traffic, and hence performance improves.
- Simplified batching of commands: a Stored Procedure makes it easy to batch work and provides simpler maintenance.
- Avoiding SQL Injection: this is one of the simplest methods to avoid SQL Injection.
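A sketch of calling a hypothetical GetProductsByCategory stored procedure: CommandType.StoredProcedure tells ADO.NET to treat the command text as a procedure name rather than raw SQL.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class StoredProcDemo
{
    static void Main()
    {
        var connectionString = "Server=.;Database=Shop;Integrated Security=true;";
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("GetProductsByCategory", connection))
        {
            command.CommandType = CommandType.StoredProcedure;  // not raw SQL
            command.Parameters.AddWithValue("@CategoryId", 5);
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader["Name"]);
            }
        }
    }
}
```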
Conclusion
So, those are a few suggestions from my side for improving the performance of ADO.NET operations. I hope you found them useful. You can add your own suggestions with a few keystrokes in the comments section.