Introduction
In today business computing, data frequently resides on multiple sites inside an organization. This data might be managed by several Database Management Systems for multiple reasons such as scalability, performance, access and management. Thus, the information requirements for executing transactions and answering questions might not reside in a single site. Distributed Database Management Systems deal with distributed database as a single logical database, and the principles and techniques of Database Management Systems are still applicable to the distributed one; although the distributed one has special characteristics, and I'm going to illustrate the principles of Distributed Database Management Systems through this article.
Walking through Distributed Database Management Systems
We can find several advantages to distributed data such as:
- Data is located close to users fostering better communications.
- Users work with a subset of whole data increasing the performance of the solution due to faster delivery of data.
- Data can be processed in several locations increasing the performance of the solution.
- Workload balance between the underlying systems.
- Increase the availability because in a distributed environment, we don't have a single-point of failure.
We can also find drawbacks to distribute data such as:
- Management of a distributed environment is a complex task because there are several features of database systems such as transaction management, concurrency control, security, backup and recovery as well as query optimization and access path selection which can be all synchronized.
- Increase storage requirements because multiple copies of data can be found in different locations.
There are two basic ways to deal with distributed data:
- Distributed processing is accessing to one or more entities distributed in several database systems connected by a network using distributed transactions.
- Distributed database is the use of replication techniques to create a copy of the data in several sites connected by a network.
Transparency is a very important feature of Distributed Database Management Systems which means that the end-user feels like he's working with a centralized Database Management Systems by hiding all the complexities of a distributed database; that's transparent to the end-user. In a distributed environment, the end-user does not need to know about the location of the data and whether the data is replicated or fragmented as well as distributed transactions enable updating the data residing in several locations ensuring that the transaction is either completed or aborted by all the involved systems, thus maintaining the integrity of the data. Other characteristics of transparency in distributed database systems are the performance and the integration of heterogeneous DBMS under a common and global schema.
In this section, I will explain how to access entities which are distributed in multiple databases. The main approach is that users can see the database and external data sources as a single logical database. This approach is implemented using database link mechanisms which are most often created by database administrators responsible for managing the access to distributed data sources. Views, synonyms and procedures are useful in distributed database programs, because the can create the appearance of location transparency by hiding the global naming.
When users want to update data back to the distributed database system becomes a bit more complicated because failure of part of the distributed transaction (taking place across multiple database systems) could trigger a global integrity constraint violation and the underlying transaction must be either committed or roll backed as logical unit of work. Data integrity when change occurs across distributed database is ensured by using the two-phase commit protocol. The first phase of the two-phase commit protocol is the prepare phase, where an initializing system called the global coordinator or main database system polls each of the participant of the distributed transaction to determine if they are ready to commit or roll back. Preparation also involves recording information on transaction logs as well as placing distributed locks on modified tables to prevent reads of uncommitted data. In the second phase, if all the participants agree that the messages have properly been received, the changes are committed. If any of the nodes involved in the transaction cannot verify receipt of the changes (because of a system failure or a network failure), the transaction is roll backed to the original state.
In 1991, X/Open defined an open system standard interface through which transaction processing (TP) monitors could communicate with XA-compliant resource managers (the role of Oracle database systems and Microsoft SQL Server as well as other XA-compliant databases in the X/Open model). Then several TP monitors supporting XA have appeared such as BEA Tuxedo, IBM'S CICS, Encina and Microsoft Transaction Server (MTS).
Conclusion
In this article, I have illustrated the principles of Distributed Database Management Systems in order to help system architects and developers created their own distributed solution.