Distributed and Parallel Database Systems

In recent years, distributed and parallel database systems have become important tools for data-intensive applications. The prominence of these databases is growing rapidly for both organizational and technical reasons. Centralized architectures suffer from many problems, and distributed databases have emerged as a solution to them. Parallel databases are designed to increase performance and availability; they improve throughput, response time and flexibility. In this paper, I present an overview of distributed DBMS and parallel DBMS technologies, highlight the issues of each, and compare the two.

A database management system (DBMS) is software that manages incoming data, organizes it, and provides ways for users to retrieve it.

2) Architectural Issues

2.1 Distributed DBMS

A distributed database management system (DDBMS) permits a user to access and manipulate data from different databases that are distributed across several sites. In a distributed database architecture, sites are organized as specialized servers rather than general-purpose computers: different servers are used for specific purposes, such as application servers and database servers. For example, a bank implements its database system on different computers, as shown in figure[1]. The computer systems are located at different branches, but network links enable communication between them. The difference between a local DBMS and a DDBMS is that a local DBMS accesses a single site, whereas a DDBMS accesses several sites.

Figure[1]

A distributed DBMS should have at least the following components.

Network software and hardware

Computer workstations

Communication media

Transaction processor

Data Manager

Client-server is the best-known architecture, in which one server is accessed by more than one client. There are three possible architectures in a distributed DBMS:

multiple client/single server

multiple client/multiple server

peer to peer server

In multiple client/single server, a database is accessed by more than one client, which may lead to lock contention. In multiple client/multiple server, the database is distributed across many servers, so the servers must communicate with each other to process a user's query. Peer to peer is the most advanced architecture, in which each host can behave as both client and server; this requires advanced protocols for data management.

2.2 Parallel DBMS

A parallel DBMS improves performance by parallelizing various operations: loading data, indexing, and query evaluation. Data may be distributed, but purely for performance reasons. In practice, there are situations where centralized systems are not flexible enough to handle some applications, such as those in fluid mechanics. The architectures related to parallel DBMS [3] are

Shared memory: In this architecture, a common global memory is shared by all processors. Any processor has access to any memory module.

Shared disk: All processors have private memory, but direct access to all disks.

Shared nothing: Each processor has exclusive access to its own main memory and disk unit. Each processor, together with the memory and disk it owns, acts as a server for its data.

Distributed Databases: Issues and types

There are three key issues in distributed database design.

Data Allocation: Four strategies are used for data allocation.

Centralized: This is a local DBMS in which the data is stored in a single database and the users are distributed across the network.

Partitioned: The database is first divided into fragments, and each site is allocated a fragment.

Complete Replication: A complete copy of the database is maintained at each site.

Selective Replication: A combination of centralized, partitioned and replicated allocation.

Fragmentation

A relation R is divided into fragments r1, r2, ..., rn, which together contain sufficient information to reconstruct R. Fragmentation helps improve efficiency and security. The types of fragmentation are

Horizontal Fragmentation: This is defined using the selection operator of relational algebra. Each fragment contains a subset of the tuples of relation R.

Vertical Fragmentation: This is defined using the projection operator of relational algebra. Each fragment contains a subset of the attributes of relation R.

Other, less commonly used types are mixed and derived fragmentation.
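The two main fragmentation types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMPLOYEE relation, its attribute names and its rows are hypothetical, and a relation is modelled as a list of dictionaries rather than a real table.

```python
# Hypothetical EMPLOYEE relation; attribute names and rows are illustrative only.
employee = [
    {"id": 1, "name": "Asha", "branch": "North", "salary": 5200},
    {"id": 2, "name": "Ben", "branch": "South", "salary": 4100},
    {"id": 3, "name": "Chen", "branch": "North", "salary": 6100},
]

def horizontal_fragment(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, attributes):
    """Projection: keep only the listed attributes of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]

# Horizontal fragments, one per branch. Their union reconstructs the relation.
north = horizontal_fragment(employee, lambda r: r["branch"] == "North")
south = horizontal_fragment(employee, lambda r: r["branch"] == "South")
rebuilt_h = sorted(north + south, key=lambda r: r["id"])

# Vertical fragments. Each repeats the key "id" so that a join can
# reconstruct the relation.
public = vertical_fragment(employee, ["id", "name", "branch"])
private = vertical_fragment(employee, ["id", "salary"])
salary_by_id = {row["id"]: row for row in private}
rebuilt_v = [{**row, **salary_by_id[row["id"]]} for row in public]
```

In both cases the rebuilt relation equals the original, which is the reconstruction property that fragments are required to have.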

Replication

Replication maintains multiple copies of data, stored at different sites, for faster retrieval and fault tolerance. Its advantages are availability, parallelism and reduced data transfer.

There are two types of DDBMS: homogeneous and heterogeneous. In a homogeneous DDBMS, all sites use identical software, are aware of one another, and agree to cooperate in processing user requests. In a heterogeneous DDBMS, one or more databases use different software and schemas, which may lead to problems in query and transaction processing. Two-phase commit is a transaction protocol used in a DDBMS to coordinate the resource managers, so that a distributed transaction either commits at every site or aborts at every site. The distributed transaction manager employs a coordinator to manage the individual resource managers with the help of this protocol.
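A minimal sketch of two-phase commit may make the coordinator's role clearer. The class and function names below are hypothetical, and the sketch assumes reliable, in-process calls: it leaves out logging, timeouts and recovery from failures, which a real protocol implementation needs.

```python
class ResourceManager:
    """Hypothetical participant; `can_commit` simulates its local vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): send the same outcome to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


good_rms = [ResourceManager("branch_a"), ResourceManager("branch_b")]
bad_rms = [ResourceManager("branch_a"), ResourceManager("branch_b", can_commit=False)]
good_outcome = two_phase_commit(good_rms)
bad_outcome = two_phase_commit(bad_rms)
```

In the second run, a single "no" vote forces every participant to abort, including the one that was ready to commit.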

A DDBMS provides transparency with respect to distribution, transactions, failures, performance and heterogeneity. Concurrency control in a DDBMS prevents deadlocked transactions and data inconsistencies.

Parallel DBMS: Issues and types

A parallel DBMS can be defined as a DBMS implemented on a multiprocessor computer. It mainly uses two kinds of parallelism: pipeline and partition parallelism. Pipeline parallelism consists of many machines, each doing one step of a multi-step process. Partition parallelism consists of many machines applying the same process to different pieces of the data. The main objectives of a parallel DBMS are to improve the performance, availability and reliability of data. Its ideal goals are linear speed-up and linear scale-up.
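The difference between the two kinds of parallelism can be sketched as follows. The stage functions and the data are hypothetical. The sketch only shows the structure: the generator stages run lazily in one thread rather than on separate machines, and the thread pool stands in for the machines that would each hold one piece of the data.

```python
from concurrent.futures import ThreadPoolExecutor

def keep_even(rows):
    """Stage 1 (selection): pass on only the even values."""
    for value in rows:
        if value % 2 == 0:
            yield value

def double(rows):
    """Stage 2 (transformation): double every value it receives."""
    for value in rows:
        yield value * 2

def run_pipeline(rows):
    """Pipeline: values stream from stage 1 into stage 2 one at a time."""
    return list(double(keep_even(rows)))

def run_partitioned(rows, workers=3):
    """Partition: every worker applies the same steps to its own piece."""
    pieces = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(run_pipeline, pieces))
    # Merge the partial results; sorting makes the order deterministic.
    return sorted(value for part in partial_results for value in part)

data = list(range(10))
pipeline_result = run_pipeline(data)
partition_result = run_partitioned(data)
```

Both calls produce the same values; what differs is whether the work is divided by step or by piece of data.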

Linear speed-up, shown in figure [4], refers to a linear increase in performance for a constant database size and a proportional increase in the system components. Linear scale-up, shown in figure [5], refers to sustained performance for a linear increase in database size and a proportional increase in the system components.

Fig[4] Fig[5]
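The two goals can be expressed as simple ratios of elapsed times. The sketch below uses the commonly quoted definitions; the function names and the timings are hypothetical and are not measurements from any system.

```python
def speed_up(time_small_system, time_large_system):
    """Same database size, run on the original and on the enlarged system.

    Speed-up is linear when the ratio equals the factor by which the
    system components were increased.
    """
    return time_small_system / time_large_system

def scale_up(time_small_problem, time_large_problem):
    """Database size and system components both grow by the same factor.

    Scale-up is linear when the ratio stays at 1.0, that is, when the
    elapsed time does not change.
    """
    return time_small_problem / time_large_problem

# Hypothetical timings in seconds.
s = speed_up(120.0, 30.0)    # 1 node vs. 4 nodes, same database
c = scale_up(120.0, 120.0)   # 1 node with 1x data vs. 4 nodes with 4x data
```

With these figures, a speed-up of 4.0 on four nodes is linear, and a scale-up of 1.0 means that performance was sustained.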

Parallel DBMS techniques include data placement, parallel data processing, parallel query optimization and transaction management. The different types of parallelism in a parallel DBMS are

Intra-operator parallelism: All machines work together to compute a single operation, such as a scan, sort, join or projection, each on its own subset of the tuples.

Inter-operator parallelism: Different operators of a single query run concurrently at different sites.

Inter-query parallelism: Different queries run at different sites in parallel.

Intra-query parallelism: A single query is run at different sites in parallel.

Each relation is divided into n sub-relations, where n is a function of the relation's size and access frequency. Data placement uses horizontal partitioning to spread the tuples of each relation across different disk drives. Three popular strategies are round robin, hash partitioning and range partitioning.

The round robin strategy assigns the tuples of a relation to the sites in turn. It is simple and spreads tuples evenly, but it is poorly suited to exact-match queries, because a matching tuple may reside at any site. Hash partitioning supports exact-match queries efficiently and needs only a small index. A randomizing (hash) function is applied to the partitioning attribute of each tuple, as shown in figure[6], and its result determines the site at which the tuple is stored.

figure[6]

Range partitioning supports range queries but uses a large index. Rather than a hashing function, it uses intervals of the partitioning attribute's values to assign the tuples of a relation to sites.
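The three placement strategies can be compared in a short sketch. The account tuples, the number of sites and the range boundaries are hypothetical. For hash partitioning, the sketch assumes integer keys and uses a plain modulo as a stand-in for a real hash function.

```python
from bisect import bisect_right

def round_robin(tuples, n_sites):
    """The i-th tuple goes to site i mod n, whatever its value."""
    sites = [[] for _ in range(n_sites)]
    for i, t in enumerate(tuples):
        sites[i % n_sites].append(t)
    return sites

def hash_partition(tuples, n_sites, key):
    """The site is computed from the partitioning attribute (integer keys)."""
    sites = [[] for _ in range(n_sites)]
    for t in tuples:
        sites[key(t) % n_sites].append(t)
    return sites

def range_partition(tuples, boundaries, key):
    """With boundaries [b0, b1]: site 0 holds key < b0, site 1 holds
    b0 <= key < b1, and site 2 holds key >= b1."""
    sites = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        sites[bisect_right(boundaries, key(t))].append(t)
    return sites

# Hypothetical (account_id, balance) tuples.
accounts = [(5, 900), (12, 150), (18, 4000), (23, 75), (31, 620), (40, 310)]
account_id = lambda t: t[0]

rr = round_robin(accounts, 3)
hp = hash_partition(accounts, 3, account_id)
rp = range_partition(accounts, [15, 30], account_id)

# An exact-match query on account 31 needs only one site under hashing.
site_for_31 = 31 % 3
# A range query 15 <= account_id < 30 needs only site 1 under range partitioning.
in_range = [account_id(t) for t in rp[1]]
```

Under round robin, by contrast, either query would have to visit all three sites, since placement does not depend on the account number.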

Distributed and Parallel Query Processing

The main issues of query processing in distributed databases are

Localization

Distributed query operators

Cost based optimization

The main steps involved in distributed query processing are decomposition, localization and optimization. The decomposition step generates a query tree over the global relations for the given SQL query. In the localization step, these relations are replaced by their fragments. The optimization step then reduces the cost of the resulting tree.
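The localization step can be illustrated with a small sketch. The catalog, the fragment names and the site names are hypothetical, and the sketch handles only one case: an equality predicate on the attribute used for horizontal fragmentation.

```python
# Hypothetical catalog: global relation -> (fragment, site, branch it holds).
CATALOG = {
    "ACCOUNT": [
        ("ACCOUNT_north", "site1", "North"),
        ("ACCOUNT_south", "site2", "South"),
    ],
}

def localize(relation, branch=None):
    """Replace a global relation by the fragments a query must read.

    With no predicate, the relation becomes the union of all its fragments.
    With a predicate on the fragmentation attribute, fragments that cannot
    contain matching tuples are dropped.
    """
    fragments = CATALOG[relation]
    if branch is not None:
        fragments = [f for f in fragments if f[2] == branch]
    return [(name, site) for name, site, _ in fragments]

all_fragments = localize("ACCOUNT")
north_only = localize("ACCOUNT", branch="North")
```

Dropping the fragments that cannot contribute is one way in which localization already lowers the cost that the optimization step has to deal with.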

Parallel query processing combines the automatic translation of a query into an efficient execution plan with the parallel execution of that plan. The execution plan should be optimal. Parallel query processing follows four steps: translation, optimization, parallelization and execution. The query is translated into a query tree, and join algorithms are chosen to minimize the cost of execution. The query tree is then transformed into a physical operator tree, and the plan is loaded onto the processors. Finally, the plan is executed concurrently.

DDBMS vs Parallel DBMS

In a DDBMS, components are geographically distributed, whereas in a parallel DBMS they are tightly coupled. A DDBMS is associated with low-bandwidth links, a parallel DBMS with high-bandwidth links. Sites are autonomous in a distributed system and non-autonomous in a parallel one. The purpose of DDBMS design is data sharing and high availability, whereas the purpose of a parallel DBMS is to enhance performance and availability. In a distributed DBMS, sites can perform both local and global transactions; in a parallel DBMS, sites can perform only global transactions.

Conclusion

Thus, I have presented the main issues of distributed and parallel database technologies. Some issues are yet to be resolved, such as network scaling problems, effective query processing in distributed and parallel databases, and distributed transaction processing. Topics for further research include multidatabase systems and distributed object-oriented databases.
