Distributed Data Processing
Distributed database system technology is the union of what appear to be two diametrically opposed approaches to data processing: database system and computer network technologies. Database systems have taken us from a paradigm of data processing in which each application defined and maintained its own data to one in which the data is defined and administered centrally. This new orientation results in data independence, whereby the application programs are immune to changes in the logical or physical organization of the data.
One of the major motivations behind the use of database systems is the desire to integrate the operational data of an enterprise and to provide centralized, and thus controlled, access to that data. The technology of computer networks, on the other hand, promotes a mode of work that goes against all centralization efforts. At first glance it might be difficult to understand how these two contrasting approaches can possibly be synthesized to produce a technology that is more powerful and more promising than either one alone. The key to this understanding is the realization that the most important objective of database technology is integration, not centralization. It is important to realize that neither of these terms necessarily implies the other. It is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve.
In this chapter we define the fundamental concepts and set the framework for discussing distributed databases. We start by examining distributed systems in general in order to clarify the role of database technology within distributed data processing, and then move to topics that are more directly related to distributed database systems (DDBSs).
The term distributed processing is probably the most abused term in computer science of the last couple of years. It has been used to refer to such diverse systems as multiprocessing systems, distributed data processing, and computer networks. This abuse has gone on to such an extent that the term distributed processing has sometimes been called a concept in search of a definition and a name. Here are some of the other terms that have been used synonymously with distributed processing: distributed/multicomputers, satellite processing/satellite computers, backend processing, dedicated/special-purpose computers, time-shared systems, and functionally modular systems.
Obviously, some degree of distributed processing goes on in any computer system, even on single-processor computers. Starting with the second-generation computers, the central processing and input/output functions have been separated. However, it should be quite clear that what we would like to refer to as distributed processing, or distributed computing, has nothing to do with this form of distribution of function in a single-processor computer system.
A term that has caused so much confusion is obviously quite difficult to define precisely. There have been numerous attempts to define what distributed processing is, and almost every researcher has come up with a definition. The working definition we use for a distributed computing system states that it is a number of autonomous processing elements that are interconnected by a computer network and that cooperate in performing their assigned tasks. The processing element referred to in this definition is a computing device that can execute a program on its own.
One fundamental question that needs to be asked is: what is being distributed? One thing that might be distributed is the processing logic. In fact, the definition of a distributed computing system given above implicitly assumes that the processing logic or processing elements are distributed. Another possible distribution is according to function. Various functions of a computer system could be delegated to various pieces of hardware or software. Finally, control can be distributed. The control of the execution of various tasks might be distributed instead of being performed by one computer system. From the viewpoint of distributed database systems, these modes of distribution are all necessary and important.
Distributed computing systems can be classified with respect to a number of criteria. Some of these criteria are as follows: degree of coupling, interconnection structure, interdependence of components, and synchronization between components. Degree of coupling refers to a measure that determines how closely the processing elements are connected together. This can be measured as the ratio of the amount of data exchanged to the amount of local processing performed in executing a task. If the communication is done over a computer network, there exists weak coupling among the processing elements. However, if components are shared, we talk about strong coupling. Shared components can be either primary memory or secondary storage devices. As for the interconnection structure, one can talk about those cases that have a point-to-point interconnection channel. The processing elements might depend on each other quite strongly in the execution of a task, or this interdependence might be as minimal as passing messages at the beginning of execution and reporting results at the end. Synchronization between processing elements might be maintained by synchronous or by asynchronous means. Note that some of these criteria are not entirely independent. For example, when synchronization is synchronous, one would expect the processing elements to be strongly interdependent and possibly to work in a strongly coupled fashion.
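The coupling measure described above can be written down as a small calculation. The following is only a minimal sketch: the class name TaskProfile, its two fields, and the units are assumptions made for illustration, and the text does not prescribe any threshold that separates weak from strong coupling, so the sketch only compares two profiles.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical measurements for one task on one processing element."""
    data_exchanged: float    # amount of data exchanged with other elements
    local_processing: float  # amount of local processing performed


def coupling_ratio(profile: TaskProfile) -> float:
    """Ratio of data exchanged to local processing performed for a task."""
    if profile.local_processing <= 0:
        raise ValueError("local_processing must be positive")
    return profile.data_exchanged / profile.local_processing


def more_closely_coupled(a: TaskProfile, b: TaskProfile) -> TaskProfile:
    """Return the profile with the higher ratio (a is returned on a tie)."""
    return a if coupling_ratio(a) >= coupling_ratio(b) else b


# Illustrative numbers only: one task that mostly computes locally,
# and one that exchanges a lot of data relative to its local work.
networked = TaskProfile(data_exchanged=10.0, local_processing=100.0)
shared_memory = TaskProfile(data_exchanged=200.0, local_processing=100.0)
```

With these made-up figures, the second profile has the higher ratio and so counts as the more closely coupled of the two under the measure given in the text.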
The fundamental reason behind distributed processing is to be better able to solve big and complicated problems by using a variation of the well-known divide-and-conquer rule. This approach has two fundamental advantages from an economics standpoint. First, distributed computing provides an economical method of harnessing more computing power by employing multiple processing elements optimally. This requires research in distributed processing as defined earlier, as well as in parallel processing. The second economic reason is that by attacking these problems in smaller pieces, it might be possible to discipline the cost of software development. Indeed, it is well known that the cost of software has been increasing, in opposition to the cost trends of hardware.
Distributed database systems should also be viewed within this framework and treated as tools that could make distributed processing easier and more efficient. It is reasonable to draw an analogy between what distributed databases might offer to the data processing world and what database technology has already provided. There is no doubt that the development of distributed database systems has aided in the task of developing distributed software.
Distributed Database System
A distributed database system (DDBS) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is the software that permits the management of the DDBS and makes the distribution transparent to the user. The two important terms in these definitions are "logically interrelated" and "distributed over a computer network". They help eliminate certain cases that have sometimes been accepted to represent a DDBS.
A DDBS is not a collection of files that can be individually stored at each node of a computer network. To form a DDBS, files should not only be logically related, but there should be structure among the files, and access should be via a common interface. We should note that there has been much recent activity in providing DBMS functionality over semi-structured data that are stored in files on the Internet.
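The idea of logically related data held at several nodes but reached through a common interface can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not a DDBMS: the classes Site and GlobalInterface, the emp schema, and the sample rows are all hypothetical, each "node" is just an in-memory SQLite database in one process, and there is no network, query optimization, or distributed transaction management.

```python
import sqlite3


class Site:
    """One node holding a local fragment of a logical emp relation (assumed schema)."""

    def __init__(self, name, rows):
        self.name = name
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE emp (eno TEXT PRIMARY KEY, ename TEXT, title TEXT)"
        )
        self.conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

    def select_by_title(self, title):
        # Query only the fragment stored at this site.
        cur = self.conn.execute(
            "SELECT eno, ename, title FROM emp WHERE title = ?", (title,)
        )
        return cur.fetchall()


class GlobalInterface:
    """Common interface: callers query the logical relation, not individual sites."""

    def __init__(self, sites):
        self.sites = list(sites)

    def select_by_title(self, title):
        # Ask every site for its matching rows and merge the answers.
        rows = []
        for site in self.sites:
            rows.extend(site.select_by_title(title))
        return sorted(rows)


# Hypothetical data: the same logical relation, split across two sites.
site_a = Site("site_a", [("E1", "J. Doe", "Engineer"), ("E2", "M. Smith", "Analyst")])
site_b = Site("site_b", [("E3", "A. Lee", "Engineer")])
ddbs = GlobalInterface([site_a, site_b])
engineers = ddbs.select_by_title("Engineer")
```

The caller of ddbs.select_by_title never names a site, which is the sense in which the distribution is hidden behind one interface. A set of unrelated files at each node, with no shared structure and no such interface, would not meet the definition.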
It is sometimes assumed that the physical distribution of data is not the most significant issue. The proponents of this view would therefore feel comfortable in labeling as a distributed database two databases that reside in the same computer system. However, the physical distribution of data is very important. It creates problems that are not encountered when the databases reside in the same computer. This brings us to another point: whether multiprocessor systems qualify as DDBSs. A multiprocessor system is generally considered to be a system where two or more processors share some form of memory, either primary memory, in which case the multiprocessor is called shared memory (or tightly coupled), or disk, in which case it is called shared disk.
The shared-nothing architecture is one where each processor has its own primary and secondary memories as well as peripherals, and communicates with other processors over a very high speed interconnect. However, there are differences between the interactions in multiprocessor architectures and the rather loose interaction that is common in distributed computing environments. The fundamental difference is the mode of operation. A multiprocessor system design is rather symmetrical, consisting of a number of identical processor and memory components, and controlled by one or more copies of the same operating system, which is responsible for a strict control of the task assignment to each processor.
PROMISES OF DDBMSs
Many advantages of DDBSs have been cited in the literature, including sociological reasons for decentralization. All of these can be distilled to four fundamentals, which may also be viewed as promises of DDBS technology.
Fundamentals of Relational DBMS
A relational DBMS is a software component supporting the relational model and a relational language. A DBMS is a reentrant program shared by multiple active entities, called transactions, that run database programs. When running on a general-purpose computer, a DBMS is interfaced with two other components: the communication subsystem and the operating system. The communication subsystem lets the DBMS communicate with applications; for example, the terminal monitor needs to communicate with the DBMS.
DISTRIBUTED DBMS ARCHITECTURE
The architecture of a system defines its structure. This means that the components of the system are identified, the function of each component is specified, and the interrelationships and interactions among these components are defined. This general framework also holds true for computer systems in general and software systems in particular. The specification of the architecture of a software system requires identification of the various modules, with their interfaces and interrelationships, in terms of the data and control flow through the system. From a software engineering perspective, the task of developing individual modules is called programming-in-the-small, whereas the task of integrating them into a complete system is referred to as programming-in-the-large.
There are three types of distributed architecture in DDBMSs: client/server systems, peer-to-peer distributed DBMSs, and multidatabase systems. These are idealized views of a DBMS, in that many of the commercially available systems may deviate from them. A reference architecture is commonly created by standards developers, since it clearly defines the interfaces that need to be standardized.
DBMS STANDARDIZATION
Standardization efforts related to DBMSs are relevant here because of the close relationship between the architecture of a system and the reference model of that system, which is developed as a precursor to any standardization activity. For all practical purposes, the reference model can be thought of as an idealized architectural model of the system. A reference model is defined as a conceptual framework whose purpose is to divide standardization work into manageable pieces and to show at a general level how these pieces are related to each other. A reference model can be described according to three different approaches:
- Based on components. The components of the system are defined together with the interrelationships between components. Thus a DBMS consists of a number of components, each of which provides some functionality. Their orderly and well-defined interaction provides the functionality of the total system. This is a desirable approach if the ultimate objective is to design and implement the system under consideration. On the other hand, it is difficult to determine the functionality of a system by examining its components. The DBMS standard proposals prepared by the Computer Corporation of America for the National Bureau of Standards fall within this category.
- Based on functions. The different classes of users are identified, and the functions that the system will perform for each class are defined. The system specifications within this category typically specify a hierarchical structure for user classes. This results in a hierarchical system architecture with well-defined interfaces between the functionalities of different layers. The advantage of the functional approach is the clarity with which the objectives of the system are specified. However, it gives very little insight into how these objectives will be attained or into the level of complexity of the system.
- Based on data. The different types of data are identified, and an architectural framework is specified which defines the functional units that will realize or use data according to these different views. Since data is the central resource that a DBMS manages, this approach is claimed to be preferable. The advantage of the data approach is the central importance it associates with the data resource. This is significant from the DBMS viewpoint, since the fundamental resource that a DBMS manages is data. On the other hand, it is impossible to specify an architectural model fully unless the functional modules are also described. The ANSI/SPARC architecture discussed in the next section belongs in this category.
Even though three distinct approaches are identified, one should never lose sight of the interplay among them. As indicated in a report of the Database Architecture Framework Task Group of ANSI, all three approaches need to be used together to define an architectural model, with each point of view serving to focus our attention on different aspects of an architectural model.
A more important issue is the orthogonality of the foregoing classification schemes and the DBMS objectives. Regardless of how we choose to view a DBMS, these objectives have to be addressed within each functional unit. In the remainder of this section we concentrate on a reference architecture that has generated considerable interest and is the basis of our reference model.