A Taxonomy of Distributed Systems

With ongoing improvements in technology and the tremendous growth in the number of Internet users, the potential benefit of very large scale distributed applications is more apparent than ever. Opportunities are emerging to develop large systems that cater to highly dynamic and mobile sets of participants, who desire to interact with each other and with stores of online content in a robust manner. These opportunities will inevitably dictate a substantial body of research in the years to follow. Although applications intended to function at this scale have recently begun to appear, there remains a broad set of issues that must be faced before this emerging class of distributed system can become a reality. One way to organize these open problems is through a taxonomy of distributed systems.

1. Introduction

Distributed systems research has historically avoided many hard problems through the carefully calculated use of operating constraints. Scalable resource clusters are assumed to be tucked away in protected facilities and connected by reliable infrastructure [1]. Large systems are assumed to have cooperating nuclei of administrative organizations that do not fail [2]. In peer environments, participants are assumed to behave fairly instead of leeching resources [3]. As the specifications of these systems grow to require operation at a massive scale with highly distributed administration, these assumptions will be strongly challenged as a means of providing useful systems. In short, distributed systems research is quickly approaching a point at which many hard problems cannot be avoided any longer. Prior to embarking on the construction of a large-scale distributed operating system, we felt that it would be useful to survey the landscape of problems that will be faced in the construction of this class of system. This paper is a taxonomy-based summary of the open problems that must be addressed in order for successful systems of this caliber to be realized.

To describe the domain of existing and future distributed systems, we design a taxonomy. This model is a two-dimensional space whose axes define the concurrency and conflict of resource access, and the degree of distribution and mobility of resources within the system. From this model, we draw four phyla of applications: point-to-point, multiplexed, fragmented, and peer-to-peer. This last phylum defines our target domain, and we apply lessons learned from the other three groups to it. Through our taxonomy, we describe a set of architectural systems problems that must be addressed.

2. A Taxonomy of Distributed Systems

We describe four phyla of distributed systems in a continuous space along two axes. The axes, access concurrency and resource distribution, stem from an examination of the evolution of distributed applications. Access concurrency considers the number of simultaneous accesses to a resource and the degree of conflict between these accesses. Access concurrency problems arose as researchers began to move towards time sharing on mainframes. Resource distribution represents how broadly a system is spread across a network infrastructure. Individually, each of these axes represents a steadily increasing gradient of complexity within system architecture. It is in the cases where both axes have a high degree that system complexity explodes. Indeed, existing distributed applications all seem to reside very close to the axes in our model. This observation suggests that some limiting factors exist that inhibit the development of complex systems. We now consider the two axes and four phyla of systems individually.

2.1 Access Concurrency

Access concurrency originated with the desire to allow users to share the resources of the original mainframe computers. Concurrency mechanisms allow clients to share a resource while preserving the state of that resource during simultaneous accesses. It is worth noting that without a requirement to avoid conflict, concurrency mechanisms need only act as stateless request multiplexers. Although there are complexity issues in simple multiplexing at the Internet scale, it is conflict avoidance that makes access concurrency especially hard. In order to avoid conflicts between concurrent accesses, extra mechanisms must be put in place. These mechanisms add overhead and complexity to the system. Mechanisms to support access concurrency involve tradeoffs between efficiency and effectiveness. Concurrency control techniques that are very efficient aim to allow the highest possible amount of simultaneous access, but may do so at the cost of poorly preserving resource state or unfairly scheduling this access. Techniques that are optimized for effectiveness protect resource state, but may do so by severely limiting concurrency of access. As an example, consider the locking of files to preserve consistency in concurrent systems. Pessimistic locking is most effective at preserving state, but results in a complete loss of concurrency whenever the file is locked for writing. Optimistic locking allows a higher degree of concurrency, but may perform worse in a high state of conflict, as many transactions must be aborted. In the extreme case of efficient concurrency, conflicts may simply be flagged and left for a separate mechanism to resolve later. This is how inconsistencies are addressed after a disconnection in distributed file systems such as Coda. Analogous access concurrency tradeoffs exist for other resources, as in process scheduling and memory protection.

In this emerging class of large distributed systems, the issue is that a high degree of concurrency within a system demands efficiency, while individual users will expect effective consistency preservation. Measures such as conflict resolution have not been well explored. It is a non-trivial problem to automatically resolve conflicts on information that does not have a high degree of structure, such as files and ad hoc databases (e.g. the Windows registry). Additionally, there exists a set of resources for which resolution after the fact may not be appropriate, and for which large scale active conflict avoidance is a necessity.
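The tradeoff between pessimistic and optimistic locking described in this section can be illustrated with a minimal in-memory sketch. The class names (`PessimisticFile`, `OptimisticFile`, `ConflictError`) and the version counter are assumptions made for illustration; they do not correspond to any particular file system, and a real implementation would also have to deal with persistence and failure.

```python
import threading


class PessimisticFile:
    """Writers take an exclusive lock, so concurrent writes never conflict."""

    def __init__(self, content=""):
        self._lock = threading.Lock()
        self.content = content

    def write(self, new_content):
        # Every other writer blocks here until the lock is released.
        with self._lock:
            self.content = new_content


class ConflictError(Exception):
    """Raised when an optimistic write was based on a stale version."""


class OptimisticFile:
    """Writers work without holding a lock and are validated at commit time."""

    def __init__(self, content=""):
        self._commit_lock = threading.Lock()  # guards only the brief read/commit steps
        self.content = content
        self.version = 0

    def read(self):
        with self._commit_lock:
            return self.content, self.version

    def write(self, new_content, based_on_version):
        with self._commit_lock:
            if based_on_version != self.version:
                # Another writer committed first, so this transaction aborts.
                raise ConflictError("write was based on a stale version")
            self.content = new_content
            self.version += 1


# Two hypothetical clients read the same version, then both try to write.
f = OptimisticFile("original")
_, v_alice = f.read()
_, v_bob = f.read()
f.write("alice's edit", v_alice)       # commits first
try:
    f.write("bob's edit", v_bob)       # based on the version Alice replaced
    bob_aborted = False
except ConflictError:
    bob_aborted = True
```

Under low conflict the optimistic variant rarely aborts and never makes a client wait for the duration of another client's edit. Under high conflict most writes would abort and have to be retried, which is the degradation noted above.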

2.2 Resource Distribution

Resource distribution describes the degree to which a system has been spread across a network, and how dynamic resources are within it. Even the smallest degree of resource distribution mandates a substantial amount of overhead within a system. Consider the difference between access to a local file and access to a remote file service such as NFS. Both cases contain all of the complexity involved in reading a file from disk; however, the remote access has the additional responsibilities of locating the service, marshalling data in and out of message structures, interacting across the network, and handling a considerably larger set of potential error cases. Transparency, a hallmark goal of distributed systems, only obfuscates this problem by concealing the details of distribution. Remote procedure call (RPC) mechanisms, which were intended to simplify application development, force distribution to be implemented deep within the system. This results directly in many of the problems traditionally associated with distributed systems, such as fragility and inflexibility. The troubling aspect of this line of consideration is that these issues indicate a fundamental flaw at the very onset of approaches to distribution. RPC provides only one degree of distribution, by passing a call to a single remote host. We have only just entered the arena of distributed systems, and already the complexity of RPC is overbearing. A larger problem lies in the assumption that distributed resources can be accessed in an expressive and reliable manner. In order to access resources, it must first be possible to locate them. Moreover, if resources are not static within a system, mechanisms must exist to find them in an ongoing manner. For instance, the location of a resource may have to be determined through a directory service and refreshed with each successive access.

In very large scale or highly dynamic systems, a centralized service may not be sufficient to track resource location, and other methods, such as forwarding pointers [4], may have to be employed. Distribution equates almost exactly to extra mechanism, and therefore complexity, within a system. The larger and more distributed a system becomes, the more mechanism will be required to locate, track, and access objects within it.
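To make the idea of forwarding pointers concrete, the following is a minimal sketch in which a resource that moves leaves a pointer behind on its previous host, and a client follows the chain from the last location it knew. The `Host` class, the `locate` function, and the `max_hops` limit are hypothetical names chosen for illustration; this is a generic sketch and not the specific scheme of [4].

```python
class Host:
    """A host stores resources and forwarding pointers for ones that moved away."""

    def __init__(self, name):
        self.name = name
        self.resources = {}   # resource id -> value held locally
        self.forward = {}     # resource id -> Host the resource moved to

    def move(self, rid, target):
        # Hand the resource to the target and leave a pointer behind.
        target.resources[rid] = self.resources.pop(rid)
        target.forward.pop(rid, None)  # the target owns it now; drop any stale pointer
        self.forward[rid] = target


def locate(start, rid, max_hops=16):
    """Follow forwarding pointers from a last-known host; return (host, hops)."""
    host, hops = start, 0
    while rid not in host.resources:
        if rid not in host.forward or hops >= max_hops:
            raise LookupError(f"resource {rid!r} not reachable from {start.name}")
        host = host.forward[rid]
        hops += 1
    return host, hops


# A resource created on host a migrates to b and then to c.
a, b, c = Host("a"), Host("b"), Host("c")
a.resources["doc"] = "contents"
a.move("doc", b)
b.move("doc", c)
owner, hops = locate(a, "doc")   # a client that only remembers host a
```

No central directory is consulted here, but every move lengthens the chain that older clients must traverse, and a failed host in the middle breaks it. This is one small instance of distribution translating into extra mechanism.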

2.3 Four Phyla of Distributed Applications

From the two axes described above, we draw four phyla of distributed applications, shown in Figure 2.3. Note that the respective sizes of these domains are by no means equal; we represent the division this way for simplicity. What follows is a very brief presentation of each of the four classes. In each case, we supply an example of the phylum to demonstrate its characteristics. We also try to identify weaknesses that exist within the domain that may not be acceptable within more advanced systems.

Figure 2.3: Taxonomy of Distributed Applications
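As a rough sketch of how the two axes partition the space into the four phyla, the function below maps a pair of scores to a phylum. The numeric scores in [0, 1], the single `threshold`, and the example placements are all assumptions made for illustration; the taxonomy itself is continuous and does not fix where the boundaries lie.

```python
from enum import Enum


class Phylum(Enum):
    POINT_TO_POINT = "point-to-point"
    MULTIPLEXED = "multiplexed"
    FRAGMENTED = "fragmented"
    PEER_TO_PEER = "peer-to-peer"


def classify(concurrency, distribution, threshold=0.5):
    """Map a point in the two-axis space to a phylum.

    Both scores are hypothetical values in [0, 1]; the threshold is an
    assumption, since the regions of the taxonomy are not equal in size.
    """
    high_concurrency = concurrency >= threshold
    high_distribution = distribution >= threshold
    if high_concurrency and high_distribution:
        return Phylum.PEER_TO_PEER
    if high_concurrency:
        return Phylum.MULTIPLEXED
    if high_distribution:
        return Phylum.FRAGMENTED
    return Phylum.POINT_TO_POINT


# Illustrative scores only; they are guesses, not measurements.
examples = {
    "FTP data channel": (0.1, 0.1),
    "web server": (0.9, 0.2),
    "DNS": (0.3, 0.8),
}
placed = {name: classify(c, d) for name, (c, d) in examples.items()}
```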

2.3.1 Point-to-point

The point-to-point phylum represents a very simple set of applications in which a client connects to a resource for unshared access. Point-to-point examples exist primarily as components of more complex applications. For instance, the data channel of an FTP session is point-to-point, in that all of the associated resources are allocated at both ends of the connection at the beginning of a transfer. We would also consider simple RPC to be primarily a point-to-point application, provided that the RPC server handles a single request at a time. Point-to-point applications are characterized by the fact that the distribution aspects of the system are typically quite visible. As a result, when failure does occur it can be identified and resolved, if primitively, by the user. If an FTP server does not respond or crashes during a transfer, the user can attempt a connection somewhere else. This is clearly not a good system property; however, it is generally tolerable within the domain of simple applications.

2.3.2 Multiplexed

Multiplexed applications are those in which resources are delivered with a high degree of concurrency, and possibly conflict control, over a relatively small scale of distribution. File and web servers are excellent examples, because they provide a set of centralized resources to a large number of concurrent users. Figure 2.3.2 shows where a web server falls in the taxonomy. Note that in our model, both file and web servers have a high degree of access concurrency, but are still barely distributed. This is because users typically need only connect to a single point to access resources. More distributed examples of multiplexed applications include distributed striped file systems and scalable data structures [5]. In both of these cases, users may still connect to a single resource, but that resource may forward requests through an additional link to an appropriate secondary server. The risk of failure is more significant in multiplexed systems because a failure on the resource provision side has the potential to affect a much larger number of users. To mitigate this problem, very large multiplexed services are often hosted in specialized facilities where a very high degree of resource reliability may be assumed. Further precautions may involve the installation of redundant resources that take over in the rare case of system failure.

Figure 2.3.2: Taxonomy of web server
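The single point of contact and the redundant resources described for multiplexed systems can be sketched as a front end that forwards each request to the first backend able to serve it. `Backend`, `FrontEnd`, and the `healthy` flag are hypothetical names for illustration, and a real service would detect failure through timeouts or health checks rather than a flag.

```python
class Backend:
    """A server holding the actual resource; `healthy` simulates failure."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FrontEnd:
    """Single point of contact that hides redundant backends from clients."""

    def __init__(self, backends):
        self.backends = list(backends)

    def handle(self, request):
        # Try backends in order; a redundant one takes over on failure.
        for backend in self.backends:
            try:
                return backend.handle(request)
            except ConnectionError:
                continue
        raise ConnectionError("all backends failed")


# The primary has failed, so the standby answers without the client noticing.
service = FrontEnd([Backend("primary", healthy=False), Backend("standby")])
reply = service.handle("GET /index.html")
```

The client still sees one endpoint, which is why the front end itself remains a point of failure that affects every user at once.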

2.3.3 Fragmented Resource

Fragmented systems are those in which resources are spread across, or move within, a set of connected endpoints. Communication is substantially more complex in these systems, as messages may not travel directly to a resource, but instead may lead to a cascade of interactions across the system. Existing fragmented systems, such as the domain name service (DNS), are frequently structured as a hierarchy of coupled administrative domains. Note that there are not many examples of highly fragmented systems. Considered as a whole, the global DNS database is fragmented across a considerable number of hosts. However, this is doubtlessly orders of magnitude smaller than the scope desired by advocates of universal Internet-scale directory services, such as LDAP, which have yet to see broad acceptance within the network. The distribution of administration presents a difficulty in the ongoing provision of fragmented systems. In a centralized resource, a single administrative body is capable of quickly effecting changes across the scope of a system. In a fragmented resource, issues arise as to how changes should be applied and who is allowed to make them. In the case of DNS, updates must frequently be submitted to human administrators, who authenticate and apply changes by hand. In existing systems this is an acceptable property: DNS lookups are handled with an acceptable degree of expedience, and changes are typically infrequent enough to be handled off-line. This is not, however, an approach that provides a high degree of scalability.
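The cascade of interactions through a hierarchy of administrative domains can be sketched as a lookup that starts at a root zone and follows delegations downward. This is a simplified model loosely inspired by DNS, not an implementation of the DNS protocol; the `Zone` class, the `resolve` function, and the example names and address are assumptions for illustration.

```python
class Zone:
    """One administrative domain: its own records plus delegations to children."""

    def __init__(self, name):
        self.name = name
        self.records = {}      # fully qualified name -> address
        self.delegations = {}  # child zone suffix -> Zone

    def delegate(self, child):
        self.delegations[child.name] = child


def resolve(root, fqdn):
    """Walk the hierarchy from the root; return (address, zones contacted)."""
    zone, path = root, [root.name]
    while True:
        if fqdn in zone.records:
            return zone.records[fqdn], path
        # Find a child zone responsible for a suffix of the name.
        child = next(
            (z for suffix, z in zone.delegations.items()
             if fqdn == suffix or fqdn.endswith("." + suffix)),
            None,
        )
        if child is None:
            raise LookupError(f"{fqdn} not found below {zone.name}")
        zone = child
        path.append(zone.name)


# Three separately administered zones; only the leaf holds the record.
root, org, example_org = Zone("."), Zone("org"), Zone("example.org")
root.delegate(org)
org.delegate(example_org)
example_org.records["www.example.org"] = "192.0.2.10"  # documentation-only address

address, contacted = resolve(root, "www.example.org")
```

A single lookup touches every zone on the path, and a change to the record can only be made by whoever administers the leaf zone, which mirrors the administrative coupling discussed above.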

2.3.4 Peer-to-peer

The client-server model assumes that certain machines are better suited for providing certain services. For instance, a file server may be a system with a large amount of disk space and backup facilities. A peer-to-peer model (Figure 2.3.4) assumes that each machine has somewhat equivalent capabilities, and that no machine is dedicated to serving others. An example of this is a collection of PCs in a small office or home. Networking allows people to access each other's files and send email, but no machine is relegated to a specific set of services.

Peer-to-peer applications are highly distributed and involve a high degree of potentially conflicting, concurrent access to resources. This is a fairly hypothetical description, as very few such applications currently exist at the Internet scale. Peer-based file sharing applications, such as Gnutella [6], are initial steps within this domain but only begin to enter the phylum. Gnutella does not need to address any conflict issues, nor has it proven able to scale. In this class of application, the weaknesses that were acceptable within the other phyla compound and cannot be avoided. Failure has a high potential impact, but resources cannot be protected. Administration is distributed and the coupling between administrative domains may become much more dynamic. We discuss these issues more extensively in the next section.

Figure 2.3.4

3. Conclusion

The purpose of this paper has been to identify, through a taxonomy, the open problems that must be addressed in order to develop advanced, Internet-scale distributed systems. From the discussion above, these open problems lie along the two axes of the taxonomy: access concurrency and resource distribution. We must also consider the four phyla of distributed systems (point-to-point, multiplexed, fragmented resource, and peer-to-peer) to identify weaknesses within each domain that may not be acceptable within more advanced systems.

Projects to develop environments for ubiquitous, invisible, and pervasive distributed applications have been, and continue to be, very exciting research, and they will need to address many of these issues in order to realize their visions.
