The Advantages And Disadvantages Of Data Replication

When a file system is replicated, the system can continue working after one replica crashes by switching to another replica. Keeping multiple copies also helps protect against corrupted data.

Example: suppose there are three copies of a file, each serving read and write operations. A single failed write can be detected and masked because the value returned by the other two copies is still correct.
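The three-copy scenario above can be sketched as a majority read with read-repair. This is a minimal illustration, not a production replication protocol; the `Replica` class is a stand-in for a real storage node.

```python
from collections import Counter

class Replica:
    """A stand-in for one storage node holding a copy of the data."""
    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value = value

def read_repair(replicas):
    # Read all three copies and take the majority value, which
    # masks a single failed or lost write on one replica.
    values = [r.read() for r in replicas]
    majority, _ = Counter(values).most_common(1)[0]
    for r in replicas:
        if r.read() != majority:
            r.write(majority)  # bring the stale copy up to date
    return majority
```

For example, if one of three replicas missed a write, `read_repair` still returns the correct value and restores the stale copy in passing.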

Replication also improves performance by dividing the work among replicated servers. This becomes important as the number of processes that need to access the data managed by the server grows.

Scaling across geographical areas

Clients at all sites experience improved availability of replicated data. When the local copy of the replicated data is unavailable, clients can still access a remote copy.


Inconsistency between copies of the data

When there are multiple copies and one of them is modified, that copy becomes different from the other replicas. If the modification is not propagated to the other copies, they become outdated.

Example: replication is used to improve the access time of web pages. However, users might not get the most up-to-date pages, because the pages returned may be cached versions previously fetched from the web server.

Cost of increased bandwidth for maintaining replication

Replicated file data must be kept up to date, so the network carries a large number of messages whenever users modify or delete data. This makes data replication expensive.

Give at least two examples of a distributed system, and explain how scalability is addressed in those systems.

An online transaction processing system is scalable because it can be upgraded by adding new processors, storage, and devices to process more transactions. Such upgrades can be made easily and transparently, without shutting the system down.

The distributed nature of DNS (Domain Name System) allows it to work efficiently even though it serves every host on the worldwide Internet; hence it is said to scale well. DNS is designed hierarchically around administratively delegated namespaces and makes heavy use of caching. The hierarchy reduces the load on the root servers at the top of the namespace, while successful caching limits client-perceived delays and wide-area network bandwidth usage.

Question 2

We mentioned in the lectures three different techniques for redirecting clients to servers: TCP handoff, DNS-based redirection, and HTTP-based redirection. What are the main advantages and disadvantages of each technique?

TCP handoff


TCP handoff achieves total transparency from the client's point of view because it operates on transport-level streams. The client is therefore unaware of being redirected: when it sends requests to the service machine, it cannot tell that an intermediate gateway is switching it between replicas.



The disadvantage of TCP handoff is that the client is never offered more than one replica to choose from; the redirection mechanism remains fully in charge of what happens to the client's requests.

TCP handoff identifies a service by the combination of the target machine's address and port number. Thus, to replicate a service, a full copy must be placed on each replica, which sacrifices the flexibility of partial replication.

DNS-based Redirection


DNS-based redirection achieves transparency without losing scalability. It is transparent because clients are obliged to use the addresses provided by the DNS server, and they cannot tell whether an address belongs to the server's home machine or to one of its replicas. DNS is also very efficient as a distributed name-resolution service.

DNS allows the addresses of multiple replicas to be returned, enabling the client to choose one of them.
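A minimal sketch of this from the client's side: a DNS lookup may yield several addresses, and the client is free to pick any of them. The hostname `service.example.com` is a placeholder, and the random choice stands in for any smarter selection policy.

```python
import random
import socket

def resolve_replicas(hostname):
    # A single DNS name may resolve to several A records,
    # one per replica. (The hostname here is hypothetical.)
    infos = socket.getaddrinfo(hostname, 80, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]

def choose_replica(addresses):
    # The client may pick any of the returned addresses;
    # here we simply choose one at random.
    return random.choice(addresses)
```

In practice a resolver or browser applies its own selection policy (e.g. trying addresses in order), but the transparency is the same: the client never learns which address is the "home" machine.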

Another advantage of DNS is its good maintainability.


DNS queries carry no information about the client that triggered the name resolution. The service-side DNS server knows only the network address of the DNS server that asks about the service's location.

DNS cannot distinguish between different services located on the same machine.

When a recursive query occurs, the client's DNS server creates a chain of queries that ends at the service domain's DNS server. The latter only learns the address of the DNS server one step before it in the chain, not the origin of the chain. Thus, the service domain's DNS server has no information about the location of the client.

HTTP-based Redirection


HTTP-based redirection is easy to deploy; all that is needed is the ability to serve dynamically generated web pages. In addition to creating the actual content, the page generator can determine an optimal replica and rewrite internal references so that they point to that replica.

It proves efficient even though the initial document must always be retrieved from the main server. All further work proceeds between the client and the selected replica, which is likely to give the client near-optimal performance.
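A minimal sketch of such a redirector, assuming a hypothetical list of replica hosts: the main server answers each request with an HTTP 302 that points the client at a chosen replica.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical replica hosts; a real service would choose based on
# client location or replica load rather than at random.
REPLICAS = ["http://replica1.example.com", "http://replica2.example.com"]

def pick_target(path):
    # Rewrite the requested path onto one of the replicas.
    return random.choice(REPLICAS) + path

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        # The 302 response carries the replica's URL explicitly,
        # which is exactly why this scheme is not transparent:
        # the browser sees the switch between machines.
        self.send_response(302)
        self.send_header("Location", pick_target(self.path))
        self.end_headers()

# To run the sketch: HTTPServer(("", 8080), Redirector).serve_forever()
```

Note how the mechanism's weakness is visible in the code itself: the `Location` header names a specific replica, so the client is fully aware of the redirection.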


The disadvantage is the lack of transparency: the client receives a URL that explicitly points to a particular replica, so the browser becomes aware of the switching between machines.

As for scalability, the client must always first contact the same single service machine, which can become a bottleneck as the number of clients increases, making the situation worse.

What is multicast communication? Explain an approach for achieving multicasting.

Multicast communication refers to the delivery of data from a source node to an arbitrary number of destination nodes.


Application-level multicast is one approach to achieving multicasting: the nodes are organised into an overlay network, which is used to disseminate information to the members. The overlay is organised either as a tree, so that there is a unique path between every pair of nodes, or as a mesh, in which every node has multiple neighbours and there are generally multiple paths between each pair.

Organising the nodes into a mesh is more robust, since information can still be disseminated without immediately reorganising the whole overlay network.

Example: Multicast tree in chord

In Chord, a node that wants to multicast a message looks up the root of the multicast tree and sends the message towards it; the message is then forwarded down along the tree to all members.
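Dissemination down a multicast tree can be sketched as follows. This is a bare illustration of tree-based forwarding, not of Chord's lookup machinery; each `Node` is assumed to know its children in the overlay.

```python
class Node:
    """One member of the overlay multicast tree."""
    def __init__(self, name):
        self.name = name
        self.children = []   # child nodes in the overlay tree
        self.received = []   # messages delivered at this node

    def deliver(self, message):
        # Deliver the message locally, then forward it to every
        # child, so that it reaches all members below this node.
        self.received.append(message)
        for child in self.children:
            child.deliver(message)
```

Starting delivery at the root therefore reaches every member exactly once, since a tree has a unique path between each pair of nodes.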

In the case of reliable FIFO-ordered multicasts, the communication layer is forced to deliver incoming messages from the same process in the same order as they have been sent. What are the permissible delivery orderings for the combination of FIFO and total-ordered multicasting in Figure. 8-15 (shown on the last page of this assignment)?

Question 3

Why is receiver-based message logging considered to be better than sender-based logging? Explain the reasons behind your answer.

The reason is that recovery is entirely local: a recovering process can replay messages from its own log. With sender-based logging, a recovering process has to contact the senders to retransmit their messages.


When a receiving process crashes, its most recent checkpointed state is restored and the logged messages are replayed. Combining checkpointing with message logging makes it easy to restore a state that lies beyond the most recent checkpoint.

With sender-based logging, it is difficult to find a recovery line: rollbacks can cascade in a domino effect, leaving inconsistent checkpoints, and the cost of taking checkpoints is high.

In conclusion, receiver-based message logging is better than sender-based logging.

Is the Triple Modular Redundancy (TMR) model capable of masking any type of failure? Explain your answer.

The Triple Modular Redundancy model is not capable of masking every type of failure. TMR assumes that the voting circuit can determine which replica is in error by observing a 2-to-1 vote: the voter outputs the majority result and discards the erroneous one. TMR can therefore successfully mask a failure only when at most one faulty value presents itself to the voter.

If more than one fault appears at the same time in the system, TMR cannot mask the failure; likewise, TMR fails to mask errors when the above assumptions are invalid. For this reason it is sometimes extended to QMR (Quad Modular Redundancy).


Example: if X1, X2 and X3 all fail at the same time, the voter will produce an undefined output.
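The voter's behaviour, including the undefined case above, can be sketched in a few lines. This models the 2-to-1 voting assumption only; a real TMR voter is a hardware circuit.

```python
from collections import Counter

def tmr_vote(x1, x2, x3):
    # Majority voter: output the value produced by at least two
    # of the three modules, masking a single faulty module.
    value, count = Counter([x1, x2, x3]).most_common(1)[0]
    if count >= 2:
        return value
    # All three modules disagree: there is no majority, so the
    # failure cannot be masked and the output is undefined.
    raise ValueError("no majority: failure cannot be masked")
```

With one faulty module the voter still returns the correct value; when two or three modules fail with differing outputs, no majority exists and masking breaks down, exactly as described above.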

Compare the two-phase commit protocol with the three-phase commit protocol (chapter 8 in the book). Would it be possible to eliminate blocking in a two-phase commit when the participants were to elect a new coordinator? Explain your answer.

Blocking can never be completely eliminated. After the election, the newly elected coordinator might itself crash, in which case the remaining participants cannot reach a final decision, since that decision requires the vote of the newly elected coordinator.

Question 4

Why do persistent connections generally improve performance compared to non-persistent connections? Explain why persistent connections are disabled on some Web servers (why would anyone want to disable them)?

The client can issue several requests without waiting for the response to the first one, and the server can return all of the responses over the same connection without creating a separate connection for each request in the communicating pair.

With non-persistent connections, by contrast, a separate TCP connection is established to load every component of a Web document. When the document contains embedded content such as images or other multimedia, this becomes inefficient.
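The difference can be sketched with Python's standard library: one persistent HTTP/1.1 connection carries several requests, avoiding a TCP handshake per component. The tiny local server below exists only so the sketch is self-contained; the paths are placeholders for a page and its embedded components.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client reuse the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One persistent connection serves the page and both embedded
# components, instead of three separate TCP connections.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
bodies = []
for path in ["/index.html", "/logo.png", "/style.css"]:
    conn.request("GET", path)
    resp = conn.getresponse()
    bodies.append(resp.read())  # drain before reusing the connection
conn.close()
server.shutdown()
```

With non-persistent connections, each iteration of the loop would instead open and tear down its own `HTTPConnection`, paying the handshake cost three times.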

Persistent connections are disabled on some servers because a weak middleware layer may be unable to manage clients sending several requests over one connection. The requests stack up in the middleware layer, and responses slow down because a single connection carries all of them.

Explain the difference between static web content and dynamic content created by server-side CGI programs.

The difference between static web content and dynamic content is that:

Dynamic content can customise the response while remaining transparent to users: a user cannot tell whether an HTML document was generated on demand or is physically stored somewhere.

Values can be stored in a database and retrieved and rendered on demand when a user requests them through the CGI program.

CGI programs provide flexibility: the server can run an executable, which allows interactivity on the site. Static web content cannot offer this.

With static web content, users see the same information on every visit. Updating multiple pages is tedious, since each update requires retrieving and editing the HTML documents, and creating a new page also takes time.

Static web content also generates less overhead than dynamic content: a CGI program takes time and memory to run and produce its output, whereas a static page is served exactly as it is stored.
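A minimal sketch of a server-side CGI program illustrates the contrast: the document does not exist on disk but is generated at request time. The timestamp is only there to make the dynamic nature visible; everything here is a toy example, not a production CGI script.

```python
#!/usr/bin/env python3
import datetime

def render_page():
    # Each request produces a freshly generated document; the
    # timestamp changes on every invocation, which a static
    # HTML file could never do.
    now = datetime.datetime.now().isoformat()
    return f"<html><body><p>Generated at {now}</p></body></html>"

if __name__ == "__main__":
    # CGI output: an HTTP header block, a blank line, then the body.
    print("Content-Type: text/html")
    print()
    print(render_page())
```

From the browser's point of view the response is indistinguishable from a stored HTML file, which is exactly the transparency described above; the cost is the time and memory spent running the program on every request.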
