An Election Algorithm In Distributed Systems Information Technology Essay

In our increasingly globalised, distributed computer systems have been changed several aspect of our life, especially parallel computing; these days commercial application provide the strength of the development of faster computer, which allow a large amount data in sophisticated way, for instances, economic and financial modelling, database, multimedia technology, and oil exploration.

Parallel computing has the ability to solve larger problems, provide the attribute of concurrency, save time and money.

Distributed systems architecture leads to a wider use of parallelism as future of computing sector; this evolution made possible to run software on concurrent process on multiple processors. One of the main issues in designing concurrent software is election algorithm.

The issue of election algorithm is the very question the current paper seeks to answer. The importance of this issue leads to effectiveness enhancements by develop and design algorithm use a new technique. Therefore, the aims of this dissertation are design and develop this technique, also discuss the benefits of parallel computing.


I have been used the computer and the internet for several of years, my concentration is networks with emphasis in distributed systems.

Recently, the internet is vastly interconnected a variety number of computer networks in many different institutions, which make a huge motivation to be professional user; therefore, my MSc course at Oxford Brookes University is really brilliant opportunity for achieving this goal with deep information path for further research in software engineering and networks.

In particularly distributed systems are one of the significantly growing on demand in this sector, as they used to accelerate the computational speed of the problem solving. Thus, one of the most fundamental problems in distributed systems is the leader failure, furthermore, it is spectacular to improve and enhance my C language programming.

With all these changes in the area of computer industry, my goal to helps and assists for use the election algorithm in distributed systems with a new technique and software for solving the coordinator failure.

During study of distributed systems with connecting of parallel computing, this was bring a new interested challenge to go more further in development and contribute to gain wide rang of knowledge, all of these will have prepared me for a future career in computer software engineering and broadened my perspective and enriched my life beyond my career aspiration.

Preliminary Research (maybe)


The objectives of this dissertation is to research and develop the theme of an election in distributed systems which allow a group of processes to elect one process to act as a coordinator when a failure happen. However, as well as to achieve this goal PVM (Parallel Virtual Machine) and improve the understanding of this environment as it will used as an environment message passing for this project, this facility provides send and receive data message between processes and support concurrent programming, moreover it is easy and simple to spawn a set of processes. One of the most significant advantages of PVM is provide unified framework within parallel programs like cluster, where is possible for a programmer to divide a block of data into tasks that can run concurrently by send and receive data to all processes. This paper will focus on how to produce an election algorithm use above idea.

This environment will be written in C language, and there will be a main program (Master), in which the user can initialise and make the declaration for the main attribute of executed program; there are varieties of functions to send and receive all data types during the communication between programs. The second program called (Slave), this program will carryout the operation after receive the data from master program, then packet the result and send it back.

Finally, a part from the objectives will be self skills specifically improve project planning, programming as well as project writing skills.


The research methods which used to achieve these objectives will be as the following:

Research the Oxford Brookes library for books and documentation for the distributed systems and specially an election and agreement, the library has a lot of sources, the book Distributed Systems: Principle and Paradigms [7] has cover important information to achieve the goal of this project; also the book Distributed Systems: Concept and Design [1] the was a brilliant theory to cover the distributed systems. Another book was giving a complete list of function, examples, and explanation to be familiar with virtual environment [2]. On the other hand, research the Internet, which has a lot of useful materials and describing an election algorithm [3].

Study and evaluate some kinds of algorithms those help and develop the topology idea of the project to achieve my goal.

Produce a solution design to meet the requirement of the problem, as well as developed which needed for every task and discussed the findings with my supervisor. Moreover, PVM message passing communication design which gives a right direct path for problem solving.

Implement every task as a proposed design use PVM library written in C language. System run and testing will be done on Cluster of school of technology.

Make a review and keep an arranging to complete writing of the final report.

Testing (may)


This dissertation will be achieved by using a variety of resources of software and hardware, also some of information and documents sources that cover all the area of the research.


The environment of code will be writing in C language, that it support concurrent computer and UNIX, as well as it is simple language to understand. Moreover, it is also support the software PVM (Parallel Virtual Machine). PVM is mainly designed for using by concurrent and heterogeneous environment; it is possible for providing a virtual view. There are many of useful websites like (


The Cluster of Brookes School of Technology will be needed to implement the project, and all test and run steps will do on it. The cluster consists of 6 nodes with 1 main node ssh connected as processors and run PVM which installed in every node.

Information Sources and Experts Advice

There are several of documentation belongs to PVM which can be found in the book of A Users’ Guide and Tutorial for Networked Parallel Computing as mentioned earlier, and websites [4]. Further more, the lectures of modules software technologies that have a good reference of covering the topic of distributed systems in general. Besides all of this my supervisor/tutor Chris Cox who gives useful advice and information as he is specialise in distributed systems programming.


PERT Chart

Gantt Chart


The election algorithm code has been implemented according to the design decision made. It includes function to perform sending and receiving (communication channel) operations for group of functions to


I would like to take the opportunity to thank my supervisor Mr. Chris Cox who was always available to support when I missed up; I am deeply indebted for his constant help and advice. Since this push to more steps in my research, this reflects on the right direction to the target of finishing the dissertation.

I want to thank my friend Abobaker Almowfeq for his encouragement and advice.

Finally, I dedicated this thesis to my parents and thank them for the supporting throughout my MSc study in the UK. I am honoured to show my respects to them. They were my powerful source of energy.


Dissertation Plan



Introduction …………………………………………………………………………

The Importance of Concurrent Programming

The Need of Election in Distributed Systems

Structure of this Document

Research …………………………………………………………………………….

Election Algorithm Theory


Bully Algorithm

Ring Algorithm



Design Stage

The Core Algorithm

Comments on the Algorithm

Evolution of the Core Algorithm

The Software


C Language




Failure Simulation

Testing Function


Critical Appraisal of the Project


Area of Future Improvement

Work Schedule Review

Lessons Learnt

List of Abbreviation

DDMS (Distributed Database Management System)

MPI (Message Passing Interface)

MPP (Massively Parallel Processing)

PVM (Parallel Virtual Machine)

SMP (Symmetric Multi Processing)

List of Figures

Chapter 1


The Importance of Concurrent Programming

The past thirty years have seen increasingly rapid advances in the field of concurrent programming computing, a concurrent program is a set of sequential programs that can be executed in parallel [book of principles concurrent]. On the theme of distributed systems that enable the computers to coordinate their activities and share hardware, software, and data. The strategies characteristics of the systems which include shared state, multiple computers, and a processor that suited to a particular function to make the improvement for performance and utilization. As well as the concurrent computing have the availability of fault interconnections give the predominance over all the others. All of these offer several advantages: The cost of this computing can be low; each individual task assigned to appropriate architecture that lead to optimized performance. Network computing can be offer a computer task partitioning along of services functions. Another advantage, the computing networks have the capability to execute subtasks of a computation on tolerance. Consequently, one of the major problems that distributed systems face is the failure; this problem can be solved by election algorithm for the agreement to only one node that distinguished and all other nodes aware of it. Thus this paper describes the design and producing an algorithm to achieve communication of parallel programming in virtual environment.

Read also  The Digital Jewellery Of Wireless Communications Information Technology Essay

The Need of Election in Distributed System

Election process is a program distributed over all nodes. When the leader has failed and discovered by one or more nodes, then election starts. The leader election algorithms are used in many topologies. For example, in token ring and when the node that has token fails, a new node should be selected from the system to have the token. The leader election is also used to sole single point failure in client-server that is when the systems need to transfer the leadership to another station and the fail occurs.

Many researches have studied the subject of leader election algorithms. The researches presented different techniques and strategies to deal with the election algorithm.

The objective of this dissertation is to research the election algorithm in distributed systems and design and develop an election algorithm in distributed systems to choose and agreed with biggest identifier to be coordinator. The software systems PVM used as a library of design and development.

Structure of this Documents

This dissertation has structured and separated into three parts. The first part describes and discusses the research topic theory of the election algorithm in distributed systems, as provides some applications with description to show and investigate their use in the subject of election algorithm.

The second part describes the methodology that adapted to accomplish this dissertation, as well as outline design of the core algorithm and development description which shows the dissertation deliverables.

The final part outlines the conclusion with achievement of this project.

Chapter 2


This section shows the research and development of the related investigation; and moreover, the works that have been done relating election algorithm. It covers the theory of distributed systems and the failure problems; this part was particularly important in the progress of understanding an election and the agreement in distributed systems; and gives a clear idea for design the core algorithm.

The need for software technologies module lectures, specially distributed systems was deceive; there were brief description for some material and examples of programming code those were good point to start from for this dissertation.

Election Algorithm Theory (in Distributed Systems

As more and more, distributed systems have been rapidly changed the field of computer science. Distributed systems are application that executes a collection of autonomous network computers to coordinate and communicate their action.

However, the major goal of distributed systems including the ability to connect remote users with remote resources in open scalable way; in other words give an easy way and simple way for the users and application to access remote resources, and sharing the facilities included by efficient/controlled way.

Furthermore, the brilliant benefits of sharing resources are vary; the obvious one is economic, that is to say reduce the expenditure for institutions, such as Universities, and also the collaboration will be easier to connect users and resources. This characteristic is a direct as a result of having independent computer [7].


An election algorithm: Is an algorithm for solving the coordinator election problem which is choosing a unique process from among a group of processes on different processors to act as central coordinator in distributed system, that is to say a group of processes on different machines need to make agreement to choose a coordinator[7].

However, there is no way to choose/select one process, if all processes are exactly the same and there are no distinguishing characteristics. Accordingly, there requirement is for the choice of elected process to be unique; hence election algorithm attempt to locate process with highest process number and designate it as coordinator [7].

When a failure occurs, which means one or more node fails, or during a communication channel fails the subsystem that enables the nodes/processes to communicate; there is necessity for the nodes to start coordination agreement to elect node for new situation, therefore live nodes can continue working to fulfil their task [3].

Distributed systems have to be adaptable to failures at least by two strategies. The first strategy is a capability for operating continuously and correctly by have software when failures occur (using an algorithm). The other strategy is to take a period of time out to reorganize the system by temporarily halt operation (reorganization by coordinator) [3].

The election algorithms vary, for example:

Wired system which are for instances.

Bully Algorithm (the biggest guy in town wins)

Ring Algorithm

Wireless system

Very large-scale system

Bully Algorithm

It is an election algorithm, in the paper of Election in a Distributed Computing Systems by Garcia-Molina [9]; this algorithm can handle process crashes and the presumptions are all processes have got unique identity, all other processes in the network are known by every process and the system is synchronous, that is, there is a maximal time limit T within a request and if the requested process P is alive the request will be answered.

However, the algorithm defined as, when there is no responding to the requests from the coordinator, there will an initiating for an election. Therefore:

The process P send an election-message to all processors with higher numbers than itself, then P waits for answer-message.

The process P will itself elected (P win the election), if the answer-messages are not arrive with in the time T, after that sends a coordinator message to all processors a lower number.

Process P will wait a further time period T a coordinator message, if there is one or more answer-message

A process P receives an election message and returns the answer-message, then start the algorithm from the beginning if there is nothing have been done before.

A process P receives the coordinator message register the sender’s number and considers it elected.

The algorithm will be started , when faulty process restarts; if the process which was previously down comes back up, it holds an election and if happens to be the highest numbered process currently running, this will win the election and take over the job of coordinator. As a result of that the biggest process in all processes always wins; therefore the name is bully algorithm.

Here an example of the election of coordinator P2 after the failure of P4 [3].



Stage 1











Stage 2







Stage 4

Stage 3


Time out









Figure (Bully Algorithm)

In this operation, four processes are shown when process P1 detect the failure of the coordinator P4, therefore it is announces an election in the stage 1; during the receiving an election-message from P1, processes P2 and P3 send answer-message to P1 and start their own elect-ion, then in the stage 2, P3 sends an answer-message to P2, for P3 has no answer-message received from P4 which is failed process. Consequently, it decides that it is the coordinator; moreover, in stage 3 P3 fails too before it can send out the coordinator-message.

When the timeout period expires for P1, it deduces the absence of the coordinator-message and begins another election; eventually P2 is elected coordinator.

Ring Algorithm

An election which is suitable for a collection of processes based on the use of a ring; in the book Distributed Systems Concepts and design [1] chapter12 section 12.3 provides a good reference and explanation of the subject of coordination and agreement.

However, the messages are sent clockwise around the ring, any process can begin an election initially, every process is marked as nonparticipant in an election, it proceeds by making itself as a participant, so when any process notices that the coordinator is not functioning, it builds an election-message placing its identifier in an election message and sending it to its clockwise neighbour, when an election-message is received by a process, it compare the identifier in the message with its own, the message will be forwarded if the arrived identifier is greater than a received process; whereas, if the arrived identifier is smaller and the receiver is not a participant, then it substitutes its own identifier in the message and forwards it, but if it is already a participant, it does not forward the message.

Read also  A Business Value Of Service Transition Information Technology Essay

In addition, if the received identifier is that of the receiver itself, thus this identifier must be the greatest, and it becomes the coordinator. Accordingly, the coordinator makes itself as a non-participant once more and sends an elected message to its neighbours announcing its election and enclosing its identity. As well as, if the neighbour process which is received the election message is down, the sender will skip over the neighbour and goes to the next member along the ring, or the one after that until a running process is located.


Adds 3 (2 ,3)








Elected P3

Elect P3 message (2,3)


Elect-P2 message (2)


Elect-P3 message (2,3)

Stage 2

Stage 1

Coordinator = P3

Sends elected P3 (2,3)




Elected P3 message

P3 receives the message


Stage 3

Figure (Ring Algorithm)

Supposing P2 detects that coordinator P4 is not responding, thus P2 sets active-list to ( ) and sends elect-P2 message to P3, after P2 sets active-list to (2). However, P3 receives elect-P2 which is the first message has been seen, so P3 sets its active-list to(2,3); then P3 sends elect-P3 towards P4 and sends elect-P2 also respectively, the message will pass P4 and P1 due to their crash and reach P2, the process P2 adds 3 to active-list (2,3), then P2 forward elect-P3 to P3 and receives the elect-P2 message which leads to choose P3 from P2 as the highest processes in its-list (2,3) and sends an elected P3 message, finally P3 receives the elected-P3 and P3 choose P3 as the highest process in its-list.

This is an example of process which consists of four processes and the assumption is P1 and P4 are crashed.

Distributed Database systems

Distributed database technology required to merging of two significant concepts, the integration via the element of database and distribution through the element of networking. Therefore, distributed database management system (DDBMS) provides the powerful tools for managing an integrated collection of shared data, and supply total solution to information processing problems within large organization. Furthermore, the reliability of the data communication facility will be takes into its possibility. However, distributed database require one process to be a unique as coordinator to perform some activities. The [5] provides a reference for using election to choose the coordinator as continuing for duty is required when the coordinator process fails.

The election algorithm [6] of TEMPO that running on Berkeley gives another use of choosing a unique process to be coordinator. TEMPO is a distributed system program that adapted on master and slave techniques running on individual process. The reliable communication services for LAN which TEMPO works in and to be sure about its continuity, an election is necessary to elect a new master and should have the ability to perform: withstand the failure of process when the election begin, deal with network partitions, and collect information about the topology of the system by allowing a time daemon.

Chapter 3



This chapter describes the design and implementation of the election algorithm. The main point of the design is the aspect which will be used and adapted in order to finish this project; therefore various assumptions and decisions are made to develop main algorithm. In the dissertation proposal report there was assumption idea about the algorithms to be used, but when some design concepts have reviewed and researched the algorithm that based on Bully Algorithm was adapted.

All following section covers technical and practical explanation, as well as the process of the design which are made. Finally, testing process to shows the algorithm performance

Design Stage

The Problem Specification Design

After finishing the necessary requirement and research, there were many possible idea gathered in order to produce the design, formulate a clear manner of main election algorithm, and how to achieve the efficient solution to the problem.

The design of the communication must take into account the topology of the network, which will be (cluster); the algorithms presented here assume a fully connected topology. Furthermore, the assumption that communications are error free is an abstraction and the assumption of finite but arbitrary transit times for messages is consistent; the algorithm do not to be sensitive to change in the relative speeds of the channel and processors at the nodes, so the correctness will never depend on absolute times.

For each node there will be a unique identification number, the message passing model is consistent with that provided with PVM (Parallel Virtual Machine). There are two statements for communication:

Sending and receiving messages

Send (MessageType, Destination [, Parameters])

Receive (MessageType [, Parameters])

For example below node 1 sends a message of type request to node 2 with two integer parameters values 13 and 27.

integer k= 13

send (request, 2, k, 27)

integer m, n

receive (request, m, n)

node 1 node 2

(Node communication)

When the message is received at node 2, these values are transferred to the variable m and n declared at the node.

The core Algorithm Architecture

Discuss the algorithm process

Send side:

Receive side:

The design of the algorithm will using fully connected, and it is extremely efficient in that any node can send a message directly to any other nodes, but it is extremely expensive because it needs more communication channel. This design allows process to crush during the election where there are three types of messages; an election message is sent to announce an election; an answer message is sent in response to an election message and coordinator message is sent to announce the identity of the elected process.

The drawing below shows four nodes which containing the data structures, that is A node identity and the number chosen by each nod


elected message{}

node 1

elected message {1}






node 4

node 2

elected message{1,2}

elected message{1,2,3}


node 3


Figure (Fully Communication Architecture)

The assumption of the election start from node 1 and the message is send to all other nodes ; and every process marked as nonparticipant in an election, when node 1 initiates/builds the election message placing with it its identifier which is 13 in active list {1} and sends it to all nodes neighbour, when as election message is received by all nodes, they compare the identifier in the message (13) with their own, consequently node 2 sends reply message to all nodes, because other nodes have lower identifier than it does. On another hand, node 4 does not send any reply message, instead adding all other nodes to its active list; also node 3 sends reply to both node 1 and node 4 but not for node 2; node 1 will reply message only to node 4 due to that the identifier number of node 1 is bigger than node 4.

When node 2 sends reply message with its identifier which is the highest one to all nodes, its regarded as coordinator election message, and it will wait a period of time, then if it does not receive any reply from the nodes, thus it sends a message to all other nodes declare itself as coordinator.

Node status is normal except while the node is in the process of joining a new group. Each node that is not the leader of a group call to checks whether the leader of its group is still alive, by sending a message to the leader and waiting for a reply. If the node does not receive a reply within the timeout period, the node invokes a recovery procedure. Each leader i call a check procedure, which sends message to every other node asking whether that node is a leader.

If one or more other nodes reply that it is a leader, node i pauses for a time inversely proportional to its priority (this helps prevent multiple nodes from initiating elections concurrently) and then calls a merge procedure. The merge procedure sends message to all of other leaders, inviting them to join a new group with the inviting node as leader. When the leader i receives an invitation, it forwards the invitation to the other members of its group.

Read also  The Instruction Set Architectures Of Arm Information Technology Essay

A node i that receives an invitation (directly or indirectly), sends an accept message to the proposed leader of the group. If node I receives a reply to its accept message within some time-out period, then node i joins the new group, otherwise, node i calls the recovery procedure.

If two nodes are in the same group, then they have the same leader at all times, for all operational node i and j, if status i = normal and status j = normal.

Reliable Broadcast

To ensure essentially that all correct processes deliver the same message, and that messages broadcast by correct processes are delivered. Furthermore, it ensures that no different messages with the same identifier are delivered.

The consideration that, the reliable broadcast is executed by function broadcast (message) as the following:

Reliable broadcast1 Validity: If a correct process broadcast message M, then some correct processes eventually deliver M.

Reliable broadcast2 Agreement: If a correct process delivers a message M, then some correct processes eventually deliver M.

Reliable broadcast3 Integrity: For any identifier ID, every correct process p delivers at most one message M with identifier ID, and if sender (M) is correct then M was previously broadcast by sender (M).

The election message will use the flag, so the flag is used by application to inform which node is alive or dead, and below there is some assumption for that

Comments on the core Algorithm

First of all, the design in these algorithms have been influenced by

Because the sender

It long words here Amer

Evolution on the core Algorithm



When the choice of the design has made, and finishing the messages passing communication mechanism for the election algorithm. Furthermore, after final manner for the communication topology has chosen and proved , some technical and technology materials are required to reach the goal of this dissertation, like which kind of library and hardware to be used to implement, run and test the algorithm, also which language to be easy for the programming. Thus, the following sections discuss these issues.

The C Language

The simplicity of C language to learn, understand, and the advantage of its widely spread led me to make the choice to be adapted it as a programming language for the code of this project. Moreover, it is origin as the language of UNIX operating systems. This also gives more advantage as PVM library written in C, so it specifies a standard library with an extensive set of function that being powerful and efficient language. It is support low level of both applications the distributed and the network one.

The Cluster Computers

A computer cluster is a set of connected or linked computers that working close to each other through high speed local area networks to form a single computer. A cluster at least has two computer that called nodes, one master which typically has a job scheduler that arrange the work to slave nodes to be carried out. Besides, the cluster’s nodes can communicate via a message passing interface commonly PVM or MPI that are installed to allow programs to be run across all nodes, the website of scalable computing laboratory [9] provides a good source for a cluster computers.

However, the implementation of shared computing resources is used broadly in computer networks, the benefits of using this aspect is to improve performance and provide availability of services when system components fail. For example in case of hardware maintenance, all cluster nodes stay offer a decent performance and availability.

There are many types of computers from single processor to machine with vector processors and parallel computer with thousands of microprocessors.

Here are some examples for these types:

Parallel vector processors, shared memory multiprocessors, distributed memory MPP’s, cluster computers, networks and internet computing, met-computing, and distributed memory systems with SMP.


Parallel Virtual Machine (PVM) is a software system permits a collection of heterogeneous computer either UNIX or Windows to be used as a coherent and flexible concurrent computational resource. PVM consists of a library for a message passing, runtime environment, task and resource management. It has set of functions collections which provide the facility of messages passing communication (send and receive operation) between processes.

The most important goals for PVM are:

It is enables users to exploit their existing to reduce the cost of solving larger problem.

Fault tolerance,

Scalability, heterogeneity and portability.

Moreover, PVM is able to withstand host and network failure; PVM library has the capability to connect computers of different types in a single session.

The reason for choosing PVM for this dissertation because of its attributes and advantages that it is library is written in C and directly supports the most UNIX platforms, C language and C++ application; besides, this virtual environment is easy to understand and it does not need long time to be familiar with. Beside the book PVM tutorial [pvm book], both [www.csm.ornl.go/pvm/] and [] have direct beneficial resources for fast start.

Oxford Brookes School of Technology provides cluster which is available with providing PVM installed on it; the user should capable to run parallel PVM programs, and configure a virtual machine on any host which PVM has. Thus the following shows how to log into cluster account and use PVM:

via network alias butty for the cluster print (

using user-name and password to active the account

([email protected])

ssh to enter main node that is required to log on to another node and possible to stat PVM and add nodes.

The environment is ready for use, to show the core algorithm and some steps that are necessary to write and compile programs as:

$ ls show all programs that user have

$ cd project goes to main algorithm which have two programs, the driver and


$ vi electiondri.c enter to driver program

$ vi electionwor.c enter to worker program

When the user needs enter to another node by type $ pvm at main node, then it is possible to add nodes like pvm> node05, also shows current virtual machine, that is done by type pvm> conf. The book of PVM guide and tutorial provides a complete reference for starting and running PVM programs [pm book].

The compilation of PVM programs was done by used Makefile

However, there are varieties of programming functions that allow any programmer to start write PVM applications and get familiar with them with short period of time. Below is a description of some of these functions:

pvm_mcast() function is used to multicast a message in an active send buffer to a set of tasks in task id (tids) array, and by message tag (msgtag) the content can be distinguished. This function is provided in driver program, so the receiving processes can call pvm_recv() function to receive the copy of a multicast message.

pvm_spawn()function is used to spawn off new processes copies of executable process. This function contains the path of the executable process, the number of processes to be spawning, and return an array which contains task id for every process that has initiated.

pvm_recv()function call used to blocks of processes until sending message with its distinguished message tag has arrived; then pvm_recv() place the message in a new active receive buffer, as well as unpacked all parameters those contained in the message by using pvm_upkint()function.

pvm_send() routine function used to send a message which stored in active send buffer to identified process by task id(tid); where the message tag used to label the content of the message.

pvm_initsend()routine function used to clear default send buffer and prepare a new message packing. The packing message uses default encoding scheme because PVM can not know whether the user is going to add a heterogeneous machine before specific message is sent. When pvm_send is called, the contents are copied directly out of user’s memory.

pvm_nrecv()routine function which used to check whether the message with its message tag has arrived. During an election worker awaits an acknowledgement from the coordinator worker since the acknowledgement may not be received a non-blocking receive using pvm_nrecv should be used; otherwise the receiver will be blocked.  

Order Now

Order Now

Type of Paper
Number of Pages
(275 words)