Friday, 16 September 2016

RAFT and SWIM Protocols



RAFT Protocol
RAFT is a Distributed Consensus Protocol:
  • Consensus is all about an agreement on some data by a group of servers. So, that shared data is available until majority of the agreed servers up and running.
  • RAFT is a Distributed consensus protocol which is easy to understand and it solves the problem of replicating the data submitted by a client to multiple servers consistently.
This is how it works:
  • Consider each node in the system is a database server.
  • Each node can be in one of the 3 states:
    • Follower
    • Candidate
    • Leader
  • Leader Election:
    • Initially all the nodes will be in Follower state and if they don't hear heartbeat from a Leader, then they will become candidate and campaign for the election.
    • The Candidate node will send request for the vote to other nodes. The other nodes will reply with their vote.
    • If the Candidate gets majority of the votes, then it will become Leader.
  • Consensus:
    • After the election of leader, all the requests will go through Leader.
    • Leader replicates the data sent by the client in its log, but before committing, it will send the replicates to the Followers in the next heart beat and waits for acknowledgement from the Followers.
    • When Leader gets reply from majority of the Followers, it will send a response to the client and notifies Followers that it has committed the entry.
    • Now the cluster has come to the consensus about the system state.

SWIM Protocol
SWIM is all for Rapid detection and Dissemination:
  • Scalable Weakly-consistent Infection-style process group Membership Protocol.
  • Scalability is supported since either there exists quadratic growth in network load with increase in group size or compromise the response times or false positive frequency w.r.t detecting process crashes.
  • Changes to the states of the system are propagated in the weakly-consistent manner which use synchronization variables.
  • Infection style deals with spreading of updates in epidemic manner.
  • Group membership is handled by maintaining a membership list at each node.

 


Operation:
  • Each member will maintain a membership list containing list of members in the group, which gets updated whenever a process leaves or joins the group.
  • The protocol has 2 basic components:
    • Failure Detector Component
    • Dissemination component

Failure Detector Component:
  • It detects failure of members in the group.
  • 2 ways:
    • Direct detection where a random node will select a random member from its membership list. Then it will send the ping message to that random member and waits for the acknowledgement. If it does not get the acknowledgement, it will go for indirect detection.
    • Indirect detection where random member will select 'k' random members and request them to send ping message to the one node that does not sent acknowledgement. Those 'k' random members will send the ping message and send the received acknowledgement back to requesting random member.
  • At the end if the random node does not get any acknowledgement directly or indirectly, it will declare that random member as “suspected” in its local membership list considering the network packet losses or the random member may be asleep or the random node is slow.
  
Dissemination Component: 
  • Whenever a process detects information like suspected, failure, joining or voluntarily leaving the group, it will disseminate that information to the rest of the members in the group.
  • The information to be disseminated is piggybacked on the messages generated by Failure Detector component like ping, ack etc.
  • This is an infection-style dissemination mechanism since information will spread in the same way as a gossip or epidemic in the society.
  • Any member that receives the information will mark the same in its local membership list.
  • The suspected node will be treated as non-faulty node and included in the target selection for ping messages.
  • After a particular amount of time, the suspected node is considered as “failed” if it could not be proven to be “alive”.

Similarities
Similarities between SWIM and RAFT:
  • Clients need not be actively participate in the process.
  • Servers exchange the heartbeat messages to detect the faults.

Differences
Differences between SWIM and RAFT:
  • There is no leader selection in SWIM protocol.
  • There is no dedicated service for distributed consensus in SWIM protocol.
  • In SWIM protocol, each storage server will maintain its own view of the system.
  • SWIM protocol will disseminate the updates using epidemic principles.


















1 comment: