Distributed Consensus and the Raft Algorithm
Concepts of Programming Languages
Sebastian Macke
Rosenheim Technical University
About this Lecture: Some Thoughts
- Goal of this lecture is to look at a non trivial example in Go.
- The problem we look at, is language agnostic.
- It can be implemented in almost every language.
- It combines concurrent programming and network programming.
- It contains non trivial logic.
- It is a real world example with high relevance.
2
Distributed Programming
- Concurrent System - multiple execution points (threads, processes or via network)
- Distributed System - processes are running on different networked processors
- Parallel System - like a distributed system, but more closely coupled (shared memory or fast bus)
Terms are not always used consistently.
3
4
The two general problem
- The two generals problem is a thought experiment in distributed computing and unsolvable when the communication might fail.
- We need a pragmatic solution.
5
Discussion: How would you solve the problem?
Three armies, each led by a general, plan to attack a fortified city. The armies may arrive at different times to surround the city, and none of the generals can be certain whether all the armies will arrive at all. Consequently, they have not appointed a single leader.
For the attack to succeed, at least two armies must strike simultaneously. Therefore, the generals need to coordinate and agree on a precise time to launch the attack. They can only communicate by sending messengers.
- Under the assumption of reliable communication.
- Under the assumption of unreliable communication.
6
What is Distributed Consensus?
The consensus problem requires agreement among a number of processes on a single data value.
Distributed consensus protocols can be used for distributed databases which should stay consistent:
- The system stays consistent even if some nodes goes down
- The system stays consistent if a network partition occurs (but could get unavailable)
- The system never responds with wrong or inconsistent data (different responses from nodes)
7
The CAP Theorem
In a distributed system the following requirements can be met:
- Consistency - Data is the same across the cluster, so you can read or write from/to any node and get the same data.
- Availability - The system stays available even if one or more member goes down
- Partition Tolerance - If parts of the system loose connection to other parts, the system stays alive
Pick Two: Only two requirements can be met -> CP, AP are typical (CA does not exists)
codahale.com/you-cant-sacrifice-partition-tolerance/
AP = DNS (Domain Name System)
CP = Cash Machine
8
Securing C and P
- All known algorithms use the concept of a quorum to detect network partition problems
- Only the partition with the majority of nodes stay alive to ensure consistency
- All known algorithms use the concept of a Master or Leader node to control replication
- All known algorithms use a replicated log and implement a two phase commit.
- A replicated log is a list of all changes which are applied to the system
9
Two Phase Commit
Phases:
- Prepare Phase - Data is persisted on all members. Data is not visible but stored.
- Commit Phase - Persisted data will be executed. Data gets visible.
- This is typically implemented with an append only log (transaction log) and a state machine which reads the log entries and loads them into memory.
This is equivalent with two round trips in the two general problem.
10
What is Raft?
- Raft is a protocol for distributed consensus
- Raft is designed to be easy understandable
- Raft predecessor Paxos was highly complex
- Most Paxos implementations are buggy or academic
- Raft was developed 2014 in a PhD thesis at Stanford University
- Zookeeper ZAB is an alternative but more complex solution
The Raft Paper
The ZAB paper
The Paxos Paper
11
Who uses Raft?
- ectd - The Cloud Native key value store -> Part of Kubernetes!
- Docker Swarm - Docker in cluster mode
- dgraph - A Scalable, Distributed, Low Latency, High Throughput Graph Database
- tikv - A Distributed transactional key value database powered by Rust and Raft
- swarmkit - A toolkit for orchestrating distributed systems at any scale.
- chain core - Software for operating permissioned, multi-asset blockchain networks
- Elastic Search - Distributed Search engine (=document oriented database)
- ...
12
The Raft Algorithm
- Raft is based on a replicated state machine with the states: FOLLOWER, CANDIDATE and LEADER
- All nodes start in the FOLLOWER state
- The leader is responsible for sending heartbeat message to all members
- The leader is responsible to replicate data to all other nodes (2 phase commit)
- If a member does not receive a leader message in a random timeout interval, an election starts
- During an election a candidate sends vote messages to all cluster members
- The election is won, if a quorum (>50% = n/2 + 1) of members respond with OK
13
Raft in Action
Raft Explanation
The Raftscope Visualization
14
The Raft State Model
- There is only one Leader which is responsible for consistency and replication
15
Implementing Raft with Go
The most prominent implementation is the Raft implementation of Etcd (Part of Kubernetes)
github.com/etcd-io/etcd/tree/master/raft
This implementation is highly optimized and abstracts from the Raft paper
Other implementations
github.com/hashicorp/raft
github.com/search?q=go-raft
Read the Raft paper for specification
16
Is it possible to build your own Raft implementation?
Requirements and Decisions
- We want to stay close on the original specification, but focus especially on the leader election.
- We limit our use case to only 3 nodes.
- We want to use easy built-in network mechanisms in Go such as HTTP. JSON not strictly necessary
17
Interesting Parts
- State Machine
- Concurrency: Behavior of remote calls, election and heartbeat timers
- Design: Timer implementation
18
Exercise 8.2: Be Creative!
Three armies, each led by a general, plan to attack a fortified city. The armies may arrive at different times to surround the city, and none of the generals can be certain whether all the armies will arrive at all. Consequently, they have not appointed a single leader.
For the attack to succeed, at least two armies must strike simultaneously. Therefore, the generals need to coordinate and agree on a precise time to launch the attack.
To achieve this, they decide to use a simplified version of the Raft algorithm.
- Write your own Raft Implementation!
github.com/s-macke/concepts-of-programming-languages/blob/master/docs/exercises/Exercise8.md
19
Summary
- Go is an excellent choice to implement an distributed protocol like Raft
- You can implement the Raft specification with ca 1000-2000 Loc
- You can learn from the Etcd implementation, which is on Github
- Building a production safe implementation is hard in any language anyway!
20
Thank you
Use the left and right arrow keys or click the left and right
edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)