The Byzantine Generals Problem
Motivation of Consensus
Each node has its own copy of the blockchain
Ensuring consistent views of the current state of the blockchain is crucial, and challenging due to crashes and attackers
The distributed consensus problem was studied in 1980s.
Bitcoin proof of work has been a breakthrough.
Byzantine Generals Problem
Coordinated attack leading to victory, uncoordinated attack leading to defeat.
Participants can communicate only through point to point message passing.
Byzantine fault: faulty participants (traitors) can behave arbitrarily, e.g., sending misleading messages.
Correctness:
- All loyal lieutenants obey the same order.
- If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
No solution is possible if at least one third of participants are traitors in partially synchronous networks.
Distributed Systems
State Machine Replication (SMR)
A general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas.
- All replicas start with the same initial value.
- All replicas receiving the same request shall output the same execution result and end up in the same state.
- Byzantine version: replicas are subject to Byzantine faults.
- The main challenge: Guaranteeing that replicas execute the same request in the same order.
Request are transactions and the log of transaction is blockchain.
Network Synchrony
3 kinds
Synchronous networks
- Operations are coordinated in rounds.
- Message delivery is bounded by a fixed delay.
- Not realistic!
Asynchronous networks
- Processes can coordinate only with messages.
- Message delivery is guaranteed, but in unbounded time.
- Most general setting, consensus problem provably unsolvable for deterministic algorithms.
Partially synchronous networks
- Network delay is unbounded.
- Long enough runs approximate synchronous runs.
- Assume that network faults are eventually fixed and DoS attacks eventually terminates
The GST Model
A simple partially synchronous model.
Network is asynchronous, until Global Stabilization Time (GST).
After GST, network behaves synchronously.
Correctness of distributed systems
Safety: something bad never happens
Liveness: something good eventually happens
In BFT consensus,
- safety means two correct replicas never commit conflicting blocks.
- liveness means a new block is eventually produced after GST.
The HotStuff Algorithm
HotStuff
Recent BFT consensus algorithm
Fast leader election
Linear communication cost
Optimistic responsiveness: after GST, if the current leader is correct, consensus is reached at the pace of actual network delay.
A variation of HotStuff was selected as the consensus algorithm of Libra Blockchain.
Preliminaries
\(n\) is the total number of replicas
\(f\) is the umber of Byzantine replicas.
BFT assumption: \(n=3f+1\).
Let \(A\) and \(B\) be two sets of replicas, with at least \(2f+1\) replicas each. Then \(A\cap B\) contains at least one correct replica.
Quorum Certificate (QC): a set of \(2f+1\) signed votes of distinct replicas
View: a progressive number; each view has an associated leader.
Basic HotStuff
Each round is composed of 3 phases
Replicas only speaks with the current leader.
Prepare Phase
Leader
- waits for \(2f+1\) NEW-VIEW messages (sent at the end of previous round), selects the QC with maximal view number.
- creates a new block, with parent the selected block, and broadcast it in a PREPARE message.
Replica
- waits for a PREPARE message, and accepts block if 1. (safety) block extends from lockedQC, or 2. (liveness) m.QC.view > lockedQC.view
- if message was accepted, sends vote
Pre-commit Phase
Leader
- waits for \(2f+1\) PREPARE votes.
- broadcasts a PRE-COMMIT message, with QC of received votes.
Replica
- waits for a PRE-COMMIT message.
- updates prepareQC.
- sends vote.
Commit Phase
Leader
- waits for \(2f+1\) PRE-COMMIT votes
- broadcasts a COMMIT message, with QC of received votes
Replica
- waits for a COMMIT message
- updates lockedQC
- sends vote
Decide Phase
Leader
- waits for \(2f+1\) commit votes
- broadcasts a DECIDE message, with QC of received votes
Replica
- waits for a DECIDE message
- increments current view, and sends a NEW-VIEW message
Timeout
If an expected message doesn't arrive, a timeout is triggered
A timed-out replica increments her current view, and sends a NEW-VIEW message.
Safety of Basic HotStuff
Two conflicting QC of the same type must refer to different views. Otherwise, a correct replica voted twice in the same phase: contradiction!
If two blocks are conflicting, they cannot be both committed, each by a correct replica.
Liveness of Basic HotStuff
If a correct replica is s.t. lockedQC = precommitQC, then at least \(f+1\) correct replicas voted for some prepareQC matching lockedQC. PrepareQC was voted by at least \(2f+1\) replicas.
After GST, there eixsts a bounded time period \(T_f\) s.t. if all correct replicas remain in view v during \(T_f\) and the leader for view v is correct, then a decision is reached.
The Pacemaker
A module handles leader selection and timeout duration. We need:
- A correct leader is eventually elected.
- All correct replicas eventually stay in the same view for \(T_f\).
They can be achieved as follows:
- Leader selection must guarantee that every replica eventually becomes a leader
- At each round, double timeout duration
Chained HotStuff
A vote for a block is an implicit vote for its parent block.
Three chain rule: a block (and its ancestors) is final if it is at the head of three contiguous blocks.
Pros and Cons
Pros:
- Block finality
- Good throughput
- Low energy consumption
Cons:
- Messages grows exponentially in the number of nodes.
- A single entity can impersonate several nodes (sybil attack)
Proof of Work
Hash Puzzle
Start with some value \(v\), fix a set \(Z\) of values. The size of \(Z\) is the difficulty parameter.
A solution to the puzzle is a value \(x\) s.t. \(h(v\circ x)\in Z\)
Finding a solution to the puzzle is a time consuming task (exponential time) if \(Z\) is small enough.
Checking whether \(x\) is a solution of the puzzle is easy.
Proof of Work
A solution to a hash puzzle proves that some amount of computational work has been performed.
Proposed in the 1990s as a counter-measure against spam mails.
With Consensus
Based on hash puzzle $$ h(nounce\circ header\circ prev_{hash}\circ tx\circ tx\circ\cdots\circ tx)<target $$ Nounce is the solution of the puzzle, and it is a field of the block.
Target is the difficulty (lower target requires more effort)
Target can be inferred from the previous blocks.
Nodes participates in the consensus algorithm are called miners.
Miners collect transaction from the network and creates blocks.
In order to create a valid block, miners need to solve the hash puzzle. The first miner that creates a valid block broadcasts it to the network
- If nothing goes wrong, the block is append to the blockchain.
- Notice that block validity can be checked easily by whoever has access to the blockchain.
On Liveness
A new block is eventually produced (after GST)
- Miners produce new blocks all the time.
- Even during the asynchronous phase.
- Liveness holds trivially.
On Safety
Two correct replicas never commit conflicting blocks.
- What if two miners produced two valid blocks at the same time?
- The blockchain forks.
- Thus safety does not hold.
Deal with Forks
Longest chain rule: miners try to append new blocks to the longest chain they know.
Eventually only one chain survives, resolving the fork.
Normally, a fork doesn't survive for more than 5 blocks.
Longest fork have been observed, by they happened after updates of the protocol.
Pros and Cons
Pros
- Successfully used for many years.
- Secure against sybil attacks
- Scales well w.r.t. the number of miners
Cons
- Miners may collude to share rewards (decentralization?)
- Mining consumes lots of energy.
- Does not scale well w.r.t. the number of users.
- Finality is probabilistic and slow.
Proof of Stake
Alternative to proof of work.
Probability of being selected as the next block creator depends on how much cryptocurrency you have.
Rationale: richer participants would lose their wealth in case the cryptocurrency value decreases.
Algorithm
Chained proof of stake. Somewhat inspired to proof of work: the blockchain can fork and the longest chain rule is applied.
BFT proof of stake. Exploits Byzantine fault tolerant algorithms, no forks.
Chained Proof of Stake
- Validators invest some stake
- Stake are frozen until the block is created
- A validator is selected at random
- The selected validator generates the new block
Problems: Trying to fork the blockchain could be profitable (sanction the validators that propose blocks in different forks)
BFT-based Proof of Stake
Nodes invest some stake
A fixed number of nodes are dynamically selected as validators.
Nodes with higher stake are more likely to be selected
A validator is selected as leader and proposes a new block
A BFT consensus algorithm is run by validators: block is append or discarded.
Fast and deterministic block finality.