The Byzantine Generals Problem

Motivation of Consensus

Each node has its own copy of the blockchain

Ensuring consistent views of the current state of the blockchain is crucial, and challenging due to crashes and attackers

The distributed consensus problem was studied in 1980s.

Bitcoin proof of work has been a breakthrough.

Byzantine Generals Problem

Coordinated attack leading to victory, uncoordinated attack leading to defeat.

Participants can communicate only through point to point message passing.

Byzantine fault: faulty participants (traitors) can behave arbitrarily, e.g., sending misleading messages.

Correctness:

All loyal lieutenants obey the same order.
If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.

No solution is possible if at least one third of participants are traitors in partially synchronous networks.

Distributed Systems

State Machine Replication (SMR)

A general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas.

All replicas start with the same initial value.
All replicas receiving the same request shall output the same execution result and end up in the same state.
Byzantine version: replicas are subject to Byzantine faults.
The main challenge: Guaranteeing that replicas execute the same request in the same order.

Request are transactions and the log of transaction is blockchain.

Network Synchrony

3 kinds

Synchronous networks

Operations are coordinated in rounds.
Message delivery is bounded by a fixed delay.
Not realistic!

Asynchronous networks

Processes can coordinate only with messages.
Message delivery is guaranteed, but in unbounded time.
Most general setting, consensus problem provably unsolvable for deterministic algorithms.

Partially synchronous networks

Network delay is unbounded.
Long enough runs approximate synchronous runs.
Assume that network faults are eventually fixed and DoS attacks eventually terminates

The GST Model

A simple partially synchronous model.

Network is asynchronous, until Global Stabilization Time (GST).

After GST, network behaves synchronously.

Correctness of distributed systems

Safety: something bad never happens

Liveness: something good eventually happens

In BFT consensus,

safety means two correct replicas never commit conflicting blocks.
liveness means a new block is eventually produced after GST.

The HotStuff Algorithm

HotStuff

Recent BFT consensus algorithm

Fast leader election

Linear communication cost

Optimistic responsiveness: after GST, if the current leader is correct, consensus is reached at the pace of actual network delay.

A variation of HotStuff was selected as the consensus algorithm of Libra Blockchain.

Preliminaries

$n$ is the total number of replicas

$f$ is the umber of Byzantine replicas.

BFT assumption: $n=3f+1$.

Let $A$ and $B$ be two sets of replicas, with at least $2f+1$ replicas each. Then $A\cap B$ contains at least one correct replica.

Quorum Certificate (QC): a set of $2f+1$ signed votes of distinct replicas

View: a progressive number; each view has an associated leader.

Basic HotStuff

Each round is composed of 3 phases

Replicas only speaks with the current leader.

Prepare Phase

Leader

waits for $2f+1$ NEW-VIEW messages (sent at the end of previous round), selects the QC with maximal view number.
creates a new block, with parent the selected block, and broadcast it in a PREPARE message.

Replica

waits for a PREPARE message, and accepts block if 1. (safety) block extends from lockedQC, or 2. (liveness) m.QC.view > lockedQC.view
if message was accepted, sends vote

Pre-commit Phase

Leader

waits for $2f+1$ PREPARE votes.
broadcasts a PRE-COMMIT message, with QC of received votes.

Replica

waits for a PRE-COMMIT message.
updates prepareQC.
sends vote.

Commit Phase

Leader

waits for $2f+1$ PRE-COMMIT votes
broadcasts a COMMIT message, with QC of received votes

Replica

waits for a COMMIT message
updates lockedQC
sends vote

Decide Phase

Leader

waits for $2f+1$ commit votes
broadcasts a DECIDE message, with QC of received votes

Replica

waits for a DECIDE message
increments current view, and sends a NEW-VIEW message

Timeout

If an expected message doesn't arrive, a timeout is triggered

A timed-out replica increments her current view, and sends a NEW-VIEW message.

Safety of Basic HotStuff

Two conflicting QC of the same type must refer to different views. Otherwise, a correct replica voted twice in the same phase: contradiction!

If two blocks are conflicting, they cannot be both committed, each by a correct replica.

Liveness of Basic HotStuff

If a correct replica is s.t. lockedQC = precommitQC, then at least $f+1$ correct replicas voted for some prepareQC matching lockedQC. PrepareQC was voted by at least $2f+1$ replicas.

After GST, there eixsts a bounded time period $T_f$ s.t. if all correct replicas remain in view v during $T_f$ and the leader for view v is correct, then a decision is reached.

The Pacemaker

A module handles leader selection and timeout duration. We need:

A correct leader is eventually elected.
All correct replicas eventually stay in the same view for $T_f$.

They can be achieved as follows:

Leader selection must guarantee that every replica eventually becomes a leader
At each round, double timeout duration

Chained HotStuff

A vote for a block is an implicit vote for its parent block.

Three chain rule: a block (and its ancestors) is final if it is at the head of three contiguous blocks.

Pros and Cons

Pros:

Block finality
Good throughput
Low energy consumption

Cons:

Messages grows exponentially in the number of nodes.
A single entity can impersonate several nodes (sybil attack)

Proof of Work

Hash Puzzle

Start with some value $v$, fix a set $Z$ of values. The size of $Z$ is the difficulty parameter.

A solution to the puzzle is a value $x$ s.t. $h(v\circ x)\in Z$

Finding a solution to the puzzle is a time consuming task (exponential time) if $Z$ is small enough.

Checking whether $x$ is a solution of the puzzle is easy.

Proof of Work

A solution to a hash puzzle proves that some amount of computational work has been performed.

Proposed in the 1990s as a counter-measure against spam mails.

With Consensus

Based on hash puzzle $$ h(nounce\circ header\circ prev_{hash}\circ tx\circ tx\circ\cdots\circ tx)<target $$ Nounce is the solution of the puzzle, and it is a field of the block.

Target is the difficulty (lower target requires more effort)

Target can be inferred from the previous blocks.

Nodes participates in the consensus algorithm are called miners.

Miners collect transaction from the network and creates blocks.

In order to create a valid block, miners need to solve the hash puzzle. The first miner that creates a valid block broadcasts it to the network

If nothing goes wrong, the block is append to the blockchain.
Notice that block validity can be checked easily by whoever has access to the blockchain.

On Liveness

A new block is eventually produced (after GST)

Miners produce new blocks all the time.
Even during the asynchronous phase.
Liveness holds trivially.

On Safety

Two correct replicas never commit conflicting blocks.

What if two miners produced two valid blocks at the same time?
The blockchain forks.
Thus safety does not hold.

Deal with Forks

Longest chain rule: miners try to append new blocks to the longest chain they know.

Eventually only one chain survives, resolving the fork.

Normally, a fork doesn't survive for more than 5 blocks.

Longest fork have been observed, by they happened after updates of the protocol.

Pros and Cons

Pros

Successfully used for many years.
Secure against sybil attacks
Scales well w.r.t. the number of miners

Cons

Miners may collude to share rewards (decentralization?)
Mining consumes lots of energy.
Does not scale well w.r.t. the number of users.
Finality is probabilistic and slow.

Proof of Stake

Alternative to proof of work.

Probability of being selected as the next block creator depends on how much cryptocurrency you have.

Rationale: richer participants would lose their wealth in case the cryptocurrency value decreases.

Algorithm

Chained proof of stake. Somewhat inspired to proof of work: the blockchain can fork and the longest chain rule is applied.

BFT proof of stake. Exploits Byzantine fault tolerant algorithms, no forks.

Chained Proof of Stake

Validators invest some stake
Stake are frozen until the block is created
A validator is selected at random
The selected validator generates the new block

Problems: Trying to fork the blockchain could be profitable (sanction the validators that propose blocks in different forks)

BFT-based Proof of Stake

Nodes invest some stake

A fixed number of nodes are dynamically selected as validators.

Nodes with higher stake are more likely to be selected

A validator is selected as leader and proposes a new block

A BFT consensus algorithm is run by validators: block is append or discarded.

Fast and deterministic block finality.