Generalized distributed consensus in PlusCal: Introduction
29 May 2019 Tags: Distributed systemsFirst in a series describing Howard & Mortier’s generalization of distributed consensus, expressed in PlusCal. The next post describes why consensus is hard.
This post begins a new series, which I expect to interleave with the series on failure tolerance. This series is inspired by the recent paper by Heidi Howard and Richard Mortier, “A Generalized Solution to Distributed Consensus”, which extends results from Howard’s dissertation.
Distributed consensus is a core problem in distributed systems, applied to use cases ranging from storing configuration metadata for a replicated, high-availability service to ensuring consistency in distributed storage systems. Approaches to distributed consensus include the Paxos algorithm, the ZAB protocol, and the Raft algorithm. Although these have typically been presented as distinct (indeed, Raft was proposed as a more understandable alternative to Paxos), Howard treats them all as variants of Paxos.
There is an enormous literature on Paxos. Three particularly good starting points are, in order of reading:
- Tom Cocagne’s “Understanding Paxos”
- The best starting point for those with a focus on the algorithm’s
application. A gradual introduction to the algorithm, followed by
discussion of the variants most commonly used in production. Cocagne
also has a corresponding Python implementations of
the basic algorithm
and the more complex but higher-performance
Multi-Paxos variant. This
latter code could, with further engineering, serve as a basis for
production use.
Cocagne’s description and implementations have one important restriction however: They require a fixed configuration of replicas. Although his Multi-Paxos code provides for a crashed replica to restart and catch up, the only way to bring a new replica into the configuration is to shut them all down and bring up a new set including the new replica. The next paper presents a version that accommodates this process.
- Altınbüken and van Renesse’s “Paxos Made Moderately Complex”
- An excellent second source, this is an interactive version of the authors’ 2015 Computing Surveys paper, providing a readable presentation of the most commonly-used production variant, Multi-Paxos with reconfiguration. They include a Python implementation, although their version is oriented to demonstration and classroom use. In particular, the replicas all run in a single process, communicating via a simulated network. Unlike Cacogne’s version, this one supports reconfiguration, the capacity to replace failing replicas while the algorithm runs.
- Howard’s dissertation, “Distributed Consensus Revised”
- This is a complete, very readable, and systematic introduction to Paxos and its variants. The first two chapters present the basic algorithm, while the later ones explore variants and extensions. Howard’s work extends Paxos, so this paper includes variations not described in the first two. The paper also provides proofs of all its results, making it the most formal of the three. An excellent followup to the the first two, providing a rigorous background to the consensus problem. No code is provided—all algorithms are presented in pseudocode.
All of these include links to other important papers on the topic, providing a view of the major work in the topic.
The goals of this series
Given the plethora of existing descriptions, why add another? I do not aim to present another tutorial—the above three are sufficient. My focus instead is supplementing them, particularly in light of the new Howard and Mortier paper. I want to:
- Bring the introductions up to date: Howard’s doctoral research significantly extended the range of possible algorithms. Her paper with Mortier further extends this work. The first two introductions above do not include any of this work and even the dissertation is slightly out of date.
- Present the algorithms precisely, in machine-evaluable form: The dissertation and paper present some algorithms in pseudocode and others solely via informal description. In contrast, a PlusCal representation is precise and evaluable by a model checker.
- Integrate the invariants and algorithms in a common notation: A key contribution of Howard’s work is the isolation of more basic correctness invariants than those used in earlier papers. These new invariants and their use in proofs are presented separately from the algorithms. Using PlusCal, the invariants can instead be incorporated directly into the algorithms and model checked.
-
Provide examples of model checking the algorithms: Testing continuously-running services such as consensus is particularly difficult. A good test suite should exercise many different state sequences of the algorithm. For consensus, those alternate sequences arise due to timing differences.
Although generating alternate timings is hard for an executable program, this is easy to for a PlusCal algorithm, using the TLC model checker. For small model configurations, TLC can test every state exhaustively, proving correctness of the algorithm for a model of that size. For larger configurations, TLC can test randomly-selected state sequences. This latter result is less conclusive than an exhaustive search but provides more confidence in the algorithm than testing an executable version.
Some limitations of using PlusCal
In addition to the advantages described above, representing the algorithms in PlusCal has disadvantages relative to the pseudocode used in Howard and Mortier’s paper and Howard’s dissertation:
- The PlusCal representation requires a greater level of detail. This includes both representing system features such as the network, as well as the exact contents of messages.
- Given the goal of model-checking the algorithms, the PlusCal representation will be influenced by choices that make the checking tractable on small hardware. For example, the detailed network model that I will present in the initial posts dramatically increases checking time, while a simplistic network model, which can be checked more quickly, imposes a restatement of the actual algorithm. This balance is inherent any time an algorithm is to be model checked; I introduce it here because it influences choices throughout the series.
With that introduction, my next post will describe why distributed consensus can be so challenging.