"Distributed Systems" are not what they used to be
11 Feb 2020 Tags: Distributed systems

The phrase “distributed systems” has had several meanings over the last 50 years, from a mathematical topic to a style of engineering practice to a suite of technologies. What can we learn from these changes, and how do we reconcile these perspectives?
The early days (1970–1995): Algorithms
The initial work on distributed systems had strong overlap with work on concurrency and parallelism; indeed, from the perspective of algorithm correctness, these topics share much in common. There was not yet a widely agreed definition of the phrase. The earliest researchers were mapping this new territory, not altogether sure of where they were going or how their maps would ultimately be used.
Although early work included the implementation of groundbreaking systems such as SAGE, Sabre, and ARPANet, the work from that era most commonly characterized as “distributed systems” is a series of classic papers focussed on algorithms and their correctness. The literature, even in its early stages, was large and beyond simple summary, and I don’t have the background to survey it fairly. Instead, I will list a few papers that I have found powerful and influential. The list is not particularly idiosyncratic, as these papers are also widely cited by others.
In order of publication:
- “Time, Clocks, and the Ordering of Events in a Distributed System”, Lamport (1978).
- “The Byzantine Generals Problem”, Lamport, Shostak & Pease (1982).
- “Randomized Byzantine Generals” (abstract only), Rabin (1983).
- “Distributed Snapshots: Determining Global States of Distributed Systems”, Chandy & Lamport (1985).
- “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch & Paterson (1985).
- “Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems”, Oki & Liskov (1988).
- “Unreliable Failure Detectors for Reliable Distributed Systems”, Chandra & Toueg (1996).
- “The Part-Time Parliament”, Lamport (published 1998, submitted substantially earlier).
- “Practical Byzantine Fault Tolerance”, Castro & Liskov (1999).
This list is not intended to be complete, sufficient, nor up to date. For learning the practical side of distributed systems, Henry Robinson (who I wish would resume blogging) has developed a useful starter guide, “Distributed systems theory for the distributed systems engineer” (updated as of 2018).
The papers can be grouped into categories that highlight their distinct contributions:
- Papers that prove fundamental limits on distributed algorithms: “Time, Clocks, and the Ordering of Events”, “Impossibility of Distributed Consensus”, and “Unreliable Failure Detectors”.
- Papers presenting fundamental algorithms: “Byzantine Generals Problem”, “Distributed Snapshots”, and “Part-Time Parliament”. Notably, every one of these was written or co-authored by Leslie Lamport, who was awarded the 2013 Turing Award.
- Papers presenting first implementations solving important problems in distributed systems: “Viewstamped Replication” and “Practical Byzantine Fault Tolerance”. I think it is significant that both of these were Ph.D. projects under the supervision of Barbara Liskov, who was awarded the 2008 Turing Award for these and other contributions.
In summary, during this first period, “distributed systems” referred to a community concerned with defining fundamental limits, discovering fundamental algorithms, and demonstrating the feasibility of these methods via first implementations.
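To give a concrete taste of the ideas from this first period, here is a minimal sketch of the logical clocks from “Time, Clocks, and the Ordering of Events in a Distributed System”. The names and structure (LamportClock, Tick, Send, Receive) are my own illustration rather than the paper’s notation: each process keeps a counter, increments it for every local event, stamps outgoing messages with it, and on receipt advances its counter past the incoming timestamp.

```go
package main

import "fmt"

// LamportClock is a minimal illustration of the logical clock from
// "Time, Clocks, and the Ordering of Events in a Distributed System" (1978).
// The names here are mine, not the paper's notation.
type LamportClock struct {
	time uint64
}

// Tick advances the clock for a local event and returns the new timestamp.
func (c *LamportClock) Tick() uint64 {
	c.time++
	return c.time
}

// Send stamps an outgoing message; sending is itself a local event.
func (c *LamportClock) Send() uint64 {
	return c.Tick()
}

// Receive merges the timestamp carried by an incoming message:
// take the maximum of the local and remote clocks, then tick.
func (c *LamportClock) Receive(remote uint64) uint64 {
	if remote > c.time {
		c.time = remote
	}
	return c.Tick()
}

func main() {
	var a, b LamportClock
	t1 := a.Send()       // process A sends a message stamped t1 = 1
	t2 := b.Receive(t1)  // process B receives it; its clock jumps past t1
	fmt.Println(t1 < t2) // true: timestamps respect the happened-before order
}
```

The guarantee is the one the paper formalizes: if one event happened before another, its timestamp is smaller; the converse, as the paper is careful to note, does not hold.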
The middle period (1995–2015): Engineering scalable systems
Where the “early period” lasted roughly 25 years, the “middle period” was shorter, roughly 20 years, with some overlap of the “early period”. The literature for this period is even more voluminous, and although I believe I have a good knowledge of the key papers, there is neither time nor space to provide anything close to a complete bibliography. I will instead list a short sample of papers to illustrate the history of the period’s development:
- “A Note on Distributed Computing”, Waldo, Wyant, Wollrath & Kendall (1994).
- “Towards Robust Distributed Systems”, Brewer (2000).
- “MapReduce: Simplified Data Processing on Large Clusters”, Dean & Ghemawat (2004).
- “Paxos Made Live”, Chandra, Griesemer & Redstone (2007).
- “Eventually Consistent”, Vogels (2008).
- “Designs, Lessons, and Advice from Building Large Distributed Systems”, Dean (2009).
- “The Tail at Scale”, Dean & Barroso (2013).
During this period, attention shifted towards engineering large-scale, failure-tolerant systems running in a global network of datacentres. The abstract for the Waldo et al. technical report sets the tone for the coming ten years:
These failures [to support basic requirements of robustness and reliability] have been masked in the past by the small size of the distributed systems that have been built. In the enterprise-wide distributed systems foreseen in the near future, however, such a masking will be impossible.
Eric Brewer’s 2000 talk, a keynote address to the “Principles of Distributed Computing” conference, emphasized that his company of the time, an early Web search service, was
- “Based on scalable cluster and parallel distributed computing technology”
- “But [made] very little use of classic DS [Distributed Systems] research”
The talk introduced the CAP “Theorem”, but twelve years later Brewer clarified that “its purpose … was to open the minds of designers to a wider range of systems and tradeoffs” rather than to further distributed systems theory.
The remaining papers sample the development of systems addressing the interconnected engineering challenges of
- failure tolerance,
- geolatency—the lag for round-trip communications across long distances,
- consistency,
- high scalability, and
- high availability.
The systems in this period were purpose-built by organizations in the grip of explosive growth. It is emblematic of the shift that T. D. Chandra, first author of the 1996 paper establishing the theory of weak failure detectors, is also the lead author of the 2007 paper on the challenges of implementing the Paxos distributed consensus algorithm in production. Theory into practice, indeed.
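As a small illustration of the engineering style of this period, “The Tail at Scale” popularized hedged requests: send a request to one replica and, if no reply arrives within a short delay (say, the observed 95th-percentile latency), send a backup request to a second replica and take whichever reply comes back first. Below is a minimal sketch; the fetch function and replica names are placeholders standing in for a real RPC layer.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// hedgedGet sends a request to replicas[0] and, if no reply arrives within
// `delay` (for example the observed 95th-percentile latency), sends a backup
// request to replicas[1]. Whichever reply arrives first wins. The fetch
// parameter stands in for a real RPC call and is purely illustrative.
func hedgedGet(ctx context.Context, replicas [2]string, delay time.Duration,
	fetch func(context.Context, string) (string, error)) (string, error) {

	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // abandon the slower request once one reply has been chosen

	type result struct {
		val string
		err error
	}
	// Buffered so the losing goroutine can deliver its result and exit.
	results := make(chan result, 2)

	ask := func(replica string) {
		v, err := fetch(ctx, replica)
		results <- result{v, err}
	}

	go ask(replicas[0]) // primary request

	select {
	case r := <-results: // fast path: the primary answered within the delay
		return r.val, r.err
	case <-time.After(delay): // slow path: hedge with a backup request
		go ask(replicas[1])
	case <-ctx.Done():
		return "", ctx.Err()
	}

	select {
	case r := <-results: // first of the two outstanding requests to answer
		return r.val, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	// A fake fetch in which one replica is a straggler.
	fetch := func(ctx context.Context, replica string) (string, error) {
		if replica == "replica-a" {
			time.Sleep(200 * time.Millisecond) // simulate a slow replica
		}
		return "reply from " + replica, nil
	}
	v, _ := hedgedGet(context.Background(), [2]string{"replica-a", "replica-b"},
		50*time.Millisecond, fetch)
	fmt.Println(v) // "reply from replica-b": the hedged request wins
}
```

Deferring the hedge until the 95th percentile means only the slowest few percent of requests are duplicated, so the extra load stays small while the tail latency drops sharply.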
The contemporary view (2015–present): Cowabunga Kubernetes!
The current phase is the inexorable next step, where commercial organizations package, distribute, and support generalized versions of the custom solutions from the previous phase. The experimentation of the early days of large-scale datacentres has settled into established practice, at least for some common use cases.
The tool with broadest reach, virtually ubiquitous in both actual datacentres and discussions about managing them, is Kubernetes. This third-generation orchestration tool, successor to Google’s Omega and Borg, manages applications organized around a similarly ubiquitous technology, containers. Whereas Kubernetes is the clear market winner to the point where it’s hard to even name an alternative, there are many contending container technologies, aimed at different but related levels of the underlying stack. As of this writing, the CNCF Landscape lists 12 container runtimes, including containerd, cri-o, Firecracker, gVisor, and kata. (By the way: Do their trademark staffs think they have to pay extra for capital letters?)
The choice of container runtime is mostly a matter for the platform and operations staff, as the distinctions between runtimes are near-invisible to application developers.
Contemporary discussions of “distributed systems” centre on the practices of architecting systems to run under Kubernetes, managing Kubernetes clusters, and cooperating with legacy systems that remain outside the cluster. Development work focuses on building out those capacities not well-addressed by the current release of Kubernetes and related tools.
It’s a Kubernetes world and we just have to get used to living in it.
Why does this matter?
What does this history suggest? On its face, it is a typical progression from theory to early production use to commoditization as a family of related products. But I think something has been lost in the process: an understanding of the inescapable limits mapped by the early theoretical papers. There is an easy tendency to focus on the technology, with all its details and constant change, and assume that it will in turn address those limits.
That assumption is false. The technology provides mechanisms for readily building, deploying, and operating such services, but the designs remain as bound as ever by the theoretical limits. That is what makes these limits fundamental, after all.
Engineers often cite Brewer’s CAP Principle as an example of theory that is widely applied. Whether we define CAP in terms of Brewer’s original formulation or the two theorems later proven by Gilbert and Lynch, I find CAP’s relationship to actual practice odd, and I will write a future post analyzing that (heavily influenced by Brewer’s 2012 reflection and Abadi’s PACELC framework). For this post, I will say only that I do not see CAP as a fundamental limit that must be explicitly addressed by designs. Rather, current engineering practice will lead the designer to accommodate CAP as a side benefit.
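For readers who have not seen the formal version, the better known of Gilbert and Lynch’s two results can be paraphrased roughly as follows (my wording, not theirs):

```latex
% A paraphrase of Gilbert & Lynch (2002), asynchronous network model.
\begin{theorem}[CAP, asynchronous case]
In an asynchronous network in which messages between nodes may be lost,
it is impossible to implement a read/write data object that guarantees both
availability (every request received by a non-failing node eventually gets a
response) and atomic consistency (linearizability) across all executions.
\end{theorem}
```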
On the other hand, I do see the Fischer, Lynch, and Paterson (1985) impossibility theorem as fundamental: on realistic distributed systems, the latency of any consensus algorithm is unbounded. Unfortunately, the theorem provides no constructive procedure to follow because it is proved by contradiction. This is typical of impossibility results, which essentially state that no matter how much you tweak your algorithm, it will never match your specification. The FLP result is fundamentally about reducing your expectations to realistic levels.
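For reference, the theorem itself can be paraphrased as follows (again my wording, not the paper’s exact formulation):

```latex
% A paraphrase of Fischer, Lynch & Paterson (1985).
\begin{theorem}[FLP impossibility]
In an asynchronous message-passing system in which messages may be delayed
arbitrarily but are eventually delivered, and in which at most one process
may crash, no deterministic protocol solves consensus: any protocol that
always satisfies agreement and validity has some admissible execution in
which it never decides.
\end{theorem}
```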
The FLP result constrains the entire system design. When implementing a system, it is seductive to see a given problem as a local issue, fixable by local changes to the algorithm. But if the underlying problem is that you are attempting to circumvent FLP, a local fix merely displaces the problem rather than solving it: it reappears elsewhere.
The abstraction and totality of this theorem make it easy to miss when attending to implementation details. Yet it cannot be ignored. In future posts, I want to work out how the limitations that the FLP theorem guarantees might appear in systems as localized problems, and describe how seeing those problems as aspects of FLP can produce better solutions.