Lamport's 1978 mutual exclusion algorithm in PlusCal
15 Apr 2019
Tags: Distributed systems

Fourth in a series on using the TLA Toolbox to explore the failure tolerance of distributed algorithms. The previous post was a specialized discussion of the fairness semantics of PlusCal. The post preceding that was a general discussion of writing distributed algorithms in PlusCal. The next post presents the results of simulation runs on the algorithm.
The series will be organized around Lamport’s mutual exclusion algorithm, described in his 1978 paper. This algorithm is not so much a primary contribution of that paper as it is a demonstration of the use of the paper’s main contribution, logical clocks. The algorithm is not suited to production use, due to its assumption of perfect processes communicating over a perfect network. Yet these assumptions make it ideal for the purposes of this series, as we can use TLC to demonstrate how the algorithm fails when they are violated.
Stephan Merz contributed a TLA+ version of the algorithm to the TLA+ examples repository. The version here is derived from his, rewritten in PlusCal and slightly modified.
The algorithm, with the bodies of the define block and the macros elided (they are provided in the Appendix), follows:
--------------------- MODULE LogicalClocks ------------------------
(*
Lamport's mutual exclusion algorithm from "Time, Clocks, and the
Ordering of Events in a Distributed System".
The algorithm assumes perfect, in-order message transmission and
reliable (i.e., non-Byzantine, non-crashing) processes.
See also the formalization of the same algorithm by Stephan Merz:
github.com/tlaplus/Examples/tree/master/specifications/lamport_mutex
*)
EXTENDS Naturals, Sequences, FiniteSets, TLC
CONSTANTS N, \* Number of processes
MaxClock \* Maximum clock value
\* -------------------------- Types -------------------------------
\* Process identifier
Pid == 1 .. N
(*
Range of acceptable clock values
One more than MaxClock because receiving a message with timestamp
MaxClock can push a process's clock to MaxClock+1.
*)
ClockVal == 0 .. MaxClock+1
\* A message (with three subtypes)
Message == [time: ClockVal, type: {"Request", "Release", "AckReq"}]
\* The possible states of the 1-1 communication channels between every pair of processes
Channel == [Pid -> [Pid -> Seq(Message)]]
(******************************************************************
--algorithm LogicalClocks
variables
(*--------------------------- Network model----------------------
Messages in transit between processes. Messages are guaranteed
to be delivered in the order in which they were sent.
*)
channel = [source \in Pid |-> [destination \in Pid |-> <<>>]],
(*--------------------------System Model-------------------------
All processes currently in the critical section
*)
crit = {}
\*-----------------------------Operators---------------------------
define
\* Total order of clocks using Pid to break ties
LogClockLt(reqs, p, q) == \* ...
\* All processes whose head msg to dst is of this type
ChanHead(dst, type) == \* ...
\* Max of two times
Max(a, b) == \* ...
end define
\*-----------------------------Network operations------------------
\* Wait for a `type` message and return its contents
macro Receive(type, clock, src, time) begin
\* ...
end macro
\* Broadcast a message
macro Broadcast(clock, msg) begin
\* ...
end macro
\* Send a message to a specific process
macro SendTo(clock, dst, msg) begin
\* ...
end macro
\*-----------------------------System model operations------------
macro EnterCritSec() begin
crit := crit \union {self}
end macro
macro ExitCritSec() begin
crit := crit \ {self}
end macro
\*-------------------------------Algorithm------------------------
process Proc \in Pid
variables
\* --- Algorithm state ---
\* Logical clock
clock = 1,
(*
The processes that have acknowledged this process's request,
if it has one
*)
acks = {},
\* Request times this process has received (including its own)
requests = [pid \in Pid |-> 0],
\* --- Instrumentation ---
\* Number of times this process has entered critical section
ownerships = 0,
\*--- Temporary variables---untyped, uninitialized
time,
src
begin \* process
loop: while TRUE do
either
\* --- Manage this process's entry to crit sect ---
\* Request ownership
when requests[self] = 0;
Broadcast(clock, [time |-> clock, type |-> "Request"]);
requests := [requests EXCEPT ![self] = clock];
acks := {self}
or \* Receive acknowledgement of this process's request
Receive("AckReq", clock, src, time);
clock := Max(clock, time);
acks := acks \union {src}
or \* If at head of request queue, take ownership
when /\ self \notin crit
/\ acks = Pid
/\ \A p \in Pid: p # self =>
LogClockLt(requests, self, p);
EnterCritSec();
ownerships := ownerships + 1
or \* Release ownership
when self \in crit;
requests := [requests EXCEPT ![self] = 0];
ExitCritSec();
acks := {};
Broadcast(clock, [time |-> clock, type |-> "Release"])
\* --- Record other process' requests to enter crit sect ---
or \* Receive a request and acknowledge receipt
Receive("Request", clock, src, time);
requests := [requests EXCEPT ![src] = time];
clock := Max(clock, time);
(*
increment clock in msg to reflect addition that will
occur at loop's end
*)
L2: SendTo(clock, src, [time |-> clock+1, type |-> "AckReq"])
or \* Receive a release
Receive("Release", clock, src, time);
clock := Max(clock, time);
requests := [requests EXCEPT ![src] = 0];
end either;
tic: clock := clock + 1
end while;
end process
end algorithm
The algorithm is written based on conventions for PlusCal I have developed:
- Global variables are never accessed directly but only through macros
- Complex, regularly-occurring TLA+ expressions are given named definitions in the define block
- Conventional layout for every alternative
- Categorization of variables by function
The first two were detailed in an earlier post. I will describe the last two in more detail next.
Conventional layout for every alternative
Lamport’s algorithm follows a common structure for distributed algorithms: An infinite loop whose body is a list of alternatives, with each alternative comprising a guard followed by updates to process variables and messages. To highlight the dependency of the updates on the guards, I indent the updates a further two spaces from the guards.
The guards built from when statements are pure expressions that do not modify any state. On the other hand, guards using the network Receive macro both wait on a condition and change the global network state when that condition is satisfied, removing the frontmost message to the process and returning the id of its sender and the logical-clock time at which it was sent in the process-local variables src and time, respectively.
Categorization of variables according to function
My last PlusCal convention is to categorize the global and local variables according to their distinct functions. I find this makes the program more readable and helps me use the language more effectively.
For global variables:
- Network model: Global variables representing the state of the network.
- System model: Variables that model the state of the system as a whole, apart from the network. These variables are often used to represent parts of system state that are implied elsewhere but not directly represented. Representing these elements of the state as system model variables allows the analyst to write invariants and behaviours that check them.
For example, the global crit is added to the Lamport algorithm to indicate the set of all processes currently in their critical section. In the original algorithm, this variable is never used but only implied. Making this implied state an explicit global variable allows the analyst to represent the important safety property that at most one process is ever in the critical section by the simple invariant Cardinality(crit) < 2.
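For instance, the property can be stated as a named definition placed after the PlusCal translation and checked by TLC as an invariant (the name MutualExclusion is my choice, not part of the original module):
\* At most one process is ever in the critical section
MutualExclusion == Cardinality(crit) < 2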
For local variables:
- Algorithm state: The state of a distributed algorithm is represented in local variables (by definition, a distributed algorithm cannot rely on global state).
- Instrumentation: Variables used to analyze and debug the algorithm. In this algorithm, ownerships counts the number of times a given process has entered the critical section.
- Temporary variables: Variables local to individual actions, recording temporary state rather than the state of the algorithm.
A PlusCal program also has runtime variables, implied in the source code and made explicit by the PlusCal translator in the resulting TLA+ code:
- pc: The program counter, indicating the currently-enabled state. This is present in the translation of any PlusCal program (a rough sketch of how it appears in a translated action follows this list).
- stack: The stack of return states and local variable values. This is only present in translations of PlusCal programs that include one or more procedures.
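As an illustration, the labelled statement tic in this algorithm is translated into an action that is guarded by pc and updates it. The following is only a sketch of the shape of that action, not the translator's literal output:
\* Sketch of the action generated for label "tic" (illustrative only)
tic(self) == /\ pc[self] = "tic"
             /\ clock' = [clock EXCEPT ![self] = clock[self] + 1]
             /\ pc' = [pc EXCEPT ![self] = "loop"]
             /\ UNCHANGED <<channel, crit, acks, requests,
                            ownerships, time, src>>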
You can use the distinction between algorithm variables on the one hand and instrumentation and temporary variables on the other to accelerate TLC model checking using a view. The view only needs the variables essential to the system and algorithm’s state, together with any runtime variables inserted by the PlusCal translator; by not including the instrumentation or temporaries in the view, you reduce the size of the state space the checker must explore. For this algorithm, the view definition includes the network model channel; the system model crit; the algorithm state clock, acks, and requests; and the runtime variable pc:
\* Essential variables to be monitored by TLC
View == <<channel, crit, clock, acks, requests, pc>>
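The view is then supplied to TLC. In a plain TLC configuration file this is a single line (the Toolbox has a corresponding field in the model’s advanced TLC options):
VIEW View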
The algorithm
The basics of the algorithm are described in Lamport’s paper, so I
will not repeat them here. The basic structure is an infinite loop
with an either
selecting between six alternatives. I have structured
this version so that the first four alternatives deal with the
successive stages of a process using the critical section (request
ownership, receive an acknowledgement that another process has
received that request, take ownership when it’s this process’s turn,
release ownership), followed by the two alternatives for processing
requests from other processes (acknowledge receipt of a request,
record an ownership release).
All of the alternatives but one are single, atomic actions. The lone exception is the action that records a request from another process. Because that action requires both receipt of the request message and sending of an acknowledgement message, the channel variable in the network model is updated twice. Consecutive assignments to the same variable have to occur in different steps, so the statement sending the reply must be prefixed by a label, L2.
The extra label is an artifact of my use of macros. If the step were instead written in basic PlusCal, the two updates to channel could be merged and the step would become atomic. I think the macros substantially improve the readability of the code, and that readability is not worth giving up for the modest gain in model-checking speed that removing the extra state would bring. Furthermore, as the series proceeds I will take advantage of the macros to insert failure testing without modifying the text of the algorithm.
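For the curious, a merged version of that alternative might look roughly like the following sketch, in which the dequeue of the request and the enqueue of the acknowledgement are folded into a single assignment to channel. This is illustrative only; it omits the Print call and is not the version used in the rest of the series:
or \* Receive a request and acknowledge receipt, in one atomic step
    with s \in ChanHead(self, "Request") do
        src := s;
        time := Head(channel[src][self]).time;
        requests := [requests EXCEPT ![src] = time];
        clock := Max(clock, time);
        \* One assignment: remove the request and append the acknowledgement
        channel := [channel EXCEPT
            ![src][self] = Tail(channel[src][self]),
            ![self][src] = Append(channel[self][src],
                                  [time |-> clock+1, type |-> "AckReq"])]
    end with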
Instrumentation
In addition to the variables essential for the process,
this version has some modest instrumentation. The Print
macro
displays the current process’s state on the TLC console. The
PrintDebug
TLA operator specified in the define
statement controls
whether the macro actually prints. The default is not to print but it
can be overridden in the TLC console for specific runs.
The process also includes a variable ownerships
that records the number
of times this process has entered the critical section. This can be
output via the Print
macro, although none of the current macro calls
include the variable.
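For example, a call such as the following, which does not appear in the current text, would include it in the debug output:
Print(clock, <<"ownerships", ownerships>>)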
Assumptions
The algorithm assumes an idealized system, in which processes never fail and all messages are delivered in order. Although these assumptions are fine for its intended use in the original paper, where the algorithm served as an example of the style of reasoning about logical clocks that Lamport presented, they make it of only the most limited practical use. I cannot imagine many actual message-passing environments where no process will ever fail and message delivery is perfect. Perhaps some of the MPI-based systems used in the High Performance Computing community are that reliable, but in the community building high-throughput distributed systems for modern datacentres, designers are taught to expect processes to fail and networks to be unreliable. Nor do I see how any new layer, whether below or above the Lamport algorithm, could allow the algorithm to maintain safety in the presence of these failures.
Testing the algorithm’s behaviour
The next step is to run the algorithm in the TLC model checker to establish a baseline of its behaviour. With that baseline established, I will begin running the algorithm in environments where the assumptions of perfect processes and networks are relaxed. The ways in which the algorithm breaks will give insight into how it works.
Appendix: Defines and macros
Bodies of the define block and the macros:
define
\* Total order of clocks using Pid to break ties
LogClockLt(reqs, p, q) ==
\/ reqs[q] = 0
\/ reqs[p] < reqs[q]
\/ reqs[p] = reqs[q] /\ p < q
\* All processes whose head msg to dst is of this type
ChanHead(dst, type) ==
{src \in Pid: /\ Len(channel[src][dst]) > 0
/\ Head(channel[src][dst]).type = type
}
\* Max of two times
Max(a, b) == IF a <= b THEN b ELSE a
\* Instrumentation control
PrintDebug == FALSE
end define
\*---------------------------Instrumentation Macros------------------
\* Print if debugging turned on
macro Print(clock, vars) begin
if PrintDebug then
print <<self, clock, vars>>
end if
end macro
\*-----------------------------Network operations--------------------
\* Wait for a `type` message and return its contents
macro Receive(type, clock, src, time) begin
with s \in ChanHead(self, type) do
src := s;
time := Head(channel[src][self]).time;
Print(clock, << <<"Receiving", type>>, src, time>>);
channel :=
[channel EXCEPT ![src][self] = Tail(channel[src][self])]
end with
end macro
\* Broadcast a message
macro Broadcast(clock, msg) begin
Print(clock, <<"Broadcasting", msg.type>>);
channel :=
[channel EXCEPT ![self] =
[dst \in Pid |->
IF dst = self THEN channel[self][self]
ELSE Append(channel[self][dst], msg)]]
end macro
\* Send a message to a specific process
macro SendTo(clock, dst, msg) begin
Print(clock, <<"Sending", dst, msg.type>>);
channel :=
[channel EXCEPT ![self][dst] = Append(channel[self][dst], msg)]
end macro