Lamport's 1978 mutual exclusion algorithm in PlusCal

Fourth in a series on using the TLA Toolbox to explore the failure tolerance of distributed algorithms. The previous post was a specialized discussion of the fairness semantics of PlusCal. The post preceding that was a general discussion of writing distributed algorithms in PlusCal. The next post presents the results of simulation runs on the algorithm.

The series will be organized around Lamport’s mutual exclusion algorithm, described in his 1978 paper. This algorithm is not so much a primary contribution of that paper as it is a demonstration of the use of the paper’s main contribution, logical clocks. The algorithm is not suited to production use, due to its assumption of perfect instances communicating over a perfect network. Yet these assumptions make it ideal for the purposes of this series, as we can use TLC to demonstrate how the algorithm fails when they are violated.

Stephen Merz contributed a TLA+ version of the algorithm to the TLA example library. The version here is derived from his, rewritten in PlusCal and slightly modified.

The algorithm, with the bodies of the defines and macros elided (they are provided in the Appendix), follows:

--------------------- MODULE LogicalClocks ------------------------

(*
  Lamport's mutual exclusion algorithm from "Time, Clocks, and the
  Ordering of Events in a Distributed System".
  
  The algorithm assumes perfect, in-order message transmission and
  reliable (i.e., non-Byzantine, non-crashing) processes.

  See also the formalization of the same algorithm by Stephan Merz:
  github.com/tlaplus/Examples/tree/master/specifications/lamport_mutex
*)

EXTENDS Naturals, Sequences, FiniteSets, TLC

CONSTANTS N,          \* Number of processes
          MaxClock    \* Maximum clock value

\* -------------------------- Types -------------------------------
\* Process identifier
Pid == 1 .. N
(*
  Range of acceptable clock values
  One more than MaxClock because receiving a message with timestamp
  MaxClock can push a process's clock to MaxClock+1.
*)
ClockVal == 0 .. MaxClock+1
\* A message (with three subtypes)
Message == [time: ClockVal, type: {"Request", "Release", "AckReq"}]
\* The 1-1 communication channels between every process
Channel == [src \in Pid |-> [dst \in Pid |-> Seq(Message)]]

(******************************************************************
--algorithm LogicalClocks
variables
  (*--------------------------- Network model----------------------
    Messages in transit between processes.  Messages are guaranteed
    to be delivered in the order in which they were sent.
  *)
  channel = [source \in Pid |-> [destination \in Pid |-> <<>>]],

  (*--------------------------System Model-------------------------
    All processes currently in the critical section
  *)
  crit = {}

\*-----------------------------Operators---------------------------

define
  \* Total order of clocks using Pid to break ties
  LogClockLt(reqs, p, q) == \* ...
  \* All processes whose head msg to dst is of this type
  ChanHead(dst, type) == \* ...
  \* Max of two times
  Max(a, b) == \* ...
end define

\*-----------------------------Network operations------------------

\* Wait for a `type` message and return its contents
macro Receive(type, clock, src, time) begin
\* ...
end macro

\* Broadcast a message
macro Broadcast(clock, msg) begin
\* ...
end macro

\* Send a message to a specific process
macro SendTo(clock, dst, msg) begin
\* ...
end macro

\*-----------------------------System model operations------------

macro EnterCritSec() begin
  crit := crit \union {self}
end macro

macro ExitCritSec() begin
  crit := crit \ {self}
end macro

\*-------------------------------Algorithm------------------------

process Proc \in Pid
variables
  \* --- Algorithm state ---
  \* Logical clock
  clock = 1,
  (*
    The processes that have acknowledged this process's request,
    if it has one
  *)
  acks = {},
  \* Request times this process has received (including its own)
  requests = [pid \in Pid |-> 0],
  \* --- Instrumentation ---
  \* Number of times this process has entered critical section
  ownerships = 0,
  \*--- Temporary variables---untyped, uninitialized
  time,
  src

begin \* process
  loop: while TRUE do
     either
     \* --- Manage this process's entry to crit sect ---
        \* Request ownership
        when requests[self] = 0;
          Broadcast(clock, [time |-> clock, type |-> "Request"]);
          requests := [requests EXCEPT ![self] = clock];
          acks := {self}
     or \* Receive acknowledgement of this process's request
        Receive("AckReq", clock, src, time);
          clock := Max(clock, time);
          acks := acks \union {src}          
     or \* If at head of request queue, take ownership
        when /\ self \notin crit
             /\ acks = Pid
             /\ \A p \in Pid: p # self =>
                                  LogClockLt(requests, self, p);
          EnterCritSec();
          ownerships := ownerships + 1
     or \* Release ownership
        when self \in crit;
          requests := [requests EXCEPT ![self] = 0];
          ExitCritSec();
          acks := {};
          Broadcast(clock, [time |-> clock, type |-> "Release"])

     \* --- Record other process' requests to enter crit sect ---
     or \* Receive a request and acknowledge receipt
        Receive("Request", clock, src, time);
          requests := [requests EXCEPT ![src] = time];
          clock := Max(clock, time);
          (*
            increment clock in msg to reflect addition that will
            occur at loop's end
          *)
      L2: SendTo(clock, src, [time |-> clock+1, type |-> "AckReq"])
     or \* Receive a release
        Receive("Release", clock, src, time);
          clock := Max(clock, time);
          requests := [requests EXCEPT ![src] = 0];
     end either;
     tic: clock := clock + 1
   end while;
end process

end algorithm

The algorithm is written based on conventions for PlusCal I have developed:

The first two were detailed in an earlier post. I will describe the last two in more detail next.

Conventional layout for every alternative

Lamport’s algorithm follows a common structure for distributed algorithms: An infinite loop whose body is a list of alternatives, with each alternative comprising a guard followed by updates to process variables and messages. To highlight the dependency of the updates on the guards, I indent the updates a further two spaces from the guards.

The guards built from when statements are pure expressions that do not modify any state. On the other hand, guards using the network Receive macro both wait on a condition and change the global network state when that condition is satisfied, removing the frontmost message to the process and returning the id of its sender and its sending time in the local logical clock as process local variables src and time, respectively.

Categorization of variables according to function

My last PlusCal convention is to categorize the global and local variables according to distinct functions. I find this makes the program more readable and I use the language more effectively.

For global variables:

Network model
Global variables representing the state of the network.
System model
Variables that model the state of the system as a whole, apart from the network. These variables are often used to represent parts of system state that are implied elsewhere but not directly represented. Representing these elements of the state as system model variables allows the analyst to write invariants and behaviours that check them.

For example, the global crit is added to the Lamport algorithm to indicate the set of all processes in their critical section. In the formal algorithm, this variable is not used but is only implied. Making this implied state an explicit global variable allows the analyst to represent the important safety property that at most one process is ever in the critical section by the simple invariant Cardinality(crit) < 2.

For local variables:

Algorithm state
The state of a distributed algorithm is represented in local variables (by definition, a distributed algorithm cannot rely on global state).
Instrumentation
Variables used to analyze and debug the algorithm. In this algorithm, ownerships counts the number of times a given process has entered the critical section.
Temporary variables
Variables local to individual actions, recording temporary state rather than the state of the algorithm.

A PlusCal program also has runtime variables, implied in the source code and made explicit by the PlusCal translator in the resulting TLA code:

pc
The program counter, indicating the currently-enabled state. This is present in the translation of any PlusCal program.
stack
The stack of return states and local variable values. This is only present in translations of PlusCal programs that include one or more procedures.

You can use the distinction between algorithm variables versus instrumentation and temporary variables to accelerate TLC model checking using a view. You only need the variables essential to the system and algorithm’s state, together with any runtime variables inserted by the PlusCal translator; by not including the instrumentation or temporaries in the view, you reduce the size of the state space the checker must explore. For this algorithm, the view definition includes the network model channel; the system model crit; the algorithm state clock, acks, and requests; and the runtime variable pc:

\* Essential variables to be monitored by TLC
View == <<channel, crit, clock, acks, requests, pc>>

The algorithm

The basics of the algorithm are described in Lamport’s paper, so I will not repeat them here. The basic structure is an infinite loop with an either selecting between six alternatives. I have structured this version so that the first four alternatives deal with the successive stages of a process using the critical section (request ownership, receive an acknowledgement that another process has received that request, take ownership when it’s this process’s turn, release ownership), followed by the two alternatives for processing requests from other processes (acknowledge receipt of a request, record an ownership release).

All of the alternatives but one are a single, atomic action. The lone exception is the action to record a request from another process. Because that action requires both receipt of the request message and sending of an acknowledgement message, the channel variable in the network model is updated twice. Consecutive assignments to the same variable have to occur in different steps, so the statement sending the reply must be prefixed by a label, L2.

The extra label is an artifact of my use of macros. If the step were written instead using basic PlusCal, the two updates to channel could be merged and the step would become atomic. I think the macros substantially improve readability of the code, readability that is not worth giving up for modest gains in speed of model checking from removing the extra state. Furthermore, as the series proceeds I will take advantage of the macros to insert failure testing without modifying the text of the algorithm.

Instrumentation

In addition to the variables essential for the process, this version has some modest instrumentation. The Print macro displays the current process’s state on the TLC console. The PrintDebug TLA operator specified in the define statement controls whether the macro actually prints. The default is not to print but it can be overridden in the TLC console for specific runs.

The process also includes a variable ownerships that records the number of times this process has entered the critical section. This can be output via the Print macro, although none of the current macro calls include the variable.

Assumptions

The algorithm assumes an idealized system, in which processes never fail and all messages are delivered in order. Although these assumptions are fine for its intended use in the original paper, where it served as an example of the style of reasohing about distributed clocks that Lamport presented, these assumptions make it of only the most limited practical use. I cannot imagine many actual message-passing environments where no process will ever fail and message delivery is perfect. Perhaps some of the MPI-based systems used in the High Performance Computing community are that reliable but in the community building high-throughput distributed systems for modern datacentres, designers are taught to expect processes to fail and networks to be unreliable. I do not see how any new layer, whether below or above the Lamport algorithm, could allow the algorithm to maintain safety in the presence of these errors.

Testing the algorithm’s behaviour

The next step is to run the algorithm in the TLC model checker. With that baseline established, I will begin running the algorithm in environments where the assumptions of perfect processes and networks are relaxed. The ways that the algorithm breaks will give insights on how the algorithm works.

Appendix: Defines and macros

Bodies of the defines and macros:

define
  \* Total order of clocks using Pid to break ties
  LogClockLt(reqs, p, q) ==
    \/ reqs[q] = 0
    \/ reqs[p] < reqs[q]
    \/ reqs[p] = reqs[q] /\ p < q
  \* All processes whose head msg to dst is of this type
  ChanHead(dst, type) ==
    {src \in Pid: /\ Len(channel[src][dst]) > 0
                  /\ Head(channel[src][dst]).type = type
    }

  \* Max of two times   
  Max(a, b) == IF a <= b THEN b ELSE a

  \* Instrumentation control
  PrintDebug == FALSE
end define

\*---------------------------Instrumentation Macros------------------
\* Print if debugging turned on
macro Print(clock, vars) begin
  if PrintDebug then
    print <<self, clock, vars>>
  end if
end macro

\*-----------------------------Network operations--------------------

\* Wait for a `type` message and return its contents
macro Receive(type, clock, src, time) begin
  with s \in ChanHead(self, type) do
    src := s;
    time := Head(channel[src][self]).time;
    Print(clock, << <<"Receiving", type>>, src, time>>);
    channel :=
	  [channel EXCEPT ![src][self] = Tail(channel[src][self])]
  end with
end macro

\* Broadcast a message
macro Broadcast(clock, msg) begin
  Print(clock, <<"Broadcasting", msg.type>>);
  channel :=
    [channel EXCEPT ![self] =
      [dst \in Pid |->
        IF dst = self THEN channel[self][self]
                      ELSE Append(channel[self][dst], msg)]]
end macro

\* Send a message to a specific process
macro SendTo(clock, dst, msg) begin
  Print(clock, <<"Sending", dst, msg.type>>);
  channel :=
    [channel EXCEPT ![self][dst] = Append(channel[self][dst], msg)]
end macro