Fairness semantics of PlusCal

Third in a series on using the TLA Toolbox to explore the failure tolerance of distributed algorithms. The second post presented some PlusCal conventions for writing distributed algorithms. The next post presents Lamport’s (1978) mutual exclusion algorithm in PlusCal.

Note: This post covers an esoteric aspect of PlusCal semantics. I include it here because I need to understand this topic myself before writing the rest of the series. It is not necessary to know this topic before reading the rest of the series, however. On first reading, I suggest moving directly to the next post, returning to this post only when you need these details.

The TLA translation of PlusCal

PlusCal isn’t the native language of the TLA Toolbox. As the name implies, TLA is the common notation used by the editor, the TLC model checker, and the TLAPS proof checker. In particular, an algorithm in PlusCal must be translated to TLA before it can be checked by TLC. When checking an algorithm, all of TLC’s reports, including errors, traces, and coverage, are presented in terms of the underlying TLA.

Consequently, you can only use TLC effectively if you understand how a PlusCal algorithm is represented in TLA. The full explanation is beyond the scope of this series. In short, you need to learn both PlusCal and TLA, then read the TLA produced by the translation of your PlusCal algorithms. The details take some time to learn but the translation is straightforward and can be mastered with modest practice.

However, there is one subtle point of PlusCal semantics that is not straightforward: Its representation of fairness. For many PlusCal use cases, you do not need to understand this. If you want to check termination or a liveness property in TLC, simply prefix each process with the fair keyword and every transition is asserted to be weakly fair. For typical uses of model checking, this is sufficient.

However, in this series, I will be exploring the consequences of algorithms for which only some transitions are fair. This models such behaviours as instances that crash while in a critical section. For these use cases, it helps to understand how the TLA produced by the translation of an algorithm represented in PlusCal differs from how that same algorithm might be represented in “classic” TLA.

A PlusCal algorithm and its TLA equivalent

As an example of the difference between the two notations, consider the following PlusCal algorithm. It doesn’t do anything particularly interesting, consisting of a loop with two alternatives. The loop increments a from 1 to 3 to 4, then terminates:

--algorithm PlusCalFairnessSemantics
variable
  a = 1;

fair process Proc = 1
begin
P1: while a <= 3 do
    either
      await a = 1;
        a := a + 2
    or
      await a = 3;
        a := a + 1
    end either
  end while
end process
end algorithm

The equivalent version of this algorithm in “classic” TLA would name each alternative and provide an explicit termination transition:

VARIABLES a

vars == << a >>

Init == a = 1

A1 == /\ a = 1
      /\ a' = a + 2

A3 == /\ a = 3
      /\ a' = a + 1

Tm == /\ a > 3
      /\ UNCHANGED vars
      
Next == A1 \/ A3 \/ Tm

Spec == Init /\ [][Next]_vars /\ WF_vars(Next)

Termination == <>([](a > 3))

In the TLA version, each state of a is represented by a separate named formula, Init, A1, A3, and Tm, comprising a guard that enables the action for a specific value of a and an equality specifying the new value replacing it. In this version, the algorithm is sequenced by the successive values of a.

Now consider the TLA equations produced by the translation of the PlusCal algorithm:

VARIABLES a, pc

vars == << a, pc >>

ProcSet == {1}

Init == (* Global variables *)
        /\ a = 1
        /\ pc = [self \in ProcSet |-> "P1"]

P1 == /\ pc[1] = "P1"
      /\ IF a <= 3
            THEN /\ \/ /\ a = 1
                       /\ a' = a + 2
                    \/ /\ a = 3
                       /\ a' = a + 1
                 /\ pc' = [pc EXCEPT ![1] = "P1"]
            ELSE /\ pc' = [pc EXCEPT ![1] = "Done"]
                 /\ a' = a

Proc == P1

Next ==
  \/ Proc
  (* Disjunct to prevent deadlock *)
  \/ /\ (\A self \in ProcSet: pc[self] = "Done")
     /\ UNCHANGED vars

Spec == /\ Init
        /\ [][Next]_vars
        /\ WF_vars(Proc)

Termination ==
    <>(\A self \in ProcSet: pc[self] = "Done")

This representation only has two named definitions, Init and P1 and introduces a variable not present in the PlusCal code, pc. Unlike the “classic” TLA version, in the PlusCal translation, the algorithm is sequenced by a variable, pc, whose only purpose is sequencing.

There is a second difference between the two TLA versions. In the PlusCal translation, the definition of P1 combines the formulas A1, A3, and Tm from the “classic” TLA version. Though their structure appears different, they are equivalent. After stripping the references to pc, the definition of P1 becomes

IF a <= 3
THEN \/ /\ a = 1
        /\ a' = a + 2
     \/ /\ a = 3
        /\ a' = a + 1

which, after removing the redundant test, is just

\/ /\ a = 1
   /\ a' = a + 2
\/ /\ a = 3
   /\ a' = a + 1

which in turn is simply A1 \/ A3 from the “classic” version.

So in this case, the PlusCal version is simply the “classic” version, annotated with sequencing by the pc variable.

Unfair transitions: When PlusCal differs from “classic”

Although in the above case “classic” TLA design and PlusCal produce equivalent TLA code that differs only in whether sequencing is done implicitly (in the “classic” version) or explicitly (in PlusCal), they are not equivalent in other cases. When we specify that some transitions are unfair, the forms become subtly different.

Suppose we want to indicate that the A3 transition is unfair. In the “classic” version, we do not change the transitions at all but simply change the specification so that only A1 and Tm are marked as fair:

Spec == /\ Init /\ [][Next]_vars
        /\ WF_vars(A1 \/ Tm)

The change to the PlusCal code is even smaller: Simply prefix the guard with a label suffixed by -:

L1:- await a = 3;
  a := a + 1

This small change to PlusCal however induces a substantive change in its TLA translation:

P1 == /\ pc[1] = "P1"
      /\ IF a <= 3
         THEN /\ \/ /\ a = 1
                    /\ a' = a + 2
                    /\ pc' = [pc EXCEPT ![1] = "P1"]
                 \/ /\ pc' = [pc EXCEPT ![1] = "L1"]
                    /\ a' = a
         ELSE /\ pc' = [pc EXCEPT ![1] = "Done"]
              /\ a' = a

L1 == /\ pc[1] = "L1"
      /\ a = 3
      /\ a' = a + 1
      /\ pc' = [pc EXCEPT ![1] = "P1"]

Proc == P1 \/ L1

Next ==
  \/ Proc
  (* Disjunct to prevent deadlock *)
  \/ /\ (\A self \in ProcSet: pc[self] = "Done")
     /\ UNCHANGED vars

Spec == /\ Init /\ [][Next]_vars
        /\ WF_vars((pc[1] # "L1") /\ Proc)

The code has changed in three ways:

  1. A new state has been introduced, denoted by pc = "L1".
  2. A new labelled formula has been created, L1, which is enabled by the new value of pc.
  3. The fairness condition now excludes the new state. Consequently, when transition L1 is enabled, it is not guaranteed to ever be executed.

Logically, this is equivalent to the “classic” TLA version. In both versions, the transition when a = 3 is not guaranteed to be taken even if it is enabled infinitely many times. However, they differ in their number of states: The PlusCal version introduces an extra state to mark the unfair transition, whereas the unfair TLA version has only as many states as its fair counterpart.

The size of the state space representing potential executions of the algorithm has a direct effect on the time required to model check it. This effect is amplified when checking liveness properties, which require extra passes over the state space after it has been completely generated. Although in this example, the space has only increased by a single state, if the label is placed within a frequently-executed loop, the increase could be substantial.

Conclusion

The TLA translation of a PlusCal algorithm differs from how one would represent the same algorithm in “classic” TLA. All fair transitions have their common conditions merged to create a single labelled transition, while any unfair transitions have a new state defined for pc and the transition defined as a separately-labelled transition.

Although this representation appears quite different from the distinct, focused definitions used in “classic” TLA, their logic is equivalent. They do perform differently in model checking, however. The new values of pc introduced for unfair transitions by the PlusCal translation increase the time required by TLC to check the model. Given that this choice is built in to the PlusCal translator, we cannot remove this problem but it is useful to be aware of its impact.