Fairness semantics of PlusCal
18 Jan 2019 Tags: Distributed systemsThird in a series on using the TLA Toolbox to explore the failure tolerance of distributed algorithms. The second post presented some PlusCal conventions for writing distributed algorithms. The next post presents Lamport’s (1978) mutual exclusion algorithm in PlusCal.
Note: This post covers an esoteric aspect of PlusCal semantics. I include it here because I need to understand this topic myself before writing the rest of the series. It is not necessary to know this topic before reading the rest of the series, however. On first reading, I suggest moving directly to the next post, returning to this post only when you need these details.
The TLA translation of PlusCal
PlusCal isn’t the native language of the TLA Toolbox. As the name implies, TLA is the common notation used by the editor, the TLC model checker, and the TLAPS proof checker. In particular, an algorithm in PlusCal must be translated to TLA before it can be checked by TLC. When checking an algorithm, all of TLC’s reports, including errors, traces, and coverage, are presented in terms of the underlying TLA.
Consequently, you can only use TLC effectively if you understand how a PlusCal algorithm is represented in TLA. The full explanation is beyond the scope of this series. In short, you need to learn both PlusCal and TLA, then read the TLA produced by the translation of your PlusCal algorithms. The details take some time to learn but the translation is straightforward and can be mastered with modest practice.
However, there is one subtle point of PlusCal semantics that is not
straightforward: Its representation of fairness. For many PlusCal use
cases, you do not need to understand this. If you want to check
termination or a liveness property in TLC, simply prefix each process
with the fair
keyword and every transition is asserted to be weakly
fair. For typical uses of model checking, this is sufficient.
However, in this series, I will be exploring the consequences of algorithms for which only some transitions are fair. This models such behaviours as instances that crash while in a critical section. For these use cases, it helps to understand how the TLA produced by the translation of an algorithm represented in PlusCal differs from how that same algorithm might be represented in “classic” TLA.
A PlusCal algorithm and its TLA equivalent
As an example of the difference between the two notations, consider
the following PlusCal algorithm. It doesn’t do anything particularly
interesting, consisting of a loop with two alternatives. The loop
increments a
from 1 to 3 to 4, then terminates:
--algorithm PlusCalFairnessSemantics
variable
a = 1;
fair process Proc = 1
begin
P1: while a <= 3 do
either
await a = 1;
a := a + 2
or
await a = 3;
a := a + 1
end either
end while
end process
end algorithm
The equivalent version of this algorithm in “classic” TLA would name each alternative and provide an explicit termination transition:
VARIABLES a
vars == << a >>
Init == a = 1
A1 == /\ a = 1
/\ a' = a + 2
A3 == /\ a = 3
/\ a' = a + 1
Tm == /\ a > 3
/\ UNCHANGED vars
Next == A1 \/ A3 \/ Tm
Spec == Init /\ [][Next]_vars /\ WF_vars(Next)
Termination == <>([](a > 3))
In the TLA version, each state of a
is represented by a separate
named formula, Init
, A1
, A3
, and Tm
, comprising a guard that
enables the action for a specific value of a
and an equality
specifying the new value replacing it. In this version, the
algorithm is sequenced by the successive values of a
.
Now consider the TLA equations produced by the translation of the PlusCal algorithm:
VARIABLES a, pc
vars == << a, pc >>
ProcSet == {1}
Init == (* Global variables *)
/\ a = 1
/\ pc = [self \in ProcSet |-> "P1"]
P1 == /\ pc[1] = "P1"
/\ IF a <= 3
THEN /\ \/ /\ a = 1
/\ a' = a + 2
\/ /\ a = 3
/\ a' = a + 1
/\ pc' = [pc EXCEPT ![1] = "P1"]
ELSE /\ pc' = [pc EXCEPT ![1] = "Done"]
/\ a' = a
Proc == P1
Next ==
\/ Proc
(* Disjunct to prevent deadlock *)
\/ /\ (\A self \in ProcSet: pc[self] = "Done")
/\ UNCHANGED vars
Spec == /\ Init
/\ [][Next]_vars
/\ WF_vars(Proc)
Termination ==
<>(\A self \in ProcSet: pc[self] = "Done")
This representation only has two named definitions, Init
and P1
and introduces a variable not present in the PlusCal code, pc
.
Unlike the “classic” TLA version, in the PlusCal translation, the
algorithm is sequenced by a variable, pc
, whose only purpose is
sequencing.
There is a second difference between the two TLA versions. In the
PlusCal translation, the definition of P1
combines the formulas A1
,
A3
, and Tm
from the “classic” TLA version. Though their structure
appears different, they are equivalent. After stripping the references to
pc
, the definition of P1
becomes
IF a <= 3
THEN \/ /\ a = 1
/\ a' = a + 2
\/ /\ a = 3
/\ a' = a + 1
which, after removing the redundant test, is just
\/ /\ a = 1
/\ a' = a + 2
\/ /\ a = 3
/\ a' = a + 1
which in turn is simply A1 \/ A3
from the “classic” version.
So in this case, the PlusCal version is simply the “classic” version,
annotated with sequencing by the pc
variable.
Unfair transitions: When PlusCal differs from “classic”
Although in the above case “classic” TLA design and PlusCal produce equivalent TLA code that differs only in whether sequencing is done implicitly (in the “classic” version) or explicitly (in PlusCal), they are not equivalent in other cases. When we specify that some transitions are unfair, the forms become subtly different.
Suppose we want to indicate that the A3
transition is unfair. In the
“classic” version, we do not change the transitions at all but simply
change the specification so that only A1
and Tm
are marked as fair:
Spec == /\ Init /\ [][Next]_vars
/\ WF_vars(A1 \/ Tm)
The change to the PlusCal code is even smaller: Simply prefix the
guard with a label suffixed by -
:
L1:- await a = 3;
a := a + 1
This small change to PlusCal however induces a substantive change in its TLA translation:
P1 == /\ pc[1] = "P1"
/\ IF a <= 3
THEN /\ \/ /\ a = 1
/\ a' = a + 2
/\ pc' = [pc EXCEPT ![1] = "P1"]
\/ /\ pc' = [pc EXCEPT ![1] = "L1"]
/\ a' = a
ELSE /\ pc' = [pc EXCEPT ![1] = "Done"]
/\ a' = a
L1 == /\ pc[1] = "L1"
/\ a = 3
/\ a' = a + 1
/\ pc' = [pc EXCEPT ![1] = "P1"]
Proc == P1 \/ L1
Next ==
\/ Proc
(* Disjunct to prevent deadlock *)
\/ /\ (\A self \in ProcSet: pc[self] = "Done")
/\ UNCHANGED vars
Spec == /\ Init /\ [][Next]_vars
/\ WF_vars((pc[1] # "L1") /\ Proc)
The code has changed in three ways:
- A new state has been introduced, denoted by
pc = "L1"
. - A new labelled formula has been created,
L1
, which is enabled by the new value ofpc
. - The fairness condition now excludes the new state. Consequently,
when transition
L1
is enabled, it is not guaranteed to ever be executed.
Logically, this is equivalent to the “classic” TLA version. In both
versions, the transition when a = 3
is not guaranteed to be taken
even if it is enabled infinitely many times. However, they differ in
their number of states: The PlusCal version introduces an extra state
to mark the unfair transition, whereas the unfair TLA version has only
as many states as its fair counterpart.
The size of the state space representing potential executions of the algorithm has a direct effect on the time required to model check it. This effect is amplified when checking liveness properties, which require extra passes over the state space after it has been completely generated. Although in this example, the space has only increased by a single state, if the label is placed within a frequently-executed loop, the increase could be substantial.
Conclusion
The TLA translation of a PlusCal algorithm differs from how one would
represent the same algorithm in “classic” TLA. All fair transitions have
their common conditions merged to create a single labelled transition,
while any unfair transitions have a new state defined for pc
and the
transition defined as a separately-labelled transition.
Although this representation appears quite different from the
distinct, focused definitions used in “classic” TLA, their logic is
equivalent. They do perform differently in model checking, however.
The new values of pc
introduced for unfair transitions by the
PlusCal translation increase the time required by TLC to check the
model. Given that this choice is built in to the PlusCal translator,
we cannot remove this problem but it is useful to be aware of its
impact.