Simulating&nbsp;Tenderbake

If you’re impatient, you are welcome to read the guide and jump right to the simulator now. See also the implemented algorithms below.

If you find that reading about it helps you get excited before using your simulator, then read on …

Background

The consensus algorithm is a crucial part of any blockchain project. Because of the distributed nature of blockchains, different nodes can have different ideas of what the current state of the blockchain is supposed to be. The role of the consensus algorithm is to decide which of these possible states, called forks or branches, will be selected globally. In the world of distributed systems, there are two distinctive families of consensus algorithms: Nakamoto-style and classical BFT-style. Most blockchain solutions use Nakamoto-style algorithms that allow the existence of any number of forks of any length, but make longer forks increasingly unstable, so that they eventually collapse to a single branch. We say that these algorithms have probabilistic finality. Classical Byzantine fault tolerance (classical BFT) algorithms refer to a class of consensus algorithms which offer deterministic finality over asynchronous networks (you can check these two classical papers for more details The Byzantine Generals Problem and Practical Byzantine Fault Tolerance). They stipulate definite conditions that must be fulfilled for a block to become final.

Nomadic Labs intends to propose Tenderbake (blog post; paper) — a classical BFT-style algorithm — as the next consensus algorithm of Tezos. Its deterministic finality allows us to make solid claims about the period of time that should pass for a transaction to become final. In Tenderbake, a block becomes final when there are two blocks on top of it; this is the only condition. So, if the system produces one block per 15 seconds, a transaction will become final in about 30 seconds. This is the kind of performance that users expect from a successful blockchain solution.

Historically, Tenderbake started as a variant of Tendermint and was subsequently adapted to fit into the existing Tezos system. However, the description in the Tenderbake paper and its nascent implementation in the Tezos codebase started to drift apart. Thus, for a person starting working on Tenderbake, reading the paper is not enough. It is also not ideal to study the algorithm by reading the Tezos code, since the consensus algorithm is by design implemented in close cooperation with other components of the system.

This is where the Tenderbake simulator project steps in.

The simulation framework

Nomadic Labs asked Tweag to develop a framework that would be general enough to model any consensus algorithm (be it BTF-style, Nakamoto-style or in styles yet to be invented) in a clear way to facilitate onboarding of newcomers and for exploration of consensus algorithms in the future. The results of our work can be found in this repository. The language of choice at Nomadic Labs is OCaml and it made sense to use it for this project. The simulator is distributed under the MIT license, like the Tezos codebase.

Principles of operation

Here’s a (simplified) description that should give the reader a flavour of the simulation framework. The framework comes with a guide that explains in detail how the simulation framework works and how to implement a consensus algorithm in it.

The framework allows us to observe the evolution of a system that comprises a collection of nodes — independent processes that do not share memory but can exchange messages by means of asynchronous broadcast that can be fine-tuned by the user if desired.

Consensus algorithms in the framework are implemented as event handlers — functions that are called when an event occurs at a particular node. A call of an event handler is called an iteration. Event handlers have the following type signature:

type event_handler =
  Algorithm.params ->
  Time.t ->
  Event.t ->
  Signature.private_key ->
  Algorithm.node_state ->
  Effect.t list * Algorithm.node_state

Let’s go over the arguments of the function:

Algorithm.params is the parameters of the algorithm such as e.g. round duration in seconds.
Time.t is the current time.
Event.t is the event the node needs to react to. Currently there are two kinds of events: reception of a message and a “wake up” call that the node can schedule for itself. Message types are defined per consensus algorithm.
Signature.private_key is the private key that every node magically knows. It is used for signing of messages. This is important because the framework allows us to program and use Byzantine versions of nodes, too.
Algorithm.node_state is the node state. The type is defined per consensus algorithm.

The return type of an event_handler is an effect list and an updated node state. An effect in the list can be one of the following:

Broadcast a message to all nodes.
Schedule a wake up call.
Shut down.

Testing

We have written two kinds of tests: stress tests and scenario tests.

Stress tests are about letting an algorithm run for a number of iterations with a large enough network and realistic propagation of messages, including messages getting lost and messages arriving out of order. The framework allows us to specify a set of predicates that must hold at each iteration in such a test. This way we can determine if an algorithm satisfies liveness and safety properties (see also the Liveness and Safety Wikipedia articles). According to the tests, all models that we have written satisfy both Liveness and Safety.

Scenario tests are about adjusting propagation of messages and/or lifetime of nodes in order to model a situation of interest. We can then inspect execution logs and check whether the nodes behaved in the expected way. It is easy to do because simulations return typed logs that we can pattern match on.

Implemented algorithms

We have implemented four consensus algorithms (listed below roughly in order of increasing complexity), and applied stress tests and scenario tests as discussed above:

Leader election, see src/leader_election.
Ouroboros (the simple BFT version), see src/ouroboros.
Emmy⁺, see src/emmy_plus; this is the current consensus algorithm used by Tezos.
Tenderbake, see src/tenderbake; this is the algorithm Nomadic Labs is planning to propose as a future amendment to the Tezos blockchain.

Every algorithm is explained in its README.md. Our focus was not only explaining how a particular algorithm works in principle, but also how it translates to code that the simulator framework can run.

Conclusion

The simulator has already proven useful. People who have tried it out report that it has helped them to understand the algorithm of Tenderbake and experiment with it. In the future, we can expect new consensus algorithms to be implemented and explored using this framework. Please, feel free to give it a try and contribute!

Simulating Tenderbake