A Deep Dive into the Octez&nbsp;Prevalidator

In this blog post we’ll describe recent work on improving the Tezos Octez prevalidator by making it faster and more resilient, and outline our plans for the future.

A brief map of Tezos, situating the prevalidator
Updating the prevalidator
- How we had to …
- … because of consensus operations
A primer on Tezos Prevalidation
Why the propagation of consensus operations should be as fast as possible
An operation on Tezos (from the shell’s point of view)
Propagation of endorsements arriving too early
Making the prevalidator harder
Testing the prevalidator
The future of the Octez prevalidator

A brief map of Tezos, situating the prevalidator

Let’s outline how the prevalidator fits into the Tezos blockchain workflow:

The Tezos blockchain is a chain of blocks — going right back to the genesis block on 30 June 2018 at just after 5pm.
Each block is a sequence of operations (e.g. “Credit 1 tez from my account to yours”; “credit 5 tez from my account to the supermarket”; “grab this nifty NFT from my favorite marketplace”; …)
Users propose operations, which are hashed, and the hashes are bundled into small packages called mempools, which are gossiped¹ across the Tezos peer-to-peer (p2p) network.
The Tezos prevalidator is a software component that mediates between the peer-to-peer (p2p) network communications layer (which gossips information across the network) and the Tezos economic protocol (which decides which operations are valid and thus potential candidates for inclusion in blocks on the blockchain).
A Tezos baker packages operations up into blocks, and then adds the block to the chain.
The Tezos consensus algorithm does the work of organizing consensus across the network on which blocks get included in the final, official blockchain history.

In a little more detail, the prevalidator does several things:

It scans incoming mempool data for unknown operation hashes, and if it sees one such hash then it may request the full contents of that operation.
It may invoke the economic protocol to assess the prevalidity of operations, and accordingly store them in appropriate reservoirs.
It stores known (but not-yet-included-in-blocks) operations as described below.
It may pass operations back to the p2p layer for distribution, with or possibly without a prevalidity check.
A Tezos baker may dip into the prevalidator’s reservoir of prevalidated operations, looking for operations to bake into new blocks.

To paraphrase Animal Farm: every part of the Tezos blockchain is equally critical, but the prevalidator is more critical than most. So it’s important that the Tezos Octez prevalidator should be correct, efficient, and resilient.

Updating the prevalidator

How we had to …

Since the Octez code went live with the launch of the Tezos mainnet over 3 years ago, we have tended to avoid changing the prevalidator’s code, except for bug-fixes and some code linting.

The prevalidator is a critical component and an error in it could crash the network. It ain’t broke, so we didn’t try to fix it. Nevertheless, while the prevalidator has served us well, it does need to be improved.

A prototype re-implementation of the prevalidator was considered, but we decided it would be safer to make careful, incremental improvements to the existing code base, and since April 2020 we have undertaken significant efforts to modernize its implementation and (just as importantly) to increase the prevalidator’s test coverage.

One of the key functions of the Octez prevalidator is to make sure that consensus operations — endorsements² — are propagated sufficiently quickly and efficiently, even in adverse conditions. This can be subtle because it has to do with the behavior of complex interacting systems, and this turns out to be a rich source of failure modes of the Tezos blockchain overall. Regrettably, the propagation of endorsements has been affected by a number of bugs recently, which became evident “in the wild” after activation of the Granada protocol which reduced the minimal block time from 60s to 30s.

It may seem counter-intuitive that making the time between blocks shorter could make the network overall slower — but this is the joy of designing blockchain systems. We will explain how this works below (a key point is here).

Fixing these bugs has motivated several recent releases of Octez: notably v9.2, v9.4 and v9.7, and to a lesser degree, v9.6.

… because of consensus operations

Propagation of consensus operations will be the particular focus of this article, because it is these operations’ particular interaction with the prevalidator that caused us issues over the summer.

We hope this article will explain what we are doing to get this right, and will shed some technical light on what happened with the Tezos network last summer, specifically during the activation of the Granada protocol. We will also describe current and future endeavors to make the prevalidator Harder, Better, Faster, Stronger!

Or to put it another way: we’ll explain what went wrong, what role the prevalidator played in this, and what we’re doing to update the prevalidator to ensure that this particular failure mode will not be repeated.

A primer on Tezos Prevalidation

The mempool data structure

On the Tezos blockchain, operations are gossiped using a container data structure called mempool, which can roughly³ be described as a collection of operation hashes. These operation hashes are advertised in a single CurrentHead network message which contains

a block header — which is usually the one from the head of the Tezos chain as seen by the sender node when it broadcasts its current head — and
a mempool data structure containing the hashes of some operations that the sender node wants to broadcast.

Initial processing: to `Pending` or not to `Pending`

On reception of this message, a Tezos node running the Octez Tezos implementation does two things:

it passes the block header to its block validator — the shell component in charge of block validation — and
it passes the mempool to its prevalidator.

The prevalidator processes the incoming mempool, finds any hashes which are unknown to the receiving node, and sends a request to the sender node for the full content of the corresponding operations. This saves on bandwidth, since network hashes are far more compact than full operations.

The prevalidator performs some basic checks on a new operation that it is well-formed according to the currently-active economic protocol: for example it parses the operation and checks that the data is not gibberish and that the operation would pay enough fees. (What it does not do at this stage is apply the larger and more expensive apply_operation method from the economic protocol; see next paragraph.) If the incoming operation passes these basic checks, then it is marked as Pending by the prevalidator.

Further classification of `Pending` operations into reservoirs

Operations which have passed basic checks to become Pending operations, are collected in a set of operations⁴ called pending — so pending is the set of operations with Pending status — that are next in line to be prevalidated using the apply_operation method of the economic protocol.⁵ To ensure fairness with respect to the other components of the node, pending operations are prevalidated in batches of fixed maximal size.

The result returned by apply_operation enables the prevalidator to classify each operation in a batch as follows:⁶

Applied: the operation can be applied in the current context (meaning: the ledger state, as seen by the node in question). However, the application of the operation might still fail later, in the event a baker decides to include the operation in a block.
Branch delayed: the operation cannot be included in the next head of the chain, but it could be included in a descendant. Example: a manager operation⁷ with a counter value in the future of the expected one.
Branch refused: the operation cannot be included in the next head of the chain, nor in a successor, but it might be applied on a different branch if a reorganization happens. Example: a manager operation whose counter value is in the past of the current counter.
Refused: the operation is impossible and should be rejected. It is not valid, and there is no alternative chain in which it might be. Example: an operation with an invalid signature.

The prevalidator stores the operations in different reservoirs⁸ according to this classification.

Flushing the reservoirs

Regularly — which in normal operation of the current economic protocol means every 30 seconds — the node’s chain will change (e.g. by adding another block), and the block validator will update its head block. This means that the current block state has changed so that prevalidity of operations — and the ensuing classification — in the current state of the prevalidator might be outdated and need to be revised. The node triggers the prevalidator to perform a flush event (also called a recycling event) in which:

each operation classified as Applied or Branch delayed and not yet included in a block is reverted to a Pending state, and
each operation classified as Branch refused is reverted to a Pending state, provided that the new head is not a direct successor of the previous one; that is, if the node switched branches — otherwise they remain classified as Branch refused.

These newly pending operations should then be re-prevalidated.⁹

And the cycle repeats …

Our description so far has been a high-level, somewhat idealized account of the prevalidator’s execution logic. It omits some details, including protections and counter-measures against DDoS attacks from malicious peers. One of these protections will feature in what follows, because it impacts the propagation of consensus operations and will turn out to be fundamental to our story.

Why the propagation of consensus operations should be as fast as possible

The Tezos protocol has relied to date on Proof of Stake consensus algorithms from the Emmy family. The currently-active consensus protocol is Emmy*, which has been running on-chain since the activation of the Granada protocol (at block 1,589,248 on 6 August 2021).

Consensus algorithms à la Emmy come with a special kind of operation called an endorsement, whose purpose is to facilitate reaching consensus faster. This blog post digs deeper into how consensus algorithms work and the role endorsements play in effectively securing the Tezos blockchain. The details relevant to us here are:

An endorsement operation included in a block at level \(n+1\),¹⁰ endorses the predecessor block at level \(n\). Hence, a baker cannot include an endorsement in its block without having seen and validated its predecessor block at level \(n\).
The minimal block delay (the protocol-mandated time gap between two consecutive blocks at levels \(n\) and \(n + 1\))¹¹ depends on the number of endorsements included in the block at level \(n + 1\). With Emmy*, the time between two blocks is \(30\) seconds for a priority \(0\) block that contains at least \(60\%\) of all possible \(256\) endorsements — that is \(154\) endorsements. With (say) \(150\) endorsements, the delay escalates to \(60 + 4\cdot (192-150) = 228\) seconds — nearly \(4\) minutes!

Thus including more endorsements in a block means

a shorter minimal block delay (i.e. the block can be baked earlier), and so
a greater chance that this block (rather than any competing blocks from other block producers) will get included in the final blockchain history, and also
higher block rewards payed to the baker if the block gets included in the final blockchain history.

If endorsements are validated and propagated quickly across the network, from endorsers to bakers, then blocks can be produced with optimal speed. And as a corollary: it is very important that endorsements arrive promptly relative to the minimal block delay, because if endorsements consistently fail to arrive in time for the minimal block delay then — by design — the block delays as imposed by the consensus algorithm may substantially increase.

And this was how reducing the minimal block delay could potentially make the network slower, if the endorsements start arriving “late” relative to this new minimal block delay. Thus, as we reduce the minimal block delay in search of greater speed and efficiency, it becomes increasingly important that we expedite the passage of endorsements across the network. This in turn requires us to tweak the prevalidator to recognize and prioritize endorsement operations, and — such is the reality of programming complex systems — this requires much careful and precise work which we will now discuss.

An operation on Tezos (from the shell’s point of view)

The Tezos protocol distinguishes between

the (economic) protocol, and
the shell.

This is summed up in a well-known octopus diagram in the Tezos architecture description.¹²

The economic protocol defines the blockchain protocol’s execution logic (consensus algorithms, what kinds of operations are available, smart contracts, …). This can be updated and is subject to self-amendment.
The shell handles the lower level, more mundane tasks: p2p gossiping, context storage, maintaining a distributed database of known blocks and operations. And — key to our story today — the shell triggers the validation of blocks and the prevalidation of operations, by choosing blocks and operations and applying appropriate validation functions from the economic protocol.

The prevalidator is a component of the shell. It makes calls to functions from the economic protocol to help it decide whether operations are valid — but the prevalidator itself is agnostic about precisely how the economic protocol decides.

This agnosticism implies that the Octez suite must implement the shell parametrically over the economic protocol and in such a way that it can deal seamlessly with updates to the protocol. One particular design consequence of this is that the shell must take a rather high-level and generic view of the internal content of certain data structures like operations and blocks, so that there will be an abstract and generic view for the shell, and then a specific view for each specific economic protocol.¹³ Thus, from the shell’s perspective, we have roughly the following (permalink):

(* The branch of an operation is the block hash of the block upon which
   the operation was forged. *)
type shell_header = {branch : Block_hash.t}

(* An operation from the shell point of view is a branch and a list of bytes.
   Only the economic protocol knows how to interpret those bytes. *)
type t = {
    shell : shell_header; proto : Bytes.t}

The branch of an operation is then a block hash. It determines the lifespan of an operation: if the block hash is too old, the operation can be discarded. The current lifespan on Emmy* is \(120\) blocks,¹⁴ which (assuming a normal 30s per block) translates to about one human hour.

For the prevalidator, the branch field is also used as an anti-spam filter: if the block hash is unknown, then the operation can be safely discarded. In this context, unknown means “a block that has not been validated yet”.

The proto field is then — in the prevalidator’s eyes — a mere list of Bytes. The economic protocol has the key to unlock the secret of whether that operation is an endorsement or something else. And this detail is central to the plot of the next section.

Propagation of endorsements arriving too early

Granada (the currently active economic protocol) and also the Hangzhou protocol proposal, require that the branch field of an endorsement operation point to the block being endorsed. This requirement is due to a pre-existing legacy format for endorsements,¹⁵ but the requirement can hinder the propagation of endorsements by the prevalidator, in the following corner case:

Suppose that the node receives a new CurrentHead message from the network: this consists of a block header and a mempool. Suppose that this block header is not known to the receiving node, and suppose that the mempool includes operations which are endorsements of that very block. In this corner case, it may be that the prevalidation of the received mempool completes earlier than the validation of the received block (even if only by a second or two).¹⁶ The block header only needs to be validated once, when it is first seen — but this is that first time. In this case, any endorsements for that block that may be contained in the mempool message, will be discarded and not propagated, since the branch of the endorsement is precisely the not-quite-yet-validated block.

Other kinds¹⁷ of operations, like manager operations, avoid this problem by not branching on the immediate predecessor, but rather on an ancestor block three or four levels behind.

Addressing the race between the validator and the prevalidator in a scenario of simultaneously receiving a new block header and an endorsement operation for that same block, required small patches to the shell and to the economic protocol:

On the shell side, the prevalidator now asks the economic protocol whether the freshly-received operation is a valid endorsement, before checking if the operation was branched on a known block — that is, before checking if the block header targeted by the branch field of the endorsement operation is indeed a member of the set of recent live_blocks known to the node. In this case the prevalidator classifies the endorsement as Branch refused, and triggers the advertisement of its current head and of a mempool including the not-yet-validated newly-found endorsement.
On the protocol side, the apply_operation function now checks whether the signature of an endorsement has been verified, before checking whether its branch is known. This avoids propagating endorsements with bad signatures (i.e. it avoids sending spam).

These changes were both included in Octez v9.2, which in addition to the new version of the node also included the “protocol snapshot” for Granada, effectively advertising to the network (and to bakers in particular) that there was a protocol candidate which could be injected.

After the release of Octez v9.2, fewer endorsements were missed. However, the small patches mentioned above created two unforeseen issues:¹⁸

The network could propagate endorsements that were too old and were already included in previous blocks (more than \(60\) blocks ago by then, \(120\) nowadays¹⁴). This is wasteful, as these operations would be Refused by the receiving peer’s prevalidator.
The execution cost of checking the signature of an endorsement, in the case that the endorsement’s level is not the one expected, is significantly higher than the execution cost of checking the signature of a well-branched endorsement (meaning one included in the successor block to its target). This is because only the public keys of endorsers for the current level are cached by the economic protocol. For the rest, checking their signatures requires a hard-drive read, which may be computationally expensive.

The first issue was fixed in v9.4, and the second issue — which is what caused nodes to slow down when Granada was activated on August 6th 2021 at 09:36 UTC — was fixed with v9.7.¹⁸

The idea behind the fixes is this: in Octez, there are filters specific to an economic protocol which can classify an operation based on its content. We tweaked those filters to only consider endorsements which are at the appropriate level.

In order to prevent such issues from recurring, we have structured our work on the prevalidator around answering three questions:

How can we test the prevalidator beyond integration tests?
How can we reliably benchmark the prevalidator?
How can we make the prevalidator faster?

We tackle test coverage first, then benchmarking, and then speed, because: test coverage catches functional regressions, benchmarking permits us to profile and measure speed, and then we are in a position to both measure performance and more safely change code to improve it.

Thus during the summer of 2021 (and learning from the experience with the activation of Granada) we focused on the first task above, which we call hardening the prevalidator — or to fit the story, making it harder. We aim to tackle the remaining two tasks during autumn 2021.

Making the prevalidator harder

The prevalidator is part of the shell as we mentioned above — but it calls functions from the economic protocol to parse operations and validate them, so that it can decide whether to propagate them. This interaction between the shell and the economic protocol, and how this separation is implemented in the codebase, is fruitful but it does make the prevalidator tricky to test — especially with unit tests.

Hence, the first stage of our work was to isolate the interface of the prevalidator with the economic protocol, by refactoring the prevalidator to separate

those components and those parts of the execution logic that just have to do with the shell, from
those components that are in the shell but interact directly with the economic protocol.

This refactoring enabled us to identify and fix a number of minor bugs, including: several memory leaks, and some minor corner cases where valid operations were not being propagated.

This was nice, but the main achievement of this work was to isolate the core of the prevalidator: a software component, reified as an OCaml module, which classifies operations correctly.¹⁹

Testing the prevalidator

Any sufficiently large software project should employ multiple testing frameworks to provide increased confidence in the implementation. This is especially true for a complex project like the Tezos Octez implementation, each of whose different components — from the lower-level grit of the distributed network to the higher-level Michelson interpreter — is an unruly beast of software in its own right, just waiting to spring bugs and regressions on the unsuspecting software engineer.

Of the many testing frameworks available to the working Tezos dev,²⁰ two are particularly relevant to our story here:

Unit and Property Based Tests for the Prevalidator using, respectively, the Alcotest and QCheck OCaml testing libraries.
Integration tests using Tezt, a custom-made testing framework for system and integration tests for Tezos, focusing on the interaction between Octez nodes and clients.

Prior to April 2021 there were few unit tests for the prevalidator because of the technical issues described in the first paragraph of this section. Our refactoring allowed us to be more rigorous and to add many unit tests for the prevalidator, enabling us to detect and fix legacy bugs. Most of the new unit tests are moreover property-based tests: they not only assert the prevalidator invariant, but also test that the root cause of certain bugs is no longer there.

An example of the latter property is the following:

Given any operation, at any time, an operation cannot be classified twice with two different classifications.

That is, an operation cannot be at the same time Applied and Branch refused.

In summary we have added:

18 property-based tests asserting correctness properties of the classification of operations, and
12 integration tests covering different scenarios in the prevalidator’s life-cycle.

Of the latter, an important integration test we have included asserts that consensus operations which arrive too early are still propagated by the node.

The future of the Octez prevalidator

After making the prevalidator harder, the next step is to make it better. To this end we aim to enhance the prevalidator, working towards two main objectives:

We aim to minimize the elapsed time between when an operation is received (either injected via RPC from a client, or gossiped from the network), and when we propagate the operation to our peers.
We aim to prioritize the handling of pending operations to give more priority to pending consensus operations, and thus prevalidate them as quickly as possible (the current prevalidator treats all pending operations equally).

The first objective will help improve operation throughput — increasing the TPS metric (transactions per second). This will require non-trivial changes to the economic protocol, which we hope to include in the forthcoming protocol proposal (for protocol I).

The second objective will empower the node to discard operations sooner, if their priority is too low. This will help ensure that critical operations (e.g. endorsements, and consensus operations in general, and also high-fee operations) will be retained by the prevalidator and propagated more effectively.

Looking further into the future, making Octez scale to cope with the ongoing increase in network traffic will require ever more efficient validation and propagation time for operations and blocks, and this will require a comprehensive effort across the Octez architecture.

Our work so far has created a harder prevalidator, as a foundation to build a better one. Faster and stronger will follow.²¹ Stay tuned!

Gossip here is a technical p2p term for ”(unsolicited) broadcast to a subset of known peers”. Gossiping contributes to global network coverage of knowledge — like an office rumor (or an epidemic). ↩
Tezos is a Proof of Stake blockchain which requires participants to vote for block candidates. These votes, called endorsements, are included in blocks and recorded on-chain, just as “regular” transactions are. We expand on this below. ↩
The current implementation (permalink) is a record, with two fields: a known_valid list of operation hashes, presumably prevalidated by the node; and a pending set consisting of operations of unknown prevalidity, or invalid-yet-safe-to-broadcast operations. ↩
We mean set here in the mathematical sense: an unsorted, collection with unique members. The current specification does not impose an order on the set of pending operations. However, given the nature of the implementation with pending : Operation.t Operation_hash.Map.t, in practice iteration over pending follows the lexicographical order on operation hashes. ↩
In Tezos we refer to this phase as operation prevalidation to distinguish it from the validation of operations done by the baker when producing a new block, and by a node’s (block) validator when receiving a new block. The entry point to the economic protocol is the same in both cases: the apply_operation method from the protocol API. ↩
The apply_operation function is so called because it checks whether it can apply the operation to the current ledger state (as seen by the node on which this is running) — in Tezos jargon, this ledger state is called the (blockchain) context. If apply_operation succeeds, it returns an Ok value; if it cannot do this, then it returns an Error value of a certain kind. This returned value enables the prevalidator to further classify each operation depending on whether it could be included in the next head of the chain, on the same branch later on, or possibly on some other branch, or never. ↩
Manager operations include transfers, smart contract originations, calls to smart contracts, etc. In short, any fee-paying operation in competition for block space. Manager operations have a counter associated (unsurprisingly) to their manager, which is the Tezos account which signs the operation and which pays the fees. ↩↩
The pun with “mempool” is deliberate. Readers diving into this text with a low tolerance for wordplay can fish out the word “reservoir”, and splash in the word “container” instead, and they will get on swimmingly. ↩
“Reprevalidated”: this blog post’s contribution to the English language. ↩
The level of a block is simply how many blocks are between it and the genesis block (which is at level 0). ↩
The minimal block delay is defined as a function of a block’s priority and the number of included endorsements. See the full definition in the documentation. ↩
This is also the motif in a limited series of coveted laptop stickers. ↩
You can read this article for a holistic account of the life cycle of operations, including the perspective from inside the economic protocol. ↩
The constant regulating this behavior is called max_op_ttl (“maximal operation time-to-live”). It was bumped from \(60\) to \(120\) when Granada was activated, to compensate for the halving in minimal block time and thus keeping operation lifespan (approximately) constant. ↩↩
In it, the proto field contained only the level endorsed. The rationale behind this design was to save space, as endorsements must be branched in the block immediately following the target. More compact endorsements means more operations inside blocks. And that seemed a clever win at the time. ↩
You may be wondering how the block validator can validate a block if it just has its header. The trick is that the block_header carries a lot of information (unlike the branch of operations), including the hash of the resulting context and the hash of the Merkle tree of operations. By design, a block header and the full contents of its operations, is all the information that the block validator actually needs. More on this in the documentation for the validation subsystem. ↩
From genesis to the latest Tezos protocol there have been four kinds (or classes) of operations: consensus (e.g., endorsements), voting (e.g., protocol injection and ballots), anonymous (e.g., double-baking or double-endorsement accusations), and the manager operations described above.⁷ ↩
A brief chronology is as follows: Octez v9.2 with the aforementioned patches and advertisment of Granada; then 9.3 and 9.4; then Granada activation on August 6th; then Octez v9.5, 9.6, and 9.7. ↩↩
That is, we isolated the core of the execution logic of the prevalidator, and reified this as an OCaml module called prevalidator_classification — in fact, this is technically just a data structure, albeit one containing an API, functions, and internal data structures. OCaml is an object-oriented functional programming language in which modules are first-class, so this kind of encapsulation is practical and all in a good day’s work. If you’re reading this and it blows your mind, but in a good way, then feel free to be in touch to discuss employment opportunities. ↩
More details can be found on Testing in Tezos — the entry point in the Tezos Developer Documentation for all things tests. ↩
If you liked this blog post you may also like a previous blog post on optimizing gas consumption. It describes a distinct set of optimizations in a distinct part of the Tezos universe, but it is interesting to note a resemblance in the overall workflow, described there as “make it work, make it right, make it fast”. Thank you for reading. ↩

A Deep Dive into the Octez Prevalidator