Introducing Snapshots and History Modes for the Tezos Node

In this article, we introduce two new features for the Tezos node: snapshots and history modes.

A snapshot is a file that contains everything necessary to restore the state of a node at a given block. A node restored via a snapshot can synchronise and help other nodes synchronise in the existing network. The only difference is that you cannot query the chain context (balances, baking rights, etc.) before the restoration point, but you can still get the full chain history.

In conjunction, we also introduce history modes, which represent different policies for determining which past data a node should maintain. We propose three modes: archive (the current mode which keeps everything), full (the new default) and rolling. For now, snapshots can fire up a node in either full or rolling mode.

These new features allow a user to spawn and synchronise a Tezos node in a few minutes from a single, untrusted file: about 150MB compressed with a truncated history, or about 800MB with a full history. You can test all of that by using the mainnet-snapshots branch on Nomadic Labs’ GitLab.

Be aware that this is not yet production ready: it would not be wise to replace your current infrastructure with nodes from this branch at this date. However, you are very welcome to experiment with it, and all reports will be useful if we want this feature to be merged into mainnet as soon as possible.

History modes

History modes allow the node to run without maintaining the full archives of the chain.

Here are the first three modes:

full nodes store all chain data since the beginning of the chain, but drop the archived contexts below the current checkpoint. In other words, you can still query any block or operation at any point in the chain, but you cannot query the balances or staking rights too far in the past.
rolling nodes are currently the most lightweight, only keeping a minimal rolling fragment of the chain and deleting everything before this fragment (blocks, operations and archived contexts).
archive nodes store everything. This corresponds to the current behaviour of Tezos nodes.

Full nodes will be the new default, as they are sufficient for almost everyone. We plan to introduce new modes in the future.

An important thing to note is that running a full node is enough to maintain the full chain history. Indeed, archive nodes do not need to use archive peers to bootstrap their archive, but only full peers, as the chain data is enough to apply the chain and construct the context archives. In other words, the network does not lose any security by switching to full as the default.

Checkpoints

To understand the technical details of history modes, let us first recall what checkpoints are in Tezos.

In the current protocol, an automatic checkpoint is cemented by the node to the block at position 0 of the 5th previous cycle.
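As a rough illustration of this rule, the sketch below computes the level of the automatic checkpoint from the level of the current head. It assumes cycles of 4096 blocks, that level 1 is position 0 of cycle 0, and that the checkpoint stays at genesis (level 0) while the chain is fewer than five cycles old. The function and constant names are hypothetical and are not part of the node's API.

```python
BLOCKS_PER_CYCLE = 4096   # assumption: number of blocks in a cycle
CHECKPOINT_DISTANCE = 5   # "5th previous cycle", as stated in the text


def cycle_of(level, blocks_per_cycle=BLOCKS_PER_CYCLE):
    """Cycle containing `level`, assuming level 1 is position 0 of cycle 0."""
    return (level - 1) // blocks_per_cycle


def checkpoint_level(head_level, blocks_per_cycle=BLOCKS_PER_CYCLE,
                     distance=CHECKPOINT_DISTANCE):
    """Level of the block at position 0 of the `distance`-th previous cycle."""
    target_cycle = cycle_of(head_level, blocks_per_cycle) - distance
    if target_cycle < 0:
        return 0  # assumption: stay at genesis while the chain is too young
    return target_cycle * blocks_per_cycle + 1
```

Under these assumptions, a head at level 24586 (cycle 6) yields a checkpoint at level 4097, the first block of cycle 1.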

The aim of checkpoints is to anchor the consensus in the real world at regular intervals. The block hashes from this point can easily be shared, propagated and saved outside the chain. You may have already heard of them since checkpoints were mentioned in the Tezos position paper.

When the checkpoint (CP in the picture below) is updated (currently at each cycle), alternative branches that do not contain the checkpoint (i.e. those that diverged from the main chain before the new checkpoint’s level) are marked as invalid and can be safely deleted. The node does not accept reorganisations below this point.

visual representation of the checkpoint with discarded old branches
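The branch filtering described above can be sketched as follows. This is a toy model, not the node's implementation: each branch is represented as the list of its block hashes from genesis to its head, and the single-letter hashes and the function name are purely illustrative.

```python
def prune_branches(branches, checkpoint):
    """Keep only the branches whose history contains the checkpoint block."""
    return {
        head: chain
        for head, chain in branches.items()
        if checkpoint in chain
    }


branches = {
    "main": ["G", "A", "B", "C", "D"],
    "alt1": ["G", "A", "X", "Y"],        # diverged before the checkpoint B
    "alt2": ["G", "A", "B", "C", "Z"],   # diverged after the checkpoint B
}

kept = prune_branches(branches, checkpoint="B")
```

Here `alt1` is discarded because it forked from the main chain before block `B`, while `alt2` survives because it still contains the checkpoint.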

A new RPC is available in order to request the current checkpoint (as well as additional new information) of a chain.

$ tezos-client rpc get /chains/main/checkpoint
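The same RPC can also be queried over HTTP from a script. The sketch below assumes that the node exposes its RPC interface on localhost port 8732 and that the answer is a JSON document; it makes no assumption about the fields of that document. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def checkpoint_url(node="http://localhost:8732", chain="main"):
    """URL of the checkpoint RPC for the given node and chain."""
    return f"{node}/chains/{chain}/checkpoint"


def get_checkpoint(node="http://localhost:8732", chain="main", timeout=10):
    """Fetch and decode the checkpoint information from a running node."""
    with urlopen(checkpoint_url(node, chain), timeout=timeout) as response:
        return json.load(response)
```

Calling `get_checkpoint()` requires a running node whose RPC server is reachable at the given address.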

Full mode, the new default

The full-mode is the default mode when starting a node from scratch, or from a full snapshot.

A node running in full-mode stores the full chain data for all blocks, even the ones older than the current checkpoint. More precisely, it keeps the headers and the operations for these blocks. However, it discards the archived context and the operation and block receipts. We say that such block information is “pruned”: we keep only the necessary bits that we got from the network, and drop everything that can be reconstructed from them.

In practice, we introduce two new tagged blocks in the history: the save point and the caboose. The save point currently mirrors the checkpoint and references the oldest block that contains all the data, i.e. the oldest one that is not pruned. The caboose corresponds to the oldest pruned block. Here is a picture illustrating the full-mode initialisation.

full node history at initialisation

The save point (SP) is first initialized with the checkpoint (CP) referenced by the snapshot, and the caboose (OO) with the oldest pruned block included by the snapshot.

Each time the chain checkpoint is updated, we also update the save point, and the blocks older than the new save point are pruned. The caboose stays unchanged. Here is a picture illustrating the state of the chain in full-mode after a checkpoint update.

full node history at a checkpoint update
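The full-mode update can be modelled with a small simulation. This is a sketch under simplifying assumptions: the store is a dictionary from levels to blocks, each block is a dictionary of four fields, and all names are hypothetical rather than taken from the node's code.

```python
def make_block(level):
    """A fully stored block: network data plus locally computed data."""
    return {
        "header": f"header-{level}",
        "operations": f"operations-{level}",
        "context": f"context-{level}",     # archived context
        "receipts": f"receipts-{level}",   # operation and block receipts
    }


def prune(block):
    """Keep only what was received from the network."""
    return {"header": block["header"], "operations": block["operations"]}


def full_mode_update(store, state, new_checkpoint):
    """Move the save point to the new checkpoint and prune older blocks."""
    for level in range(state["save_point"], new_checkpoint):
        store[level] = prune(store[level])
    state["checkpoint"] = new_checkpoint
    state["save_point"] = new_checkpoint
    # state["caboose"] is deliberately left untouched in full mode.


store = {level: make_block(level) for level in range(10)}
state = {"checkpoint": 3, "save_point": 3, "caboose": 0}
for level in range(3):
    store[level] = prune(store[level])   # already pruned before the update

full_mode_update(store, state, new_checkpoint=6)
```

After the update, blocks 0 to 5 only hold their header and operations, blocks 6 to 9 are still complete, and no block has been deleted.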

How to use

Using tezos-node run command line arguments:

$ tezos-node run --history-mode full

Or the configuration file:

{ "shell": {"history_mode": "full"} }

If you start your node for the first time, this argument is not necessary as it is now the default. However, if you upgrade from an existing archive state and want to switch to full mode, you can pass the argument to convert your archive node to a full one.

Rolling mode, the lightest

The rolling-mode is the lightest mode for now. It only keeps a pruned history for a minimal number of blocks before the current save point. The difference with the full-mode is that a rolling node also updates the caboose and deletes the blocks that are older than it.

When starting a node configured in rolling-mode, the caboose and the save point are initialized the same way as for the full-mode. Here is a picture illustrating the rolling-mode initialisation.

rolling node history at initialisation

Whenever the current checkpoint is updated, the node will also update its caboose and its save point in such a way that the distance between the new save point and the new caboose corresponds to the lifetime of operations (required to ensure proper validation of reorganisations just after the checkpoint). It will then purge its store by deleting all information for the blocks older than the new caboose, and by pruning all blocks between the new caboose (included) and the save point (excluded). Here is a picture illustrating the state of the chain in rolling-mode after a checkpoint update.

rolling node history during a checkpoint update
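The rolling-mode update can be sketched in the same spirit. The model below is a simplification with hypothetical names: the store maps levels to blocks, the save point is set to the new checkpoint, and the lifetime of operations is passed as a number of blocks (the value 4 is only chosen to keep the example small).

```python
def make_block(level):
    """A fully stored block: network data plus locally computed data."""
    return {
        "header": f"header-{level}",
        "operations": f"operations-{level}",
        "context": f"context-{level}",
        "receipts": f"receipts-{level}",
    }


def rolling_mode_update(store, state, new_checkpoint, operation_ttl):
    """Move save point and caboose, delete old blocks, prune the fragment."""
    new_save_point = new_checkpoint
    # The caboose never moves backwards: older blocks are already gone.
    new_caboose = max(state["caboose"], new_save_point - operation_ttl)
    for level in sorted(store):
        if level < new_caboose:
            del store[level]                       # deleted entirely
        elif level < new_save_point:
            block = store[level]
            store[level] = {                       # pruned
                "header": block["header"],
                "operations": block["operations"],
            }
    state.update(checkpoint=new_checkpoint,
                 save_point=new_save_point,
                 caboose=new_caboose)


store = {level: make_block(level) for level in range(20)}
state = {"checkpoint": 5, "save_point": 5, "caboose": 0}

rolling_mode_update(store, state, new_checkpoint=15, operation_ttl=4)
```

In this run, blocks below level 11 are deleted, blocks 11 to 14 are pruned, and blocks from level 15 onwards remain complete.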

How to use

Using tezos-node run command line arguments:

$ tezos-node run --history-mode rolling

Or the configuration file:

{ "shell": {"history_mode": "rolling"} }

In that mode, the new checkpoint RPC will also give you the save point and caboose.

$ tezos-client rpc get /chains/main/checkpoint

Archive mode

The archive mode aims to save all the chain data, starting necessarily at the genesis block. It corresponds to the one the nodes are currently using. The archive mode can be useful in the context of block explorers for instance.

How to use

Using tezos-node run command line arguments:

$ tezos-node run --history-mode archive

Or the configuration file:

{ "shell": {"history_mode": "archive"} }

If you want to start an archive node, it is now mandatory to pass this argument the first time you launch your node.

From one mode to another

There are some restrictions when one wants to switch from one mode to another.

Going from archive to full or rolling, or from full to rolling, is allowed, as it only drops data. Switching from full or rolling to archive is not allowed, since it would require rebuilding the dropped archives.

We plan to lift these restrictions in the future.
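These rules amount to saying that a node may only move towards a mode that retains less data. A minimal sketch, with hypothetical names, follows. It additionally assumes that staying in the same mode is always fine and that going from rolling to full is refused, since that would require blocks that were deleted; the text above does not state this last case explicitly.

```python
# Modes ranked by how much data they retain (higher means more data).
RETENTION = {"rolling": 0, "full": 1, "archive": 2}


def can_switch(current, target):
    """A switch is allowed only if it never requires rebuilding dropped data."""
    return RETENTION[target] <= RETENTION[current]
```

For instance, `can_switch("archive", "rolling")` holds, whereas `can_switch("full", "archive")` does not.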

Snapshots

As the chain invariably grows every day, retrieving a full chain from the peer-to-peer network can be a very long process. Thanks to the implementation of history modes, it is now possible to offer an import/export feature: snapshots. A snapshot gathers all the data necessary to bootstrap a node into a single file.

Starting a node from a snapshot

When bootstrapping from a snapshot, the first thing that you want to do is check the point in history from which you start.

The snapshot format does not (and cannot) provide any evidence that the imported block is actually a part of the current main chain of the Tezos network. To avoid being fooled by a fake chain, it is necessary to carefully check that the block hash of the imported block is included in the chain. This can be done by comparing the hash to one provided by another node under the user’s control, or by relying on social cues to obtain a hash from a large number of trusted parties which are unlikely to be colluding.

As the Tezos position paper states:

“Occasional checkpoints can be an effective way to prevent very long blockchain reorganizations[…]. Forming a consensus over a single hash value over a period of months is something that human institutions are perfectly capable of safely accomplishing. This hash can be published in major newspapers around the world, carved on the tables of freshmen students, spray painted under bridges, included in songs, impressed on fresh concrete, tattooed on pet ferrets… there are countless ways to record occasional checkpoints in a way that makes forgery impossible.”

This same wisdom must be applied when using a snapshot.

After that careful selection or verification of the imported block hash, you can trust the node with the rest of the procedure. In particular, you need not trust the source of the file: the snapshot format contains everything necessary for the node to detect any inconsistency, malicious or not.

This safety comes from the fact that block headers are designed to make sure that applying a block has the same result for everyone in the network. To achieve this, they include hashes of their operations and predecessor, as well as the resulting chain state. The import process makes the same checks, recomputing and checking all the hashes it encounters in the snapshot.
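The principle can be illustrated with a toy hash chain. This sketch is not the actual snapshot format nor the node's import code: the header layout, the length-prefixed encoding and all function names are assumptions made for the example, and BLAKE2b is used simply as a convenient hash function.

```python
import hashlib


def digest(*parts):
    """Hash a sequence of byte strings, length-prefixed to avoid ambiguity."""
    h = hashlib.blake2b(digest_size=32)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def make_header(predecessor, operations, context):
    """Toy header committing to predecessor, operations and resulting state."""
    return {
        "predecessor": predecessor,
        "operations_hash": digest(*operations),
        "context_hash": digest(context),
    }


def block_hash(header):
    return digest(header["predecessor"], header["operations_hash"],
                  header["context_hash"])


def verify_snapshot(blocks, context, trusted_hash):
    """Recompute every hash; `blocks` is a list of (header, operations),
    oldest first, and `context` is the state of the most recent block."""
    if not blocks:
        return False
    previous = None
    for header, operations in blocks:
        if digest(*operations) != header["operations_hash"]:
            return False   # operations do not match the header
        if previous is not None and header["predecessor"] != previous:
            return False   # broken link between consecutive blocks
        previous = block_hash(header)
    last_header = blocks[-1][0]
    return (digest(context) == last_header["context_hash"]
            and previous == trusted_hash)


GENESIS = bytes(32)
ops1, ops2 = [b"op-a"], [b"op-b", b"op-c"]
h1 = make_header(GENESIS, ops1, b"state-1")
h2 = make_header(block_hash(h1), ops2, b"state-2")
snapshot = [(h1, ops1), (h2, ops2)]
trusted = block_hash(h2)   # the hash the user verified out of band
```

With this model, `verify_snapshot(snapshot, b"state-2", trusted)` succeeds, while changing any operation, the context, or the trusted hash makes the verification fail.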

How to

To bootstrap a Tezos node from a file FILE.full (running this command from an already synchronised node will not work), run:

$ tezos-node snapshot import FILE.full

Don’t forget to check the hash of the imported block displayed by the node when importing.

Exporting a snapshot

To export a snapshot, we first select a block hash which will represent the point in history at which consumers of this snapshot will start bootstrapping. By default, if no block hash is provided, we automatically choose a block which was included in the chain a few dozen blocks ago. This is important as nodes bootstrapped from this snapshot will not be able to reorganise their chain below this block (they will set their checkpoint to this block).

Depending on the snapshot export option, additional history may also be put in the snapshot file.

How to create a snapshot

By default, the snapshot export command will create a full snapshot. Such a snapshot contains the whole chain, from the genesis block to the selected block. This kind of snapshot can only be created from a full or archive node.

$ tezos-node snapshot export --block BLOCK_HASH FILE.full

How to create a `rolling` snapshot

This is the preferred use case if you want to deploy a node really quickly or for test and experimentation purposes (such as a classroom), as rolling snapshots are much smaller. However, to bootstrap a long-running node on the network, we recommend using full snapshots to participate in the network-wide preservation and sharing of chain history.

$ tezos-node snapshot export --block BLOCK_HASH FILE.rolling --rolling

On Garbage Collection and storage optimisations

The mechanism is in place for the node to run properly without the archived context for a given block. But actually dropping these contexts is another matter, usually known in the programming language world as garbage collection (or GC). In garbage collection, when resources are no longer needed, they are marked as such, and a garbage collector process comes from time to time to clean them up and free the memory space they used.

This branch comes with an experimental GC that can be run on startup using the --gc option. Calling it on an archive node will do almost nothing except cleaning up old discarded reorganisations. Calling it on a rolling node after the end of a cycle should drop the archived contexts before the newly set checkpoint. This GC is still slow, and cannot be called while the node is running.

Several efforts are in progress (see e.g. plebeia, Irmin 2 and irontez) to provide an efficient GC that can run transparently and, more generally, a better optimised storage. We shall discuss that matter more in a later post.

Towards other history modes and node variants

Nodes able to start and run with only a partial history open up many variants and possibilities to adapt the disk and memory footprints to different use cases.

Today, archive nodes have to start from the beginning, and validate the entire chain. They don’t get synchronised until they have munched the entire chain. We want to provide a version where nodes start in full mode, but reconstruct their archives little by little from the genesis while running. This way, we get both fast bootstrap from a snapshot, and eventually archiving.

Another variant that we imagine is a node that keeps a sparse archive, with an archive block at regular intervals. This way, it would be possible to access intermediate archive contexts by validating only a short sequence of blocks from the previous full block in memory. Much like how compressed video formats work.
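To give an idea of the cost of such a sparse archive, the sketch below computes which blocks would have to be replayed to rebuild the context of a given level. It assumes, purely for illustration, that contexts are archived at every level that is a multiple of a fixed interval; the function name and the interval are hypothetical, since this variant does not exist yet.

```python
def replay_plan(level, interval):
    """Return the nearest archived level at or below `level`, together with
    the levels of the blocks to re-apply on top of it."""
    base = (level // interval) * interval
    return base, list(range(base + 1, level + 1))


base, to_replay = replay_plan(1000, interval=256)
```

With an interval of 256 blocks, rebuilding the context of level 1000 starts from the archive at level 768 and replays 232 blocks, which bounds the work per query by the interval.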

Finally, at some point we would like to release a node that requires even less memory by not even keeping its full recent archives. Instead, it would maintain an exponentially sparse history, and would have to reconstruct its recent past when a reorganisation arises.
