In this article, we introduce two new features for the Tezos node: snapshots and history modes.
A snapshot is a file that contains everything necessary to restore the state of a node at a given block. A node restored via a snapshot can synchronise and help other nodes synchronise in the existing network. The only difference is that you cannot query the chain context (balances, baking rights, etc.) before the restoration point, but you can still get the full chain history.
In conjunction, we also introduce history modes, which represent different policies for determining which past data a node should maintain. We propose three modes: archive (the current mode, which keeps everything), full (the new default) and rolling. For now, snapshots can fire up a node in either full or rolling mode.
These new features allow a user to spawn and synchronise a Tezos node in a few minutes, from a single, untrusted file of about 150MB compressed with a truncated history, or 800MB with a full history. You can test all of that by using the mainnet-snapshots branch on Nomadic Labs’ Gitlab.
Be aware that this is not yet production ready: it would not be wise to replace your current infrastructure with nodes from this branch at this date. However, you are very welcome to experiment with it, and all reports will help get this feature merged into mainnet as soon as possible.
History modes
History modes allow the node to run without maintaining the full archives of the chain.
Here are the first three modes:
full nodes store all chain data since the beginning of the chain, but drop the archived contexts below the current checkpoint. In other words, you can still query any block or operation at any point in the chain, but you cannot query the balances or staking rights too far in the past.
rolling nodes are currently the most lightweight, only keeping a minimal rolling fragment of the chain and deleting everything before this fragment (blocks, operations and archived contexts).
archive nodes store everything. This corresponds to the current behaviour of Tezos nodes.
Full nodes will be the new default, as they are sufficient for almost everyone. We plan to introduce new modes in the future.
An important thing to note is that running a full node is enough to maintain the full chain history. Indeed, archive nodes do not need to use archive peers to bootstrap their archive, but only full peers, as the chain data is enough to apply the chain and construct the context archives. In other words, the network does not lose any security by switching to full as the default.
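As a rough summary of the three modes, the following Python sketch answers whether a node could serve a given piece of data. The function name, the two data kinds and the level-based comparison are illustrative assumptions, not the node's actual API; checkpoint and caboose stand for the levels of the corresponding blocks.

```python
MODES = ("archive", "full", "rolling")


def can_serve(mode: str, kind: str, level: int, checkpoint: int, caboose: int) -> bool:
    """Tell whether a node in `mode` still holds `kind` of data for `level`.

    kind is "block" (header and operations) or "context" (balances, rights...).
    """
    if mode not in MODES or kind not in ("block", "context"):
        raise ValueError("unknown mode or kind")
    if mode == "archive":
        return True                  # archive nodes store everything
    if kind == "context":
        return level >= checkpoint   # older archived contexts are dropped
    if mode == "full":
        return True                  # full nodes keep every block
    return level >= caboose          # rolling nodes delete older blocks
```

For instance, can_serve("full", "context", 1, 1000, 0) is False while can_serve("full", "block", 1, 1000, 0) is True, which matches the distinction made above between chain data and archived contexts.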
Checkpoints
To understand the technical details of history modes, let us first recall what checkpoints are in Tezos.
In the current protocol, an automatic checkpoint is cemented by the node to the block at position 0 of the 5th previous cycle.
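To make the rule concrete, here is a small Python sketch that computes the level of the automatic checkpoint from the level of the current head. The cycle length of 4096 blocks and the convention that level 1 is position 0 of cycle 0 are assumptions made for this example; only the lag of 5 cycles comes from the text.

```python
BLOCKS_PER_CYCLE = 4096  # assumption: cycle length used for illustration
CHECKPOINT_LAG = 5       # "5th previous cycle", as stated above
FIRST_LEVEL = 1          # assumption: level 1 is position 0 of cycle 0


def cycle_of(level: int) -> int:
    """Cycle containing the block at `level`."""
    return (level - FIRST_LEVEL) // BLOCKS_PER_CYCLE


def checkpoint_level(head_level: int):
    """Level at position 0 of the 5th previous cycle, or None if too early."""
    target_cycle = cycle_of(head_level) - CHECKPOINT_LAG
    if target_cycle < 0:
        return None  # fewer than 5 full cycles exist; not covered by this sketch
    return target_cycle * BLOCKS_PER_CYCLE + FIRST_LEVEL
```

Under these assumptions, a head at level 500000 sits in cycle 122, so the checkpoint is the first block of cycle 117, at level 479233.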
The aim of checkpoints is to anchor the consensus in the real world at regular intervals. The block hashes from this point can easily be shared, propagated and saved outside the chain. You may have already heard of them since checkpoints were mentioned in the Tezos position paper.
When the checkpoint (CP in the picture below) is updated (currently at each cycle), alternative branches that do not contain the checkpoint (i.e. those that diverged from the main chain before the new checkpoint’s level) are marked as invalid and can be safely deleted. The node does not accept reorganisations below this point.
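The branch filtering described here can be sketched in a few lines of Python. In this toy model a branch is simply a list of block hashes indexed by level, which is an assumption of the example and not how the node stores its chains.

```python
def contains_checkpoint(branch, cp_level: int, cp_hash: str) -> bool:
    """True if `branch` has the checkpoint block at the checkpoint level.

    branch: list of block hashes, where index i is the block at level i.
    """
    return cp_level < len(branch) and branch[cp_level] == cp_hash


def drop_invalid_branches(branches, cp_level: int, cp_hash: str):
    """Keep only the branches that go through the checkpoint."""
    return [b for b in branches if contains_checkpoint(b, cp_level, cp_hash)]
```

A branch that forked before the checkpoint level has a different hash at that level, so it is filtered out; a branch that forked after it is kept.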
A new RPC is available in order to request the current checkpoint (as well as additional new information) of a chain.
$ tezos-client rpc get /chains/main/checkpoint
Full mode, the new default
The full-mode is the default mode when starting a node from scratch, or from a full snapshot.
A node running in full-mode stores the full chain data for all blocks, even the ones older than the current checkpoint. More precisely, it keeps the headers and the operations for these blocks. However, it discards the archived context and the operation and block receipts. We say that such a block is “pruned”: we keep only the necessary bits that we got from the network, and drop everything that can be reconstructed from them.
In practice, we introduce two new tagged blocks in the history: the save point and the caboose. The save point currently mirrors the checkpoint and references the oldest block that contains all the data, i.e. the oldest one that is not pruned. The caboose corresponds to the oldest pruned block. Here is a picture illustrating the full-mode initialisation.
The save point (SP) is first initialized with the checkpoint (CP) referenced by the snapshot, and the caboose (OO) with the oldest pruned block included by the snapshot.
Each time the chain checkpoint is updated, we also update the save point, and the blocks older than the new save point are pruned. The caboose stays unchanged. Here is a picture illustrating the state of the chain in full-mode after a checkpoint update.
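The update rule for a full node can be sketched as a small pure function in Python. Blocks are identified by their level only, and the function name and return shape are assumptions made for this illustration.

```python
def full_mode_update(save_point: int, caboose: int, new_checkpoint: int):
    """Model a checkpoint update on a full node.

    Returns the new save point, the (unchanged) caboose and the range of
    levels that become pruned, i.e. keep headers and operations only.
    """
    if new_checkpoint < save_point:
        raise ValueError("the checkpoint never moves backwards")
    newly_pruned = range(save_point, new_checkpoint)  # older than new save point
    return new_checkpoint, caboose, newly_pruned
```

For example, full_mode_update(100, 0, 150) moves the save point to 150, leaves the caboose at 0 and prunes levels 100 to 149.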
How to use
Using the tezos-node run command line arguments:
$ tezos-node run --history-mode full
Or the configuration file:
{ "shell": {"history_mode": "full"} }
If you start your node for the first time, this argument is not necessary as it is now the default. However, if you upgrade from an existing archive state and want to switch to full mode, you can pass the argument to convert your archive node to a full one.
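If the node configuration is generated by a script, the fragment shown above can be merged into an existing configuration rather than written by hand. The following Python sketch is one way to do it; the helper name is hypothetical and only the shell.history_mode key comes from the example above.

```python
import json

VALID_MODES = ("archive", "full", "rolling")


def with_history_mode(config: dict, mode: str) -> dict:
    """Return a copy of `config` whose shell section selects `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown history mode: {mode}")
    shell = dict(config.get("shell", {}))  # keep any other shell settings
    shell["history_mode"] = mode
    return {**config, "shell": shell}


print(json.dumps(with_history_mode({}, "full")))
# prints {"shell": {"history_mode": "full"}}
```

The input dictionary is left untouched, so the same base configuration can be reused to produce one file per mode.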
Rolling mode, the lightest
The rolling-mode is the lightest mode for now. It only keeps a pruned history for a minimal period of blocks before the current save point. The difference with the full-mode is that a rolling node also updates the caboose and deletes blocks that are older than the caboose.
When starting a node configured in rolling-mode, the caboose and the save point are initialized the same way as for the full-mode. Here is a picture illustrating the rolling-mode initialisation.
Whenever the current checkpoint is updated, the node will also update its caboose and its save point in such a way that the distance between the new save point and the new caboose corresponds to the lifetime of operations (required to ensure proper validation of reorganisations just after the checkpoint). It will then purge its store by deleting all information for blocks older than the new caboose and pruning all blocks between the new caboose (included) and the save point (excluded). Here is a picture illustrating the state of the chain in rolling-mode after a checkpoint update.
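The same update can be sketched in Python for a rolling node. Blocks are identified by their level, and the operation lifetime is passed as a parameter because its value is not given here; the value 60 used in the example below is only an assumption.

```python
def rolling_mode_update(save_point: int, caboose: int,
                        new_checkpoint: int, op_lifetime: int):
    """Model a checkpoint update on a rolling node.

    Returns the new save point, the new caboose, the range of levels that
    are deleted entirely and the range of levels that are kept but pruned.
    """
    if new_checkpoint < save_point:
        raise ValueError("the checkpoint never moves backwards")
    new_save_point = new_checkpoint
    # Keep `op_lifetime` pruned blocks below the save point; never move back.
    new_caboose = max(caboose, new_save_point - op_lifetime)
    deleted = range(caboose, new_caboose)        # older than the new caboose
    pruned = range(new_caboose, new_save_point)  # caboose included, SP excluded
    return new_save_point, new_caboose, deleted, pruned
```

With a save point at 100, a caboose at 40, a new checkpoint at 200 and a lifetime of 60 blocks, the caboose moves to 140, levels 40 to 139 are deleted and levels 140 to 199 are pruned.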
How to use
Using the tezos-node run command line arguments:
$ tezos-node run --history-mode rolling
Or the configuration file:
{ "shell": {"history_mode": "rolling"} }
In that mode, the new checkpoint RPC will also give you the save point and caboose.
$ tezos-client rpc get /chains/main/checkpoint
Archive mode
The archive mode aims to save all the chain data, starting necessarily at the genesis block. It corresponds to the one the nodes are currently using. The archive mode can be useful in the context of block explorers for instance.
How to use
Using the tezos-node run command line arguments:
$ tezos-node run --history-mode archive
Or the configuration file:
{ "shell": {"history_mode": "archive"} }
If you want to start an archive node, it is now mandatory to pass this argument the first time you launch your node.
Switching from one mode to another
There are some restrictions when switching from one mode to another.
Going from archive to full or rolling, or from full to rolling, is allowed, as it just drops data. Switching from full or rolling to archive is not allowed, since it would require rebuilding the dropped archives.
We plan to lift these restrictions in the future.
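These rules fit in a small lookup, sketched here in Python. The function name is hypothetical. The text does not mention the rolling to full switch; this sketch treats it as not allowed, which is an assumption.

```python
# Switches explicitly allowed by the text: they only drop data.
ALLOWED_SWITCHES = {
    ("archive", "full"),
    ("archive", "rolling"),
    ("full", "rolling"),
}


def can_switch(current: str, target: str) -> bool:
    """Tell whether a node in `current` mode may be restarted in `target` mode."""
    if current == target:
        return True  # nothing to convert
    return (current, target) in ALLOWED_SWITCHES
```

can_switch("archive", "rolling") is True, while can_switch("full", "archive") is False because the dropped archives would have to be rebuilt.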
Snapshots
As the chain invariably grows every day, retrieving a full chain from the peer-to-peer network can be a very long process. Thanks to the implementation of history modes, it is now possible to offer an import/export feature: snapshots. This procedure gathers all the data necessary to bootstrap a node into a single file.
Starting a node from a snapshot
When bootstrapping from a snapshot, the first thing that you want to do is check the point in history from which you start.
The snapshot format does not (and cannot) provide any evidence that the imported block is actually a part of the current main chain of the Tezos network. To avoid being fooled by a fake chain, it is necessary to carefully check that the block hash of the imported block is included in the chain. This can be done by comparing the hash to one provided by another node under the user’s control, or by relying on social cues to obtain a hash from a large number of trusted parties which are unlikely to be colluding.
As the Tezos position paper states:
“Occasional checkpoints can be an effective way to prevent very long blockchain reorganizations[…]. Forming a consensus over a single hash value over a period of months is something that human institutions are perfectly capable of safely accomplishing. This hash can be published in major newspapers around the world, carved on the tables of freshmen students, spray painted under bridges, included in songs, impressed on fresh concrete, tattooed on pet ferrets… there are countless ways to record occasional checkpoints in a way that makes forgery impossible.”
This same wisdom must be applied when using a snapshot.
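In practice, this check amounts to comparing the hash displayed at import time with the hashes reported by independent sources for the same level. The Python sketch below shows one possible policy; the quorum of three sources and the rule that any disagreement is a failure are assumptions of this example, not a recommendation from the node.

```python
def corroborated(imported_hash: str, reported: dict, quorum: int = 3) -> bool:
    """Check the imported block hash against independent sources.

    reported maps a source name to the hash that source gives for the
    level of the imported block.
    """
    agreeing = [src for src, h in reported.items() if h == imported_hash]
    dissenting = [src for src, h in reported.items() if h != imported_hash]
    # Require enough independent confirmations and no contradiction at all.
    return len(agreeing) >= quorum and not dissenting
```

A single source under the user's own control can be handled by calling the function with quorum=1.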
After that careful selection or verification of the imported block hash, you can trust the node with the rest of the procedure. In particular, you need not trust the source of the file: the snapshot format contains everything necessary for the node to detect any inconsistency, malicious or not.
This safety comes from the fact that block headers are designed to make sure that applying a block has the same result for everyone in the network. To achieve this, they include hashes of their operations and predecessor, as well as the resulting chain state. The import process makes the same checks, recomputing and checking all the hashes it encounters in the snapshot.
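The principle can be illustrated with a toy hash chain in Python. This is not the actual snapshot or block header format: the Block fields, the way the header is serialised and the way operations are hashed are all simplifications assumed for this sketch, and the resulting chain state is left out.

```python
import hashlib
from dataclasses import dataclass


def h(data: bytes) -> str:
    """BLAKE2b digest as a hex string (toy stand-in for the real hashes)."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


@dataclass(frozen=True)
class Block:
    predecessor: str      # hash of the previous block header
    operations: tuple     # raw operations, as bytes
    operations_hash: str  # commitment to the operations, stored in the header

    def header_hash(self) -> str:
        return h((self.predecessor + self.operations_hash).encode())


def make_block(predecessor: str, operations) -> Block:
    ops = tuple(operations)
    return Block(predecessor, ops, h(b"".join(ops)))


def verify_snapshot(blocks, trusted_hash: str) -> bool:
    """blocks are ordered oldest first; trusted_hash was checked by the user."""
    if not blocks or blocks[-1].header_hash() != trusted_hash:
        return False
    for prev, block in zip(blocks, blocks[1:]):
        if block.predecessor != prev.header_hash():
            return False  # broken link between two consecutive blocks
    # Recompute every operations commitment from the raw operations.
    return all(b.operations_hash == h(b"".join(b.operations)) for b in blocks)
```

Once the hash of the last block is trusted, changing any operation or any header in the file breaks either a recomputed commitment or a predecessor link, so the check fails.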
How to
To bootstrap a Tezos node from a file FILE.full (running this command from an already synchronised node will not work), run:
$ tezos-node snapshot import FILE.full
Don’t forget to check the hash of the imported block displayed by the node when importing.
Exporting a snapshot
To export a snapshot, we first select a block hash which will represent the point in history at which consumers of this snapshot will start bootstrapping. By default, if no block hash is provided, we automatically choose a block which was included in the chain a few dozen blocks ago. This is important as nodes bootstrapped from this snapshot will not be able to reorganise their chain below this block (they will set their checkpoint to this block).
Depending on the snapshot export option, additional history may also be put in the snapshot file.
How to create a snapshot
By default, the snapshot export command will create a full snapshot. Such a snapshot contains all the blocks from the genesis to the selected block hash, so the whole chain up to the selected point is exported. This kind of snapshot can only be created from a full or archive node.
$ tezos-node snapshot export --block BLOCK_HASH FILE.full
How to create a rolling snapshot
This is the preferred use case if you want to deploy a node really quickly or for test and experimentation purposes (such as a classroom), as rolling snapshots are much smaller. However, to bootstrap a long-running node on the network, we recommend using full snapshots to participate in the network-wide preservation and sharing of chain history.
$ tezos-node snapshot export --block BLOCK_HASH FILE.rolling --rolling
On Garbage Collection and storage optimisations
The mechanism is in place for the node to run properly without the archived context for a given block. But actually dropping these contexts is another matter, usually known in the programming language world as garbage collection (or GC). In garbage collection, when resources are no longer needed, they are marked as such, and a garbage collector process comes from time to time to clean them up and free the memory space they used.
This branch comes with an experimental GC that can be run on startup using the --gc option. Calling it on an archive node will do almost nothing except cleaning up old discarded reorganisations. Calling it on a rolling node after a cycle end should drop the archived contexts before the newly set checkpoint. This GC is still slow, and cannot be called while the node is running.
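For readers unfamiliar with garbage collection, here is a generic mark-and-sweep sketch in Python. It is a textbook illustration, not the GC shipped in this branch: the store is modelled as a dictionary of objects, and the references between objects (for example, contexts sharing parts of their data) are given as a plain mapping.

```python
def mark(roots, references) -> set:
    """Collect every object reachable from `roots`.

    references maps an object key to the keys it points to.
    """
    live, stack = set(), list(roots)
    while stack:
        key = stack.pop()
        if key in live:
            continue
        live.add(key)
        stack.extend(references.get(key, ()))
    return live


def sweep(store: dict, live: set) -> dict:
    """Return the store without the objects that were not marked."""
    return {key: value for key, value in store.items() if key in live}
```

If the roots are the contexts of the blocks at or above the checkpoint, data shared with older contexts stays in the store, and only what the older contexts alone referenced is freed.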
Several efforts (e.g. see plebeia, Irmin 2 and irontez) are in progress to provide an efficient GC that can run transparently, and a better and more optimised storage in general. We shall discuss that matter more in a later post.
Towards other history modes and node variants
Nodes able to start and run with only a partial history open up many variants and possibilities to adapt the disk and memory footprints to different use cases.
Today, archive nodes have to start from the beginning, and validate the entire chain. They don’t get synchronised until they have munched the entire chain. We want to provide a version where nodes start in full mode, but reconstruct their archives little by little from the genesis while running. This way, we get both fast bootstrap from a snapshot, and eventually archiving.
Another variant that we imagine is a node that keeps a sparse archive, with an archive block at regular intervals. This way, it would be possible to access intermediate archive contexts by validating only a short sequence of blocks from the previous full block in memory, much like how compressed video formats work.
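The idea can be sketched in Python as follows. Since this variant does not exist yet, everything here is hypothetical: contexts are opaque values, archived holds one context every interval levels, and apply_block stands for whatever validates a block on top of a context.

```python
def nearest_archived_level(level: int, interval: int) -> int:
    """Closest level at or below `level` that has an archived context."""
    return (level // interval) * interval


def context_at(level: int, interval: int, archived: dict, apply_block):
    """Rebuild the context at `level` from the previous archived one.

    apply_block(context, level) returns the context after applying the
    block at `level` on top of `context`.
    """
    base = nearest_archived_level(level, interval)
    context = archived[base]
    for lvl in range(base + 1, level + 1):  # at most interval - 1 blocks
        context = apply_block(context, lvl)
    return context
```

The interval is the trade-off knob: a larger interval saves disk space but makes each query on an intermediate context replay more blocks.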
Finally, at some point we would like to release a node that requires even less memory by not even keeping its full recent archives. Instead, it would maintain an exponentially sparse history, and would have to reconstruct its recent past when a reorganisation arises.
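The text does not define what an exponentially sparse history would look like. One possible reading, sketched below in Python purely as an assumption, is to keep contexts at distances 1, 2, 4, 8, ... from the head, so that the number of kept contexts grows only logarithmically with the length of the history.

```python
def kept_levels(head: int, oldest: int = 0) -> list:
    """Levels whose context is kept, under a doubling-distance policy."""
    levels = [head]
    distance = 1
    while head - distance >= oldest:
        levels.append(head - distance)
        distance *= 2  # each kept context is twice as far from the head
    return levels


print(kept_levels(100))
# prints [100, 99, 98, 96, 92, 84, 68, 36]
```

With such a policy, a reorganisation down to level 90 would restart from the kept context at level 84 and replay the blocks above it.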