TL;DR: To achieve millions of TPS, Smart Rollups currently rely on publishing data outside of the Tezos protocol. This blog post introduces the next step: a protocol-level solution for rollup data that is both highly scalable and fully decentralized.
With the activation of Smart Rollups in the Mumbai protocol upgrade, the Tezos ecosystem took a major step towards massive scalability. We are thrilled to see the ecosystem already building and deploying Smart Rollups on testnets, and we look forward to seeing them in action on Mainnet.
Meanwhile, at Nomadic Labs, Marigold, Functori and Trilitech, protocol developers are busy building a next-level data solution for Smart Rollups.
The challenge is to have the Tezos protocol guarantee availability of rollup transaction data without storing it in Layer 1 blocks. In this blog post we explain our approach, starting with an overview of the challenge at hand and then moving into a more technical explanation.
What’s the problem?
The main feature of rollups is that they move transaction and smart contract execution off-chain (to Layer 2). Layer 1 nodes can then focus on tasks that require high decentralization but are computationally lightweight, such as consensus.
While Smart Rollups are designed so that anyone can carry out execution for any rollup, execution does not need to be widely decentralized. That is because rollup operators are required to regularly post receipts, known as commitments, to Layer 1 with the result of the execution. These can be verified by anyone, and dishonest or erroneous commitments can be challenged by honest actors to ensure integrity.
However, rollup operators must of course be able to receive instructions from the rollup’s users to carry out any execution in the first place.
One way is to include these instructions in Layer 1 blocks, which is done on Tezos using the shared rollup inbox. This ensures that the original transaction data is available for verifying the execution done by rollup operators. But even when applying compression techniques, Layer 1 block size will eventually become a bottleneck. A rough estimate puts the maximum throughput using this approach on Tezos at approximately 3400 TPS (see note 1) with current protocol parameters (block size, block time, etc.).
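The arithmetic behind this estimate can be reproduced in a few lines. The figures (10 bytes per Layer 2 transaction, roughly 512 KB of transactions per block, 15-second block time) are the ones given in the notes at the end of this post; the function name is ours and purely illustrative.

```python
def max_inbox_tps(block_bytes: int, tx_bytes: int, block_time_s: int) -> float:
    """Upper bound on TPS when all transaction data is stored in Layer 1 blocks."""
    txs_per_block = block_bytes / tx_bytes
    return txs_per_block / block_time_s


# Figures from the notes: 512 KB per block (taken here as 512,000 bytes),
# 10 bytes per transaction, 15-second blocks.
estimate = max_inbox_tps(block_bytes=512_000, tx_bytes=10, block_time_s=15)
print(f"{estimate:.0f} TPS")  # prints "3413 TPS"
```

Whatever the exact parameters, the bound grows only linearly with block size and shrinks with block time, which is why this approach cannot reach millions of TPS.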
For handling millions of transactions per second (TPS) it’s necessary to keep transaction data out of Layer 1 blocks. But once it is off-chain, how can we trust it, and how can we be sure it is available for others to verify the work of rollup operators?
This is known as the data-availability problem and is a widely recognized issue among those trying to scale blockchains with Layer 2 solutions.
Data-availability committees are already here
For achieving millions of TPS on Tezos we currently rely on Data-Availability Committees, or DACs for short.
A DAC is a group of data providers that host data for rollups off-chain and make it available via the reveal data channel, an interface enabling Smart Rollups to access data external to the Tezos blockchain. For a deeper dive into DACs, see this article.
The DAC can provide a high degree of security as long as a few DAC participants are honest, with the tradeoff that a few dishonest participants can limit the throughput of the rollup.
But ultimately the Tezos protocol does not control or monitor what data is stored on a DAC and can provide no guarantees about the data being available. As a result, using a DAC involves some trust in its members actually storing the data and providing it on request.
DACs remain practical for many applications, especially ones that don’t require a high level of decentralization or guarantees of rollup data being publicly available. This can be gaming, ticketing, or other use cases where a centralizing actor is already involved. DACs are also practical when private control of the data is required for confidentiality or legal reasons.
But Tezos’ enshrined rollups are intended as an integrated, decentralized scaling stack. Thus, the next step is to make sure rollup data’s availability is guaranteed by the Tezos protocol itself.
The fully decentralized solution: A data-availability layer
A Data-Availability Layer, or DAL, stores data and provides guarantees about its availability by relying on Layer 1 consensus, i.e. bakers.
It’s an independent peer-to-peer (P2P) network, running in parallel with Tezos’ Layer 1, where data can be submitted and retrieved. Bakers continuously monitor the DAL and attest on Layer 1 whether a given piece of data is available on the DAL.
Unlike the P2P protocol employed by Layer 1, where each node receives all data, the P2P protocol utilized by the DAL is designed such that DAL nodes receive only part of the data. Each data “chunk” is accompanied by an erasure code, which makes it possible to reconstruct the original data, even if parts of the data are lost or never properly published.
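To illustrate the idea of erasure coding, here is a minimal sketch of a Reed-Solomon-style code built on polynomial interpolation over a small prime field. It is a toy example for intuition only: the field size, the function names and the 2x redundancy are our assumptions, and it does not describe the actual encoding or parameters used by the DAL.

```python
P = 257  # small prime field (assumption), large enough to hold one byte per symbol


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n_chunks: int):
    """Extend k data bytes into n_chunks (index, value) chunks, n_chunks <= P."""
    points = list(enumerate(data))  # chunk i carries data[i] for i < k
    return [(x, _interpolate(points, x)) for x in range(n_chunks)]


def decode(chunks, k: int) -> bytes:
    """Rebuild the k original bytes from any k distinct chunks."""
    points = chunks[:k]
    return bytes(_interpolate(points, x) for x in range(k))


data = b"tezos"
chunks = encode(data, n_chunks=10)                  # 2x redundancy
surviving = [c for c in chunks if c[0] % 2 == 1]    # half of the chunks are lost
assert decode(surviving, k=len(data)) == data
```

In this sketch, any 5 of the 10 chunks are enough to recover the 5 original bytes, so no single chunk, and no single node holding it, is indispensable.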
In addition to attestations by bakers, DAL nodes perform data availability sampling (see note 2), a technique that ensures, with high probability, that the data is available while only downloading a small portion of the entire data. This means that anyone can ascertain data availability merely by running a node, without needing to trust bakers.
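A back-of-the-envelope calculation shows why a handful of samples goes a long way. The sketch below rests on two assumptions that this post does not specify: the data can be reconstructed from any half of its chunks, and a node samples chunks independently and uniformly at random. Under those assumptions, if the data cannot be reconstructed, fewer than half of the chunks are available, so each sample succeeds with probability below 1/2.

```python
import math


def false_positive_bound(samples: int, recover_fraction: float = 0.5) -> float:
    """Upper bound on the chance that every sample succeeds
    although the data cannot be reconstructed."""
    return recover_fraction ** samples


def samples_needed(target: float, recover_fraction: float = 0.5) -> int:
    """Smallest number of samples that pushes the bound to `target` or below."""
    return math.ceil(math.log(target) / math.log(recover_fraction))


n = samples_needed(1e-9)
print(n, false_positive_bound(n))
```

With these hypothetical parameters, 30 samples are enough to bring the chance of being fooled below one in a billion, regardless of how large the original data is.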
Overall, this approach is similar to what is proposed for Ethereum under the name ‘danksharding’, and what is planned for Celestia. It effectively circumvents the limitations inherent to Layer 1, massively scaling bandwidth and storage capacity for the Tezos network, while maintaining a high level of security and decentralization.
A quick overview of roles and responsibilities:
- Adding data: Anyone can submit new data to the DAL, though pre-approval by Layer 1 is required to deter spamming.
- Storing data: Anyone can contribute to storing data. The more people contributing, the higher the resiliency and efficiency of the DAL.
- Verifying availability: Bakers continuously publish attestations on Layer 1, declaring the availability of the data. Other DAL nodes perform data availability sampling.
- Retrieving data: Anyone can retrieve any data from the DAL.
Smart Rollups were designed for future compatibility with a DAL and will be able to leverage it as soon as it is activated through a protocol upgrade.
Notes on infrastructure
Given the different P2P protocol, we’ve decided to implement a separate node for connecting to the DAL network. We aim to make the command line interface and the configuration closely resemble those of the Octez node.
This means that both rollup operators and bakers will need to run DAL nodes, though based on feedback from the community, the setup with separate DAL nodes may change in the future. Hardware recommendations for participating in the DAL will be provided at a later stage, when the DAL has been evaluated on test networks.
Bakers should be aware that attesting requires downloading the data, which can be demanding in terms of bandwidth. Similarly to Layer 1 consensus, the number of DAL attestations by a baker will be proportional to their stake, and bandwidth requirements will therefore also depend on their stake.
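As a rough illustration of what stake-proportional means for bandwidth, the sketch below splits a fixed number of data chunks among bakers according to their stake. All names and numbers are hypothetical placeholders chosen for the example; the actual assignment mechanism and parameters are not described in this post.

```python
def expected_chunks(stakes: dict[str, int], total_chunks: int) -> dict[str, float]:
    """Expected number of chunks each baker must download,
    proportional to their share of the total stake."""
    total_stake = sum(stakes.values())
    return {
        baker: total_chunks * stake / total_stake
        for baker, stake in stakes.items()
    }


# Hypothetical bakers and stakes, and a hypothetical number of chunks.
stakes = {"alice": 6000, "bob": 3000, "carol": 1000}
load = expected_chunks(stakes, total_chunks=2048)
for baker, chunks in load.items():
    print(f"{baker}: about {chunks:.0f} chunks")
```

A baker with twice the stake downloads, on average, twice as many chunks, so larger bakers should plan for proportionally more bandwidth.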
Status and roadmap
An early version of the DAL is now available on the Mondaynet testnet. Concurrently, we are preparing the DAL for production, including stress testing its P2P protocol.
The current DAL version does not have data-availability sampling implemented, and participation is optional for testnet bakers. We expect both to change before Mainnet release, which is estimated for early 2024.
Those interested in the technical aspects of the DAL can explore our design document. Also, more blog posts will follow, explaining design decisions and providing in-depth technical descriptions.
We consider the DAL a ground-breaking innovation when it comes to providing decentralized data-availability solutions. Upon activation through a protocol upgrade, the DAL would solidify Tezos’ position as a technical leader – and a blockchain that doesn’t compromise on decentralization.
Notes
1. This assumes a Layer 2 transaction takes up 10 bytes of Layer 1 block space. We can put roughly 512 KB of transactions in a Layer 1 block, which is equivalent to 51.2K transactions. As block time is 15 seconds, this gives a throughput of about 3413 TPS.
2. Data availability sampling is not implemented in the DAL version currently available on test networks.