1. Research & development
  2. > Blog >
  3. We Discovered a Flaw in the Sapling Protocol Integration — a fix is ready for J.

We Discovered a Flaw in the Sapling Protocol Integration — a fix is ready for J.

15 March 2022
Nomadic Labs

Summary: A design flaw in the SAPLING_VERIFY_UPDATE Michelson opcode renders unshielding transactions from the shielded pools implemented in sapling_contract.tz malleable. This affects any deployment of this sample contract on the Tezos mainnet, and could potentially extend to other contracts following a similar pattern. We have implemented a new, strengthened, version of this primitive which should become the required opcode for safer integration of Sapling transactions in Michelson smart contracts. This new version will be available as a part of the upcoming J protocol proposal. In the meantime, we advise against originating new Sapling contracts on mainnet, and to avoid interacting with the existing contracts.

This blog post follows a discussion in Tezos Agora, where a vulnerability is revealed in sapling_contract.tz, a sample Michelson smart contract distributed as a part of the test suite of the Tezos protocol. This flaw could potentially apply to other contracts implementing a similar pattern.

The bug in the contracts arises from a design flaw in one of the Michelson opcodes provided to integrate Sapling transactions to Tezos smart contracts, SAPLING_VERIFY_UPDATE. That is, this is not a vulnerability in the Sapling protocol, its implementation, or even the implementation of the Michelson opcode per se. Instead, it is rather the result of its specification, and the provided interface, not being sufficiently strong to write safe smart contracts.

Over the last few days, we have implemented a patch addressing this issue on tezos!4589, which will be included with the next protocol proposal for protocol J, strengthening the support for shielded transactions.

In the meantime we advise the Tezos community to avoid originating new contracts creating shielded pools, and to avoid interacting with the deployed contracts. Even if we have not observed any exploit of this vulnerability, we are confident an attack is indeed plausible.

A recent inspection of deployed contracts on the Tezos mainnet at level 2155024, reveals only 2 contracts originated using a copy of the vulnerable shielded pool implemented by sapling_contract.tz: KT1BUZy6x and KT1E3kx2W1. At the time of writing, the outstanding balance in the two shielded pools is under 6 tez. We recommend users to avoid interacting with them.

Given the low exposure in locked liquidity in the shielded pool, as this feature has not been used extensively on mainnet so far, we consider that shipping this fix with the J protocol proposal is the best alternative among other available options.

In the sequel, we provide further detail on the vulnerability and we describe the efforts undertaken to lift the current limitations of the Sapling protocol integration.

Sapling and unshielding transactions

Since the activation of the Edo2 protocol proposal on block 1,343,489 a little over a year ago, the Tezos economic protocol has provided integration for a Sapling protocol, which enables privacy-preserving transactions. The design and implementation of Sapling in Tezos follows closely the original design by the Electric Coin Company for ZCash. Its integration into the Tezos economic protocol provides also two Michelson opcodes, SAPLING_EMPTY_STATE and SAPLING_VERIFY_UPDATE, to enable application developers to seamlessly integrate Sapling transactions into their smart contracts. This opened the door to new type of privacy-preserving applications, in particular asset transactions with selective disclosure — as implemented in sapling_contract.tz.

In order to support such applications, the SAPLING_VERIFY_UPDATE provides a unified interface for both: shielding (i.e., depositing) tez into a shielded pool, unshielding (i.e., withdrawing) tez from a pool, and transferring shielded tokens between shielded accounts — all this using a single interface. The specification reads:

SAPLING_VERIFY_UPDATE / t : s : S => Some (Pair b s') : S

In words, it prescribes that the opcode consumes a shielded transaction t and a Sapling state s from the top of the execution stack. It will then verify the transaction and, if it succeeds, it returns Some outstanding balance b, and a new state Sapling state s'2. The balance b is a signed integer value which determines an outstanding balance that needs to be consolidated with the smart contract’s unshielded balance:

  • A strictly-positive balance value entails unshielding tokens — e.g., to match burned shielded tokens in a pool.
  • A strictly-negative balance value entails shielding tez — e.g., to compensate minted shielded tokens in a pool.
  • A zero-valued balance denotes an internal shielded operation which doesn’t prescribe a change in the unshielded balance of the contract — e.g. an internal transfer between shielded accounts.

Thus, in the case of non-zero balances, a strictly-positive b entails that b tez need to be debited from the contract to restore the application’s invariant. Respectively, a strictly-negative b entails that b tez have to be credited to the contract to preserve the invariant.

The flaw in the current integration lies within the interaction between the Sapling protocol and the Tezos protocol, in particular when withdrawing tokens from a shielded pool. As a general rule, when designing an operation that transfers ownership of a token, it is crucial that the key owning the tokens signs a permission to debit a certain amount, together with the address of the recipient. If the recipient is not signed by the owner, then an attacker can intercept the transaction, and swap the recipient with its own address. For example in the case of a shielding transaction, the Tezos account that owns the tez signs a transfer which includes the Sapling recipient. Unfortunately, the converse is not true. In the case of an unshield, there is no way for a Sapling key to sign the Tezos recipient, given the current format of Sapling operations as introduced in Edo.

In short: when someone withdraws tez from the shielded pool, an attacker observing the pending smart contract call can steal the Sapling transaction parameter, and use it to inject a new smart contract call, with their own address as recipient. This can lead to a loss of funds, but it doesn’t compromise the privacy of the shielded pool.

Now, in order to illustrate this design flaw, we take a closer look at sapling_contract.tz. This contract manages a shielded pool where tokens are pegged 1 to 1 to tez. This tokens can be shielded to and unshielded from the pool, or transferred between shielded accounts, using Sapling transactions. We focus on a few key selected lines.

On line 5, we observe the parameter declaration. The first parameter is the Sapling transaction to be verified by the protocol, denoted by (sapling_transaction 8) type — just ignore the 8 value there3. The second has a key hash option type:

parameter (list (pair (sapling_transaction 8) (option key_hash) ) );

The latter is an optional implicit account, which denotes the intended destination, when unshielding the tokens from the pool. As you can see, the two arguments are given separately and there is no cryptographic signature binding the two. In other words, the combination of Tezos smart contract call and Sapling unshield is malleable. An attacker can intercept an unshield operation, and proceed to resubmit it with its own address as destination instead, effectively stealing the unshielded tez. Let’s see why:

The whole interaction with the Sapling protocol on this contract, as we hinted before, is concentrated on the SAPLING_VERIFY_UPDATE opcode on line 21:

# We verify the transaction and update the storage if the transaction is
# valid. The shielded transactions are handled here.
# The new state is pushed on top of the stack in addition to the balance
# of the transaction. If the rest of the script goes well, this state
# will be the new state of the smart contract.

The remark in the docstrings are worth revisiting: the pushed state will be the new state of the smart contract, making proof replay impossible. In the case of an unshield — a positive balance being pushed to the stack as a result — , the interesting bit of the contract’s effect takes place jumping forward to the IFGT opcode expanding over lines 36 — 45:

              IMPLICIT_ACCOUNT };
       DIP { UNIT;
             # Stack manipulation to order.
             # The operations will consist of the# TRANSFER_TOKEN operation.
             DIP {CONS} ;};

If on line 36 the value on the top of the stack is greater than zero, then, on line 41, we will TRANSFER_TOKENS (that is, unshielded tez) to the address pushed on the stack to restore the 1 to 1 peg, in this case the second parameter provided to the contract.

Now, we see clearly that there is no binding of the destination consumed as an argument by TRANSFER_TOKENS, and the Sapling transaction verified by the protocol, as we mentioned above. Enforcing the operational correctness of unshield, that is, ensuring that tokens arrive to their intended destination, is a burden of the rest of the smart contract — and ultimately, its developer. This contract does not implement any such mechanism, and therefore is indeed malleable. An adversary could plausibly exploit this by observing a valid call to this contract, and subsequently replace the destination with the attacker’s public key hash.

Indeed, this is a critical bug in the contract, and any other contract following this pattern on mainnet is vulnerable as well. This is clearly an unintended design error on the integration of the Sapling protocol, and not in the implementation of the Michelson API — which is correct with regard to its specification — nor, on the Sapling protocol — which is still, safe and sound.

That said, the limitations of the current existing API are clear, and the Tezos protocol needs an improved set of primitives for safe, privacy-preserving transactions. We discuss our immediate first step in the next section.

Strengthening shielded transactions

In order to address this issue, we have implemented the following changes to the Sapling integration in Tezos and Michelson, which will be included in the upcoming J protocol proposal. Adopting the proposed changes will prevent the issues described above, and enable Tezos smart contract developers to write safer privacy-preserving applications. In further detail:

  • We add an extra field to Sapling transactions, called bound_data. This field is signed by the Sapling spending key, as a part of the whole transaction. In the case of an unshield, it is intended to contain its recipient.

  • We update the Michelson type sapling_transaction, and we overload and extend the SAPLING_VERIFY_UPDATE opcode to handle the updated transaction type. When provided the new transaction type, SAPLING_VERIFY_UPDATE performs the same checks done by the previous version, and it additionally checks the signature on the supplied bound_data field. The latter is returned together with the balance and the updated state.

  • We provide a new sapling_contract.tz reference smart contract which expects the Tezos address of the recipient to be included in the bound_data field of the Sapling transaction. Moreover, the address should be encoded as a Micheline public_key_hash type. Thus, instead of having the recipient as a separate parameter to the smart contract, it is extracted from the Sapling transaction.

  • We update the Octez Tezos client to correctly populate the bound_data of a Sapling unshield transaction with the Tezos address of the recipient.

  • We deprecate the original Michelson type — which is also renamed to sapling_transaction_deprecated. As a result, it will continue to work on mainnet for previously originated contracts, but after (and if) protocol proposal J is activated, new smart contracts using the deprecated transaction type will no longer able to be originated, and the origination operation will fail.

We are confident this strengthening will avoid similar issues arising from loose bindings between Sapling transactions, and their sources and destinations.

It should be noticed that these changes will only be effective on the Tezos mainnet once — and only if — the J protocol proposal is accepted by the community, and after it is activated on the Tezos mainnet. Before then, we advise the community against deploying new contracts using the current support for Sapling transactions, and to avoid interacting with pre-existing deployed contracts. Shielded funds locked in the vulnerable contracts are safe, but attempting to unshield tokens from these pools entails a risk of being targeted.

Looking ahead

It is regretful that this critical issue with the integration of the Sapling protocol exists, as we were very much looking forward to seeing increased adoption of privacy-preserving features. We are reevaluating our development process to make it harder and harder for such issues to slip through in the future and we renew our commitment to develop technologies that are innovative and trustworthy.

If there is a silver lining, it is that this was caught by testing and reviewing a contract before it was massively adopted by the community. We are also reassured by the fact that a patch could be implemented quickly.

We invite the community, specially developers of applications building upon the Sapling protocol’s infrastructure, to test the new set of primitives on upcoming test networks. A teztnet for the J proposal will be announced soon, following the usual process for launching protocol test networks. Another option is to join the rolling Dailynet and Mondaynet teztnets — feel free to reach out to us if you need help deploying your contracts on the test networks.

  1. We have observed three contracts which use the Sapling integration provided by SAPLING_VERIFY_UPDATE. In addition to the two contracts mentioned above, there is a third deployed contract, KT1UmxfNX. This contract implements different functionality, it has not been used since its origination, and thus never held any tez balance. 

  2. We elide here the failing cases which would push None to the stack, as they are not needed to illustrate the bug. 

  3. The value denotes the size of the memo, encrypted arbitrary bytes attached to each output, and only accessible to the owner of the viewer/spending keys.