Why it’s so important to go stateless

One of Eth1’s biggest problem is the current state size. Estimated at around 10-100GB (depending on how exactly it is stored), it is impractical for many nodes to keep in working memory, and is thus moved to slow permanent storage. However, hard disks are way too slow to keep up with Ethereum blocks (or god forbid, sync a chain from genesis), and so much more expensive SSDs have to be used. Arguably, the current state size isn’t even the biggest problem. The biggest problem is that it is relatively cheap to grow this state, and state growth is permanent, so even if we can raise the cost for growing state, there is no way to make someone pay for the actual impact on the network, which is eternal.

A solution space, largely crystallizing around two ideas, has emerged:

State rent – the idea that in order to keep a state element in active memory, a continuous payment is required, and
Statelessness – blocks come with full witnesses (e.g. Merkle proofs) and thus no state is required to validate whether a block is valid

On the spectrum to statelessness, there are further ideas worth exploring:

partial statelessness – reducing the amount of state required to validate blocks, by requiring witnesses only for some (old) state
weak statelessness – validating block requires no state, but proposing blocks requires the full state

Vitalik has written some ideas how to put these into a common framework here, showing that partial statelessness and state rent are very similar in that both require some form of payment for introducing something into active state, and a witness to reactivate state that has become inactive.

If you come from the Eth1 world, then you may think that partial statelessness with a remaining active state of 1 GB or even 100 MB is a great achievement, so why work so much harder to go for full statelessness? I argue that full (weak) statelessness unlocks a huge potential that any amount of partial statelessness cannot, and thus that we should work very hard to enable full statelessness.

Understanding Eth2 validators

Eth1 has been criticised in the past for having very high hardware requirements, and though not all of these criticisms are fair (it is still very possible to run an Eth1 node on moderate but well chosen consumer hardware), they are to be taken seriously, especially since we want to scale Ethereum without compromising decentralization. For Eth2, we have thus set ourselves a very ambitious goal – to be able to run an Eth2 node and validator on very low-cost hardware, even a Raspberry Pi or a smartphone.

This is not the easy route, but the hard route to scaling. Other projects, like EOS and Solana, instead require much more performant hardware and internet connections. But I think for decentralization it is essential to keep the requirements on consensus nodes, as well as P2P nodes, very low.

In Eth2, the consensus node is the validator. There is an important difference with the consensus nodes in Eth1 and Eth2:

In Eth1, the consensus nodes are miners. To “vote” for a chain, you have to produce a block on it. In other words, the consensus nodes and block producers are inseparable.
In Eth2, or rather its current first phase, the beacon chain, proposing blocks and forming consensus are two different functions: Blocks are proposed every 12 seconds by a randomly selected validator, but consensus is formed via attestations, with every validator voting for a chain every epoch, that is, every 6.4 minutes. Yes, at the moment, that is already almost 100,000 validators casting one vote every few minutes. Block producers have (almost ¹) no influence on consensus, they only get to select what is included in a block²

The property that block proposers are irrelevant for consensus opens up a significant design space. While for the beacon chain, block proposers are simply selected at random from the full validator set, for the shard chains, this doesn’t have to be true:

One interesting possibility would be that for a shard, especially an Eth1 execution shard, there is a way for a validator to enter a list that they are capable of producing blocks. These validators may require better hardware and may need to have “full” state
Another possibility, which we are currently implementing for the data shards, is that anyone can be selected for proposing blocks, but the actual content of the block isn’t produced by the proposer; instead, different entities can bid on getting their pre-packaged blocks proposed.

In both cases, weakly stateless validation means that all the other validators, who are not proposing blocks or preparing block content, do not need the state. That is a huge difference to Eth1: In Eth1, the consensus forming nodes (the miners) have high requirements anyway, so requiring them to keep full state seems fine. But with Eth2, we have the possibility of significantly lowering this requirement, and we should make use of it, to benefit decentralization and security.

So why is it ok to have expensive proposers?

An important objection may be that it defeats decentralization if block proposing becomes expensive, even if we get cheap validators and P2P nodes. This is not the case. There is an important difference between “proposers” and “validators”:

For validators, we need an honest supermajority, i.e. more than 2/3 of the total staked ETH must be honest. A similar thing can be said about P2P nodes – while there isn’t (as far as I know) a definite fraction of P2P nodes that must be honest, there is the requirement that everyone is connected to at least one honest P2P node in order to be able to be sure to always receive the valid chain; this could be 5% but in practice it is probably higher.
For proposers, we actually get away with much lower honesty requirements; note that unlike in Eth1, in Eth2 proposers do not get to censor past blocks (because they do not vote), but only get to decide about the content of their own block. Assuming that your transaction is not highly time critical, if 95% of proposers try to censor it, then the 20th proposer would still be able to get it safely included. (Low-latency censorship resistance is a different matter however, and in practice more difficult to achieve)

This is why I am much less worried about increasing hardware requirements for proposers than for validators. I think it would be fine if we need proposers to run a PC with 128GB RAM that is fully capable of storing even a huge state, if we can keep normal validator requirements low. I would be worried if a PC that can handle these requirements costs 100,000$, but if we can keep it to under 5,000$, it seems unconscionable that the community would not react very quickly by introducing more proposers if censorship were detected.

Finally, let’s not neglect that there are other reasons why block proposing will likely be done by those with significant hardware investments anyway, as they are better at exploiting MEV.

Note that I am using the word “proposer” here for the entities that package blocks, which is not necessarily the same as the one who formally signs it and introduces them; they could be “sequencers” (for rollups) etc. For simplicity I call them proposers here, because I do not think any of the system would fundamentally break if we simply introduced a completely new role into the system that only proposes blocks and nothing else.

The benefits of going stateless

So far I haven’t argued why (at least weak, but not partial) statelessness is such a powerful paradigm; in the executable beacon chain proposal, reducing state from 10 GB to 1 GB or 100MB seems to unlock a lot of savings for validators, so why do we have to go all the way?

Because if we go all the way, the executable Eth1 blocks can become a shard. Note that in the executable beacon chain proposal, all validators have to run the full Eth1 execution all the time (or they risk signing invalid blocks). A shard should not have this property; the point of a shard is that only a committee needs to sign a block (so only 1/1024 of all validators); and the others don’t have to trust that the majority of this committee is honest ³, but only that it has at least one honest member, who would blow the whistle when it tries to do something bad. This is only possible if Eth1 becomes stateless:

We want the load on all validators to be rougly equal, and free of extreme peaks. Thus, sending a validator to become an Eth1 committee member for a long time, like an hour or a day, is actually terrible: It means the validator still has to be dimensioned to be able to keep up with the full Eth1 chain in terms of bandwidth requirements. This is in addition to committees becoming much more attackable if they are chosen for a long time (for example through bribing attacks)
We want to be able to have easy fraud proofs for Eth1 blocks, because otherwise the other validators can’t be sure that the committee has done its work correctly. The easiest way to get fraud proofs is if a block can be its own fraud proof: If a block is invalid, you simply have to broadcast the block itself to show that fraud has happened.

So Eth1 can become a shard (that requires much less resources, like 1/100, to maintain) only if it becomes fully stateless. And at the same time, only then can we introduce more execution shards, in addition to the data shards.

Aren’t caches always good?

So what if we go to full statelessness but introduce a 10 MB cache? Or 1 MB? That can easily be downloaded even if you only want to check one block, because you are assigned to a committee or you received it as a fraud proof?

You can do this, but there is a simple way to see that this is very unlikely to be optimal, if the majority of validators only validate single blocks. Let’s say we target 1 MB blocks and in addition, we have a 1 MB cache. That means, every time a validator wants to validate a block, they have to download 2 MB – both the block and the cache. They have to download the cache every time, except if they download all blocks to also keep the cache up to date, which is exactly what we want to avoid.

This means, at the same cost of having blocks of size 1 MB with a cache of 1 MB, we could set the cache to 0 and allow blocks of 2 MB.

Now, it’s clear that a block of 2 MB is at least as powerful as having 1 MB blocks with 1 MB cache. The reason is that the 2 MB block could simply include a 1 MB cache if that’s what we thought was optimal – you simply commit to the cache at every block, and reintroduce the full cache in the next block. This is unlikely to be the best use of that 1 MB block space, but you could, so it’s clear that a 2 MB block is at least as powerful as a 1 MB block with a 1 MB cache. It’s much more likely that the extra 1 MB would be of better use to allow more witnesses to be introduced.

Binary tries or verkle tries?

I think overall, the arguments for shooting for full weak statelessness, and not partial statelessness or state rent, are overwhelming. It will impact users much less: They simply don’t have to think about it. The only thing they have to do, when constructing transactions, is to add witnesses (so that the P2P network is able to validate it’s a valid transaction). Creating these witnesses is so cheap that it’s unimaginable that there won’t be a plethora of services offering it. Most wallets, in practice, already rely on external services and don’t require users to run their own nodes. Getting the witnesses is a trivial thing to add⁴.

Partial statelessness, or state rent, adds a major UX hurdle on the way to full weak statelessness, where it would disappear again. It has some merit when you consider how difficult statelessness is to achieve using just binary Merkle tries, and that the gas changes required to allow Merkle trie witnesses will themselves be detrimental to UX.

So in my opinion, we should go all the way to verkle tries now. They allow us to have manageable, <1 MB witnesses, with only moderate gas repricings as proposed by EIP-2929 and charging for code chunks. Their downsides are well contained and of little practical consequence for users:

A new cryptographic primitive to learn for developers
adding more non post-quantum secure cryptography The second sounds scary, but we will already introduce KZG commitments in Eth2 for data availability sampling, and we are using elliptic curve based signatures anyway. Several post quantum upgrades of the combined Eth1 and Eth2 chain are required, because there simply aren’t practical enough post quantum alternatives around now. We can’t stop progress because of this. The next 5 years are extremely important in terms of adoption. The way forward is to implement the best we can now, and in 5-10 years, when STARKs are powerful enough, we will introduce a full post quantum upgrade of all primitives.

In summary, verkle tries will solve our state problems for the next 5 years to come. We will be able to implement full (weak) statelessness now, with almost no impact on users and smart contract developers; we will be able to implement gas limit increases (because validation becomes faster) and more execution shards – and all this comes with little downside in terms of security and decentralization.

The big bullet to bite is for everyone to learn to understand how KZG commitments and verkle tries work. Since Eth2 will use KZG commitments for data availability, most of this work will soon be required of most Ethereum developers anyway.

Almost no influence, because there is now a small modification to improve resilience against certain balancing attacks, that does give block proposers a small amount of short term influence on the fork choice ↩
To be precise, they can have an influence, if they start colluding and censoring large numbers of attestations, but single block producers have a completely negligible effect on how consensus is formed ↩
A dishonest committee can do some annoying things, that could impact the network and introduce major latency, but it cannot introduce invalid/unavailable blocks ↩
Users who do want to run their own node can still use an external service to get witnesses. Doing so is trustless, since the witnesses are their own proof if you know what the latest state root is ↩