What everyone gets wrong about 51% attacks

Excuse the provocation in the title. Clearly not everyone gets it wrong. But enough people do that I think it’s worth writing a blog post about the topic.

There is a myth out there that if you control more than 50% of the hashpower in Bitcoin, Ethereum, or another blockchain, then you can do whatever you want with the network. A similar restatement for Proof of Stake is that if you control more than two thirds of the stake, you can do anything. You can take another person’s coins. You can print new coins. Anything.

This is not true. Let’s discuss what someone running a 51% attack can do:

They can stop you from using the chain, i.e. block any transaction they don’t like. This is called censorship.
They can revert the chain, i.e. undo a certain number of blocks and change the order of the transactions in them.

What they cannot do is change the rules of the system. This means for example:

They cannot simply print new coins outside of the provisions of the blockchain system; e.g. at the time of writing, Bitcoin gives each new block producer 6.25 BTC, and they cannot simply turn this into one million BTC
They cannot spend coins from an address for which they don’t have the private key
They cannot make blocks larger than the consensus rules allow
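
To make these limits concrete, here is a minimal sketch of the kind of rule check that every node applies to a block, no matter how much hashpower produced it. The constants, the `Block` and `Tx` types, and the `signed_by` field (a stand-in for real signature verification) are my own illustrative assumptions, not actual Bitcoin data structures.

```python
from dataclasses import dataclass, field

# Illustrative constants, not real consensus parameters.
MAX_BLOCK_REWARD = 6.25       # subsidy per block, in coins
MAX_BLOCK_SIZE = 1_000_000    # size limit, in bytes


@dataclass
class Tx:
    owner: str       # key that controls the coins being spent
    signed_by: str   # key that signed the transaction (stand-in for a signature check)
    fee: float = 0.0


@dataclass
class Block:
    coinbase_amount: float    # coins the block producer pays to itself
    size: int                 # serialized size in bytes
    txs: list = field(default_factory=list)


def is_valid(block: Block) -> bool:
    """Return True only if the block obeys every rule; hashpower plays no role here."""
    fees = sum(tx.fee for tx in block.txs)
    if block.coinbase_amount > MAX_BLOCK_REWARD + fees:
        return False  # printing coins beyond subsidy plus fees
    if block.size > MAX_BLOCK_SIZE:
        return False  # block larger than the rules allow
    if any(tx.signed_by != tx.owner for tx in block.txs):
        return False  # spending coins without the owner's key
    return True
```

A block that pays its producer one million coins fails the first check, and it fails in exactly the same way whether 1% or 99% of the miners stand behind it.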

Now this is not to say that 51% attacks aren’t devastating. They are still very bad attacks. Reordering allows double spending of coins, which is quite a big problem. But there are limits on what they can do.

Now how do most blockchains, including Bitcoin and Ethereum, ensure this? What happens if a miner mines a block that goes against the rules? Or a majority of the stake signs a block that goes against the rules?

The blockchain security model

Sometimes people claim that the longest chain is the valid Bitcoin or Ethereum chain. This is somewhat incomplete. The proper definition of the current chain head is:

The valid chain with the highest total difficulty.

So there are two properties that a client verifies before accepting that a chain should be used to represent the current history:

It has to be valid. This means that all state transitions are valid; for example in Bitcoin, that means that all transactions only spend previously unspent transaction outputs, the coinbase only receives the transaction fees and block rewards, etc.
It has to be the chain with the highest total difficulty. Colloquially, that’s the longest chain, though length is measured not in number of blocks but in how much total mining power was spent on the chain.
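
The two conditions can be sketched as a small fork choice function. This is a simplified illustration under my own assumptions: the `Block` type and its `valid` flag (standing in for a full state transition check done elsewhere) are hypothetical, and real clients track difficulty incrementally rather than summing whole chains.

```python
from dataclasses import dataclass


@dataclass
class Block:
    difficulty: int   # work needed to mine this block
    valid: bool       # outcome of the state transition check (assumed computed elsewhere)


def total_difficulty(chain):
    """Sum of the difficulty of all blocks in the chain."""
    return sum(block.difficulty for block in chain)


def choose_head(chains):
    """Pick the valid chain with the highest total difficulty, or None if none is valid."""
    valid_chains = [c for c in chains if all(b.valid for b in c)]
    if not valid_chains:
        return None
    return max(valid_chains, key=total_difficulty)
```

Note the order of operations: invalid chains are filtered out first, so no amount of difficulty can make up for a single invalid block. Among the valid chains, a chain with fewer blocks can still win if more total work went into it.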

This may all sound a bit abstract. It is legitimate to ask who verifies the first condition, that all blocks on the chain are valid. If it’s only the miners themselves who verify that the chain is valid, then this is a tautology and we haven’t really gained anything.

But blockchains are different. Let’s see why. Start with a normal client/server database architecture:

[Figure: Database user and server]

Note that for a typical database, the user trusts the database server. They don’t check that the response is correct; the client makes sure that it is validly formatted according to the protocol, and that’s it. The client, here represented by an empty square, is “dumb”: It can’t verify anything.

A blockchain architecture, however, looks like this:

[Figure: Blockchain architecture]

So let’s summarise what happens here. There are miners (or stakers) that produce the chain. There is a peer-to-peer network – its role is to make sure that a valid chain is always available to everyone, even if some of the nodes aren’t honest (you need to be connected to at least one honest and well-connected P2P node to ensure that you will always be up to date with the valid chain). And there is a client, which sends transactions to the P2P network and receives the latest chain updates (or the full chain, if it is syncing) from other nodes in the network. Clients are actually part of the network and also contribute by forwarding blocks and transactions, but that’s not so important here.

The important part is that the user is running a full node, as represented by the cylinder in their client. Whenever the client gets a new block, it checks whether that block is a valid state transition, just like any other node does, whether it’s a miner or just a node in the P2P network.

And if it’s not a valid state transition, the block will just be ignored. That’s why, in such a network, there is very little point for miners to ever try to mine an invalid state transition. Everyone would just ignore it.
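
Here is a toy sketch of that behaviour: a node that keeps its own set of unspent outputs and silently drops any block that tries to spend something that isn’t there. The `FullNode` class and the block format are hypothetical simplifications (no amounts, signatures or proof of work), meant only to show the "validate, else ignore" pattern.

```python
class FullNode:
    """Toy full node: tracks unspent outputs and ignores blocks that break the rules."""

    def __init__(self, utxos):
        self.utxos = set(utxos)   # identifiers of currently unspent outputs
        self.height = 0

    def on_block(self, block):
        """Apply a block given as a list of (spent_outputs, created_outputs) pairs.

        Returns True if the block was accepted, False if it was ignored.
        """
        utxos = set(self.utxos)   # work on a copy so a bad block changes nothing
        for spent, created in block:
            if len(set(spent)) != len(spent) or not set(spent) <= utxos:
                return False      # double spend or unknown output: ignore the block
            utxos -= set(spent)
            utxos |= set(created)
        self.utxos = utxos
        self.height += 1
        return True
```

Because the node works on a copy of its state, an invalid block leaves no trace at all. From the node’s point of view, the block simply never happened.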

Many users run their own node to interact with blockchains like Ethereum or Bitcoin. Many communities have made this part of their culture and place a great emphasis on everyone running their own node, so that they are part of the validation process. Indeed, you could say that it’s really important that the majority of users, especially those with a lot of value at stake, run full nodes; if the majority of users become too lazy, then suddenly miners can be tempted to produce invalid blocks, and this model would not hold anymore.

Analogy: Separation of powers

You can think of this a bit like the separation of powers in liberal democracies – there are different branches of the government, and just because you have a majority in one of them (say, the legislature) does not mean you can simply do anything you like and ignore all laws. Miners or stakers have the power to order transactions in blockchains; they don’t have the power to simply dictate new rules to the community.

But do all blockchains work like this?

That’s a good question. And what’s important to note is that this only works if a full node is easy to run. As an average user, you will simply not do it if it means having to buy another computer for $5,000 and needing a permanent 1 Gbit/s internet connection. Even if you can get such a connection in some places, having it permanently clogged by your blockchain node is probably not very convenient. In this case, you will probably not run your own node (unless your transactions are exceptionally valuable), which means that you will trust someone else to do it for you.

Imagine a chain that is so expensive to run that only stakers and exchanges will run a full node. You have just changed the trust model, and a majority of stakers and the exchanges could come together and change the rules. There would be no debate with the users about this – users cannot lead a fork if they literally have no control over the chain, at all. They could insist on the old rules, but unless they start running full nodes, they would have no idea if their requests are answered using a chain that satisfies the rules that they want.

That’s why there are always huge debates around increasing the block size of, say, Ethereum or Bitcoin – every time you do this, you increase the burden for people running their own nodes. It’s not much of a problem for miners – the cost of running a node is tiny compared to actual mining operations – so it shifts the balance of power away from users and to the miners (or stakers).
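
A back-of-the-envelope calculation shows how quickly that burden grows. The sketch below only counts raw block data, assuming every block is full; it ignores state growth, validation time and bandwidth peaks. The block sizes in the loop are hypothetical, and the 10-minute interval is Bitcoin’s target block time.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_chain_growth_gb(block_size_bytes, block_interval_seconds):
    """Data a full node must download and store per year, in gigabytes (10**9 bytes)."""
    blocks_per_year = SECONDS_PER_YEAR / block_interval_seconds
    return block_size_bytes * blocks_per_year / 1e9


# Hypothetical block sizes, all with a 10-minute block interval.
for size_mb in (1, 8, 128):
    growth = yearly_chain_growth_gb(size_mb * 1_000_000, 600)
    print(f"{size_mb} MB blocks every 10 min: about {growth:.0f} GB per year")
```

With full 1 MB blocks this comes to roughly 53 GB per year, which a laptop can handle. The cost scales linearly with block size, so at some point the hardware needed stops being something an average user will buy.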

How about light clients?

All right, but what if you just want to pay for your coffee using cryptocurrencies? Are you going to run a full node on your phone?

Of course, nobody expects that. And users don’t. Here, light clients come into play. Light clients are simpler clients that do not verify the full chain – they only verify the consensus, i.e. the total difficulty or the amount of stake that has voted for it.

In other words, light clients can be tricked into following a chain that contains invalid blocks. There are remedies for this, in the form of data availability checks and fraud proofs. As far as I know, no chain has implemented these at this point, but at least Ethereum will do this in the future.
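
The difference can be sketched in a few lines. A light client only sees headers, so all it can do is follow the most work; with fraud proofs, it can additionally discard any chain containing a block that someone has proven invalid. The `Header` type and the `fraud_proofs` set are hypothetical simplifications: a real fraud proof has to be verified by the client, and it only helps if the block data is available so that somebody can construct it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    block_id: str
    difficulty: int


def light_client_head(chains, fraud_proofs=frozenset()):
    """Follow the chain with the most work, skipping chains with a proven-invalid block."""
    candidates = [
        chain for chain in chains
        if not any(h.block_id in fraud_proofs for h in chain)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda chain: sum(h.difficulty for h in chain))
```

Without any fraud proofs, this client follows whichever chain has the most work, even if one of its blocks is invalid. Once a proof for the offending block arrives, it falls back to the best remaining chain.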

So using light clients with data availability checks and fraud proofs, we will be able to make the blockchain security model available without requiring all users to run a full node. This is the ultimate goal, that any phone can easily run an Ethereum light client.

And what about sidechains?

Sidechains are a hot topic right now. It would seem that they are an easy way to provide scaling, without the complexity of rollups. Simply speaking:

Create a new Proof of Stake chain
Create a two-way bridge with Ethereum
…
Profit!

Note that the security of the sidechain relies pretty much entirely on the bridge – that is, the construction that allows one chain to understand another chain’s state. After all, if you can trick the bridge on the main chain into believing that all the assets on the bridged chain now belong to Mr. Evil, then it doesn’t matter if full nodes on the Proof of Stake chain think differently. So it’s all in the bridge.

Unfortunately, the state of bridges is the same as with light clients. They don’t verify correctness, but only the majority part of the consensus condition. However, there are two things that make bridges worse than light clients:

Bridges are used for very high value transactions, where most users would choose a full node if they could
Unfortunately, there is no way to fortify bridges as we can do for light clients – the reason is that they cannot perform data availability checks

The second point is quite subtle and could easily fill another blog post or two. But in short, bridges cannot do data availability checks, and without these, fraud proofs are also mostly useless. Using zero knowledge proofs, you can get an improvement by requiring bridges to include proofs of all blocks being correct – unfortunately, this still suffers from some data availability attacks, but it is an improvement.
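
To see why the bridge is the weak point, consider a minimal sketch of what a typical bridge check boils down to. The function, its parameters and the validator names are hypothetical, and the set of signers stands in for real signature verification. The two-thirds threshold mirrors the usual Proof of Stake supermajority.

```python
from fractions import Fraction


def bridge_accepts(state_root, signatures, stake, threshold=Fraction(2, 3)):
    """Accept a sidechain state root if validators holding more than `threshold`
    of the total stake signed it.

    `state_root` is deliberately never inspected: the bridge has no way to check
    that it results from valid state transitions.
    """
    signed_stake = sum(stake[v] for v in set(signatures) if v in stake)
    return Fraction(signed_stake, sum(stake.values())) > threshold
```

The telling detail is the unused `state_root` argument. Whatever a supermajority of the stake signs is accepted, including a state in which all bridged assets belong to Mr. Evil.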

In summary, sidechains have a different, much weaker security model than a blockchain like Ethereum or Bitcoin. They cannot protect against invalid state transitions.

Does all of this have something to do with sharding?

In fact, all of this has a lot to do with sharding. The reason we need sharding to scale is that it is the only way to scale without raising the bar for running a full node, while maintaining the full security guarantees of blockchains as closely as possible.

But what if you just undo all of history? Then you can still just steal all the Bitcoin/Ether/etc.

From a theoretical point of view, on a non-checkpointed Proof of Work chain, it is true that by reverting not just some transactions but all transactions ever, you could still get all the Bitcoins. OK, so you cannot print a trillion Bitcoin, but you can still get all the Bitcoins in existence, so that’s pretty good, right?

I think this point is very theoretical. The probability that either of these communities would accept a fork that revises years (or even just hours) of its history is precisely zero. There would be massive scrambling together on all possible channels, with the pretty quick conclusion that people should reject this and just agree that the valid chain should be the one that is already in existence.

With Proof of Stake and finalization, this mechanism will become formalized – clients simply never revert finalized blocks, ever.
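
As a rough sketch of that rule: before even considering a competing chain, a client checks that the chain agrees with everything it has already finalized. Chains are modelled here as plain lists of block hashes. The function name and this representation are illustrative assumptions, and the normal fork choice would still have to run among the chains that pass.

```python
def accept_fork(current_chain, new_chain, finalized_height):
    """Allow a switch to new_chain only if it keeps every finalized block.

    Blocks at heights 0..finalized_height of current_chain are treated as final.
    """
    finalized = current_chain[: finalized_height + 1]
    return new_chain[: finalized_height + 1] == finalized
```

A fork that rewrites history below the finalized height is rejected outright, regardless of how much stake or hashpower is behind it.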