Special thanks to Justin Drake, Tina Zhen, and Yoav Weiss for their feedback and review.
The Ethereum project initially aimed for a minimalist core, leaving as much functionality as possible to protocols built on top of it. The debate over L1 vs. L2 solutions extends beyond scaling, impacting various user needs like asset exchange, privacy, security, and more. Recently, there’s been some interest in incorporating more of these features into the core Ethereum protocol. This post explores the historical philosophy of minimal enshrinement and presents new perspectives on this topic, aiming to establish a framework for identifying suitable features to include in the protocol.
Initial principles of protocol minimalism
Early on in the development of what was then known as “Ethereum 2.0,” there was a strong motivation to design a clean, simple, and aesthetically pleasing protocol that tried to do as little as possible itself and left almost everything to users to build on top. Ideally, the protocol would just be a virtual machine, and verifying a block would just be a single virtual machine call.
An extremely rough recreation from memory of a whiteboard sketch Gavin Wood and I did discussing Ethereum 2.0 back in early 2015.
All other logic would happen through contracts: a few system-level contracts, but mostly contracts provided by users. The “state transition function” (the function that processes a block) would only be a single VM call. One extremely appealing feature of this model is that even an entire hard fork could be described as a single transaction to the block processor contract, which would be approved by either offchain or onchain governance and then run with elevated permissions.
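As a rough illustration of that model, here is a minimal Python sketch, assuming a toy state in which the block processor is simply a callable stored in state. All names (state_transition, processor_v1, processor_v2, the approved_by_governance flag) are hypothetical, and governance approval is reduced to a boolean on the transaction.

```python
from typing import Any, Dict, List

State = Dict[str, Any]
Tx = Dict[str, Any]

def state_transition(state: State, block: List[Tx]) -> None:
    """The entire state transition function: one call into the processor."""
    state["block_processor"](state, block)

def credit(state: State, to: str, amount: int) -> None:
    balances = state["balances"]
    balances[to] = balances.get(to, 0) + amount

def processor_v1(state: State, block: List[Tx]) -> None:
    for tx in block:
        if "upgrade" in tx:
            # A hard fork is one privileged transaction that swaps the
            # processor code. It only takes effect if governance approved it,
            # and the new rules apply from the next block onward.
            if tx.get("approved_by_governance"):
                state["block_processor"] = tx["upgrade"]
        else:
            credit(state, tx["to"], tx["value"])

def processor_v2(state: State, block: List[Tx]) -> None:
    # Illustrative post-fork rule: every transfer burns a flat fee of 1.
    for tx in block:
        credit(state, tx["to"], tx["value"] - 1)
```

Nothing outside the processor knows about transfers, fees, or forks; replacing the rules is just another call.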
These discussions back in 2015 applied particularly to two topics that were on our minds: account abstraction and scaling. In the case of scaling, the idea was to create a maximally abstracted form of scaling that would still feel like a natural extension of the model above.
A contract could make a call to a piece of data that was not stored by most Ethereum nodes, and the protocol would detect this and resolve the call through some kind of very general scaled-computation functionality. From the virtual machine’s point of view, the call would go off into some separate sub-system and then, some time later, magically come back with the correct answer.
This line of thinking was explored briefly but soon dropped, because we were too preoccupied with verifying that any kind of blockchain scaling was possible at all. However, as we will explore later, the combination of data availability sampling and ZK-EVMs means that one possible future for Ethereum scaling might actually look surprisingly close to that vision.
Account abstraction
For account abstraction, on the other hand, we knew from the start that some kind of implementation was possible, so research immediately began on bringing the purist starting point of “a transaction is just a call” as close to reality as possible.
A lot of boilerplate code runs between processing a transaction and making the actual underlying EVM call out of the sender address, and even more boilerplate runs afterward. How can we reduce this code to as close to nothing as possible?
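The kind of boilerplate in question can be sketched in simplified Python, loosely modeled on the apply_transaction and validate_transaction functions named in this section. The field names, the flat gas accounting, and the signature_ok flag standing in for ECDSA verification are assumptions made for illustration, not any client’s actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Account:
    balance: int = 0
    nonce: int = 0

@dataclass
class Tx:
    sender: str
    to: str
    value: int
    nonce: int
    gas_limit: int
    gas_price: int
    signature_ok: bool  # stand-in for real ECDSA signature verification

class InvalidTransaction(Exception):
    pass

State = Dict[str, Account]

def validate_transaction(state: State, tx: Tx) -> None:
    # Enshrined rules: one fixed signature scheme and one fixed nonce rule.
    acct = state.get(tx.sender, Account())
    if not tx.signature_ok:
        raise InvalidTransaction("bad signature")
    if tx.nonce != acct.nonce:
        raise InvalidTransaction("bad nonce")
    if acct.balance < tx.value + tx.gas_limit * tx.gas_price:
        raise InvalidTransaction("insufficient balance")

def apply_transaction(state: State, tx: Tx, coinbase: str,
                      evm_call: Callable[[State, Tx], int]) -> int:
    validate_transaction(state, tx)                    # boilerplate before the call
    sender = state.setdefault(tx.sender, Account())
    sender.nonce += 1
    sender.balance -= tx.gas_limit * tx.gas_price      # pre-pay for the maximum gas
    gas_used = evm_call(state, tx)                     # the actual EVM call
    sender.balance += (tx.gas_limit - gas_used) * tx.gas_price  # refund unused gas
    state.setdefault(coinbase, Account()).balance += gas_used * tx.gas_price  # pay the miner
    return gas_used
```

Only the single evm_call line is “the real work”; everything around it is protocol-enshrined logic that account abstraction tries to push into user code.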
One of the key lines in this code is validate_transaction(state, tx), which does things like checking that the nonce and signature of the transaction are correct. The practical goal of account abstraction was to let users replace the basic nonce-incrementing and ECDSA validation with their own validation logic, so that they could more easily use things like social recovery and multisig wallets. Hence, finding a way to rearchitect apply_transaction into just a simple EVM call was not simply a matter of cleaning up code for its own sake; it was about moving the logic into the user’s account code, to give users the flexibility they needed.
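To make the “substitute your own validation logic” idea concrete, here is a hedged sketch of a 2-of-3 multisig account whose code, rather than the protocol, checks signatures and manages its nonce. HMAC-SHA256 stands in for a real signature scheme, and MultisigAccount and apply_transaction_aa are hypothetical names.

```python
import hashlib
import hmac
from typing import Dict, List, Protocol

class AccountCode(Protocol):
    def validate(self, payload: bytes, signatures: List[bytes]) -> bool: ...

def sign(key: bytes, msg: bytes) -> bytes:
    # HMAC-SHA256 stands in for a real signature scheme such as ECDSA.
    return hmac.new(key, msg, hashlib.sha256).digest()

class MultisigAccount:
    """k-of-n validation logic supplied by the user, not the protocol."""

    def __init__(self, keys: List[bytes], threshold: int):
        self.keys, self.threshold, self.nonce = keys, threshold, 0

    def validate(self, payload: bytes, signatures: List[bytes]) -> bool:
        msg = payload + self.nonce.to_bytes(8, "big")
        # Count distinct keys that produced a matching signature.
        signers = {i for i, k in enumerate(self.keys)
                   for s in signatures if hmac.compare_digest(sign(k, msg), s)}
        if len(signers) < self.threshold:
            return False
        self.nonce += 1  # replay protection is now the account's job
        return True

def apply_transaction_aa(accounts: Dict[str, AccountCode], sender: str,
                         payload: bytes, signatures: List[bytes]) -> bool:
    # The protocol no longer knows about ECDSA or nonces: it just calls the account.
    return accounts[sender].validate(payload, signatures)
```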
However, the insistence on making apply_transaction contain as little enshrined logic as possible ended up introducing a lot of challenges. To see why, let’s zoom in on one of the earliest account abstraction proposals, EIP-86:
Specification
If block.number >= METROPOLIS_FORK_BLKNUM, then:
1. If the signature of a transaction is (0, 0, 0) (i.e. v = r = s = 0), then treat it as valid and set the sender address to 2**160 - 1
2. Set the address of any contract created through a creation transaction to equal sha3(0 + init code) % 2**160, where + represents concatenation, replacing the earlier address formula of sha3(rlp.encode([sender, nonce]))
3. Create a new opcode at 0xfb, CREATE_P2SH, which sets the creation address to sha3(sender + init code) % 2**160. If a contract at that address already exists, fails and returns 0 as if the init code had run out of gas.
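The three rules can be sketched in Python as follows. Two loud assumptions: Ethereum’s “sha3” is really Keccak-256, which differs from hashlib.sha3_256 (NIST SHA3-256), so these addresses will not match real ones; and the spec does not pin down how the “0” prefix in rule 2 is encoded, so the sketch assumes 20 zero bytes. The function names are hypothetical.

```python
import hashlib

def sha3_int(data: bytes) -> int:
    # Stand-in for Ethereum's "sha3" (really Keccak-256). hashlib.sha3_256 is
    # NIST SHA3-256 with different padding, so results differ from real addresses.
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")

NULL_SENDER = 2**160 - 1  # rule 1: sender assigned to (0, 0, 0)-signed transactions

def sender_of(v: int, r: int, s: int) -> int:
    if v == r == s == 0:
        return NULL_SENDER  # treated as valid; the account code does the checking
    raise NotImplementedError("ECDSA sender recovery is out of scope for this sketch")

def creation_tx_address(init_code: bytes) -> int:
    # Rule 2: sha3(0 + init code) % 2**160, with no dependence on a nonce.
    # ASSUMPTION: "0" is encoded as 20 zero bytes.
    return sha3_int(bytes(20) + init_code) % 2**160

def create_p2sh(state: dict, sender: int, init_code: bytes) -> int:
    # Rule 3: sha3(sender + init code) % 2**160, sender encoded as 20 bytes.
    addr = sha3_int(sender.to_bytes(20, "big") + init_code) % 2**160
    if addr in state:
        return 0  # fail as if the init code had run out of gas
    state[addr] = init_code  # placeholder: real code would run init code and store its output
    return addr
```

Because neither address formula involves a nonce, the same init code always lands at the same address, which is what lets an account’s address be known before anyone has sent a transaction from it.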
Basically, if the signature is set to (0, 0, 0), a transaction really becomes “just a call”. The account itself would be responsible for having code that parses the transaction, extracts and verifies the signature and nonce, and pays fees. For an early example of that code, see here, and for the very similar validate_transaction code that this account code would be replacing, see here.
The price of this protocol-layer simplicity is that miners (or, in today’s terms, block proposers) need additional logic for only accepting and forwarding transactions that go to accounts whose code is set up to actually pay fees. What is that logic? To be honest, EIP-86 did not think too hard about it:
Note that miners would need to have a strategy for accepting these transactions; otherwise, they run the risk of accepting transactions that do not pay them any fees, and possibly even transactions that have no effect (e.g. because the transaction was already included and so the nonce is no longer current). One simple approach is to have a whitelist for the codehash of accounts that they accept transactions being sent to; approved code would include logic that pays miners transaction fees. A looser but still effective strategy would be to accept any code that fits the same general format as the above, consuming only a limited amount of gas to perform nonce and signature checks and having a guarantee that transaction fees will be paid to the miner. Another strategy is to, alongside other approaches, try to process any transaction that asks for 250,000 gas or less, and include it only if the miner’s balance is appropriately higher after executing the transaction than before it.
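Two of those miner strategies, the codehash whitelist and trial execution under a gas cap, can be sketched as below. The simulate callback is hypothetical: it is assumed to run the transaction against a scratch copy of state and return the miner’s resulting balance. The middle strategy (accepting code that “fits the same general format”) is omitted because it depends on code analysis the quoted text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

GAS_CAP = 250_000  # threshold quoted in EIP-86

@dataclass
class PendingTx:
    to: str         # target account, whose code is expected to pay the fee
    gas_limit: int

def accept_by_whitelist(tx: PendingTx, code_hashes: Dict[str, str],
                        approved: Set[str]) -> bool:
    # Strategy 1: only accept transactions sent to accounts running approved code.
    return code_hashes.get(tx.to) in approved

def accept_by_simulation(tx: PendingTx, miner_balance: int,
                         simulate: Callable[[PendingTx], int]) -> bool:
    # Strategy 3: trial-execute cheap transactions and include them only if
    # the miner's balance actually went up. `simulate` is hypothetical.
    if tx.gas_limit > GAS_CAP:
        return False
    return simulate(tx) > miner_balance
```

Note that every simulated-but-rejected transaction still costs the miner computation, which is exactly the burden the protocol simplification shifted onto them.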
If EIP-86 had been included as-is, it would have simplified the protocol layer, but at the cost of drastically increasing the complexity of other parts of the Ethereum stack, requiring essentially the same code to be written elsewhere. It would also have introduced entirely new classes of weirdness, such as the possibility that the same transaction with the same hash might appear multiple times in the chain, not to mention the multi-invalidation problem.
The account abstraction multi-invalidation problem. The mempool can be easily and inexpensively flooded since one transaction that is added to the chain could invalidate thousands of other transactions in the mempool.
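A toy model makes the multi-invalidation problem concrete. It assumes, purely for illustration, that account code accepts a transaction only while one shared storage slot is 0; with account abstraction, validity can depend on arbitrary state like this, so a single cheap write wipes out every dependent transaction that nodes have already validated and rebroadcast.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class PendingTx:
    sender: str
    # Hypothetical validity rule: the sender's account code only accepts
    # this transaction while the shared slot holds this value.
    required_slot_value: int

def is_valid(state: Dict[str, int], tx: PendingTx) -> bool:
    return state["shared_slot"] == tx.required_slot_value

def include(state: Dict[str, int], mempool: List[PendingTx],
            new_slot_value: int) -> List[PendingTx]:
    """Include one transaction that writes the shared slot; return what survives."""
    state["shared_slot"] = new_slot_value
    return [tx for tx in mempool if is_valid(state, tx)]

state = {"shared_slot": 0}
mempool = [PendingTx(f"acct{i}", 0) for i in range(10_000)]
assert all(is_valid(state, tx) for tx in mempool)  # all pass validation and get rebroadcast
survivors = include(state, mempool, new_slot_value=1)
print(len(mempool), "->", len(survivors))  # 10000 -> 0
```

Only the one included transaction pays a fee, yet the network did the work of validating and propagating all ten thousand.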
Account abstraction evolved in stages from there. EIP-86 became EIP-208, which later became an ethresear.ch post on “tradeoffs in account abstraction proposals”, which in turn became another ethresear.ch post half a year later. Eventually, all of this culminated in the actually somewhat-workable EIP-2938.
However, EIP-2938 was by no means minimalist. The EIP contains:
- A new transaction type
- Three new transaction-wide global variables
- Two new opcodes, including the cumbersome PAYGAS opcode, which simultaneously handles gas price and gas limit checking, serves as an EVM execution breakpoint, and temporarily stores ETH for fee payments
- A set of complex mining and rebroadcasting strategies, including a list of banned opcodes for the validation phase of a transaction
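The banned-opcode rule can be sketched as a check over the opcodes executed before the PAYGAS breakpoint. The opcode set below is illustrative only (environment-dependent opcodes of the kind such rules target) and is NOT the exact list from EIP-2938; the trace format is also an assumption.

```python
from typing import Iterable, Set

# Illustrative only: opcodes whose results depend on the block environment or
# on other accounts, so a transaction valid now could silently become invalid.
# This is NOT the exact list from EIP-2938.
BANNED_IN_VALIDATION: Set[str] = {
    "BLOCKHASH", "COINBASE", "TIMESTAMP", "NUMBER",
    "DIFFICULTY", "GASLIMIT", "BALANCE", "GASPRICE",
}

def validation_trace_ok(trace: Iterable[str], pay_opcode: str = "PAYGAS") -> bool:
    """Scan opcodes executed before the PAYGAS breakpoint; a node following
    this rule would refuse to rebroadcast the transaction if it returns False."""
    for op in trace:
        if op == pay_opcode:
            return True   # validation phase over; execution phase is unrestricted
        if op in BANNED_IN_VALIDATION:
            return False
    return False          # never reached PAYGAS, so it never committed to paying
```

The point of such a rule is to make a transaction’s validity depend only on state the sender controls, which limits the multi-invalidation problem at the cost of more mempool-level complexity.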
In order to get account abstraction off the ground without involving Ethereum core developers, who were busy with herculean efforts to optimize the Ethereum clients and implement the Merge, EIP-2938 was eventually re-architected into the entirely extra-protocol ERC-4337.
ERC-4337. It really does rely entirely on EVM calls for everything!
Because ERC-4337 is an ERC, it does not require a hard fork, and technically lives “outside of the Ethereum protocol”. So, problem solved? Well, as it turns out, not quite. The current medium-term roadmap for ERC-4337 actually does involve eventually turning large parts of ERC-4337 into a series of protocol features, and it is a useful instructive example to see why this path is being considered.
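The “everything is an EVM call” structure of ERC-4337 can be sketched as a validation loop followed by an execution loop. The sketch loosely mirrors the roles of EntryPoint.handleOps and an account’s validateUserOp, but it is heavily simplified: a flat fee replaces gas accounting, there are no paymasters or aggregators, a byte comparison replaces signature checking, and invalid operations are skipped rather than reverting the whole bundle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class UserOperation:
    sender: str
    nonce: int
    call: Callable[[], None]  # stand-in for callData
    fee: int                  # simplified: flat fee instead of gas accounting
    signature: bytes

class SmartAccount:
    def __init__(self, owner_sig: bytes, deposit: int):
        self.owner_sig, self.deposit, self.nonce = owner_sig, deposit, 0

    def validate_user_op(self, op: UserOperation) -> bool:
        # Plays the role of validateUserOp: the account checks its own
        # "signature", its nonce, and that it can cover the fee.
        return (op.signature == self.owner_sig and op.nonce == self.nonce
                and self.deposit >= op.fee)

def handle_ops(accounts: Dict[str, SmartAccount], ops: List[UserOperation]) -> int:
    """Plays the role of EntryPoint.handleOps; returns fees owed to the bundler."""
    collected, to_execute = 0, []
    for op in ops:                          # validation phase
        acct = accounts[op.sender]
        if not acct.validate_user_op(op):
            continue                        # simplification: skip instead of reverting
        acct.nonce += 1                     # consume the nonce during validation
        acct.deposit -= op.fee
        collected += op.fee
        to_execute.append(op)
    for op in to_execute:                   # execution phase
        op.call()
    return collected
```

Every step here would be an ordinary contract call on-chain, which is why ERC-4337 needs no hard fork, and also why the surrounding mempool and bundler logic ends up carrying so much of the complexity.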
Enshrining ZK-EVMs
Now let’s turn our attention to ZK-EVMs, another potential candidate for inclusion in the Ethereum protocol. Currently, a large number of ZK-rollups have to write fairly similar code to verify the execution of Ethereum-like blocks inside a ZK-SNARK. The ecosystem of independent implementations is quite diverse and includes the PSE ZK-EVM, Kakarot, the Polygon ZK-EVM, Linea, Zeth, and so on.
One of the recent controversies in the EVM ZK-rollup space is how to deal with the possibility of bugs in the ZK code. Currently, all of these live systems have some form of “security council” mechanism that can override the proving system in case of a bug. In this post last year, I tried to create a standardized framework to encourage projects to be explicit about how much they trust the proving system and how much they trust the security council, and to move toward gradually giving the security council fewer and fewer powers over time.
- In the medium term, rollups will likely rely on multiple proving systems, and the security council would only have any power at all in the extreme case where two different proving systems disagree with each other.
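That medium-term arrangement can be sketched as a settlement rule over two independent verifiers, where the security council is consulted only on disagreement. The verifier and council callables are hypothetical placeholders for real proof verification and a real council vote.

```python
from typing import Callable

# (claimed_state_root, proof) -> does this proving system accept it?
Verifier = Callable[[bytes, bytes], bool]

def settle(claim: bytes, proof_a: bytes, proof_b: bytes,
           verifier_a: Verifier, verifier_b: Verifier,
           council_vote: Callable[[bytes], bool]) -> bool:
    """Two independent proving systems; the security council only decides
    when they disagree, which signals a bug in one of them."""
    a = verifier_a(claim, proof_a)
    b = verifier_b(claim, proof_b)
    if a == b:
        return a                 # agreement: the council has no say at all
    return council_vote(claim)   # disagreement: fall back to the council
```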
Since these L2 ZK-EVMs are essentially using the exact same EVM as Ethereum, can’t we somehow make “verify EVM execution in ZK” into a protocol feature, and deal with exceptional situations like bugs and upgrades by just applying Ethereum’s social consensus, the same way we already do for base-layer EVM execution itself?
This is an important and challenging topic. There are a few nuances:
- We want to be compatible with Ethereum’s multi-client philosophy. This means that we want to allow different clients to use different proving systems. This in turn implies that for any EVM execution that gets proven with one ZK-SNARK system, we want a guarantee that the underlying data is available, so that proofs can be generated for other ZK-SNARK systems.
- While the technology is immature, we probably want auditability. In practice, this means the exact same thing: if any execution gets proven, we want the underlying data to be available, so that if anything goes wrong, users and developers can inspect it.
- We need much faster proving times, so that once one type of proof is made, other types of proof can be generated quickly enough for other clients to validate them. One could work around this by making a precompile that has an asynchronous response after some time window longer than a slot (e.g. 3 hours), but this adds complexity.
- We want to support not just copies of the EVM, but also “almost-EVMs”. Part of the attraction of L2s is the ability to innovate on the execution layer and make extensions to the EVM. If a given L2’s VM differs from the EVM only a little, it would be nice if the L2 could still use a native in-protocol ZK-EVM for the parts that are identical to the EVM, and rely on its own code only for the parts that differ. This could be done by designing the ZK-EVM precompile so that the caller can specify a bitfield, list of opcodes, or address that is handled by an externally supplied table instead of the EVM itself.
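The “almost-EVM” idea can be sketched with a toy stack machine: native semantics for most opcodes, plus a caller-supplied override table for the opcodes where an L2 differs. Only PUSH, ADD, and MUL are modeled, and zk_evm_precompile is a hypothetical name; a real precompile would verify a proof of such an execution rather than run it.

```python
from typing import Callable, Dict, List

Stack = List[int]
Handler = Callable[[Stack], None]

def _add(s: Stack) -> None:
    s.append((s.pop() + s.pop()) % 2**256)

def _mul(s: Stack) -> None:
    s.append((s.pop() * s.pop()) % 2**256)

# A tiny stand-in for the enshrined EVM: only a few opcodes are modeled.
NATIVE: Dict[str, Handler] = {"ADD": _add, "MUL": _mul}

def zk_evm_precompile(program: List[object], overrides: Dict[str, Handler]) -> Stack:
    """Run `program` with native semantics, except that opcodes listed in
    `overrides` are dispatched to the caller's own table."""
    stack: Stack = []
    i = 0
    while i < len(program):
        op = program[i]
        if op == "PUSH":
            stack.append(program[i + 1])   # immediate operand follows PUSH
            i += 2
            continue
        handler = overrides.get(op) or NATIVE[op]
        handler(stack)
        i += 1
    return stack
```

For example, an L2 adding a hypothetical SQUARE opcode would pass {"SQUARE": its_handler} and get native behavior for everything else.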
One possible point of contention regarding data availability in a native ZK-EVM is statefulness. ZK-EVMs are much more data-efficient if they do not have to carry “witness” data. That is, if a particular piece of data was already read or written in some previous block, we can simply assume that provers have access to it, and we don’t have to make it available again. This goes beyond not re-loading storage and code: it turns out that if a rollup properly compresses data, stateful compression allows for up to 3x data savings compared to stateless compression.
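The stateful-versus-stateless distinction can be illustrated with zlib’s preset-dictionary feature, which lets a compressor reference earlier data that both sides already hold. This is a swapped-in general-purpose technique, not the rollup compression the text refers to, and the sizes it prints are not meant to reproduce the up-to-3x figure; the “addresses” are pseudo-random hex strings generated for the demo.

```python
import hashlib
import zlib

def fake_address(i: int) -> bytes:
    # Pseudo-random hex "addresses", so generic compression can't shrink them much.
    return hashlib.sha256(i.to_bytes(4, "big")).hexdigest()[:40].encode()

# Data already published in an earlier block (assumed available to provers).
previous_block = b";".join(fake_address(i) for i in range(300))
# The new block touches many of the same accounts again.
new_block = b";".join(fake_address(i) for i in range(0, 300, 3))

# Stateless: compress the new block on its own.
stateless = zlib.compress(new_block, 9)

# Stateful: compress against the earlier data as a preset dictionary.
comp = zlib.compressobj(level=9, zdict=previous_block)
stateful = comp.compress(new_block) + comp.flush()

# The decompressor needs the same history: this is the state provers must keep.
decomp = zlib.decompressobj(zdict=previous_block)
assert decomp.decompress(stateful) + decomp.flush() == new_block

print(len(new_block), len(stateless), len(stateful))
```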
This means that for a ZK-EVM precompile, we have two options:
- All data must be accessible in the same block in order for the precompile to work. This opens up the possibility of stateless provers, but it also increases the cost of ZK-rollups compared to rollups that use bespoke code.
- The precompile allows pointers to data used or produced by previous executions. This allows ZK-rollups to be near-optimal, but it is more complicated and introduces a new type of state that has to be stored by provers.
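The two options differ mainly in what the precompile accepts as input. A hedged sketch, with hypothetical Inline and Pointer input types and a history map standing in for the extra state that option 2 forces provers to keep:

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass(frozen=True)
class Inline:
    data: bytes          # published in this block

@dataclass(frozen=True)
class Pointer:
    block: int           # refers to data used or produced in an earlier block
    offset: int
    length: int

Input = Union[Inline, Pointer]

def resolve_inputs(inputs: List[Input], history: Dict[int, bytes],
                   allow_pointers: bool) -> bytes:
    """Hypothetical input handling for an enshrined ZK-EVM precompile."""
    out = b""
    for item in inputs:
        if isinstance(item, Inline):
            out += item.data
        elif not allow_pointers:
            raise ValueError("option 1: all data must be available in this block")
        else:
            # Option 2: provers must retain `history`, a new kind of state.
            out += history[item.block][item.offset:item.offset + item.length]
    return out
```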
Enshrining liquid staking
By far the simplest “interface” for staking that lets users both stake without running their own node and keep their funds liquid is just an ERC20 token: convert your ETH into “staked ETH”, hold it, and later convert back. And indeed, liquid staking providers such as Lido and Rocket Pool have emerged to do exactly that. However, liquid staking has some natural centralizing mechanics at play: people naturally gravitate toward the largest version of staked ETH because it is the most well-known and most liquid (and the best supported by applications, which in turn support it because it is the version that more users will have heard of).
Each version of staked ETH needs some mechanism for determining who can be the underlying node operators. It can’t be unrestricted, because then attackers would join and amplify their attacks with users’ funds. Currently, the top two are Lido, which has a DAO whitelisting node operators, and Rocket Pool, which allows anyone to run a node if they put down 8 ETH (i.e. 1/4 of the capital) as a deposit. These two approaches carry different risks: the Rocket Pool approach allows attackers to 51% attack the network while forcing users to pay most of the costs. With the DAO approach, if a single such staking token dominates, a single, potentially attackable governance gadget ends up controlling a very large portion of all Ethereum validators.
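The “users pay most of the costs” point is simple leverage arithmetic. The sketch below uses the 8 ETH bond and the figure of over 26 million ETH staked quoted in this post, and assumes, unrealistically, that unlimited depositor funds flow to any operator; it also ignores that the attacker’s new validators would themselves grow the total stake, so it is an upper-bound illustration rather than a real attack cost.

```python
VALIDATOR_SIZE = 32         # ETH per validator
OPERATOR_BOND = 8           # ETH an open-entry operator puts down (from the text)
TOTAL_STAKED = 26_000_000   # "over 26 million ETH" (from the text)

leverage = VALIDATOR_SIZE // OPERATOR_BOND                    # 4x
stake_needed = TOTAL_STAKED * 51 // 100                       # 51% of all stake
own_capital = stake_needed * OPERATOR_BOND // VALIDATOR_SIZE  # attacker's own bonds
depositor_funds = stake_needed - own_capital                  # supplied by users

print(leverage, stake_needed, own_capital, depositor_funds)
# 4 13260000 3315000 9945000
```

Three quarters of the stake behind such an attack, and so three quarters of what could be slashed, would belong to depositors rather than the attacker.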
One approach is to socially encourage ecosystem participants to use a diversity of liquid staking providers, to reduce the chance that any single one grows too large to be a systemic risk. In the long term, however, this is an unstable equilibrium, and there is danger in relying too heavily on moralistic pressure to solve problems. One natural question arises: might it make sense to have some kind of in-protocol functionality to make liquid staking less centralizing?
The key question here is: what kind of in-protocol functionality? Simply creating an in-protocol fungible “staked ETH” token has a problem: it would have to either have an enshrined Ethereum-wide governance to choose who runs the nodes, or be open-entry, which turns it into a vehicle for attackers.
One interesting idea is Dankrad Feist’s writing on liquid staking maximalism. First, we bite the bullet that if Ethereum gets 51% attacked, only perhaps 5% of the attacking ETH gets slashed. This is a reasonable tradeoff: right now there is over 26 million ETH being staked, and a cost of attack of 1/3 of that (roughly 8.7 million ETH) is far more than necessary, especially considering how many kinds of “outside-the-model” attacks can be pulled off for much less. Indeed, a similar tradeoff has already been explored in the “super-committee” proposal for implementing single-slot finality.
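The size of that tradeoff can be checked with a few lines of integer arithmetic, using only the figures in the paragraph above (26 million ETH staked, a 1/3 attack cost today, and 5% of a 51% attacking stake slashed under the proposal).

```python
TOTAL_STAKED = 26_000_000  # ETH, "over 26 million" (from the text)

# Current figure from the text: an attack costs 1/3 of all stake.
cost_today = TOTAL_STAKED // 3             # 8,666,666 ETH

# Proposal: a 51% attack, of which only ~5% of the attacking ETH is slashed.
attacking = TOTAL_STAKED * 51 // 100       # 13,260,000 ETH
cost_proposed = attacking * 5 // 100       # 663,000 ETH

print(cost_today, cost_proposed, round(cost_today / cost_proposed, 1))
# 8666666 663000 13.1
```

Even after a roughly 13x reduction, the attack would still cost hundreds of thousands of ETH.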
Over 90% of staked ETH would be immune to slashing if we assume that only 5% of attacking ETH is slashed. As a result, 90% of staked ETH might be converted into an in-protocol fungible liquid staking currency that can be used by other applications.
This path is interesting. However, it still leaves open the question: what specifically would get enshrined? Rocket Pool already works in a way very similar to this: each node operator puts up some of the funds, and liquid stakers put up the rest. We could simply make a few small tweaks, such as limiting the maximum slashing penalty to 2 ETH, and Rocket Pool’s existing rETH would become risk-free.
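Why a 2 ETH cap would make rETH risk-free follows from who absorbs a slashing penalty first. A sketch, assuming the operator’s 8 ETH bond absorbs losses before depositors’ 24 ETH and treating MAX_SLASHING as the hypothetical protocol cap from the text; it also shows that the non-slashable share of each validator then exceeds the 90% figure mentioned above.

```python
from typing import Tuple

VALIDATOR_SIZE = 32   # ETH per validator
OPERATOR_BOND = 8     # Rocket Pool-style operator deposit (from the text)
MAX_SLASHING = 2      # hypothetical protocol cap suggested in the text

def losses(penalty: int, cap: int = MAX_SLASHING) -> Tuple[int, int]:
    """Split a slashing penalty into (operator_loss, depositor_loss),
    with the operator's bond absorbing losses first."""
    applied = min(penalty, cap)
    operator_loss = min(applied, OPERATOR_BOND)
    depositor_loss = applied - operator_loss
    return operator_loss, depositor_loss

non_slashable_fraction = (VALIDATOR_SIZE - MAX_SLASHING) / VALIDATOR_SIZE  # 0.9375
```

With the cap, even a maximal penalty such as losses(32) costs the operator 2 ETH and depositors nothing; without it, losses(32, cap=32) would take 24 ETH from depositors.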
We can do other clever things with simple protocol tweaks, too. For example, imagine that we want a system with two “tiers” of staking: node operators (high collateral requirement) and depositors (no minimum, can join and leave at any time), but we still want to guard against node operator centralization by giving a randomly-sampled committee of depositors powers like suggesting lists of transactions that have to be included (for anti-censorship reasons), controlling the fork choice during an inactivity leak, or needing to co-sign blocks. This could be done in a mostly out-of-protocol way, by tweaking the protocol to require each validator to provide (i) a regular staking key, and (ii) an ETH address that can be called to output a secondary staking key during each slot.
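A minimal sketch of that two-key arrangement, assuming the ETH address in (ii) can be modeled as a function from slot number to a key. The pool’s selection policy, a seeded pseudo-random pick among depositors, is a hypothetical stand-in for whatever staking pools would actually implement; a real design would need randomness that operators and block producers cannot bias.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Validator:
    staking_key: str                            # (i) regular key, held by the node operator
    secondary_key_source: Callable[[int], str]  # (ii) stands in for the ETH address called each slot

def make_pool_selector(depositor_keys: List[str], seed: int) -> Callable[[int], str]:
    """Hypothetical out-of-protocol pool logic: pick one depositor's key per slot."""
    def select(slot: int) -> str:
        # Seeded PRNG keeps the sketch deterministic; it is not bias-resistant.
        rng = random.Random(seed * 1_000_003 + slot)
        return rng.choice(depositor_keys)
    return select

def keys_for_slot(v: Validator, slot: int) -> Tuple[str, str]:
    # The protocol only grants powers to both keys; how the second one is
    # chosen is left entirely to the staking pool's own logic.
    return v.staking_key, v.secondary_key_source(slot)
```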