A debate is under way in the Ethereum community over whether to build more features into the core protocol or to keep relying on layer 2 solutions. The discussion raises questions about the philosophy of protocol minimalism and about the evolving needs of Ethereum users.
Early Emphasis on Protocol Minimalism
Since its inception, Ethereum has adhered to a philosophy of keeping the core protocol as simple as possible and encouraging users to build additional functionalities on top of it. The original vision was to have the protocol act primarily as a virtual machine, with most of the logic executed through smart contracts developed by users. This approach aimed to enable flexibility and innovation within the Ethereum ecosystem.
Figure caption (quoted from the original post): "A very approximate reconstruction-from-memory of a whiteboard drawing Gavin Wood and I made back in early 2015, talking about what Ethereum 2.0 would look like."
Account Abstraction and Scaling Challenges
This philosophy was particularly relevant to two key areas: account abstraction and scaling. The idea was to abstract the scaling process so that contracts could interact with data not stored on most Ethereum nodes, allowing for a more scalable system. Similarly, account abstraction aimed to enable users to replace basic transaction validation with their own logic, enhancing security and flexibility.
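To make the account abstraction idea concrete, here is a minimal sketch of an account whose validation rule is pluggable rather than fixed. Everything in it is hypothetical and for illustration only: the Account class, the validator helpers, and the "signers" field, which stands in for signatures that have already been verified. It is not how any Ethereum client or proposal implements validation.

```python
from typing import Callable, Dict, List

# A validator decides whether a transaction is acceptable for an account.
Validator = Callable[[Dict], bool]


def single_key_validator(owner: str) -> Validator:
    """Mimics a fixed rule: exactly one designated signer must approve."""
    return lambda tx: tx.get("signers") == [owner]


def multisig_validator(owners: List[str], threshold: int) -> Validator:
    """Custom rule: at least `threshold` of the listed owners must approve."""
    def validate(tx: Dict) -> bool:
        approvals = set(tx.get("signers", [])) & set(owners)
        return len(approvals) >= threshold
    return validate


class Account:
    """Hypothetical account that delegates validation to its own logic."""

    def __init__(self, validator: Validator) -> None:
        self.validator = validator

    def accepts(self, tx: Dict) -> bool:
        return self.validator(tx)


if __name__ == "__main__":
    shared = Account(multisig_validator(["alice", "bob", "carol"], threshold=2))
    print(shared.accepts({"signers": ["alice", "carol"]}))
    print(shared.accepts({"signers": ["alice", "mallory"]}))
```

The point of the design is that swapping the validator changes the account's security model without touching the code that processes transactions.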
However, the quest for protocol minimalism led to challenges. Early proposals, such as EIP-86, suggested reducing the complexity of the Ethereum Virtual Machine (EVM) but introduced complexity elsewhere in the stack. This complexity included the need to manage transaction revalidation, potential issues with transaction duplication, and other complications.
EIP-2938 and ERC-4337: A Shift Towards Enshrinement
In response to these challenges, EIP-2938 emerged. It introduced several new features, including a new transaction type, global variables, and new opcodes. ERC-4337 was later developed as an extra-protocol solution, one that needs no changes to the core protocol. The Ethereum community is now considering bringing elements of ERC-4337 back into the core protocol, partly to address gas efficiency and the risk of code bugs.
There is a lot of boilerplate code that occurs in between processing a transaction and making the actual underlying EVM call out of the sender address, and a lot more boilerplate that comes after. How do we reduce this code to as close to nothing as possible?
Specification
If block.number >= METROPOLIS_FORK_BLKNUM, then:
1. If the signature of a transaction is (0, 0, 0) (i.e. v = r = s = 0), then treat it as valid and set the sender address to 2**160 - 1.
2. Set the address of any contract created through a creation transaction to equal sha3(0 + init code) % 2**160, where + represents concatenation, replacing the earlier address formula of sha3(rlp.encode([sender, nonce])).
3. Create a new opcode at 0xfb, CREATE_P2SH, which sets the creation address to sha3(sender + init code) % 2**160. If a contract at that address already exists, fails and returns 0 as if the init code had run out of gas.
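The three rules in the specification above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: Ethereum's sha3 is Keccak-256, which the Python standard library does not provide, so hashlib.sha3_256 is used here purely as a stand-in and produces different addresses than a real client would. The 20-byte encoding of 0 and of the sender is also an assumption, and all function names are hypothetical.

```python
import hashlib

ADDRESS_SPACE = 2**160
NULL_SENDER = 2**160 - 1  # sender assigned to transactions with v = r = s = 0


def stand_in_hash(data: bytes) -> int:
    """Stand-in for Ethereum's sha3 (Keccak-256); NOT byte-compatible with it."""
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def sender_for_signature(v: int, r: int, s: int, recovered_sender: int) -> int:
    """Rule 1: an all-zero signature is valid and maps to the fixed null sender."""
    if (v, r, s) == (0, 0, 0):
        return NULL_SENDER
    return recovered_sender


def creation_tx_address(init_code: bytes) -> int:
    """Rule 2: sha3(0 + init code) % 2**160, with + meaning concatenation."""
    zero = (0).to_bytes(20, "big")  # assumption: 0 is encoded as 20 zero bytes
    return stand_in_hash(zero + init_code) % ADDRESS_SPACE


def create_p2sh_address(sender: int, init_code: bytes, existing: set) -> int:
    """Rule 3: sha3(sender + init code) % 2**160; returns 0 if the address is taken."""
    addr = stand_in_hash(sender.to_bytes(20, "big") + init_code) % ADDRESS_SPACE
    return 0 if addr in existing else addr
```

Note that under rules 2 and 3 the address depends only on the init code (and, for CREATE_P2SH, the sender), not on a nonce, so deploying the same code from the same sender twice collides and the second attempt fails.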
Reasons for Enshrining ERC-4337
Several compelling reasons have been put forth for enshrining ERC-4337 into the protocol:
- Gas Efficiency: Enshrining certain features in the protocol can significantly improve gas efficiency, eliminating overhead associated with executing these features on the EVM.
- Code Bug Mitigation: Enshrining code in the protocol allows for community-wide responses to code bugs, reducing the risk of fund loss due to vulnerabilities.
- Support for EVM Opcodes: Opcodes such as tx.origin (the transaction origin) can be supported more consistently when account abstraction is handled inside the protocol, improving functionality.
- Censorship Resistance: Bringing features into the protocol ensures that they are legible to the Ethereum protocol, enabling better censorship resistance through inclusion lists.
Gas Efficiency Challenges
Efforts to improve gas efficiency in ERC-4337 revealed a significant challenge – the cost of accessing code and shared libraries. The current approach incurs one-time storage and code reading costs, leading to uneven transaction costs. Enshrining these shared libraries in the protocol would remove these fees, making the system more efficient and equitable for users.
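A small calculation shows why one-time costs make per-transaction costs uneven. The sketch below assumes a simple model in which a fixed overhead is split evenly across the operations that share it; the function name and all gas figures are made up for illustration and are not measurements of ERC-4337.

```python
def per_operation_gas(fixed_overhead: int, variable_cost: int, operations: int) -> float:
    """Gas paid per operation when a one-time overhead is shared evenly."""
    if operations < 1:
        raise ValueError("operations must be at least 1")
    return fixed_overhead / operations + variable_cost


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 gas of one-time cost, 20,000 gas per operation.
    for n in (1, 2, 10, 100):
        print(n, per_operation_gas(100_000, 20_000, n))
```

Under this model a user whose operation travels alone bears the whole fixed cost, while one in a large batch pays close to the variable cost only. Removing the fixed term is what enshrinement would aim for.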
The multi-invalidation problem in account abstraction. One transaction getting included on chain could invalidate thousands of other transactions in the mempool, making the mempool easy to cheaply flood.
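The multi-invalidation problem can be illustrated with a toy mempool. In this sketch, which is a simplified model rather than real client behavior, each pending transaction's validation depends on one storage slot holding an expected value; PendingTx, the slot name, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PendingTx:
    tx_id: int
    reads_slot: str      # hypothetical: the one storage slot validation reads
    expected_value: int  # value the validation logic requires in that slot


def still_valid(tx: PendingTx, state: Dict[str, int]) -> bool:
    return state.get(tx.reads_slot, 0) == tx.expected_value


def include_and_revalidate(
    mempool: List[PendingTx], state: Dict[str, int], slot: str, new_value: int
) -> Tuple[List[PendingTx], List[PendingTx]]:
    """Apply one on-chain write, then split the mempool into (surviving, dropped)."""
    state[slot] = new_value
    surviving = [tx for tx in mempool if still_valid(tx, state)]
    dropped = [tx for tx in mempool if not still_valid(tx, state)]
    return surviving, dropped


if __name__ == "__main__":
    state = {"shared_flag": 0}
    mempool = [PendingTx(i, "shared_flag", 0) for i in range(1000)]
    surviving, dropped = include_and_revalidate(mempool, state, "shared_flag", 1)
    print(len(surviving), len(dropped))
```

A single included write flips the shared slot and every pending transaction that depended on it becomes invalid at once. Nodes spent work validating and propagating all of them, while the attacker pays for only the one transaction that lands on chain.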
Lessons for Enshrining Features
The case of ERC-4337 offers valuable lessons for enshrining features in the Ethereum protocol:
- Limits of “Move Complexity to the Edges”: Pushing logic out of the protocol works less well when it carries high fixed costs, as is the case with shared libraries; in those cases enshrinement becomes attractive.
- Community Response to Code Bugs: When a wide range of users relies on specific features, it makes sense to allow the community to address code bugs through hard forks.
- Leverage Protocol Powers: In some cases, protocol-level features are more powerful and efficient, as seen in in-protocol censorship resistance mechanisms.
Balancing Flexibility and Enshrinement
While enshrining features in the protocol can improve efficiency, security, and functionality, it must be balanced against the flexibility needed to accommodate diverse user requirements. Ethereum’s debate highlights the evolving nature of blockchain technology and the continuing effort to optimize and expand its capabilities.