Lighthouse Update #41

I would have expected our first post-merge blog post to wax lyrical about the overwhelming success of the merge. However, by the third day of Devcon 6 it became clear that the merge is now "old news". I certainly wouldn't want to waste your time with old news, so I'll quickly switch into the present tense and deprive my ego of further pampering.

Protocol Upgrades

The two hottest topics in the Ethereum protocol right now are EIP-4844 and "withdrawals". EIP-4844 is a big scaling upgrade which promises a much more cost-effective experience for rollup-esque layer 2 solutions. Withdrawals, on the other hand, will add the ability for Ethereum stakers to finally access the staked Ether they've been locking up since November 2020.

Notably, there's been a bit of discussion about which of those two features Ethereum should include in the upcoming Shanghai/Capella upgrades. Withdrawals is a relatively straightforward upgrade, whilst EIP-4844 requires a bit more work, primarily because it increases block sizes and therefore consumes more bandwidth on the P2P network and complicates block propagation.

To me, there appears to be firm consensus among the community that withdrawals must go into the next upgrade. However, the jury still seems to be out on also including EIP-4844, since the complexity it holds has the potential to delay the upgrade. Ultimately, the question is whether we should ship withdrawals as soon as possible or wait for EIP-4844. Shipping EIP-4844 and withdrawals together will likely deliver withdrawals later than shipping withdrawals on their own, so we're in an interesting trade-off space between fulfilling our promise to stakers and scaling Ethereum (via layer 2 solutions) as soon as possible.

Whilst I can't speak for every individual in the Lighthouse team, my understanding is that we see withdrawals as an outstanding promise that we owe the staking community. Additionally, we appreciate that L2 solutions are the future of Ethereum and we're eager to see them implemented. Personally, I am open to a slight delay on withdrawals if it means shipping EIP-4844 earlier. However, if EIP-4844 needs many more months of work, I'd be happy to push it to the next upgrade/fork. As I see it, continuing to work on withdrawals and EIP-4844 in parallel is the only way to learn more about the relative complexities and help make an informed decision.

Working on both these features in parallel is exactly what we're doing. Our work on both withdrawals and EIP-4844 presently resides in the sigp/lighthouse/eip4844 branch. We're combining the work in the same branch until it becomes clear whether the two features will ship together or separately; it will be straightforward to separate them if we need to ship EIP-4844 after withdrawals. The withdrawals effort is being led by @ethDreamer, whilst the EIP-4844 work has been predominantly led by @pawanjay176 and @realbigsean.

The next two sections will provide an overview of both of these protocol updates.

Withdrawals

The withdrawals upgrade is currently specified over in the Capella Beacon Chain changes section of the consensus specs. I appreciate that those specs probably mean nothing to practically anyone except the select few who are fluent in Ethereum-consensus-executable-Python. So, I'll give an overview here.

As stakers out there will be acutely aware, the Beacon Chain currently has over US$20 billion in staked Ether that is presently inaccessible to those stakers that own it. There's always been a promise that they will be able to access it someday, however that day has not yet arrived. Withdrawals represent the fulfilment of that promise.

There are two scenarios where stakers should be able to withdraw some presently-staked ETH to use as they see fit (sell, re-stake, hodl, etc.):

  1. Full Withdrawals: when a validator exits they should be able to access the full balance of that validator.
  2. Partial Withdrawals: when a validator has over 32 ETH they should be able to access any ETH beyond the minimum 32 ETH that is required to remain staking.

The current implementation of withdrawals addresses both of these use-cases by periodically (and automatically!) withdrawing exited and surplus balances (> 32 ETH) to an Ethereum address of the validator's choosing (more about this "choosing" later). When I say "Ethereum address", I mean a good-old-fashioned 20-byte address on the execution chain which can be transferred to exchanges, used with smart contracts or squirreled away into a hardware wallet.

For full withdrawals (i.e., exited validators), stakers will see the validator's full balance credited to an Ethereum address less than ten minutes after the validator exit completes and becomes "withdrawable" (which happens a matter of days or weeks after the exit is submitted).

Partial withdrawals, on the other hand, will happen a little slower. Every epoch (6.4 minutes), up to 256 validators will have any value over 32 ETH automatically withdrawn to an Ethereum address. For example, a validator with a balance of 32.5 ETH will have 0.5 ETH automatically withdrawn.

Since there are presently approximately 410,000 active validators, if each of those validators has some ETH to be withdrawn then we can expect each validator to see a partial withdrawal every 6.4 minutes × (410,000 / 256) ≈ 7 days. So, with the present numbers, each validator should expect to see a staking rewards deposit to their chosen Ethereum address about once a week. I find once-a-week to be a nice frequency; it's regular enough to cover expenses but infrequent enough to avoid being an accounting burden.
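For the numerically inclined, here's a rough sketch of that arithmetic in Rust. This is not Lighthouse's implementation and the constant names are my own; the figures simply mirror the ones quoted above.

```rust
/// Minimum balance a validator must keep staked (in Gwei).
const MIN_BALANCE_GWEI: u64 = 32_000_000_000;
/// Maximum number of validators swept for partial withdrawals each epoch.
const WITHDRAWALS_PER_EPOCH: u64 = 256;
/// Seconds per epoch (32 slots × 12 seconds).
const SECONDS_PER_EPOCH: u64 = 384;

/// Amount (in Gwei) a partial withdrawal would credit for a given balance.
fn partial_withdrawal_amount(balance_gwei: u64) -> u64 {
    balance_gwei.saturating_sub(MIN_BALANCE_GWEI)
}

/// Rough number of days between sweeps of the same validator.
fn sweep_period_days(active_validators: u64) -> f64 {
    let epochs_per_sweep = active_validators as f64 / WITHDRAWALS_PER_EPOCH as f64;
    epochs_per_sweep * SECONDS_PER_EPOCH as f64 / 86_400.0
}

fn main() {
    // A validator with 32.5 ETH (in Gwei) has the 0.5 ETH surplus withdrawn.
    assert_eq!(partial_withdrawal_amount(32_500_000_000), 500_000_000);
    // With roughly 410,000 active validators, each one is swept about weekly.
    println!("~{:.1} days between sweeps", sweep_period_days(410_000));
}
```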

Back to "choosing" an Ethereum address for these withdrawals; stakers who followed the default flow with the staking-deposit-cli (suggested by most guides) will not have an Ethereum address which could receive partial and full withdrawals. Therefore, before they can withdraw, those stakers will need to submit a special, one-time BLSToExecutionChange message for each validator to:

  1. Prove that they are in possession of the mnemonic used during the validator on-boarding process.
  2. Elect an Ethereum address for withdrawals.

The tooling to achieve this has not yet been developed, so there's nothing stakers need to do right now. Stakers will certainly be required to access their mnemonic to achieve this, so start thinking about where you hid it! Please don't share your mnemonic with anyone. There will be official communications from the Ethereum Foundation and client developers on how to perform this process safely.
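For the curious, the message itself is a small container. The sketch below mirrors the Capella spec's BLSToExecutionChange in Rust-flavoured types; the byte arrays are simplified stand-ins for the real SSZ/BLS types, and the field names reflect the spec at the time of writing.

```rust
/// Simplified stand-ins for the real SSZ/BLS types.
pub type ValidatorIndex = u64;
pub type BlsPubkey = [u8; 48];
pub type BlsSignature = [u8; 96];
pub type ExecutionAddress = [u8; 20];

/// Mirrors the Capella `BLSToExecutionChange` container: it names a validator,
/// identifies the withdrawal BLS key (derived from the mnemonic) and elects a
/// 20-byte execution-layer address to receive all future withdrawals.
pub struct BlsToExecutionChange {
    pub validator_index: ValidatorIndex,
    pub from_bls_pubkey: BlsPubkey,
    pub to_execution_address: ExecutionAddress,
}

/// The message is broadcast wrapped in a signature made with the withdrawal key.
pub struct SignedBlsToExecutionChange {
    pub message: BlsToExecutionChange,
    pub signature: BlsSignature,
}
```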

Notably, there are a set of users who have already elected an Ethereum address as their withdrawal credentials. In my experience, these users are most likely institutions or staking pools (e.g., RocketPool) rather than home stakers. Those users will simply see withdrawals start automatically after the upgrade is implemented, with no action required on their end.

EIP-4844

I don't want to go into too much detail on EIP-4844, since I fear this post is already getting long and I expect the time spent on withdrawals above is more valuable to readers.

EIP-4844 is predominantly a "back-end" change which will allow layer 2 solutions to operate more efficiently. Everyday users won't find EIP-4844 immediately useful, rather they'll feel the benefits via the second-order effects of cheaper layer 2 solutions. Layer 2 solutions like Optimism and Arbitrum are already keen to implement this feature; users of those products are likely to benefit directly from EIP-4844.

The problem that EIP-4844 solves is the short-term data availability problem. Layer 2 solutions need to have data available on the Ethereum chain for a matter of hours, days or weeks. Presently, Ethereum only offers forever data availability; once something is "on" Ethereum it's stored in the state forever (to some degree). This "forever" data availability is rightfully expensive; it should be expensive to store data forever. Unfortunately, this means that layer 2 solutions are massively overpaying for their needs; they want short-term but are paying for long-term.

EIP-4844 adds short-term storage to blocks via "blobs". These blobs will only be stored by nodes for a period of, most likely, days to months (the storage duration is still under debate). Thanks to some fancy cryptography (KZG, to be specific), syncing nodes can still process transactions referencing these blobs without actually having the blob itself (i.e., syncing nodes don't need to download old blobs).
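To make the blob/commitment relationship a little more concrete, here's a deliberately simplified, illustrative sketch. These are not the real EIP-4844 types (which are still evolving); they just capture the sizes involved and why only the commitment needs to be kept forever.

```rust
/// Each blob is a fixed-size vector of field elements: 4096 elements of
/// 32 bytes each in the current draft, i.e. roughly 128 KiB of rollup data.
pub const FIELD_ELEMENTS_PER_BLOB: usize = 4096;
pub const BYTES_PER_FIELD_ELEMENT: usize = 32;

/// Illustrative stand-in for a blob; nodes prune this data after the agreed
/// retention period. (The real type is defined by the EIP-4844 specs.)
pub struct Blob(pub Vec<u8>); // length = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT

/// A 48-byte KZG commitment to a blob. Blocks and blob transactions carry only
/// these small commitments, which remain verifiable long after the blob data
/// itself has been pruned.
pub struct KzgCommitment(pub [u8; 48]);
```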

If you'd like to learn more about EIP-4844, the eip4844.com page is a great place to start.

Networking

Diva and Adrian from the Lighthouse/Sigma Prime team gave a very interesting and well-attended talk at Devcon 6: Reducing Beacon Chain Bandwidth for Institutional and Home Stakers.

In this talk, Diva and Adrian cover the impressive bandwidth savings that can be seen for nodes with multiple validators by adopting the episub protocol for the Beacon Chain gossip network. Nodes running 64 or more validators may expect to use less than a tenth of the bandwidth they use today.

We have built a version of episub in Lighthouse which is awaiting further testing and feedback from other teams before it can be published in a release. One of the great things about this upgrade is that it's backwards compatible; Lighthouse can start using this new protocol whilst maintaining compatibility with older versions and other clients that have not implemented it.

Diva and/or Adrian are planning to publish another blog post with more detail about this feature before its release; stay tuned!

Tree States

This section is authored by Michael Sproul.

Tree states is a long-running project to change Lighthouse's internal state representation from flat vectors to copy-on-write trees. The project began with the goal of improving performance in a few key areas:

  • Reduce the number of cache misses during block processing by caching vastly more states in memory. This makes Lighthouse more resilient to re-orgs and long periods without finality. Around 128 tree states can be stored in a similar amount of memory to just 4 full states.
  • Reduce the amount of data written to disk by relying on the in-memory state cache. Recent states do not need to be written to disk at all if they remain in memory and get pruned on finalization.
  • Reduce lock contention by performing fast copies of states behind locks, rather than holding locks or performing slow deep copies.
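To illustrate the core idea, here's a toy copy-on-write tree. This is a sketch only, not Lighthouse's actual tree-states types: updating a leaf rebuilds just the path from the root to that leaf, and every untouched subtree is shared between the old and new versions.

```rust
use std::sync::Arc;

/// A toy persistent (copy-on-write) binary tree. Untouched subtrees are shared
/// between versions via `Arc`, so holding many similar trees is cheap.
enum Node {
    Leaf(u64),
    Branch { left: Arc<Node>, right: Arc<Node> },
}

impl Node {
    /// Return a new tree with the leaf at `index` (in a tree of `depth` levels)
    /// replaced by `value`, sharing all untouched subtrees with `node`.
    fn with_leaf(node: &Arc<Node>, index: u64, depth: u32, value: u64) -> Arc<Node> {
        match (node.as_ref(), depth) {
            (_, 0) => Arc::new(Node::Leaf(value)),
            (Node::Branch { left, right }, _) => {
                // The bit of `index` at this depth picks which child to rebuild;
                // the other child is kept as a shared pointer.
                if (index >> (depth - 1)) & 1 == 0 {
                    Arc::new(Node::Branch {
                        left: Node::with_leaf(left, index, depth - 1, value),
                        right: Arc::clone(right),
                    })
                } else {
                    Arc::new(Node::Branch {
                        left: Arc::clone(left),
                        right: Node::with_leaf(right, index, depth - 1, value),
                    })
                }
            }
            _ => panic!("hit a leaf before reaching the requested depth"),
        }
    }
}

fn main() {
    // Build a depth-2 tree with four leaves, then update leaf 3 copy-on-write.
    let leaf = |v| Arc::new(Node::Leaf(v));
    let tree = Arc::new(Node::Branch {
        left: Arc::new(Node::Branch { left: leaf(0), right: leaf(1) }),
        right: Arc::new(Node::Branch { left: leaf(2), right: leaf(3) }),
    });
    let updated = Node::with_leaf(&tree, 3, 2, 99);

    // The untouched left subtree is shared between both versions, not copied.
    if let (Node::Branch { left: a, .. }, Node::Branch { left: b, .. }) =
        (tree.as_ref(), updated.as_ref())
    {
        assert!(Arc::ptr_eq(a, b));
    }
}
```

The real implementation is considerably more involved, but this structural sharing is what makes around 128 tree states comparable in memory to a handful of full states.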

In the course of implementing tree states it also became apparent that there were other high-value improvements we could make to the database schema. The tree states change already required a rather major database schema upgrade, so it made sense to roll all of these improvements into one upgrade:

  • 2x less disk usage for validating beacon nodes by compressing beacon blocks on disk (sigp/lighthouse#3208).
  • More than 16x less disk usage for archive beacon nodes by de-duplicating data and applying compression (sigp/lighthouse#3626). The 250 GiB archive node is coming soon!
  • Overall 5x reduction in Lighthouse IOPS during normal operation.

However, implementing tree states has not been without challenges. The tree data structure is slower to iterate over than the current flat structure, which means that several parts of block processing became slower as a result. New caches and algorithms have been implemented to mitigate this slowness, and some of these have already been merged to stable.

Overall, tree states is nearing completion, and I am pushing ahead to solve the remaining issues so we can ship it in a not-too-distant release! Adventurous readers are welcome to try out the branch before it's complete, although naturally it comes with absolutely zero stability guarantees; see sigp/lighthouse#3206.

beacon.watch

This section is authored by Mac Ladson.

Another project we've had in the pipeline for the last few months is our Beacon Chain monitoring and analysis platform, titled beacon.watch.

beacon.watch uses Lighthouse to load and store data from blocks, along with more specialized data from Lighthouse's /lighthouse/analysis API endpoints, such as per-epoch validator attestation performance, proposer block rewards and block attestation packing. Pulling data from these /analysis endpoints can be very resource-intensive: it generally requires replaying a large number of blocks, and even pulling a few epochs of data can often take several minutes. By pre-computing these values and storing them in a database, we can serve them out much faster, enabling quick and detailed analysis of both historic and near-head events.
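As a rough illustration of the kind of query beacon.watch issues and then caches, here's a minimal sketch using the reqwest crate against a local beacon node. The route and parameters shown are indicative; the exact /lighthouse/analysis/* endpoints and their arguments are documented in the Lighthouse book.

```rust
// A sketch of the kind of query beacon.watch pre-computes and caches, using the
// `reqwest` crate (with its "blocking" feature enabled).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes a Lighthouse beacon node serving its HTTP API on the default port.
    let url =
        "http://localhost:5052/lighthouse/analysis/block_rewards?start_slot=1&end_slot=32";

    // This can be slow on the beacon node side: it has to replay the requested
    // blocks, which is exactly why beacon.watch stores the results in a database.
    let body = reqwest::blocking::get(url)?.text()?;
    println!("{body}");
    Ok(())
}
```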

Another key feature of beacon.watch (and its most powerful) is its close integration with Blockprint. This allows us to easily map specific client implementations onto specific validator indices and therefore identify patterns of behaviour unique to those implementations. This should greatly aid in retrospective incident analysis, along with identification and attribution of potential consensus bugs/performance deficiencies.

As with Blockprint, we will have a private API available upon request; however, all code will be open and available to use should others wish to host their own servers. We also have eventual plans to host a publicly available front-end where sensitive Blockprint data is hidden/aggregated.

beacon.watch is still in the draft phase but you can follow along with development here.

Additional Efforts

Alongside the topics we've covered in detail, there are many other, equally important tasks that haven't yet had any airtime. Some devops efforts by @antondlr, who has only been with the team for a matter of weeks, include:

  • Implementing a checkpoint-sync server at mainnet.checkpoint.sigp.io.
  • Moving our internal logging infrastructure from Loki to ELK to allow us to perform more complex querying and analysis.
  • Many more tweaks for performance, usability and security.

Furthermore, we have been:

  • Working with @gballet on a stateless-Ethereum Verkle testnet.
  • Starting to track the addition of light client support for Lighthouse.
  • Implementing a new lighthouse validator_manager command which can move validators between running VCs via the HTTP API (sigp/lighthouse#3501).
  • Continuing to optimise Lighthouse, reducing our core block-processing times by 20% and our fork-choice block-processing times by almost 50%. These functions are on the path between receiving a block and being able to attest to it, arguably the most important path for attestation rewards.

Summary

It's been a massive year for the Lighthouse team and, as you can see, we're not slowing down. We helped deliver a practically flawless merge (sorry, old news) and we're working on delivering even more before the year is over.

Something that really excites me is seeing the capacity of the team increasing and people who joined in the past year or two taking the lead on big protocol upgrades and delivering quality code and spec contributions. Over the past few years we've put a lot of effort into finding great people and giving them time to gain experience in parts of the project that they're enthusiastic about. We're seeing the fruit of these efforts now; Lighthouse is more diverse, productive and capable than it's ever been.

Thanks to our users, contributors and supporters. We couldn't do it without you!