In the wake of the ledger halt incident on November 3, 2021, the XRPLF committed to publishing a technical report detailing the issue, identifying the root cause, and highlighting any fixes made as well as any improvements that could be made.
Analysis of the events leading up to the halt, including review of logs from multiple sources and examination of possible code and network issues, continues, with dozens of ecosystem participants and developers helping sift through the information. Of particular note are the efforts of Richard Holland, who has developed a tool to monitor the network for any telltale signs.
While we do not have a root cause at this time, some theories are starting to come into focus.
We will not speculate on those theories, but here is what we do know:
- The network has been under higher-than-normal load and, while it continues to operate, some services at the periphery of the network have been experiencing disruptions.
- Around the time of the incident, several validators were offline, reducing the network’s redundancy.
- In the wake of quorum loss, the system behaved as expected, valuing safety over liveness. The halt was the correct response.
- The network recovered automatically, without any human intervention: it processed the transactions that had been submitted in the interim and resumed normal operations.
- Monitoring, at both the server-operator level and the network level, needs to be radically improved.
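To illustrate what "safety over liveness" means when quorum is lost, here is a minimal sketch. It is not taken from the server code: the function names are hypothetical, the 80% threshold is an assumption about the quorum rule, and the validator counts are made-up illustrative numbers rather than figures from the incident.

```python
# Minimal sketch of a quorum check (hypothetical names, not the server's code).
# Assumption: a ledger is validated only if at least `percent` of the trusted
# validators are participating; otherwise the network halts rather than risk
# validating conflicting ledgers.

def quorum_size(trusted_validators: int, percent: int = 80) -> int:
    """Smallest number of validators that meets the threshold (integer ceiling)."""
    return (trusted_validators * percent + 99) // 100


def can_validate(online_validators: int, trusted_validators: int, percent: int = 80) -> bool:
    """True if enough trusted validators are online to reach quorum."""
    return online_validators >= quorum_size(trusted_validators, percent)


if __name__ == "__main__":
    trusted = 35  # illustrative list size, not a figure from the incident
    for online in (30, 28, 27):
        state = "validating" if can_validate(online, trusted) else "halted (no quorum)"
        print(f"{online}/{trusted} validators online: {state}")
```

Under these assumptions, the network keeps validating only while enough validators are online; once the count drops below the threshold it stops making progress, and it resumes on its own when enough validators return.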
As we continue trying to understand this incident and how to avoid a recurrence, we are grateful for the efforts of the teams from XRPL Labs and Ripple, as well as those of many other infrastructure operators and ecosystem participants.
We look forward to providing further updates as more information emerges.