How Everscale community repelled attacks on the network

This report uses technical language specific to the Everscale blockchain that may be unfamiliar to readers. To make things clearer, we have included a short glossary of terms below.

Glossary

Opcode — an assembly language instruction that specifies certain atomic logic to be performed by a virtual machine

Masterchain — a subset of the Everscale network. The consensus, blockchain configuration and staking logic happen on the masterchain.

Main Workchain — the part of the Everscale blockchain where smart contracts are deployed, dApps are hosted and major user-facing activity occurs.

Collator — a validator node that is currently assigned to produce (collate) blocks. Block collation involves multiple stages where nodes manipulate the Internal Message Queue to validate and apply new transactions and include them in a block.

Internal Message Queue — an internal data structure that stores the messages to be processed by a node execution layer. The inefficiencies in some parts of the code that manipulated this data structure during the collation timeframe caused the incident in question to happen.

Chain split — a workchain feature that assigns different subsets of accounts to be processed by separate validator groups depending on the current load. Chain Split enables a network to execute smart contracts in parallel in order to withstand an increasing load.

Chain merge — the opposite of a Chain Split. When a load decreases, the split chains are merged back because multiple validator groups are no longer needed.

Downtime — the full or partial halt of the network for a set period of time.

Key takeaways

  • A smart contract consisting of two opcodes and a specially prepared initial state was deployed and executed on the mainnet.
  • Due to the combination of the two opcodes, the contract exploited a flaw in the implementation of the Everscale execution layer.
  • This flaw makes it possible to write code that triggers the execution of a large amount of code while paying only a very small fixed price for it.
  • The apparent technical flaw is actually an economic issue in the system.
  • The node team considered several options to mitigate the issue and settled on one after it passed internal tests in various configurations.
  • On the first day of the downtime, it became clear that adequate solutions would take more than two weeks to implement, including development tests and an initial audit.
  • The committee concluded that such a long downtime would be unacceptable and decided to temporarily disallow the execution of the exploiting smart contract and to apply some improvements to the node execution layer. These updates are not final, and further upgrades are in the process of being implemented.
  • The network upgrade was carried out in two stages:

1. Node upgrade by the network’s community of independent masterchain validators.

2. Node upgrade by Base Workchain validators.

  • The upgrade did not require a fork of the network but changes have been made to the configuration of the blockchain. The coordination of the network’s community of validators played an important role in resolving the incident.
  • The network has successfully processed all user transactions that were initiated during the downtime.
  • No user funds have been impacted.

Incident details

On April 25th, 2023 at approximately 9 am CET, the Everscale Mainnet began to experience abnormal activity. This was caused by multiple attempts to exploit an inefficient piece of logic in the node execution layer.

The first attempt did not succeed and the network continued to operate in a normal fashion, with message processing, block creation and the splitting mechanism left intact.

However, during the second attempt, the exploiting smart contract reached a point at which users began to experience its effects. Namely, the network could not process the transactions tied to the exploiting contract because of the growth of the internal message queue. At this point, the inefficiency in the code that processes this queue began to affect collator node operations. Block creation became too time-consuming, so fewer transactions were included in each block. This triggered the network to merge shardchains that had previously been split. In simple terms, the network started to “think” that the load had decreased, when it had actually kept increasing. When the queue of internal messages reached a threshold size, the collator node was no longer able to produce new blocks because it was busy executing inefficient manipulations of the message queue data structure.
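The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual Everscale node logic: the time budget, the per-transaction cost, the split and merge thresholds, and the function names `txs_per_block` and `shard_decision` are all hypothetical. It assumes that the per-transaction cost grows logarithmically with the queue length and that the split or merge decision looks only at how many transactions fit in a block.

```python
import math

# All constants below are hypothetical and chosen only for illustration.
BLOCK_TIME_BUDGET = 1000.0  # time units a collator may spend on one block
BASE_TX_COST = 1.0          # time units to execute one transaction
QUEUE_OVERHEAD = 1.0        # extra time per transaction, scaled by log2(queue length)
SPLIT_THRESHOLD = 400       # transactions per block at or above which a shard splits
MERGE_THRESHOLD = 100       # transactions per block at or below which shards merge


def txs_per_block(queue_len: int) -> int:
    """Number of transactions that fit in one block for a given queue length."""
    per_tx = BASE_TX_COST + QUEUE_OVERHEAD * math.log2(queue_len + 1)
    return int(BLOCK_TIME_BUDGET // per_tx)


def shard_decision(tx_count: int) -> str:
    """Naive load signal that only looks at the transaction count of a block."""
    if tx_count >= SPLIT_THRESHOLD:
        return "split"
    if tx_count <= MERGE_THRESHOLD:
        return "merge"
    return "keep"


if __name__ == "__main__":
    for queue_len in (0, 7, 1_000, 135_000):
        count = txs_per_block(queue_len)
        print(queue_len, count, shard_decision(count))
```

In this model a longer queue makes every transaction slower, so fewer transactions fit in a block, and a signal based only on the transaction count reads the overloaded shard as an idle one.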

Several days later, on April 30th at 15:30 CET, there was a third attack on the network exploiting the same vulnerability in a much more aggressive way. Fortunately, the devs were able to remedy the situation.

Technically speaking, the blockchain was not halted. Only the main workchain experienced downtime, while the masterchain operated normally. This allowed the core team and validator community to coordinate and successfully upgrade the network to mitigate the incident.

Economic aspects of the exploit

The cost of creating an internal message for an attacking contract consists of:

  • a constant fixed value per message
  • a message size fee
  • a message creation fee

For the attacking contract, all three values were the same for each call and did not change.

For a validating node, the resources to process a request grow logarithmically with the number of unprocessed messages.

In the accompanying chart, the red line shows the constant cost of gas per message, and the purple line shows the actual resources needed to process the message.

At some point, the node spends more resources to process the message than the actual fee that was charged.
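The crossover between the fixed fee and the growing processing cost can be sketched numerically. The following is a minimal illustration, assuming a processing cost of the form log2(n + 1) for a queue of n unprocessed messages. The fee value, the cost scale and the function names are hypothetical and are not taken from the Everscale fee schedule.

```python
import math

# Hypothetical values in arbitrary units, not real Everscale fees.
FIXED_FEE = 10.0   # constant amount charged to the contract per message
COST_SCALE = 1.0   # scale of the node-side processing cost


def processing_cost(queue_len: int) -> float:
    """Node-side cost of one message when queue_len messages are unprocessed."""
    return COST_SCALE * math.log2(queue_len + 1)


def break_even_queue_length(fixed_fee: float) -> int:
    """Smallest queue length at which the processing cost exceeds the fixed fee."""
    queue_len = 0
    while processing_cost(queue_len) <= fixed_fee:
        queue_len += 1
    return queue_len


if __name__ == "__main__":
    n = break_even_queue_length(FIXED_FEE)
    print(f"Processing cost exceeds the fee once the queue holds {n} messages")
```

The linear search is used for readability. With these assumed numbers the loop is short, but a closed-form inverse of the cost function would be preferable for large fees.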

Incident Timeline

First attempt

There were around 67,000 transactions processed. The smart contract carried a small balance (100 EVERs) which was emptied out by gas payments for transaction executions before the contract could cause any damage to the network. No changes were noticed on the network.

Starting from block 34599096, the blocks were fully filled with contract calls, from 457 to 395 in block 34599144. In the next block, a split occurred.

Later on, from block 34599181, the contract stopped increasing the queue because its balance had been emptied. The next split occurred after block 34599191, and one more after block 34599243. Processing ended at block 34599249.

Second attempt

There were over 150,000 transactions processed and another 135,000 in the queue before the issue started to be noticed by users. The smart contract this time carried a bigger balance (1000 EVERs) and succeeded in partially halting network operations.

The split happened after block 34603059. The next block after the split, block 34603060, carried only one transaction. Following that, there were splits after blocks 34603106, 34603160 and 34603237. Likewise, the next block, 34603238, contained only one transaction.

Root cause analysis

The Everscale architecture ensures that any smart contract creating excessive activity is sharded. This means that such a contract is isolated in its own shard, with computational resources allocated for its processing, so that it does not impact the execution of other smart contracts in the network. In this case, the very high number of messages sent by the exploiting smart contract activated the sharding mechanism, which worked as designed: the contract was isolated in its shard and processed. Notwithstanding this, network operations were degraded.

The core developers investigating the event identified the cause of the degradation as a node operation that is effectively free for small values and, accordingly, is not billed. For larger values, the same operation becomes expensive for the network node, yet it was still not billed. As a result, the exploiting smart contract, consisting of two opcodes and a manually prepared initial state, became too expensive to process, while the cost of that processing could not be withdrawn from the contract's balance.

The solution

Faced with the risk of the unbilled, hidden load spreading to other chains, a prompt response was required. In order to remedy the situation, the core developer team made some optimizations to the validator node (where possible considering the tight deadline) and froze the execution of the exploiting smart contract. It is important to note that these patches are not ideal and are only a temporary solution to ensure that the network gets up and running again as fast as possible.

In the long term, Everscale is working on a global resolution for the exploited vulnerability. This global resolution will include the following:

  • complex optimization of the validator node
  • reorganizing core team workflows to enable an accelerated response to incidents and optimize development.