How the new network monitoring system works in Everscale

Everscale is not just another blockchain promising to outperform Ethereum in terms of TPS, but an app-rich ecosystem that has already delivered on this promise, reaching a confirmed 64k TPS. Beyond that, Everscale has achieved another notable breakthrough by solving the blockchain trilemma.

Thanks to the network’s performance, both the number of users and the number of dApps in the ecosystem are growing steadily. With this level of activity on Everscale, network-wide security is crucial, so the platform has developed a robust blockchain monitoring system. It can oversee a wide variety of network components and processes and is flexible enough to adapt to changing conditions, including exploits originating outside the network. This also underpins fault tolerance, helping ensure 99.999% availability of all operations in the network (no system can guarantee 100%).
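To make the 99.999% figure concrete, the short calculation below converts an availability target into the maximum downtime it permits over a 365-day year. This is generic arithmetic, not an Everscale-specific measurement; the function name is ours.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_budget_seconds(availability: float, period_s: int = SECONDS_PER_YEAR) -> float:
    """Return the maximum downtime (in seconds) allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_s


# Print the yearly downtime budget for a few common "nines" targets.
for target in (0.999, 0.9999, 0.99999):
    budget = downtime_budget_seconds(target)
    print(f"{target:.3%} -> {budget / 60:.2f} min/year")
```

At 99.999%, the budget works out to about 315 seconds, or roughly 5.26 minutes of downtime per year.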

How Everscale’s blockchain monitoring system works

Monitoring is the first step in tracking a validator or overseeing the functionality of other network components. At the core of any monitoring system is a data collector that regularly gathers and stores relevant information from the system being monitored, in our case the Everscale network. The collected information is then processed and presented to blockchain monitoring operators as analytical dashboards. Important as this is, keeping an eye on potentially many dashboards is a very time-consuming way to make sure that a validator, or the system as a whole, is healthy. To tackle this problem, Everscale introduced automated alerting for some processes, which informs operators about abnormal activity from network components in a timely manner. This way, the lag between identifying an issue and resolving it is reduced to seconds.
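The data collector described above can be sketched as a loop that samples metrics and keeps a bounded history per metric. This is a minimal illustration, not Everscale's collector: `sample_fn` is a hypothetical stand-in for whatever actually queries the network (a node API, an exporter endpoint, and so on), and the metric names are made up.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


class MetricCollector:
    """Periodically samples metrics and keeps a bounded history per metric.

    `sample_fn` is a hypothetical callable returning {metric_name: value}.
    """

    def __init__(self, sample_fn: Callable[[], Dict[str, float]], max_points: int = 1000):
        self.sample_fn = sample_fn
        # Each metric gets its own fixed-size series; old points fall off the front.
        self.history: Dict[str, Deque[Sample]] = defaultdict(lambda: deque(maxlen=max_points))

    def collect_once(self, now: Optional[float] = None) -> None:
        """Take one snapshot of every metric and append it to its history."""
        ts = time.time() if now is None else now
        for name, value in self.sample_fn().items():
            self.history[name].append((ts, value))

    def latest(self, name: str) -> Optional[float]:
        """Most recent value of a metric, or None if it was never collected."""
        series = self.history.get(name)
        return series[-1][1] if series else None

    def run(self, interval_s: float, iterations: int) -> None:
        """Blocking poll loop: collect, then sleep `interval_s` seconds."""
        for _ in range(iterations):
            self.collect_once()
            time.sleep(interval_s)
```

In practice `collect_once()` would more likely be driven by a scheduler; `run()` is only a convenience loop for quick experiments.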

Besides the alerting mechanism, to make the monitoring system perform optimally, Everscale broke it up into three core layers: End-User, Validation and Infrastructure. Each layer provides the detailed information needed to monitor the activity taking place within it, such as metrics, logs and tracked data. This data is also used to conduct a comprehensive analysis of the system’s health at each level. Such analysis and visualization capabilities are essential for decision-making when incidents occur on the network. It’s worth mentioning that, thanks to this network security monitoring system, Everscale developers were able to identify and block the activity of a smart contract that recently tried to bring the entire network to a halt.

Layers that compose Everscale’s monitoring system

User layer: data reflecting the activity and experience of the end user, such as message, transaction and block creation speeds. This data supports both quantitative and qualitative analysis of transaction processing. For example, a telling metric of user interaction with the network is the number of successful and unsuccessful transactions per unit of time.
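The "successful and unsuccessful transactions per unit of time" metric amounts to bucketing transaction outcomes into fixed time windows. The sketch below assumes each record is a hypothetical `(timestamp, succeeded)` pair; real records would come from a node or indexer API.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (unix_timestamp_seconds, succeeded).
TxRecord = Tuple[float, bool]


def tx_outcomes_per_window(records: Iterable[TxRecord], window_s: int = 60) -> Dict[int, Dict[str, int]]:
    """Count successful and failed transactions in fixed, aligned time windows."""
    ok: Counter = Counter()
    failed: Counter = Counter()
    for ts, succeeded in records:
        # Align each timestamp to the start of its window (e.g. whole minutes).
        window_start = int(ts // window_s) * window_s
        (ok if succeeded else failed)[window_start] += 1
    windows = sorted(set(ok) | set(failed))
    # Counter returns 0 for missing keys, so empty buckets show up as 0.
    return {w: {"success": ok[w], "failed": failed[w]} for w in windows}
```

A sudden rise in the `failed` count for recent windows is the kind of signal this layer is meant to surface.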

Validation layer: data reflecting the operation of validators in the network. On Everscale, validators must be operational at all times, that is, connected to all nodes and signing every proposed block. A validator that disconnects misses out on rewards and has its stake slashed. To view all validators in real time, there is a map showing every current validator with its IP address and status (active or not found).
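Because every validator is expected to sign each proposed block, a natural health check is each validator's signing ratio over recent blocks. The sketch below assumes a hypothetical input format (a list of signer-ID sets, one per block) and an illustrative 99% threshold; neither comes from Everscale's tooling.

```python
from typing import Dict, Iterable, List, Set


def signing_ratios(validators: Iterable[str], signatures_per_block: List[Set[str]]) -> Dict[str, float]:
    """Fraction of the observed blocks that each validator signed."""
    total = len(signatures_per_block)
    if total == 0:
        raise ValueError("no blocks to evaluate")
    # Summing booleans counts the blocks whose signer set contains the validator.
    return {v: sum(v in signers for signers in signatures_per_block) / total for v in validators}


def underperforming(ratios: Dict[str, float], min_ratio: float = 0.99) -> List[str]:
    """Validators whose signing ratio fell below `min_ratio` (an illustrative threshold)."""
    return sorted(v for v, r in ratios.items() if r < min_ratio)
```

A validator that appears with a ratio of 0.0 is effectively "not found" and would be the first candidate for investigation.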

Infrastructure layer: data reflecting the interaction between nodes and whether they stay synchronized, which keeps the network functioning well. Everscale is a peer-to-peer network, that is, a computer network in which nodes are distributed and share the workload to achieve a common purpose. This layer makes it possible to monitor Masterchain block intervals, the number of shards, average shardchain block rates and the state of validator elections.
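Several of the infrastructure metrics listed above can be derived from block timestamps alone. The sketch assumes you already have ordered block timestamps for the Masterchain and for each shard (the shard keys are hypothetical) and computes intervals, shard count and average block rates.

```python
from statistics import mean
from typing import Dict, List


def block_intervals(timestamps: List[float]) -> List[float]:
    """Seconds between consecutive blocks, given timestamps in chronological order."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def blocks_per_second(timestamps: List[float]) -> float:
    """Average block rate over the observed span (0.0 if it cannot be computed)."""
    if len(timestamps) < 2 or timestamps[-1] == timestamps[0]:
        return 0.0
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])


def infra_summary(masterchain_ts: List[float], shard_ts: Dict[str, List[float]]) -> Dict[str, float]:
    """Aggregate a few infrastructure-layer metrics into one snapshot."""
    intervals = block_intervals(masterchain_ts)
    rates = [blocks_per_second(ts) for ts in shard_ts.values()]
    return {
        "masterchain_avg_interval_s": mean(intervals) if intervals else 0.0,
        "masterchain_max_interval_s": max(intervals, default=0.0),
        "shard_count": float(len(shard_ts)),
        "avg_shard_blocks_per_s": mean(rates) if rates else 0.0,
    }
```

Tracking the maximum interval alongside the average matters: a single long gap between Masterchain blocks can be hidden by an otherwise healthy mean.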

Blockchain monitoring system requirements

Monitoring is probably the least discussed tool or service when it comes to decentralized technologies. Yet without high-quality network monitoring, it is impossible to meet network security requirements, including the recovery time after incidents. A well-functioning blockchain monitoring system needs the following characteristics:

Effectiveness

A blockchain monitoring system must solve a wide range of tasks, from the most basic triggers that respond to static thresholds, to complex analytical concepts such as anomaly detection and forecasting with the help of statistical analysis or machine learning.
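The two ends of that range can be contrasted in a few lines: a static-threshold trigger next to a rolling z-score detector. The z-score approach is a basic statistical technique chosen here for illustration; it is not a description of the anomaly detection Everscale actually runs, and the window size and threshold are arbitrary.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Optional


def static_trigger(value: float, threshold: float) -> bool:
    """Simplest possible rule: fire when a value exceeds a fixed threshold."""
    return value > threshold


class RollingZScoreDetector:
    """Flags values that deviate strongly from a rolling window of recent values."""

    def __init__(self, window: int = 30, z_threshold: float = 3.0):
        self.values: Deque[float] = deque(maxlen=window)
        self.z_threshold = z_threshold

    def update(self, value: float) -> Optional[float]:
        """Return the z-score if `value` is anomalous, else None; then record it."""
        anomaly = None
        if len(self.values) >= 2:
            mu, sigma = mean(self.values), pstdev(self.values)
            # With zero variance the z-score is undefined, so skip detection.
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) >= self.z_threshold:
                    anomaly = z
        self.values.append(value)
        return anomaly
```

Unlike the static trigger, the detector adapts to whatever "normal" looks like for a metric, at the cost of needing some history before it can say anything.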

Flexibility

A monitoring system should be able to receive data from different network components via different channels. It should also have a rich set of data exporters and built-in mechanisms for integrating with the software. That way, any task assigned to monitoring can be solved as quickly as possible, without lengthy integration work with other systems.
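As an example of what a data exporter produces, the sketch below renders a gauge in the Prometheus text exposition format, one widely used interchange format for metrics. The source does not say which formats Everscale's exporters use, and the metric name in the usage note is hypothetical.

```python
from typing import Dict, List, Tuple

# One metric family: a list of (labels, value) pairs.
Series = List[Tuple[Dict[str, str], float]]


def _escape_help(text: str) -> str:
    # HELP text escapes backslashes and newlines.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _escape_label(value: str) -> str:
    # Label values additionally escape double quotes.
    return _escape_help(value).replace('"', '\\"')


def render_gauge(name: str, help_text: str, series: Series) -> str:
    """Render one gauge family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {_escape_help(help_text)}", f"# TYPE {name} gauge"]
    for labels, value in series:
        if labels:
            pairs = ",".join(f'{k}="{_escape_label(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

For instance, `render_gauge("everscale_shard_count", "Current number of shards", [({"workchain": "0"}, 8.0)])` yields a three-line block that any compatible scraper can ingest without custom integration code.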

Openness

A monitoring system must constantly take inputs from the network, process them and release them as outputs. In other words, the blockchain monitoring system should be part and parcel of the network in which it operates. Every newly identified anomaly should be analyzed immediately, then turned into a red flag and incorporated into the network monitoring system.

So far, Everscale has managed to strike a balance between all three characteristics. Over the past two years, the service has made a great leap forward and has been able to respond to most unforeseen events capable of negatively affecting the network.

Setting Up Alerts

An efficient blockchain monitoring tool needs some form of automated alerts that track network activity in real time and immediately signal when something goes astray. In Everscale, such alerts use the data collected and structured by the monitoring software. Each alert embeds a set of criteria and actively informs developers whenever those criteria are violated. The alerting mechanism can be built into the monitoring software itself, or it can be designed around an existing network monitoring setup that exposes the monitored metrics.
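Alert criteria of this kind can be expressed as small declarative rules evaluated against the latest metric snapshot. The sketch below is an assumption about structure, not Everscale's rule format; rule and metric names such as `SlowMasterchain` and `mc_interval_s` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

_OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    """A criterion such as 'Masterchain block interval > 10 s' (names are hypothetical)."""
    name: str
    metric: str
    op: str
    threshold: float

    def __post_init__(self) -> None:
        if self.op not in _OPS:
            raise ValueError(f"unsupported operator: {self.op}")

    def violated(self, metrics: Dict[str, float]) -> bool:
        value = metrics.get(self.metric)
        # Design choice: a missing metric counts as a violation, because silence
        # from a data source is usually worth investigating.
        return value is None or _OPS[self.op](value, self.threshold)


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Names of all rules whose criteria are currently violated."""
    return [r.name for r in rules if r.violated(metrics)]
```

Keeping rules as data rather than code makes it easy to add a new criterion whenever a new kind of anomaly is identified.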

The main advantage of automated alerting over manual monitoring is reaction speed. All that the developers on monitoring duty have to do is react promptly to alerts. They can quickly examine the issue, then either fix it themselves or escalate it if the resolution requires the expertise and assistance of other teams. The alert also serves as a prompt to check the blockchain monitoring dashboards in order to troubleshoot or narrow down its root cause.

Despite these advantages, implementing a well-functioning automated alerting system involves one major hurdle: defining the alert criteria, which is a very complicated task. The main challenge lies in minimizing false alarms while still generating useful real-time network security alerts. In other words, the goal is not to increase the quantity of alerts but their quality, and the quality of an alert can be measured by how much it helps eliminate the identified issue. As the number of helpful alerts grows, the blockchain’s monitoring system keeps improving.
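One common way to cut false alarms is to require a condition to persist for several consecutive checks before notifying, and to notify once per incident rather than on every check; this is similar in spirit to the `for` clause in Prometheus alerting rules. The sketch below illustrates that idea only; the default of three checks is arbitrary and not taken from Everscale.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DebouncedAlert:
    """Fires only after a condition holds for `required` consecutive checks.

    Requiring persistence filters out one-off spikes, and emitting a single
    "firing" and a single "resolved" event per incident keeps the alert stream small.
    """
    required: int = 3
    _streak: int = field(default=0, init=False, repr=False)
    _firing: bool = field(default=False, init=False, repr=False)

    def check(self, condition: bool) -> Optional[str]:
        if not condition:
            was_firing = self._firing
            self._streak, self._firing = 0, False
            return "resolved" if was_firing else None
        self._streak += 1
        if self._streak >= self.required and not self._firing:
            self._firing = True
            return "firing"
        return None
```

With `required=3`, an isolated bad reading produces no notification, while three in a row produce exactly one.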

For illustrative purposes, we have provided an animation below depicting how a typical alert works in Everscale. It shows the Main Workchain gradually splitting into eight shards to process an increasing number of transactions. Each shard sends block proofs to the Masterchain every two to three seconds, while the Masterchain issues blocks every five to six seconds. The alert is set to notify the monitoring operator if this timing is disrupted. In this example, two of the shards stop sending block proofs for more than five seconds, which means their activity has stalled, and the operator is immediately notified via Telegram.
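The alert in this example can be sketched as a staleness check: record when each shard last sent a block proof and flag any shard silent for more than five seconds. The shard names and the `notify` callback are hypothetical; a real deployment might wrap a call to the Telegram Bot API's `sendMessage` method inside `notify`, which is not shown here.

```python
import time
from typing import Callable, Dict, List, Optional

PROOF_TIMEOUT_S = 5.0  # from the example above: no block proof for more than 5 s


def stale_shards(last_proof_at: Dict[str, float], now: float, timeout_s: float = PROOF_TIMEOUT_S) -> List[str]:
    """Shards whose most recent block proof is older than `timeout_s` seconds."""
    return sorted(s for s, ts in last_proof_at.items() if now - ts > timeout_s)


def check_and_notify(
    last_proof_at: Dict[str, float],
    notify: Callable[[str], None],
    now: Optional[float] = None,
) -> List[str]:
    """Send one message listing every stalled shard and how long it has been silent."""
    now = time.time() if now is None else now
    stale = stale_shards(last_proof_at, now)
    if stale:
        lags = ", ".join(f"{s} ({now - last_proof_at[s]:.1f}s)" for s in stale)
        notify(f"ALERT: no block proofs to the Masterchain from {lags}")
    return stale
```

Grouping all stalled shards into a single message mirrors the scenario above, where two shards stop at once and the operator should see one alert, not two.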
