Choosing the state engine type

The journal based state engine was designed as a better alternative to the traditional state engine. From a performance point of view, the journal based state engine provides better results in most situations; nevertheless, thanks to its proven robustness, the traditional state engine is expected to remain in use for some time, especially when high performance is not a must.

Regardless of the chosen state engine, some common concepts must be understood to make an informed choice.

First of all, the LIXA state server does not persist the data used by the Application Program; it only persists the state of the transactions. If the state of a transaction is not preserved, you don't lose the data, but the state of the transaction that was manipulating the data. The direct consequence is that, in the worst case, if the LIXA state server loses the state of a transaction, you have to roll back or commit the transaction manually. This is obviously not a situation you want to face frequently in normal operations, and this is why the LIXA state server is designed to provide a high level of resiliency.

If you want to be 100% sure you don't lose the state of the transaction, you must be 100% sure that it has been recorded in a durable way, typically on some storage device. Writing to storage, even the fastest storage, is still quite slow in comparison with other communication and computing operations. Very high performance storage can provide write latency below 1 ms, but in a typical production environment, a 5 ms write latency to the storage subsystem can be considered a good scenario. Surprisingly, 5 ms is a huge amount of time for contemporary computing and networking.

To make a long story short, this is a typical trade-off between speed and safety: the faster you go, the less safe you can be. In a real-world scenario, 100% safety is unlikely to be the best choice: it's capped by technology limits and it risks slowing down your whole system unnecessarily.
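To put the latency figures above in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every state change waits for its own durable write and that writes are fully serialized; the function name is hypothetical, and real storage can batch or overlap writes, so treat the result as a rough ceiling rather than a measurement.

```python
def max_serialized_flushes_per_second(write_latency_ms: float) -> float:
    """Upper bound on durable writes per second when each write must
    complete before the next one starts (illustrative assumption)."""
    if write_latency_ms <= 0:
        raise ValueError("write latency must be positive")
    return 1000.0 / write_latency_ms


# The two latencies mentioned in the text: very fast storage and a
# typical "good" production scenario.
for latency_ms in (1.0, 5.0):
    rate = max_serialized_flushes_per_second(latency_ms)
    print(f"{latency_ms:.0f} ms write latency -> at most {rate:.0f} flushes/s")
```

Under these assumptions, a 5 ms write latency caps a strictly synchronous server at about 200 durable state changes per second, which is why relaxing the flushing policy can matter so much.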

The good news is that the LIXA state engines, both journal and traditional, provide parameters to tune the behavior.

Tuning the journal state engine

The journal state engine provides two levels of resilience with two different Recovery Point Objectives depending on the type of crash encountered by the LIXA state server.

Soft crash RPO

A soft crash is a crash of the LIXA state server process (lixad) that does not break the operating system. Common examples are:

  • killing lixad

  • lixad crashes due to some unexpected reason

In the event of a soft crash, the restart usually happens from the last active state table file without any data loss. As a consequence, when the restart from the last active state table succeeds, the RPO is zero and no transaction state is lost.

Hard crash RPO

A hard crash is a crash related to the operating system or to damage to the state table files. Common examples are:

  • operating system / hypervisor crashes

  • hardware failure

  • disk content corruption

In the event of a hard crash, the restart happens from the last consistent active state table file plus all the available consistent state log records.

Depending on the log flushing strategy in use, the corresponding RPO can be greater than zero.
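The hard crash restart described above can be pictured with a toy model. This is only a sketch: the record layout, the CRC32 check and the rule of stopping at the first inconsistent record are assumptions made for illustration, not LIXA's on-disk format or its actual recovery algorithm.

```python
import zlib
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (transaction id, new state, CRC32 of both).
LogRecord = Tuple[str, str, int]


def make_record(xid: str, state: str) -> LogRecord:
    """Build a log record together with its checksum."""
    return (xid, state, zlib.crc32(f"{xid}:{state}".encode()))


def is_consistent(record: LogRecord) -> bool:
    """A record is consistent when its stored checksum matches its content."""
    xid, state, crc = record
    return zlib.crc32(f"{xid}:{state}".encode()) == crc


def recover(state_table: Dict[str, str], log: Iterable[LogRecord]) -> Dict[str, str]:
    """Start from the last consistent state table and replay log records
    in order, stopping at the first record that fails its check."""
    recovered = dict(state_table)
    for record in log:
        if not is_consistent(record):
            break  # assumed policy: nothing after a damaged record is trusted
        xid, state, _ = record
        recovered[xid] = state
    return recovered


table = {"tx1": "prepared"}
log = [
    make_record("tx1", "committed"),
    make_record("tx2", "prepared"),
    ("tx3", "prepared", 0),  # simulates a record that never reached the disk intact
]
print(recover(table, log))
```

In this toy run the state of tx3 is the part that is lost: the later a record is flushed, the more likely it is to be missing after a hard crash, which is what a non-zero RPO means in practice.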

Impact of parameter min_elapsed_sync_time

The parameter sets the minimum delay between the moment a transaction state needs to be flushed and the start of the I/O operation:

  • a low value slows down the lixad server because log flushing is requested frequently

  • a value greater than zero increases the RPO for a hard crash by the same amount

  • to have a guaranteed RPO=0 in the event of a hard crash, this parameter must be set to zero

Impact of parameter max_elapsed_sync_time

The parameter sets the maximum delay between the moment a transaction state needs to be flushed and the start of the I/O operation:

  • a low value slows down the lixad server because log flushing is requested frequently

  • a value greater than zero increases the RPO for a hard crash by the same amount

  • to have a guaranteed RPO=0 in the event of a hard crash, this parameter must be set to zero

The parameter must be greater than or equal to min_elapsed_sync_time: if both are set to zero, the state log file is flushed as soon as a transaction needs to persist a new state.
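The constraints on the two parameters can be summarized in a small sketch. The class and method names are hypothetical, the values used in the example are illustrative rather than LIXA defaults, and the time unit is simply whatever unit the lixad configuration uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTimes:
    """Hypothetical holder for the two flushing delay parameters."""
    min_elapsed_sync_time: int
    max_elapsed_sync_time: int

    def __post_init__(self) -> None:
        if self.min_elapsed_sync_time < 0 or self.max_elapsed_sync_time < 0:
            raise ValueError("sync times cannot be negative")
        if self.max_elapsed_sync_time < self.min_elapsed_sync_time:
            raise ValueError(
                "max_elapsed_sync_time must be >= min_elapsed_sync_time")

    def flushes_immediately(self) -> bool:
        """True when both delays are zero, the setting under which the
        flush starts as soon as a new state must be persisted."""
        return (self.min_elapsed_sync_time == 0
                and self.max_elapsed_sync_time == 0)

    def worst_case_flush_delay(self) -> int:
        """Longest wait before the I/O starts; per the text, this amount
        adds to the RPO for a hard crash."""
        return self.max_elapsed_sync_time


relaxed = SyncTimes(min_elapsed_sync_time=10, max_elapsed_sync_time=100)
strict = SyncTimes(min_elapsed_sync_time=0, max_elapsed_sync_time=0)
print(relaxed.flushes_immediately(), relaxed.worst_case_flush_delay())
print(strict.flushes_immediately(), strict.worst_case_flush_delay())
```

A check like this could run before a configuration is deployed, so that an inverted pair of values is caught early instead of at server start.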

Impact of parameter log_size

The parameter specifies the desired amount of disk space to be used for every state log file. A state log file can be switched only when the previous state table synchronization has completed. With disks that have high latency and high throughput, a larger value can help to obtain better performance.

Important

Logs that are too large can degrade performance during state server restart in a couple of situations:

  • option log_o_direct="1" (O_DIRECT) is used

  • the state server restart follows a system reboot

In both cases, the Linux operating system has not cached the log file pages and all reads must be served by the storage devices. As a rule of thumb, don't allocate a large log unless it brings a real performance benefit during normal activity.

The parameter has no direct impact on RPO.

Impact of parameter max_buffer_log_size

The parameter specifies the amount of RAM used as a buffer for log writing: under some circumstances, higher values can improve performance.

The parameter has no direct impact on RPO.

Impact of parameters log_o_*

The parameters specify the corresponding flags that must be used for writing the state log files; as a general rule, log_o_direct="1" (O_DIRECT) is faster than log_o_dsync="1" (O_DSYNC) and log_o_dsync="1" (O_DSYNC) is faster than log_o_sync="1" (O_SYNC).

Important

To be precise, only log_o_sync="1" in association with min_elapsed_sync_time="0" and max_elapsed_sync_time="0" guarantees RPO=0 in the event of a hard crash for every hardware configuration, but such a configuration limits the performance of the LIXA state server and introduces additional latency.

In real life scenarios, depending on the characteristics of the storage subsystem, less restrictive options can be reasonably used. The best configuration requires investigation on the specific hardware configuration. The configuration provided by default can be considered a starting point to adjust according to the user's needs.

Note

Parameters log_o_* can be used together: for example, you can specify both log_o_direct="1" and log_o_dsync="1" to combine the effects of the O_DIRECT and O_DSYNC flags for log I/O.
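Combining the flags amounts to OR-ing them into the mask passed to open(2). The following sketch shows the idea with Python's os module; the function name and its boolean arguments are hypothetical stand-ins for the log_o_* settings, and this is not how lixad itself is implemented.

```python
import os


def log_open_flags(o_direct: bool = False, o_dsync: bool = False,
                   o_sync: bool = False) -> int:
    """Build an open(2) flag mask: each enabled switch ORs its flag in."""
    flags = os.O_WRONLY | os.O_CREAT
    if o_direct:
        # O_DIRECT is platform-specific; use 0 where Python does not expose it.
        flags |= getattr(os, "O_DIRECT", 0)
    if o_dsync:
        flags |= getattr(os, "O_DSYNC", 0)
    if o_sync:
        flags |= getattr(os, "O_SYNC", 0)
    return flags


# Equivalent of enabling both log_o_direct and log_o_dsync.
combined = log_open_flags(o_direct=True, o_dsync=True)
print(oct(combined))
```

The resulting mask would be passed to os.open(); keep in mind that on Linux, I/O through a descriptor opened with O_DIRECT generally requires suitably aligned buffers, so a plain write of arbitrary bytes may fail.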

Tuning the traditional state engine

The traditional state engine provides a single level of resilience and makes no distinction among the types of crash encountered by the LIXA state server.

Only two parameters can be configured: min_elapsed_sync_time and max_elapsed_sync_time; the others are ignored.

Impact of parameter min_elapsed_sync_time

The parameter sets the minimum delay between the moment a transaction state needs to be flushed and the start of the memory map sync operation:

  • a low value slows down the lixad server because map syncing is requested frequently

  • a value greater than zero increases the RPO by the same amount

  • to have a guaranteed RPO=0, this parameter must be set to zero

Impact of parameter max_elapsed_sync_time

The parameter sets the maximum delay between the moment a transaction state needs to be flushed and the start of the memory map sync operation:

  • a low value slows down the lixad server because map syncing is requested frequently

  • a value greater than zero increases the RPO by the same amount

  • to have a guaranteed RPO=0, this parameter must be set to zero

The parameter must be greater than or equal to min_elapsed_sync_time: if both are set to zero, the memory map is synchronized as soon as a transaction needs to persist a new state.
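For readers unfamiliar with memory map syncing, the following sketch shows the general mechanism using Python's mmap module, whose flush() method is implemented with msync on Unix. The file name and the content written are hypothetical; this illustrates the operating system facility, not the layout of a LIXA state file.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state_table.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"\0" * PAGE)  # a file must have a non-zero size to be mapped

    with open(path, "r+b") as f, mmap.mmap(f.fileno(), PAGE) as table:
        table[0:8] = b"PREPARED"  # the update lands in memory first
        table.flush()             # sync the dirty pages back to the file

    with open(path, "rb") as f:
        persisted = f.read(8)

print(persisted)
```

The delay between the in-memory update and the sync call is what min_elapsed_sync_time and max_elapsed_sync_time bound: the longer the sync is postponed, the more state changes are at risk if the server crashes in the meantime.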

Balancing performance and resilience

The higher the value of RPO, the higher the chance you will have to perform manual recovery in the case of a server crash (manual recovery is explained in the section called “Manual (cold) recovery”).

On the other hand, don't force RPO=0 if you don't have clear evidence that you need it: depending on your business requirements and your hardware configuration, especially if you use the journal based state engine, RPO=0 might not be necessary. The LIXA web site contains detailed performance analyses of a couple of possible deployment models: refer to the following links to see how performance is influenced by the configuration parameters.

The first architecture applies to traditional environments with monolithic applications distributed in different tiers: the Application Program and the LIXA state server run in different virtual machines. The figures show that the higher the RPO, the lower the total latency introduced by LIXA in managing distributed transactions.

The second architecture applies to microservices environments with a sidecar approach, typical of Kubernetes deployments: the Application Program and the LIXA state server run in the same virtual machine (and in the same pod in a Kubernetes configuration). For this type of architecture too, the figures show that the higher the RPO, the lower the total latency introduced by LIXA in managing distributed transactions. Furthermore, this second type of architecture exhibits a lower latency than the first type.

Conclusions

LIXA configuration parameters allow fine tuning of the state engine; in a real-life environment, apply the following guidelines:

  • use the default configuration with the traditional state engine if you want to be conservative, or with the journal based state engine if you want to be innovative

  • choose the proper deployment architecture, "client/server" or "colocated", depending on your application architecture

  • in the event of scalability issues, when you need to manage a huge number of Transactions per Second, split the workload across several LIXA state servers that work independently

  • in the event of latency issues, when you need very low latency, check whether a higher RPO fits your requirements; if not, split the workload across several LIXA state servers that work independently: the figures show that latency is strictly correlated with the number of Transactions per Second managed by the state server

  • if you use the journal based state engine, monitor syslog messages: the engine generates useful messages if the storage configuration is not optimized