Tracing

The LIXA project can produce a lot of traces when the software is running; the tracing functions are included in the code by default but you can disable them when you configure the software before the build phase:

	./configure --disable-trace
	make
	...
      

Warning

Excluding tracing functions can reduce CPU and memory usage when the software is running, but you should disable tracing only if you are really sure about it: without tracing you have no instruments to understand what's happening inside LIXA code when it's running.

In the case you didn't disable the tracing feature at build time (see above), you could activate the message production when you need it.

Important

LIXA provides two different tracing capabilities: one is based on the concept of "stack tracing" and allows to catch the exiting condition of every function call; the other one is based on the concept of "modules" and allows selective module tracing. Stack tracing is more usefulto catch anomalous conditions whenever they happen, while module tracing is more useful to get fine grain messages of a subset of functions to understand why things are not running as desired.

Stack Tracing

Stack tracing can be activated using the environment variable LIXA_STACK_TRACE; three different values are allowed:

  • ALL to stack trace every function call; this value should not be used in a production environment to avoid a stack trace message congestion

  • ERRORS to stack trace only the function calls that end with an error condition; this is the value suggested to detect anomalous conditions in production environments

  • WARNINGS to stack trace only the function calls that end with either a warning or an error condition; this setting can produce trace messages that are not related to anomalous conditions

When activated, stack tracing produces JSON formatted records as in the example below (messages produced by a LIXA application):

2020-10-18 19:06:06.871362 [32034/47612092549696] { "type": "StackTrace", "date": "2020-10-18", "time": "19:06:06.871362", "process_id": 32034, "thread_id": 47612092549696, "function_name": "client_status_coll_get_cs", "source_file_name": "client_status.c", "source_file_line": "328", "exception": 1, "return_code": 1, "return_code_text": "WARNING: object not found", "errno": 0, "errno_text": "Success", "trace_module": 0x20000 }
2020-10-18 19:06:06.874283 [32034/47612092549696] { "type": "StackTrace", "date": "2020-10-18", "time": "19:06:06.874283", "process_id": 32034, "thread_id": 47612092549696, "function_name": "lixa_xa_open", "source_file_name": "lixa_xa.c", "source_file_line": "1446", "exception": 10, "return_code": -99, "return_code_text": "ERROR: an XA function returned an unexpected return code", "errno": 0, "errno_text": "Success", "trace_module": 0x2000 }
	

The JSON payload can be extracted by the whole message quite easily and it can be ingested by a log analytic platform like for instance the Elastic Stack (ELK).

Important

Not necessarily an error condition is the result of a bug: LIXA software try to manage as many as possible error conditions without hurting the user. Nevertheless, stack tracing can be useful in catching run time issues that are difficult to reproduce.

Note

Stack trace displays the bit that must be put in trace mask to get all the messages produced by the module that generated the exception. In the example above, look at "trace_module" field inside the JSON records.

Tracing modules

The trace messages produced by LIXA code are divided by modules: you can activate the trace messages for one or more modules. There is not a concept of severity for the trace messages: if you activate the trace for a module, it will print all the trace messages of that module.

The table below shows how the LIXA software is partitioned in modules:

Table 10.1. Tracing module flags

Module Trace LabelHex FlagComponentFunction
LIXA_TRACE_MOD_SERVER0x00000001lixad (state server)main program
LIXA_TRACE_MOD_SERVER_CONFIG0x00000002lixad (state server)configuration: config file parsing and environment variable detection
LIXA_TRACE_MOD_SERVER_LISTENER0x00000004lixad (state server)network listener and signal handler
LIXA_TRACE_MOD_SERVER_MANAGER0x00000008lixad (state server)session client manager, thread manager, network I/O manager
LIXA_TRACE_MOD_SERVER_STATUS0x00000010lixad (state server)persistent state
LIXA_TRACE_MOD_XTA0x00000020lixac (client library)XA Transaction API (XTA) functions
LIXA_TRACE_MOD_SERVER_XA0x00000040lixad (state server)XA logic called by client
LIXA_TRACE_MOD_SERVER_REPLY0x00000080lixad (state server)replies to client messages
LIXA_TRACE_MOD_SERVER_RECOVERY0x00000100lixad (state server)logic necessary to answer the client recovery calls
LIXA_TRACE_MOD_SERVER_FSM0x00000200lixad (state server)server FSM (finite state machine) functions
LIXA_TRACE_MOD_CLIENT_TX0x00001000lixac (client library)transaction demarcation (TX) functions: tx_open(), tx_begin(), ...
LIXA_TRACE_MOD_CLIENT_XA0x00002000lixac (client library)XA function wrapper: xa_open(), xa_start(), ax_reg(), ...
LIXA_TRACE_MOD_CLIENT_CONN0x00004000lixac (client library)function necessary to connect to the state server (lixad)
LIXA_TRACE_MOD_CLIENT_CONFIG0x00008000lixac (client library)configuration: config file parsing and environment variable detection
LIXA_TRACE_MOD_CLIENT_XA_SWITCH0x00010000lixac (client library)XA switch file implementation of the dummy resource managers provided by LIXA and of the XA wrappers for the resource managers without a standard switch file
LIXA_TRACE_MOD_CLIENT_STATUS0x00020000lixac (client library)client status management
LIXA_TRACE_MOD_CLIENT_RECOVERY0x00040000lixac (client library)warm and cold recovery of the transaction(s)
LIXA_TRACE_MOD_CLIENT_GENERIC0x00080000lixac (client library)generic client functions
LIXA_TRACE_MOD_CLIENT_TPM0x00100000lixac (client library)client TPM functions
LIXA_TRACE_MOD_SERVER_TPM0x00200000lixad (state server)server TPM functions
LIXA_TRACE_MOD_COMMON_CONFIG0x01000000lixab (common base library)configuration stuff common to all components
LIXA_TRACE_MOD_COMMON_XML_MSG0x02000000lixab (common base library)functions related to XML messages serialization and deserialization
LIXA_TRACE_MOD_COMMON_STATUS0x04000000lixab (common base library)convenience functions used to manage the status on client and server side
LIXA_TRACE_MOD_COMMON_UTILS0x08000000lixab (common base library)utility functions used by all the components
LIXA_TRACE_MOD_COMMON_XID0x10000000lixab (common base library)functions specialized for XID (transaction ID) management


You can trace any module combination creating any trace mask ORing the bits; for example:

  • 0x00000003 will produce all the messages of LIXA_TRACE_MOD_SERVER and LIXA_TRACE_MOD_SERVER_CONFIG

  • 0xffffffff will produce all the messages

  • 0x00000000 will disable all the messages

Important

The suggested way to guess the modules require tracing is looking at the stack trace record produced activating stack tracing as described in the section called “Stack Tracing”.

Improve troubleshooting with trace

The trace can be activated setting the environment variable LIXA_TRACE_MASK before starting the execution of the traced program. Try the following example...

Check the state server is not active:

tiian@ubuntu:~$ ps -ef|grep lixad|grep -v grep
	

try to execute lixat, the LIXA test program:

tiian@ubuntu:~$ /opt/lixa/bin/lixat
tx_open(): -7
	

it gives us few information (TX_FAIL = -7): from /opt/lixa/include/tx.h we can retrieve the meaning: fatal error... Not so much... We could try to trace the CLIENT TX module:

tiian@ubuntu:~$ export LIXA_TRACE_MASK=0x00001000
tiian@ubuntu:~$ /opt/lixa/bin/lixat
2011-12-04 12:19:49.235007 [8899/3073612464] lixa_tx_open
2011-12-04 12:19:49.241544 [8899/3073612464] lixa_tx_open/TX_*=-7/excp=5/ret_cod=-138/errno=111
tx_open(): -7
	

ret_cod=-138 can be inspected inside src/common/lixa_errors.h:

[...]
/**
 * "connect" function error
 */
#define LIXA_RC_CONNECT_ERROR                  -138
[...]
	

now we know the errors happens using connect() function; errno=111 can be inspected inside /usr/include/asm-generic/errno.h:

[...]
#define ECONNREFUSED    111     /* Connection refused */
[...]
	

The error happens because the LIXA state server (lixad) is not running, but we can improve our diagnosis with a more detailed tracing adding CLIENT CONN module:

tiian@ubuntu:~$ export LIXA_TRACE_MASK=0x00005000
tiian@ubuntu:~$ /opt/lixa/bin/lixat
2011-12-04 17:00:49.447866 [4514/3074067120] lixa_tx_open
2011-12-04 17:00:49.452624 [4514/3074067120] client_connect
2011-12-04 17:00:49.452678 [4514/3074067120] client_connect: connecting socket 4 to server '127.0.0.1' port 2345
2011-12-04 17:00:49.454033 [4514/3074067120] client_connect/excp=2/ret_cod=-138/errno=111
2011-12-04 17:00:49.454872 [4514/3074067120] lixa_tx_open/TX_*=-7/excp=5/ret_cod=-138/errno=111
tx_open(): -7
	

the client is not able to contact the state server that is configured to listen port 2345 at address 127.0.0.1

Activating trace for lixad in daemon mode

When running the state server (lixad) in daemon mode, you must explicitly ask lixad to use a trace file. Try the following steps...

tiian@ubuntu:~$ sudo su - lixa
lixa@ubuntu:~$ export LIXA_TRACE_MASK=0x00000005
lixa@ubuntu:~$ /opt/lixa/sbin/lixad -d
2011-12-04 17:14:46.976067 [4888/3074164464] lixad/daemonize: fork()
2011-12-04 17:14:46.978399 [4889/3074164464] lixad/daemonize: setsid()
2011-12-04 17:14:46.979740 [4889/3074164464] lixad/daemonize: signal()
2011-12-04 17:14:46.980513 [4889/3074164464] lixad/daemonize: fork()
2011-12-04 17:14:46.982516 [4890/3074164464] lixad/daemonize: chdir()
2011-12-04 17:14:46.984002 [4890/3074164464] lixad/daemonize: umask()
	

the process is tracing the initial operations and after the daemonization it does not produce any more messages. By default stderr is used as the standard tracing file, but the daemon disconnects from the terminal and closes the stderr file. To avoid this issue you can specify a specific trace file name using the -t (--trace-file) dedicated option:

lixa@ubuntu:~$ /opt/lixa/sbin/lixad -d -t /tmp/lixad.trace
lixa@ubuntu:~$ ls -la /tmp/lixad.trace
-rw-r--r-- 1 lixa lixa 349 2011-12-04 17:29 /tmp/lixad.trace
lixa@ubuntu:~$ pkill lixad
	

inspecting the content of /tmp/lixad.trace you can now find messages related to the listener module too:

2011-12-04 17:29:12.435474 [5187/3074373360] lixad/daemonize: fork()
2011-12-04 17:29:12.436593 [5188/3074373360] lixad/daemonize: setsid()
2011-12-04 17:29:12.437187 [5188/3074373360] lixad/daemonize: signal()
2011-12-04 17:29:12.437245 [5188/3074373360] lixad/daemonize: fork()
2011-12-04 17:29:12.435474 [5187/3074373360] lixad/daemonize: fork()
2011-12-04 17:29:12.439878 [5189/3074373360] lixad/daemonize: now daemonized!
2011-12-04 17:29:12.441874 [5189/3074373360] lixad/main: starting
2011-12-04 17:29:12.447862 [5189/3074373360] server_listener
2011-12-04 17:29:12.447940 [5189/3074373360] server_listener: resolving address '127.0.0.1' for listener # 0
2011-12-04 17:29:12.448001 [5189/3074373360] server_listener: creating socket for listener # 0
2011-12-04 17:29:12.448033 [5189/3074373360] server_listener: socket for listener 0 is 10
2011-12-04 17:29:12.448044 [5189/3074373360] server_listener: setting SO_REUSE option for listener # 0
2011-12-04 17:29:12.448142 [5189/3074373360] server_listener: resolving address '0.0.0.0' for listener # 1
2011-12-04 17:29:12.448156 [5189/3074373360] server_listener: creating socket for listener # 1
2011-12-04 17:29:12.448174 [5189/3074373360] server_listener: socket for listener 1 is 11
2011-12-04 17:29:12.448183 [5189/3074373360] server_listener: setting SO_REUSE option for listener # 1
2011-12-04 17:29:12.448213 [5189/3074373360] server_listener_signal
2011-12-04 17:29:12.448262 [5189/3074373360] server_listener_signal/excp=3/ret_cod=0/errno=0
2011-12-04 17:29:12.448275 [5189/3074373360] server_listener_loop
2011-12-04 17:29:12.448284 [5189/3074373360] server_listener_loop: entering poll...
2011-12-04 17:29:23.499666 [5189/3074373360] server_listener_signal_action: signo=15
2011-12-04 17:29:23.499850 [5189/3074373360] server_listener_signal_action: sending message to thread id 0
2011-12-04 17:29:23.499877 [5189/3074373360] server_listener_signal_action: sending message to thread id 1
2011-12-04 17:29:23.499904 [5189/3074373360] server_listener_signal_action: sending message to thread id 2
2011-12-04 17:29:23.499942 [5189/3074373360] server_listener_signal_action: sending message to thread id 3
2011-12-04 17:29:23.499991 [5189/3074373360] server_listener_loop: ready file descriptors = 1
2011-12-04 17:29:23.500003 [5189/3074373360] server_listener_loop: slot=0, fd=2, POLLIN=1, POLLERR=0, POLLHUP=0, POLLNVAL=0
2011-12-04 17:29:23.500025 [5189/3074373360] server_listener_loop: waiting thread (1) id 3074329488 termination...
2011-12-04 17:29:23.521138 [5189/3074373360] server_listener_loop: waiting thread (2) id 3065920400 termination...
2011-12-04 17:29:23.521169 [5189/3074373360] server_listener_loop: waiting thread (3) id 3057511312 termination...
2011-12-04 17:29:23.521180 [5189/3074373360] server_listener_loop/excp=8/ret_cod=0/errno=4
2011-12-04 17:29:23.521190 [5189/3074373360] server_listener/excp=8/ret_cod=0/errno=4
2011-12-04 17:29:23.521558 [5189/3074373360] lixad/main: exiting
	

Note

The trace file is written using the stdio and the output is not flushed after every message: if you look at the trace file of a running lixad state server, you will not be able to read the last messages due to the buffering of stdio library. After the server termination you are sure every trace message is inside the trace file.

Redirecting the trace messages

In the previous section you discovered how you can send the trace messages of lixad. The lixac library sends the trace messages to the stderr file associated to the terminal of the user that's running the process; many times you would send the messages to a specific disk hosted file. You can obtain this behavior using redirection:

tiian@ubuntu:~$ export LIXA_TRACE_MASK=0x00005000
tiian@ubuntu:~$ /opt/lixa/bin/lixat 2>/tmp/lixat.trace
tx_open(): -7
tiian@ubuntu:~$ ls -la /tmp/lixat.trace
-rw-r--r-- 1 tiian tiian 417 2011-12-04 17:43 /tmp/lixat.trace
	

You can inspect the content of the file /tmp/lixat.trace:

2011-12-04 17:43:36.078314 [5544/3074013872] lixa_tx_open
2011-12-04 17:43:36.083822 [5544/3074013872] client_connect
2011-12-04 17:43:36.083890 [5544/3074013872] client_connect: connecting socket 4 to server '127.0.0.1' port 2345
2011-12-04 17:43:36.084862 [5544/3074013872] client_connect/excp=2/ret_cod=-138/errno=111
2011-12-04 17:43:36.084906 [5544/3074013872] lixa_tx_open/TX_*=-7/excp=5/ret_cod=-138/errno=111
	

Catching the last trace messages

Under some conditions, the last trace messages produced by the daemon lixad are not captured in the event of a crash. This can increase the difficulty of problem determination. To avoid the buffering of trace messages for lixad, use --enable-debug on ./configure command line:

./configure --enable-debug