Logging levels demystified

If you are building a system that’s going to be used in a production environment, having the right logs becomes crucial. Logs are necessary not only to identify where errors occurred, but also to identify the thread, the time-stamp and the path that the code executed.

More often than not, developers treat logs as second-class citizens, i.e. they exist in all systems but not much thought is put into them. A number of things need to be planned for logging. A few are listed below:

Where should the log files exist?
At what rate should they be recycled?
How do log files get collected?
What is the impact of excessive logging on the performance of the system?
How often does the code log?
What type of information needs to go into the logs?

Today, we discuss the very last point. Specifically, we will discuss what type of information needs to go into the logs, and at what logging levels.

Logging levels

Most modern logging systems provide the following levels, from least to most severe: TRACE, DEBUG, INFO, WARN, ERROR.

When a particular level is enabled, all levels more severe than it are automatically included. For example, enabling the DEBUG level also logs INFO, WARN and ERROR messages, but not TRACE. That way we can control the amount of logs that get written into the log files. This is especially important since excessive logging can hurt the performance of the system.
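The threshold behaviour described above can be sketched with Python's standard `logging` module. This is only an illustration: Python names the level WARNING rather than WARN and ships no TRACE level, so the `TRACE` constant (numeric value 5) is an assumption registered for the demo, and `ListHandler` and `levels_emitted` are hypothetical helper names.

```python
import logging

# Assumption: Python has no built-in TRACE level, so one is registered here
# at numeric value 5, just below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


class ListHandler(logging.Handler):
    """Collects the level name of every record that passes the threshold."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append(record.levelname)


def levels_emitted(threshold):
    """Log one message at every level and return the names that got through."""
    logger = logging.getLogger(f"demo.threshold.{threshold}")
    logger.propagate = False  # keep the demo records away from the root logger
    logger.setLevel(threshold)
    handler = ListHandler()
    logger.addHandler(handler)
    for level in (TRACE, logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR):
        logger.log(level, "message at level %s", logging.getLevelName(level))
    logger.removeHandler(handler)
    return handler.seen


print(levels_emitted(logging.DEBUG))  # ['DEBUG', 'INFO', 'WARNING', 'ERROR']
```

With the threshold set to DEBUG, the TRACE message is dropped while everything from DEBUG upwards gets through.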

Without further ado, we list each of the log levels, when to use it and what type of information needs to go into it.

Error

An unexpected issue occurred
The system is in an unrecoverable error condition and the end users are impacted.
An important task could not be completed.
A logical component of the system essential for the business failed.
In most cases, human intervention is needed to solve the issue.
A critical error outside the scope of the application has occurred. For example, if the system is using a payment gateway, any failures in the payment gateway should be logged as an Error.
Must always be ON in production systems.

In some logging frameworks, there is yet another level named FATAL, more severe than ERROR. The difference between them is that ERROR is used to log issues or faults within the application logic, while FATAL is used to log issues that prevent the application from running correctly.
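As a rough sketch of the ERROR and FATAL distinction, the snippet below uses Python's standard `logging` module, where `logging.FATAL` is an alias for `logging.CRITICAL`. The `PaymentGatewayError` class, the `charge` and `start` functions and the `database_url` setting are hypothetical names invented for this illustration.

```python
import logging

logger = logging.getLogger("demo.payments")


class PaymentGatewayError(Exception):
    """Hypothetical exception standing in for a failure reported by a gateway."""


def charge(gateway_call, order_id, amount):
    """Run a charge; log at ERROR (with traceback) if the gateway fails."""
    try:
        gateway_call(order_id, amount)
    except PaymentGatewayError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Payment failed for order %s (amount %.2f)", order_id, amount)
        return False
    return True


def start(config):
    """Refuse to start without essential configuration; log at FATAL/CRITICAL."""
    if "database_url" not in config:
        logger.critical("Cannot start: database_url is missing from the configuration")
        return False
    return True


def failing_gateway(order_id, amount):
    """Simulates a gateway outage for the demo."""
    raise PaymentGatewayError("gateway timed out")


charge(failing_gateway, "A-1001", 49.99)  # one failed task: ERROR
start({})                                 # application cannot run: FATAL
```

The failed charge affects a single task and is logged as an error, whereas the missing configuration stops the application from running at all.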

Warn

An unexpected issue occurred
A problem occurred, but not as critical a failure as covered by “Error”
The system can still continue to operate, although perhaps not optimally.
If not looked into soon enough, it could result in an “Error”.
Must always be ON in production systems.
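One way to picture the boundary between Warn and Error is a resource check that warns while the system can still operate and escalates once failure is likely. The sketch below is only an illustration: the `check_disk_usage` function and its 80% and 95% thresholds are assumptions, not recommended values.

```python
import logging

logger = logging.getLogger("demo.capacity")


def check_disk_usage(used_fraction, warn_at=0.80, error_at=0.95):
    """Log a disk-usage reading and return the level used (None if healthy).

    The thresholds are hypothetical defaults chosen for the example.
    """
    if used_fraction >= error_at:
        logger.error("Disk is %.0f%% full; writes are likely to fail",
                     used_fraction * 100)
        return logging.ERROR
    if used_fraction >= warn_at:
        # Still operating, but this will become an error if nobody acts.
        logger.warning("Disk is %.0f%% full; free up space before it reaches %.0f%%",
                       used_fraction * 100, error_at * 100)
        return logging.WARNING
    return None


check_disk_usage(0.85)  # logged as a warning
check_disk_usage(0.97)  # logged as an error
```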

Info

Something expected and significant happened and it is not an error.
Should indicate the flow of the application through the most significant components
Info should be the most commonly used logging level in the application.
It captures information about significant run-time events.
Examples of such events include:

start/stop of containers
user logging in
significant use-cases executing
code entering an architectural boundary (such as the UI layer, services layer)

Optionally ON in production systems. Our recommendation is that you turn on log level INFO for your production environment.
Remember to keep the volume of INFO logs low; otherwise you could end up with too many logs.
It is good to remember that in many organizations, the system is monitored by customer support staff and not developers. Hence, it is important that the Info, Warn and Error logs are also meaningful and understandable even to a non-developer.
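A minimal sketch of what readable INFO logging might look like with Python's standard `logging` module follows. The format string adds the timestamp, thread and level to each line. The `place_order` function, the logger name and the messages are hypothetical, and an in-memory stream stands in for a log file.

```python
import io
import logging

stream = io.StringIO()  # stands in for a log file in this demo
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(threadName)s] %(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("demo.orders")
logger.setLevel(logging.INFO)  # the production setting recommended above
logger.propagate = False
logger.addHandler(handler)


def place_order(user, order_id):
    """Hypothetical use case that logs its significant steps in plain language."""
    logger.info("User %s logged in", user)
    # Developer-level detail: dropped while the threshold is INFO.
    logger.debug("cart contents resolved for order %s", order_id)
    logger.info("Order %s placed successfully", order_id)


place_order("alice", "A-1001")
print(stream.getvalue())
```

Only the two INFO lines reach the output, and both are phrased so that support staff can follow what happened without reading the code.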

Debug

Used mainly in development. Should only be turned on in production for a short duration to capture detailed information.
More detailed than an Info level log.
As the name indicates, it is used for debugging the code.
Information in this log could include things such as entry to and exit from methods, results of major processing steps within the application, etc.
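Entry and exit logging of the kind described above could be sketched in Python with a small decorator. The `log_calls` decorator and the `apply_discount` function are hypothetical names used only for this illustration.

```python
import functools
import logging

logger = logging.getLogger("demo.debug")


def log_calls(func):
    """Log entry to and exit from the wrapped function at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("Entering %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("Exiting %s result=%r", func.__name__, result)
        return result
    return wrapper


@log_calls
def apply_discount(price, percent):
    """Hypothetical processing step whose result is worth seeing while debugging."""
    return round(price * (1 - percent / 100), 2)


# Turn DEBUG on for a debugging session; leave it off in normal operation.
logging.basicConfig(level=logging.DEBUG)
apply_discount(200.0, 10)
```

Because the messages are emitted at DEBUG level, they disappear as soon as the threshold is raised back to INFO, with no code changes needed.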

Trace

Used mainly in development. Should only be turned on in production for a short duration to capture detailed information.
Should only be used when the calculations are extremely complex and diving into low-level details is necessary.
Could include object value dumps, calculations and iterations in a loop etc.
Used rarely
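To illustrate per-iteration tracing, here is a minimal Python sketch. Python's standard `logging` module has no TRACE level, so the `TRACE` constant (numeric value 5, below DEBUG) is an assumption registered for the example, and `compound` is a hypothetical calculation.

```python
import logging

# Assumption: custom TRACE level below DEBUG (10); not built into Python.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("demo.trace")


def compound(principal, rate, years):
    """Compound interest, tracing the balance after every iteration."""
    balance = principal
    for year in range(1, years + 1):
        balance *= 1 + rate
        # Guard the call so the loop does no logging work at all
        # unless TRACE is actually enabled.
        if logger.isEnabledFor(TRACE):
            logger.log(TRACE, "year=%d balance=%.4f", year, balance)
    return balance


compound(100.0, 0.1, 2)
```

The `isEnabledFor` guard keeps the cost of the trace statements close to zero while the level is switched off, which matters inside tight loops.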