classification
Title: [logging] add JSON log formatter
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8

process
Status: closed
Resolution: rejected
Nosy List: brett.cannon, steveha, vinay.sajip, wsanchez
Priority: normal

Created on 2018-05-14 22:20 by steveha, last changed 2018-05-25 22:07 by wsanchez. This issue is now closed.

Messages (7)
msg316597 - (view) Author: Steve R. Hastings (steveha) * Date: 2018-05-14 22:20
Many logging aggregation and indexing systems require that log data must be written in an easy-to-parse format such as JSON.  A "batteries included" JSON log formatter would be a useful enhancement.

I have written code with the following features:

Log events are translated into JSON.  The JSON is formatted without indentation, so that one log event produces one line in the output file; each line is a single JSON object.  Because log files contain many lines and JSON logs are usually read by machines rather than humans, the JSON separators are set to simply ',' and ':' (no leading or trailing spaces).
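[The compact-separator behavior described corresponds to the separators parameter of json.dumps; a minimal illustration:]

```python
import json

event = {"levelname": "INFO", "message": "foo initialized"}

# Default formatting inserts a space after each separator.
print(json.dumps(event))
# {"levelname": "INFO", "message": "foo initialized"}

# Compact separators keep each event as one short line.
print(json.dumps(event, separators=(",", ":")))
# {"levelname":"INFO","message":"foo initialized"}
```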

The standard fields of a log event turn into key/value pairs in the JSON object, using the standard field name as the output JSON key name.  For example, a log statement like this: log.info("foo initialized") would result in JSON that included: "message": "foo initialized", "levelname": "INFO"

By default, the JSON output has the timestamp as the first key/value pair; therefore if the file is sorted, or multiple log files are concatenated and then sorted, the logged events will be sorted into elapsed order.  By default, the timestamp format is RFC3339: https://tools.ietf.org/html/rfc3339  Also, by default, the seconds field will have a fractional part with three digits (i.e. millisecond resolution on log event timestamps).
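[An RFC 3339 timestamp with millisecond resolution can be produced from a LogRecord's created time along these lines — a sketch, not the proposed implementation; the helper name rfc3339 is made up here:]

```python
from datetime import datetime, timezone

def rfc3339(created: float) -> str:
    """Format a POSIX timestamp (LogRecord.created) as RFC 3339,
    millisecond resolution, in the local UTC offset."""
    dt = datetime.fromtimestamp(created, tz=timezone.utc).astimezone()
    # e.g. 2018-05-14T17:28:04.112-04:00
    return dt.isoformat(timespec="milliseconds")

import time
print(rfc3339(time.time()))
```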

The user can provide a dictionary that will translate key names.  If the user's log collection system requires the logging level to have a key name of "priority", then by specifying that "levelname" should be mapped to "priority", the log event will be output with "priority": "INFO".  This dictionary also provides a way to control which logging event fields are written to the logfile, and in what order.

Taking advantage of a feature already present in the Python logging code: The user can provide a dictionary as the only argument after the logging message, and it will be handled specially.  For the JSON formatter, the special handling is that the key/value pairs from the dictionary will be included in the JSON output.  Alternatively, the user can provide a message string that includes % templating codes, and specify arguments to be templated in, as is common practice in logging.  (This is to be backwards-compatible with code written to the current formatters.)

The Python logging code feature mentioned above can be found in LogRecord.__init__().  The first thing that init code does is check whether there is exactly one argument and whether it is a non-empty Mapping.
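[This LogRecord behavior can be observed directly: when exactly one non-empty mapping is passed, it becomes record.args itself rather than a one-element tuple:]

```python
import logging

record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="demo.py", lineno=1,
    msg="connecting to server",
    args=({"url": "http://127.0.0.1", "port": 8043},),
    exc_info=None,
)

# The single mapping was unpacked: record.args is the dict, not a 1-tuple.
print(type(record.args))    # <class 'dict'>
print(record.args["port"])  # 8043
```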

In the case where there are multiple arguments provided and templated into the message string, one extra key/value pair is written in the JSON object: the key is "args" and the value is a JSON array with the values of the arguments.

Because logging is fundamental, if anything goes wrong it's important to log as much as possible.  If any exception is raised during the call to json.dumps() that creates an output line, there is fallback code that converts the arguments dictionary to a string and logs that string under an identifiable key name such as "args_text".  So if someone attempts to log arguments that json.dumps() can't handle, something will be written to the logfile, even if it's just something like "<__main__.Foo object at 0x7f8fe7621da0>".
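[The fallback described might look roughly like this — a sketch under assumptions, not the production code; dump_event and the exact filtering are illustrative:]

```python
import json

def dump_event(fields: dict) -> str:
    """Serialize a log event; fall back to a text form if JSON fails."""
    try:
        return json.dumps(fields, separators=(",", ":"))
    except (TypeError, ValueError):
        # Something in `fields` is not JSON-serializable.  Keep the plainly
        # serializable values and record the rest as text under an
        # identifiable key, so the event is not lost.
        safe = {k: v for k, v in fields.items()
                if isinstance(v, (str, int, float, bool, type(None)))}
        safe["args_text"] = str(fields)
        return json.dumps(safe, separators=(",", ":"))

class Foo:  # an object json.dumps cannot handle
    pass

print(dump_event({"message": "init", "obj": Foo()}))
```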


Here is a concrete example.  The JSON logging formatter is initialized with the following field name translation table:

TRANSLATE_FIELD_NAMES = {
    "asctime": "@timestamp",
    "levelname": "priority",
    "message": None,
    "process": "pid",
}

The above dictionary means that asctime, levelname, message, and process will be written to the output file, in that order.  The output JSON object will have key names of, respectively: "@timestamp", "priority", "message" (no change), and "pid".

Then the following logging call occurs:

 log.info("connecting to server", {"url": server_url, "port": server_port})



The output written to the log file would be similar to this:

{"@timestamp":"2018-05-14T17:28:04.112-04:00","priority":"INFO","message":"connecting to server","pid":32073,"url":"http://127.0.0.1","port":8043}
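[A minimal sketch of a formatter with the behavior described above — illustrative only, not the actual production code; the class name JSONLogFormatter is made up here:]

```python
import json
import logging
from datetime import datetime, timezone

class JSONLogFormatter(logging.Formatter):
    """Sketch: emit each log record as one compact JSON object."""

    def __init__(self, field_names):
        super().__init__()
        # Maps LogRecord attribute -> output key; None means keep the name.
        self.field_names = field_names

    def format(self, record):
        fields = {}
        for attr, key in self.field_names.items():
            if attr == "asctime":
                value = datetime.fromtimestamp(
                    record.created, tz=timezone.utc
                ).astimezone().isoformat(timespec="milliseconds")
            elif attr == "message":
                value = record.getMessage()
            else:
                value = getattr(record, attr, None)
            fields[key if key is not None else attr] = value
        # A single mapping argument is merged into the output, as described.
        if isinstance(record.args, dict):
            fields.update(record.args)
        return json.dumps(fields, separators=(",", ":"))

TRANSLATE_FIELD_NAMES = {
    "asctime": "@timestamp",
    "levelname": "priority",
    "message": None,
    "process": "pid",
}

handler = logging.StreamHandler()
handler.setFormatter(JSONLogFormatter(TRANSLATE_FIELD_NAMES))
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("connecting to server", {"url": "http://127.0.0.1", "port": 8043})
```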


As noted above, I have already written this code, and it has been in production for months at the company where I work.  However, the code can be simplified and cleaned up a bit, since it contains Python 2.7 backward compatibility code.
msg316603 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2018-05-14 22:35
> A "batteries included" JSON log formatter would be a useful enhancement.

See

https://docs.python.org/3/howto/logging-cookbook.html#implementing-structured-logging

I don't propose to implement your suggestion.

You've now logged three enhancement requests where there are recommended mechanisms in the cookbook or reference documentation for dealing with the situations you describe.

I'm not really minded to implement any of these proposed enhancements, as I think the approaches already described in the documentation are adequate.
msg316614 - (view) Author: Steve R. Hastings (steveha) * Date: 2018-05-15 02:57
The solution recommended in the logging cookbook does not do the same thing as I have proposed.

I have proposed a logging handler that emits JSON suitable for ingestion by a log collection system.  With the JSON formatter, all log events, no matter their source, are written completely in JSON.  This includes log events from already-written library functions that have no expectation of JSON output.  Also the JSON formatter writes timestamps in the now-universally-understood RFC3339 format, the easiest date format to parse.

The logging cookbook solution is that the user must explicitly write JSON data into the message line of each call to the logging code.  This would make some or all of the message field into JSON but does not solve the problem of logging in JSON; the other fields of the log file would remain non-JSON.
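[For reference, the cookbook pattern under discussion wraps the structured data inside the message itself, roughly like this — only the kwargs portion becomes JSON, the surrounding line does not:]

```python
import json
import logging

class StructuredMessage:
    def __init__(self, message, **kwargs):
        self.message = message
        self.kwargs = kwargs

    def __str__(self):
        # Only the kwargs are JSON-encoded; the rest of the log line is not.
        return "%s >>> %s" % (self.message, json.dumps(self.kwargs))

logging.basicConfig(level=logging.INFO, format="%(message)s")
logging.info(StructuredMessage("connecting", url="http://127.0.0.1"))
```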

A company that is using a log collection system might need the logs to be pure JSON.  The proposed JSON formatter solves that problem.

P.S. Here is a blog posting on why it is useful to log in pure JSON:

https://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
msg316631 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2018-05-15 07:47
> The solution recommended in the logging cookbook does not do the same thing as I have proposed.

Perhaps not, but it appears to adequately address your first point:

> Many logging aggregation and indexing systems require that log data must be written in an easy-to-parse format such as JSON.


> The logging cookbook solution is that the user must explicitly write JSON data into the message line of each call to the logging code.

No, it just illustrates one simple way it could be done.

> P.S. Here is a blog posting on why it is useful to log in pure JSON

I'm already aware of situations where machine-parseable logs are valuable, thanks.

There's nothing stopping you from publishing any handlers/formatters you think are generally useful on PyPI, and publicising them so that people who want to use them can do so. There's no reason for them to be in the stdlib, as PyPI offers a better home for more specialised functionality than the stdlib.
msg316688 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-05-15 17:51
I just wanted to say thanks to Steve for at least taking the time to open the issues to ask if we would consider accepting the proposed changes.
msg316698 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2018-05-15 19:08
> I just wanted to say thanks to Steve for at least taking the time to open the issues to ask if we would consider accepting the proposed changes.

Sure, and I probably come across as somewhat tetchy in my responses. That's just down to a shortage of time at the minute, and not in general because contributions to logging aren't welcome. They are (in general), as evidenced by numerous PRs which have been accepted.

I would say that for the most part, new handlers, formatters etc. ought to be published on PyPI so that they gain wide exposure. The logging package is reasonably "batteries included" already as regards handlers, etc. and new ones should only be added if they can be shown to have really wide applicability.
msg317720 - (view) Author: Wilfredo Sanchez (wsanchez) Date: 2018-05-25 22:07
Please consider using RFC 7464 (JSON Text Sequences).

This allows detection of truncated JSON events without losing the following event.  It allows newlines in the JSON data.

All this involves is adding an RS between events.
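[RFC 7464 frames each JSON text with a leading RS (0x1E); a sketch of emitting and splitting such a sequence — the helper names emit and parse are illustrative:]

```python
import json

RS = "\x1e"  # ASCII Record Separator, per RFC 7464

def emit(event: dict) -> str:
    """Frame one event as an RFC 7464 JSON text sequence element."""
    return RS + json.dumps(event, separators=(",", ":")) + "\n"

def parse(stream: str):
    """Split a sequence; a truncated element is dropped on its own,
    and the next RS resynchronizes the reader."""
    for chunk in stream.split(RS):
        if not chunk:
            continue
        try:
            yield json.loads(chunk)
        except json.JSONDecodeError:
            pass  # truncated event: skip it without losing what follows

# A truncated event in the middle does not corrupt its neighbors.
data = emit({"message": "a"}) + RS + '{"message": "tru' + emit({"message": "b"})
print(list(parse(data)))  # [{'message': 'a'}, {'message': 'b'}]
```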
History
Date                 User          Action  Args
2018-05-25 22:07:40  wsanchez      set     nosy: + wsanchez; messages: + msg317720
2018-05-15 19:08:01  vinay.sajip   set     messages: + msg316698
2018-05-15 17:51:23  brett.cannon  set     nosy: + brett.cannon; messages: + msg316688
2018-05-15 07:48:53  vinay.sajip   set     status: open -> closed; resolution: rejected; stage: resolved
2018-05-15 07:47:29  vinay.sajip   set     messages: + msg316631
2018-05-15 02:57:47  steveha       set     messages: + msg316614
2018-05-14 22:35:49  vinay.sajip   set     messages: + msg316603
2018-05-14 22:20:20  steveha       create