This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author ariebovenberg
Recipients ariebovenberg, docs@python
Date 2021-12-30.07:44:14
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1640850255.25.0.496981368409.issue46200@roundup.psfhosted.org>
In-reply-to
Content
(I've already submitted this issue to security@python.org, who directed me to propose a documentation change here)

Logging f-strings is quite popular, because the `%` style of logging is often considered ugly.
However, logging preformatted strings comes with security risks which should explicitly be mentioned in the logging documentation.

The following example illustrates the problem:

    logger.info('look: %s', untrusted_string)                # OK
    logger.info('look: %(foo)s', {'foo', untrusted_string})  # OK
    logger.info(f'look: {untrusted_string}')                 # OK (security-wise)
    logger.info(f'look: {untrusted_string}', some_dict)      # DANGER!

The problem is that `untrusted_string` will be interpreted as a string template by the logger. 
If `untrusted_string` has the value `%(foo)999999999s` (where foo is present in `some_dict`), 
logging gets stuck trying to add over a gigabyte of whitespace to a string. 
In other words: a Denial-of-Service attack.

Of course, this combination of f-string and logger arguments is unlikely. But it is plausible that:
- After a refactoring to f-string, removing the dict argument is forgotten;
- A developer adding a variable to a message prefers using f-strings and thus creates a template mixing the two styles (e.g. `{foo} and %(bar)s`);
- The string is passed through a logging Filter or other function which adds a dict argument.

Open questions:
1. Do we want to simply highlight the risks, or actively discourage logging f-strings?

   My thoughts:
   I feel it's important enough to actively discourage logging preformatted strings (i.e. also .format and manual %-formatting).
   A strong recommendation will allow security linters to flag this.
   Additionally, there are other advantages to the formatting built into `logging` (e.g. performance).

2. Should we type annotate logger `msg` as `Literal[str]` following PEP675?

   My thoughts:
   Annotating `msg` as a literal string will greatly help enforce this best practice.
   The adoption of typecheckers exceeds that of security linters.

3. Where should this risk be documented?
  a. In the `logger.debug` function docs
  b. In the `logging` docs introduction (like xml.etree.elementtree)
  c. In a new "security" section of the `logging` docs
  d. An informational PEP on logging security considerations (similar to PEP672)

  My thoughts:
  I fear option (a) would not get the attention of most readers.
  Option (c) or (d) may be useful to also highlight general risks 
  of logging (i.e. log injection)

As soon as there is agreement on these questions, I would be happy to submit a PR.
History
Date User Action Args
2021-12-30 07:44:15ariebovenbergsetrecipients: + ariebovenberg, docs@python
2021-12-30 07:44:15ariebovenbergsetmessageid: <1640850255.25.0.496981368409.issue46200@roundup.psfhosted.org>
2021-12-30 07:44:15ariebovenberglinkissue46200 messages
2021-12-30 07:44:14ariebovenbergcreate