This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Discourage logging f-strings due to security considerations
Type: security Stage:
Components: Documentation, Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: ariebovenberg, docs@python, eric.smith, gregory.p.smith, rhettinger, tinchester, vinay.sajip
Priority: normal Keywords:

Created on 2021-12-30 07:44 by ariebovenberg, last changed 2022-04-11 14:59 by admin.

Messages (13)
msg409352 - (view) Author: Arie Bovenberg (ariebovenberg) * Date: 2021-12-30 07:44
(I've already submitted this issue to security@python.org, who directed me to propose a documentation change here)

Logging f-strings is quite popular, because the `%` style of logging is often considered ugly.
However, logging preformatted strings comes with security risks which should explicitly be mentioned in the logging documentation.

The following example illustrates the problem:

    logger.info('look: %s', untrusted_string)                # OK
    logger.info('look: %(foo)s', {'foo', untrusted_string})  # OK
    logger.info(f'look: {untrusted_string}')                 # OK (security-wise)
    logger.info(f'look: {untrusted_string}', some_dict)      # DANGER!

The problem is that `untrusted_string` will be interpreted as a string template by the logger. 
If `untrusted_string` has the value `%(foo)999999999s` (where foo is present in `some_dict`), 
logging gets stuck trying to add over a gigabyte of whitespace to a string. 
In other words: a Denial-of-Service attack.

Of course, this combination of f-string and logger arguments is unlikely. But it is plausible that:
- After a refactoring to f-string, removing the dict argument is forgotten;
- A developer adding a variable to a message prefers using f-strings and thus creates a template mixing the two styles (e.g. `{foo} and %(bar)s`);
- The string is passed through a logging Filter or other function which adds a dict argument.

Open questions:
1. Do we want to simply highlight the risks, or actively discourage logging f-strings?

   My thoughts:
   I feel it's important enough to actively discourage logging preformatted strings (i.e. also .format and manual %-formatting).
   A strong recommendation will allow security linters to flag this.
   Additionally, there are other advantages to the formatting built into `logging` (e.g. performance).

2. Should we type annotate logger `msg` as `Literal[str]` following PEP675?

   My thoughts:
   Annotating `msg` as a literal string will greatly help enforce this best practice.
   The adoption of typecheckers exceeds that of security linters.

3. Where should this risk be documented?
  a. In the `logger.debug` function docs
  b. In the `logging` docs introduction (like xml.etree.elementtree)
  c. In a new "security" section of the `logging` docs
  d. An informational PEP on logging security considerations (similar to PEP672)

  My thoughts:
  I fear option (a) would not get the attention of most readers.
  Option (c) or (d) may be useful to also highlight general risks 
  of logging (i.e. log injection)

As soon as there is agreement on these questions, I would be happy to submit a PR.
msg409416 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2021-12-31 11:55
Another (minor) point against using f-strings or .format is that formatting prematurely might be doing unnecessary work - by default, logging formats messages lazily, waiting until a message actually needs to be output. This could perhaps be more prominently emphasized in the docs where formatting is first introduced.
msg409418 - (view) Author: Arie Bovenberg (ariebovenberg) * Date: 2021-12-31 13:07
Absolutely agree! In practice I find some people are not swayed by this argument -- and prefer the readability of f-strings. 

My expectation is that a clear recommendation in the official docs will convince more people. Especially if there are security concerns.

Do you have an opinion on the 3 questions I highlighted?
msg409495 - (view) Author: Arie Bovenberg (ariebovenberg) * Date: 2022-01-02 12:37
I've done some more digging, and have read the related issue bpo-30995.

There are IMHO two challenges that are worth tackling:
1. A great number[1] of developers do their own string formatting.
   This is sub-optimal for performance and - in rare cases - 
   leaves them potentially vulnerable to attack 
   (DoS through a string formatting attack).
   We cannot prohibit this without breaking backwards compatibility:
   https://docs.python.org/3/howto/logging.html#using-arbitrary-objects-as-messages. 
2. The logging cookbook has several mechanisms for new-style formatting, 
   but because these do not ship with the stdlib, 
   the vast majority[2] of programmers don't use them.

We can solve both issues with a new family of `Logger` methods.
The debug one shown below:

    def debugf(self, msg: Literal[str], *args, exc_info=False, **kwargs):
    """
    Log 'msg.format(*args, **kwargs)' with severity 'DEBUG'.

    To pass exception information, use the keyword argument exc_info with
    a true value, e.g.

    logger.debugf("Houston, we have a {}", "thorny problem", exc_info=1)
    """
    if self.isEnabledFor(DEBUG):
        # `BraceMessage` from the logging cookbook.
        self._log(DEBUG, BraceMessage(msg, args, kwargs), exc_info=exc_info)

Advantages of this approach:
- It's mostly the existing work of @vinay.sajip. It's Idiomatic and proven to work.
- No change to existing methods or loggers. Fully backwards compatible.
- Protection against string formatting attack at the type level (thanks to PEP675).
  Everyone with an IDE will get warned about unsafe use.
- Embed correct use of log formatting at type level.
- People will stop complaining about the `%`-style,
  now that there is an easy way to use new-style formatting in the stdlib. 

The disadvantage of course is an extra set of logging methods. However:
- An `f` suffix for string formatting operations is common (e.g. `printf`)
- It's visually similar to f-strings, thus straightforward to comprehend.
  Compare `debug(f"hello ...` and `debugf("hello ...`.
- It's explicit. The method logs _and_ formats.

Sources:
[1] On GitHub code search, `logger` gets 8M results. 
    `logger` `format` gets almost 4M results, of which about half
    are actually instances of logging `.format()` strings. 
    f-strings are not even included in this search, and probably used
    very frequently as well.
    (https://github.com/search?l=Python&p=2&q=logger+format&type=Code)

[2] There are <1000 instances of `StyleAdapter`, `BraceMessage` in GitHub code search,
    while `getLogger` gets almost 5M results.
    (https://github.com/search?l=Python&q=bracemessage&type=Code)
    (https://github.com/search?l=Python&q=styleadapter&type=Code)
    (https://github.com/search?l=Python&q=getlogger&type=Code)
msg409499 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-01-02 15:21
I think there's definitely room for improvement here, and at face value I like the debugf() functions. __format__ style formatting solves a big problem for me: being able to provide format strings for timestamps.

One thing: I'd be careful about saying that delayed formatting is always a performance win. f-strings are so much faster, that if I'm reasonable sure a message is always going to get logged, I'll use an f-string. In my case, I never log any user-supplied data, although it's obviously a risk that should be warned against.
msg409505 - (view) Author: Arie Bovenberg (ariebovenberg) * Date: 2022-01-02 16:58
Indeed the `__format__` style offers a lot more options (see https://pyformat.info/).

Regarding performance you make an interesting point.

One possible solution is to allow f-strings _only_ if there are no args/kwargs.
In that one case formatting would explicitly be skipped. 
This could be represented at type-level with an overload:

    # case: Logger does formatting. Only literal string template allowed.
    @overload
    def debugf(msg: Literal[str], *args, exc_info: bool = False, **kwargs):
        # wrap

    # case: User does formatting. No args/kwargs allowed.
    @overload
    def debugf(msg: str, *, exc_info: bool = False):
        # wrap `msg` in something that will not allow further formatting.
msg412357 - (view) Author: Tin Tvrtković (tinchester) * Date: 2022-02-02 13:45
Eric is absolutely right, due to function calls being somewhat slow in Python the performance argument in practice falls in favor of f-strings.

So if they're faster, more readable, and more convenient to write, no wonder people prefer them (including me).

The overload approach is intriguing.
msg412473 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-03 21:36
> Eric is absolutely right, due to function calls being 
> somewhat slow in Python the performance argument in
> practice falls in favor of f-strings.

Also f-strings can evaluate expressions in the template which is also a big win:

   f('Pending {len(req)} requests: {req[0]!r} ... {req[-1]!r}')
msg412474 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-03 21:40
In a favor of deferred substitution, the cookbook should have a recipe where substituted messages are logged to a file and the unsubstituted message stored in SQLite3 database with the parameters stored as JSON.    This gives both human readable output and queryable, preparsed data for automated analysis.
msg412496 - (view) Author: Arie Bovenberg (ariebovenberg) * Date: 2022-02-04 07:23
@rhettinger @tinchester I definitely see now that f-strings should have a place in logging. 
But do you agree that f-strings don't mix 100% safely with the current logger API?
What are your thoughts on a safer set of logger functions (see my comments above, https://bugs.python.org/issue46200#msg409505)

Just throwing an additional alternative out there: instead of the @overload approach, one could have 2 logger function families:

debugs(s: str)                          # s-prefix indicates simply logging a string (no formatting done by logger!)
debugf(s: LiteralStr, *args, **kwargs)  # f prefix indicates logger does the formatting. (no formatting done by user!)

@vinay.sajip what are your thoughts on the discussion above?
msg412504 - (view) Author: Tin Tvrtković (tinchester) * Date: 2022-02-04 11:24
I mean, I agree with your point about being able to accidentally format twice when using the standard library logger.

I'm not a core dev but I think getting new APIs in will be challenging. And if by some fluke of chance you did get the approval to introduce new logging APIs, there are better APIs out there (look at the structlog library). Also the old APIs would have to stay for a *long* time since everybody's using them.

Adding clever typing overloads seems actually realistic and useful to me. But that depends on the LiteralString PEP being accepted and people actually running MyPy. That's probably the best we can do, apart from some documentation.
msg413519 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2022-02-19 00:28
A new system of logging APIs has been on several of our (core dev and otherwise) minds ever since f-strings were introduced.  For this specific issue, agreed that documentation is key.

The old logging APIs cannot change.  And practically preventing passing non-literal constant values in as the first parameter within their implementation while there are theoretical ways to _maybe_ do that, I expect none would be maintainable or performant.

For https://www.python.org/dev/peps/pep-0675/#logging-format-string-injection, my gut feeling is you want logging functions annotated as is a duality using overload.

```
@overload
def info(str) -> None:
    ...

@overload
def info(msg: LiteralString, *args: object) -> None:
```

As that prevents the untrusted template in msg so you could use a lone f-str.

This larger conversation is better turned into a python-dev or discuss.python.org thread than done in our poor bug tracker as anything more than the doc update and musing around the annotations turns into an evolved logging system design discussion.  (a nice thread internally at work just restarted about this and linked here... people want f-logging!)
msg413547 - (view) Author: Arie Bovenberg (ariebovenberg) * Date: 2022-02-19 12:27
Thanks @gregory.p.smith! I didn't know about discuss.python.org. I created a new topic there: https://discuss.python.org/t/safer-logging-methods-for-f-strings-and-new-style-formatting/13802
History
Date User Action Args
2022-04-11 14:59:54adminsetgithub: 90358
2022-02-19 12:27:19ariebovenbergsetmessages: + msg413547
2022-02-19 00:28:52gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg413519
2022-02-04 11:24:50tinchestersetmessages: + msg412504
2022-02-04 10:12:15erlendaaslandsetnosy: - erlendaasland
2022-02-04 07:23:18ariebovenbergsetmessages: + msg412496
2022-02-03 21:40:34rhettingersetmessages: + msg412474
2022-02-03 21:36:10rhettingersetnosy: + rhettinger
messages: + msg412473
2022-02-02 13:45:42tinchestersetnosy: + tinchester
messages: + msg412357
2022-01-02 16:58:01ariebovenbergsetmessages: + msg409505
2022-01-02 15:21:49eric.smithsetmessages: + msg409499
2022-01-02 12:53:38erlendaaslandsetnosy: + erlendaasland
2022-01-02 12:37:04ariebovenbergsetmessages: + msg409495
components: + Library (Lib)
2021-12-31 13:07:27ariebovenbergsetmessages: + msg409418
2021-12-31 11:55:55vinay.sajipsetmessages: + msg409416
2021-12-30 08:25:05eric.smithsetnosy: + eric.smith
2021-12-30 07:57:07serhiy.storchakasetnosy: + vinay.sajip
2021-12-30 07:44:15ariebovenbergcreate