This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Generators with lru_cache can be non-intuitive
Type: behavior Stage: resolved
Components: Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: exhuma, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-06-11 07:16 by exhuma, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg319279 - (view) Author: Michel Albert (exhuma) * Date: 2018-06-11 07:16
Consider the following code:

    # filename: foo.py

    from functools import lru_cache


    @lru_cache(10)
    def bar():
        yield 10
        yield 20
        yield 30


    # This loop will work as expected
    for row in bar():
        print(row)

    # This loop will not loop over anything.
    # The cache will return an already consumed generator.
    for row in bar():
        print(row)


This behaviour is natural, but it is almost invisible to the caller of "bar".

The main issue is one of "surprise". When inspecting the output of "bar" it is clear that the output is a generator:

    >>> import foo
    >>> foo.bar()
    <generator object bar at 0x7fbfecb66a40>

**Very** careful inspection will reveal that each call will return the same generator instance.
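A quick identity check makes this visible; the following is a small self-contained sketch reusing the `bar` definition from `foo.py` above:

```python
from functools import lru_cache

@lru_cache(10)
def bar():
    yield 10
    yield 20
    yield 30

first = bar()
second = bar()
print(first is second)  # True: the cache hands back the identical generator
print(list(first))      # [10, 20, 30] consumes the shared generator
print(list(second))     # []: the same object is already exhausted
```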

So to an observant user the following is an expected behaviour:

    >>> result = foo.bar()
    >>> for row in result:
    ...    print(row)
    ...
    10
    20
    30
    >>> for row in result:
    ...     print(row)
    ...
    >>>

However, the following is not:

    >>> import foo
    >>> result = foo.bar()
    >>> for row in result:
    ...     print(row)
    ...
    10
    20
    30
    >>> result = foo.bar()
    >>> for row in result:
    ...     print(row)
    ...
    >>>


Would it make sense to emit a warning (or even raise an exception) in `lru_cache` if the return value of the cached function is a generator?

I can think of situations where it makes sense to combine the two. For example, the situation I am currently in:

I have a piece of code which loops several times over the same SNMP table. Having a generator makes the application far more responsive. And having the cache makes it even faster on subsequent calls. But the gain I get from the cache is bigger than the gain from the generator. So I would be okay with converting the result to a list before storing it in the cache.
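A possible workaround along those lines, sketched here as an assumption rather than anything from this issue's resolution (the `fetch_rows` and `cached_rows` names are hypothetical): cache a wrapper that materialises the generator into a list, so the cache stores a reusable value.

```python
from functools import lru_cache

def fetch_rows():
    # Stand-in for the expensive generator (e.g. walking an SNMP table).
    yield 10
    yield 20
    yield 30

@lru_cache(maxsize=10)
def cached_rows():
    # Materialise the generator so the cache stores a reusable list
    # instead of a one-shot generator object.
    return list(fetch_rows())

print(cached_rows())  # first call runs the generator fully
print(cached_rows())  # subsequent calls reuse the same cached list
```

This trades the generator's laziness for repeatability: the first call pays the full cost, and every later call is a cache hit returning the same list.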

What is your opinion on this issue? Would it make sense to add a warning?
msg319281 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-11 08:02
No, this will break cases when you need to cache generators.

There are many ways of using lru_cache improperly, and we can't distinguish incorrect uses from intentional correct uses.
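As a hypothetical illustration of such an intentional use (not taken from this issue): a per-namespace unique-ID generator relies on every caller sharing the same cached generator, so exhaustion-style sharing is the point rather than a bug.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def id_stream(namespace):
    # Intentionally cached generator: every caller asking for the same
    # namespace shares the single cached stream, so IDs never repeat.
    n = 0
    while True:
        yield f"{namespace}-{n}"
        n += 1

print(next(id_stream("user")))   # user-0
print(next(id_stream("user")))   # user-1 (same shared generator advances)
print(next(id_stream("order")))  # order-0 (a different cached stream)
```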
msg319363 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-12 05:25
Serhiy is correct.  In general, there is no way to detect when someone is caching something that should not be cached (i.e. impure functions).
History
Date                 User              Action  Args
2022-04-11 14:59:01  admin             set     github: 78008
2018-06-12 05:25:46  rhettinger        set     status: open -> closed
                                               resolution: not a bug
                                               messages: + msg319363
                                               stage: resolved
2018-06-11 08:02:25  serhiy.storchaka  set     assignee: rhettinger
                                               messages: + msg319281
                                               nosy: + rhettinger, serhiy.storchaka
2018-06-11 07:16:13  exhuma            create