
Author cjrh
Recipients THRlWiTi, cheryl.sabella, cjrh, dmoore, ezio.melotti, georg.brandl, giampaolo.rodola, martin.panter, ncoghlan, r.david.murray, rbcollins, rhettinger, steven.daprano, tim.peters, xgdomingo
Date 2019-03-25.13:46:36
Message-id <1553521596.87.0.600186553687.issue19495@roundup.psfhosted.org>
In-reply-to
Content
Somehow I missed that there's been an open issue on this. Like others, I've written a bunch of different incarnations of an "elapsed" context manager over the years, always for the more crude "how long did this take" reason David mentioned, never the microbenchmark scenario that timeit serves. The work is never quite substantial enough to make a PyPI package of it, but it's always annoying to have to do again the next time the need arises.


The overall least fiddly scheme I've found is to just use a callback. It's the simplest option:


import time
from contextlib import contextmanager
from typing import Callable


@contextmanager
def elapsed(cb: Callable[[float, float], None], counter=time.perf_counter):
    t0 = counter()
    try:
        yield
    finally:
        # Report even if the wrapped block raises.
        t1 = counter()
        cb(t0, t1)


The simple case, the one most people need when they just want a quick check of elapsed time on a chunk of code, is then quite easy:


with elapsed(lambda t0, t1: print(f'read_excel: {t1 - t0:.2f} s')):
    # very large spreadsheet
    df = pandas.read_excel(filename, dtype=str)


(I rarely need a timer decorator for functions, because the profiler already tracks function calls. But within the scope of a function it can sometimes be difficult to get timing information, particularly if the calls made there go into native extensions.)
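

For instance, a minimal sketch of timing just the extension call inside a larger function, reusing the elapsed helper above (the function name and payload are made up here):


import zlib

def compress_payload(payload: bytes) -> bytes:
    # The profiler shows only one opaque call into the C extension here;
    # wrapping just this block reports the timing directly.
    with elapsed(lambda t0, t1: print(f'zlib.compress: {t1 - t0:.3f} s')):
        return zlib.compress(payload, level=9)

compress_payload(b'x' * 10_000_000)
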

One of the consequences of using a callback strategy is that an additional version might be required for async callbacks (I have used these in production also):


from contextlib import asynccontextmanager
from typing import Awaitable


@asynccontextmanager
async def aioelapsed(acb: Callable[[float, float], Awaitable[None]],
                     counter=time.perf_counter):
    t0 = counter()
    try:
        yield
    finally:
        t1 = counter()
        await acb(t0, t1)
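

A usage sketch (names here are hypothetical), assuming an async callback that just prints:


import asyncio

async def report(t0: float, t1: float) -> None:
    print(f'request took {t1 - t0:.3f} s')

async def main():
    async with aioelapsed(report):
        await asyncio.sleep(0.25)  # stand-in for an actual network call

asyncio.run(main())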


So, the interesting thing here is that there is a general form for which an "elapsed" function is just a special case:


from typing import TypeVar

T = TypeVar('T')


@contextmanager
def sample_before_and_after(cb: Callable[[T, T], None], sample: Callable[[], T]):
    before = sample()
    try:
        yield
    finally:
        after = sample()
        cb(before, after)


The version of "elapsed" given further above is just this with the kwarg sample=time.perf_counter.  So it might be sufficient to cover the use-case of an "elapsed" context manager with something like the above, which is more general. However, I don't actually have any use cases for this more general form other than "elapsed"; I just thought it was interesting.
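

To make the relationship concrete, a small sketch (the name elapsed2 is made up) using functools.partial:


from functools import partial

# The earlier "elapsed" is just the general sampler specialised to perf_counter.
elapsed2 = partial(sample_before_and_after, sample=time.perf_counter)

with elapsed2(lambda t0, t1: print(f'{t1 - t0:.3f} s')):
    time.sleep(0.05)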


Whether any of this merits being in the stdlib is hard to say. These code snippets are all short and easy to write, but I've written them multiple times just to get "elapsed".


---


Once the "elapsed" abstraction is available, other cool ideas become a little bit easier to think about. These would be things that live in user code (not in the stdlib) but can make use of the "elapsed" cm; for example, a clever logger for slow code blocks (I've written a few of these too):


import logging
from contextlib import contextmanager, nullcontext


@contextmanager
def slow_code_logging(logger_name, msg, *args, threshold_sec=1.0, **kwargs):
    logger = logging.getLogger(logger_name)

    if logger.isEnabledFor(logging.INFO):
        def cb(t0: float, t1: float) -> None:
            dt = t1 - t0
            # Choose the log level depending on the threshold: slow blocks
            # get INFO, fast ones only DEBUG.
            if dt < threshold_sec:
                logger.debug(msg, dt, *args, **kwargs)
            else:
                logger.info(msg, dt, *args, **kwargs)

        cm = elapsed(cb)
    else:
        # Logger is not even enabled, do nothing.
        cm = nullcontext()

    with cm:
        yield


with slow_code_logging(__name__, 'Took longer to run than expected: %.4g s'):
    ...


And a super-hacky timing histogram generator (which would be quite interesting to use for measuring latency in network calls, e.g. with asyncio code):


@contextmanager
def histobuilder(counter, bin_size):
    def cb(t0, t1):
        dt = t1 - t0
        # Round the duration down to the start of its bin.
        bucket = dt - dt % bin_size
        counter[bucket, bucket + bin_size] += 1

    with elapsed(cb, counter=time.perf_counter_ns):
        yield

from collections import Counter

counter = Counter()

for i in range(100):
    with histobuilder(counter, bin_size=int(5e4)):  # 50 us bins (perf_counter_ns counts nanoseconds)
        time.sleep(0.01)  # 10 ms

for (a, b), v in sorted(counter.items(), key=lambda item: item[0][0]):
    print(f'{a/1e6:6.2f} ms - {b/1e6:>6.2f} ms: {v:4} ' + '\u2588' * v)


output:


  9.85 ms -   9.90 ms:    1 █
  9.90 ms -   9.95 ms:   10 ██████████
  9.95 ms -  10.00 ms:   17 █████████████████
 10.00 ms -  10.05 ms:    8 ████████
 10.05 ms -  10.10 ms:   12 ████████████
 10.10 ms -  10.15 ms:    5 █████
 10.15 ms -  10.20 ms:    4 ████
 10.20 ms -  10.25 ms:    4 ████
 10.25 ms -  10.30 ms:    6 ██████
 10.30 ms -  10.35 ms:    9 █████████
 10.35 ms -  10.40 ms:    3 ███
 10.40 ms -  10.45 ms:    5 █████
 10.45 ms -  10.50 ms:   12 ████████████
 10.50 ms -  10.55 ms:    3 ███
 10.55 ms -  10.60 ms:    1 █


Things like the histogram builder above become (for me) much simpler when "elapsed" is already a thing that exists, even though it's very simple on its own.

If the right call is that we shouldn't add such a thing to the stdlib, should we instead just add an example of an elapsed context manager to the contextlib docs? Perhaps copy-pasting from the docs is good enough for most users (like we do with the itertools recipes, for example).  I'd be willing to add something to the docs if it's decided that's the way to go.

(On the off chance some version of "elapsed" goes into the stdlib, I think contextlib is the right place rather than timeit, particularly because of how the "sample_before_and_after" pattern generalises.)