
Author vstinner
Recipients ezio.melotti, martin.panter, pitrou, r.david.murray, socketpair, vstinner
Date 2015-12-15.10:05:00
Message-id <1450173901.51.0.411921274024.issue25849@psf.upfronthosting.co.za>
Content
> If the “slow reconstruction algorithm” was clarified or removed, ...

I wrote this algorithm, or I helped to write it; I don't recall which.

The problem is readahead: for performance, TextIOWrapper reads more bytes than requested. But when tell() is called, the user expects to get the current file position, not the "read ahead" file position, so we have to go backward. The difficulty is that TextIOWrapper works with text (Unicode characters), whereas all files are bytes on disk. We need to compute the size of the readahead buffer in bytes, starting from a buffer of characters.
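
To make the readahead visible, here is a minimal sketch (not code from the io module). It assumes an io.BytesIO object standing in for a file on disk and an arbitrary sample string; the variable names are only illustrative.

```python
import io

# Arbitrary sample text containing some two-byte UTF-8 characters.
data = "héllo wörld\n" * 3
encoded = data.encode("utf-8")

raw = io.BytesIO(encoded)  # stands in for a file on disk
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

chars = text.read(3)                   # the user asks for 3 characters
consumed = len(chars.encode("utf-8"))  # bytes those characters occupy

# The wrapper has already pulled a larger chunk from the underlying
# stream, so the raw position is ahead of what the user has consumed.
readahead_pos = raw.tell()

# tell() must report a position that seek() can later return to,
# i.e. the position just after the characters actually consumed.
cookie = text.tell()
text.seek(cookie)
rest = text.read()

print(consumed, readahead_pos)
print(chars + rest == data)
```

The value returned by tell() is documented as an opaque number, so the sketch only passes it back to seek() rather than interpreting it.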

The bad performance comes from multibyte codecs, which require a heuristic to first guess the number of bytes and then actually decode the bytes again to find the exact size.

See _pyio.TextIOWrapper.tell() for the Python implementation.

# Fast search for an acceptable start point, close to our
# current pos.
# Rationale: calling decoder.decode() has a large overhead
# regardless of chunk size; we want the number of such calls to
# be O(1) in most situations (common decoders, non-crazy input).
# Actually, it will be exactly 1 for fixed-size codecs (all
# 8-bit codecs, also UTF-16 and UTF-32).
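
The "guess, then verify" idea can be sketched as follows. This is a simplified illustration, not the code of _pyio.TextIOWrapper.tell(): the function name bytes_for_chars is hypothetical, and the sketch assumes a codec such as UTF-8 where a fresh decoder can start at any character boundary and every decoded byte sequence yields one character. The real implementation also has to carry decoder state flags, which are ignored here.

```python
import codecs


def bytes_for_chars(chunk, chars_used, encoding="utf-8"):
    """Return how many bytes of `chunk` decode to its first `chars_used` characters.

    Hypothetical helper, simplified: decoder state flags are ignored.
    """
    new_decoder = codecs.getincrementaldecoder(encoding)
    total_chars = len(new_decoder().decode(chunk))
    if chars_used <= 0 or total_chars == 0:
        return 0

    # Step 1: guess a byte offset from the average bytes-per-character
    # ratio, then back off until it is a clean character boundary that
    # does not overshoot the requested number of characters.
    guess = min(len(chunk), len(chunk) * chars_used // total_chars)
    start, remaining = 0, chars_used
    while guess > 0:
        decoder = new_decoder()
        n = len(decoder.decode(chunk[:guess]))
        pending = decoder.getstate()[0]  # bytes buffered inside the decoder
        if n <= remaining and not pending:
            start, remaining = guess, remaining - n
            break
        # Either drop the partial sequence or step back one byte.
        guess -= len(pending) if (n <= remaining and pending) else 1

    # Step 2: feed one byte at a time until the remaining characters
    # have been produced; this gives the exact byte offset.
    decoder = new_decoder()
    pos = start
    while remaining > 0 and pos < len(chunk):
        remaining -= len(decoder.decode(chunk[pos:pos + 1]))
        pos += 1
    return pos


sample = "héllo wörld"
print(bytes_for_chars(sample.encode("utf-8"), 3))
```

For a fixed-size codec the first guess lands on a boundary immediately, so step 1 needs a single decode call and step 2 does nothing. With multibyte sequences the guess can land in the middle of a character, and the extra decode calls are where the cost comes from.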

(Incomplete) history of the Python implementation of the tell() method:

* changeset 7c6972f37fe3 (2007)
* changeset 28bc7ed26574: More efficient implementation
* changeset b5a2e753b682: use the new getstate/setstate decoder API
* changeset 04050373d799 (2008): fix for stateful decoders
* changeset 39a4f4393ef1: additional fixes to the handling of 'limit'
* (Lib/io.py moved to Lib/_pyio.py)
* changeset 4b6052320e98 (Issue #11114): optimization