Author jaraco
Recipients Valentin Zhao, Windson Yang, jaraco, paul.moore, steve.dower, tim.golden, zach.ware
Date 2018-11-18.18:42:29
Message-id <1542566549.99.0.788709270274.issue35131@psf.upfronthosting.co.za>
In-reply-to
Content
The problem you've encountered is that previously the file was assumed to be in a single encoding, and reading would fail if it was not in that encoding, so it was possible to lazy-load the file and process each line as it was read.

In the new model, where you need to evaluate the viability of the file in one of two candidate encodings, you'll necessarily need to read the entire file once before processing its contents.

Therefore, I recommend one of these options:

1. Always read the file in binary mode, ascertain the "best" encoding, then rewind the file and wrap it in a TextIOWrapper for that encoding. Presumably this logic is common--perhaps there's already a routine that does just that.
2. In a try/except block, read the entire decoded content into another iterable, and then have the logic below rely on that content, e.g. `f = list(f)`.
3. Always assume UTF-8 instead of the system encoding. This change would be backward incompatible, so probably isn't acceptable without at least an interim release with a deprecation warning.
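As a rough illustration of option (2), here is a minimal sketch that tries each candidate encoding in turn and keeps the fully decoded lines of the first one that works. The helper name `read_pth_lines`, its signature, and the choice to re-raise the last decoding error are assumptions made for this example, not code from site.py.

```python
import locale


def read_pth_lines(path, encodings=("utf-8", locale.getpreferredencoding(False))):
    """Return every line of the file at path as text.

    Hypothetical helper: each candidate encoding is tried in order, and
    the whole file is decoded eagerly so that a decoding error surfaces
    here rather than halfway through later line-by-line processing.
    """
    if not encodings:
        raise ValueError("at least one candidate encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                # list(f) forces the entire file to be read and decoded now.
                return list(f)
        except UnicodeDecodeError as exc:
            last_error = exc
    # Assumption: when no candidate works, surface the last decoding error.
    raise last_error
```

The cost of this approach is that the whole file is held in memory as a list, which is presumably acceptable for small files such as .pth files.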

I recommend (1) now, followed by (3) in a future release. That is:

import io
import locale
import warnings


def determine_best_encoding(
        f, encodings=('utf-8', locale.getpreferredencoding(False))):
    """
    Attempt to read and decode all of stream f using the encodings
    and return the first one that succeeds. Rewinds the file.
    """


f = open(..., 'rb')
encoding = determine_best_encoding(f)
if encoding != 'utf-8':
    warnings.warn("Detected pth file with unsupported encoding", DeprecationWarning)
f = io.TextIOWrapper(f, encoding)
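
The `determine_best_encoding` stub above has only a docstring. One way its body might be filled in is sketched below; this is an assumption-laden illustration, not a tested patch. In particular, using `locale.getpreferredencoding(False)` as the fallback candidate, raising `ValueError` when no candidate succeeds, and the `open_text` wrapper are all choices made for this example.

```python
import io
import locale
import warnings


def determine_best_encoding(f, encodings=("utf-8", locale.getpreferredencoding(False))):
    """
    Attempt to read and decode all of binary stream f using the encodings
    and return the first one that succeeds. Rewinds the file.
    """
    data = f.read()
    f.seek(0)  # rewind so the caller can wrap and re-read the stream
    for encoding in encodings:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    # Assumption: the original message does not say what should happen here.
    raise ValueError("stream could not be decoded with any of %r" % (encodings,))


def open_text(f, encodings=("utf-8", locale.getpreferredencoding(False))):
    """Hypothetical wrapper: pick an encoding, warn if it is not UTF-8."""
    encoding = determine_best_encoding(f, encodings)
    if encoding != "utf-8":
        warnings.warn("Detected pth file with unsupported encoding",
                      DeprecationWarning)
    return io.TextIOWrapper(f, encoding)
```

Because the candidates are tried in order, UTF-8 wins whenever the content is valid UTF-8, and the fallback is consulted only when it is not.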


Then, in a future version that drops support for local encodings, all of that code can be replaced with `f = open(..., encoding='utf-8')`.