
Author vstinner
Recipients pitrou, vstinner
Date 2010-01-07.23:18:47
Message-id <1262906330.66.0.125497764461.issue7651@psf.upfronthosting.co.za>
Content
open_bom.patch is a proof of concept; it only works in read mode. The idea is to delay choosing the encoding and creating the decoder until just after the first read_chunk() call, when the first bytes of the file are available.
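The deferred-decoder idea can be illustrated outside of `_pyio` with a small standalone reader. This is only a sketch of the idea, not the code in open_bom.patch: `BOMTextReader`, its parameters and the BOM table are hypothetical names chosen for illustration. It assumes a binary file object and, when no BOM is found, falls back to a caller-supplied default encoding (UTF-8 here, whereas unpatched `open()` would use the locale encoding). For brevity, `read()` reads the whole stream.

```python
import codecs
import io

# BOMs to check. The UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE
# BOM (FF FE), so the UTF-32 BOMs must be tested before the UTF-16 ones.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


class BOMTextReader:
    """Hypothetical read-only text reader with a deferred decoder.

    Neither the encoding nor the incremental decoder is chosen in
    __init__; both are set up just after the first chunk has been read.
    """

    def __init__(self, raw, default_encoding="utf-8", chunk_size=8192):
        self._raw = raw                    # binary file-like object
        self._default = default_encoding   # assumption: used when no BOM
        self._chunk_size = chunk_size
        self._decoder = None               # created lazily
        self.encoding = None               # unknown until the first read

    def _read_first_chunk(self):
        chunk = self._raw.read(self._chunk_size)
        for bom, name in _BOMS:
            if chunk.startswith(bom):
                self.encoding = name
                chunk = chunk[len(bom):]   # the BOM selects the codec, then is skipped
                break
        else:
            self.encoding = self._default
        self._decoder = codecs.getincrementaldecoder(self.encoding)()
        return chunk

    def read(self):
        """Read and decode everything that is left in the stream."""
        parts = [self._read_first_chunk()] if self._decoder is None else []
        parts.append(self._raw.read())
        return self._decoder.decode(b"".join(parts), final=True)


# Usage: the same content as test.txt, kept in memory.
reader = BOMTextReader(io.BytesIO("abc\nd\xe9f\n".encode("utf-8-sig")))
print(reader.encoding)   # None: nothing read yet, so no decoder yet
text = reader.read()
print(reader.encoding)   # utf-8
print(repr(text))        # 'abc\ndéf\n'
```

Checking the longest BOMs first is the usual way to resolve the UTF-16/UTF-32 prefix overlap; a UTF-16-LE file whose first character is U+0000 remains ambiguous with this approach.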

The patch changes the default behaviour of open(): if the file starts with a BOM, the BOM is used to select the encoding and is then skipped. Example:
-------------
from _pyio import open

# Write a file that starts with a UTF-8 BOM.
with open('test.txt', 'w', encoding='utf-8-sig') as fp:
    print("abc", file=fp)
    print("d\xe9f", file=fp)

# Read it back without passing an encoding.
with open('test.txt', 'r') as fp:
    print("open().read(): {!r}".format(fp.read()))
-------------

Unpatched Python displays '\ufeffabc\ndéf\n' (assuming the locale encoding is UTF-8), whereas patched Python displays 'abc\ndéf\n'.
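The two outputs differ exactly as the `utf-8` and `utf-8-sig` codecs do, which can be checked on an unpatched interpreter with in-memory data instead of test.txt. This sketch uses only `io.TextIOWrapper` over `io.BytesIO`; treating the plain `utf-8` case as the "unpatched" result assumes the locale encoding is UTF-8.

```python
import io

# Same bytes the example writes to test.txt: a UTF-8 BOM, then the text.
data = "abc\nd\xe9f\n".encode("utf-8-sig")
assert data.startswith(b"\xef\xbb\xbf")

# Plain UTF-8 keeps the BOM as U+FEFF (the unpatched result on a UTF-8 locale).
with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8") as fp:
    kept = fp.read()

# utf-8-sig drops a leading BOM (the result the patch produces by default).
with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig") as fp:
    stripped = fp.read()

print(repr(kept))       # '\ufeffabc\ndéf\n'
print(repr(stripped))   # 'abc\ndéf\n'
```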
History
Date                 User      Action  Args
2010-01-07 23:18:52  vstinner  set     recipients: + vstinner, pitrou
2010-01-07 23:18:50  vstinner  set     messageid: <1262906330.66.0.125497764461.issue7651@psf.upfronthosting.co.za>
2010-01-07 23:18:49  vstinner  link    issue7651 messages
2010-01-07 23:18:48  vstinner  create