
Author steve.dower
Recipients RohanA, ezio.melotti, jayman, methane, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-06-26.00:25:16
Message-id <1624667116.36.0.00619346002306.issue44510@roundup.psfhosted.org>
Content
The file that fails starts with a UTF-8 BOM, the three-byte sequence EF BB BF, which indicates that the file is definitely UTF-8.

Unfortunately, none of Python's default settings will handle this, because it's a convention that only really exists on Windows.

On Windows we currently still default to your locale encoding (the active code page) when opening files, since that is what we have always done and changing the default is very complex. Apparently that encoding does not include the character represented by the first byte of the BOM. In any case, it's not a character you'd ever want to see, so if it _had_ worked, you'd just have garbage at the start of your read data.
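To illustrate the "garbage" case, here is a small sketch that decodes the BOM bytes by hand. The choice of cp1252 is an assumption made purely for illustration (it happens to map all three bytes); it is not necessarily the encoding the reporter's system was using.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom_bytes = codecs.BOM_UTF8

# Assumption for illustration: a legacy code page (cp1252) that maps all
# three bytes. The decode "works", but yields three junk characters.
as_cp1252 = bom_bytes.decode("cp1252")

# Plain UTF-8 decodes the BOM to the single character U+FEFF.
as_utf8 = bom_bytes.decode("utf-8")

# 'utf-8-sig' recognizes the BOM and drops it.
as_utf8_sig = bom_bytes.decode("utf-8-sig")
```

A code page that leaves one of those bytes unmapped would raise UnicodeDecodeError instead of producing junk.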

The immediate fix for your scenario is to use "open(filename, 'r', encoding='utf-8-sig')", which decodes the file as UTF-8 and drops the BOM if one is present.
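A minimal sketch of that fix, using a throwaway sample file. The helper name read_text_dropping_bom and the temporary file are hypothetical and exist only for the demonstration.

```python
import codecs
import os
import tempfile


def read_text_dropping_bom(path):
    # 'utf-8-sig' decodes UTF-8 and skips a leading BOM if one is present;
    # files without a BOM are read unchanged.
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()


# Build a small sample file that starts with the UTF-8 BOM.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + "hello\n".encode("utf-8"))

# Plain 'utf-8' keeps the BOM in the data as the character U+FEFF.
with open(sample_path, "r", encoding="utf-8") as f:
    with_bom = f.read()

without_bom = read_text_dropping_bom(sample_path)
os.remove(sample_path)
```

Because 'utf-8-sig' also accepts files that have no BOM, it is safe to use for any input that is expected to be UTF-8.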

For the core team, I still think it's worth having the default encoding be able to read and drop the UTF-8 BOM from the start of a file. Since we shouldn't do it for any arbitrary operation (which may not be at the start of a file), it'd have to be a special default object for the TextIOWrapper case, but it would have solved this issue. If the BOM is there, it can switch to UTF-8 (or UTF-16, if that BOM exists); if not, it can use whatever the default would have been (based on all the other available settings).
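The idea can be approximated in user code today. The following is only a sketch of the proposal, not how TextIOWrapper behaves or would be implemented: the helper name open_text_sniffing_bom and its fallback_encoding parameter are hypothetical, and UTF-32 BOMs are deliberately ignored.

```python
import codecs
import locale
import os
import tempfile


def open_text_sniffing_bom(path, fallback_encoding=None):
    """Hypothetical helper: open `path` for reading, honoring a leading BOM."""
    if fallback_encoding is None:
        # Whatever the default would have been without a BOM.
        fallback_encoding = locale.getpreferredencoding(False)
    with open(path, "rb") as raw:
        head = raw.read(3)
    if head.startswith(codecs.BOM_UTF8):
        encoding = "utf-8-sig"   # decodes UTF-8 and drops the BOM
    elif head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding = "utf-16"      # uses the BOM to pick byte order, then drops it
    else:
        encoding = fallback_encoding
    return open(path, "r", encoding=encoding)


def read_sample(data, fallback_encoding):
    # Write `data` to a temporary file and read it back through the helper.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open_text_sniffing_bom(path, fallback_encoding) as f:
            return f.read()
    finally:
        os.remove(path)


utf8_text = read_sample(codecs.BOM_UTF8 + "héllo".encode("utf-8"), "latin-1")
utf16_text = read_sample(codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le"), "latin-1")
plain_text = read_sample("héllo".encode("latin-1"), "latin-1")
```

The sniffing happens only once, at the start of the file, which matches the constraint that the BOM should not be dropped by arbitrary decode operations elsewhere in a stream.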