This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Unicode BOM left in loaded text
Type: behavior
Stage:
Components: IO, Unicode
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: loewis, onpon4, vstinner
Priority: normal
Keywords:

Created on 2011-01-25 21:25 by onpon4, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg127055 - Author: (onpon4) Date: 2011-01-25 21:25
This is for Python 2.7.1. It isn't an issue on 2.6.5 and I haven't tested it on 3.1.

Quite simply, the Unicode BOM (unichr(65279)) is included in the text loaded from a UTF-8 text file. This can cause issues in some cases, but is easily worked around by calling "s.strip(unichr(65279))" on the first line of loaded text.
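The behavior described can be reproduced at the codec level. This is a minimal sketch written for Python 3 (where `chr` plays the role of 2.x `unichr`). It assumes a file whose first three bytes are the UTF-8 signature `EF BB BF`, and the sample text is hypothetical.

```python
import codecs

# Hypothetical file contents: the UTF-8 signature (BOM) followed by one line of text.
raw = codecs.BOM_UTF8 + "first line\n".encode("utf-8")

# The plain "utf-8" codec decodes the signature into the character U+FEFF (65279).
text = raw.decode("utf-8")
print(repr(text[0]))  # '\ufeff'

# The reporter's workaround: strip U+FEFF from the first line.
# (strip also removes U+FEFF from the end; lstrip would only touch the start.)
first_line = text.strip(chr(65279))
print(repr(first_line))  # 'first line\n'
```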
msg127065 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-01-25 22:58
Can you please be more specific? What do you mean by "text loaded from a UTF-8 text file"? How specifically did you load it?
msg127073 - Author: (onpon4) Date: 2011-01-25 23:45
Like this:

import io
f = io.open(filename)  # filename: the UTF-8 file (its name and any other arguments were not given)
f.readline()
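A self-contained version of that reproduction, as a sketch: it writes a temporary file that starts with a BOM (the file and its contents are hypothetical) and reads it back with `io.open`. The message does not say which encoding was passed, so `encoding="utf-8"` is an assumption. The code targets Python 3, where `io.open` is an alias of the built-in `open`.

```python
import io
import os
import tempfile

# Create a temporary file whose first three bytes are the UTF-8 signature.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"\xef\xbb\xbfhello\nworld\n")

try:
    # Assumed encoding: plain "utf-8", which does not remove the signature.
    with io.open(path, encoding="utf-8") as f:
        first = f.readline()
finally:
    os.remove(path)

print(repr(first))  # '\ufeffhello\n'
```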
msg127075 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-01-26 00:23
Why are you saying this isn't an issue in 2.6.5? It behaves exactly the same as 2.7.1.

In any case, this is not a bug. Pass encoding="utf-8-sig" to io.open to have the signature stripped when the file is read.
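A minimal sketch of the suggested fix, again for Python 3 with a hypothetical temporary file and a hypothetical helper `first_line`: the same bytes read with `encoding="utf-8-sig"` come back without U+FEFF. The `utf-8-sig` codec also decodes input that has no signature, so it can be used when only some files carry a BOM.

```python
import io
import os
import tempfile

def first_line(path, encoding):
    """Hypothetical helper: return the first line of a text file decoded with `encoding`."""
    with io.open(path, encoding=encoding) as f:
        return f.readline()

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"\xef\xbb\xbfhello\n")  # signature + one line

try:
    plain = first_line(path, "utf-8")         # keeps U+FEFF
    stripped = first_line(path, "utf-8-sig")  # signature removed
finally:
    os.remove(path)

# Without a signature, "utf-8-sig" decodes exactly like "utf-8".
no_bom = b"hello\n".decode("utf-8-sig")

print(repr(plain), repr(stripped), repr(no_bom))
```

In the other direction, encoding with `utf-8-sig` prepends the signature, so it is only appropriate for writing when the consumer expects a BOM.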
History
Date                 User      Action  Args
2022-04-11 14:57:11  admin     set     github: 55219
2011-01-26 00:23:02  loewis    set     status: open -> closed
                                       messages: + msg127075
                                       resolution: not a bug
2011-01-25 23:45:25  onpon4    set     messages: + msg127073
2011-01-25 23:00:49  vstinner  set     nosy: + vstinner
2011-01-25 22:58:44  loewis    set     nosy: + loewis
                                       messages: + msg127065
2011-01-25 21:25:59  onpon4    create