This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: BOM incorrectly inserted before writing, after seeking in text file
Type: behavior Stage: resolved
Components: IO Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: MarkIngramUK, amaury.forgeotdarc, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2014-12-02 16:41 by MarkIngramUK, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
append-test.py MarkIngramUK, 2014-12-02 16:41 Test case
bom_seek_append.patch pitrou, 2014-12-07 01:11
Messages (7)
msg232015 - Author: Mark Ingram (MarkIngramUK) Date: 2014-12-02 16:41
If you open a text file for append but then perform any kind of seek before writing to it, the BOM gets written before your text. See the attached file for an example.

If you run the test and look at the output file, you'll notice the UTF-16 BOM is written before each number.

I'm running a 2014 iMac with Yosemite.
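The reported pattern can be reproduced with a small sketch. The attached append-test.py is not reproduced here, so the helper `append_numbers`, the file name `numbers.txt`, and the exact open/seek/write loop are assumptions. On interpreters that include the fix for this issue, the file should end up with a single BOM at the start; affected versions write one before every number.

```python
import codecs
import io
import os
import tempfile


def append_numbers(path, count):
    """Hypothetical helper: append "0".."count-1" to a UTF-16 text file,
    reopening it for append and seeking before every write."""
    for i in range(count):
        with open(path, "a", encoding="utf-16") as f:
            f.seek(0, io.SEEK_SET)  # seeking to the start resets the encoder, so a BOM is pending
            f.seek(0, io.SEEK_END)  # with the fix, a non-zero end position cancels the pending BOM
            f.write(str(i))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    append_numbers(path, 4)
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, encoding="utf-16") as f:
        text = f.read()

print(raw.count(codecs.BOM_UTF16))  # 1 on fixed versions, one per write on affected ones
print(text)
```

`codecs.BOM_UTF16` is the BOM in the platform's native byte order, which is what the `utf-16` codec emits when writing.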
msg232025 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2014-12-02 17:09
issue5006 was supposed to take care of this, but it has a flaw IMO:
This statement https://hg.python.org/cpython/file/0744ceb5c0ed/Lib/_pyio.py#l2003 is missing an "and whence!=2".
msg232091 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-03 20:52
This is a limitation more than a bug. When you seek to the start of the file, the encoder is reset because Python assumes you are going to write there. If you remove the call to `file.seek(0, io.SEEK_SET)`, things work fine.

@Amaury, whence can only be zero there:
https://hg.python.org/cpython/file/0744ceb5c0ed/Lib/_pyio.py#l1960
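The encoder reset Antoine describes can be observed directly on a standard `codecs` incremental encoder, the kind of object a text file keeps internally to encode what you write. This is a sketch under that assumption; it does not open any file.

```python
import codecs

# A fresh incremental UTF-16 encoder, assumed comparable to the one a text file holds.
encoder = codecs.getincrementalencoder("utf-16")()

first = encoder.encode("1")   # BOM + "1": a fresh encoder assumes the start of the stream
second = encoder.encode("2")  # no BOM: the encoder remembers it has already written one
encoder.reset()               # comparable to what seeking to the start does to a text file's encoder
third = encoder.encode("3")   # BOM again, wherever these bytes end up being written

print(first.startswith(codecs.BOM_UTF16))   # True
print(second.startswith(codecs.BOM_UTF16))  # False
print(third.startswith(codecs.BOM_UTF16))   # True
```

The encoder only knows whether it has emitted a BOM yet, not where its output lands in the file, which is why a reset followed by a write at the end of the file produces a stray BOM.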
msg232092 - Author: Mark Ingram (MarkIngramUK) Date: 2014-12-03 20:57
It's more than a limitation, because if I call `file.seek(0, io.SEEK_END)` then the encoder is still reset, and will still write the BOM, even at the end of the file.

This also means that it's impossible to seek in a text file that you want to append to. I've had to work around this by opening the file as binary, manually writing the BOM, and writing the strings as encoded bytes.
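The binary-mode workaround Mark describes might look like the following sketch. The helper name `append_utf16_binary` is hypothetical, and the little-endian byte order is an assumption, since the message does not say which one was used. Binary files have no text encoder, so seeking cannot cause a BOM to be emitted; the code writes one only when the file is empty.

```python
import codecs
import os
import tempfile


def append_utf16_binary(path, text):
    """Hypothetical helper: append text as UTF-16-LE bytes, writing a BOM only for an empty file."""
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_SET)        # free to seek: binary files have no encoder state to reset
        end = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if end == 0:
            f.write(codecs.BOM_UTF16_LE)
        f.write(text.encode("utf-16-le"))  # "utf-16-le" never emits a BOM by itself


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    for i in range(4):
        append_utf16_binary(path, str(i))
    with open(path, "rb") as f:
        raw = f.read()

print(raw.count(codecs.BOM_UTF16_LE))  # 1
print(raw.decode("utf-16"))            # 0123
```

Decoding with `utf-16` (rather than `utf-16-le`) reads the BOM to pick the byte order and strips it from the result.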
msg232263 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-07 01:11
Here is a patch.
msg240688 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-04-13 18:04
New changeset 946740824eaf by Antoine Pitrou in branch '3.4':
Issue #22982: Improve BOM handling when seeking to multiple positions of a writable text file.
https://hg.python.org/cpython/rev/946740824eaf

New changeset 3583e5191b96 by Antoine Pitrou in branch 'default':
Issue #22982: Improve BOM handling when seeking to multiple positions of a writable text file.
https://hg.python.org/cpython/rev/3583e5191b96
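The changeset message does not spell out the mechanism. One way to express the intended behavior, sketched here and not taken from the committed patch, is to derive the encoder's BOM state from the absolute byte position reached after any seek, including a seek to the end. The helper `reset_encoder_for_position` is hypothetical; `reset()` and `setstate()` are the standard incremental encoder methods.

```python
import codecs


def reset_encoder_for_position(encoder, position):
    """Hypothetical helper: set BOM state from the absolute byte position after a seek."""
    if position == 0:
        encoder.reset()      # start of stream: the next encode() emits a BOM
    else:
        encoder.setstate(0)  # for the utf-16 codec, state 0 means "BOM already written"


encoder = codecs.getincrementalencoder("utf-16")()

reset_encoder_for_position(encoder, 0)
at_start = encoder.encode("a")

reset_encoder_for_position(encoder, 42)  # e.g. the end of a non-empty file
at_end = encoder.encode("b")

print(at_start.startswith(codecs.BOM_UTF16))  # True
print(at_end.startswith(codecs.BOM_UTF16))    # False
```

Under this approach, a seek to the start followed by a seek to the end of a non-empty file leaves the encoder in the "BOM already written" state, so the next write appends plain UTF-16 data.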
msg240689 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-13 18:05
Fix is pushed. Thanks for the report!
History
Date User Action Args
2022-04-11 14:58:10 admin set github: 67171
2015-04-13 18:05:22 pitrou set status: open -> closed
    resolution: fixed
    messages: + msg240689
    stage: patch review -> resolved
2015-04-13 18:04:54 python-dev set nosy: + python-dev
    messages: + msg240688
2014-12-07 01:11:49 pitrou set files: + bom_seek_append.patch
    versions: + Python 3.5
    messages: + msg232263
    keywords: + patch
    stage: patch review
2014-12-03 20:57:15 MarkIngramUK set messages: + msg232092
2014-12-03 20:52:28 pitrou set nosy: + pitrou
    messages: + msg232091
2014-12-02 17:09:08 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg232025
2014-12-02 16:41:42 MarkIngramUK create