classification
Title: codecs.open on Win32 does not force binary mode
Type: behavior
Components: Documentation, Library (Lib)
Versions: Python 3.1, Python 2.6
process
Status: closed
Resolution: not a bug
Assigned To: georg.brandl
Nosy List: EnigmaCurry, amaury.forgeotdarc, georg.brandl, lemburg
Priority: normal

Created on 2009-08-27 04:17 by EnigmaCurry, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
codecs_bug.py: Doctests for codecs.open (uploaded by EnigmaCurry, 2009-08-27 13:29)
Messages (4)
msg91995 - Author: Ryan McGuire (EnigmaCurry) Date: 2009-08-27 04:17
Opening a UTF-8 encoded file with unix newlines ("\n") on Win32:

codecs.open("whatever.txt","r","utf-8").read()

replaces the newlines ("\n") with CR+LF ("\r\n").

The docs specifically say:

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of '\n' is done on
reading and writing."

And yet, opening the file with an explicit binary mode resolves the
situation:

codecs.open("whatever.txt","rb","utf-8").read()

This reads the file with the original newlines unmodified.

The implementation of codecs.open and the documentation are out of sync.
msg91999 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2009-08-27 09:36
The implementation looks like this:

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

in both Python 2 and 3, so I'm not sure what could be causing
this.
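
One way to check that the forced binary mode actually takes effect is to look at the mode of the file object that codecs.open wraps. This is only a sketch, assuming Python 3, where the returned StreamReaderWriter exposes the wrapped file as .stream; the scratch file and the helper name underlying_mode are made up for illustration:

```python
import codecs
import os
import tempfile
import warnings


def underlying_mode(path, mode):
    """Return the mode of the raw file object that codecs.open wraps."""
    with warnings.catch_warnings():
        # codecs.open is deprecated in recent Python versions; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        f = codecs.open(path, mode, "utf-8")
    try:
        return f.stream.mode
    finally:
        f.close()


# Hypothetical scratch file, created only for this check.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    mode_r = underlying_mode(path, "r")
    mode_rb = underlying_mode(path, "rb")
finally:
    os.remove(path)

print(mode_r, mode_rb)  # rb rb
```

If both values come back as "rb", the "r" and "rb" calls open the file identically, and any difference in newlines has to come from the bytes on disk.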
msg92001 - Author: Ryan McGuire (EnigmaCurry) Date: 2009-08-27 13:29
Uploading a doctest for this.

The tests pass on Linux with Python 2.6.
They fail on Win32 with Python 2.6.
msg92101 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2009-08-31 06:33
I think your test is invalid: it creates the file in "w" mode (text mode), so on Win32
each \n is written to disk as the two bytes \r\n.
codecs.open just reads them back unchanged.
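
The effect can be reproduced on any platform with a rough sketch like the following. It assumes Python 3 and uses newline="\r\n" to imitate what text mode "w" does on Win32; the file names and the helper read_with_codecs are hypothetical and not taken from the attached codecs_bug.py:

```python
import codecs
import os
import shutil
import tempfile
import warnings


def read_with_codecs(path):
    """Read a file through codecs.open in plain "r" mode."""
    with warnings.catch_warnings():
        # codecs.open is deprecated in recent Python versions; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "r", "utf-8") as f:
            return f.read()


tmpdir = tempfile.mkdtemp()
try:
    text_path = os.path.join(tmpdir, "text_mode.txt")
    binary_path = os.path.join(tmpdir, "binary_mode.txt")

    # Text mode with newline="\r\n" translates "\n" on write,
    # as "w" mode does by default on Win32.
    with open(text_path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write("one\ntwo\n")

    # Binary mode writes exactly the bytes it is given.
    with open(binary_path, "wb") as f:
        f.write(b"one\ntwo\n")

    from_text = read_with_codecs(text_path)
    from_binary = read_with_codecs(binary_path)
finally:
    shutil.rmtree(tmpdir)

print(repr(from_text))    # 'one\r\ntwo\r\n'
print(repr(from_binary))  # 'one\ntwo\n'
```

The "\r\n" sequences are already in the file before codecs.open touches it, so a test that wants unix newlines on disk should create the file in "wb" mode.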
History
Date                 User                Action  Args
2022-04-11 14:56:52  admin               set     github: 51037
2010-04-29 18:00:18  terry.reedy         set     status: pending -> closed
2009-08-31 06:33:35  amaury.forgeotdarc  set     status: open -> pending; nosy: + amaury.forgeotdarc; messages: + msg92101; resolution: not a bug
2009-08-27 13:29:29  EnigmaCurry         set     files: + codecs_bug.py; messages: + msg92001
2009-08-27 09:36:43  lemburg             set     nosy: + lemburg; messages: + msg91999
2009-08-27 04:17:07  EnigmaCurry         create