
Author lemburg
Recipients EnigmaCurry, georg.brandl, lemburg
Date 2009-08-27.09:36:43
Message-id <4A9653AA.70700@egenix.com>
In-reply-to <1251346629.44.0.734409123127.issue6788@psf.upfronthosting.co.za>
Content
Ryan McGuire wrote:
> 
> New submission from Ryan McGuire <python.org@enigmacurry.com>:
> 
> Opening a UTF-8 encoded file with unix newlines ("\n") on Win32:
> 
> codecs.open("whatever.txt","r","utf-8").read()
> 
> replaces the newlines ("\n") with CR+LF ("\r\n").
> 
> The docs specifically say that:
> 
> "Files are always opened in binary mode, even if no binary mode was
> specified. This is done to avoid data loss due to encodings using 8-bit
> values. This means that no automatic conversion of '\n' is done on
> reading and writing."
> 
> And yet, opening the file with an explicit binary mode resolves the
> situation:
> 
> codecs.open("whatever.txt","rb","utf-8").read()
> 
> This reads the file with the original newlines unmodified.
> 
> The implementation of codecs.open and the documentation are out of sync.

The implementation looks like this:

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'

in both Python 2 and 3, so a mode of "r" should already be turned into
"rb" before the file is opened. I'm not sure what could be causing the
newline conversion.
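
One way to check the reported behavior on a given interpreter is to write a file with bare "\n" newlines in binary mode and read it back through codecs.open() with both "r" and "rb". The script below is only a sketch: the helper names read_with_codecs and check_newlines and the sample text are made up for illustration and are not part of the original report. If the mode is forced to binary as in the snippet above, both reads should return the same string.

```python
import codecs
import os
import tempfile
import warnings


def read_with_codecs(path, mode):
    """Read a UTF-8 file through codecs.open() using the given mode."""
    with warnings.catch_warnings():
        # Newer Python versions may emit a DeprecationWarning for
        # codecs.open(); it is silenced here so the check stays quiet.
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, mode, "utf-8") as f:
            return f.read()


def check_newlines():
    """Write LF-only data in binary mode, then read it with 'r' and 'rb'."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Binary write, so the bytes on disk contain "\n" only.
        with os.fdopen(fd, "wb") as f:
            f.write(b"line one\nline two\n")
        return read_with_codecs(path, "r"), read_with_codecs(path, "rb")
    finally:
        os.remove(path)


if __name__ == "__main__":
    text_r, text_rb = check_newlines()
    print(repr(text_r))
    print(repr(text_rb))
    print("identical:", text_r == text_rb)
```

If the two results differ on the affected system, comparing the repr() output should show whether "\r\n" appears only in the "r" case.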
History
Date                 User     Action  Args
2009-08-27 09:36:45  lemburg  set     recipients: + lemburg, georg.brandl, EnigmaCurry
2009-08-27 09:36:43  lemburg  link    issue6788 messages
2009-08-27 09:36:43  lemburg  create