Author EnigmaCurry
Recipients EnigmaCurry, georg.brandl
Date 2009-08-27.04:17:07
Message-id <1251346629.44.0.734409123127.issue6788@psf.upfronthosting.co.za>
Content
Opening a UTF-8 encoded file with Unix newlines ("\n") on Win32 using:

codecs.open("whatever.txt", "r", "utf-8").read()

replaces the newlines ("\n") with CR+LF ("\r\n").

The docs specifically say:

"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of '\n' is done on
reading and writing."

And yet, opening the file with an explicit binary mode resolves the
situation:

codecs.open("whatever.txt", "rb", "utf-8").read()

This reads the file with the original newlines unmodified.

The implementation of codecs.open and the documentation are out of sync.
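
A minimal sketch for checking the behaviour described above. The helper names (write_unix_sample, read_with_codecs_binary, read_with_io_open, demo) and the sample text are hypothetical and not part of the original report. The sketch writes a file with "\n" line endings in binary mode, then reads it back with the explicit "rb" workaround and, as an alternative not mentioned in the report, with io.open and newline="", which should also leave line endings untranslated.

```python
import codecs
import io
import os
import tempfile


def write_unix_sample(path):
    # Write in binary mode so the bytes on disk use "\n" only,
    # regardless of platform.
    with open(path, "wb") as f:
        f.write(u"first line\nsecond line\n".encode("utf-8"))


def read_with_codecs_binary(path):
    # Workaround from the report: request binary mode explicitly.
    with codecs.open(path, "rb", "utf-8") as f:
        return f.read()


def read_with_io_open(path):
    # Alternative (an assumption, not from the report): newline=""
    # tells io.open not to translate line endings on read.
    with io.open(path, "r", encoding="utf-8", newline="") as f:
        return f.read()


def demo():
    # Create a temporary file, read it back both ways, then clean up.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_unix_sample(path)
        return read_with_codecs_binary(path), read_with_io_open(path)
    finally:
        os.remove(path)
```

If either string returned by demo() contains "\r", newline translation is still taking place on that code path.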
History
Date                 User         Action  Args
2009-08-27 04:17:09  EnigmaCurry  set     recipients: + EnigmaCurry, georg.brandl
2009-08-27 04:17:09  EnigmaCurry  set     messageid: <1251346629.44.0.734409123127.issue6788@psf.upfronthosting.co.za>
2009-08-27 04:17:07  EnigmaCurry  link    issue6788 messages
2009-08-27 04:17:07  EnigmaCurry  create