Message 91995
Opening a UTF-8 encoded file with Unix newlines ("\n") on Win32 with:
codecs.open("whatever.txt", "r", "utf-8").read()
replaces the newlines ("\n") with CR+LF ("\r\n").
The documentation for codecs.open() states explicitly:
"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of '\n' is done on
reading and writing."
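To illustrate the quoted guarantee, here is a minimal sketch for current Python 3, not the 2009 interpreter this report was filed against. It writes a temporary file with mixed "\r\n" and "\n" endings, then reads it once with the built-in open() in text mode and once with codecs.open(). The helper name read_both_ways is hypothetical. If the documentation holds, only the built-in text-mode read rewrites the line endings.

```python
import codecs
import os
import tempfile
import warnings


def read_both_ways(raw: bytes) -> tuple[str, str]:
    """Write raw bytes to a temporary file, then read it back two ways."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Built-in open() in text mode with the default newline=None enables
        # universal newlines, so "\r\n" is translated to "\n" on read.
        with open(path, "r", encoding="utf-8") as f:
            text_mode = f.read()
        # codecs.open(): per the quoted docs, no newline conversion happens.
        # Newer Python releases deprecate codecs.open(), so silence that here.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, "r", "utf-8") as f:
                codecs_mode = f.read()
        return text_mode, codecs_mode
    finally:
        os.remove(path)


text_mode, codecs_mode = read_both_ways(b"one\r\ntwo\n")
print(repr(text_mode))    # 'one\ntwo\n'
print(repr(codecs_mode))  # 'one\r\ntwo\n'
```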
Yet opening the file with an explicit binary mode avoids the problem:
codecs.open("whatever.txt", "rb", "utf-8").read()
This reads the file with its original newlines unmodified.
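The comparison above can be sketched as a small reproduction script. This sketch assumes a current Python 3 interpreter. A temporary file stands in for whatever.txt, and the hypothetical helper codecs_read reads it through codecs.open() in both "r" and "rb" mode. Where the documented binary-mode guarantee holds, both reads return the file's "\n" newlines untouched. On the Win32 build described in this report, the "r" read would be expected to differ.

```python
import codecs
import os
import tempfile
import warnings


def codecs_read(path: str, mode: str) -> str:
    """Read a whole file through codecs.open() using the given mode."""
    # Newer Python releases deprecate codecs.open(); silence that warning.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, mode, "utf-8") as f:
            return f.read()


# Stand-in for "whatever.txt": UTF-8 text with Unix ("\n") newlines.
original = "caf\u00e9\nline two\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(original.encode("utf-8"))

try:
    via_r = codecs_read(path, "r")
    via_rb = codecs_read(path, "rb")
finally:
    os.remove(path)

# Where codecs.open() really forces binary mode, both reads match the file.
print("r  unchanged:", via_r == original)
print("rb unchanged:", via_rb == original)
```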
The implementation of codecs.open() and its documentation are out of sync.
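To check which side is out of sync, you can inspect the mode of the file object that codecs.open() actually opens. This sketch assumes CPython's codecs module, where the returned StreamReaderWriter keeps the wrapped file in a .stream attribute. That attribute is an implementation detail rather than a documented guarantee. The helper underlying_mode is hypothetical. If the implementation follows the docs, a requested "r" should be upgraded to "rb".

```python
import codecs
import os
import tempfile
import warnings


def underlying_mode(path: str, mode: str) -> str:
    """Return the mode of the file object that codecs.open() wraps."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, mode, "utf-8") as f:
            # Assumption: CPython's StreamReaderWriter stores the wrapped
            # file object in .stream (an implementation detail).
            return f.stream.mode


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    modes = {requested: underlying_mode(path, requested) for requested in ("r", "rb")}
finally:
    os.remove(path)

for requested, actual in modes.items():
    print(f"requested {requested!r} -> opened {actual!r}")
```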

Date | User | Action | Args
2009-08-27 04:17:09 | EnigmaCurry | set | recipients: + EnigmaCurry, georg.brandl
2009-08-27 04:17:09 | EnigmaCurry | set | messageid: <1251346629.44.0.734409123127.issue6788@psf.upfronthosting.co.za>
2009-08-27 04:17:07 | EnigmaCurry | link | issue6788 messages
2009-08-27 04:17:07 | EnigmaCurry | create |