Message224632
stackoverflow.com has a zillion answers recommending the use of codecs.open() as a Unicode-capable drop-in replacement for open(). This probably means that there is still a lot of code being written that uses codecs.open(). That's a bad thing because codecs.open() does no newline conversion. A lot of that code will
- have compatibility issues when it is moved between Unix and Windows
- silently break text files on Windows, leading to issues further downstream (confusing other tools, messing up revision control histories)
The problem has been fixed with io.open() in 2.x and open() in 3.x. Unfortunately the 2.7 Unicode HOWTO still recommends the use of codecs.open(). Neither the 2.7 nor the 3.x documentation of codecs.open() refers the reader to better alternatives.
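To illustrate the difference on the writing side, here is a minimal sketch. The file names and sample text are made up for the example, and newline="\r\n" is passed explicitly to io.open() so that the Windows behaviour can be reproduced on any platform (by default io.open() translates "\n" to os.linesep).

```python
import codecs
import io
import os
import tempfile

text = u"first line\nsecond line\n"
tmpdir = tempfile.mkdtemp()

# codecs.open() always opens the underlying file in binary mode,
# so "\n" is written as-is on every platform.
codecs_path = os.path.join(tmpdir, "codecs.txt")  # hypothetical file name
with codecs.open(codecs_path, "w", encoding="utf-8") as f:
    f.write(text)

# io.open() translates "\n" on write. newline="\r\n" is an assumption
# made here to mimic what the default does on Windows.
io_path = os.path.join(tmpdir, "io.txt")  # hypothetical file name
with io.open(io_path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write(text)

# Read both files back as raw bytes to see what actually hit the disk.
with open(codecs_path, "rb") as f:
    codecs_bytes = f.read()
with open(io_path, "rb") as f:
    io_bytes = f.read()

print(repr(codecs_bytes))
print(repr(io_bytes))
```

The file written through codecs.open() keeps bare "\n" line endings, which is what trips up other tools on Windows.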
The attached patches fix that.
The only downside I see is that newly written code that uses the better alternatives would be incompatible with 2.5 and older. However, croaking on a small minority of systems is better than silently disrupting workflows, causing platform incompatibilities, and inviting flaky workarounds.
The 2.7 patch makes the Unicode HOWTO recommend io.open() instead of codecs.open(). Both patches change the codecs.open() documentation to refer to io.open() or (on 3.x) open().
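For reading, the replacement is close to mechanical. The sketch below is only an illustration, not text from the patches: the file name and its contents are invented, and the file is created with Windows line endings on purpose so the difference shows on any platform.

```python
import codecs
import io
import os
import tempfile

# Create a UTF-8 file with "\r\n" line endings, as a Windows editor would.
path = os.path.join(tempfile.mkdtemp(), "windows.txt")  # hypothetical name
with open(path, "wb") as f:
    f.write(u"caf\u00e9\r\nth\u00e9\r\n".encode("utf-8"))

# Old style: decoding works, but the "\r" stays in every line.
with codecs.open(path, "r", encoding="utf-8") as f:
    old_lines = f.readlines()

# Recommended style: same arguments, and universal newlines mode
# turns "\r\n" into "\n" while reading.
with io.open(path, "r", encoding="utf-8") as f:
    new_lines = f.readlines()

print(old_lines)
print(new_lines)
```

On 3.x the built-in open() can be used in place of io.open() with the same arguments.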
Additionally, I removed the "data loss" explanation from codecs.open()'s note about its lack of newline conversion. It is not particularly helpful information and it is not entirely correct (data loss could also have been avoided by doing newline conversion before encoding and after decoding).
Date | User | Action | Args
2014-08-03 13:17:19 | Frank.van.Dijk | set | recipients: + Frank.van.Dijk, docs@python
2014-08-03 13:17:18 | Frank.van.Dijk | set | messageid: <1407071838.74.0.684483503953.issue22128@psf.upfronthosting.co.za>
2014-08-03 13:17:18 | Frank.van.Dijk | link | issue22128 messages
2014-08-03 13:17:18 | Frank.van.Dijk | create |