Message 224632 - Python tracker



Author	Frank.van.Dijk
Recipients	Frank.van.Dijk, docs@python
Date	2014-08-03.13:17:17
Message-id	<1407071838.74.0.684483503953.issue22128@psf.upfronthosting.co.za>

Content

stackoverflow.com has a zillion answers recommending codecs.open() as a Unicode-capable drop-in replacement for open(). This probably means that a lot of code is still being written that uses codecs.open(). That's a bad thing, because codecs.open() does no newline conversion. A lot of that code will:
- have compatibility issues when it is moved between Unix and Windows
- silently break text files on Windows, leading to issues further downstream (confusing other tools, messing up revision control histories)
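A minimal sketch of the write-side difference, assuming Python 3 (where the built-in open() is io.open()). The example passes `newline="\r\n"` explicitly to mimic what text mode does by default on Windows, so the result is the same on any platform. The helper names are illustrative only, and newer Python releases may emit a DeprecationWarning for codecs.open(), which the sketch silences.

```python
import codecs
import os
import tempfile
import warnings


def write_with_codecs(path, text):
    # codecs.open() opens the underlying file in binary mode,
    # so "\n" is encoded and written as-is, with no translation.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "w", encoding="utf-8") as f:
            f.write(text)


def write_with_open(path, text, newline):
    # open() in text mode translates each "\n" into `newline` on write.
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        f.write(text)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


text = "line one\nline two\n"
with tempfile.TemporaryDirectory() as d:
    p1 = os.path.join(d, "codecs.txt")
    p2 = os.path.join(d, "open.txt")
    write_with_codecs(p1, text)
    write_with_open(p2, text, newline="\r\n")  # simulate Windows line endings
    codecs_bytes = read_bytes(p1)
    open_bytes = read_bytes(p2)

print(codecs_bytes)  # b'line one\nline two\n'
print(open_bytes)    # b'line one\r\nline two\r\n'
```

On Windows, code that relies on codecs.open() therefore produces LF-only files, while the rest of the platform's tools expect CRLF.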

The problem has been fixed with io.open() in 2.x (available since 2.6) and open() in 3.x. Unfortunately, the 2.7 Unicode HOWTO still recommends codecs.open(), and neither the 2.7 nor the 3.x documentation of codecs.open() refers the reader to the better alternatives.
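The read side, sketched under the same assumption (Python 3): a UTF-8 file with Windows line endings keeps its "\r\n" pairs when decoded through codecs.open(), whereas io.open() applies universal-newline translation by default (newline=None). The file name is hypothetical.

```python
import codecs
import io
import os
import tempfile
import warnings

# Hypothetical file contents, as a Windows editor would have saved them.
raw = "caf\u00e9\r\nna\u00efve\r\n".encode("utf-8")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "windows.txt")
    with open(path, "wb") as f:
        f.write(raw)

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "r", encoding="utf-8") as f:
            via_codecs = f.read()  # "\r\n" survives into the decoded text

    with io.open(path, "r", encoding="utf-8") as f:
        via_io = f.read()  # universal newlines: "\r\n" becomes "\n"

print(repr(via_codecs))
print(repr(via_io))
```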

The attached patches fix that.

The only downside I see is that newly written code that uses the better alternatives would be incompatible with 2.5 and older. However, croaking on a small minority of systems is better than silently disrupting workflows, causing platform incompatibilities, and inviting flaky workarounds.

The 2.7 patch makes the unicode HOWTO recommend io.open() instead of codecs.open(). Both patches change the codecs.open() documentation to refer to io.open() or (on 3.x) open().
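One detail worth checking when replacing codecs.open() calls with io.open(): codecs.open(filename, mode, encoding, errors, buffering) takes the encoding as its third positional argument, whereas io.open(file, mode, buffering, encoding, errors, newline, ...) expects buffering there. A minimal sketch, assuming Python 3, of why the encoding should be passed by keyword after migrating:

```python
import io
import os
import tempfile

text = u"r\u00e9sum\u00e9\n"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")

    # Before: codecs.open(path, "w", "utf-8")
    # Passing "utf-8" positionally to io.open() lands in `buffering`,
    # which must be an int, so this raises TypeError.
    try:
        io.open(path, "w", "utf-8")
        positional_failed = False
    except TypeError:
        positional_failed = True

    # After: same call, with the encoding given by keyword.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

print(positional_failed, repr(round_trip))
```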

Additionally, I removed the "data loss" explanation from codecs.open()'s note about its lack of newline conversion. It is not particularly helpful, and it is not entirely correct: data loss could also have been avoided by doing newline conversion before encoding and after decoding.
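To illustrate the last point: the concern behind the "data loss" note is that encodings such as UTF-16 can contain 0x0A bytes that are not newlines, so translating newlines on the encoded bytes corrupts the data. A minimal sketch (the helper names are hypothetical) comparing translation on the text, before encoding and after decoding, with naive translation of the raw bytes:

```python
def encode_with_crlf(text, encoding):
    # Translate newlines on the *text*, then encode.
    return text.replace("\n", "\r\n").encode(encoding)


def decode_with_crlf(data, encoding):
    # Decode first, then translate newlines back on the *text*.
    return data.decode(encoding).replace("\r\n", "\n")


def naive_byte_translation(text, encoding):
    # Translate raw 0x0A bytes after encoding: this is what loses data.
    data = text.encode(encoding).replace(b"\n", b"\r\n")
    return data.decode(encoding).replace("\r\n", "\n")


# U+010A is encoded as b"\x0a\x01" in UTF-16-LE, i.e. it contains a 0x0A byte.
text = "\u010a\n"
safe = decode_with_crlf(encode_with_crlf(text, "utf-16-le"), "utf-16-le")
unsafe = naive_byte_translation(text, "utf-16-le")

print(safe == text)    # True
print(unsafe == text)  # False
```

Translating on text is essentially what a text-mode io.open() stream does, which is why it can offer newline conversion for any encoding.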

History
Date	User	Action	Args
2014-08-03 13:17:19	Frank.van.Dijk	set	recipients: + Frank.van.Dijk, docs@python
2014-08-03 13:17:18	Frank.van.Dijk	set	messageid: <1407071838.74.0.684483503953.issue22128@psf.upfronthosting.co.za>
2014-08-03 13:17:18	Frank.van.Dijk	link	issue22128 messages
2014-08-03 13:17:18	Frank.van.Dijk	create