Author ncoghlan
Recipients cvrebert, docs@python, eli.bendersky, eric.araujo, ezio.melotti, nadeem.vawda, ncoghlan, pitrou, vstinner
Date 2012-02-12.11:18:48
Message-id <1329045529.5.0.619347316857.issue13997@psf.upfronthosting.co.za>
In-reply-to
Content
If such use cases are indeed better handled as bytes, then that's what should be documented. However, some text processing assumptions no longer hold when using bytes instead of strings (such as "x[0:1] == x[0]"). You also can't safely pass such byte sequences to various other APIs (e.g. urllib.parse will happily process surrogate-escaped text without corrupting it, but will throw UnicodeDecodeError for byte sequences that aren't pure 7-bit ASCII).
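As a rough illustration of both points, here is a minimal sketch. The sample path and the variable names are made up for the example, and it assumes a current Python 3 where urlsplit decodes bytes arguments as strict ASCII.

```python
from urllib.parse import urlsplit

# Hypothetical input: a Latin-1 encoded path that is not pure ASCII.
raw = b"/caf\xe9/menu"

# Decode with surrogateescape so the undecodable byte survives as a lone surrogate.
text = raw.decode("ascii", errors="surrogateescape")

# For str, a one-character slice equals the indexed character.
str_slice_matches_index = text[0:1] == text[0]

# For bytes, the slice is a bytes object but the index is an int.
bytes_slice_matches_index = raw[0:1] == raw[0]

# urlsplit passes the surrogate-escaped text through untouched ...
path_from_text = urlsplit(text).path
# ... so the original bytes can be recovered afterwards.
round_trip = path_from_text.encode("ascii", errors="surrogateescape")

# The same data as bytes is rejected because it is not 7-bit ASCII.
try:
    urlsplit(raw)
    bytes_error = None
except UnicodeDecodeError as exc:
    bytes_error = exc
```

Here str_slice_matches_index ends up true and bytes_slice_matches_index false, and the bytes call fails where the text call succeeds.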

Using surrogateescape instead means that you're only going to have problems if you go to encode the data to an encoding other than the source one. That's basically the way things work in Python 2 with 8-bit strings.
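A small sketch of that behaviour, assuming the data was read as UTF-8 and later written with the default strict error handler to a different encoding (the sample bytes and names are invented for the example):

```python
# Hypothetical input: Latin-1 bytes that are invalid as UTF-8.
raw = b"caf\xe9"

# Decoding with surrogateescape maps the bad byte 0xE9 to the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding back to the source encoding with the same handler restores the bytes.
restored = text.encode("utf-8", errors="surrogateescape")

# Encoding to a different target with the strict handler fails on the surrogate.
try:
    text.encode("latin-1")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc
```

The round trip through the source encoding is lossless; the error only shows up at the point where the text is sent somewhere else.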
History
Date User Action Args
2012-02-12 11:18:49  ncoghlan  set  recipients: + ncoghlan, pitrou, vstinner, nadeem.vawda, ezio.melotti, eric.araujo, eli.bendersky, cvrebert, docs@python
2012-02-12 11:18:49  ncoghlan  set  messageid: <1329045529.5.0.619347316857.issue13997@psf.upfronthosting.co.za>
2012-02-12 11:18:48  ncoghlan  link  issue13997 messages
2012-02-12 11:18:48  ncoghlan  create