
Author lemburg
Recipients benjamin.peterson, ezio.melotti, lemburg, loewis, pitrou, vstinner, ysj.ray
Date 2010-04-19.08:45:19
Message-id <4BCC185D.3070500@egenix.com>
In-reply-to <1271614817.51.0.532587928673.issue8438@psf.upfronthosting.co.za>
Content
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> I think it would be best to backport the handler (even though 
>> it is not needed in Python 2.7), since it makes porting apps 
>> to 3.x easier.
> 
> surrogateescape should not be used directly by applications. It's used by Python3 internals using unicode by default.
> 
> I don't know if it would help porting applications from Python2 to Python3. I don't know a use case of surrogateescape in Python2. By default, Python2 uses byte strings everywhere, especially for filenames, and so it doesn't need any unicode error handler.
> 
> Another point to consider is that utf8 encoder rejects surrogates in Python3, whereas surrogates are accepted by the Python2 utf8 encoder.

Sorry, I think I need to correct myself: I mixed up the handlers
surrogateescape and surrogatepass. I was actually thinking of the
surrogatepass handler, which makes the Python3 UTF-8 codec behave like
the Python2 UTF-8 codec (without an extra handler), not the
surrogateescape handler, which implements the UTF-8b logic of escaping
undecodable bytes to lone surrogates.
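
To make the difference between the two handlers concrete, here is a small Python 3 sketch (U+D800 and U+DC91 are arbitrary example code points). It shows the strict UTF-8 codec refusing a lone surrogate, surrogatepass encoding it as the three-byte sequence that Python 2's UTF-8 codec produces by default, and surrogateescape doing something else entirely: mapping U+DC80..U+DCFF back to single raw bytes.

```python
# Python 3 sketch: how the UTF-8 codec treats lone surrogates under
# different error handlers. U+D800 is an arbitrary example value.
lone = '\ud800'

# The default 'strict' handler refuses to encode a lone surrogate.
try:
    lone.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' emits the 3-byte sequence that Python 2's UTF-8 codec
# produces without any extra handler, and accepts it again when decoding.
passed = lone.encode('utf-8', 'surrogatepass')
roundtrip = passed.decode('utf-8', 'surrogatepass')

# 'surrogateescape' is a different mechanism: it maps U+DC80..U+DCFF back
# to the single raw byte each one stands for.
escaped = '\udc91'.encode('utf-8', 'surrogateescape')

print(strict_ok, passed, roundtrip == lone, escaped)
# False b'\xed\xa0\x80' True b'\x91'
```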

* The surrogatepass handler is needed in Python 2.7 to make it
possible to write applications that work in both 2.7 and 3.x
without changing the code.

I consider this an important missing backport for 2.7, since
without this handler, the UTF-8 codecs in 2.7 and 3.x are
incompatible and the only workaround is to make the choice of
error handler conditional on the Python version.

As such, it's a bug rather than a new feature.
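
The version-conditional workaround described above might look roughly like this. It is a sketch only: `utf8_encode`, `utf8_decode` and `UTF8_ERRORS` are hypothetical names, and on 2.x it assumes `text` is a unicode object and relies on the Python 2 UTF-8 codec passing surrogates through in strict mode, as described in this thread.

```python
import sys

# Hypothetical helpers: pick the error handler per Python version so that
# lone surrogates round-trip through UTF-8 the same way on 2.x and 3.x.
if sys.version_info[0] >= 3:
    # Python 3's UTF-8 codec rejects surrogates unless told otherwise.
    UTF8_ERRORS = 'surrogatepass'
else:
    # Python 2's UTF-8 codec already passes surrogates through.
    UTF8_ERRORS = 'strict'


def utf8_encode(text):
    """Encode text (unicode on 2.x) to UTF-8, letting lone surrogates through."""
    return text.encode('utf-8', UTF8_ERRORS)


def utf8_decode(data):
    """Decode UTF-8 bytes, accepting encoded lone surrogates."""
    return data.decode('utf-8', UTF8_ERRORS)
```

This is exactly the kind of version check that a backported surrogatepass handler would make unnecessary: code could simply pass 'surrogatepass' on both versions.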

* The surrogateescape handler implements the UTF-8b escaping
logic:

b'\x91\x92' -> '\udc91\udc92'

In Python 3.x this is needed to work around problems with
wrong I/O encoding settings or situations where you have mixed
encoding settings used in external resources such as environment
variable content, filesystems using different encodings than
the system one, remote shell output, pipes which don't carry
any encoding information, etc. etc.
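
A minimal Python 3 sketch of that situation: the input below is a made-up mix of UTF-8 and Latin-1 bytes, standing in for what a pipe or environment variable might carry. With surrogateescape, the undecodable byte becomes a lone surrogate instead of raising an error, the text can be processed as a normal str, and encoding with the same handler restores the escaped bytes exactly.

```python
# Made-up mixed input: 'café' encoded as UTF-8, then 'café' as Latin-1.
raw = b'caf\xc3\xa9 / caf\xe9'

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode('utf-8', 'surrogateescape')
print(ascii(text))  # 'caf\xe9 / caf\udce9'

# Encoding with the same handler gives back the original bytes unchanged.
assert text.encode('utf-8', 'surrogateescape') == raw

# String processing leaves the escaped byte alone: only 'é' (valid UTF-8)
# is upper-cased, while the Latin-1 byte comes back as 0xE9.
upper = text.upper()
assert upper.encode('utf-8', 'surrogateescape') == b'CAF\xc3\x89 / CAF\xe9'

# The UTF-8b mapping itself: byte 0xNN (>= 0x80) <-> code point U+DCNN.
assert b'\x91\x92'.decode('utf-8', 'surrogateescape') == '\udc91\udc92'
```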

Backporting this handler would be useful for Python 2.7 as
well, since it allows preparing 2.7 applications for use in
3.x and again allows using the same code for 2.7 and 3.x.

Not having this handler in 2.7 is not as serious as lacking the
surrogatepass handler, but it would still be useful for applications
that are meant to run unchanged in 2.7 and 3.x.
History
Date                 User     Action  Args
2010-04-19 08:45:22  lemburg  set     recipients: + lemburg, loewis, pitrou, vstinner, benjamin.peterson, ezio.melotti, ysj.ray
2010-04-19 08:45:20  lemburg  link    issue8438 messages
2010-04-19 08:45:19  lemburg  create