Author ncoghlan
Recipients ncoghlan
Date 2016-10-10.03:52:10
Message-id <1476071532.35.0.73217429855.issue28403@psf.upfronthosting.co.za>
Content
Some of the hardest compatibility issues to track down in Python 3 migrations are those where existing code depends on an implicit str->unicode promotion somewhere in the depths of a support library (or sometimes even in the standard library itself; the context where this came up involves some apparent misbehaviour in the standard library). In other cases, just being able to rule out implicit conversions as a possible contributing factor can help in finding the real problem.

It's technically already possible to hook implicit conversions by adjusting (or shadowing) the site.py module and replacing the default "ascii" encoding with one that emits a warning whenever you rely on it: http://washort.twistedmatrix.com/2010/11/unicode-in-python-and-how-to-prevent-it.html
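A minimal sketch of such a codec in Python follows. It is not the code from the linked post: the codec name follows this discussion, while `ImplicitConversionWarning`, `CODEC_NAME` and the search function are hypothetical names chosen for illustration. It wraps the built-in ASCII encoder and decoder and emits a warning before delegating to them, so a Python 2.7 interpreter whose default encoding had been switched to this codec should report each implicit conversion. The same code also runs on Python 3, where it can only be exercised through explicit `encode`/`decode` calls.

```python
import codecs
import warnings

# Hypothetical codec name mirroring the one discussed above; it is not
# shipped with Python and only exists once this module has been imported.
CODEC_NAME = "ascii_with_warnings"


class ImplicitConversionWarning(UnicodeWarning):
    """Hypothetical category so these warnings can be filtered separately."""


def _encode(data, errors="strict"):
    # Warn first, then defer to the real ASCII encoder: (output, consumed).
    warnings.warn("unicode->str encoding via ASCII",
                  ImplicitConversionWarning, stacklevel=2)
    return codecs.ascii_encode(data, errors)


def _decode(data, errors="strict"):
    # Warn first, then defer to the real ASCII decoder: (output, consumed).
    warnings.warn("str->unicode decoding via ASCII",
                  ImplicitConversionWarning, stacklevel=2)
    return codecs.ascii_decode(data, errors)


def _search(name):
    # The codec registry calls every search function with a normalized name;
    # return None for names this function does not handle.
    if name == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)
```

With `stacklevel=2`, each warning should be attributed to the code that triggered the conversion rather than to the codec functions themselves. Non-ASCII input still fails exactly as with the plain "ascii" codec, just after the warning has been issued.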

However, actually setting that up is a bit tricky, since the default site module deliberately deletes "sys.setdefaultencoding" from the sys module during startup. That means requesting warnings for implicit conversions requires the following steps:

1. Finding the "ascii_with_warnings" codec from the post linked above (or writing your own)
2. Learning one of the following three tricks for overriding the default encoding:

2a. Run with "-S" and call sys.setdefaultencoding post-startup
2b. Edit the actual system site.py in a container or other test environment
2c. Shadow site.py with your own modified copy

3. Run your tests or application with the modified default encoding

If we wanted to make that easier for folks migrating, the first step would be to provide the "ascii_with_warnings" codec by default in Python 2.7 (perhaps as "_ascii_with_warnings", since it isn't intended for general use; it's just a migration helper).

The second would be to provide a way to turn it on that doesn't require fiddling with the site module. The simplest option there would be to always enable it under `-3`.

The argument against the simple option is that I'm not sure how noisy it would be by default: there are some standard library modules (e.g. URL processing) where we still rely on implicit encoding and decoding in Python 2, but have separate code paths in Python 3.

Since we don't have -X options in Python 2, the second simplest alternative would be to leave `sys.setdefaultencoding` available when running under `-3`: that way folks could more easily opt in to enabling the "ascii_with_warnings" codec selectively (e.g. via a context manager), rather than always having it enabled.
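A minimal sketch of what such a selective opt-in could look like, assuming a Python 2.7 interpreter where "sys.setdefaultencoding" is still present (for example, one started with "-S", or under the `-3` behaviour proposed above). The `default_encoding` helper is hypothetical; on interpreters without "sys.setdefaultencoding" (including all of Python 3) it raises `RuntimeError` instead of silently doing nothing.

```python
import contextlib
import sys


@contextlib.contextmanager
def default_encoding(encoding):
    """Hypothetical helper: temporarily switch the default encoding.

    Assumes sys.setdefaultencoding has not been removed by site.py.
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        raise RuntimeError("sys.setdefaultencoding is not available here")
    previous = sys.getdefaultencoding()
    setter(encoding)
    try:
        yield
    finally:
        # Always restore the original default, even if the body raised.
        setter(previous)
```

For example, once a warning codec such as "ascii_with_warnings" has been registered, wrapping a suspect call in `with default_encoding("ascii_with_warnings"):` would limit the warnings to that call. The default encoding is process-wide, though, so other threads running at the same time would see the change too.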
History
Date                 User      Action  Args
2016-10-10 03:52:12  ncoghlan  set     recipients: + ncoghlan
2016-10-10 03:52:12  ncoghlan  set     messageid: <1476071532.35.0.73217429855.issue28403@psf.upfronthosting.co.za>
2016-10-10 03:52:12  ncoghlan  link    issue28403 messages
2016-10-10 03:52:10  ncoghlan  create