classification
Title: Porting guide: disabling & warning on implicit unicode conversions
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: docs@python Nosy List: brett.cannon, docs@python, lemburg, ncoghlan, petr.viktorin, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-10-10 03:52 by ncoghlan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg278402 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-10 03:52
Some of the hardest compatibility issues to track down in Python 3 migrations are those where existing code is depending on an implicit str->unicode promotion somewhere in the depths of a support library (or sometimes even the standard library - the context where this came up relates to some apparent misbehaviour in the standard library). In other cases, just being able to rule implicit conversions out as a possible contributing factor can be helpful in finding the real problem.

It's technically already possible to hook implicit conversions by adjusting (or shadowing) the site.py module and replacing the default "ascii" encoding with one that emits a warning whenever you rely on it: http://washort.twistedmatrix.com/2010/11/unicode-in-python-and-how-to-prevent-it.html

However, actually setting that up is a bit tricky, since we deliberately drop "sys.setdefaultencoding" from the sys module in the default site module. That means requesting warnings for implicit conversions requires doing the following:

1. Finding the "ascii_with_warnings" codec above (or writing your own)
2. Learning one of the following 3 tricks for overriding the default encoding:

2a. Run with "-S" and call sys.setdefaultencoding post-startup
2b. Edit the actual system site.py in a container or other test environment
2c. Shadow site.py with your own modified copy

3. Run your tests or application with the modified default encoding

If we wanted to make that easier for folks migrating, the first step would be to provide the "ascii_with_warnings" codec by default in Python 2.7 (perhaps as "_ascii_with_warnings", since it isn't intended for general use and is just a migration helper).
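
For illustration only, a warning codec along these lines could be sketched as below. This is not the codec from the linked blog post and not anything shipped with Python; the helper names (`_warn_encode`, `_warn_decode`, `_search`) and the warning messages are assumptions. It wraps the built-in "ascii" codec and emits a `UnicodeWarning` each time it is used.

```python
import codecs
import warnings

# The real codec that does the actual work.
_ascii = codecs.lookup("ascii")


def _warn_encode(text, errors="strict"):
    # Hypothetical message; stacklevel=2 points at the code that triggered the conversion.
    warnings.warn("conversion via ascii_with_warnings (encode)",
                  UnicodeWarning, stacklevel=2)
    return _ascii.encode(text, errors)


def _warn_decode(data, errors="strict"):
    warnings.warn("conversion via ascii_with_warnings (decode)",
                  UnicodeWarning, stacklevel=2)
    return _ascii.decode(data, errors)


def _search(name):
    # Codec search functions return a CodecInfo for names they know, else None.
    if name == "ascii_with_warnings":
        return codecs.CodecInfo(name="ascii_with_warnings",
                                encode=_warn_encode,
                                decode=_warn_decode)
    return None


codecs.register(_search)
```

Once registered, the codec can be exercised explicitly with `b"abc".decode("ascii_with_warnings")`. Under Python 2 it would only catch implicit conversions after being installed as the default encoding.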

The second would be to provide a way to turn it on that doesn't require fiddling with the site module. The simplest option there would be to always enable it under `-3`.

The argument against the simple option is that I'm not sure how noisy it would be by default - there are some standard library modules (e.g. URL processing) where we still rely on implicit encoding and decoding in Python 2, but have separate code paths in Python 3.

Since we don't have -X options in Python 2, the second simplest alternative would be to leave `sys.setdefaultencoding` available when running under `-3`: that way folks could more easily opt in to enabling the "ascii_with_warnings" codec selectively (e.g. via a context manager), rather than always having it enabled.
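
A rough sketch of what such a context manager might look like, assuming `sys.setdefaultencoding` is still reachable (for example under `-S`, or under the proposed `-3` behaviour). The name `default_encoding` is hypothetical. When the setter is not available (a normal Python 2 startup, or any Python 3), the sketch does nothing and yields `False`.

```python
import contextlib
import sys


@contextlib.contextmanager
def default_encoding(name):
    """Temporarily switch the default encoding, if the interpreter allows it."""
    # site.py normally deletes sys.setdefaultencoding; Python 3 never has it.
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        yield False
        return
    previous = sys.getdefaultencoding()
    setter(name)
    try:
        yield True
    finally:
        # Restore the original default even if the body raised.
        setter(previous)
```

Usage would then be along the lines of `with default_encoding("ascii_with_warnings"): run_suspect_code()`, where `run_suspect_code` stands in for whatever is being investigated.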
msg278403 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-10 03:58
(Correction to the above: the case where this came up turned out to be due to consuming code monkeypatching things when it really shouldn't have been, so it fell into the second category of "It would have been helpful to be able to more easily rule this out as a contributing factor")
msg278405 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-10-10 07:17
Nick, I think you've missed the "undefined" encoding that we've had for this ever since Unicode was added to Python.

You put the needed code into your sitecustomize.py file and Python2 will then behave just like Python3, i.e. raise an exception instead of coercing to Unicode:

sitecustomize.py:
import sys
sys.setdefaultencoding('undefined')

There's no need to hack this into site.py or to make sys.setdefaultencoding() available outside sitecustomize.py.

If you want an OS environ switch, you can put the necessary logic into sitecustomize.py as well.
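
One possible shape for such an environment switch in sitecustomize.py is sketched below. The variable name `PY2_DEFAULT_ENCODING` and the function name are made up for this example. The `getattr` guard means the file is harmless when `sys.setdefaultencoding` is not present.

```python
import os
import sys


def apply_default_encoding(environ=os.environ):
    """Set the default encoding from an (assumed) environment variable.

    Returns True if the default encoding was changed, False otherwise.
    """
    encoding = environ.get("PY2_DEFAULT_ENCODING")  # hypothetical variable name
    # Only available while site/sitecustomize is being processed on Python 2.
    setter = getattr(sys, "setdefaultencoding", None)
    if not encoding or setter is None:
        return False
    setter(encoding)
    return True


apply_default_encoding()
```

With this in place, something like `PY2_DEFAULT_ENCODING=undefined python2 app.py` would opt in for a single run, and leaving the variable unset keeps the normal behaviour.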
msg278411 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-10 12:09
The main problem with the "undefined" encoding is that it actually *fails* the application, rather than allowing it to continue, but providing a warning at each new point where it encounters implicit encoding or decoding. This means the parts of the standard library that actually rely on implicit coercion fail outright, rather than just generate warning noise that you can filter out as irrelevant to your particular application.
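
To make the "fails outright" behaviour concrete, here is a small sketch that uses an explicit `.decode()` call as a stand-in for an implicit conversion. The function name is hypothetical. The "undefined" codec raises `UnicodeError` for any input, including pure ASCII data.

```python
def conversion_allowed(data, encoding):
    """Return True if `data` (bytes) decodes with `encoding`, False on UnicodeError."""
    try:
        data.decode(encoding)
    except UnicodeError:
        return False
    return True


# Plain ASCII data decodes fine with "ascii" but is rejected by "undefined".
ascii_ok = conversion_allowed(b"abc", "ascii")
undefined_ok = conversion_allowed(b"abc", "undefined")
```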

You raise a good point about `sitecustomize.py` though - I always forget about that feature myself, and it didn't come up in any of the Google results I looked at either.

The existing "undefined" option also at least allows you to categorically ensure you're not relying on implicit conversions at all, so the Python 3 porting guide could be updated to explicitly cover:

1. Finding the site customization path for your active virtual environment:

    python -c 'import os.path, sysconfig; print(os.path.join(sysconfig.get_path("purelib"), "sitecustomize.py"))'

2. What to write to that location to disable implicit Unicode conversions:

    import sys
    sys.setdefaultencoding('undefined')

Giving folks the following tiered path to Python 3 support:

- get "pylint --py3k" passing (e.g. via python-modernize)
- eliminate "python -3" warnings under Python 2
- (optional) support running with the above site customizations
- actually run under Python 3
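
Steps 1 and 2 above could also be combined into a small helper, sketched here under the assumption that overwriting any existing sitecustomize.py in the target directory is acceptable. The function names are illustrative. Nothing is written unless `write_sitecustomize` is called explicitly.

```python
import os
import sysconfig

# Exactly the two lines suggested for disabling implicit conversions.
SITECUSTOMIZE_BODY = "import sys\nsys.setdefaultencoding('undefined')\n"


def sitecustomize_path(directory=None):
    """Return the sitecustomize.py path, defaulting to the active purelib directory."""
    if directory is None:
        directory = sysconfig.get_path("purelib")
    return os.path.join(directory, "sitecustomize.py")


def write_sitecustomize(directory=None):
    """Write the customization file (overwriting any existing one) and return its path."""
    path = sitecustomize_path(directory)
    with open(path, "w") as handle:
        handle.write(SITECUSTOMIZE_BODY)
    return path
```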

Brett, does the above approach sound reasonable to you? If so, then I'll do that as a pure documentation change in the Py3k porting guide with a "See Also" to the above blog post, and then mark this as closed/postponed (given the `sitecustomize` approach to enable it, the 3rd party codec should be fine for folks that want the warning behaviour instead)
msg278413 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-10 12:17
Adding Petr to the nosy list, as I'd like to get his perspective on this once I have a draft docs patch to review.

I also realised it made more sense to just repurpose this issue to cover the proposed docs updates.
msg278415 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2016-10-10 13:08
In portingguide [0] I could only recommend sitecustomize with a (possibly third-party) codec that emits warnings; not 'undefined'.

The things that aren't ported yet are generally either non-Python applications with Python bindings or plugins (Gimp, Samba, ...), projects that are very large relative to the number of available maintainers (VCSs, Sugar, wxPython, ...), or code that depends on those.

If sys.setdefaultencoding('undefined') breaks parts of the standard library, it might be OK for smaller scripts but I fear it won't help big projects much.


[0] http://portingguide.readthedocs.io/en/latest/
msg278420 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2016-10-10 15:02
On 10.10.2016 15:08, Petr Viktorin wrote:
> If sys.setdefaultencoding('undefined') breaks parts of the standard library, it might be OK for smaller scripts but I fear it won't help big projects much.

That's true. It does break the stdlib (the codec was originally
added in order to test exactly this scenario).

A new codec "ascii-warn" could easily be added, based on the
code used for "undefined".
msg278665 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-10-14 21:06
If a new codec gets added to 2.7 then I'm fine with the proposed change.
msg370445 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-31 13:24
Python 2.7 is no longer supported.
History
Date User Action Args
2022-04-11 14:58:38 admin set github: 72589
2020-05-31 13:24:31 serhiy.storchaka set status: open -> closed; nosy: + serhiy.storchaka; messages: + msg370445; resolution: out of date; stage: needs patch -> resolved
2016-10-14 21:06:22 brett.cannon set messages: + msg278665
2016-10-10 15:02:09 lemburg set messages: + msg278420
2016-10-10 13:08:10 petr.viktorin set messages: + msg278415
2016-10-10 12:17:08 ncoghlan set assignee: docs@python; components: + Documentation; title: Migration RFE: optional warning for implicit unicode conversions -> Porting guide: disabling & warning on implicit unicode conversions; nosy: + petr.viktorin, docs@python; messages: + msg278413; stage: needs patch
2016-10-10 12:09:38 ncoghlan set nosy: + brett.cannon; messages: + msg278411
2016-10-10 07:17:34 lemburg set nosy: + lemburg; messages: + msg278405
2016-10-10 03:58:28 ncoghlan set messages: + msg278403
2016-10-10 03:52:12 ncoghlan create