Author vstinner
Recipients Naman-Bhalla, barry, benjamin.peterson, doko, ezio.melotti, jaysinh.shukla, mrabarnett, ncoghlan, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, xtreak, zach.ware
Date 2019-02-28.17:34:31
Message-id <1551375271.56.0.263764017823.issue29571@roundup.psfhosted.org>
Content
Ah, I can reproduce the bug on Fedora 29 using "LANG=en_IN ./python -m test -v test_re".

The problem is that locale.getlocale() is not reliable: it reports ISO8859-1 as the locale encoding, whereas the real encoding is UTF-8:

$ LANG=en_IN ./python 
Python 3.8.0a2+ (heads/master:4cbea518a0, Feb 28 2019, 18:19:44) 
>>> chr(224).encode('ISO8859-1')
b'\xe0'
>>> import _testcapi
>>> _testcapi.DecodeLocaleEx(b'\xe0', 0, 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: decode error: pos=0, reason=decoding error

>>> import locale

# Wrong encoding
>>> locale.getlocale(locale.LC_CTYPE)
('en_IN', 'ISO8859-1')
>>> locale.setlocale(locale.LC_CTYPE, None)
'en_IN'
>>> locale._parse_localename('en_IN')
('en_IN', 'ISO8859-1')

# Real encoding
>>> locale.getpreferredencoding()
'UTF-8'
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'
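
The mismatch shown in the session above can be checked programmatically. What follows is only a minimal sketch, not the code from the PR: the helper names normalize() and ctype_encodings() are made up for illustration, and it assumes that locale.nl_langinfo(locale.CODESET) reports the real encoding where it is available, falling back to locale.getpreferredencoding(False) elsewhere.

```python
import codecs
import locale


def normalize(name):
    """Return the canonical codec name, or None if the name is empty or unknown.

    Hypothetical helper: lets 'ISO8859-1' and 'latin-1' compare as equal.
    """
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def ctype_encodings():
    """Return (guessed, actual) encodings for the current LC_CTYPE locale.

    Hypothetical helper, for illustration only.
    """
    # Encoding that Python derives from the locale *name* via its alias table.
    try:
        guessed = locale.getlocale(locale.LC_CTYPE)[1]
    except ValueError:
        # getlocale() can fail to parse an unusual locale name.
        guessed = None

    # Encoding reported by the C library. nl_langinfo() does not exist on
    # every platform, so fall back to getpreferredencoding().
    if hasattr(locale, "nl_langinfo"):
        actual = locale.nl_langinfo(locale.CODESET)
    else:
        actual = locale.getpreferredencoding(False)

    return normalize(guessed), normalize(actual)


if __name__ == "__main__":
    guessed, actual = ctype_encodings()
    print("getlocale() encoding:", guessed)
    print("C library encoding:  ", actual)
    if guessed is not None and guessed != actual:
        print("mismatch: the getlocale() guess does not match the real encoding")

    # The byte from the session above is valid ISO8859-1 but not valid UTF-8,
    # which is consistent with the strict decode error shown there.
    assert b"\xe0".decode("iso8859-1") == chr(224)
    try:
        b"\xe0".decode("utf-8")
    except UnicodeDecodeError as exc:
        print("UTF-8 decode fails:", exc.reason)
```

Run with LANG=en_IN on an affected system, this should report the mismatch; the exact output depends on the platform and on the installed locales.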


The attached PR 12099 fixes the issue.
History
Date User Action Args
2019-02-28 17:34:31 vstinner set recipients: + vstinner, barry, doko, paul.moore, ncoghlan, tim.golden, benjamin.peterson, ezio.melotti, mrabarnett, zach.ware, serhiy.storchaka, steve.dower, jaysinh.shukla, Naman-Bhalla, xtreak
2019-02-28 17:34:31 vstinner set messageid: <1551375271.56.0.263764017823.issue29571@roundup.psfhosted.org>
2019-02-28 17:34:31 vstinner link issue29571 messages
2019-02-28 17:34:31 vstinner create