classification
Title: test_re is failing when locale is set for `en_IN`
Type: behavior Stage:
Components: Regular Expressions, Tests, Windows Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution:
Dependencies: 20087 Superseder:
Assigned To: ncoghlan Nosy List: Naman-Bhalla, benjamin.peterson, ezio.melotti, haypo, jaysinh.shukla, mrabarnett, ncoghlan, paul.moore, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2017-02-15 17:29 by jaysinh.shukla, last changed 2017-07-12 21:03 by Naman-Bhalla.

Files
File name Uploaded Description
test_re_locale_flag.patch serhiy.storchaka, 2017-02-15 22:57
Pull Requests
URL Status Linked
PR 149 merged ncoghlan, 2017-02-18 09:19
PR 153 merged ncoghlan, 2017-02-18 10:44
PR 154 merged ncoghlan, 2017-02-18 10:45
PR 422 benjamin.peterson, 2017-03-03 07:51
PR 554 merged benjamin.peterson, 2017-03-08 06:07
PR 555 merged benjamin.peterson, 2017-03-08 06:51
PR 556 merged benjamin.peterson, 2017-03-08 06:51
PR 2686 closed Naman-Bhalla, 2017-07-12 20:07
Messages (25)
msg287867 - Author: Jaysinh shukla (jaysinh.shukla) * Date: 2017-02-15 17:29
Description:
    A test case is failing while running `./python -m test -v test_re`.

Traceback:
$>./python -m test -v test_re
== CPython 3.7.0a0 (default, Feb 15 2017, 22:28:32) [GCC 5.4.0 20160609]
==   Linux-4.4.0-62-generic-x86_64-with-debian-stretch-sid little-endian
==   hash algorithm: siphash24 64bit
==  cwd: /home/bigj/Jaysinh/cpython_git/cpython/build/test_python_613
==  encodings: locale=UTF-8, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
Run tests sequentially
0:00:00 [1/1] test_re
test_re_benchmarks (test.test_re.ExternalTests)
re_tests benchmarks ... ok
test_re_tests (test.test_re.ExternalTests)
re_tests test suite ... ok
test_overlap_table (test.test_re.ImplementationTest) ... ok
test_bytes (test.test_re.PatternReprTests) ... ok
test_inline_flags (test.test_re.PatternReprTests) ... ok
test_locale (test.test_re.PatternReprTests) ... ok
test_long_pattern (test.test_re.PatternReprTests) ... ok
test_multiple_flags (test.test_re.PatternReprTests) ... ok
test_quotes (test.test_re.PatternReprTests) ... ok
test_single_flag (test.test_re.PatternReprTests) ... ok
test_unicode_flag (test.test_re.PatternReprTests) ... ok
test_unknown_flags (test.test_re.PatternReprTests) ... ok
test_without_flags (test.test_re.PatternReprTests) ... ok
test_anyall (test.test_re.ReTests) ... ok
test_ascii_and_unicode_flag (test.test_re.ReTests) ... ok
test_backref_group_name_in_exception (test.test_re.ReTests) ... ok
test_basic_re_sub (test.test_re.ReTests) ... ok
test_big_codesize (test.test_re.ReTests) ... ok
test_bigcharset (test.test_re.ReTests) ... ok
test_bug_113254 (test.test_re.ReTests) ... ok
test_bug_114660 (test.test_re.ReTests) ... ok
test_bug_117612 (test.test_re.ReTests) ... ok
test_bug_1661 (test.test_re.ReTests) ... ok
test_bug_16688 (test.test_re.ReTests) ... ok
test_bug_20998 (test.test_re.ReTests) ... ok
test_bug_2537 (test.test_re.ReTests) ... ok
test_bug_29444 (test.test_re.ReTests) ... ok
test_bug_3629 (test.test_re.ReTests) ... ok
test_bug_418626 (test.test_re.ReTests) ... ok
test_bug_448951 (test.test_re.ReTests) ... ok
test_bug_449000 (test.test_re.ReTests) ... ok
test_bug_449964 (test.test_re.ReTests) ... ok
test_bug_462270 (test.test_re.ReTests) ... ok
test_bug_527371 (test.test_re.ReTests) ... ok
test_bug_581080 (test.test_re.ReTests) ... ok
test_bug_612074 (test.test_re.ReTests) ... ok
test_bug_6509 (test.test_re.ReTests) ... ok
test_bug_6561 (test.test_re.ReTests) ... ok
test_bug_725106 (test.test_re.ReTests) ... ok
test_bug_725149 (test.test_re.ReTests) ... ok
test_bug_764548 (test.test_re.ReTests) ... ok
test_bug_817234 (test.test_re.ReTests) ... ok
test_bug_926075 (test.test_re.ReTests) ... ok
test_bug_931848 (test.test_re.ReTests) ... ok
test_bytes_str_mixing (test.test_re.ReTests) ... ok
test_category (test.test_re.ReTests) ... ok
test_character_set_errors (test.test_re.ReTests) ... ok
test_compile (test.test_re.ReTests) ... ok
test_constants (test.test_re.ReTests) ... ok
test_dealloc (test.test_re.ReTests) ... ok
test_debug_flag (test.test_re.ReTests) ... ok
test_dollar_matches_twice (test.test_re.ReTests)
$ matches the end of string, and just before the terminating ... ok
test_empty_array (test.test_re.ReTests) ... ok
test_enum (test.test_re.ReTests) ... ok
test_error (test.test_re.ReTests) ... ok
test_expand (test.test_re.ReTests) ... ok
test_finditer (test.test_re.ReTests) ... ok
test_flags (test.test_re.ReTests) ... ok
test_getattr (test.test_re.ReTests) ... ok
test_getlower (test.test_re.ReTests) ... ok
test_group (test.test_re.ReTests) ... ok
test_group_name_in_exception (test.test_re.ReTests) ... ok
test_groupdict (test.test_re.ReTests) ... ok
test_ignore_case (test.test_re.ReTests) ... ok
test_ignore_case_range (test.test_re.ReTests) ... ok
test_ignore_case_set (test.test_re.ReTests) ... ok
test_inline_flags (test.test_re.ReTests) ... ok
test_issue17998 (test.test_re.ReTests) ... ok
test_keep_buffer (test.test_re.ReTests) ... ok
test_keyword_parameters (test.test_re.ReTests) ... ok
test_large_search (test.test_re.ReTests) ... ok
test_large_subn (test.test_re.ReTests) ... ok
test_locale_caching (test.test_re.ReTests) ... skipped 'test needs en_US.iso88591 locale'
test_locale_flag (test.test_re.ReTests) ... FAIL
test_lookahead (test.test_re.ReTests) ... ok
test_lookbehind (test.test_re.ReTests) ... ok
test_match_getitem (test.test_re.ReTests) ... ok
test_match_repr (test.test_re.ReTests) ... ok
test_misc_errors (test.test_re.ReTests) ... ok
test_multiple_repeat (test.test_re.ReTests) ... ok
test_not_literal (test.test_re.ReTests) ... ok
test_nothing_to_repeat (test.test_re.ReTests) ... ok
test_other_escapes (test.test_re.ReTests) ... ok
test_pattern_compare (test.test_re.ReTests) ... ok
test_pattern_compare_bytes (test.test_re.ReTests) ... ok
test_pickling (test.test_re.ReTests) ... ok
test_qualified_re_split (test.test_re.ReTests) ... ok
test_qualified_re_sub (test.test_re.ReTests) ... ok
test_re_escape (test.test_re.ReTests) ... ok
test_re_escape_byte (test.test_re.ReTests) ... ok
test_re_escape_non_ascii (test.test_re.ReTests) ... ok
test_re_escape_non_ascii_bytes (test.test_re.ReTests) ... ok
test_re_findall (test.test_re.ReTests) ... ok
test_re_fullmatch (test.test_re.ReTests) ... ok
test_re_groupref (test.test_re.ReTests) ... ok
test_re_groupref_exists (test.test_re.ReTests) ... ok
test_re_groupref_overflow (test.test_re.ReTests) ... ok
test_re_match (test.test_re.ReTests) ... ok
test_re_split (test.test_re.ReTests) ... ok
test_re_subn (test.test_re.ReTests) ... ok
test_repeat_minmax (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow_maxrepeat (test.test_re.ReTests) ... ok
test_scanner (test.test_re.ReTests) ... ok
test_scoped_flags (test.test_re.ReTests) ... ok
test_search_coverage (test.test_re.ReTests) ... ok
test_search_dot_unicode (test.test_re.ReTests) ... ok
test_search_star_plus (test.test_re.ReTests) ... ok
test_special_escapes (test.test_re.ReTests) ... ok
test_sre_byte_class_literals (test.test_re.ReTests) ... ok
test_sre_byte_literals (test.test_re.ReTests) ... ok
test_sre_character_class_literals (test.test_re.ReTests) ... ok
test_sre_character_literals (test.test_re.ReTests) ... ok
test_stack_overflow (test.test_re.ReTests) ... ok
test_string_boundaries (test.test_re.ReTests) ... ok
test_sub_template_numeric_escape (test.test_re.ReTests) ... ok
test_symbolic_groups (test.test_re.ReTests) ... ok
test_symbolic_refs (test.test_re.ReTests) ... ok
test_unlimited_zero_width_repeat (test.test_re.ReTests) ... ok
test_weakref (test.test_re.ReTests) ... ok

======================================================================
FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bigj/Jaysinh/cpython_git/cpython/Lib/test/test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true

----------------------------------------------------------------------
Ran 120 tests in 2.079s

FAILED (failures=1, skipped=1)
test test_re failed
test_re failed

1 test failed:
    test_re

Total duration: 2 sec
Tests result: FAILURE

Locale settings:
$>locale
LANG=en_IN
LANGUAGE=en_IN:en
LC_CTYPE="en_IN"
LC_NUMERIC="en_IN"
LC_TIME="en_IN"
LC_COLLATE="en_IN"
LC_MONETARY="en_IN"
LC_MESSAGES="en_IN"
LC_PAPER="en_IN"
LC_NAME="en_IN"
LC_ADDRESS="en_IN"
LC_TELEPHONE="en_IN"
LC_MEASUREMENT="en_IN"
LC_IDENTIFICATION="en_IN"
LC_ALL=

Operating system: Ubuntu 16.04 LTS (64-bit)
msg287879 - Author: Matthew Barnett (mrabarnett) * Date: 2017-02-15 18:56
I'm just wondering whether the problem is due to the locale's encoding being UTF-8. The locale support in re really only works with encodings that use 1 byte per character.
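To make the 1-byte-per-character point concrete, here is a small sketch (the helper `encoded_widths` is hypothetical, not part of the re module) showing that an accented letter takes one byte in Latin-1 but two in UTF-8. Since re.LOCALE only applies to bytes patterns, and, as far as I understand it, consults the C library one byte at a time, a letter that needs two bytes is never seen as a single cased character.

```python
def encoded_widths(text, encodings=("latin-1", "utf-8")):
    """Map each encoding to the number of bytes used by each character of text."""
    return {enc: [len(ch.encode(enc)) for ch in text] for enc in encodings}

print(encoded_widths("é"))    # {'latin-1': [1], 'utf-8': [2]}
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("utf-8"))    # b'\xc3\xa9'
```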
msg287880 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-15 19:03
Locale encoding is ISO8859-1. This test is skipped on non-8-bit locales.

This is a problem with tests, not with the re module. I don't have a solution.
msg287882 - Author: Matthew Barnett (mrabarnett) * Date: 2017-02-15 19:25
The report says "==  encodings: locale=UTF-8, FS=utf-8".

It says that "test_locale_caching" was skipped, but also that "test_locale_flag" failed.
msg287893 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-15 22:57
Good point. The test used locale.getlocale(), and it returned ('en_IN', 'ISO8859-1').

The following patch makes the test use locale.getpreferredencoding(False), the same encoding that was reported in the header of the test report.
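To see why the encoding the test is told about matters, here is a rough sketch of the kind of search involved: look for a non-ASCII byte that decodes to a letter whose lowercase form is also a single byte. `find_case_pair` is a hypothetical helper written for illustration; it is not the code in test_re.py. Given ISO8859-1 it finds a pair; given UTF-8, no lone non-ASCII byte decodes at all, so there is no pair. If getlocale() reports ISO8859-1 while the C library is really running in UTF-8, the pair found would not be a case pair from the C library's point of view, which would be consistent with the `None is not true` failure above.

```python
def find_case_pair(enc):
    """Return (upper, lower) single bytes for the first non-ASCII byte in
    `enc` that decodes to a character with a distinct single-byte lowercase
    form, or None if the encoding has no such byte."""
    for i in range(128, 256):
        upper = bytes([i])
        try:
            ch = upper.decode(enc)
        except UnicodeDecodeError:
            continue  # not a complete character on its own (always the case in UTF-8)
        low = ch.lower()
        if low == ch:
            continue  # no distinct lowercase form
        try:
            lower = low.encode(enc)
        except UnicodeEncodeError:
            continue  # lowercase form not representable in this encoding
        if len(lower) == 1:
            return upper, lower
    return None

print(find_case_pair("ISO8859-1"))  # (b'\xc0', b'\xe0')
print(find_case_pair("UTF-8"))      # None
```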
msg287894 - Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-15 22:58
> Following patch ...

Seriously? Not a GitHub pull request? ;-) (old habit?)
msg287933 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-16 12:05
> Seriously? Not a GitHub pull request? ;-) (old habit?)

I'm not experienced with git, and the devguide still doesn't look ready.
msg288056 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-18 05:04
I have a few folks hitting this at the PyCon Pune sprints, so I'm going to apply Serhiy's patch :)
msg288065 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-18 08:54
Looking into this at the PyCon Pune sprints, the problem appears to arise from the following difference in behaviour when the unqualified `en_IN` locale is set:

$ LANG=en_IN.UTF-8 python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'UTF-8')
UTF-8

$ LANG=en_IN python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'ISO8859-1')
UTF-8

re.LOCALE is presumably picking up the "UTF-8" rather than the "ISO8859-1", and hence the test is failing.
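The same comparison can be made from Python without the shell one-liners. As far as I know, getlocale() derives the encoding from the locale *name* (via Python's own alias table), while getpreferredencoding(False) on Unix asks the C library, and from Python 3.7 returns UTF-8 whenever UTF-8 mode is on, so the two can disagree exactly as shown above. `ctype_encodings` and `same_codec` are hypothetical helpers for illustration.

```python
import codecs
import locale

def ctype_encodings():
    """Return (encoding from getlocale(), encoding from getpreferredencoding(False))."""
    try:
        _, by_name = locale.getlocale(locale.LC_CTYPE)
    except ValueError:
        by_name = None  # the locale name could not be parsed
    return by_name, locale.getpreferredencoding(False)

def same_codec(a, b):
    """True if both names resolve to the same Python codec."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except (LookupError, TypeError):
        return False  # unknown codec name, or None

by_name, preferred = ctype_encodings()
print("getlocale():", by_name)
print("getpreferredencoding(False):", preferred)
if by_name is not None and not same_codec(by_name, preferred):
    print("mismatch: code keyed on getlocale() may pick the wrong encoding")
```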
msg288068 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-18 09:25
Yes, please push it Nick.
msg288069 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-18 09:31
New changeset ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 by GitHub in branch 'master':
bpo-29571: Use correct locale encoding in test_re (#149)
https://github.com/python/cpython/commit/ace5c0fdd9b962e6e886c29dbcea72c53f051dc4
msg288101 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 04:33
New changeset 0683d6889bd4430599d22e12e201b8e9c45be5a2 by GitHub in branch '3.6':
[3.6] bpo-29571: Use correct locale encoding in test_re (#149) (#153)
https://github.com/python/cpython/commit/0683d6889bd4430599d22e12e201b8e9c45be5a2
msg288102 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 04:33
New changeset 760f596b6a4b5514afe35e521621f484aef35413 by GitHub in branch '3.5':
[3.5] bpo-29571: Use correct locale encoding in test_re (#149) (#154)
https://github.com/python/cpython/commit/760f596b6a4b5514afe35e521621f484aef35413
msg289054 - Author: Zachary Ware (zach.ware) * (Python committer) Date: 2017-03-06 01:34
This seems to have broken test_re on Windows, see https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.1

I found this change to be the culprit via git bisect. Unfortunately, we didn't have any working CI on Windows (the buildbots were otherwise broken) at the time this was merged.
msg289071 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-06 07:52
Yep, I think we should merge https://github.com/python/cpython/pull/422 and revert ncoghlan's change.
msg289072 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-06 07:54
I'm not sure this will help on Windows.
msg289073 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-06 07:55
And I don't understand why my fix doesn't work on Windows.
msg289074 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-06 07:55
But the test was never broken on Windows.

msg289075 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-06 07:55
getpreferredencoding() takes a completely different path on windows
(returns a codepage) and isn't related to the C locale.
msg289076 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-06 07:58
I'm with Serhiy on this one: if the "re" module isn't using locale.getpreferredencoding(), then there's something odd going on.

It just sounds like the disconnect on Windows is the opposite of the one we hit on Linux without Benjamin's patch, perhaps due to the UTF-8 mode changes - it wouldn't surprise me to learn that the re module is still using mbcs there instead of utf-8.
msg289077 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-06 08:01
I don't see what's odd about it. re.LOCALE uses the C locale, which one
obtains from locale.getlocale(). getpreferredencoding() is not
documented to have anything to do with the C locale, and indeed on
Windows it may be completely different.
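A small sketch of what "re.LOCALE uses the C locale" means in practice: the same bytes pattern matches differently depending on the LC_CTYPE setting in effect. `locale_ignorecase_match` is a hypothetical helper; it temporarily switches LC_CTYPE, which is process-wide (so this is not thread-safe), and then restores it. In the plain "C" locale only ASCII letters have case, so b'\xc0' and b'\xe0' are just unrelated bytes.

```python
import locale
import re

def locale_ignorecase_match(pattern, subject, ctype="C"):
    """Match bytes `subject` against bytes `pattern` with
    re.LOCALE | re.IGNORECASE while LC_CTYPE is temporarily set to `ctype`."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, ctype)
        pat = re.compile(pattern, re.LOCALE | re.IGNORECASE)
        return pat.match(subject) is not None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore

print(locale_ignorecase_match(b"A", b"a"))        # True: ASCII case folding
print(locale_ignorecase_match(b"\xc0", b"\xe0"))  # False: no case for these bytes in "C"
```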
msg289118 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-06 15:52
Thanks for the explanation - given that, I agree that simply reverting the attempted test-based fix and instead relying on the issue 20087 updates is the way to go.
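Issue 20087 (the dependency "Mismatch between glibc and X11 locale.alias" in the history below) is about Python's own alias table disagreeing with glibc. One way to probe that for a given locale name is sketched here: `alias_vs_libc` is a hypothetical, Unix-only helper that compares the encoding `locale.normalize()` assigns to a name with the codeset the C library reports via `nl_langinfo(CODESET)` once that locale is actually set. It returns None when the locale is not installed or `nl_langinfo` is unavailable.

```python
import locale

def alias_vs_libc(name):
    """Return (encoding from Python's alias table, codeset from the C library)
    for locale `name`, or None if it cannot be checked on this system."""
    if not hasattr(locale, "nl_langinfo"):
        return None  # e.g. Windows
    normalized = locale.normalize(name)  # e.g. 'xx_YY.ENCODING@modifier'
    alias_encoding = normalized.partition(".")[2].partition("@")[0] or None
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        codeset = locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None  # locale not installed here
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
    return alias_encoding, codeset

print(alias_vs_libc("en_IN"))
```

On Python versions affected by this report, the reporter's system would presumably show ISO8859-1 from the alias table next to UTF-8 from glibc, matching the getlocale() output above.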
msg290268 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-24 22:42
New changeset 6a4b04cd337347d074ae0140fb13dca5bd4b11ef by Benjamin Peterson in branch '3.6':
Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) (#555)
https://github.com/python/cpython/commit/6a4b04cd337347d074ae0140fb13dca5bd4b11ef
msg290269 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-24 22:42
New changeset 312f7dfb7c669fcfc43020951b7f8ff521200ad7 by Benjamin Peterson in branch '3.5':
Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) (#556)
https://github.com/python/cpython/commit/312f7dfb7c669fcfc43020951b7f8ff521200ad7
msg290272 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-03-24 22:42
New changeset 21a74312f2d1ddee71fade709af49d078085ec30 by Benjamin Peterson in branch 'master':
Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554)
https://github.com/python/cpython/commit/21a74312f2d1ddee71fade709af49d078085ec30
History
Date User Action Args
2017-07-12 21:03:59 Naman-Bhalla set nosy: + Naman-Bhalla
2017-07-12 20:07:31 Naman-Bhalla set pull_requests: + pull_request2751
2017-04-01 05:49:51 serhiy.storchaka set pull_requests: - pull_request1096
2017-03-31 16:36:37 dstufft set pull_requests: + pull_request1096
2017-03-24 22:42:25 benjamin.peterson set messages: + msg290272
2017-03-24 22:42:10 benjamin.peterson set messages: + msg290269
2017-03-24 22:42:04 benjamin.peterson set messages: + msg290268
2017-03-08 06:51:44 benjamin.peterson set pull_requests: + pull_request457
2017-03-08 06:51:41 benjamin.peterson set pull_requests: + pull_request456
2017-03-08 06:07:19 benjamin.peterson set pull_requests: + pull_request455
2017-03-06 15:52:28 ncoghlan set messages: + msg289118
2017-03-06 09:41:12 serhiy.storchaka set dependencies: + Mismatch between glibc and X11 locale.alias
2017-03-06 08:01:39 benjamin.peterson set messages: + msg289077
2017-03-06 07:58:37 ncoghlan set messages: + msg289076
2017-03-06 07:55:55 benjamin.peterson set messages: + msg289075
2017-03-06 07:55:21 benjamin.peterson set messages: + msg289074
2017-03-06 07:55:14 serhiy.storchaka set messages: + msg289073
2017-03-06 07:54:36 serhiy.storchaka set messages: + msg289072
2017-03-06 07:52:43 benjamin.peterson set nosy: + benjamin.peterson
    messages: + msg289071
2017-03-06 06:09:41 serhiy.storchaka set status: closed -> open
    nosy: + steve.dower, paul.moore, tim.golden
    components: + Windows
    resolution: fixed ->
    stage: resolved ->
2017-03-06 01:34:31 zach.ware set nosy: + zach.ware
    messages: + msg289054
2017-03-03 07:51:48 benjamin.peterson set pull_requests: + pull_request352
2017-02-19 04:35:03 ncoghlan set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2017-02-19 04:33:52 ncoghlan set messages: + msg288102
2017-02-19 04:33:37 ncoghlan set messages: + msg288101
2017-02-18 10:45:14 ncoghlan set pull_requests: + pull_request118
2017-02-18 10:44:50 ncoghlan set pull_requests: + pull_request117
2017-02-18 09:31:25 ncoghlan set messages: + msg288069
2017-02-18 09:25:58 serhiy.storchaka set messages: + msg288068
2017-02-18 09:19:08 ncoghlan set pull_requests: + pull_request112
2017-02-18 08:54:00 ncoghlan set messages: + msg288065
2017-02-18 05:04:15 ncoghlan set assignee: serhiy.storchaka -> ncoghlan
    messages: + msg288056
    nosy: + ncoghlan
2017-02-16 12:05:04 serhiy.storchaka set messages: + msg287933
2017-02-15 22:58:46 haypo set messages: + msg287894
2017-02-15 22:57:27 serhiy.storchaka set files: + test_re_locale_flag.patch
    keywords: + patch
    messages: + msg287893
    stage: patch review
2017-02-15 19:25:47 mrabarnett set messages: + msg287882
2017-02-15 19:03:27 serhiy.storchaka set messages: + msg287880
    components: + Tests
    versions: + Python 3.5, Python 3.6
2017-02-15 18:56:16 mrabarnett set messages: + msg287879
2017-02-15 18:02:04 serhiy.storchaka set assignee: serhiy.storchaka
    type: behavior
    nosy: + serhiy.storchaka
2017-02-15 17:41:28 haypo set nosy: + haypo
2017-02-15 17:29:11 jaysinh.shukla create