classification
Title: test_re is failing when locale is set to `en_IN`
Type: behavior Stage: resolved
Components: Regular Expressions, Tests Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: ezio.melotti, haypo, jaysinh.shukla, mrabarnett, ncoghlan, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-02-15 17:29 by jaysinh.shukla, last changed 2017-02-19 04:35 by ncoghlan. This issue is now closed.

Files
File name Uploaded Description Edit
test_re_locale_flag.patch serhiy.storchaka, 2017-02-15 22:57 review
Pull Requests
URL Status Linked Edit
PR 149 merged ncoghlan, 2017-02-18 09:19
PR 153 merged ncoghlan, 2017-02-18 10:44
PR 154 merged ncoghlan, 2017-02-18 10:45
Messages (13)
msg287867 - (view) Author: Jaysinh shukla (jaysinh.shukla) * Date: 2017-02-15 17:29
Description:
    A test case is failing while running `./python -m test -v test_re`.

Traceback:
$>./python -m test -v test_re
== CPython 3.7.0a0 (default, Feb 15 2017, 22:28:32) [GCC 5.4.0 20160609]
==   Linux-4.4.0-62-generic-x86_64-with-debian-stretch-sid little-endian
==   hash algorithm: siphash24 64bit
==  cwd: /home/bigj/Jaysinh/cpython_git/cpython/build/test_python_613
==  encodings: locale=UTF-8, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
Run tests sequentially
0:00:00 [1/1] test_re
test_re_benchmarks (test.test_re.ExternalTests)
re_tests benchmarks ... ok
test_re_tests (test.test_re.ExternalTests)
re_tests test suite ... ok
test_overlap_table (test.test_re.ImplementationTest) ... ok
test_bytes (test.test_re.PatternReprTests) ... ok
test_inline_flags (test.test_re.PatternReprTests) ... ok
test_locale (test.test_re.PatternReprTests) ... ok
test_long_pattern (test.test_re.PatternReprTests) ... ok
test_multiple_flags (test.test_re.PatternReprTests) ... ok
test_quotes (test.test_re.PatternReprTests) ... ok
test_single_flag (test.test_re.PatternReprTests) ... ok
test_unicode_flag (test.test_re.PatternReprTests) ... ok
test_unknown_flags (test.test_re.PatternReprTests) ... ok
test_without_flags (test.test_re.PatternReprTests) ... ok
test_anyall (test.test_re.ReTests) ... ok
test_ascii_and_unicode_flag (test.test_re.ReTests) ... ok
test_backref_group_name_in_exception (test.test_re.ReTests) ... ok
test_basic_re_sub (test.test_re.ReTests) ... ok
test_big_codesize (test.test_re.ReTests) ... ok
test_bigcharset (test.test_re.ReTests) ... ok
test_bug_113254 (test.test_re.ReTests) ... ok
test_bug_114660 (test.test_re.ReTests) ... ok
test_bug_117612 (test.test_re.ReTests) ... ok
test_bug_1661 (test.test_re.ReTests) ... ok
test_bug_16688 (test.test_re.ReTests) ... ok
test_bug_20998 (test.test_re.ReTests) ... ok
test_bug_2537 (test.test_re.ReTests) ... ok
test_bug_29444 (test.test_re.ReTests) ... ok
test_bug_3629 (test.test_re.ReTests) ... ok
test_bug_418626 (test.test_re.ReTests) ... ok
test_bug_448951 (test.test_re.ReTests) ... ok
test_bug_449000 (test.test_re.ReTests) ... ok
test_bug_449964 (test.test_re.ReTests) ... ok
test_bug_462270 (test.test_re.ReTests) ... ok
test_bug_527371 (test.test_re.ReTests) ... ok
test_bug_581080 (test.test_re.ReTests) ... ok
test_bug_612074 (test.test_re.ReTests) ... ok
test_bug_6509 (test.test_re.ReTests) ... ok
test_bug_6561 (test.test_re.ReTests) ... ok
test_bug_725106 (test.test_re.ReTests) ... ok
test_bug_725149 (test.test_re.ReTests) ... ok
test_bug_764548 (test.test_re.ReTests) ... ok
test_bug_817234 (test.test_re.ReTests) ... ok
test_bug_926075 (test.test_re.ReTests) ... ok
test_bug_931848 (test.test_re.ReTests) ... ok
test_bytes_str_mixing (test.test_re.ReTests) ... ok
test_category (test.test_re.ReTests) ... ok
test_character_set_errors (test.test_re.ReTests) ... ok
test_compile (test.test_re.ReTests) ... ok
test_constants (test.test_re.ReTests) ... ok
test_dealloc (test.test_re.ReTests) ... ok
test_debug_flag (test.test_re.ReTests) ... ok
test_dollar_matches_twice (test.test_re.ReTests)
$ matches the end of string, and just before the terminating ... ok
test_empty_array (test.test_re.ReTests) ... ok
test_enum (test.test_re.ReTests) ... ok
test_error (test.test_re.ReTests) ... ok
test_expand (test.test_re.ReTests) ... ok
test_finditer (test.test_re.ReTests) ... ok
test_flags (test.test_re.ReTests) ... ok
test_getattr (test.test_re.ReTests) ... ok
test_getlower (test.test_re.ReTests) ... ok
test_group (test.test_re.ReTests) ... ok
test_group_name_in_exception (test.test_re.ReTests) ... ok
test_groupdict (test.test_re.ReTests) ... ok
test_ignore_case (test.test_re.ReTests) ... ok
test_ignore_case_range (test.test_re.ReTests) ... ok
test_ignore_case_set (test.test_re.ReTests) ... ok
test_inline_flags (test.test_re.ReTests) ... ok
test_issue17998 (test.test_re.ReTests) ... ok
test_keep_buffer (test.test_re.ReTests) ... ok
test_keyword_parameters (test.test_re.ReTests) ... ok
test_large_search (test.test_re.ReTests) ... ok
test_large_subn (test.test_re.ReTests) ... ok
test_locale_caching (test.test_re.ReTests) ... skipped 'test needs en_US.iso88591 locale'
test_locale_flag (test.test_re.ReTests) ... FAIL
test_lookahead (test.test_re.ReTests) ... ok
test_lookbehind (test.test_re.ReTests) ... ok
test_match_getitem (test.test_re.ReTests) ... ok
test_match_repr (test.test_re.ReTests) ... ok
test_misc_errors (test.test_re.ReTests) ... ok
test_multiple_repeat (test.test_re.ReTests) ... ok
test_not_literal (test.test_re.ReTests) ... ok
test_nothing_to_repeat (test.test_re.ReTests) ... ok
test_other_escapes (test.test_re.ReTests) ... ok
test_pattern_compare (test.test_re.ReTests) ... ok
test_pattern_compare_bytes (test.test_re.ReTests) ... ok
test_pickling (test.test_re.ReTests) ... ok
test_qualified_re_split (test.test_re.ReTests) ... ok
test_qualified_re_sub (test.test_re.ReTests) ... ok
test_re_escape (test.test_re.ReTests) ... ok
test_re_escape_byte (test.test_re.ReTests) ... ok
test_re_escape_non_ascii (test.test_re.ReTests) ... ok
test_re_escape_non_ascii_bytes (test.test_re.ReTests) ... ok
test_re_findall (test.test_re.ReTests) ... ok
test_re_fullmatch (test.test_re.ReTests) ... ok
test_re_groupref (test.test_re.ReTests) ... ok
test_re_groupref_exists (test.test_re.ReTests) ... ok
test_re_groupref_overflow (test.test_re.ReTests) ... ok
test_re_match (test.test_re.ReTests) ... ok
test_re_split (test.test_re.ReTests) ... ok
test_re_subn (test.test_re.ReTests) ... ok
test_repeat_minmax (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow_maxrepeat (test.test_re.ReTests) ... ok
test_scanner (test.test_re.ReTests) ... ok
test_scoped_flags (test.test_re.ReTests) ... ok
test_search_coverage (test.test_re.ReTests) ... ok
test_search_dot_unicode (test.test_re.ReTests) ... ok
test_search_star_plus (test.test_re.ReTests) ... ok
test_special_escapes (test.test_re.ReTests) ... ok
test_sre_byte_class_literals (test.test_re.ReTests) ... ok
test_sre_byte_literals (test.test_re.ReTests) ... ok
test_sre_character_class_literals (test.test_re.ReTests) ... ok
test_sre_character_literals (test.test_re.ReTests) ... ok
test_stack_overflow (test.test_re.ReTests) ... ok
test_string_boundaries (test.test_re.ReTests) ... ok
test_sub_template_numeric_escape (test.test_re.ReTests) ... ok
test_symbolic_groups (test.test_re.ReTests) ... ok
test_symbolic_refs (test.test_re.ReTests) ... ok
test_unlimited_zero_width_repeat (test.test_re.ReTests) ... ok
test_weakref (test.test_re.ReTests) ... ok

======================================================================
FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bigj/Jaysinh/cpython_git/cpython/Lib/test/test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true

----------------------------------------------------------------------
Ran 120 tests in 2.079s

FAILED (failures=1, skipped=1)
test test_re failed
test_re failed

1 test failed:
    test_re

Total duration: 2 sec
Tests result: FAILURE

Locale value:
$>locale
LANG=en_IN
LANGUAGE=en_IN:en
LC_CTYPE="en_IN"
LC_NUMERIC="en_IN"
LC_TIME="en_IN"
LC_COLLATE="en_IN"
LC_MONETARY="en_IN"
LC_MESSAGES="en_IN"
LC_PAPER="en_IN"
LC_NAME="en_IN"
LC_ADDRESS="en_IN"
LC_TELEPHONE="en_IN"
LC_MEASUREMENT="en_IN"
LC_IDENTIFICATION="en_IN"
LC_ALL=

Operating system: Ubuntu 16.04 LTS(64 bit)
msg287879 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2017-02-15 18:56
I'm just wondering whether the problem is just due to the locale's encoding being UTF-8. The locale support in re really only works with encodings that use 1 byte/character.
msg287880 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-15 19:03
The locale encoding is ISO8859-1. This test is skipped on non-8-bit locales.

This is a problem with tests, not with the re module. I don't have a solution.
msg287882 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2017-02-15 19:25
The report says "==  encodings: locale=UTF-8, FS=utf-8".

It says that "test_locale_caching" was skipped, but also that "test_locale_flag" failed.
msg287893 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-15 22:57
Good point. The test used locale.getlocale() and it returned ('en_IN', 'ISO8859-1').

The following patch makes the test use locale.getpreferredencoding(False), which returns the same encoding as the one reported in the header of the test report.
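To illustrate why the choice of encoding matters here, the following is a minimal sketch of the kind of search the test performs. It is not the code from the attached patch; the helper names `find_cased_byte` and `check_current_locale` are hypothetical. With a single-byte encoding such as ISO8859-1 a cased non-ASCII byte is found, while with UTF-8 no lone byte in the range 128..255 decodes, so there is nothing to assert on.

```python
import locale
import re


def find_cased_byte(encoding):
    """Return (upper, lower) single bytes that differ only by case, or None.

    Hypothetical helper: scans bytes 128..255 for one whose decoded
    character has a distinct lowercase form that also fits in one byte.
    """
    for i in range(128, 256):
        try:
            char = bytes([i]).decode(encoding)
            lowered = char.lower()
            if lowered == char:
                continue  # no distinct lowercase form
            lower_byte = lowered.encode(encoding)
        except (UnicodeError, LookupError):
            continue  # not a complete character in this encoding
        if len(lower_byte) == 1:
            return bytes([i]), lower_byte
    return None


def check_current_locale():
    """Return None if no suitable byte exists, else whether re.LOCALE matched.

    Assumption: the encoding is taken from
    locale.getpreferredencoding(False), as the message above describes.
    """
    pair = find_cased_byte(locale.getpreferredencoding(False))
    if pair is None:
        return None
    upper, lower = pair
    pat = re.compile(re.escape(upper), re.LOCALE | re.IGNORECASE)
    return pat.match(lower) is not None
```

Calling `find_cased_byte("utf-8")` returns None, which corresponds to the case where the locale-dependent assertion has nothing to check.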
msg287894 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-15 22:58
> Following patch ...

Seriously? Not a GitHub pull request? ;-) (old habit?)
msg287933 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-16 12:05
> Seriously? Not a GitHub pull request? ;-) (old habit?)

I'm not experienced with git, and the devguide still doesn't look ready.
msg288056 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-18 05:04
I have a few folks hitting this at the PyCon Pune sprints, so I'm going to apply Serhiy's patch :)
msg288065 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-18 08:54
Looking into this at the PyCon Pune sprints, the problem appears to arise from the following difference in behaviour when the unqualified `en_IN` locale is set:

$ LANG=en_IN.UTF-8 python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'UTF-8')
UTF-8

$ LANG=en_IN python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'ISO8859-1')
UTF-8

re.LOCALE is presumably picking up the "UTF-8" rather than the "ISO8859-1", and hence the test is failing.
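The same comparison can be scripted. The sketch below is only an illustration: the helper names `same_codec` and `report_mismatch` are hypothetical, and the result depends entirely on the environment it runs in. It normalises both names through `codecs.lookup` so that spellings such as "UTF-8" and "utf8" compare equal.

```python
import codecs
import locale


def same_codec(name_a, name_b):
    """Return True if both names resolve to the same Python codec."""
    try:
        return codecs.lookup(name_a).name == codecs.lookup(name_b).name
    except (LookupError, TypeError):
        # Unknown codec name, or None (e.g. getlocale() found no encoding).
        return False


def report_mismatch():
    """Return a message if the two locale queries disagree, else None."""
    _, from_getlocale = locale.getlocale(locale.LC_CTYPE)
    preferred = locale.getpreferredencoding(False)
    if same_codec(from_getlocale, preferred):
        return None
    return "getlocale() says %r, getpreferredencoding(False) says %r" % (
        from_getlocale, preferred)
```

Under `LANG=en_IN`, the output shown above suggests `report_mismatch()` would return a message naming ISO8859-1 and UTF-8; under `LANG=en_IN.UTF-8` it would be expected to return None.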
msg288068 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-18 09:25
Yes, please push it Nick.
msg288069 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-18 09:31
New changeset ace5c0fdd9b962e6e886c29dbcea72c53f051dc4 by GitHub in branch 'master':
bpo-29571: Use correct locale encoding in test_re (#149)
https://github.com/python/cpython/commit/ace5c0fdd9b962e6e886c29dbcea72c53f051dc4
msg288101 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 04:33
New changeset 0683d6889bd4430599d22e12e201b8e9c45be5a2 by GitHub in branch '3.6':
[3.6] bpo-29571: Use correct locale encoding in test_re (#149) (#153)
https://github.com/python/cpython/commit/0683d6889bd4430599d22e12e201b8e9c45be5a2
msg288102 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-02-19 04:33
New changeset 760f596b6a4b5514afe35e521621f484aef35413 by GitHub in branch '3.5':
[3.5] bpo-29571: Use correct locale encoding in test_re (#149) (#154)
https://github.com/python/cpython/commit/760f596b6a4b5514afe35e521621f484aef35413
History
Date User Action Args
2017-02-19 04:35:03 ncoghlan set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-02-19 04:33:52 ncoghlan set messages: + msg288102
2017-02-19 04:33:37 ncoghlan set messages: + msg288101
2017-02-18 10:45:14 ncoghlan set pull_requests: + pull_request118
2017-02-18 10:44:50 ncoghlan set pull_requests: + pull_request117
2017-02-18 09:31:25 ncoghlan set messages: + msg288069
2017-02-18 09:25:58 serhiy.storchaka set messages: + msg288068
2017-02-18 09:19:08 ncoghlan set pull_requests: + pull_request112
2017-02-18 08:54:00 ncoghlan set messages: + msg288065
2017-02-18 05:04:15 ncoghlan set assignee: serhiy.storchaka -> ncoghlan; messages: + msg288056; nosy: + ncoghlan
2017-02-16 12:05:04 serhiy.storchaka set messages: + msg287933
2017-02-15 22:58:46 haypo set messages: + msg287894
2017-02-15 22:57:27 serhiy.storchaka set files: + test_re_locale_flag.patch; keywords: + patch; messages: + msg287893; stage: patch review
2017-02-15 19:25:47 mrabarnett set messages: + msg287882
2017-02-15 19:03:27 serhiy.storchaka set messages: + msg287880; components: + Tests; versions: + Python 3.5, Python 3.6
2017-02-15 18:56:16 mrabarnett set messages: + msg287879
2017-02-15 18:02:04 serhiy.storchaka set assignee: serhiy.storchaka; type: behavior; nosy: + serhiy.storchaka
2017-02-15 17:41:28 haypo set nosy: + haypo
2017-02-15 17:29:11 jaysinh.shukla create