
Author serhiy.storchaka
Recipients ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
Date 2014-09-14.16:23:23
Message-id <1410711803.67.0.827066928752.issue22410@psf.upfronthosting.co.za>
In-reply-to
Content
Locale-specific case-insensitive regular expression matching works only when the pattern was compiled under the same locale as the one used for matching. Because compiled patterns are cached, this can cause unexpected results.

The attached script demonstrates this (it requires two locales: ru_RU.koi8-r and ru_RU.cp1251). The output is:

locale ru_RU.koi8-r
  b'1\xa3' ('1ё') matches b'1\xb3' ('1Ё')
  b'1\xa3' ('1ё') doesn't match b'1\xbc' ('1╪')
locale ru_RU.cp1251
  b'1\xa3' ('1Ј') doesn't match b'1\xb3' ('1і')
  b'1\xa3' ('1Ј') matches b'1\xbc' ('1ј')
locale ru_RU.cp1251
  b'2\xa3' ('2Ј') doesn't match b'2\xb3' ('2і')
  b'2\xa3' ('2Ј') matches b'2\xbc' ('2ј')
locale ru_RU.koi8-r
  b'2\xa3' ('2ё') doesn't match b'2\xb3' ('2Ё')
  b'2\xa3' ('2ё') matches b'2\xbc' ('2╪')

Under the KOI8-R locale, b'\xa3' matches b'\xb3' if the pattern was compiled under the KOI8-R locale, and matches b'\xbc' if the pattern was compiled under the CP1251 locale.
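
To make the byte values above easier to follow, here is a minimal sketch that looks up the case partner of b'\xa3' in each encoding. It uses Python's codecs and Unicode case mapping rather than the C locale tables, so it is only an approximation of what the regex engine sees, but it does not need either locale to be installed. The helper name `case_partner` is made up for this illustration.

```python
def case_partner(byte, encoding):
    """Return the single byte whose character is the case-swapped form
    of `byte` in `encoding`, or None if there is no such byte."""
    ch = byte.decode(encoding)
    swapped = ch.swapcase()
    if swapped == ch:
        return None  # character has no case (e.g. a box-drawing symbol)
    try:
        out = swapped.encode(encoding)
    except UnicodeEncodeError:
        return None  # partner is not representable in this encoding
    return out if len(out) == 1 else None


for enc in ('koi8-r', 'cp1251'):
    print(enc, case_partner(b'\xa3', enc))
# koi8-r b'\xb3'   (ё -> Ё)
# cp1251 b'\xbc'   (Ј -> ј)
```

So a pattern that bakes in the KOI8-R partner at compile time looks for b'\xb3', while one that bakes in the CP1251 partner looks for b'\xbc', regardless of the locale in effect at match time.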

I see three possible ways to solve this issue:

1. Avoid caching locale-dependent case-insensitive patterns. This will definitely decrease the performance of locale-dependent case-insensitive regexps (unless the user does their own caching) and may slightly decrease the performance of other regexps.

2. Clear the cache of precompiled regexps on every locale change. This may look simpler, but it is vulnerable to locale changes made from extensions.

3. Do not lowercase characters at compile time (in locale-dependent case-insensitive patterns). This requires introducing a new opcode for case-insensitive matching, or at least rewriting the implementation of the current opcodes (less efficient). On the other hand, this is a more correct implementation than the current one. The problem is that it is incompatible with distributions that update only the Python library but not the statically linked binary (e.g. Vim with Python support). Maybe there are some workarounds.