
Author David MacIver
Recipients David MacIver
Date 2017-08-13.13:43:46
Message-id <1502631827.21.0.810598108454.issue31193@psf.upfronthosting.co.za>
In-reply-to
Content
chr(304).lower() is a two-character string: a lowercase i followed by the combining character chr(775) ('COMBINING DOT ABOVE').
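For reference, the lowercasing behaviour described above can be checked with a short Python 3 snippet. This is only an illustrative sketch, not the attached file; the variable names are made up for the example.

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
upper_dotted_i = chr(304)
lowered = upper_dotted_i.lower()

# Name each code point in the lowercased string.
lowered_names = [unicodedata.name(ch) for ch in lowered]

print(len(lowered))
print(lowered_names)
```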

The re module does not seem to account for the combining character: a regex compiled with IGNORECASE will erroneously match a single lowercase i without the required combining character. The attached file demonstrates this. I've tested this on Python 3.6.1 with my locale set to ('en_GB', 'UTF-8'). I don't know whether the locale matters for reproducing this, but I know it can affect how lower/upper work, so I am including it for the sake of completeness.
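The attached file is not reproduced here. As a rough stand-in, a probe along the following lines can be used to see how IGNORECASE treats these strings on a given interpreter. The function name probe_ignorecase and the result labels are hypothetical, and no particular outcome is assumed for the upper-case pattern, since results may differ between Python versions.

```python
import re

DOTTED_CAPITAL_I = chr(304)         # U+0130
LOWERED = DOTTED_CAPITAL_I.lower()  # 'i' followed by U+0307

def probe_ignorecase():
    """Report whether case-insensitive patterns fully match each candidate."""
    results = {}
    for label, pattern_text in (("upper", DOTTED_CAPITAL_I), ("lower", LOWERED)):
        pattern = re.compile(pattern_text, re.IGNORECASE)
        # A bare 'i' lacks the combining dot, so a match here is the
        # behaviour this report describes as erroneous.
        results[label + " pattern vs plain i"] = pattern.fullmatch("i") is not None
        results[label + " pattern vs i + U+0307"] = pattern.fullmatch(LOWERED) is not None
    return results

for case, matched in probe_ignorecase().items():
    print(case, "->", matched)
```

Printing the results rather than asserting them keeps the probe usable for comparing interpreters side by side.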

The problem does not reproduce on Python 2.7.13 because in that case chr(304).lower() is 'i' without the combining character, so it fails earlier.

This is presumably related to #12728, but as that is closed as fixed while this still reproduces I don't believe it's a duplicate.
History
Date                 User           Action  Args
2017-08-13 13:43:47  David MacIver  set     recipients: + David MacIver
2017-08-13 13:43:47  David MacIver  set     messageid: <1502631827.21.0.810598108454.issue31193@psf.upfronthosting.co.za>
2017-08-13 13:43:47  David MacIver  link    issue31193 messages
2017-08-13 13:43:47  David MacIver  create