re.IGNORECASE not Unicode-ready #47083
re cannot ignore the case of special Latin characters:
Python 3.0a5 (py3k:62932M, May 9 2008, 16:23:11) [MSC v.1500 32 bit (Intel)] on win32
>>> 'Á'.lower() == 'á' and 'á'.upper() == 'Á'
True
>>> import re
>>> rx = re.compile('Á', re.IGNORECASE)
>>> rx.match('á') # should match but won't
>>> rx.match('Á') # will match
<_sre.SRE_Match object at 0x014B08A8>
>>> rx = re.compile('á', re.IGNORECASE)
>>> rx.match('Á') # should match but won't
>>> rx.match('á') # will match
<_sre.SRE_Match object at 0x014B08A8>
Try adding re.LOCALE to the flags. I'm not sure why that is needed, but I still think this is a legitimate bug.
I have the same error with the re.LOCALE flag...
[humitos] [~]$ python3.0
Python 3.0a5+ (py3k:63855, Jun 1 2008, 13:05:09)
[GCC 4.1.3 20080114 (prerelease) (Debian 4.1.2-19)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> rx = re.compile('á', re.LOCALE | re.IGNORECASE)
>>> rx.match('Á')
>>> rx.match('á')
<_sre.SRE_Match object at 0x2b955e204d30>
>>> rx = re.compile('Á', re.IGNORECASE | re.LOCALE)
>>> rx.match('Á')
<_sre.SRE_Match object at 0x2b955e204e00>
>>> rx.match('á')
>>> 'Á'.lower() == 'á' and 'á'.upper() == 'Á'
True
>>>
Same here, re.LOCALE doesn't circumvent the problem.
Uh, actually, it works if you specify re.UNICODE. If you don't, the
>>> pat = re.compile('Á', re.IGNORECASE | re.UNICODE)
>>> pat.match('á')
<_sre.SRE_Match object at 0xb7c66c28>
>>> pat.match('Á')
<_sre.SRE_Match object at 0xb7c66cd0>
I wonder if re.UNICODE shouldn't be the default in Py3k, at least when
Sounds like re.UNICODE should be on by default when the pattern is a str
Also (per mailing list discussion) we should probably only allow
Finally, is there a use case of re.LOCALE any more? I'm thinking not.
On Saturday 28 June 2008 at 22:20 +0000, Guido van Rossum wrote:
It's used for locale-specific case matching in the non-unicode case. But 'C'
>>> import locale
>>> import re
>>> re.match('À'.encode('latin1'), 'à'.encode('latin1'), re.IGNORECASE)
>>> re.match('À'.encode('latin1'), 'à'.encode('latin1'), re.IGNORECASE |re.LOCALE)
>>> locale.setlocale(locale.LC_CTYPE, 'fr_FR.ISO-8859-1')
'fr_FR.ISO-8859-1'
>>> re.match('À'.encode('latin1'), 'à'.encode('latin1'), re.IGNORECASE)
>>> re.match('À'.encode('latin1'), 'à'.encode('latin1'), re.IGNORECASE | re.LOCALE)
<_sre.SRE_Match object at 0xb7b9ac28>
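The session above can be repeated as a small script. This is only a sketch: the helper name `try_locale_match` is made up for illustration, and the result with re.LOCALE depends on whether the `fr_FR.ISO-8859-1` locale from the session is installed on the machine, so the helper returns None when it is not.

```python
import locale
import re

# Latin-1 encoded bytes, as in the interactive session above.
PATTERN = 'À'.encode('latin1')
TEXT = 'à'.encode('latin1')


def try_locale_match(locale_name='fr_FR.ISO-8859-1'):
    """Return True/False for a LOCALE-aware match, or None if the
    requested locale is not available on this system."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None
    try:
        flags = re.IGNORECASE | re.LOCALE
        return re.match(PATTERN, TEXT, flags) is not None
    finally:
        # Restore the previous setting so the rest of the process is unaffected.
        locale.setlocale(locale.LC_CTYPE, saved)


# Without re.LOCALE, bytes patterns only fold the case of ASCII letters.
plain = re.match(PATTERN, TEXT, re.IGNORECASE)

if __name__ == '__main__':
    print('IGNORECASE only:', plain)
    print('IGNORECASE | LOCALE:', try_locale_match())
```

The locale is restored in a `finally` block because `locale.setlocale` changes process-wide state.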
Here is a preliminary patch which doesn't remove re.LOCALE, but adds
It also includes the patch for bpo-3231 ("re.compile fails with some bytes
This new patch also introduces re.ASCII as discussed on the mailing list.
Improved patch which also detects incompatibilities for "(?u)".
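For readers on a current Python 3, the kind of incompatibility checking described here can be observed roughly as follows. This is a sketch of today's behaviour rather than of the patch itself; `compile_error` is a hypothetical helper, and the re.LOCALE case assumes Python 3.6 or later.

```python
import re


def compile_error(pattern, flags=0):
    """Return the exception raised by re.compile, or None if it compiles."""
    try:
        re.compile(pattern, flags)
    except (ValueError, re.error) as exc:
        return exc
    return None


# re.UNICODE makes no sense for a bytes pattern.
unicode_on_bytes = compile_error(b'abc', re.UNICODE)

# The same check applies to the inline form of the flag.
inline_u_on_bytes = compile_error(b'(?u)abc')

# re.ASCII and re.UNICODE contradict each other on a str pattern.
ascii_and_unicode = compile_error('abc', re.ASCII | re.UNICODE)

# Assumption: Python 3.6+, where re.LOCALE is only accepted for bytes patterns.
locale_on_str = compile_error('abc', re.LOCALE)

# "(?u)" on a str pattern is redundant but accepted.
inline_u_on_str = compile_error('(?u)abc')

if __name__ == '__main__':
    for name in ('unicode_on_bytes', 'inline_u_on_bytes',
                 'ascii_and_unicode', 'locale_on_str', 'inline_u_on_str'):
        print(name, '->', repr(globals()[name]))
```

Both ValueError and re.error are caught because the exception type differs between the flag argument and the inline form.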
This new patch adds re.ASCII in all sensitive places I could find in the
Also, I didn't get an answer to the following question on the ML: should
Final patch adding the (?a) inline flag (equivalent to re.ASCII). Please
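A short sketch of how the (?a) inline flag and re.ASCII behave on a current Python 3, using the characters from the original report. The variable names are only for illustration.

```python
import re

# Default for str patterns: case-insensitive matching follows Unicode rules.
default_match = re.match('Á', 'á', re.IGNORECASE)

# re.ASCII restricts case-insensitive matching to ASCII letters.
ascii_flag_match = re.match('Á', 'á', re.IGNORECASE | re.ASCII)

# The inline (?a) flag has the same effect as passing re.ASCII.
inline_flag_match = re.match('(?a)Á', 'á', re.IGNORECASE)

# The flag also narrows character classes such as \w.
word_default = re.match(r'\w', 'á')
word_ascii = re.match(r'(?a)\w', 'á')

# A pattern compiled with (?a) reports re.ASCII in its flags.
inline_sets_ascii = bool(re.compile('(?a)x').flags & re.ASCII)

if __name__ == '__main__':
    print('default:', default_match)
    print('re.ASCII:', ascii_flag_match)
    print('(?a):', inline_flag_match)
    print(r'\w default:', word_default)
    print(r'\w with (?a):', word_ascii)
    print('(?a) sets re.ASCII:', inline_sets_ascii)
```

ASCII letters are still folded under re.ASCII, so `re.match('(?a)a', 'A', re.IGNORECASE)` continues to match.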
Are all those re.ASCII flags mandatory, or are they here just for
On Monday 28 July 2008 at 20:41 +0000, Amaury Forgeot d'Arc wrote:
For theoretical correctness. I just don't want to analyze each case
If nobody (except Amaury :-)) has anything to say about the current
Let's make sure the release manager is OK with this.
Barry?
I haven't looked at the specific patch, but based on the description of
Make sure of course that the documentation is updated and a NEWS file
Fixed in r65860. Someone should check the docs though (at least try to
On 2008-08-19, Antoine Pitrou wrote:
I've revised the ASCII and LOCALE-related texts in re.rst in r65903.
On 2008-08-19, Antoine Pitrou wrote:
And two more (tiny) fixes in r65904; that's my lot :-)
Thanks a lot Mark!