Change time.strptime() to make it work with Unicode chars #49489
On Py3 strptime("２００９", "%Y") fails:
>>> strptime("２００９", "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.0/_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/usr/local/lib/python3.0/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '２００９' does not match format '%Y'
but non-ASCII numbers are supported elsewhere:
>>> int("２００９")
2009
>>> re.match("^\d{4}$", "２００９").group()
'２００９'
The problem seems to be at line 265 of _strptime.py:
return re_compile(self.pattern(format), IGNORECASE | ASCII)
The ASCII flag prevents the regex from matching '２００９':
>>> re.match("^\d{4}$", "２００９", re.ASCII)
>>>
I tried removing the ASCII flag and it worked fine. On Py2.x the problem is the same:
>>> strptime(u"２００９", "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
    (data_string, format))
ValueError>>>
>>> int(u"２００９")
2009
>>> re.match("^\d{4}$", u"２００９")
Here the re.UNICODE flag probably needs to be added at line 265 (untested):
return re_compile(self.pattern(format), IGNORECASE | UNICODE)
in order to make it work:
>>> re.match("^\d{4}$", u"２００９", re.U).group()
u'\uff12\uff10\uff10\uff19'
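For anyone who wants to reproduce the regex side of this on Python 3 without touching _strptime.py, here is a minimal sketch. The names FULLWIDTH_YEAR and matches are made up for this example; only re.match, re.ASCII and int come from the report above.

```python
import re

# Fullwidth digits U+FF12 U+FF10 U+FF10 U+FF19, as in the repr shown above.
# The constant name is hypothetical, chosen for this example only.
FULLWIDTH_YEAR = "\uff12\uff10\uff10\uff19"

PATTERN = r"^\d{4}$"


def matches(text, flags=0):
    """Return True if text is exactly four characters that \\d accepts."""
    return re.match(PATTERN, text, flags) is not None


# With a str pattern on Python 3, \d is Unicode-aware by default.
print(matches(FULLWIDTH_YEAR))
# With re.ASCII, \d only accepts the characters 0-9.
print(matches(FULLWIDTH_YEAR, re.ASCII))
# int() accepts Unicode decimal digits regardless of regex flags.
print(int(FULLWIDTH_YEAR))
```

The second call is the one that mirrors what the compiled strptime pattern does when the ASCII flag is set.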
This patch comes from bpo-5240. I think testcase is needed. I'll try if |
Hmm, this fails on python2 too. Maybe re.ASCII is added for backward |
re.ASCII was added to many stdlib modules because I wanted to minimize If it is desirable for strptime() and friends to match unicode digits (py3k doesn't have to be 100% compatible with python2 :-)) |
I think Py3 with re.ASCII is the same as Py2 without re.UNICODE (and Py3 It's probably a good idea to have a coherent behavior between Py2 and |
On Friday 13 February 2009 at 14:44 +0000, Ezio Melotti wrote:
Removing re.ASCII in py3k is a no-brainer, because unicode is how |
I meant from the line 265 of _strptime.py, not from Python :P |
That's what I understood. |
Sorry, I misunderstood the meaning of "no-brainer".
If we add re.UNICODE on Py2, strptime should work fine with unicode
I don't think that adding re.UNICODE will break any existing code, but
Also note that encoded strings should be a problem only if they have to
I'll try to add re.UNICODE and see what happens. |
I added test. But this requires bpo-5249 fix to be passed on windows. (I used "\u3000" instead of "\xa0" because "\xa0" cannot be decoded on |
I'd say the latter, since str and unicode are often interchangeable in |
This issue seems to be fixed on py3k by r70755. (bpo-5236) |
As Hirokazu pointed out, this was fixed. |
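To check whether a given interpreter has the fixed behaviour, something like the following sketch can be run. parse_year is a hypothetical helper written for this example, not part of the stdlib; it returns None instead of raising, so the script works whether or not the local strptime accepts fullwidth digits.

```python
import time


def parse_year(text):
    """Parse text with the %Y format; return the year, or None on failure.

    Hypothetical helper for this example only.
    """
    try:
        return time.strptime(text, "%Y").tm_year
    except ValueError:
        return None


# Plain ASCII digits should parse on any version.
print(parse_year("2009"))
# Fullwidth digits: the result depends on whether the interpreter's
# _strptime compiles its pattern with the ASCII flag.
print(parse_year("\uff12\uff10\uff10\uff19"))
```

A None on the second line would indicate the ASCII-only behaviour described in the original report.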