Issue1039270
Created on 2004-10-03 03:44 by quiver, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files
File name | Uploaded | Description |
---|---|---|
error.txt | quiver, 2004-10-03 03:44 | |
escape_re_strptime.diff | brett.cannon, 2004-10-03 23:16 | Escape all time strings before generating regex |
escape_re_strptime23.diff | quiver, 2004-10-05 18:56 | patch against Python 2.3 branch |
Messages (6) | |||
---|---|---|---|
msg22591 | Author: George Yoshida (quiver) | Date: 2004-10-03 03:44 |
The following tests fail on Windows 2000 (Japanese locale):

# test_strptime.py
test_compile (__main__.TimeRETests) ... FAIL
test_bad_timezone (__main__.StrptimeTests) ... ERROR
test_timezone (__main__.StrptimeTests) ... ERROR
test_day_of_week_calculation (__main__.CalculationTests) ... ERROR
test_gregorian_calculation (__main__.CalculationTests) ... ERROR
test_julian_calculation (__main__.CalculationTests) ... ERROR

# test_time.py
test_strptime (test.test_time.TimeTestCase) ... FAIL

===

They all stem from the time zone tests and can be divided into two groups.

The FAIL of test_compile is basically the same as bug #883604:
http://www.python.org/sf/883604
Local time values include regular expression metacharacters, but they are not escaped.

The rest fail because strptime can't parse the values produced by strftime.

>>> import time
>>> time.tzname
('\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)', '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)')
>>> time.strptime(time.strftime('%Z', time.gmtime()))
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in -toplevel-
    time.strptime(time.strftime('%Z', time.gmtime()))
  File "C:\Python24\lib\_strptime.py", line 291, in strptime
    raise ValueError("time data did not match format: data=%s fmt=%s" %
ValueError: time data did not match format: data=q¬ (–B) fmt=%a %b %d %H:%M:%S %Y

The output of running test_time.py and test_strptime.py is attached.
|||
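The first failure group in the report above comes from building a regular expression out of locale strings that contain metacharacters. The following is a minimal sketch of that effect, not the code in `_strptime.py`: the names in `tz_names` and the helper `build_pattern` are hypothetical, chosen to mimic the "Tokyo (standard time)" shape described in this issue.

```python
import re

# Hypothetical timezone names; the second one mimics the shape reported
# in this issue, with literal parentheses inside the name.
tz_names = ["utc", "tokyo (standard time)"]

def build_pattern(names, escape=True):
    """Join names into a regex alternation, optionally escaping each one."""
    parts = [re.escape(n) if escape else n for n in names]
    return re.compile("(?P<Z>" + "|".join(parts) + ")$")

unsafe = build_pattern(tz_names, escape=False)
safe = build_pattern(tz_names, escape=True)

text = "tokyo (standard time)"
unsafe_match = unsafe.match(text)  # parentheses are treated as a group
safe_match = safe.match(text)      # parentheses are matched literally
```

Without escaping, the parentheses become a capturing group, so the pattern expects the text without the literal parentheses and the real name no longer matches.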
msg22592 | Author: George Yoshida (quiver) | Date: 2004-10-03 15:05 |
Logged In: YES user_id=671362

I've found another bug. Lines 167 and 169 of Lib/_strptime.py contain the expression:

time.tzname[0].lower()

I guess this is intended to normalize letter case, but for multibyte characters this is really dangerous.

>>> import time
>>> time.tzname[0]
'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'
>>> _.lower()
'\x93\x8c\x8b\x9e (\x95w\x8f\x80\x8e\x9e)'

\x95W and \x95w are not the same character.
|||
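The concern about `.lower()` on a multibyte byte string can be reproduced in isolation. This is a rough sketch written for Python 3, where `bytes` stands in for the Python 2 `str` discussed in the thread; the byte string is the one quoted in the message above, and cp932 is the encoding named later in this issue. The variable names are illustrative only.

```python
# Byte string quoted in this issue for time.tzname[0] (cp932-encoded).
raw = b"\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)"

# bytes.lower() only knows ASCII, so it rewrites the trail byte 0x57 ('W')
# of a two-byte character to 0x77 ('w').
lowered_bytes = raw.lower()

# Decoding first and lowercasing the text leaves the characters intact.
text = raw.decode("cp932")
lowered_text = text.lower()

# Decoding the byte-lowercased value yields different text.
damaged = lowered_bytes.decode("cp932", errors="replace")
```

As long as both sides of a comparison are lowercased the same way, the mangled bytes still compare equal, which is the point raised in the reply; the damage matters once the value is decoded or displayed.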
msg22593 | Author: Brett Cannon (brett.cannon) * | Date: 2004-10-03 23:16 |
Logged In: YES user_id=357491

The .lower() call is intended to normalize since capitalization is not standard across OSs. But if it is a Unicode string it should be fine. And even if it isn't, it is all lowercased for comparison anyway, so as long as it is consistent, shouldn't it still work?

As for your example of strptime not being able to parse, you have a bug in it; you forgot the format string. It should have been ``time.strptime(time.strftime('%Z'), '%Z')``. Give that a run and let me know what the output is.

As for this whole multi-byte issue, is it all being returned as Unicode strings, or is it just a regular string? In other words, what is ``type(time.tzname[0])`` spitting out? And what character encoding is all of this in (i.e., what should I pass to unicode so as to not have it raise UnicodeDecodeError)?

And finally, for the regex metacharacter stuff, why the hell are there parentheses in a timezone?!? Whoever decided that was good did it just to upset me.

That does need to be fixed. Apply the patch I just uploaded and let me know if it at least deals with that problem.

Have I mentioned I hate timezones? In case I haven't, I do.

Thanks for catching this all, though, George.
|||
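The round trip suggested in the reply above can be wrapped in a small check. This is only a sketch: the helper name `roundtrip_timezone` is made up for illustration, and whether the parse succeeds depends on the platform and locale, so the failure case is caught rather than assumed away.

```python
import time

def roundtrip_timezone():
    """Format the local timezone name with %Z, then parse it back."""
    name = time.strftime("%Z")
    try:
        return name, time.strptime(name, "%Z")
    except ValueError:
        # strptime could not match the name that strftime produced.
        return name, None

name, parsed = roundtrip_timezone()
```

A `None` in `parsed` would indicate the kind of mismatch reported in this issue.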
msg22594 | Author: George Yoshida (quiver) | Date: 2004-10-05 18:56 |
Logged In: YES user_id=671362

bcannon wrote:
> The .lower() call is intended to normalize since capitalization
> is not standard across OSs. But if it is a Unicode string it
> should be fine. And even if it isn't, it is all lowercased for
> comparison anyway, so as long as it is consistent, shouldn't it
> still work?

Hmm.

> As for your example of strptime not being able to parse, you have
> a bug in it; you forgot the format string. It should have been
> ``time.strptime(time.strftime('%Z'), '%Z')``. Give that a run
> and let me know what the output is.

Yeah, it's my fault. I forgot to specify a format. Even so, strptime couldn't parse the timezone.

> As for this whole multi-byte issue, is it all being returned as
> Unicode strings, or is it just a regular string? In other
> words, what is ``type(time.tzname[0])`` spitting out? And what
> character encoding is all of this in (i.e., what should I pass
> to unicode so as to not have it raise UnicodeDecodeError)?

It returns regular strings (not unicode), and the encoding is cp932. This is the default encoding of Japanese Windows.

>>> unicode(time.tzname[0], 'cp932')
u'\u6771\u4eac (\u6a19\u6e96\u6642)'

> And finally, for the regex metacharacter stuff, why the hell
> are there parentheses in a timezone?!? Whoever decided that
> was good did it just to upset me.

Ask M$ Japan :-; I don't regard 'Tokyo (standard time)' as an acceptable representation for a time zone at all, but this is what Windows returns as the time zone on my box.

> That does need to be fixed. Apply the patch I just uploaded and let
> me know if it at least deals with that problem.

With your patch, all tests succeed without any Error or Fail, and strftime <-> strptime conversions work well. This is a backport candidate, so I created a new patch against Python 2.3 with listcomps instead of genexprs.

But there is one problem left. On IDLE, strptime still can't parse. I haven't looked into it in detail, but patch #590913 probably has something to do with it. This patch sets the locale at IDLE's startup, and this can affect the behavior of string-related functions and constants.

[PEP 263 support in IDLE]
http://www.python.org/sf/590913

# patch applied
>>> time.strptime(time.strptime('%Z'), '%Z')
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in -toplevel-
    time.strptime(time.strptime('%Z'), '%Z')
  File "C:\Python24\lib\_strptime.py", line 291, in strptime
    if not found:
ValueError: time data did not match format: data=%Z fmt=%a %b %d %H:%M:%S %Y
>>> import locale
>>> locale.getlocale()
['Japanese_Japan', '932'] # culprit?

> Have I mentioned I hate timezones? In case I haven't, I do.

I agree with you one hundred percent.

--George
|||
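The backport note above mentions swapping generator expressions for list comprehensions, since generator expressions only arrived in Python 2.4. A small sketch of why the swap is behavior-preserving when the result is fed straight into `join`; the sample names are hypothetical and this is not the text of either patch.

```python
import re

# Hypothetical sample values; real code would take them from the locale.
names = ["utc", "tokyo (standard time)"]

# Generator expression (Python 2.4 and later).
with_genexpr = "|".join(re.escape(n) for n in names)

# List comprehension (also accepted by Python 2.3).
with_listcomp = "|".join([re.escape(n) for n in names])
```

Both forms produce the same escaped alternation; the list comprehension simply builds the intermediate list up front.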
msg22595 | Author: George Yoshida (quiver) | Date: 2004-10-05 19:12 |
Logged In: YES user_id=671362

Correction to my previous post: there's nothing wrong with strptime on IDLE.

>>> import time
>>> time.strptime(time.strftime('%Z'), '%Z')
(1900, 1, 1, 0, 0, 0, 0, 1, 0)

Please close this bug and apply the patches. Thanks Brett!
|||
msg22596 | Author: Brett Cannon (brett.cannon) * | Date: 2004-10-06 02:17 |
Logged In: YES user_id=357491

rev. 1.33 on HEAD and rev. 1.23.4.5 on 2.3 have the fix.

Thanks for the help, George.
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:07 | admin | set | github: 40980 |
2004-10-03 03:44:49 | quiver | create |