classification
Title: strptime fails in non-UTF locale
Type: behavior
Stage: test needed
Components: Library (Lib), Unicode
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: BreamoreBoy, ezio.melotti, haypo, lemburg, pitrou, python-dev, terry.reedy
Priority: critical
Keywords: patch

Created on 2009-05-02 15:17 by pitrou, last changed 2011-12-09 19:19 by haypo. This issue is now closed.

Files
File name: tzname_encoding.patch
Uploaded: haypo, 2011-12-08 12:45
Messages (10)
msg86953 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-02 15:17
time.strptime() fails with non-UTF8 locales, *even when the input is
totally ASCII*.

>>> import locale, time
>>> locale.setlocale(locale.LC_TIME, "fr_FR.ISO8859-15")
'fr_FR.ISO8859-15'
>>> time.strptime("2009-01-01", "%Y-%m-%d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 461, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 307, in _strptime
    _TimeRE_cache = TimeRE()
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 188, in __init__
    self.locale_time = LocaleTime()
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 72, in __init__
    self.__calc_month()
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 98, in __calc_month
    a_month = [calendar.month_abbr[i].lower() for i in range(13)]
  File "/home/antoine/py3k/__svn__/Lib/_strptime.py", line 98, in <listcomp>
    a_month = [calendar.month_abbr[i].lower() for i in range(13)]
  File "/home/antoine/py3k/__svn__/Lib/calendar.py", line 60, in __getitem__
    return funcs(self.format)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
msg97808 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-01-15 12:57
The reason for this is that the strftime() C lib API is used to build localized month names. With your setting, you'll get French Latin-1 month names and those cannot be coerced to UTF-8 due to the accented characters in them.

This works in Python 2.x since PyUnicode_FromString() et al. convert Latin-1 strings to Unicode.

Apparently, this was changed in Python 3.x without looking at the header file or looking at the Python 2.x implementation which mandate Latin-1 as input encoding. Even the Python 3.x header still says that PyUnicode_FromString() will convert from Latin-1 to Unicode.

No idea why time.strptime() even bothers with these month names, though, since neither the format string nor the string being parsed contains literal month names.
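
The decoding failure described above can be illustrated without changing the process locale. The sketch below is only a simulation: it assumes the C library returns French month names as ISO-8859-15 bytes, and the list of names is supplied by hand rather than read from any locale database.

```python
# Simulation only: these are the French month names, encoded by hand the way
# an ISO-8859-15 locale would be expected to return them from strftime().
french_months = [
    "janvier", "février", "mars", "avril", "mai", "juin",
    "juillet", "août", "septembre", "octobre", "novembre", "décembre",
]

failures = []
for name in french_months:
    raw = name.encode("iso8859-15")  # bytes as the locale would produce them
    try:
        raw.decode("utf-8")          # what a hard-coded UTF-8 decode attempts
    except UnicodeDecodeError:
        failures.append(name)

print(failures)
```

Only the names containing accented characters fail, which is consistent with the error appearing even though the parsed string and format are pure ASCII: the month table is built before any parsing happens.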
msg98105 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-01-21 11:30
I'm unable to reproduce the error. I tried the locales fr_FR.iso88591 and fr_FR.iso885915@euro (fr_FR@euro), but the example works correctly. Should the terminal use the specified locale? My terminal uses the fr_FR.UTF8 locale. Should setlocale() be called before loading the time and/or calendar modules?
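
To make reproduction attempts comparable across machines, the original session can be wrapped in a small script. This is a sketch, not part of the report: the helper name `try_strptime_in_locale` is hypothetical, and the script assumes that a missing locale should be reported as "not installed" rather than treated as a failure.

```python
import locale
import time

def try_strptime_in_locale(name):
    """Hypothetical helper: run the reported call under LC_TIME=name.

    Returns None if the locale is not installed, otherwise the parsed
    struct_time. The previous LC_TIME setting is always restored.
    """
    previous = locale.setlocale(locale.LC_TIME)  # query current setting
    try:
        locale.setlocale(locale.LC_TIME, name)
    except locale.Error:
        return None  # locale not available on this system
    try:
        return time.strptime("2009-01-01", "%Y-%m-%d")
    finally:
        locale.setlocale(locale.LC_TIME, previous)

result = try_strptime_in_locale("fr_FR.ISO8859-15")
print("locale not installed" if result is None else result)
```

On an affected build the call would be expected to raise UnicodeDecodeError instead of returning a value.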
msg112997 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-05 16:33
I can't reproduce this on Windows Vista with 3.1 or 3.2 despite trying several Western & Eastern European, Chinese & Japanese locales.
msg113736 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-13 01:32
> I can't reproduce this on Windows ...

This issue may be (or may have been) specific to Linux.
msg138210 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-12 18:33
Still a problem in 3.2.1 or 3.3?
msg138722 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-20 14:19
I'm closing the issue because I am unable to reproduce it.
msg149024 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-12-08 12:45
Oh! I think I understand the problem: if HAVE_WCSFTIME is not defined, timemodule.c uses strftime() instead of wcsftime(), so it has to encode the input format and decode the formatted result. It uses UTF-8 to encode/decode, whereas the right encoding is the locale encoding. The attached patch should fix this issue.

@Antoine: Do you have any idea why HAVE_WCSFTIME was not defined?

wcsftime() is defined in <wchar.h> on Ubuntu. In configure, it is tested using AC_CHECK_FUNCS(wcsftime)
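
The encode/decode round trip described in this message can be sketched in Python. This does not reproduce timemodule.c: `fake_c_strftime` and `format_time` are hypothetical stand-ins, and the expansion of `%b` to "févr." is an assumed value chosen for illustration.

```python
def fake_c_strftime(fmt: bytes, locale_encoding: str) -> bytes:
    """Hypothetical stand-in for C strftime(): output is locale-encoded."""
    # Assumption: %b expands to a French abbreviation for February.
    return fmt.replace(b"%b", "févr.".encode(locale_encoding))

def format_time(fmt: str, codec: str, locale_encoding: str) -> str:
    """Encode the format, call the byte-oriented function, decode the result."""
    raw = fake_c_strftime(fmt.encode(codec), locale_encoding)
    return raw.decode(codec)

# Behavior described as buggy: UTF-8 is used regardless of the locale.
try:
    format_time("%b", "utf-8", "iso8859-15")
    buggy_failed = False
except UnicodeDecodeError:
    buggy_failed = True

# Behavior described as the fix: the codec matches the locale encoding.
fixed = format_time("%b", "iso8859-15", "iso8859-15")
print(buggy_failed, fixed)
```

The sketch only shows why the codec has to match the locale; in real Python code the locale encoding could be queried with `locale.getpreferredencoding(False)`.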
msg149038 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-08 14:42
Well, it seems defined here:

$ grep HAVE_WCSFTIME pyconfig.h
1071:#define HAVE_WCSFTIME 1

> Attached patch should fix this issue.

I'm sorry, I can't test the patch, because my Linux distro (Mageia) doesn't have the "fr_FR.ISO8859-15" locale anymore :-(
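
The grep on pyconfig.h can also be done from inside the interpreter. The snippet below is a sketch that assumes the build exposes its pyconfig.h macros through `sysconfig`; on platforms where it does not (for example Windows builds), the lookup simply returns None.

```python
import sysconfig

# Returns 1 when the macro was defined at build time, and None when the
# variable is unknown to this build's configuration data.
have_wcsftime = sysconfig.get_config_var("HAVE_WCSFTIME")
print("HAVE_WCSFTIME =", have_wcsftime)
```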
msg149115 - (view) Author: Roundup Robot (python-dev) Date: 2011-12-09 19:19
New changeset 8620e6901e58 by Victor Stinner in branch '3.2':
Issue #5905: time.strftime() is now using the locale encoding, instead of
http://hg.python.org/cpython/rev/8620e6901e58

New changeset bee7694988a4 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #5905: time.strftime() is now using the locale encoding,
http://hg.python.org/cpython/rev/bee7694988a4
History
Date User Action Args
2011-12-09 19:19:41 haypo set status: open -> closed; resolution: fixed
2011-12-09 19:19:18 python-dev set nosy: + python-dev; messages: + msg149115
2011-12-08 14:42:16 pitrou set messages: + msg149038
2011-12-08 12:45:15 haypo set status: closed -> open; files: + tzname_encoding.patch; messages: + msg149024; components: + Unicode; keywords: + patch; resolution: not a bug -> (no value)
2011-06-20 14:19:28 haypo set status: open -> closed; resolution: not a bug; messages: + msg138722
2011-06-12 18:33:45 terry.reedy set nosy: + terry.reedy; messages: + msg138210; versions: + Python 3.2, Python 3.3, - Python 3.1
2010-08-13 01:32:20 haypo set messages: + msg113736
2010-08-05 16:33:11 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg112997
2010-01-21 11:30:27 haypo set nosy: + haypo; messages: + msg98105
2010-01-15 12:57:21 lemburg set messages: + msg97808
2010-01-15 12:15:17 ezio.melotti set nosy: + lemburg, ezio.melotti; versions: - Python 3.0
2009-05-02 15:17:20 pitrou create