classification
Title: Unicode problem with TZ
Type:
Stage:
Components: Windows
Versions: Python 3.0

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: loewis
Nosy List: amaury.forgeotdarc, loewis, theller
Priority: normal
Keywords:

Created on 2007-08-28 06:05 by theller, last changed 2007-08-30 14:40 by loewis. This issue is now closed.

Messages (5)
msg55351 - Author: Thomas Heller (theller) (Python committer) Date: 2007-08-28 06:05
On my German version of Windows XP SP2, Python 3 cannot import the time module:

c:\svn\py3k\PCbuild>python_d
Python 3.0x (py3k:57600M, Aug 28 2007, 07:58:23) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11:
invalid data
[36719 refs]
>>> ^Z

The problem is that the C runtime's '_tzname' variable contains umlauts
in the system's ANSI encoding, which is not valid UTF-8.  For
comparison, here is what Python 2.5 does:

c:\svn\py3k\PCbuild>\python25\python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.tzname
('Westeurop\xe4ische Normalzeit', 'Westeurop\xe4ische Normalzeit')
>>>
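
The failure can be reproduced without a German Windows install. The following is a minimal sketch in a current Python 3, assuming the raw `_tzname` bytes are the ones Python 2.5 prints above and that the system ANSI code page is cp1252 (an assumption; the report does not name the code page). The helper `try_decode` is hypothetical.

```python
# Raw bytes as reported by Python 2.5's time.tzname above.
raw_tzname = b"Westeurop\xe4ische Normalzeit"

def try_decode(raw, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc

# 0xE4 announces a multi-byte sequence in UTF-8, but the byte that follows
# is plain ASCII, so strict UTF-8 decoding fails at offset 9.
utf8_text, utf8_error = try_decode(raw_tzname, "utf-8")
print(utf8_text, utf8_error.start if utf8_error else None)

# Assumption: cp1252 stands in for the Windows ANSI ("mbcs") code page.
ansi_text, ansi_error = try_decode(raw_tzname, "cp1252")
print(ascii(ansi_text))
```

The offset of the failure matches the start of the range in the traceback above; the exact wording of the error message differs between Python versions.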
msg55352 - Author: Thomas Heller (theller) (Python committer) Date: 2007-08-28 06:06
BTW, setting the environment variable TZ to, say, 'GMT' makes the
problem go away.
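
A minimal sketch of that workaround, assuming a current Python 3. It launches a child interpreter so that TZ is already set when the C runtime initializes its time zone data. The helper name `tzname_with_tz` is hypothetical, and the names that come back depend on the platform's C library.

```python
import os
import subprocess
import sys

def tzname_with_tz(tz_value):
    """Run a child interpreter with TZ set and return the repr of its time.tzname."""
    env = dict(os.environ, TZ=tz_value)  # copy the environment, override TZ
    result = subprocess.run(
        [sys.executable, "-c", "import time; print(ascii(time.tzname))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(tzname_with_tz("GMT"))
```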
msg55426 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2007-08-29 17:06
I have a patch for this, which uses MBCS conversion instead of relying
on the default UTF-8 (here and in several other places). Tested on a
French version of Windows XP.

Which leads me to the question: should Windows use MBCS encoding by
default when converting between char* and PyUnicode, and not utf-8?
There are some other tracker items which would benefit from this.

After all, C strings can only come from 1) Python code, 2) system I/O
and messages, and 3) constants in source code.
IMO, 1) can use the representation it prefers, 2) would clearly lead to
fewer errors if handled as MBCS, and 3) only uses 7-bit ASCII.
There is very little need for UTF-8 here.
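
The idea can be sketched in Python terms: bytes that come from the system are decoded with the platform's ANSI encoding rather than with UTF-8. This only illustrates the proposal and is not the patch itself. `system_encoding` and `decode_system_bytes` are hypothetical helpers, and the fallback to the locale's preferred encoding outside Windows is an assumption.

```python
import locale
import sys

def system_encoding():
    """Pick the encoding for bytes that originate from the operating system."""
    if sys.platform == "win32":
        return "mbcs"  # Python's codec for the Windows ANSI code page
    # Assumption: elsewhere, the locale's preferred encoding plays that role.
    return locale.getpreferredencoding(False)

def decode_system_bytes(raw, encoding=None):
    """Decode OS-provided bytes; undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(encoding or system_encoding(), errors="replace")
```

For example, `decode_system_bytes(b"Westeurop\xe4ische Normalzeit", "cp1252")` yields the name with a proper "ä", while passing "utf-8" yields a replacement character at that position.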
msg55427 - Author: Thomas Heller (theller) (Python committer) Date: 2007-08-29 18:01
IMO the very best would be to avoid as many conversions as possible by
using the wide APIs on Windows.  Maybe not for _tzname, but for
environment variables, sys.argv, sys.path, and so on.  Not that I would
have time to work on that...
msg55481 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2007-08-30 14:40
This is now fixed in r57720.

Using the wide APIs would be possible through GetTimeZoneInformation;
however, TZ would then no longer be supported (unless the CRT code that
parses TZ is duplicated).
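
For illustration, here is a rough ctypes sketch of what reading the names through the wide API could look like. It is not the code from r57720. The structure layout follows the Win32 TIME_ZONE_INFORMATION definition, the helper `wide_tzname` is hypothetical, and the call is only attempted on Windows.

```python
import ctypes
import sys

class SYSTEMTIME(ctypes.Structure):
    # Eight WORD fields, as in the Win32 SYSTEMTIME structure.
    _fields_ = [(name, ctypes.c_ushort) for name in (
        "wYear", "wMonth", "wDayOfWeek", "wDay",
        "wHour", "wMinute", "wSecond", "wMilliseconds")]

class TIME_ZONE_INFORMATION(ctypes.Structure):
    # The c_long and c_wchar sizes only match the Win32 layout on Windows,
    # which is the only platform where the structure is passed to the API.
    _fields_ = [
        ("Bias", ctypes.c_long),
        ("StandardName", ctypes.c_wchar * 32),
        ("StandardDate", SYSTEMTIME),
        ("StandardBias", ctypes.c_long),
        ("DaylightName", ctypes.c_wchar * 32),
        ("DaylightDate", SYSTEMTIME),
        ("DaylightBias", ctypes.c_long),
    ]

TIME_ZONE_ID_INVALID = 0xFFFFFFFF

def wide_tzname():
    """Return (standard, daylight) names via the wide API, or None off Windows."""
    if sys.platform != "win32":
        return None
    get_tz_info = ctypes.windll.kernel32.GetTimeZoneInformation
    get_tz_info.restype = ctypes.c_ulong  # DWORD return value
    info = TIME_ZONE_INFORMATION()
    if get_tz_info(ctypes.byref(info)) == TIME_ZONE_ID_INVALID:
        raise ctypes.WinError()
    # The names arrive as UTF-16 strings, so no byte decoding is involved.
    return info.StandardName, info.DaylightName
```

Note that this path reads the system time zone settings directly and does not consult the TZ environment variable, which is the trade-off described above.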
History
Date                 User                Action  Args
2007-08-30 14:40:27  loewis              set     status: open -> closed; nosy: + loewis; resolution: fixed; messages: + msg55481
2007-08-29 18:01:21  theller             set     messages: + msg55427
2007-08-29 17:06:03  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg55426
2007-08-29 16:50:51  loewis              set     assignee: loewis
2007-08-28 06:06:52  theller             set     messages: + msg55352
2007-08-28 06:05:53  theller             create