Message76255
It looks like your conjectures are right in both cases.
I tried adding a few lines to Modules/python.c to print out the argv
entries as byte strings, before they're passed to mbstowcs. Results
on OS X 10.5:
> 1. Somebody runs "a.py ภาษาไทย" in a Terminal.app window. Most likely,
> the terminal encoding is applied, which we should assume to be UTF-8
> (although it might be different on some systems).
Yes, it appears that the terminal encoding is applied, if I'm reading
the results right. Trying
./python.exe a.py é
with the terminal character encoding set to "Unicode (UTF-8)", Python
receives the third argument as bytes([195, 169]). With the terminal
encoding set to "Western (ISO Latin 1)" instead, Python receives
bytes([233]).
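
For reference, those two byte sequences are just the UTF-8 and ISO Latin 1 encodings of é (U+00E9). A minimal Python 3 sketch to check this, with variable names chosen only for illustration:

```python
# U+00E9 LATIN SMALL LETTER E WITH ACUTE, the character typed in the terminal
arg = "\u00e9"

# Bytes seen with the terminal set to "Unicode (UTF-8)"
utf8_bytes = arg.encode("utf-8")
# Bytes seen with the terminal set to "Western (ISO Latin 1)"
latin1_bytes = arg.encode("latin-1")

print(list(utf8_bytes))    # [195, 169]
print(list(latin1_bytes))  # [233]
```
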
> 2. Somebody creates a file japanese_コンテンツ in the finder, then uses
> shell completion to pass this to a Python script. Here I expect that
> UTF-8 is used even if the terminal's encoding is not UTF-8.
Yes. Python seems to receive the same string regardless of terminal
encoding. (With the terminal encoding set to latin1, the tab-completed
filename looks like garbage within Terminal, of course.)
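
A rough sketch of why the completed name looks like garbage in a Latin 1 terminal, assuming the bytes handed to Python are the UTF-8 encoding of the filename (as the test suggests). This only simulates the display step in Python 3; it is not what Terminal.app itself does internally:

```python
# The filename from case 2, as created in the Finder
name = "japanese_\u30b3\u30f3\u30c6\u30f3\u30c4"  # japanese_コンテンツ

# Assumption: shell completion passes the UTF-8 bytes, whatever the
# terminal encoding is set to
raw = name.encode("utf-8")

# A terminal set to Latin 1 interprets each byte as one character
shown = raw.decode("latin-1")

print(ascii(shown))  # escaped form of the mis-decoded name

# The bytes themselves are unchanged, so re-encoding as Latin 1 and
# decoding as UTF-8 recovers the original name
recovered = shown.encode("latin-1").decode("utf-8")
```

The ASCII prefix survives unchanged, while each Japanese character (three bytes in UTF-8) turns into three unrelated Latin 1 characters.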

Date | User | Action | Args
2008-11-22 18:41:21 | mark.dickinson | set | recipients: + mark.dickinson, loewis, amaury.forgeotdarc, MrJean1
2008-11-22 18:41:21 | mark.dickinson | set | messageid: <1227379281.41.0.0299352496239.issue4388@psf.upfronthosting.co.za>
2008-11-22 18:41:20 | mark.dickinson | link | issue4388 messages
2008-11-22 18:41:18 | mark.dickinson | create |