
Author mark.dickinson
Recipients MrJean1, amaury.forgeotdarc, loewis, mark.dickinson
Date 2008-11-22.18:41:18
Message-id <1227379281.41.0.0299352496239.issue4388@psf.upfronthosting.co.za>
Content
It looks like your conjectures are right in both cases.

I tried adding a few lines to Modules/python.c to print out the argv 
entries as byte strings, before they're passed to mbstowcs.  Results
on OS X 10.5:

> 1. Somebody runs "a.py ภาษาไทย" in a Terminal.app window. Most likely,
> the terminal encoding is applied, which we should assume to be UTF-8
> (although it might be different on some systems).

Yes, it appears that the terminal encoding is applied, if I'm reading 
the results right.  Trying

./python.exe a.py é

with the terminal character encoding set to "Unicode (UTF-8)", Python 
receives the third argv entry (argv[2]) as bytes([195, 169]).  With the 
terminal encoding set to "Western (ISO Latin 1)" instead, Python receives
bytes([233]).
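
For reference, those two byte sequences are just the UTF-8 and Latin-1 encodings of é. A minimal sketch that reproduces them in plain Python, without going through argv at all (the variable names are only for illustration):

```python
# The same character, encoded the way each terminal setting would send it.
ch = "\u00e9"  # é

utf8_bytes = ch.encode("utf-8")      # terminal set to "Unicode (UTF-8)"
latin1_bytes = ch.encode("latin-1")  # terminal set to "Western (ISO Latin 1)"

print(list(utf8_bytes))    # [195, 169]
print(list(latin1_bytes))  # [233]

# The Latin-1 byte on its own is not valid UTF-8, so a program that
# assumes UTF-8 input cannot decode it.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False
```

So whichever encoding the terminal uses determines the bytes the process sees for typed arguments.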

> 2. Somebody creates a file japanese_コンテンツ in the finder, then uses
> shell completion to pass this to a Python script. Here I expect that
> UTF-8 is used even if the terminal's encoding is not UTF-8.

Yes.  Python seems to receive the same byte string regardless of the 
terminal encoding.  (With the terminal encoding set to latin1, the 
tab-completed filename looks like garbage within Terminal, of course.)
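
The garbage is presumably just the UTF-8 bytes of the filename being rendered one byte per character. A rough sketch of that effect, assuming the name reaches the program as plain UTF-8 with exactly these code points (I have not checked what normalization the filesystem applies):

```python
# Assumption: the filename arrives as UTF-8 bytes of this exact string.
name = "japanese_\u30b3\u30f3\u30c6\u30f3\u30c4"  # japanese_コンテンツ

raw = name.encode("utf-8")     # bytes passed to the program
shown = raw.decode("latin-1")  # what a Latin-1 terminal would try to display

# Each katakana character takes three bytes in UTF-8, so the 14-character
# name becomes 24 bytes, and a Latin-1 terminal shows 24 "characters".
print(len(name), len(raw), len(shown))

# The underlying bytes are unchanged; only the display differs.
print(shown.encode("latin-1") == raw)
```

The bytes round-trip exactly, which is consistent with Python receiving the same data in both terminal settings.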
History
Date                 User            Action  Args
2008-11-22 18:41:21  mark.dickinson  set     recipients: + mark.dickinson, loewis, amaury.forgeotdarc, MrJean1
2008-11-22 18:41:21  mark.dickinson  set     messageid: <1227379281.41.0.0299352496239.issue4388@psf.upfronthosting.co.za>
2008-11-22 18:41:20  mark.dickinson  link    issue4388 messages
2008-11-22 18:41:18  mark.dickinson  create