Author ncoghlan
Recipients Arfrever, amaury.forgeotdarc, ezio.melotti, lemburg, loewis, ncoghlan, vstinner
Date 2014-04-28.15:11:54
Message-id <1398697914.95.0.646742689311.issue8776@psf.upfronthosting.co.za>
In-reply-to
Content
I'd like to revisit this after PEP 432 is in place, since having to do this dance for arg processing when running on Linux in the POSIX locale is somewhat lame:

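    import locale
    import sys

    argv = sys.argv
    encoding = locale.getpreferredencoding() # Hope nobody changed the locale!
    # read_encoding_from is a placeholder for however the real encoding is found
    fixed_encoding = read_encoding_from("/etc/locale.conf") # For example
    # Undo the interpreter's decoding to recover the original bytes...
    argvb = [arg.encode(encoding, "surrogateescape") for arg in argv]
    # ...then decode them again with the encoding that is actually in use
    fixed_argv = [arg.decode(fixed_encoding, "surrogateescape") for arg in argvb]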

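(For stricter parsing, leave out the second "surrogateescape", so that bytes which are invalid in the fixed encoding raise UnicodeDecodeError instead of passing through as lone surrogates.)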
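
As a rough illustration, the dance above can be wrapped in a small helper. This is only a sketch: redecode_args and its parameter names are hypothetical, not an existing interpreter API, and the caller is assumed to already know both the encoding the interpreter used and the encoding that should have been used.

```python
def redecode_args(args, assumed_encoding, actual_encoding, strict=False):
    """Re-decode args that were decoded with the wrong encoding.

    Hypothetical helper: assumes args were produced by decoding raw bytes
    with assumed_encoding and the "surrogateescape" error handler.
    """
    # Step 1: undo the original decoding to recover the raw bytes.
    raw = [arg.encode(assumed_encoding, "surrogateescape") for arg in args]
    # Step 2: decode again; strict mode rejects bytes that are invalid
    # in actual_encoding instead of smuggling them through as surrogates.
    errors = "strict" if strict else "surrogateescape"
    return [item.decode(actual_encoding, errors) for item in raw]


# UTF-8 bytes for "café" as they would appear after an ASCII decode
mangled = ["caf\udcc3\udca9"]
print(redecode_args(mangled, "ascii", "utf-8"))
```

With strict=True, an argument containing bytes that are not valid in actual_encoding raises UnicodeDecodeError rather than being returned with embedded surrogates.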

Now, if PEP 432 resolves the system encoding issue such that we are able to use the right encoding even when locale.getpreferredencoding() returns the wrong answer, then it may not be worthwhile to also provide sys.argvb (especially since it won't help hybrid 2/3 code). On the other hand, like os.environb, it does make it easier for POSIX-only code paths that want to handle boundary encoding issues themselves to keep consuming the binary data directly and avoid the interpreter's automatic conversion to the text domain.

Note also that os.environb is only available when os.supports_bytes_environ is True, so it would make sense to only provide sys.argvb in the circumstances where we provide os.environb.
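
To make that condition concrete, here is a minimal sketch of an approximation that can be written today. sys.argvb does not exist; argv_as_bytes is a hypothetical stand-in that relies on os.fsencode, and it only recovers the original bytes if the filesystem encoding has not changed since the interpreter decoded the arguments.

```python
import os
import sys


def argv_as_bytes(argv=None):
    """Hypothetical stand-in for the proposed sys.argvb.

    Returns None on platforms where os.environb is not provided.
    """
    if not os.supports_bytes_environ:
        return None
    if argv is None:
        argv = sys.argv
    # On POSIX, os.fsencode uses the filesystem encoding with the
    # "surrogateescape" handler, reversing the interpreter's decoding.
    return [os.fsencode(arg) for arg in argv]


argvb = argv_as_bytes()
if argvb is None:
    print("bytes arguments are not available on this platform")
else:
    print(argvb)
```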