diff -r fd53f083768c Doc/library/sys.rst
--- a/Doc/library/sys.rst	Sat Mar 08 21:40:29 2014 -0500
+++ b/Doc/library/sys.rst	Mon Mar 17 23:57:54 2014 +0100
@@ -27,6 +27,45 @@
    To loop over the standard input, or the list of files given on the command
    line, see the :mod:`fileinput` module.
+
+   On Unix, the operating system passes command line arguments as bytes, and
+   some arguments, such as paths, need not be valid in any particular character
+   encoding. Python decodes the arguments to :class:`str` using the filesystem
+   encoding and the ``surrogateescape`` error handler, so bytes that are invalid
+   in that encoding are kept as lone surrogates instead of raising an error.
+
+   Consider a short Python program, ``a.py``::
+
+      import sys
+
+      print(sys.argv[1])
+      print(b'bytes')
+
+   When run as ``python3.4 a.py éléphant`` in a UTF-8 locale, it prints the
+   expected output::
+
+      éléphant
+      b'bytes'
+
+   If the argument is passed in a different encoding, for example::
+
+      $ python3.4 a.py `echo éléphant|iconv -t latin1`
+
+   the program raises :exc:`UnicodeEncodeError`. The bytes passed are not valid
+   UTF-8, so Python decodes them to lone surrogates, which cannot be encoded
+   for writing to the console.
+
+   The best solution is to use application-level knowledge of the correct
+   encoding to reinterpret the argument inside the application::
+
+      $ python3.4 -c "import os, sys; print(os.fsencode(sys.argv[1]).decode('latin-1'))" \
+          `echo éléphant|iconv -t latin1`
+      éléphant
+
+   The :func:`os.fsencode` call reverses the transformation that Python applied
+   when it processed the command line arguments, and ``decode('latin-1')`` then
+   decodes the original bytes with the correct encoding.
+
 .. data:: base_exec_prefix
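
The round trip described in the patch can be reproduced without a shell. The following is a minimal sketch, assuming a UTF-8 filesystem encoding. `reinterpret_argument` is a hypothetical helper (it is not part of the patch or the standard library), and the decoding that Python performs on the command line is simulated here with an explicit `bytes.decode` call.

```python
def reinterpret_argument(arg, actual_encoding, fs_encoding="utf-8"):
    """Recover the original bytes of a decoded argument, then decode them
    with the encoding the application knows to be correct.

    Hypothetical helper for illustration; fs_encoding="utf-8" is an assumption.
    """
    # surrogateescape turns each lone surrogate back into its original byte.
    raw = arg.encode(fs_encoding, "surrogateescape")
    return raw.decode(actual_encoding)


# Simulate a Latin-1 encoded argument arriving in a UTF-8 environment.
latin1_bytes = "éléphant".encode("latin-1")  # b'\xe9l\xe9phant'
as_received = latin1_bytes.decode("utf-8", "surrogateescape")

# The invalid bytes survive as lone surrogates in the str.
print(ascii(as_received))

# A strict encode fails, which is the UnicodeEncodeError seen when printing.
try:
    as_received.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Reinterpreting with the known encoding recovers the intended text.
recovered = reinterpret_argument(as_received, "latin-1")
```

In a real program on Unix, `os.fsencode(sys.argv[1])` plays the role of the `encode(..., "surrogateescape")` step, since it uses the actual filesystem encoding instead of an assumed one.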