classification
Title: sys.argv docs should explain how to handle encoding issues
Type: enhancement Stage: resolved
Components: Documentation, Unicode Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Arfrever, andyma, docs@python, ezio.melotti, inada.naoki, miss-islington, ncoghlan, pitrou, sreepriya
Priority: normal Keywords: patch

Created on 2013-02-03 04:01 by ncoghlan, last changed 2019-03-30 06:25 by inada.naoki. This issue is now closed.

Files
File name Uploaded Description
Issue17110.patch sreepriya, 2014-03-17 23:01 Documentation for proper encoding of command line arguments.
Pull Requests
URL Status Linked
PR 12602 merged inada.naoki, 2019-03-28 12:27
PR 12626 merged miss-islington, 2019-03-30 05:32
Messages (7)
msg181239 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-02-03 04:01
The sys.argv docs [1] currently do not mention that the arguments are Unicode strings decoded from bytes provided by the OS. They also don't explain how to correct a decoding error by reversing Python's implicit conversion and redoing it based on the application's knowledge of the correct encoding, as described at [2].

[1] http://docs.python.org/3/library/sys#sys.argv
[2] http://stackoverflow.com/questions/6981594/sys-argv-as-bytes-in-python-3k/
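
The correction described above (reverse the implicit conversion, then decode again with the encoding the application knows to be right) could look roughly like the sketch below. It assumes a POSIX system, where arguments are decoded with the filesystem encoding and the "surrogateescape" error handler. The helper name `redecode_argument` and the choice of Latin-1 in the usage note are illustrative assumptions, not something taken from the docs or the linked answer.

```python
import os


def redecode_argument(arg, encoding):
    """Recover the original bytes of a command line argument and decode
    them again with an encoding supplied by the application.

    Hypothetical helper for illustration; assumes POSIX behaviour.
    """
    # os.fsencode() reverses the implicit decoding done at startup,
    # turning escaped surrogates back into the original bytes.
    raw = os.fsencode(arg)
    # Decode again, this time with the encoding the caller knows is correct.
    return raw.decode(encoding)
```

An application that knows its arguments are Latin-1 would then call `redecode_argument(sys.argv[1], "latin-1")` instead of using `sys.argv[1]` directly.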
msg213674 - Author: Sreepriya Chalakkal (sreepriya) * Date: 2014-03-15 19:12
I tried running the following code with Python 3.4:

import sys

print(sys.argv[1])
print(b'bytes')

I ran it as follows, passing an argument in a different encoding:
$ python ~/a.py `echo priya|iconv -t latin1`
priya
bytes

No UnicodeEncodeError was raised! Is it because the problem is fixed?
msg213699 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-16 01:54
> There was no unicode encode error generated! Is it because the problem 
> is fixed?

No, it's not fixed.
First, it seems you are testing with Python 2 (otherwise you would get "b'bytes'", not "bytes"). Python 2 won't have a problem here, since it treats everything as bytestrings.
Second, to demonstrate the issue you must pass a non-ASCII string. For example:

$ ./python a.py `echo éléphant|iconv -t latin1`
Traceback (most recent call last):
  File "a.py", line 4, in <module>
    print(sys.argv[1])
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 0: surrogates not allowed
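
The traceback can be reproduced without a shell. The sketch below assumes a UTF-8 filesystem encoding, as the traceback suggests, and imitates the startup decoding by hand: the Latin-1 byte 0xE9 is not valid UTF-8, so "surrogateescape" maps it to the lone surrogate U+DCE9, which a strict UTF-8 encoder then refuses to write. The variable names are illustrative.

```python
# Bytes the shell would pass for the Latin-1 encoded argument.
raw = "éléphant".encode("latin-1")  # b'\xe9l\xe9phant'

# What Python does at startup, assuming a UTF-8 filesystem encoding.
arg = raw.decode("utf-8", "surrogateescape")

# Encoding strictly (as print() does for a UTF-8 stdout) fails.
failure = None
try:
    arg.encode("utf-8")
except UnicodeEncodeError as exc:
    failure = exc

# Encoding with the same error handler restores the original bytes.
restored = arg.encode("utf-8", "surrogateescape")
```

Because the escaping is reversible, the original bytes are not lost; only the strict encoding step used for output fails.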
msg213911 - Author: Sreepriya Chalakkal (sreepriya) * Date: 2014-03-17 23:01
You are right. Instead of running ./python inside the Python source directory, I ran the system's default Python, which is an older version! Based on the Stack Overflow link given, I wrote some documentation. I am attaching the patch!
msg214022 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-18 21:33
Hmm, I'm not sure where those explanations belong, but I'm not sure they should be in the sys module docs (especially as they are quite lengthy, and they also apply to other data such as os.environ). Perhaps the Unicode HOWTO?
msg339175 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-03-30 05:32
New changeset 38f4e468d4b55551e135c67337c18ae142193ba8 by Inada Naoki in branch 'master':
bpo-17110: doc: add note how to get bytes from sys.argv (GH-12602)
https://github.com/python/cpython/commit/38f4e468d4b55551e135c67337c18ae142193ba8
msg339176 - Author: miss-islington (miss-islington) Date: 2019-03-30 05:38
New changeset 5b80cb5584a72044424f2d82d0ae79c720f24c47 by Miss Islington (bot) in branch '3.7':
bpo-17110: doc: add note how to get bytes from sys.argv (GH-12602)
https://github.com/python/cpython/commit/5b80cb5584a72044424f2d82d0ae79c720f24c47
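
The changeset title refers to getting bytes from sys.argv. A minimal sketch of that idea, assuming a POSIX system and using `os.fsencode()`, is shown below. The wrapper name `argv_as_bytes` is hypothetical and only there to make the example testable.

```python
import os
import sys


def argv_as_bytes(argv=None):
    """Return the command line arguments as the bytes received from the OS.

    Hypothetical wrapper for illustration; defaults to sys.argv.
    """
    if argv is None:
        argv = sys.argv
    # os.fsencode() applies the filesystem encoding together with its
    # error handler, so escaped surrogates become the original bytes again.
    return [os.fsencode(arg) for arg in argv]
```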
History
Date User Action Args
2019-03-30 06:25:04 inada.naoki set status: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.7, Python 3.8, - Python 3.2, Python 3.3, Python 3.4
2019-03-30 05:38:17 miss-islington set nosy: + miss-islington
messages: + msg339176
2019-03-30 05:32:36 miss-islington set pull_requests: + pull_request12559
2019-03-30 05:32:11 inada.naoki set nosy: + inada.naoki
messages: + msg339175
2019-03-28 12:27:55 inada.naoki set stage: needs patch -> patch review
pull_requests: + pull_request12542
2014-03-18 21:33:01 pitrou set messages: + msg214022
2014-03-18 08:56:15 andyma set nosy: + andyma
2014-03-17 23:01:03 sreepriya set files: + Issue17110.patch
keywords: + patch
messages: + msg213911
2014-03-16 01:54:01 pitrou set nosy: + pitrou
messages: + msg213699
2014-03-15 19:12:56 sreepriya set nosy: + sreepriya
messages: + msg213674
2013-02-03 04:27:08 Arfrever set nosy: + Arfrever
2013-02-03 04:01:11 ncoghlan create