Author jwilk
Recipients jwilk
Date 2014-08-12.18:13:04
Message-id <1407867185.09.0.871560290901.issue22187@psf.upfronthosting.co.za>
In-reply-to
Content
This is how shell quoting in commands.mkarg() is implemented:

def mkarg(x):
    if '\'' not in x:
        return ' \'' + x + '\''
    s = ' "'
    for c in x:
        if c in '\\$"`':
            s = s + '\\'
        s = s + c
    s = s + '"'
    return s

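Unfortunately, this is not compatible with the way bash splits arguments in some locales.
In a few East Asian encodings (at least BIG5, BIG5-HKSCS, GB18030 and GBK), the byte 0x5C (backslash in ASCII) can be the second byte of a two-byte character, and bash apparently decodes the command string before splitting it. A backslash that mkarg() inserts as an escape can therefore be consumed as the second byte of such a character, which leaves the character it was meant to escape unquoted.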
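
A minimal sketch of that mechanism, written for Python 3 and not taken from the test script used below. It assumes a hypothetical bytes port of mkarg() (mkarg_bytes) and a made-up payload in which the GBK lead byte 0x81 sits directly before a double quote; it only shows how the quoted bytes look once they are decoded as GBK, it does not run a shell.

```python
def mkarg_bytes(x):
    # Hypothetical bytes port of the Python 2 commands.mkarg() quoted above.
    if b"'" not in x:
        return b" '" + x + b"'"
    s = b' "'
    for i in range(len(x)):
        c = x[i:i + 1]          # one-byte slice, so "in" is a substring test
        if c in b'\\$"`':
            s += b'\\'          # escape the byte for use inside "..."
        s += c
    return s + b'"'

# Assumed payload: 0x81 (a GBK lead byte) right before each double quote.
# The single quotes force mkarg_bytes() into its double-quote branch.
payload = b"\x81\" ; echo 'injected' ; \x81\""
quoted = mkarg_bytes(payload)

# Byte view: every '"' from the payload is preceded by a backslash.
print(quoted)

# GBK view: each 0x81 0x5C pair is one character, so the escaping
# backslash disappears and the quote that follows it is left bare.
print(quoted.decode('gbk'))
```

Read bytewise, the result is a single double-quoted word. Read as GBK text, it contains no backslash at all, so a shell that decodes before splitting would see the quoted word end at the first bare double quote and treat the text between the semicolons as a separate command.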

PoC:

$ sh --version | head -n1
GNU bash, version 4.3.22(1)-release (i486-pc-linux-gnu)

$ LC_ALL=C python test-mkargs.py
crw-rw-rw- 1 root root 1, 3 Aug 12 16:00 /dev/null
ls: cannot access " ; python -c 'import this' | grep . | shuf | head -n1 | cowsay -y ; ": No such file or directory

$ LC_ALL=zh_CN.GBK python test-mkargs.py
crw-rw-rw- 1 root root 1, 3 8月  12 16:00 /dev/null
ls: 无法访问乗: No such file or directory
 ________________________________
< Simple is better than complex. >
 --------------------------------
        \   ^__^
         \  (..)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
sh: 乗: 未找到命令
History
Date                 User   Action  Args
2014-08-12 18:13:05  jwilk  set     recipients: + jwilk
2014-08-12 18:13:05  jwilk  set     messageid: <1407867185.09.0.871560290901.issue22187@psf.upfronthosting.co.za>
2014-08-12 18:13:05  jwilk  link    issue22187 messages
2014-08-12 18:13:04  jwilk  create