classification
Title: commands.mkarg() buggy in East Asian locales
Type: security
Stage:
Components: Library (Lib)
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: glyph, jwilk, mirabilos, r.david.murray
Priority: normal
Keywords:

Created on 2014-08-12 18:13 by jwilk, last changed 2014-10-06 12:04 by jwilk.

Files
File name       Uploaded                 Description
test-mkargs.py  jwilk, 2014-08-12 18:13
test.sh         jwilk, 2014-09-05 18:34
Messages (8)
msg225235 - Author: Jakub Wilk (jwilk) Date: 2014-08-12 18:13
This is how shell quoting in commands.mkarg() is implemented:

def mkarg(x):
    if '\'' not in x:
        return ' \'' + x + '\''
    s = ' "'
    for c in x:
        if c in '\\$"`':
            s = s + '\\'
        s = s + c
    s = s + '"'
    return s

This is unfortunately not compatible with the way bash splits arguments in some locales.
The problem is that in a few East Asian encodings (at least BIG5, BIG5-HKSCS, GB18030, GBK), the 0x5C byte (backslash in ASCII) can be the second byte of a two-byte character, and bash apparently decodes strings before splitting them into arguments.
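
To illustrate the point about 0x5C as a trail byte, here is a minimal sketch using Python's own codecs. The helper name `ends_in_backslash_byte` and the sample characters are chosen for illustration only: U+4E57 is the character that shows up in the PoC output, and U+529F is a commonly cited Big5 case.

```python
# Illustrative only: the helper name and the sample characters are assumptions.
SAMPLES = [
    ('big5', u'\u529f'),
    ('big5hkscs', u'\u529f'),
    ('gb18030', u'\u4e57'),
    ('gbk', u'\u4e57'),
]


def ends_in_backslash_byte(char, encoding):
    """Return True if `char` encodes to several bytes and the last one is 0x5C."""
    raw = char.encode(encoding)
    return len(raw) > 1 and raw[-1:] == b'\\'


for encoding, char in SAMPLES:
    # repr() of the encoded bytes makes the trailing backslash byte visible
    print('%s: %r -> %s' % (encoding, char.encode(encoding),
                            ends_in_backslash_byte(char, encoding)))
```

A byte-oriented quoting routine cannot tell such a trail byte apart from a real backslash, and neither can it know that the shell will treat the backslash it inserts as part of a character.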

PoC:

$ sh --version | head -n1
GNU bash, version 4.3.22(1)-release (i486-pc-linux-gnu)

$ LC_ALL=C python test-mkargs.py
crw-rw-rw- 1 root root 1, 3 Aug 12 16:00 /dev/null
ls: cannot access " ; python -c 'import this' | grep . | shuf | head -n1 | cowsay -y ; ": No such file or directory

$ LC_ALL=zh_CN.GBK python test-mkargs.py
crw-rw-rw- 1 root root 1, 3 8月  12 16:00 /dev/null
ls: 无法访问乗: No such file or directory
 ________________________________
< Simple is better than complex. >
 --------------------------------
        \   ^__^
         \  (..)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
sh: 乗: 未找到命令
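
The attached test-mkargs.py is not reproduced in this report, so the following is only a guess at the mechanism, not the actual PoC. It transcribes the mkarg() quoted above to work on byte strings (so that it also runs on Python 3) and feeds it a hypothetical payload in which the lead byte 0x81 is followed by a double quote. The names `mkarg_bytes` and `payload` are made up for this sketch.

```python
def mkarg_bytes(x):
    """Byte-string transcription of the mkarg() quoted in this report."""
    if b"'" not in x:
        return b" '" + x + b"'"
    s = b' "'
    for i in range(len(x)):
        c = x[i:i + 1]  # one byte at a time, like iterating a Python 2 str
        if c in b'\\$"`':
            s += b'\\'
        s += c
    return s + b'"'


# Hypothetical payload: the single quotes force the double-quote branch, and
# each 0x81 sits right before a double quote that mkarg will backslash-escape.
payload = b"\x81\" ; echo 'injected' ; \x81\""
quoted = mkarg_bytes(payload)

# Bytewise, every double quote from the payload is escaped.
print(repr(quoted))

# Decoded as GBK, each 0x81 0x5C pair becomes one character, so the
# double quotes that follow are no longer escaped.
print(repr(quoted.decode('gbk')))
```

Under these assumptions, a shell that decodes its input as GBK before tokenizing would see a closing quote right after the first two-byte character, which would leave the rest of the payload outside the quotes.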
msg225237 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-08-12 18:34
For the record, neither this module nor this routine exists in Python 3, so this is a Python 2-only issue.

I'm not sure I fully understand the problem, but perhaps a possible strategy is to apply the fixes to Python 2's pipes.quote that were applied in Python 3 (where the function was further moved to shlex), and use that instead of mkarg.
msg226423 - Author: mirabilos (mirabilos) Date: 2014-09-05 13:45
Just for the record, please do not assume all shells behave like GNU bash.
msg226426 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-09-05 15:49
That is true, but sh-alikes (POSIX standard) are the only ones we support in commands. subprocess (which commands was folded into in Python 3) also supports Windows cmd to the extent we've managed, but that's all we are committed to support.
msg226439 - Author: Jakub Wilk (jwilk) Date: 2014-09-05 18:34
I think what mirabilos meant (and what I should have mentioned in my initial message) is that even sh-alikes don't necessarily behave the same way as bash:

$ bash test.sh 
乗

$ ksh test.sh 
乗

$ dash test.sh 
test.sh: 2: test.sh: Syntax error: Unterminated quoted string

$ mksh test.sh 
test.sh[2]: no closing quote

$ posh test.sh 
test.sh:2: no closing quote
msg228095 - Author: Glyph Lefkowitz (glyph) Date: 2014-10-01 17:20
Would simply replacing this function with pipes.quote resolve the issue?
msg228665 - Author: Jakub Wilk (jwilk) Date: 2014-10-06 12:03
Something like this should be safe:

def mkarg(x):
    ' ' + pipes.quote(x)
msg228667 - Author: Jakub Wilk (jwilk) Date: 2014-10-06 12:04
Err, with return of course. :-)
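
Putting the last two messages together, the proposed replacement might look like the sketch below. It assumes the quoting function wraps its argument in single quotes without adding backslashes, as pipes.quote does in Python 2.7. The shlex import is an addition of this sketch, there only so that it also runs on Python 3, where the function lives in shlex.

```python
try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # Python 2.7


def mkarg(x):
    # Keep the leading space that commands.mkarg() has always returned.
    return ' ' + quote(x)


# No backslash is inserted, even when the argument holds both quote characters.
print(mkarg('x"y\'z'))
```

Because no backslash is ever added, a multi-byte lead byte in the argument cannot absorb an escape character the way it can with the current implementation.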
History
Date                 User            Action  Args
2014-10-06 12:04:34  jwilk           set     messages: + msg228667
2014-10-06 12:03:45  jwilk           set     messages: + msg228665
2014-10-01 17:20:07  glyph           set     nosy: + glyph; messages: + msg228095
2014-09-05 18:34:16  jwilk           set     files: + test.sh; messages: + msg226439
2014-09-05 15:49:50  r.david.murray  set     messages: + msg226426
2014-09-05 13:45:12  mirabilos       set     nosy: + mirabilos; messages: + msg226423
2014-08-12 18:34:07  r.david.murray  set     nosy: + r.david.murray; messages: + msg225237
2014-08-12 18:13:05  jwilk           create