shlex.split() does not tokenize like the shell #43667

Closed
robodan mannequin opened this issue Jul 13, 2006 · 32 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@robodan
Mannequin

robodan mannequin commented Jul 13, 2006

BPO 1521950
Nosy @vsajip, @ericvsmith, @ezio-melotti, @merwok, @bitdancer
Files
  • ref_shlex.py
  • test_shlex.diff
  • changes-tests-docs.diff: Patch showing changes, tests and docs
  • changes-after-feedback.diff: Changes after feedback from Éric
  • changes-after-more-feedback.diff: Changes following feedback from R. David Murray
  • changes-after-yet-more-feedback.diff: Changes following more feedback from R. David Murray
  • incorporating-issue-21999.diff
  • refresh-2016.diff: Updated patch for 3.6 and incorporating SilentGhost's comments.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/vsajip'
    closed_at = <Date 2016-07-29.21:35:50.925>
    created_at = <Date 2006-07-13.17:44:33.000>
    labels = ['type-feature', 'library']
    title = 'shlex.split() does not tokenize like the shell'
    updated_at = <Date 2016-07-29.21:35:50.922>
    user = 'https://bugs.python.org/robodan'

    bugs.python.org fields:

    activity = <Date 2016-07-29.21:35:50.922>
    actor = 'python-dev'
    assignee = 'vinay.sajip'
    closed = True
    closed_date = <Date 2016-07-29.21:35:50.925>
    closer = 'python-dev'
    components = ['Library (Lib)']
    creation = <Date 2006-07-13.17:44:33.000>
    creator = 'robodan'
    dependencies = []
    files = ['23780', '23781', '24158', '24590', '25365', '25809', '36772', '43831']
    hgrepos = ['99']
    issue_num = 1521950
    keywords = ['patch']
    message_count = 32.0
    messages = ['60940', '115462', '115482', '148272', '148298', '148338', '148352', '148360', '148405', '148410', '148413', '148417', '150761', '153823', '153882', '153883', '154031', '154056', '154064', '158932', '158934', '158956', '159016', '159019', '162157', '162170', '207023', '266823', '270949', '270952', '271456', '271651']
    nosy_count = 9.0
    nosy_names = ['vinay.sajip', 'eric.smith', 'robodan', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'cvrebert', 'python-dev', 'Andrey.Kislyuk']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1521950'
    versions = ['Python 3.6']


    When shlex.split defines tokens, it doesn't properly
    interpret ';', '&', and '&&'. These should always be
    placed in a separate token (unless inside a string).

    The shell treats the following as identical cases, but
    shlex.split doesn't:

    echo hi&&echo bye
    echo hi && echo bye

    echo hi;echo bye
    echo hi ; echo bye

    echo hi&echo bye
    echo hi & echo bye

    shlex.split makes these cases ambiguous:

    echo 'foo&'
    echo foo&

    echo '&&exit'
    echo &&exit
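
For reference, the behaviour described in this report can be reproduced with a minimal sketch that uses only the default `shlex.split()`. The variable names are illustrative and not part of the report.

```python
import shlex

# Without surrounding whitespace, the operator stays glued to its neighbours.
glued = shlex.split("echo hi&&echo bye")
spaced = shlex.split("echo hi && echo bye")
print(glued)   # ['echo', 'hi&&echo', 'bye']
print(spaced)  # ['echo', 'hi', '&&', 'echo', 'bye']

# Quoted and unquoted '&' produce the same token list, so the caller
# cannot tell a literal '&' from a control operator.
quoted = shlex.split("echo 'foo&'")
unquoted = shlex.split("echo foo&")
print(quoted == unquoted)  # True
```

The last comparison is the ambiguity the report refers to: once quotes are stripped, both inputs look identical.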

    @robodan robodan mannequin added the stdlib Python modules in the Lib dir label Jul 13, 2006
    @devdanzin devdanzin mannequin added the type-feature A feature request or enhancement label Mar 30, 2009
    @merwok
    Member

    merwok commented Sep 3, 2010

    Thanks for the report. Would you like to work on a patch, or translate your examples into unit tests?

    The docs do not mention “&” at all, and platform discrepancies have to be taken into account too, so I really don’t know if this is a bug fix for the normal mode, the POSIX mode, or a feature request requiring a new argument to the shlex function to preserve compatibility.

    @robodan
    Mannequin Author

    robodan mannequin commented Sep 3, 2010

    It's been a while since I looked at this.  I'm not really in a
    position to contribute code/tests right now; but I can comment.

    I don't think POSIX mode existed when I first reported this, but
    that's where it makes sense.  I think all POSIX shells (Bourne, C,
    Korn) will behave the same way for the issues mentioned.

    There are really two cases in one bug.

    The first part is that the shell will split tokens at characters that
    shlex doesn't.  The handling of &, |, ;, >, and < could be done by
    adjusting the definition of shlex.wordchars.  The shell may also
    understand things like: &&, ||, |&, and >&.  The exact definition of
    these depends on the shell, so maybe it's best to just split them out
    as separate tokens and let the user figure out the compound meanings.

    The proper handling of quotes/escapes requires some kind of new
    interface.  You need to distinguish between tokens that were modified
    by the quote/escape rules and those that were not.  One suggestion is
    to add a new method as such:

    shlex.get_token2()
       Return a tuple of the token and the original text of the token
    (including quotes and escapes).  Otherwise, this is the same as
    shlex.get_token().

    Comparing the two values for equality (or maybe identity) would tell
    you if something special was going on.  You can always pass the second
    value to a reconstructed command line without losing any of the
    original parsing information.

    -Dan
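
The `get_token2()` method suggested above is a proposal, not an existing API. As a rough approximation with what `shlex` already offers, non-POSIX mode leaves quote characters in the returned token, which is enough to tell a quoted `&` from a bare one. The sketch below assumes that behaviour, and `raw_tokens` is a hypothetical helper name.

```python
import shlex

def raw_tokens(line):
    """Tokenize in non-POSIX mode, which leaves quote characters in place."""
    return list(shlex.shlex(line, posix=False))

quoted = raw_tokens("echo 'foo&'")
unquoted = raw_tokens("echo foo&")
print(quoted)    # ['echo', "'foo&'"]
print(unquoted)  # ['echo', 'foo', '&']
```

This only shows that the information is recoverable. It does not return the (token, original text) pair described in the comment.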



    @merwok
    Member

    merwok commented Nov 24, 2011

    Thanks for the comments.

    > There are really two cases in one bug.
    > The first part is that the shell will split tokens at characters that shlex doesn't. The handling
    > of &, |, ;, >, and < could be done by adjusting the definition of shlex.wordchars. The shell may
    > also understand things like: &&, ||, |&, and >&. The exact definition of these depends on the
    > shell, so maybe it's best to just split them out as separate tokens and let the user figure out the
    > compound meanings.

    Yes. I think that the main use of shlex is really to parse a line into chunks with a way to embed spaces; it’s intended to parse a program command line (“prog --blah "value stillthesamevalue" "arg samearg"”), but not necessarily a full shell line (with & and | and whatnot). When people have a line containing & and |, then they need a shell to execute it, so they would not call shlex.split but just pass the full line to os.system or subprocess.Popen. Do you remember what use cases you had when you opened this report?

    > The proper handling of quotes/escapes requires some kind of new interface. You need to distinguish
    > between tokens that were modified by the quote/escape rules and those that were not.

    I don’t see why I would care about quotes in the result of shlex.split.

    See also bpo-7611.

    @robodan
    Mannequin Author

    robodan mannequin commented Nov 25, 2011

    Of course, that's how it's used. That's all it can do right now.

    I was splitting and combining commands (using ;, &&, and ||) and
    then running the resulting (mega) one liners over ssh. It still gets
    run by a shell, but I was specifying the control flow.

    It's kind of like a makefile command block. You want to be able to
    specify if a failure aborts the sequence, or is ignored (&& vs ;).
    Sometimes there are fallback commands (via ||). Of course, you can
    also group using ().

    Once things are split properly, then understanding the shell control
    characters is straightforward. In my mind, shlex.split() should
    either be as close to shell syntax as possible, or have a clear
    explanation of what is different (and why).

    I ended up doing my own parsing. I'm not actually at that company
    anymore, so I can't pull up the code.

    I'll see if I can come up with a reference case and maybe a unittest
    this weekend (that's really the only time I'll have to dig into it).

    -Dan



    @merwok
    Member

    merwok commented Nov 25, 2011

    > Of course, that's how it's used. That's all it can do right now.

    :) What I meant is that it is *meant* to be used in this way.

    > I was splitting and combining commands (using ;, &&, and ||) and then running the resulting
    > (mega) one liners over ssh. It still gets run by a shell, but I was specifying the control flow.

    Thank you for the reply. It is indeed a valuable use case to pass a command line as one string to ssh, and the split/quote combo should round-trip and be useful for this usage.

    > I'll see if I can come up with a reference case and maybe a unittest this weekend

    Great! A new argument (with a default value which gets us the previous behavior) will probably be needed, to preserve backward compatibility.
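
To illustrate the ssh use case, here is a minimal sketch of building a one-liner from separate commands and checking that quoting round-trips. It assumes `shlex.quote()` (available since Python 3.3). `build_one_liner` and the sample commands are hypothetical and are not taken from the reporter's code.

```python
import shlex

def build_one_liner(commands, operator="&&"):
    """Quote each argument, then join whole commands with a shell operator."""
    quoted = [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
    return f" {operator} ".join(quoted)

commands = [["mkdir", "-p", "my dir"], ["echo", "it's done"]]
line = build_one_liner(commands)
print(line)

# Each quoted command splits back into the original argument list.
for cmd in commands:
    rebuilt = shlex.split(" ".join(shlex.quote(arg) for arg in cmd))
    assert rebuilt == cmd
```

Because the operator here is surrounded by spaces, `shlex.split(line)` also recovers it as its own token. The problem discussed in this issue only appears when the operator is not whitespace-delimited.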

    @robodan
    Mannequin Author

    robodan mannequin commented Nov 25, 2011

    I've attached a diff to test_shlex.py and a script that I used to
    verify what the shells actually do.
    Both are relative to Python-3.2.2/Lib/test

    I'm completely ignoring the quotes issue for now. That should
    probably be an enhancement. I don't think it really matters until the
    parsing issues are resolved.

    ref_shlex is python 2 syntax. python -3 shows that it should convert cleanly.
    ./ref_shlex.py
    It will run by default against /bin/*sh
    If you don't want that, do something like: export SHELLS='/bin/sh,/bin/csh'
    It runs as a unittest. So you will only see dots if all shells do
    what it expects. Some shells are flaky (e.g. zsh, tcsh), so you may
    need to run it multiple times.

    Getting this into the mainline will be interesting. I would think it
    would take some community discussion. I may be able to convince
    people that the current behaviour is wrong, but I can't tell you what
    will break if it is "fixed". And should the fix be the default? As
    you mentioned, it depends on what people expect it to do and how it is
    currently being used. I see the first step as presenting a clear case
    of how it should work.

    -Dan



    @robodan
    Mannequin Author

    robodan mannequin commented Nov 25, 2011

    I just realized that I left out a major case. The shell will also
    split (). I think this is now complete. If you do "man bash" and
    skip down to DEFINITIONS, it lists all the control characters.

    I've attached updated versions of ref_shlex.py and test_shlex.diff.
    They replace the previous ones.

    -Dan


    @merwok
    Member

    merwok commented Nov 26, 2011

    Thanks for the diff and test. (I removed the older versions; there are “edit” links in the list of files leading to pages where it’s possible to remove them, if one has the required permissions.)

    Your script passes with dash, which is probably the most POSIX-compliant shell we can find. (bash has extensions, zsh/csh don’t use the POSIX shell language, so I think the behavior of dash should be our reference, not the bash man page.)

    > I may be able to convince people that the current behaviour is wrong, but I can't tell you what will
    > break if it is "fixed". And should the fix be the default? As you mentioned, it depends on what
    > people expect it to do and how it is currently being used.

    python-dev takes compatibility seriously. Some things are clearly bugs and we fix them, even if it will break buggy code out there. For example, we recently fixed bugs in HTML parsing: We had a specification to decide that they were really bugs, and we judged that no sane program could be relying on the exact behavior of the parser. shlex is another case; in my opinion, it’s been used for years to implement parsing similar, but not identical in all cases, to the shell’s, and as there is code out there that depends on the current behavior of shlex and does not need to support && || ; ( ), if we add support for these tokens we should not break the existing code. Given that we can’t test all programs that use shlex, I think we’ll have to add a new parameter, with a default value which gets us the previous behavior, as I said in my previous message.

    (BTW, would you mind editing the quoted section when you reply by email? Otherwise we get unhelpful, distracting walls of quoted texts. Thanks in advance.)

    @robodan
    Mannequin Author

    robodan mannequin commented Nov 26, 2011

    On Sat, Nov 26, 2011 at 7:12 AM, Éric Araujo <report@bugs.python.org> wrote:

    > Your script passes with dash, which is probably the most POSIX-compliant shell we can find.  (bash has extensions, zsh/csh don’t use the POSIX shell language, so I think the behavior of dash should be our reference, not the bash man page.)

    I was just looking for a reference where I didn't have to sift through
    tons of documentation. Most systems have bash. Before that I was
    just working from experience (I've done a lot of shell scripting).

    > there is code out there that depends on the current behavior of shlex and does not need to support && || ; ( ), if we add support for these tokens we should not break the existing code.

    Here's a thought on how that might work (just brainstorming). shlex
    uses a series of character strings to drive its parsing: whitespace,
    escape, quotes. Add another one: control = '();<>|&'. If it is unset
    (by default?), then the behavior is as before. If it is set, then
    shlex will output any character in control as a separate token.

    There might be a shell specific script (or maybe it's left to the
    user) that decides that certain tokens can be recombined: '&&', '||',
    '|&', '>>', etc. This code is pretty simple: walk the token
    sequence, if you see a two token pair, pop the second and combine it
    into the first.

    -Dan
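
The split-then-recombine idea brainstormed above can be sketched as follows. This is not the patch that was eventually committed. The `COMPOUND` table and the `recombine` helper are hypothetical, and the example relies on the default (non-POSIX) `shlex.shlex` lexer returning each non-word character as its own token.

```python
import shlex

# Hypothetical table of two-character operators to rebuild after splitting.
COMPOUND = {"&&", "||", "|&", ">>"}

def recombine(tokens, compound=COMPOUND):
    """Merge adjacent single-character tokens that form a known operator."""
    merged = []
    for tok in tokens:
        if merged and merged[-1] + tok in compound:
            merged[-1] += tok
        else:
            merged.append(tok)
    return merged

# Non-POSIX shlex already emits each non-word character as its own token.
single = list(shlex.shlex("echo hi&&echo bye||echo no"))
print(single)
print(recombine(single))
```

One limitation of doing this in a second pass is that whitespace information is gone by then, so `a & & b` would be merged the same way as `a && b`.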

    @merwok
    Member

    merwok commented Nov 26, 2011

    > I was just looking for a reference where I didn't have to sift through tons of documentation.

    Sure :) That’s why I suggest using dash for quick tests and relying on the work of other people who did read the POSIX spec. I’ll have to check it too before committing a patch.

    > shlex uses a series of character strings to drive its parsing: whitespace, escape, quotes.
    > Add another one: control = '();<>|&'. If it is unset (by default?), then the behavior is as
    > before.

    So we would need to add a Shlex subclass to the module to provide the new behavior. I think I prefer a new argument, because we can just extend the existing class and functions instead of adding subtly differing duplicates.

    > If it is set, then shlex will output any character in control as a separate token.

    Unless it is part of a quoted segment, right? (See bpo-7611 for 'foo#bar' vs. 'foo #bar').

    > There might be a shell specific script (or maybe it's left to the user)
    > that decides that certain tokens can be recombined:

    Seems too much complexity. I really prefer if we agree on one command parsing behavior (POSIX, i.e. dash) and improve shlex to support that. People wanting zsh rules can write their own subclass.

    > '&&', '||', '|&', '>>', etc.

    Wouldn’t it be more correct to consider them different tokens? I don’t have a formal training in CS or programming, so I’m not sure that my definition is correct at all, but in my mind a token is a unit, and thus & and && are two different things.

    @robodan
    Mannequin Author

    robodan mannequin commented Nov 26, 2011

    > Sure :)  That’s why I suggest using dash for quick tests and rely on the work of other people who did read the POSIX spec.  I’ll have to check it too before committing a patch.

    The point of ref_shlex.py is that all shells act the same for common
    cases and shlex doesn't match any of them. The only real split is
    that csh-based shells do some things differently than sh-based shells
    ('2>' vs '&>').

    > shlex uses a series of character strings to drive it's parsing:  whitespace, escape, quotes.
    > Add another one: control = '();<>|&'.  If it is unset (by default?), then the behavior is as
    > before.
    > So we would need to add a Shlex subclass to the module to provide the new behavior.  I think I prefer a new argument, because we can just extend the existing class and functions instead of adding subtly differing duplicates.

    You don't have to do a subclass (although that might have some
    advantages). You could do something like:
    def shlex(s, comments=False, posix=True, control=False):
        ...
        if control:
            if control is True:
                self.control = '();<>|&'
            else:
                self.control = control  # let user specify their own control set

    > If it is set, then shlex will output any character in control as a separate token.
    > Unless it is part of a quoted segment, right?  (See bpo-7611 for 'foo#bar' vs. 'foo #bar').

    Correct, quotes wouldn't change.

    > There might be a shell specific script (or maybe it's left to the user)
    > that decides that certain tokens can be recombined:
    > Seems too much complexity.  I really prefer if we agree on one command parsing behavior (POSIX, i.e. dash) and improve shlex to support that.  People wanting zsh rules can write their own subclass.

    shlex is a pretty simple lexer (as lexers go), and I wouldn't want it
    to get complicated. It's easier in the current code structure to
    split everything and then re-join as needed. This also allows you to
    select sh vs csh joining rules (e.g. '|&' means different things in sh
    vs csh). Every shell that I've seen follows one of those two flavors
    for syntax.

    > '&&', '||', '|&', '>>', etc.
    > Wouldn’t it be more correct to consider them different tokens?  I don’t have a formal training in CS or programming, so I’m not sure that my definition is correct at all, but in my mind a token is a unit, and thus & and && are two different things.

    Ideally, the final tokens have exact meanings. It is easier to write
    handler code for '&&' than ('&', '&'). This is just a case of whether
    the parser joins them together or it's done in a second step. The
    current code doesn't do much look ahead, so it's hard for the lexer to
    produce things like '&&' directly.

    -Dan

    @vsajip
    Member

    vsajip commented Jan 6, 2012

    I've made a patch which implements this functionality, together with docs and tests. Please review.

    @merwok
    Member

    merwok commented Feb 21, 2012

    This time you should have received an email from Rietveld, I made sure that your ID was expanded to an email address.

    I like all the suggestions you made in reply to my comments.

    @vsajip
    Member

    vsajip commented Feb 21, 2012

    I updated the patch to reflect Éric's comments on Rietveld, but there are also some other changes:

    Previously when punctuation chars were set, wordchars was being augmented by '-'. This was incomplete, so the augmentation is now with '~-./*?=' which allows for wildcards, filename chars and argument flags.

    I added a token_type attribute whose value is 'a' for alphanumeric tokens and 'c' for punctuation tokens. This token type is internally tracked anyway - we just expose it now. It is needed for when multiple punctuation tokens need to be disambiguated, because we might return two logically separate punctuation tokens as one if they are not separated by whitespace in the source being tokenised.

    New attributes and the changes to wordchars have been documented, and a test added for token_type return values.
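
The effect of augmenting wordchars with '~-./*?=' is easiest to see on a line full of paths and flags. The sketch below assumes the `punctuation_chars` argument as it exists in the `shlex` module released with Python 3.6 and later. The patch under review at this point in the thread may have differed in detail, and the `token_type` attribute mentioned above is deliberately not used here.

```python
import shlex

line = "~/a && b-c --color=auto || d *.py?"

# Default non-POSIX lexer: '~', '/', '-', '=', '*', '.', '?' all end a word.
plain = list(shlex.shlex(line))

# With punctuation_chars=True those characters stay inside words.
shell_like = list(shlex.shlex(line, punctuation_chars=True))
print(shell_like)
# ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?']
```

Without the augmentation, a flag such as `--color=auto` would be broken into several tokens, which would make the shell-like mode awkward for real command lines.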

    @vsajip
    Member

    vsajip commented Feb 21, 2012

    Plus I also changed a few instances of the anachronism

    a = a + b

    to

    a += b

    @merwok
    Member

    merwok commented Feb 23, 2012

    > Previously when punctuation chars were set, wordchars was being augmented by '-'. This was
    > incomplete, so the augmentation is now with '~-./*?=' which allows for wildcards, filename
    > chars and argument flags.

    I did not fully get what you meant here, but the example you added to the doc made it clear. Is this covered by tests?

    Overall great patch! Dan, do you have time to test it (or read the new examples in the patch) to tell us if it meets what you wanted?

    @vsajip
    Member

    vsajip commented Feb 23, 2012

    Éric Araujo <merwok@netwok.org> added the comment:

    > I did not fully get what you meant here, but the example you added to the doc made it clear.  Is this covered by tests?

    Yes, I believe that testSyntaxSplitCustom covers this.

    > Overall great patch!  Dan, do you have time to test it (or read the new examples in the patch) to tell us if it meets what you wanted?

    Thanks! It was a bit fiddly, shlex is somewhat difficult to extend cleanly. I developed this functionality for a subprocess ease-of-use-wrapper module called sarge, and I had to basically copy and modify the whole read_token method :-(

    @robodan
    Mannequin Author

    robodan mannequin commented Feb 23, 2012

    I haven't been following this much. Sorry. My day job isn't in this area any more (and I'm stuck using 2.4 :-().

    Looking at the docs, I notice the "old" is different from what it used to be. Notably: 'e;' gets split into two tokens; and ">'abc';" gets split into 3. I'm pretty sure that baseline code doesn't split those at all. So there is a question of if "old" is fully backward compatible.

    The "new" functionality looks great. That's what I was looking for when I filed the bug.

    Thank you!
    -Dan

    @vsajip
    Member

    vsajip commented Apr 21, 2012

    I've received no comments on the latest revision of my patch (incorporating comments on the previous version); is it OK to commit this?

    @bitdancer
    Member

    I'd like to take a look at this (I wasn't aware of it before). I'll try to do that some time in the next 24 hours, and if I don't you shouldn't wait for me :)

    Did you address Dan's concern about 'old' possibly not matching the old behavior completely?

    @vsajip
    Member

    vsajip commented Apr 22, 2012

    I believe Dan meant that the behaviour of shlex.split() now is different from what it was when he first raised the issue (in July 2006). Looking at the default branch of CPython, this is what I see:

    Python 3.3.0a2+ (default:ff6593aa8376, Apr 22 2012, 12:39:08) 
    [GCC 4.3.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import shlex
    >>> list(shlex.shlex('e;'))
    ['e', ';']
    >>> list(shlex.shlex(">'abc';"))
    ['>', "'abc'", ';']

    Likewise, on the 2.6 branch:

    Python 2.6.8+ (unknown, Apr 22 2012, 12:44:43) 
    [GCC 4.3.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import shlex
    >>> list(shlex.shlex('e;'))
    ['e', ';']
    >>> list(shlex.shlex(">'abc';"))
    ['>', "'abc'", ';']

    So from what Dan is saying, it would seem that he is saying that shlex behaviour (before my patch being applied) is different now to how he remembers it - not that the patch introduces any incompatibility.

    Still, another set of eyeballs on the patch would be good.

    @merwok
    Member

    merwok commented Apr 23, 2012

    > I'd like to take a look at this (I wasn't aware of it before).

    Are you interested in shlex in general or only this bug? If the former, then I’ll try to remember to make you nosy on future issues.

    BTW, what is the shlex unicode bug you mentioned a few times on Rietveld? The one I know is fixed now.

    @bitdancer
    Member

    I am interested in shell stuff in general.

    The unicode bug is bpo-1170.

    @vsajip
    Member

    vsajip commented Jun 2, 2012

    I've updated the patch following comments by RDM - it probably could do with a code review (now that I've addressed RDM's comments on the docs).

    @bitdancer
    Member

    Review, including a code-but-not-algorithm review :), posted.

    @vsajip
    Member

    vsajip commented Dec 28, 2013

    Let's hope we can get this into 3.5. I updated my patch a while ago to address RDM's comments.

    @AndreyKislyuk
    Mannequin

    AndreyKislyuk mannequin commented Jun 1, 2016

    Is there any chance of getting this into 3.6? We are still in a situation where the shlex module misleads developers into believing that it has functionality to parse things the way the shell does. I've had to vendor the copy of shlex with patches from this bug applied (thanks Vinay!)

    @vsajip
    Member

    vsajip commented Jul 21, 2016

    This has been knocking around since 3.3, but never got enough attention to make it in. Barring objections from anyone, I'd like to commit this patch once I check that it applies cleanly against 3.6, before we get into 3.6 beta.

    @bitdancer
    Member

    No objection from me. I'm not likely to have the time to give it the kind of thorough review I'd *like* to, but I don't think it is really needed.

    @vsajip
    Member

    vsajip commented Jul 27, 2016

    Okay, I've updated with a new patch addressing SilentGhost's comments, and addressed the comments on that patch. If I don't hear any objections by Friday, I plan to commit this change.

    @vsajip vsajip self-assigned this Jul 27, 2016
    @python-dev
    Mannequin

    python-dev mannequin commented Jul 29, 2016

    New changeset ea99e2f0b829 by Vinay Sajip in branch 'default':
    Closes bpo-1521950: Made shlex parsing more shell-like.
    https://hg.python.org/cpython/rev/ea99e2f0b829
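
For readers arriving from the original report, the examples from the first comment can be rerun against the feature as released. This is a minimal sketch, assuming Python 3.6 or later, where `shlex.shlex` accepts `punctuation_chars`. `shell_tokens` is a hypothetical helper name.

```python
import shlex

def shell_tokens(line):
    """Tokenize with POSIX quoting rules and shell punctuation split out."""
    return list(shlex.shlex(line, posix=True, punctuation_chars=True))

print(shell_tokens("echo hi&&echo bye"))   # ['echo', 'hi', '&&', 'echo', 'bye']
print(shell_tokens("echo hi;echo bye"))    # ['echo', 'hi', ';', 'echo', 'bye']
print(shell_tokens("echo 'foo&'"))         # ['echo', 'foo&']
print(shell_tokens("echo foo&"))           # ['echo', 'foo', '&']
```

The default behaviour of `shlex.split()` is unchanged, so existing callers keep the old tokenization unless they opt in.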

    @python-dev python-dev mannequin closed this as completed Jul 29, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022