classification
Title: shlex behaves unexpectedly if newlines are not whitespace
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2, Python 3.1, Python 3.0, Python 2.7, Python 2.6, Python 2.5, Python 2.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gagenellina, jjdmol2 (2)
Priority: Keywords: patch

Created on 2009-10-09 08:11 by jjdmol2, last changed 2009-10-10 03:15 by gagenellina.

Files
File name Uploaded Description
lexertest.py jjdmol2, 2009-10-09 08:11
lexer-newline-tokens.patch jjdmol2, 2009-10-09 08:25
Messages (3)
msg93776 - Author: Jan David Mol (jjdmol2) Date: 2009-10-09 08:11
The shlex module does not behave as expected in the presence of
comments when newlines are not whitespace (i.e. '\n' has been removed
from the lexer's whitespace attribute). An example (attached):

>>> from shlex import shlex
>>> 
>>> lexer = shlex("a \n b")
>>> print(",".join(lexer))
a,b
>>> 
>>> lexer = shlex("a # comment \n b")
>>> print(",".join(lexer))
a,b
>>> 
>>> lexer = shlex("a \n b")
>>> lexer.whitespace = " "
>>> print(",".join(lexer))
a,
,b
>>> 
>>> lexer = shlex("a # comment \n b")
>>> lexer.whitespace = " "
>>> print(",".join(lexer))
a,b

Now where did my newline go? The comment ate it, even though the docs
seem to indicate that the newline is not part of the comment itself:

shlex.commenters:
    The string of characters that are recognized as comment beginners.
    All characters from the comment beginner to end of line are ignored.
    Includes just '#' by default.
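For reference, the same comparison can be written as a self-contained Python 3 script. The helper name tokens is just for illustration. The outputs follow from how shlex discards a comment: it calls readline() on its input stream, and that call also consumes the line terminator, so the '\n' never reaches the tokenizer.

```python
import shlex


def tokens(text):
    """Return all tokens, treating only spaces (not newlines) as whitespace."""
    lexer = shlex.shlex(text)
    lexer.whitespace = " "
    return list(lexer)


print(tokens("a \n b"))            # ['a', '\n', 'b']
print(tokens("a # comment \n b"))  # ['a', 'b']: the comment consumed the '\n'
```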
msg93778 - Author: Jan David Mol (jjdmol2) Date: 2009-10-09 08:25
Attached is a patch which fixes this for me. When it encounters a
comment, it basically falls through as if a '\n' had been read. That
may be a bit of a hack (who says '\n' is the only newline character in
there, and not '\r'?), but I'll leave the more intricate cases to you
experts.
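The patch itself is not reproduced here. As a hypothetical alternative that gets a similar effect without modifying shlex, one can hand the lexer a custom input object whose readline() stops just before the '\n', so the next read(1) returns the newline as a token. This is a sketch under assumptions: the class name KeepNewlineReader is invented, it relies on CPython's shlex reading its stream only through read(1) and a bare readline(), and it is only meant for inputs like the ones in this report. A comment that immediately follows a word (e.g. "a# c") takes a different code path in shlex and is not addressed, and lexer.lineno gets incremented twice per commented line.

```python
import shlex


class KeepNewlineReader:
    """Hypothetical input stream for shlex: readline() leaves the '\\n' unread.

    shlex discards a comment by calling instream.readline() and ignoring the
    result; stopping before the newline lets the following read(1) return it.
    """

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self, n=-1):
        if n < 0:
            n = len(self._text) - self._pos
        chunk = self._text[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def readline(self):
        end = self._text.find("\n", self._pos)
        if end == -1:
            # Comment on the last line with no trailing newline.
            end = len(self._text)
        line = self._text[self._pos:end]
        self._pos = end  # the newline (if any) stays in the stream
        return line


if __name__ == "__main__":
    lexer = shlex.shlex(KeepNewlineReader("a # comment \n b"))
    lexer.whitespace = " "
    print(list(lexer))  # ['a', '\n', 'b']
```

Because readline() also copes with a missing final newline, a comment on the last line simply ends the input.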
msg93820 - Author: Gabriel Genellina (gagenellina) Date: 2009-10-10 03:15
If you could add some tests to Lib/test/test_shlex.py, this patch is
more likely to be accepted.

Also, consider the case where the comment is on the last line of input
and there is no trailing \n character.
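Tests along these lines could cover both points. This is a hypothetical sketch in plain unittest style: the class and method names are made up and not taken from Lib/test/test_shlex.py. The newline-after-comment test is marked as an expected failure because stock shlex currently drops that '\n'; the last-line case should already pass, since the comment just runs to end of input.

```python
import shlex
import unittest


def lex(text):
    """Tokenize text with newlines treated as tokens, not whitespace."""
    lexer = shlex.shlex(text)
    lexer.whitespace = " "
    return list(lexer)


class CommentNewlineTest(unittest.TestCase):
    def test_newline_without_comment(self):
        self.assertEqual(lex("a \n b"), ["a", "\n", "b"])

    @unittest.expectedFailure  # stock shlex consumes the '\n' with the comment
    def test_comment_keeps_newline(self):
        self.assertEqual(lex("a # comment \n b"), ["a", "\n", "b"])

    def test_comment_on_last_line_without_newline(self):
        # No trailing '\n': nothing follows the comment.
        self.assertEqual(lex("a # comment"), ["a"])


if __name__ == "__main__":
    unittest.main()
```

Once a fix lands, the expectedFailure decorator would be removed.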
History
Date                 User         Action  Args
2009-10-10 03:15:30  gagenellina  set     nosy: + gagenellina
                                          messages: + msg93820
2009-10-09 09:41:02  jjdmol2      set     components: + Library (Lib)
2009-10-09 08:25:20  jjdmol2      set     files: + lexer-newline-tokens.patch
                                          keywords: + patch
                                          messages: + msg93778
2009-10-09 08:11:52  jjdmol2      create