classification
Title: shlex.shlex.lineno reports a different number depending on the previous token
Type: behavior
Stage: resolved
Components:
Versions: Python 3.6, Python 3.2, Python 3.3, Python 2.7, Python 2.6
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: shlex.split() does not tokenize like the shell (issue 1521950)
Assigned To:
Nosy List: cheryl.sabella, daniel-s, hoadlck
Priority: normal
Keywords:

Created on 2013-09-29 02:35 by daniel-s, last changed 2019-01-04 17:25 by cheryl.sabella. This issue is now closed.

Files
File name Uploaded Description
shlex_line.py daniel-s, 2013-09-29 02:35 The code example from the comment.
Messages (3)
msg198561 - (view) Author: Daniel (daniel-s) Date: 2013-09-29 02:35
See the example below (also attached).

First example: The lineno reported just after "word2" is pulled is 2.
Second example: The lineno reported just after "," is pulled is still 1.

This behaviour seems inconsistent. The lineno should increment either when the last token of a line is pulled, or when the first token of the next line is pulled (in my opinion, preferably the former). It should not behave differently depending on the type of token (alphanumeric vs. comma).

I have repeated this on 

Also, does Issue 16121 relate to this?


#!/usr/bin/env python
import shlex

# First example: the line ends with an alphanumeric token.
first = shlex.shlex("word1 word2\nword3")
print(first.get_token())
print(first.get_token())
print("line no", first.lineno)
print("")

# Second example: the line ends with a comma.
second = shlex.shlex("word1 word2,\nword3")
print(second.get_token())
print(second.get_token())
print(second.get_token())
print("line no", second.lineno)


# Output:
# word1
# word2
# line no 2
#
# word1
# word2
# ,
# line no 1
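
To see exactly where the counter moves, it can help to sample lineno after every token rather than once per example. The following is only a sketch and is not part of the attached shlex_line.py; the helper name `trace_lineno` is made up for illustration. It uses the default (non-POSIX) shlex settings, as the script above does.

```python
import shlex


def trace_lineno(text, **shlex_kwargs):
    """Return (token, lineno) pairs, sampling lineno right after each get_token().

    Hypothetical helper for illustration; not part of the shlex module.
    """
    lexer = shlex.shlex(text, **shlex_kwargs)
    pairs = []
    while True:
        token = lexer.get_token()
        if token == lexer.eof:  # '' in the default non-POSIX mode
            break
        pairs.append((token, lexer.lineno))
    return pairs


# The two inputs from the report above.
first_trace = trace_lineno("word1 word2\nword3")
second_trace = trace_lineno("word1 word2,\nword3")

for label, trace in (("first", first_trace), ("second", second_trace)):
    print(label)
    for token, lineno in trace:
        print(f"  {token!r:<8} lineno={lineno}")
```

Going by the output reported above, the first trace should show lineno 2 right after 'word2', while the second should still show lineno 1 after ','.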
msg198562 - (view) Author: Daniel (daniel-s) Date: 2013-09-29 02:38
From the unfinished sentence:

I have repeated this on all versions of shlex on which I have tried. Including Python 2.6, 2.7, 3.2 and 3.3.
msg332986 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-01-04 17:25
A `punctuation_chars` parameter was added to the shlex.shlex class in issue 1521950 (implemented for 3.6). The comma is not one of the default punctuation characters, so setting punctuation_chars=True won't change the behavior. However, passing `punctuation_chars=","` makes lineno advance to 2 as soon as the comma is pulled, which matches the first example in this issue, as the session below shows.


>>> second = shlex.shlex('word1 word2,\nword3', punctuation_chars=',')
>>> second.get_token()
'word1'
>>> second.lineno
1
>>> second.get_token()
'word2'
>>> second.lineno
1
>>> second.get_token()
','
>>> second.lineno
2
>>>
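
The three configurations mentioned above can also be compared side by side. This is a rough sketch assuming Python 3.6 or later, where `punctuation_chars` exists; the helper name `lineno_after` is hypothetical and only serves this comparison.

```python
import shlex

TEXT = "word1 word2,\nword3"


def lineno_after(text, wanted, **shlex_kwargs):
    """Return lexer.lineno right after `wanted` is returned, or None if never seen.

    Hypothetical helper for illustration; not part of the shlex module.
    """
    lexer = shlex.shlex(text, **shlex_kwargs)
    while True:
        token = lexer.get_token()
        if token == lexer.eof:  # '' in the default non-POSIX mode
            return None
        if token == wanted:
            return lexer.lineno


results = {
    "default": lineno_after(TEXT, ","),
    "punctuation_chars=True": lineno_after(TEXT, ",", punctuation_chars=True),
    "punctuation_chars=','": lineno_after(TEXT, ",", punctuation_chars=","),
}
for label, lineno in results.items():
    print(label, "->", lineno)
```

Only the last configuration treats the comma as a punctuation character, so it is the only one expected to report line 2 at that point.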


Closing this as a duplicate of #1521950.
History
Date User Action Args
2019-01-04 17:25:03 | cheryl.sabella | set | status: open -> closed; superseder: shlex.split() does not tokenize like the shell; nosy: + cheryl.sabella; messages: + msg332986; resolution: duplicate; stage: resolved
2016-12-27 13:26:30 | hoadlck | set | nosy: + hoadlck; versions: + Python 3.6
2013-09-29 02:38:09 | daniel-s | set | messages: + msg198562
2013-09-29 02:35:13 | daniel-s | create |