classification
Title: csv.Sniffer.has_header doesn't escape characters used in regex
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: akuchling, davechallis, python-dev, r.david.murray, vajrasky
Priority: normal Keywords: easy, patch

Created on 2013-06-07 11:09 by davechallis, last changed 2013-06-29 22:45 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
csv_has_header.diff vajrasky, 2013-06-07 16:12 Patch to fix the sniffer detecting header with regex character
Messages (7)
msg190742 - Author: Dave Challis (davechallis) Date: 2013-06-07 11:09
When attempting to detect the presence of CSV headers, the sniffer interpolates the delimiter into a regular expression without escaping it, so re.compile raises an exception if the delimiter has a special meaning in a regex (e.g. '+' or '*').

Code to reproduce:
import csv
s = csv.Sniffer()
s.has_header('"x"+"y"')

Trace:
Traceback (most recent call last):
  File "t.py", line 4, in <module>
    s.has_header('"x"+"y"')
  File "/usr/lib64/python3.3/csv.py", line 393, in has_header
    rdr = reader(StringIO(sample), self.sniff(sample))
  File "/usr/lib64/python3.3/csv.py", line 183, in sniff
    self._guess_quote_and_delimiter(sample, delimiters)
  File "/usr/lib64/python3.3/csv.py", line 268, in _guess_quote_and_delimiter
    {'delim':delim, 'quote':quotechar}, re.MULTILINE)
  File "/home/dsc/venv/p3compat/lib64/python3.3/re.py", line 214, in compile
    return _compile(pattern, flags)
  File "/home/dsc/venv/p3compat/lib64/python3.3/re.py", line 281, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_compile.py", line 494, in compile
    p = sre_parse.parse(p, flags)
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 748, in parse
    p = _parse_sub(source, pattern, 0)
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 360, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 696, in _parse
    p = _parse_sub(source, state)
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 360, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 696, in _parse
    p = _parse_sub(source, state)
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 360, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/dsc/venv/p3compat/lib64/python3.3/sre_parse.py", line 569, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
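The trace shows the delimiter being %-interpolated into a pattern (the `{'delim':delim, 'quote':quotechar}` mapping) and passed straight to re.compile. The failure, and the effect of escaping the delimiter, can be reproduced with the re module alone. In this sketch TEMPLATE is a hypothetical, simplified pattern written in the same style, not the actual pattern from csv.py, and build_pattern and compiles are illustrative helpers. Without escaping, '+', '*' and '?' make re.compile raise re.error; after re.escape, every delimiter tried compiles.

```python
import re

# Hypothetical, simplified template (NOT the real csv.py pattern). As in the
# trace, the delimiter and quote character are %-interpolated into the regex.
TEMPLATE = r"((%(delim)s)|^)\W*%(quote)s[^%(delim)s\n]*%(quote)s"


def build_pattern(delim, quote, escape):
    """Interpolate delim into TEMPLATE, regex-escaping it first if escape is True."""
    if escape:
        delim = re.escape(delim)
    return TEMPLATE % {"delim": delim, "quote": quote}


def compiles(pattern):
    """Return True if re.compile accepts pattern, False if it raises re.error."""
    try:
        re.compile(pattern, re.MULTILINE)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    for delim in [",", ";", "+", "*", "?"]:
        raw = compiles(build_pattern(delim, '"', escape=False))
        escaped = compiles(build_pattern(delim, '"', escape=True))
        print(f"{delim!r}: unescaped={raw} escaped={escaped}")
```

An unescaped '+' produces the group "(+)", which fails with the same "nothing to repeat" error as in the report; inside the character class "[^+\n]" the '+' would have been harmless.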
msg190745 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-07 13:21
I doubt this is a regression, so I'm marking the other versions as well without actually testing it. Should be an easy fix.
msg190755 - Author: Vajrasky Kok (vajrasky) * Date: 2013-06-07 15:36
Attached the patch to fix the problem.
msg190756 - Author: Vajrasky Kok (vajrasky) * Date: 2013-06-07 15:40
Sorry. This one is correct. Attached the patch to fix the problem.
msg190759 - Author: Vajrasky Kok (vajrasky) * Date: 2013-06-07 16:12
Added a test for the sniffer's double-quote handling.
msg192057 - Author: Roundup Robot (python-dev) Date: 2013-06-29 22:44
New changeset 68ff68f9a0d5 by R David Murray in branch '3.3':
#18155: Regex-escape delimiter, in case it is a regex special char.
http://hg.python.org/cpython/rev/68ff68f9a0d5

New changeset acaf73e3d882 by R David Murray in branch 'default':
Merge #18155: Regex-escape delimiter, in case it is a regex special char.
http://hg.python.org/cpython/rev/acaf73e3d882

New changeset 0e1d538d36dc by R David Murray in branch '2.7':
#18155: Regex-escape delimiter, in case it is a regex special char.
http://hg.python.org/cpython/rev/0e1d538d36dc
msg192058 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-06-29 22:45
Committed, with slight modifications to the tests.  Thanks Vajrasky.
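The tests that were actually committed are not shown in this thread. As a hypothetical sketch only, a regression test along these lines would exercise the fix on an interpreter that includes the changesets above: has_header should return a bool instead of raising for a sample in the style of the reporter's, and sniff should report the special character as the delimiter. The class name, method names, and the choice of '+', '*' and '?' are assumptions for illustration.

```python
import csv
import unittest


class SnifferRegexDelimiterTest(unittest.TestCase):
    """Hypothetical regression tests for #18155; not the tests that were committed."""

    # Delimiters that are regex metacharacters and break an unescaped pattern.
    SPECIAL_DELIMITERS = "+*?"

    def test_has_header_does_not_raise(self):
        sniffer = csv.Sniffer()
        for delim in self.SPECIAL_DELIMITERS:
            with self.subTest(delim=delim):
                # Unpatched versions raised a regex compilation error here.
                result = sniffer.has_header('"x"%s"y"' % delim)
                self.assertIsInstance(result, bool)

    def test_sniff_reports_special_delimiter(self):
        sniffer = csv.Sniffer()
        for delim in self.SPECIAL_DELIMITERS:
            with self.subTest(delim=delim):
                dialect = sniffer.sniff('"x"%s"y"' % delim)
                self.assertEqual(dialect.delimiter, delim)


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(exit=False)
```

On an unpatched version, both tests are expected to error out with a regex compilation error like the one in the original report rather than fail an assertion.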
History
Date User Action Args
2013-06-29 22:45:47 r.david.murray set status: open -> closed
    resolution: fixed
    messages: + msg192058
    stage: needs patch -> resolved
2013-06-29 22:44:10 python-dev set nosy: + python-dev
    messages: + msg192057
2013-06-14 20:25:42 akuchling set nosy: + akuchling
2013-06-07 16:12:19 vajrasky set files: + csv_has_header.diff
    messages: + msg190759
2013-06-07 16:11:37 vajrasky set files: - csv_has_header.diff
2013-06-07 15:40:53 vajrasky set files: + csv_has_header.diff
    messages: + msg190756
2013-06-07 15:38:13 vajrasky set files: - csv_has_header.diff
2013-06-07 15:36:43 vajrasky set files: + csv_has_header.diff
    nosy: + vajrasky
    messages: + msg190755
    keywords: + patch
2013-06-07 13:21:53 r.david.murray set versions: + Python 2.7, Python 3.4
    nosy: + r.david.murray
    messages: + msg190745
    keywords: + easy
    stage: needs patch
2013-06-07 11:09:19 davechallis create