classification
Title: csv.Sniffer.sniff() regex error
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: jcdavis1983, mrabarnett, r.david.murray, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2017-04-25 01:51 by jcdavis1983, last changed 2018-02-09 22:03 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1273 closed jcdavis1983, 2017-04-25 01:51
PR 5601 merged serhiy.storchaka, 2018-02-09 17:09
PR 5602 merged miss-islington, 2018-02-09 18:02
PR 5603 merged serhiy.storchaka, 2018-02-09 18:11
PR 5604 merged serhiy.storchaka, 2018-02-09 18:16
Messages (13)
msg292249 - (view) Author: Jake Davis (jcdavis1983) * Date: 2017-04-25 01:51
Line 220 of Lib/csv.py has an extra `>` in the first group:

r'(?P<delim>>[^\w\n"\'])
msg292254 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-25 08:39
What is the consequence of this change? Does it change the syntax of the parser? Which kind of format wasn't parsed correctly?
msg292267 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2017-04-25 16:02
There are 4 patterns. They try to determine the delimiter and quote by looking for matches. Each pattern supposedly covers one of 4 cases:

1. Delimiter, quote, value, quote, delimiter.

2. Start of line/text, quote, value, quote, delimiter.

3. Delimiter, quote, value, quote, end of line/text.

4. Start of line/text, quote, value, quote, end of line/text.

On that basis, case 3 looks wrong because the pattern for delimiter is:

    >[^\w\n"\']

instead of the expected:

    [^\w\n"\']

Looks like a bug to me.
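
To make the effect of the stray `>` concrete, here is a minimal sketch that compares the two delimiter groups on a sample where the quoted value ends the line (case 3). The rest of the pattern (TAIL) and the sample string are assumptions written from the case description above, not text copied from Lib/csv.py.

```python
import re

# Assumption: TAIL approximates the rest of the case 3 pattern
# (optional space, quote, value, quote, end of line/text).
TAIL = r'(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'
FLAGS = re.DOTALL | re.MULTILINE

BUGGY = re.compile(r'(?P<delim>>[^\w\n"\'])' + TAIL, FLAGS)  # stray '>'
FIXED = re.compile(r'(?P<delim>[^\w\n"\'])' + TAIL, FLAGS)

sample = "a,b,'c d'\n"  # the quoted value is the last field on the line

buggy_match = BUGGY.search(sample)
fixed_match = FIXED.search(sample)

print(buggy_match)                 # None: the sample contains no '>'
print(fixed_match.group('delim'))  # ,
print(fixed_match.group('quote'))  # '
```

With the extra `>`, the delimiter group can only match a literal `>` followed by one non-word character, so ordinary input such as the sample above never hits case 3.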
msg292282 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-25 22:41
Can you please try to write a unit test to check for non-regression? Or at least give an example?
msg292290 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-04-26 01:02
If it is a bug that indicates there is at least one missing unit test :)  Maybe the OP will contribute a test.
msg292294 - (view) Author: Jake Davis (jcdavis1983) * Date: 2017-04-26 02:59
Will do! I will try to get a regression-proof test into test_csv.py in the next 24 hours. Essentially, I will make sure that the sniffer returns a positive match for each of the patterns that the regex is intended to hit.
msg292434 - (view) Author: Jake Davis (jcdavis1983) * Date: 2017-04-27 12:43
I've added some unit tests for Sniffer._guess_quote_and_delimiter(); they should prevent regression.
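
For readers who want to see what a test of this kind could look like, here is a hedged sketch. It goes through the public csv.Sniffer.sniff() method rather than the private helper, and the class name and sample strings are illustrative assumptions, one sample per case described earlier in the thread. It is not necessarily the test that was added to test_csv.py.

```python
import csv
import unittest


class SnifferQuoteAndDelimiterTest(unittest.TestCase):
    # Illustrative samples, one per case: delimiter on both sides,
    # trailing delimiter only, leading delimiter only, no delimiter.
    SAMPLES = (";'123;4';", "'123;4';", ";'123;4'", "'123;4'")

    def test_each_case_is_detected(self):
        sniffer = csv.Sniffer()
        for sample in self.SAMPLES:
            with self.subTest(sample=sample):
                # Restrict the candidate delimiters to ',' and ';'.
                dialect = sniffer.sniff(sample, delimiters=",;")
                self.assertEqual(dialect.delimiter, ";")
                self.assertEqual(dialect.quotechar, "'")
```

Saved in a test module, this can be run with `python -m unittest`. The third sample (leading delimiter, quoted value at end of text) is the one that exercises the pattern discussed in this issue.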
msg311898 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-09 17:11
Since the original author didn't respond for a long time, I have recreated PR 1273 as PR 5601.
msg311902 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-09 18:00
New changeset 2411292ba8155327125d8a1da8a4c9fa003d5909 by Serhiy Storchaka in branch 'master':
bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601)
https://github.com/python/cpython/commit/2411292ba8155327125d8a1da8a4c9fa003d5909
msg311908 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-09 22:00
New changeset 2ef69a1d45de8aa41c45d32d9ee1ff227bb1a566 by Serhiy Storchaka (Miss Islington (bot)) in branch '3.7':
bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) (GH-5602)
https://github.com/python/cpython/commit/2ef69a1d45de8aa41c45d32d9ee1ff227bb1a566
msg311909 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-09 22:01
New changeset 504f19145ca5738162d6a720fa45b364ac8c0384 by Serhiy Storchaka in branch '3.6':
[3.6] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) (GH-5603)
https://github.com/python/cpython/commit/504f19145ca5738162d6a720fa45b364ac8c0384
msg311910 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-09 22:02
New changeset e7197936c987bdf31b6b7b1dab275d1a762e03b3 by Serhiy Storchaka in branch '2.7':
[2.7] bpo-30157: Fix csv.Sniffer.sniff() regex pattern. (GH-5601) (GH-5604)
https://github.com/python/cpython/commit/e7197936c987bdf31b6b7b1dab275d1a762e03b3
msg311911 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-09 22:03
Thank you for your contribution, Jake!
History
Date User Action Args
2018-02-09 22:03:21 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg311911; stage: patch review -> resolved
2018-02-09 22:02:07 serhiy.storchaka set messages: + msg311910
2018-02-09 22:01:42 serhiy.storchaka set messages: + msg311909
2018-02-09 22:00:56 serhiy.storchaka set messages: + msg311908
2018-02-09 18:16:06 serhiy.storchaka set pull_requests: + pull_request5416
2018-02-09 18:11:25 serhiy.storchaka set pull_requests: + pull_request5415
2018-02-09 18:02:01 miss-islington set pull_requests: + pull_request5414
2018-02-09 18:00:51 serhiy.storchaka set messages: + msg311902
2018-02-09 17:11:49 serhiy.storchaka set versions: + Python 3.8, - Python 3.3, Python 3.4, Python 3.5
2018-02-09 17:11:35 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg311898
2018-02-09 17:09:59 serhiy.storchaka set keywords: + patch; stage: patch review; pull_requests: + pull_request5413
2017-04-27 12:43:09 jcdavis1983 set messages: + msg292434
2017-04-26 02:59:51 jcdavis1983 set messages: + msg292294
2017-04-26 01:02:30 r.david.murray set nosy: + r.david.murray; messages: + msg292290
2017-04-25 22:41:07 vstinner set messages: + msg292282
2017-04-25 16:02:43 mrabarnett set nosy: + mrabarnett; messages: + msg292267
2017-04-25 08:39:01 vstinner set nosy: + vstinner; messages: + msg292254
2017-04-25 03:25:04 louielu set title: csn.Sniffer.sniff() regex error -> csv.Sniffer.sniff() regex error
2017-04-25 01:51:06 jcdavis1983 create