This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author pt12lol
Recipients pt12lol
Date 2021-07-19.17:40:33
Message-id <1626716433.69.0.308565611704.issue44677@roundup.psfhosted.org>
Content
Consider the following CSV content: "a|b\nc| 'd\ne|' f". The actual delimiter here is the '|' character, but csv.Sniffer detects ' ' (a space) instead. A more detailed example is attached.
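A minimal reproduction, assuming only the standard library `csv` module and the sample string from the report (the variable names are illustrative):

```python
import csv

# Sample from the report: '|' is the real delimiter, and two fields
# contain a stray single quote.
data = "a|b\nc| 'd\ne|' f"

dialect = csv.Sniffer().sniff(data)

# The report observes ' ' here instead of the expected '|'.
print(repr(dialect.delimiter))
```
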

The problem lies in the following code in csv.py (Sniffer._guess_quote_and_delimiter):

```
        matches = []
        for restr in (r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",
                      r'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?P<delim>[^\w\n"\'])(?P<space> ?)',   #  ".*?",
                      r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)',   # ,".*?"
                      r'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'):                            #  ".*?" (no delim, no space)
            regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
            matches = regexp.findall(data)
            if matches:
                break
```
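To see what the loop picks up, the first pattern can be run on its own against the sample. This is a diagnostic sketch, not a proposed fix: the second compile without `re.DOTALL` only shows that the match depends on `.*?` crossing the newline, and dropping the flag would break quoted fields that legitimately contain line breaks.

```python
import re

data = "a|b\nc| 'd\ne|' f"

# First pattern from the loop above: delimiter, optional space, quoted text,
# then the same delimiter again.
pattern = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'

with_dotall = re.compile(pattern, re.DOTALL | re.MULTILINE)
m = with_dotall.search(data)
print(repr(m.group(0)))   # " 'd\ne|' " : the "quoted field" spans a line break
print(m.groupdict())      # {'delim': ' ', 'space': '', 'quote': "'"}

# Diagnosis only: without re.DOTALL, '.' cannot cross the newline,
# so this pattern finds nothing in the sample.
without_dotall = re.compile(pattern, re.MULTILINE)
print(without_dotall.findall(data))  # []
```
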

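For this input the first pattern already matches, so `matches` is non-empty and further processing continues with the delimiter wrongly set to ' '. Because of `re.DOTALL`, `.*?` can run across the line break, so the text " 'd\ne|' " (from the space after "c|" to the space before "f") is taken as a single-quoted field bounded by space delimiters.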
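The step from `matches` to a delimiter is roughly a vote: each match contributes its `delim` group, and the most frequent one wins. The function below is a simplified, hypothetical sketch of that tallying (`guess_delimiter` is an illustrative name, not part of `csv`); the real method also counts quote characters, honours the `delimiters` argument and computes `skipinitialspace`.

```python
import re
from collections import Counter

# First pattern from Sniffer._guess_quote_and_delimiter (quoted above).
PATTERN = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'


def guess_delimiter(data: str) -> str:
    """Hypothetical simplification: count the delimiter candidate of every
    match and return the most common one, or '' when nothing matches."""
    regexp = re.compile(PATTERN, re.DOTALL | re.MULTILINE)
    counts = Counter(m.group("delim") for m in regexp.finditer(data))
    if not counts:
        return ""
    return counts.most_common(1)[0][0]


print(repr(guess_delimiter("a|b\nc| 'd\ne|' f")))  # ' '
print(repr(guess_delimiter('x|"y"|z')))            # '|'
```

With only one bogus match in the sample, a single vote is enough for ' ' to win, which is why the space ends up as the delimiter.
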
History
| Date | User | Action | Args |
|---|---|---|---|
| 2021-07-19 17:40:33 | pt12lol | set | recipients: + pt12lol |
| 2021-07-19 17:40:33 | pt12lol | set | messageid: <1626716433.69.0.308565611704.issue44677@roundup.psfhosted.org> |
| 2021-07-19 17:40:33 | pt12lol | link | issue44677 messages |
| 2021-07-19 17:40:33 | pt12lol | create | |