Message397821
Consider the following CSV content: "a|b\nc| 'd\ne|' f". The real delimiter in this case is the '|' character, but ' ' is sniffed instead. A verbose example is attached.
The problem lies in csv.py, in the following code:
```
matches = []
for restr in (r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",
              r'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?P<delim>[^\w\n"\'])(?P<space> ?)',   #  ".*?",
              r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)',   # ,".*?"
              r'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'):                            #  ".*?" (no delim, no space)
    regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
    matches = regexp.findall(data)
    if matches:
        break
```
This makes matches non-empty, so further processing continues with the delimiter falsely set to ' '.
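The false match can be reproduced outside of csv.py with a minimal sketch that runs only the first pattern from the excerpt above against the sample content. The names data, restr and regexp mirror the excerpt; nothing else from the Sniffer is involved, so this only illustrates the regex step, not the whole sniffing logic.

```python
import re

# Sample from the report: the intended delimiter is '|'.
data = "a|b\nc| 'd\ne|' f"

# First pattern tried in the excerpt (the `,".*?",` case).
restr = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
regexp = re.compile(restr, re.DOTALL | re.MULTILINE)

# findall returns one (delim, space, quote) tuple per match.
matches = regexp.findall(data)
print(matches)  # [(' ', '', "'")]
```

The space before the first quote is captured as delim. Because of re.DOTALL, `.*?` runs across the newline up to the second quote, which happens to be followed by another space, so the back-reference to delim succeeds.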
Date | User | Action | Args
2021-07-19 17:40:33 | pt12lol | set | recipients: + pt12lol
2021-07-19 17:40:33 | pt12lol | set | messageid: <1626716433.69.0.308565611704.issue44677@roundup.psfhosted.org>
2021-07-19 17:40:33 | pt12lol | link | issue44677 messages
2021-07-19 17:40:33 | pt12lol | create |