Message214800
I had a look at this and have the following remarks.
1) The file csv_sniffing_excel_tab.py no longer works with Python 3.3. It now produces the following traceback:
Traceback (most recent call last):
  File "csv_sniffing_excel_tab.py", line 36, in <module>
    create_file()
  File "csv_sniffing_excel_tab.py", line 23, in create_file
    writer.writerows(test_data)
TypeError: 'str' does not support the buffer interface
2) The problem seems to be in the _guess_quote_and_delimiter method. If you always call _guess_delimiter, the sniffer gives the correct result.
3) As far as I understand, the problem is the first regular expression:
(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)
Now if we have a line such as the following:
273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"
The delim group will match the space after Baryton, the space group will match nothing, the quote group will match ", and the ungrouped .*? will match "Euphonium". After that, the quote backreference matches " again and the delim backreference matches the space before populaire.
And so we get the wrong delimiter.
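
The match described in remark 3 can be reproduced with the re module alone. This is only a minimal sketch: the names PATTERN and line are chosen here for illustration, and the pattern is compiled without any flags, on the assumption that flags make no difference for a single line that contains no newline.

```python
import re

# The first regular expression quoted in remark 3, copied verbatim.
# Compiled without flags (an assumption; the input has no newline).
PATTERN = re.compile(
    r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)'
)

# The sample line from remark 3; the intended delimiter is ':'.
line = '273:MVREGR1:ByEuPo:"Baryton ""Euphonium"" populaire"'

match = PATTERN.search(line)

# Show what each named group captured and the full matched span.
print("delim:", repr(match.group("delim")))
print("space:", repr(match.group("space")))
print("quote:", repr(match.group("quote")))
print("whole match:", repr(match.group(0)))
```

The candidate starting at the colon before "Baryton fails because no closing quote in the line is followed by a colon, so the search moves on and succeeds at the space in front of the doubled quote.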

Date | User | Action | Args
2014-03-25 09:30:30 | Antoon.Pardon | set | recipients: + Antoon.Pardon, GhislainHivon, dmi.baranov
2014-03-25 09:30:30 | Antoon.Pardon | set | messageid: <1395739830.14.0.406314677209.issue17829@psf.upfronthosting.co.za>
2014-03-25 09:30:30 | Antoon.Pardon | link | issue17829 messages
2014-03-25 09:30:29 | Antoon.Pardon | create |