
classification
Title: csv.reader split error
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.smith, wy7305e
Priority: normal
Keywords:

Created on 2020-05-01 04:05 by wy7305e, last changed 2022-04-11 14:59 by admin.

Files
File name: 01_test_code.py
Uploaded: wy7305e, 2020-05-01 04:05
Description: use csv.reader split a sentence, return columns
Messages (2)
msg367823 - Author: wy7305e Date: 2020-05-01 04:05
Tested on Python 3.6 and Python 3.8.

I am using csv.reader with:

delimiter=','
quotechar='"'

to split this line:

"A word of encouragement and explanation, of pity for my childish ignorance, of welcome home, of reassurance to me that it was home, might have made me dutiful to him in my heart henceforth, instead of in my hypocritical<eword w=\"hypocritical\"></eword> outside, and might have made me respect instead of hate him. ","Part 1/CHAPTER 4. I FALL INTO DISGRACE/","David Copperfield"

It returns 4 columns, but it should return 3 columns.
msg367825 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2020-05-01 07:40
You should tell us what you're seeing, and what you're expecting.

I'm adding the rest of this not because it solves your problem, but because it might help you or someone else troubleshoot this further.

Here's a simpler reproducer:

import csv
lst = ['"A,"h"e, ","E","DC"']

csv_list = csv.reader(lst)
for idx, col in enumerate(next(csv_list)):
    print(idx, repr(col))

Which produces:
0 'A,h"e'
1 ' "'
2 'E'
3 'DC'

This uses the default dialect='excel' and, true to that name, my version of Excel gives these same 4 columns, including the space starting the second column.

Dropping the space after the "e," gives 3 columns:

lst = ['"A,"h"e,","E","DC"']

Produces:
0 'A,h"e'
1 ',E"'
2 'DC'

Again, this is exactly what Excel gives, as odd as it seems.
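
For anyone troubleshooting similar input, one option is to ask the reader to reject malformed quoting instead of guessing. The following is a minimal sketch, assuming the same reproducer line as above and using the documented strict dialect parameter; the variable name outcome is only for illustration.

```python
import csv

# Same line as the first reproducer: bare quotes inside a quoted field.
lst = ['"A,"h"e, ","E","DC"']

try:
    # strict=True asks the parser to raise csv.Error on bad CSV input
    # rather than recovering silently.
    row = next(csv.reader(lst, strict=True))
    outcome = "parsed: %r" % (row,)
except csv.Error as exc:
    outcome = "rejected: %s" % exc

print(outcome)
```

With the default strict=False the same line parses without complaint into the 4 columns shown earlier, so the flag is mainly useful for detecting that the input is not well-formed CSV in the first place.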

It might be worth playing around with the dialect parameters to see if you can achieve what you want. In your example:
delimiter=',', quotechar='"'
are the default values for the "excel" dialect, which is why I dropped them above.
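
As a hedged illustration of what adjusting the dialect parameters can do, here is a sketch with two hypothetical variants of the reproducer. Neither line appears in the report: the first assumes the inner quotes are doubled, the second assumes they are backslash-escaped in the raw data itself (the report does not say whether the backslashes in the original sentence are really part of the data).

```python
import csv

# Hypothetical variants of the reproducer; these are assumptions,
# not data taken from the report.
doubled = '"A,""h""e, ","E","DC"'    # inner quotes doubled
escaped = r'"A,\"h\"e, ","E","DC"'   # inner quotes backslash-escaped

# Doubled quotes are handled by the default "excel" dialect.
row_doubled = next(csv.reader([doubled]))

# Backslash escapes need escapechar to be set explicitly.
row_escaped = next(csv.reader([escaped], escapechar="\\"))

print(row_doubled)
print(row_escaped)
```

Both variants yield 3 columns, with the first column keeping its embedded quotes and comma. If the real data contains bare, unescaped quotes inside quoted fields, neither setting applies and the input would have to be fixed where it is produced.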
History
Date User Action Args
2022-04-11 14:59:30 admin set github: 84643
2020-05-01 07:40:16 eric.smith set nosy: + eric.smith; messages: + msg367825
2020-05-01 05:50:04 SilentGhost set type: enhancement -> behavior; versions: + Python 3.7, Python 3.8, - Python 3.6
2020-05-01 04:05:16 wy7305e create