Author keef604
Recipients keef604
Date 2017-04-10.21:50:24
Message-id <1491861025.06.0.741376367599.issue30034@psf.upfronthosting.co.za>
In-reply-to
Content
If a csv file has a quote character at the beginning of a field but no closing quote, the csv module will keep reading the file until the very end in an attempt to close out the field. It's true this situation occurs only when the quoting in a csv file is incorrect, but it would be extremely helpful if the csv reader could be told to stop reading each row of fields when it encounters a newline character, even if it is within a quoted field at the time. At the moment, with large files, the csv reader will typically error out in this situation once the runaway field exceeds the module's field size limit (see csv.field_size_limit). Furthermore, this is not an easy situation to trap with custom code.

Here's an example of what I'm talking about. For a csv file with the following content:
a,b,c
d,"e,f
g,h,i

This code:

    import csv

    # newline='' lets the csv module handle line endings itself
    with open('file.txt', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

prints:
['a', 'b', 'c']
['d', 'e,f\ng,h,i\n']

Note that everything in the file after the opening quote, including delimiters and newlines, has been absorbed into the second field of the second row. This is correct csv behavior but is very unhelpful to me in this situation.
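
The reader can at least be made to report the problem instead of silently absorbing the rest of the file. The following is a minimal sketch, assuming the sample data above is held in an in-memory string; the names SAMPLE and find_unclosed_quote are made up for illustration. It relies on the existing "strict" dialect parameter, which makes the reader raise csv.Error when the input ends inside a quoted field.

```python
import csv
import io

# Hypothetical stand-in for the contents of the file shown above.
SAMPLE = 'a,b,c\nd,"e,f\ng,h,i\n'


def find_unclosed_quote(text):
    """Parse text strictly; return (rows_read_so_far, error_or_None)."""
    rows = []
    reader = csv.reader(io.StringIO(text), strict=True)
    try:
        for row in reader:
            rows.append(row)
    except csv.Error as exc:
        # Raised when the input ends while a quoted field is still open.
        return rows, exc
    return rows, None


rows, error = find_unclosed_quote(SAMPLE)
print(rows)   # rows parsed before the problem was detected
print(error)  # the csv.Error instance, or None for clean input
```

This only signals that something is wrong, and only after the reader has consumed everything following the stray quote, so it does not identify the offending line or recover the remaining rows.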

On the grounds that most csv files do not have multiline values within them, perhaps a new dialect attribute called "multiline" could be added to the csv module, that defaults to True for backwards compatibility.  It would indicate whether the csv file has any field values within it that span more than one line.  If multiline is False, then the "parse_process_char" function in "_csv" would always close out a row of fields when it encounters a newline character.  It might be best if this multiline attribute were taken into account only when "strict" is False.
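
The effect of the proposed multiline=False setting can be approximated today in pure Python. The sketch below is not the proposed change to "_csv", only a workaround under the assumption that no field legitimately spans more than one line; the helper name read_single_line_rows is hypothetical. It parses each physical line with its own csv.reader, so an unclosed quote cannot swallow anything beyond the end of its line.

```python
import csv
import io


def read_single_line_rows(lines, **fmtparams):
    """Yield one row per physical line, never carrying a quoted
    field over into the next line (hypothetical helper)."""
    for line in lines:
        # Strip the line terminator, then parse this line in isolation.
        # In non-strict mode the reader closes an open quoted field
        # when its (one-line) input runs out.
        for row in csv.reader([line.rstrip('\r\n')], **fmtparams):
            yield row


# Same content as the example file above, held in memory.
sample = io.StringIO('a,b,c\nd,"e,f\ng,h,i\n')
for row in read_single_line_rows(sample):
    print(row)
```

The trade-off is that genuinely multiline fields are split across rows, and creating a reader per line is slower than a single pass over the file.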

Right now, I do get badly-formatted files like this, and I cannot ask the source for a new file.  I have to manually correct the file using a mixture of custom scripts and vi before the csv module will read it. It would be very helpful if csv would handle this directly.