Message 309811 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	xflr6
Recipients	kalaxy, lukasz.langa, maciej.szulik, mjohnson, python-dev, r.david.murray, serhiy.storchaka, terry.reedy, xflr6
Date	2018-01-11.14:57:58
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1515682678.27.0.467229070634.issue15927@psf.upfronthosting.co.za>
In-reply-to

Content

I am not sure about the design vs. code bug distinction, but what makes me think this should be fixed is primarily the broken round-trip (already mentioned above): 

>>> import io, csv
>>> def roundtrip(value, **fmtparams):
...     with io.BytesIO() as f:
...         csv.writer(f, **fmtparams).writerow([value])
...         f.seek(0)
...         return next(csv.reader(f, **fmtparams))
...
>>> roundtrip('spam\neggs', quoting=csv.QUOTE_NONE, escapechar='\\')
['spam\n']
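For comparison, here is a minimal Python 3 sketch of the same round-trip, assuming Python 3.4 or later (where the reader fix is in place). Because the Python 3 csv module works on text rather than bytes, it swaps io.BytesIO for io.StringIO with newline='' (following the csv docs' advice for file objects); the helper name roundtrip is simply reused from the Python 2 session.

```python
import csv
import io


def roundtrip(value, **fmtparams):
    """Write value as a one-field row, then read the first row back."""
    # newline='' keeps \r and \n untranslated, so the reader sees exactly
    # the characters the writer produced.
    with io.StringIO(newline='') as f:
        csv.writer(f, **fmtparams).writerow([value])
        f.seek(0)
        return next(csv.reader(f, **fmtparams))


# With the 3.4 fix, the escaped newline survives the round-trip.
print(roundtrip('spam\neggs', quoting=csv.QUOTE_NONE, escapechar='\\'))
```

Under Python 3.4+ this should print ['spam\neggs'] rather than the truncated ['spam\n'] shown above for Python 2.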

Furthermore, there is now an inconsistency between Python 2 and Python 3, since this has been fixed in 3.4.

I agree that the documentation of Dialect.escapechar is not in line with the code (in both Python 2 and Python 3). How about changing it to something along the following lines (TODO: reformulate according to how exactly Dialect.lineterminator affects this)?

"to escape the delimiter, \r, \n, and the quotechar if quoting is set to QUOTE_NONE, and the quotechar for all other quoting styles if doublequote is False":

>>> def write_csv(value, **fmtparams):
...     with io.BytesIO() as f:
...         csv.writer(f, **fmtparams).writerow([value])
...         return f.getvalue()
...
>>> write_csv('spam\reggs', quoting=csv.QUOTE_NONE, escapechar='\\')
'spam\\\reggs\r\n'
>>> write_csv('spam\neggs', quoting=csv.QUOTE_NONE, escapechar='\\')
'spam\\\neggs\r\n'
>>> write_csv('spam"eggs', quoting=csv.QUOTE_NONE, escapechar='\\')
'spam\\"eggs\r\n'
>>> write_csv('spam"eggs', quoting=csv.QUOTE_NONE, quotechar=None, escapechar='\\')
'spam"eggs\r\n'
>>> write_csv('spam"eggs', escapechar='\\', doublequote=False)
'spam\\"eggs\r\n'
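A hedged Python 3 sketch of the same checks, assuming io.StringIO(newline='') in place of the Python 2 io.BytesIO. The writer rules were not what changed in 3.4, so Python 3 is expected to produce the same strings; the delimiter case is added to cover the first item of the proposed wording.

```python
import csv
import io


def write_csv(value, **fmtparams):
    """Return the text the writer produces for a one-field row."""
    with io.StringIO(newline='') as f:
        csv.writer(f, **fmtparams).writerow([value])
        return f.getvalue()


# QUOTE_NONE: the delimiter, \r, \n, and the quotechar are all escaped.
assert write_csv('spam,eggs', quoting=csv.QUOTE_NONE, escapechar='\\') == 'spam\\,eggs\r\n'
assert write_csv('spam\reggs', quoting=csv.QUOTE_NONE, escapechar='\\') == 'spam\\\reggs\r\n'
assert write_csv('spam\neggs', quoting=csv.QUOTE_NONE, escapechar='\\') == 'spam\\\neggs\r\n'
assert write_csv('spam"eggs', quoting=csv.QUOTE_NONE, escapechar='\\') == 'spam\\"eggs\r\n'

# Without a quotechar, '"' is just an ordinary character.
assert write_csv('spam"eggs', quoting=csv.QUOTE_NONE, quotechar=None,
                 escapechar='\\') == 'spam"eggs\r\n'

# Other quoting styles: the quotechar is escaped only if doublequote is False.
assert write_csv('spam"eggs', escapechar='\\', doublequote=False) == 'spam\\"eggs\r\n'
```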

> In any case, 'one\nelement' and 'one\\\nelement' are each 2 physical lines. 
> I don't see anything in the doc about csv.reader joining physical lines
> into 'logical' lines the way that compile() does.

How about the following?

"csvreader.line_num

    The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines."
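A minimal Python 3 sketch (assuming Python 3.4 or later) of the distinction the proposed csvreader.line_num text draws. csv.reader accepts any iterable of strings, so a plain list stands in for the source iterator; the sample lines are made up for illustration.

```python
import csv

# Hypothetical input: a quoted field containing a newline spans two lines.
quoted_lines = ['a,"b\n', 'c"\r\n', 'd,e\r\n']
quoted_reader = csv.reader(quoted_lines)
quoted_records = list(quoted_reader)
print(quoted_records)          # [['a', 'b\nc'], ['d', 'e']]
print(quoted_reader.line_num)  # 3 lines read for 2 records

# Under QUOTE_NONE, an escaped newline also continues the record.
escaped_reader = csv.reader(['spam\\\n', 'eggs\r\n'],
                            quoting=csv.QUOTE_NONE, escapechar='\\')
escaped_records = list(escaped_reader)
print(escaped_records, escaped_reader.line_num)  # [['spam\neggs']] 2
```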

"On reading, the escapechar removes any special meaning from the following character."

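To illustrate the proposed reading rule, a small Python 3 sketch (assuming Python 3.4 or later): with an escapechar set, an escaped delimiter, quotechar, or escapechar is taken literally on reading. The one-line inputs are hypothetical.

```python
import csv

# QUOTE_NONE: the escaped delimiter does not split the field.
print(next(csv.reader(['a\\,b,c'], quoting=csv.QUOTE_NONE, escapechar='\\')))

# Default quoting with doublequote=False: the escaped quotechar is literal.
print(next(csv.reader(['spam\\"eggs'], escapechar='\\', doublequote=False)))

# The escapechar can also escape itself.
print(next(csv.reader(['a\\\\b'], escapechar='\\')))
```

These should yield ['a,b', 'c'], ['spam"eggs'], and a single field consisting of a, one backslash, and b.
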
>>> write_csv('spam\neggs', quoting=csv.QUOTE_NONE)  # same for the delimiter, \r, \n, and the quotechar
Traceback (most recent call last):
...
Error: need to escape, but no escapechar set

>>> roundtrip('spam\neggs')
['spam\neggs']

>>> write_csv('spam\neggs')
'"spam\neggs"\r\n'
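The last two points can be checked under Python 3 with a sketch like the following, assuming Python 3.4+ and io.StringIO(newline='') in place of io.BytesIO: QUOTE_NONE without an escapechar raises csv.Error on a newline, while the default QUOTE_MINIMAL quotes the field and round-trips it.

```python
import csv
import io


def write_csv(value, **fmtparams):
    """Return the text the writer produces for a one-field row."""
    with io.StringIO(newline='') as f:
        csv.writer(f, **fmtparams).writerow([value])
        return f.getvalue()


# QUOTE_NONE with no escapechar: the writer has no way to protect \n.
try:
    write_csv('spam\neggs', quoting=csv.QUOTE_NONE)
except csv.Error as exc:
    print('csv.Error:', exc)

# Default quoting (QUOTE_MINIMAL) wraps the field in quotes instead.
text = write_csv('spam\neggs')
print(repr(text))                                       # '"spam\neggs"\r\n'
print(next(csv.reader(io.StringIO(text, newline=''))))  # ['spam\neggs']
```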

History
Date	User	Action	Args
2018-01-11 14:57:58	xflr6	set	recipients: + xflr6, terry.reedy, r.david.murray, lukasz.langa, python-dev, serhiy.storchaka, maciej.szulik, kalaxy, mjohnson
2018-01-11 14:57:58	xflr6	set	messageid: <1515682678.27.0.467229070634.issue15927@psf.upfronthosting.co.za>
2018-01-11 14:57:58	xflr6	link	issue15927 messages
2018-01-11 14:57:58	xflr6	create