Message 369937 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	sidhant
Recipients	remi.lapeyre, serhiy.storchaka, sidhant
Date	2020-05-26.02:44:53
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1590461093.72.0.417643512103.issue40762@roundup.psfhosted.org>
In-reply-to

Content
Hi Remi, I understand your concerns with the current approach to resolve this issue. I would like to propose a new/different change to the way `csv.writer` works. I am putting here the diff of how the updated docs (https://docs.python.org/3/library/csv.html#csv.writer) should look for my proposed change: -.. function:: writer(csvfile, dialect='excel', fmtparams) +.. function:: writer(csvfile, encoding=None, dialect='excel', fmtparams) Return a writer object responsible for converting the user's data into delimited strings on the given file-like object. csvfile can be any object with a :func:`write` method. If csvfile is a file object, it should be opened with - ``newline=''`` [1]_. An optional dialect + ``newline=''`` [1]_. An optional encoding parameter can be given which is + used to define how to decode the bytes encountered before writing them to + the csv. After being decoded using this encoding scheme, this resulting string + will then be transcoded from this encoding scheme to the encoding scheme specified + by the file object and be written into the CSV file. If the decoding or the + transcoding fails an error will be thrown. Incase this optional parameter is not + provided or is set to None then all the bytes will be stringified with :func:`str` + before being written just like all the other non-string data. Another optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the :class:`Dialect` class or one of the strings returned by the import csv with open('eggs.csv', 'w', newline='') as csvfile: - spamwriter = csv.writer(csvfile, delimiter=' ', + spamwriter = csv.writer(csvfile, encoding='latin1', delimiter=' ', quotechar='\|', quoting=csv.QUOTE_MINIMAL) + spamwriter.writerow([b'\xc2', 'A', 'B']) spamwriter.writerow(['Spam'] * 5 + ['Baked Beans']) spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam']) (This diff can be found here: https://github.com/sidhant007/cpython/commit/50d809ca21eeab72edfd8c3e5a2e8a998fb467bd) > If another program opens this CSV file, it will read the string "b'A'" which is what this field actually contains. Everything that is not a number or a string gets converted to a string: In this proposal, I am proposing "bytes" to be treated specially just like strings and numbers are treated by the CSV module, since it also one of the primitive datatypes and more relevant than other use defined custom datatypes (in your example the point and person object) in a lot of use cases. > I read your PR, but succeeding to decode it does not mean it's correct Now we will be providing the user the option to decode according to what encoding scheme they want to and that will overcome this. If they provide no encoding scheme or set it to None we will simply revert to the current behaviour, i.e the b-prefixed string will be written to the CSV. This will ensure no accidental conversions using incorrect encoding schemes

Hi Remi, 

I understand your concerns with the current approach to resolve this issue. 
I would like to propose a new/different change to the way `csv.writer` works.

I am putting here the diff of how the updated docs (https://docs.python.org/3/library/csv.html#csv.writer) should look for my proposed change:

-.. function:: writer(csvfile, dialect='excel', **fmtparams)
+.. function:: writer(csvfile, encoding=None, dialect='excel', **fmtparams)

    Return a writer object responsible for converting the user's data into delimited
    strings on the given file-like object.  *csvfile* can be any object with a
    :func:`write` method.  If *csvfile* is a file object, it should be opened with
-   ``newline=''`` [1]_.  An optional *dialect*
+   ``newline=''`` [1]_.  An optional *encoding* parameter can be given which is
+   used to define how to decode the bytes encountered before writing them to
+   the csv. After being decoded using this encoding scheme, this resulting string
+   will then be transcoded from this encoding scheme to the encoding scheme specified
+   by the file object and be written into the CSV file. If the decoding or the
+   transcoding fails an error will be thrown. Incase this optional parameter is not
+   provided or is set to None then all the bytes will be stringified with :func:`str`
+   before being written just like all the other non-string data. Another optional *dialect*
    parameter can be given which is used to define a set of parameters specific to a
    particular CSV dialect.  It may be an instance of a subclass of the
    :class:`Dialect` class or one of the strings returned by the

       import csv
       with open('eggs.csv', 'w', newline='') as csvfile:
-          spamwriter = csv.writer(csvfile, delimiter=' ',
+          spamwriter = csv.writer(csvfile, encoding='latin1', delimiter=' ',
                                   quotechar='|', quoting=csv.QUOTE_MINIMAL)
+          spamwriter.writerow([b'\xc2', 'A', 'B'])
           spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
           spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

(This diff can be found here: https://github.com/sidhant007/cpython/commit/50d809ca21eeab72edfd8c3e5a2e8a998fb467bd)

> If another program opens this CSV file, it will read the string "b'A'" which is what this field actually contains. Everything that is not a number or a string gets converted to a string:

In this proposal, I am proposing "bytes" to be treated specially just like strings and numbers are treated by the CSV module, since it also one of the primitive datatypes and more relevant than other use defined custom datatypes (in your example the point and person object) in a lot of use cases.

> I read your PR, but succeeding to decode it does not mean it's correct

Now we will be providing the user the option to decode according to what encoding scheme they want to and that will overcome this. If they provide no encoding scheme or set it to None we will simply revert to the current behaviour, i.e the b-prefixed string will be written to the CSV. This will ensure no accidental conversions using incorrect encoding schemes

History
Date	User	Action	Args
2020-05-26 02:44:53	sidhant	set	recipients: + sidhant, serhiy.storchaka, remi.lapeyre
2020-05-26 02:44:53	sidhant	set	messageid: <1590461093.72.0.417643512103.issue40762@roundup.psfhosted.org>
2020-05-26 02:44:53	sidhant	link	issue40762 messages
2020-05-26 02:44:53	sidhant	create