
Author remi.lapeyre
Recipients remi.lapeyre, serhiy.storchaka, sidhant
Date 2020-05-25.14:05:31
Message-id <1590415532.57.0.540581335385.issue40762@roundup.psfhosted.org>
In-reply-to
Content
> in real-life that b-prefixed string is just not readable by another program in an easy way

If another program opens this CSV file, it will read the string "b'A'", which is what this field actually contains. The csv writer converts every value that is not already a string to its str() representation before writing it:

In [1]: import collections, dataclasses, secrets, io, csv
   ...:  
   ...: Point = collections.namedtuple('Point', 'x y') 
   ...:  
   ...: @dataclasses.dataclass 
   ...: class Valar: 
   ...:     name: str 
   ...:     age: int 
   ...:  
   ...: a = Point(1, 2) 
   ...: b = Valar('Melkor', 2900) 
   ...: c = secrets.token_bytes(4) 
   ...:  
   ...: out = io.StringIO() 
   ...: f = csv.writer(out) 
   ...: f.writerow((a, b, c)) 
   ...:  
   ...: out.seek(0) 
   ...: print(out.read()) 
   ...:
"Point(x=1, y=2)","Valar(name='Melkor', age=2900)",b'\x95g6\xa2'

Here another program would find three fields, all strings: "Point(x=1, y=2)", "Valar(name='Melkor', age=2900)" and "b'\x95g6\xa2'". Would you expect to get actual objects instead of strings when reading the first two fields?
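
To make the point concrete, here is a minimal sketch of what a consumer sees when it reads such a row back with csv.reader. The row values ("text", 42, b"A") are made up for illustration and are not taken from the PR:

```python
import csv
import io

# Write one row that mixes a str, an int and a bytes value.
out = io.StringIO()
csv.writer(out).writerow(("text", 42, b"A"))

# Read it back the way any other CSV consumer would.
out.seek(0)
row = next(csv.reader(out))

# Every field comes back as a str; the bytes value is now the
# four-character text b'A', not a bytes object.
print(row)
print([type(value).__name__ for value in row])
```

The reader has no way to tell that the last field was ever a bytes object: the file only contains text.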


> Incase it fails to decode using that, then it will throw a UnicodeDecodeError

I read your PR, but successfully decoding the bytes does not mean the result is correct:

   In [4]: b'r\xc3\xa9sum\xc3\xa9'.decode('latin')
   Out[4]: 'rÃ©sumÃ©'

The call succeeded, but is latin-1 the appropriate encoding? Probably not:

   In [5]: b'r\xc3\xa9sum\xc3\xa9'.decode('utf8')                                                                                                                         
   Out[5]: 'résumé'
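
The same point as a small self-contained sketch: latin-1 assigns a character to every possible byte value, so decoding with it never raises, whereas UTF-8 rejects some byte sequences. The absence of a UnicodeDecodeError therefore says nothing about whether the guess was right. The variable names are illustrative only:

```python
data = "résumé".encode("utf-8")      # b'r\xc3\xa9sum\xc3\xa9'

as_utf8 = data.decode("utf-8")       # the encoding the bytes were written with
as_latin1 = data.decode("latin-1")   # also succeeds, but yields mojibake

# latin-1 maps each of the 256 byte values to one code point,
# so it accepts any input without raising.
decoded_all = bytes(range(256)).decode("latin-1")

# UTF-8, by contrast, rejects some inputs.
try:
    b"\xff".decode("utf-8")
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True

print(as_utf8, as_latin1, len(decoded_all), utf8_rejected)
```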



If you want to be able to save bytes, the best way is to use a format that can round-trip bytes, such as Parquet (shown here with pandas imported as pd):

    In [18]: df = pd.DataFrame.from_dict({'a': [b'a']})                                                                                                                    

    In [19]: df.to_parquet('foo.parquet')                                                                                                                                  

    In [20]: type(pd.read_parquet('foo.parquet')['a'][0])                                                                                                                  
    Out[20]: bytes
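
If the data has to stay in CSV, one possible workaround (not part of the PR, and only a sketch) is to encode the bytes explicitly before writing, for example with base64, and decode them after reading. This assumes the writer and the reader agree on that convention; the column name "blob" is hypothetical:

```python
import base64
import csv
import io

payload = b"\x95g6\xa2"

# Encode the bytes to ASCII text before handing them to the csv writer.
out = io.StringIO()
csv.writer(out).writerow(("blob", base64.b64encode(payload).decode("ascii")))

# A reader that knows the convention can recover the original bytes.
out.seek(0)
name, encoded = next(csv.reader(out))
restored = base64.b64decode(encoded)

print(type(restored).__name__, restored == payload)
```

Unlike the str() representation, the encoded field is unambiguous, but the CSV file itself still carries no type information: the convention has to be documented somewhere outside the file.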