Message 233549 - Python tracker

Author	jdufresne
Recipients	jdufresne
Date	2015-01-06.19:05:05
Message-id	<1420571105.9.0.00648910593011.issue23178@psf.upfronthosting.co.za>

Content

The following test script demonstrates that Python's csv library does not handle a UTF-8 byte order mark (BOM). I would expect the returned row to equal `expected` and the script to print 'True' to stdout.

In the wild, other CSV writers commonly add a BOM. MS Excel is especially picky about the BOM when reading a UTF-8 encoded file, so many writers add one for interoperability with MS Excel.

If a Python program accepts CSV files as input (often the case in web apps), these files will not be handled correctly without preprocessing. In my opinion, this should "just work" when reading the file.

---
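import codecs
import csv

# Write a small CSV file that starts with a UTF-8 BOM.
with open('foo.csv', 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'a,b,c')

expected = ['a', 'b', 'c']

# Read it back as plain UTF-8 text; the BOM is not stripped.
with open('foo.csv', encoding='utf-8', newline='') as f:
    r = csv.reader(f)
    row = next(r)

print(row)
print(row == expected)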
---

Output
---
$ ./python ~/test.py
['\ufeffa', 'b', 'c']
False
---
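
For comparison, here is a minimal sketch of the kind of preprocessing mentioned above. It is a workaround in the caller's code, not a change to the csv module. It assumes the input is UTF-8 and relies on the standard 'utf-8-sig' codec, which drops a leading BOM when one is present and otherwise behaves like plain UTF-8. The helper name `read_first_row` and the temporary file location are only illustrative.

```python
import codecs
import csv
import os
import tempfile


def read_first_row(path):
    """Return the first CSV row of a UTF-8 file, ignoring a leading BOM.

    Hypothetical helper: 'utf-8-sig' strips the BOM if present and
    decodes as ordinary UTF-8 if it is absent.
    """
    with open(path, encoding='utf-8-sig', newline='') as f:
        return next(csv.reader(f))


# Recreate the BOM-prefixed file from the report in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'foo.csv')
with open(path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'a,b,c')

row = read_first_row(path)
print(row)                       # ['a', 'b', 'c']
print(row == ['a', 'b', 'c'])    # True
```

This only helps when the program controls how the file is opened and knows the encoding is UTF-8, which is why I would still prefer the reader to cope with the BOM itself.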
