This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: csv.reader does not handle BOM
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Python3: guess text file charset using the BOM
View: 7651
Assigned To: Nosy List: jdufresne, r.david.murray
Priority: normal Keywords:

Created on 2015-01-06 19:05 by jdufresne, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg233549 - Author: Jon Dufresne (jdufresne) * Date: 2015-01-06 19:05
The following test script demonstrates that Python's csv module does not handle a byte order mark (BOM). I would expect the first row read to equal the expected list, and the script to print 'True' to stdout.

It is common in the wild for other CSV writers to add a BOM. MS Excel is especially picky about the BOM when reading a UTF-8 encoded file, so many writers add one for interoperability with Excel.
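As a minimal sketch of how a Python writer could produce such an Excel-friendly file: the standard 'utf-8-sig' codec prepends the three-byte UTF-8 BOM (EF BB BF) when encoding. The helper name to_excel_csv_bytes is hypothetical, and building the CSV text in memory first is an assumption made to keep the example self-contained.

```python
import csv
import io


def to_excel_csv_bytes(rows):
    """Hypothetical helper: encode rows as CSV bytes prefixed with a UTF-8 BOM."""
    # newline='' stops StringIO from translating the writer's '\r\n' terminators.
    buf = io.StringIO(newline='')
    csv.writer(buf).writerows(rows)
    # Encoding with 'utf-8-sig' prepends the BOM (EF BB BF) to the UTF-8 bytes.
    return buf.getvalue().encode('utf-8-sig')


data = to_excel_csv_bytes([['a', 'b', 'c']])
print(data)  # b'\xef\xbb\xbfa,b,c\r\n'
```

A file written this way begins with exactly the bytes that the test script below writes by hand.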

If a Python program accepts a CSV file as input (often the case in web apps), such files will not be handled correctly without preprocessing. In my opinion, this should "just work" when reading the file.

---
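import codecs
import csv

# Write a UTF-8 BOM followed by a single CSV row.
with open('foo.csv', 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'a,b,c')

expected = ['a', 'b', 'c']
# Decoding as plain UTF-8 keeps the BOM in the text as U+FEFF.
with open('foo.csv', encoding='utf-8', newline='') as f:
    r = csv.reader(f)
    row = next(r)

print(row)
print(row == expected)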
---

Output
---
$ ./python ~/test.py
['\ufeffa', 'b', 'c']
False
---
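One common preprocessing workaround is to open the file with the standard 'utf-8-sig' codec, which drops a leading UTF-8 BOM if present and otherwise decodes exactly like 'utf-8'. This is a minimal sketch that assumes UTF-8 input. read_first_row is a hypothetical helper, and the temporary files stand in for foo.csv.

```python
import codecs
import csv
import os
import tempfile


def read_first_row(path):
    """Hypothetical helper: return the first CSV row, ignoring a UTF-8 BOM."""
    # 'utf-8-sig' strips a leading BOM; newline='' is what the csv docs recommend.
    with open(path, encoding='utf-8-sig', newline='') as f:
        return next(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    with_bom = os.path.join(tmp, 'with_bom.csv')
    without_bom = os.path.join(tmp, 'without_bom.csv')
    with open(with_bom, 'wb') as f:
        f.write(codecs.BOM_UTF8 + b'a,b,c')
    with open(without_bom, 'wb') as f:
        f.write(b'a,b,c')
    rows = [read_first_row(with_bom), read_first_row(without_bom)]

print(rows)  # [['a', 'b', 'c'], ['a', 'b', 'c']]
```

Because 'utf-8-sig' also accepts files without a BOM, the same call works for both kinds of input.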
msg233550 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-01-06 19:52
This is not a problem with the csv module in particular.  See issue 7651.
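The BOM survives any plain 'utf-8' decode, whether or not csv is involved. Issue 7651 concerns guessing a text file's charset from its BOM. The sketch below is only a hypothetical illustration of that idea, not a Python API: guess_encoding and its 'utf-8' fallback are assumptions. It maps a leading BOM to a standard codec that consumes the BOM when decoding.

```python
import codecs

# Hypothetical table. The 4-byte UTF-32 BOMs are checked before the UTF-16
# BOMs because BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def guess_encoding(data, default='utf-8'):
    """Hypothetical helper: pick a codec from a leading BOM, else `default`."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return default


raw = codecs.BOM_UTF8 + b'a,b,c'
print(repr(raw.decode('utf-8')))              # '\ufeffa,b,c'
print(repr(raw.decode(guess_encoding(raw))))  # 'a,b,c'
```

BOM sniffing is a heuristic. For example, UTF-16-LE text whose first character is U+0000 would be misread as UTF-32-LE.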
History
Date                 User            Action  Args
2022-04-11 14:58:11  admin           set     github: 67367
2015-01-06 19:52:05  r.david.murray  set     status: open -> closed
                                             superseder: Python3: guess text file charset using the BOM
                                             nosy: + r.david.murray
                                             messages: + msg233550
                                             resolution: duplicate
                                             stage: resolved
2015-01-06 19:05:05  jdufresne       create