
Author Christoph.Rauch
Recipients Christoph.Rauch
Date 2013-01-31.15:28:27
Message-id <1359646107.71.0.993136506568.issue17090@psf.upfronthosting.co.za>
In-reply-to
Content
I have uncovered a strange behavior in io.TextIOWrapper which I think is a bug.

#!/usr/bin/env python
# encoding: utf-8

import csv
import io

raw_file = io.FileIO('utf-8-encoded.csv', 'rb')
stream = io.BufferedReader(raw_file)
stream = io.TextIOWrapper(stream, encoding="UTF-8")
reader = csv.reader(stream, delimiter=";")

cells = 0 

for row in reader:
    # Cells should contain 4 Unicode characters.
    assert all([len(cell.decode('utf-8')) == 4 for cell in row]), row 
    cells += len(row)

assert cells == 210, cells

This produces a not very useful traceback:

Traceback (most recent call last):
  File "utf8-textio-test.py", line 15, in <module>
    for row in reader:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)

The only way to keep it from crashing is to set encoding to "ascii" and errors to "ignore", but that drops every character with ord > 127, which is not useful either. I hope this behavior is not intended.

I have attached a file for reproducing the problem.
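
For comparison: the 'ascii' codec error above appears to come from the Python 2 csv module, which does not accept Unicode text and so tries to encode each decoded line back to ASCII. Below is a minimal sketch of the same read loop assuming Python 3, where csv.reader consumes text streams directly. The in-memory sample data and the count_cells helper are hypothetical stand-ins for illustration; they are not the attached file or part of the original script.

```python
import csv
import io

# Hypothetical stand-in for 'utf-8-encoded.csv': two rows of three cells,
# each cell holding four non-ASCII characters.
CELL = "\u00e4\u00f6\u00fc\u00df"
SAMPLE = ("\n".join(";".join([CELL] * 3) for _ in range(2)) + "\n").encode("utf-8")


def count_cells(binary_stream, expected_len=4):
    """Decode a UTF-8 byte stream, parse it as ';'-separated CSV, count cells."""
    # newline="" leaves line-ending handling to the csv module.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
    cells = 0
    for row in csv.reader(text, delimiter=";"):
        # Under Python 3 each cell is already a str, so no .decode() call.
        assert all(len(cell) == expected_len for cell in row), row
        cells += len(row)
    return cells


print(count_cells(io.BytesIO(SAMPLE)))
```

Under Python 2 the usual approach would instead be to hand csv.reader the undecoded byte stream and decode each cell afterwards, which is what the cell.decode('utf-8') call in the script above already assumes.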
History
Date                 User             Action  Args
2013-01-31 15:28:27  Christoph.Rauch  set     recipients: + Christoph.Rauch
2013-01-31 15:28:27  Christoph.Rauch  set     messageid: <1359646107.71.0.993136506568.issue17090@psf.upfronthosting.co.za>
2013-01-31 15:28:27  Christoph.Rauch  link    issue17090 messages
2013-01-31 15:28:27  Christoph.Rauch  create