Title: test_csv struni fixes + unicode support in _csv
Type: Stage:
Components: None Versions: Python 3.0
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: gvanrossum, hupp, skip.montanaro
Priority: normal Keywords: patch

Created on 2007-08-04 00:11 by hupp, last changed 2008-01-06 22:29 by admin. This issue is now closed.

File name                 Uploaded                  Description
py3k-struni-csv.patch     hupp, 2007-08-04 00:11
Messages (4)
msg52984 - Author: Adam Hupp (hupp) Date: 2007-08-04 00:11
This patch fixes test_csv for the struni branch and modifies _csv.c to support unicode strings.


 1. The failures caused by bytes/str conflicts have been resolved.  

 2. Uses of mkstemp have been replaced with TemporaryFile in a 'with' block. 

 3. The _csv.c module now uses unicode for string handling. I've uncommented the unicode read tests and added tests for writing unicode content and a unicode delimiter.
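For readers following along in present-day Python 3, here is a minimal sketch of the kind of round-trip that items 2 and 3 describe: an anonymous TemporaryFile opened in a 'with' block, non-ASCII field content, and a non-ASCII delimiter. It is not code from the patch; the helper name roundtrip and the sample rows are illustrative assumptions.

```python
import csv
import tempfile


def roundtrip(rows, delimiter):
    """Hypothetical helper: write rows to a temp file, then read them back."""
    # Text mode with newline="" as the csv docs recommend for csv files;
    # the file is removed automatically when the with block exits.
    with tempfile.TemporaryFile("w+", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)
        f.seek(0)
        return list(csv.reader(f, delimiter=delimiter))


# Non-ASCII content plus a non-ASCII delimiter (the euro sign). The third
# field contains the delimiter, so the writer quotes it and the reader
# must strip those quotes again.
rows = [["abc", "\u00e9t\u00e9", "\u20ac5"]]
print(roundtrip(rows, "\u20ac"))
```

Using TemporaryFile instead of mkstemp means there is no path or descriptor to clean up by hand, which is presumably why the 'with' form was preferred.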

All tests are now passing on my system (linux).
msg52985 - Author: Skip Montanaro (skip.montanaro) * Date: 2007-08-05 13:07

I've spent some time looking at this patch.  Bear in mind this is my first foray into Py3k.  Still, I'm confused about what's going on here.  I'm hoping you can help me understand the changes.  In parse_save_field, you replaced PyString_FromStringAndSize with PyUnicode_FromUnicode, however in get_nullchar_as_None you replaced it with PyUnicode_DecodeASCII.

When I execute the csv tests there are a number of assertion errors related to the default delimiter.  The traceback goes something like this:

FAIL: test_writer_kw_attrs (__main__.Test_Csv)
Traceback (most recent call last):
  File "Lib/test/", line 88, in test_writer_kw_attrs
    self._test_kw_attrs(csv.writer, StringIO())
  File "Lib/test/", line 75, in _test_kw_attrs
    self.assertEqual(obj.dialect.delimiter, ':')
AssertionError: s'\x00' != ':'

Any idea how to solve that?  It looks to me like some Unicode buffer might be getting interpreted as a char *, but I'm not sure.
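Skip's suspicion can be modeled in a few lines of Python. This is an assumption-laden illustration, not a diagnosis: it stores ':' in a 2- or 4-byte code unit (the size of Py_UNICODE in UCS-2 and UCS-4 builds) and reads back only the first byte, as a char * aimed at that unit would. With little-endian byte order the low byte comes first and ':' survives; with big-endian order the first byte is 0, which would match the '\x00' in the traceback. Neither message says which platform Skip used, so this only shows that such a misread could be platform-dependent. The function first_byte_as_char is hypothetical.

```python
def first_byte_as_char(ch, unit_size, byteorder):
    """Hypothetical model: store ch in a fixed-size code unit, then read
    only its first byte back, as a char * pointing at that unit would."""
    unit = ord(ch).to_bytes(unit_size, byteorder)
    return chr(unit[0])


for order in ("little", "big"):
    for size in (2, 4):  # UCS-2 and UCS-4 Py_UNICODE sizes
        print(order, size, repr(first_byte_as_char(":", size, order)))
```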

msg52986 - (view) Author: Adam Hupp (hupp) Date: 2007-08-05 16:39
I think the error you're seeing is being caused by a conversion from Py_UNICODE -> char -> unicode through get_nullchar_as_None.  That function should look like this:

static PyObject *
get_nullchar_as_None(Py_UNICODE c)
{
        if (c == '\0') {        /* NUL means "not set": report None */
                Py_INCREF(Py_None);
                return Py_None;
        }
        else
                return PyUnicode_FromUnicode(&c, 1);
}

Unfortunately I'm on the road right now so I can't test it.

Is there something I need to do with my build to trigger those assertions?  I didn't see them.
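At the Python level, the C helper sketched above determines what the dialect getters report: in that code a NUL character stands for "not set" and comes back as None, while a set character should come back as a one-character str, non-ASCII included. The sketch below checks that contract in present-day Python 3 with the same kind of assertion the failing test_writer_kw_attrs makes on obj.dialect.delimiter. The helper dialect_chars is hypothetical.

```python
import csv
import io


def dialect_chars(**kwargs):
    """Hypothetical helper: the single-character attributes of a writer's dialect."""
    d = csv.writer(io.StringIO(), **kwargs).dialect
    return d.delimiter, d.quotechar, d.escapechar


# delimiter set by keyword; escapechar left unset, so it reads back as None
print(dialect_chars(delimiter=":"))
# a non-ASCII quotechar and an explicit escapechar
print(dialect_chars(delimiter=":", quotechar="\u00bb", escapechar="\\"))
```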
msg52987 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-08-06 19:33
This looked good enough to submit.
I had to clean up the whitespace use in the C code. Please next time set your tabs to 8 spaces when editing C code. Also try to conform to the surrounding code's use of spaces or tabs (unfortunately this file is inconsistent and sometimes uses spaces, other times tabs -- that's worth a separate cleanup).

Committed revision 56777.
Date                 User   Action  Args
2008-01-06 22:29:45  admin  set     keywords: - py3k
                                    versions: + Python 3.0
2007-08-04 00:11:43  hupp   create