csv module no longer works as expected when file opened in binary mode #49705

Closed
smontanaro opened this issue Mar 9, 2009 · 7 comments
Labels
stdlib: Python modules in the Lib dir
type-bug: An unexpected behavior, bug, or error

Comments

@smontanaro
Contributor

BPO 5455
Nosy @smontanaro, @birkenfeld
Superseder
  • bpo-4847: csv fails when file is opened in binary mode

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-04-05.16:28:33.393>
    created_at = <Date 2009-03-09.02:48:38.353>
    labels = ['type-bug', 'library']
    title = 'csv module no longer works as expected when file opened in binary mode'
    updated_at = <Date 2009-04-05.16:28:33.386>
    user = 'https://github.com/smontanaro'

    bugs.python.org fields:

    activity = <Date 2009-04-05.16:28:33.386>
    actor = 'georg.brandl'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-04-05.16:28:33.393>
    closer = 'georg.brandl'
    components = ['Library (Lib)']
    creation = <Date 2009-03-09.02:48:38.353>
    creator = 'skip.montanaro'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 5455
    keywords = []
    message_count = 7.0
    messages = ['83350', '83351', '83353', '83355', '83376', '83380', '85524']
    nosy_count = 4.0
    nosy_names = ['skip.montanaro', 'georg.brandl', 'sjmachin', 'jdwhitley']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = '4847'
    type = 'behavior'
    url = 'https://bugs.python.org/issue5455'
    versions = ['Python 3.0', 'Python 3.1']

    @smontanaro
    Contributor Author

    I just discovered that the csv module's reader class in 3.x doesn't work
    as expected when used as documented. The requirement has always been
    that the CSV file is opened in binary mode so that embedded newlines in
    fields aren't screwed up. Alas, in 3.x files opened in binary mode return
    their contents as bytes, not unicode strings, and bytes are apparently
    not accepted when the reader is advanced with the next() builtin:

    % python3.1
    Python 3.1a0 (py3k:70084M, Feb 28 2009, 20:46:48) 
    [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv    
    >>> next(csv.reader(open("f.csv", "rb")))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
    >>> next(csv.reader(open("f.csv", "r")))
    ['col1', 'col2', 'color']

    At the very least the documentation for the csv.reader class is no
    longer correct. However, I can't see how you can open a CSV file in
    text mode and not screw up embedded newlines. I think binary mode
    *has* to stay and some other way of dealing with bytes has to be found.
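A minimal sketch of the text-mode route, assuming the `newline=""` argument to `open()` that the current csv documentation recommends. The temporary file and its contents are invented for illustration. It reads the same bytes twice to show what newline translation does to a line break embedded in a quoted field.

```python
import csv
import os
import tempfile

# Hypothetical sample: the quoted field contains an embedded CRLF.
payload = b'col1,col2\r\n"first\r\nsecond",x\r\n'

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # newline="" turns off newline translation, so the reader sees the
    # original line endings, including the one inside the quoted field.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = list(csv.reader(f))

    # Default text mode translates "\r\n" to "\n" before the reader sees it.
    with open(path, "r", encoding="ascii") as f:
        translated = list(csv.reader(f))
finally:
    os.remove(path)

print(repr(untranslated[1][0]))
print(repr(translated[1][0]))
```

Both reads return two rows. Only the `newline=""` read keeps the embedded line break byte-for-byte, which is the property binary mode provided in 2.x.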

    @smontanaro smontanaro added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 9, 2009
    @jdwhitley
    Mannequin

    jdwhitley mannequin commented Mar 9, 2009

    in _csv.c, the check is done here:

    lineobj = PyIter_Next(self->input_iter);
    if (lineobj == NULL) {
        /* End of input OR exception */
        if (!PyErr_Occurred() && self->field_len != 0)
            PyErr_Format(error_obj,
                         "newline inside string");
        return NULL;
    }
    if (!PyUnicode_Check(lineobj)) {
        PyErr_Format(error_obj,
                     "iterator should return strings, "
                     "not %.200s "
                     "(did you open the file in text mode?)",
                     lineobj->ob_type->tp_name);
        Py_DECREF(lineobj);
        return NULL;
    }

    So the returned lineobj is a bytes type and then the PyUnicode_Check
    throws the error.
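The effect of that check can be reproduced from Python without touching the C source. A small sketch, using in-memory streams in place of f.csv (the sample data is invented, and the exact wording of the error message may differ between Python versions):

```python
import csv
import io

# Hypothetical stand-in for the contents of f.csv.
data = b"col1,col2,color\r\n1,2,red\r\n"

# Iterating a binary stream yields bytes objects, which the reader rejects.
binary_error = None
try:
    next(csv.reader(io.BytesIO(data)))
except csv.Error as exc:
    binary_error = str(exc)

# Iterating a text stream yields str objects, which pass the check.
first_row = next(csv.reader(io.StringIO(data.decode("ascii"), newline="")))

print(binary_error)
print(first_row)
```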

    @jdwhitley
    Mannequin

    jdwhitley mannequin commented Mar 9, 2009

    Hi Skip,

    Currently, once we are sure the lineobj is a unicode obj, we then
    get its internal buffer using:

    line = PyUnicode_AsUnicode(lineobj); 

    for the purpose of iterating through the line.

    Is there an opportunity to use:

    line = PyBytes_AsString(lineobj); 

    (or similar approach if I have quoted an incorrect function) for the
    case that we have a bytes object (not Unicode)?

    @sjmachin
    Mannequin

    sjmachin mannequin commented Mar 9, 2009

    This is in effect a duplicate of bpo-4847.

    Summary:
    The docs are CORRECT.
    The 3.X implementation is WRONG.
    The 2.X implementation is CORRECT.

    See examples in my comment on bpo-4847.

    @smontanaro
    Contributor Author

    Jervis> So the returned lineobj is a bytes type and then the
    Jervis> PyUnicode_Check throws the error.

    Right, but given that fact how do you get a Unicode string out of the bytes
    without an encoding? You can't open a file in binary mode and give the
    encoding arg.
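One possible answer, sketched under the assumption that the bytes are UTF-8: wrap the binary stream in `io.TextIOWrapper`, which accepts both an encoding and `newline=""`. The in-memory `BytesIO` stands in for a file opened with "rb", and the sample rows are invented.

```python
import csv
import io

# Hypothetical binary source: non-ASCII text plus a newline inside a quoted field.
raw = io.BytesIO('name,note\r\n"Zoë","line one\nline two"\r\n'.encode("utf-8"))

# Decode on the fly; newline="" leaves line endings untranslated for the reader.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

rows = list(csv.reader(text))
print(rows)
```

The encoding still has to come from somewhere outside the file. The wrapper only moves that decision out of `open()` so that it can be applied to a stream that is already binary.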

    @smontanaro
    Contributor Author

    John> The docs are CORRECT.
    John> The 3.X implementation is WRONG.
    John> The 2.X implementation is CORRECT.

    I agree. I posted a note to python-dev referencing both tickets. Hopefully
    one of the bytes/unicode experts there can shed some light on a possible
    solution.

    Skip

    @birkenfeld
    Member

    Setting bpo-4847 as superseder.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022