csv.writer converts None to '""\n' when it is first line, otherwise '\n' #76436

Closed
Licht-T mannequin opened this issue Dec 8, 2017 · 12 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@Licht-T
Mannequin

Licht-T mannequin commented Dec 8, 2017

BPO 32255
Nosy @bitdancer, @serhiy-storchaka, @nitishch, @Licht-T
PRs
  • bpo-32255: Fix inconsistent behavior when csv.writer writes None #4769
  • [3.6] bpo-32255: Always quote a single empty field when write into a CSV file. (GH-4769) #4810
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-12-12.10:56:58.260>
    created_at = <Date 2017-12-08.14:43:54.282>
    labels = ['3.7', 'type-bug', 'library']
    title = 'csv.writer converts None to \'""\\n\' when it is first line, otherwise \'\\n\''
    updated_at = <Date 2017-12-12.10:56:58.259>
    user = 'https://github.com/Licht-T'

    bugs.python.org fields:

    activity = <Date 2017-12-12.10:56:58.259>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-12-12.10:56:58.260>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2017-12-08.14:43:54.282>
    creator = 'licht-t'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32255
    keywords = ['patch']
    message_count = 12.0
    messages = ['307851', '307939', '307940', '307941', '307984', '307986', '307997', '308009', '308050', '308102', '308103', '308109']
    nosy_count = 4.0
    nosy_names = ['r.david.murray', 'serhiy.storchaka', 'nitishch', 'licht-t']
    pr_nums = ['4769', '4810']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32255'
    versions = ['Python 3.6', 'Python 3.7']

    @Licht-T
    Mannequin Author

    Licht-T mannequin commented Dec 8, 2017

    Inconsistent behavior when writing a single-column CSV.
    I have a patch and am waiting for the CLA response.

    # Case 1
    ## Input

    import csv
    fp = open('test.csv', 'w')
    w = csv.writer(fp)
    w.writerow([''])
    w.writerow(['1'])
    fp.close()
    

    ## Output

    ""
    1
    

    # Case 2
    ## Input

    import csv
    fp = open('test.csv', 'w')
    w = csv.writer(fp)
    w.writerow(['1'])
    w.writerow([''])
    fp.close()
    

    ## Output

    1
    
    

    @Licht-T Licht-T mannequin added topic-IO 3.8 only security fixes 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Dec 8, 2017
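
The two cases from the report can also be reproduced in memory, which makes the exact bytes visible. This is a minimal sketch: the helper name `write_rows` is hypothetical, and the outputs shown by the asserts assume an interpreter that already includes the fix from this issue. On the affected versions, the second case was reported to produce an unquoted empty line instead.

```python
import csv
import io


def write_rows(rows):
    """Write rows with csv.writer into an in-memory buffer and return the raw text."""
    # newline="" keeps the writer's "\r\n" line terminator untranslated.
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


case_1 = write_rows([[""], ["1"]])  # empty field on the first line
case_2 = write_rows([["1"], [""]])  # empty field on a later line

# repr() shows quotes and line terminators explicitly.
print(repr(case_1))
print(repr(case_2))
```

Comparing the two `repr` strings on a given interpreter shows directly whether the single empty field is quoted in both positions.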
    @nitishch
    Mannequin

    nitishch mannequin commented Dec 10, 2017

    Which scenario do you think is the wrong behaviour in this case? The first one or the second one?

    I don't know much about the csv module, but I thought quoting all empty lines was a deliberate choice, and hence considered the second scenario buggy. But your pull request seems to fix the first case. Am I missing something here?

    @Licht-T
    Mannequin Author

    Licht-T mannequin commented Dec 10, 2017

    I think the first one is buggy, for two reasons.

    1. Both are valid CSV. The double quoting is unnecessary. Some other applications, e.g. Excel, do not use the double quoting.
      Also, the current implementation quotes only if the string is '' and the output is on the first line.

    2. '' is not quoted in the two-column case.
      ## Input:

    import csv
    fp = open('test.csv', 'w')
    w = csv.writer(fp, dialect=None)
    w.writerow(['', ''])
    w.writerow(['3', 'a'])
    fp.close()
    

    ## Output:

    ,
    3,a
    

    These seem inconsistent and the quoting is unnecessary in this case.
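
For the two-column case, a short sketch shows why no quoting is needed there: the delimiter alone marks both empty fields, so the line parses back unambiguously. The variable names are illustrative only, and the default dialect (`QUOTE_MINIMAL`) is assumed.

```python
import csv
import io

# Write one row of two empty fields with the default dialect (QUOTE_MINIMAL).
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["", ""])
two_empty_fields = buf.getvalue()

# Read the same text back to confirm both empty fields are recovered.
parsed = next(csv.reader(io.StringIO(two_empty_fields, newline="")))

print(repr(two_empty_fields), parsed)
```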

    # References
    http://www.ietf.org/rfc/rfc4180.txt

    @Licht-T
    Mannequin Author

    Licht-T mannequin commented Dec 10, 2017

    The current implementation does not quote in most cases. IOW, a patch that makes every '' quoted would be a breaking change (note that some applications do not use quoting).

    @bitdancer
    Member

    The second case is indeed the bug, as can be seen by running the examples against python2.7. It looks like this was probably broken by 7901b48 from bpo-23171.

    @bitdancer bitdancer added stdlib Python modules in the Lib dir and removed topic-IO 3.8 only security fixes labels Dec 10, 2017
    @bitdancer
    Member

    Serhiy, since it was your patch that probably introduced this bug, can you take a look? Obviously it isn't a very high priority bug, since no one has reported a problem (even this issue isn't reporting the change in behavior as a *problem* :)

    @serhiy-storchaka
    Member

    To restore the 3.4 behavior, a single empty field must be quoted. This makes it possible to distinguish a 1-element row with a single empty field from an empty row.
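
The distinction described above can be checked with a round trip. This is a sketch, assuming an interpreter that includes the fix from this issue; on an affected version both rows would be written as bare line terminators and would read back identically.

```python
import csv
import io

rows = [[""], []]  # one empty field vs. no fields at all

# Write both rows, keeping the "\r\n" terminators untranslated.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Parse the written text back into rows.
round_tripped = list(csv.reader(io.StringIO(text, newline="")))

print(repr(text))
print(round_tripped)
```

If the single empty field were not quoted, the reader would have no way to tell the two rows apart, since a blank line parses as an empty row.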

    @Licht-T
    Mannequin Author

    Licht-T mannequin commented Dec 11, 2017

    Thanks for your investigation!
    Would you mind if I create a new patch?

    @Licht-T
    Mannequin Author

    Licht-T mannequin commented Dec 11, 2017

    The PR is now updated to follow the Python 2.7 behavior!

    @serhiy-storchaka
    Member

    New changeset 2001900 by Serhiy Storchaka (Licht Takeuchi) in branch 'master':
    bpo-32255: Always quote a single empty field when write into a CSV file. (GH-4769)
    2001900

    @serhiy-storchaka
    Member

    Thank you for your contribution Licht!

    @serhiy-storchaka
    Member

    New changeset ce5a3cd by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
    bpo-32255: Always quote a single empty field when write into a CSV file. (GH-4769) (GH-4810)
    ce5a3cd

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022