Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csv writer doesn't escape escapechar #56387

Closed
ebreck mannequin opened this issue May 25, 2011 · 14 comments
Closed

csv writer doesn't escape escapechar #56387

ebreck mannequin opened this issue May 25, 2011 · 14 comments
Labels
3.10 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@ebreck
Copy link
Mannequin

ebreck mannequin commented May 25, 2011

BPO 12178
Nosy @taleinat, @merwok, @berkerpeksag, @xflr6, @avatarofhope2
PRs
  • bpo-12178: Fix escaping of escapechar in csv.writer() #13710
  • Files
  • pybug.zip
  • 0eb420ce6567.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-09-20.06:40:59.124>
    created_at = <Date 2011-05-25.18:27:10.453>
    labels = ['extension-modules', 'type-bug', '3.10']
    title = "csv writer doesn't escape escapechar"
    updated_at = <Date 2021-07-13.19:26:06.249>
    user = 'https://bugs.python.org/ebreck'

    bugs.python.org fields:

    activity = <Date 2021-07-13.19:26:06.249>
    actor = 'xflr6'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-09-20.06:40:59.124>
    closer = 'taleinat'
    components = ['Extension Modules']
    creation = <Date 2011-05-25.18:27:10.453>
    creator = 'ebreck'
    dependencies = []
    files = ['22112', '22596']
    hgrepos = ['38']
    issue_num = 12178
    keywords = ['patch', 'needs review']
    message_count = 13.0
    messages = ['136881', '136973', '139953', '140002', '273746', '273784', '309815', '349720', '352447', '355118', '377206', '377207', '397440']
    nosy_count = 11.0
    nosy_names = ['taleinat', 'eric.araujo', 'catalin.iacob', 'ebreck', 'berker.peksag', 'xflr6', 'Apathymannequin', 'Dmitriy Shashkin', 'Alex Shpilkin', 'avatarofhope', 'Aldrian Obaja']
    pr_nums = ['13710']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12178'
    versions = ['Python 3.10']

    @ebreck
    Copy link
    Mannequin Author

    ebreck mannequin commented May 25, 2011

    Consider the attached two files. A reader and writer with the same dialect parameters (escapechar \ quotechar " doublequote False) read, then write a CSV cell that looks like "C\\". It's written "C\". The problem is, when doublequote=False, the escapechar isn't used to escape itself, and the writer writes something that in the same dialect would be understood differently (\" isn't \ then end of string, it's an escaped quotechar within the string).

    Execute python err.py first.csv to see.

    @ebreck ebreck mannequin added the type-bug An unexpected behavior, bug, or error label May 25, 2011
    @merwok
    Copy link
    Member

    merwok commented May 26, 2011

    Thanks for the report. It would be best if you could attach files as plain text instead of archives.

    @merwok merwok added the extension-modules C modules in the Modules dir label May 26, 2011
    @cataliniacob
    Copy link
    Mannequin

    cataliniacob mannequin commented Jul 6, 2011

    I looked at this and tried to provide a patch + tests. Please review.

    The bug is that a writer can use writerow on some input data but if a reader with the same dialect reads them back they are different from the input ones. This happens when the input data contains escapechar. Contrary to msg136881, this happens regardless whether doublequote is True or False.

    The docs say "On reading, the escapechar removes any special meaning from the following character". Therefore, I understand that on writing, escapechar must always be escaped by itself. If that doesn't happen, when reading it back, escapechar alters the thing that follows it instead of counting as escapechar which is precisely what this bug is about.

    @merwok
    Copy link
    Member

    merwok commented Jul 7, 2011

    Thanks for the patch. The tests look good at first glance. I can’t comment on the C code, I don’t know C. Hopefully someone will do it, otherwise if you don’t get feedback in say four weeks you can ask for a review on python-dev.

    @Apathymannequin
    Copy link
    Mannequin

    Apathymannequin mannequin commented Aug 27, 2016

    Hi,

    I can confirm that this behavior still exists in my current python versions (3.5.2 & 2.7.11). I'm happy to take a look at the code, but considering I made this account specifically to comment on this issue I assume someone else will want to, as well.

    If you want to make sure this is still broken, just use any of the docs' reader and writer examples, adding a trailing escape char to the end.

    @smontanaro
    Copy link
    Contributor

    The patch looked okay to me, and when applied to the 2.7 source, the new tests pass muster.

    I'm not going to pretend I know where this patch should be applied. That's for someone else to pronounce.

    @xflr6
    Copy link
    Mannequin

    xflr6 mannequin commented Jan 11, 2018

    Hi, is there something we can do to get this going? As the issue breaks round-trip, it currently requires work-arounds like this: https://github.com/cldf/csvw/blob/1324550266c821ef32d1e79c124191e93aefbfa8/csvw/dsv.py#L67-L71

    @avatarofhope2
    Copy link
    Mannequin

    avatarofhope2 mannequin commented Aug 14, 2019

    I needed this patch for a project, so I compiled python with the patch applied, and tested my specific use case, and it worked. Thanks for the patch!

    @taleinat
    Copy link
    Contributor

    FWIW, though this is arguably fixing a bug, IMO this shouldn't be back-ported to 2.7 or 3.7, but only be in 3.8+, to avoid breaking backwards-compatibility.

    That being said, it would be great to get this into 3.8.0!

    @AldrianObaja
    Copy link
    Mannequin

    AldrianObaja mannequin commented Oct 22, 2019

    I think this issue needs to be escalated, as this is clearly a bug that makes it troublesome to use csv.reader and csv.writer with escapechar.

    @larryhastings larryhastings added 3.8 only security fixes 3.9 only security fixes labels Oct 22, 2019
    @taleinat
    Copy link
    Contributor

    New changeset 5c0eed7 by Berker Peksag in branch 'master':
    bpo-12178: Fix escaping of escapechar in csv.writer() (GH-13710)
    5c0eed7

    @taleinat
    Copy link
    Contributor

    After a great deal of delay, a fix for this has finally been merged, and will be available in Python 3.10.

    Thanks to everyone involved!

    @taleinat taleinat added 3.10 only security fixes and removed 3.8 only security fixes 3.9 only security fixes labels Sep 20, 2020
    @xflr6
    Copy link
    Mannequin

    xflr6 mannequin commented Jul 13, 2021

    Thanks Tal.

    AFAICT there was an undocumented change in behaviour related to this fix.

    Python 3.9 quotes values with escapechar:

    import csv
    import io
    
    kwargs = {'escapechar': '\\'}
    
    value = 'spam\\eggs'
    
    print(value)
    
    with io.StringIO() as buf:
        writer = csv.writer(buf, **kwargs)
        writer.writerow([value])
        line = buf.getvalue()
    
    print(line.strip())
    
    with io.StringIO(line) as buf:
        reader = csv.reader(buf, **kwargs)
        (new_value,), = reader
    
    print(new_value)
    spam\eggs
    "spam\eggs"
    spameggs
    
    • quotes escapechar
    • fails to double the escapechar (this bug)

    Btw, from
    https://docs.python.org/3/library/csv.html#csv.QUOTE_MINIMAL

    only quote those fields which contain special characters
    such as delimiter, quotechar or any of the characters in lineterminator.

    this seems incorrect because escapechar is not mentioned (but at the same time it says 'such as') and maybe better matching the name 'minimal' (or one might expect 'more' quoting as a better default).

    Python 3.10:

    self._write_test(['\\', 'a'], '\\\\,a',
    escapechar='\\', quoting=csv.QUOTE_MINIMAL)

    See also https://github.com/xflr6/csv23/actions/runs/1027687524

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @eshmu
    Copy link

    eshmu commented Sep 23, 2022

    Hey @taleinat @berkerpeksag -- So this change actually introduced a pretty annoying bug as the following code now fails with the need to escape, but no escapechar set error:

    import csv
    
    value = 'my\x00string'
    
    with io.StringIO() as buf:
        writer = csv.writer(buf)
        writer.writerow([value])
    

    This code correctly worked on Python 3.9.

    The issue is this C code snipped added by the fix:

       else if (c == dialect->escapechar) {
                         want_escape = 1;
                     }
    

    When no escapechar is specified then dialect->escarechar is equal to 0 and thus matches the null character (0x00 above) in the string to be written. This results in the code trying to "escape the escape character" even tho no escape character was specified and thus, no escaping is required.

    The only workaround is to always specify an escapechar but that is often not desirable.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants