csv writer doesn't escape escapechar #56387
Comments
Consider the two attached files. A reader and a writer with the same dialect parameters (escapechar \, quotechar ", doublequote False) read, then write, a CSV cell that looks like "C\\". It is written back as "C\". The problem is that when doublequote=False, the escapechar isn't used to escape itself, so the writer emits something that the same dialect interprets differently: \" is not a backslash followed by the end of the string, it is an escaped quotechar within the string. Run python err.py first.csv to see this.
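Since the attached files are not reproduced here, the following is a minimal sketch of the same round-trip check using only the standard library. The helper name `round_trip` and the `DIALECT` dict are illustrative assumptions, not part of the original attachment; the dialect parameters are the ones quoted in the report. What gets written depends on the Python version, so no expected output is shown.

```python
import csv
import io

# Dialect parameters as described in the report above.
DIALECT = dict(escapechar="\\", quotechar='"', doublequote=False)


def round_trip(row, **fmtparams):
    """Write one row with csv.writer, then read it back with csv.reader.

    Returns the raw text that was written and the row that was read back,
    so the two can be compared against the original input.
    (Hypothetical helper, named here for illustration only.)
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer, **fmtparams).writerow(row)
    written = buffer.getvalue()
    read_back = next(csv.reader(io.StringIO(written, newline=""), **fmtparams))
    return written, read_back


# A single cell holding the letter C followed by one escapechar.
original = ["C\\"]
written, read_back = round_trip(original, **DIALECT)
print("written:  ", repr(written))
print("read back:", read_back)
print("round-trips cleanly:", read_back == original)
```

If the last line prints `False`, the interpreter in use still shows the behaviour described in this report.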
Thanks for the report. It would be best if you could attach files as plain text instead of archives.
I looked at this and tried to provide a patch plus tests. Please review. The bug is that a writer can call writerow on some input data, but when a reader with the same dialect reads the result back, the data differ from the input. This happens when the input data contains escapechar. Contrary to msg136881, this happens regardless of whether doublequote is True or False. The docs say "On reading, the escapechar removes any special meaning from the following character". I therefore understand that, on writing, escapechar must always be escaped by itself. If that doesn't happen, then on reading it back the escapechar alters the character that follows it instead of counting as a literal escapechar, which is precisely what this bug is about.
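The rule argued for above can be sketched in pure Python. This is not the patch itself (the real change is in the C implementation of the csv module); `escape_field`, `unescape_field`, and the `special` parameter are hypothetical names used only to illustrate why the escapechar has to escape itself for the read side to invert the write side.

```python
def escape_field(field, escapechar="\\", special=',"\r\n'):
    """Prefix every special character, including the escapechar itself,
    with the escapechar. (Illustrative sketch, not the csv module's code.)"""
    out = []
    for ch in field:
        if ch == escapechar or ch in special:
            out.append(escapechar)
        out.append(ch)
    return "".join(out)


def unescape_field(text, escapechar="\\"):
    """Mirror the documented reader rule: the escapechar removes any
    special meaning from the character that follows it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == escapechar:
            # Take the next character literally; a dangling escapechar
            # at the very end yields nothing in this sketch.
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)


for value in ["C\\", "a,b", 'say "hi"', "plain"]:
    assert unescape_field(escape_field(value)) == value
```

If `escape_field` skipped the `ch == escapechar` case, `"C\\"` would be emitted unchanged and `unescape_field` would consume the trailing backslash as an escape, so the round trip would lose it.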
Thanks for the patch. The tests look good at first glance. I can’t comment on the C code, as I don’t know C. Hopefully someone will do it; otherwise, if you don’t get feedback in, say, four weeks, you can ask for a review on python-dev.
Hi, I can confirm that this behavior still exists in my current Python versions (3.5.2 and 2.7.11). I'm happy to take a look at the code, but considering I made this account specifically to comment on this issue, I assume someone else will want to as well. If you want to make sure this is still broken, just use any of the docs' reader and writer examples and add a trailing escapechar to the end of a field.
The patch looked okay to me, and when applied to the 2.7 source, the new tests pass muster. I'm not going to pretend I know where this patch should be applied. That's for someone else to pronounce.
Hi, is there something we can do to get this going? As the issue breaks round-tripping, it currently requires workarounds like this: https://github.com/cldf/csvw/blob/1324550266c821ef32d1e79c124191e93aefbfa8/csvw/dsv.py#L67-L71
I needed this patch for a project, so I compiled Python with the patch applied and tested my specific use case, and it worked. Thanks for the patch!
FWIW, though this is arguably fixing a bug, IMO this shouldn't be back-ported to 2.7 or 3.7, but only be in 3.8+, to avoid breaking backwards-compatibility. That being said, it would be great to get this into 3.8.0!
I think this issue needs to be escalated, as this is clearly a bug that makes it troublesome to use csv.reader and csv.writer with escapechar.
After a great deal of delay, a fix for this has finally been merged, and will be available in Python 3.10. Thanks to everyone involved!
Thanks Tal. AFAICT there was an undocumented change in behaviour related to this fix. Python 3.9 quotes values with escapechar:
Btw, from
this seems incorrect because escapechar is not mentioned (but at the same time it says 'such as') and maybe better matching the name 'minimal' (or one might expect 'more' quoting as a better default). Python 3.10: Lines 207 to 208 in 5c0eed7
See also https://github.com/xflr6/csv23/actions/runs/1027687524
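The snippets from this comment did not survive, so here is a small sketch for observing the quoting behaviour on whichever interpreter is at hand. The helper `write_row` and the sample value are assumptions made for illustration; they are not the original commenter's code. Per the comment above, Python 3.9 is reported to quote a value that contains the escapechar, while the result on 3.10 differs, so no expected output is given.

```python
import csv
import io


def write_row(row, **fmtparams):
    """Return the exact text csv.writer produces for a single row.
    (Hypothetical helper, named here for illustration only.)"""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()


# A value containing the escapechar, written with the default QUOTE_MINIMAL.
line = write_row(["a\\b"], escapechar="\\", quoting=csv.QUOTE_MINIMAL)
print(repr(line))
```

Comparing the printed line across interpreter versions shows whether the field is wrapped in quotechar or has its escapechar doubled instead.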
Hey @taleinat @berkerpeksag -- So this change actually introduced a pretty annoying bug as the following code now fails with the
This code worked correctly on Python 3.9. The issue is this C code snippet added by the fix:
When no The only workaround is to always specify an