UTF7 decoding is far too strict #48676
Comments
UTF-7 decoding raises an exception for any character not in the RFC2152
Looking at the source of unicodeobject.c, the call to the SPECIAL macro
Can you write some tests to help fix this issue? Stupid example (I
don't know UTF-8 encoding):
>>> all((byte.encode("utf-7") == byte) for byte in '<=>[]@')
>>> all((byte.decode("utf-7") == byte) for byte in '<=>[]@')
# Note, this test covers issues 4425 and 4426
# Direct encoded characters:
set_d = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?"
# Optional direct characters:
all((c.encode('utf7') == c) for c in set_d)
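For readers on Python 3, where encoding yields bytes rather than str, the checks in the comments above might be written roughly as follows. This is only a sketch: the names SET_D, SET_O_SAMPLE and the three helper functions are made up for illustration, the set D string is copied from the comment above, and the optional characters are just the sample from the earlier example, not the full set.

```python
# Sketch only: SET_D is copied from the comment above; SET_O_SAMPLE is the
# small sample of optional direct characters used in the earlier example.
SET_D = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?"
SET_O_SAMPLE = "<=>[]@"


def encodes_directly(chars):
    """True if every character encodes to its own single ASCII byte."""
    return all(c.encode("utf-7") == c.encode("ascii") for c in chars)


def decodes_directly(chars):
    """True if every character, given as a raw ASCII byte, decodes to itself."""
    return all(c.encode("ascii").decode("utf-7") == c for c in chars)


def round_trips(chars):
    """True if encoding and then decoding returns the original character."""
    return all(c.encode("utf-7").decode("utf-7") == c for c in chars)
```

The decoding check is the one that matters for this report: a lenient decoder should accept the optional characters unencoded whether or not the encoder chooses to emit them that way.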
On 2008-11-25 12:11, Nick Barnes wrote:
Thanks for noticing this. Apparently, the UTF-7 codec is not used
The tests we have do check round-trip safety, but not the special
Also note that the code for the codec was contributed and is, AFAIK,
Well, I could submit a diff for unicodeobject.c, but I have never
Attach a patch file (unified diff, yes).
On 2008-11-25 19:56, Nick Barnes wrote:
Please send unified diffs and attach them to the ticket.
While you're at it: perhaps you could try to refactor the code a bit
Thanks,
Marc-Andre Lemburg
I'll try to get to this next week. Right now I'm snowed under. I don't
My original defect report here was incorrect, or possibly only relates
Any UTF-7 encoder has two boolean parameters: whether to base-64 encode
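The comment above is cut off, so what follows is an assumption: judging by the names "encodeSetO" and "encodeWhiteSpace" mentioned later in this thread, the two booleans presumably control whether RFC 2152's optional direct characters (set O) and whitespace are base-64 encoded. A minimal sketch of that classification, with hypothetical names throughout and no claim to match the C code in unicodeobject.c:

```python
# Hypothetical sketch of the RFC 2152 character classes; not CPython's code.
import string

SET_D = set(string.ascii_letters + string.digits + "'(),-./:?")
SET_O = set("!\"#$%&*;<=>@[]^_`{|}")
WHITESPACE = set(" \t\r\n")


def may_encode_directly(ch, encode_set_o, encode_whitespace):
    """Return True if ch may be written as itself instead of base-64.

    encode_set_o / encode_whitespace: True means "base-64 encode that class".
    """
    if ch in SET_D:
        return True  # set D is always direct
    if ch in SET_O:
        return not encode_set_o
    if ch in WHITESPACE:
        return not encode_whitespace
    # '+', '\\', '~', control characters and non-ASCII never appear as-is.
    return False
```

For example, `may_encode_directly('<', False, False)` is true while `may_encode_directly('<', True, False)` is false. A decoder has no such choice to make and should accept either form.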
Here is my patch. This is a rewrite of the UTF7 encoder and decoder.
I'm not in a position to comment on the encoding algorithm itself but I
* your patch fails on another test:
Traceback (most recent call last):
  File "Lib/test/test_unicode.py", line 538, in test_codecs_utf7
    self.assertRaises(UnicodeDecodeError, '+\xc1'.decode, 'utf-7')
AssertionError: UnicodeDecodeError not raised
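The failing assertion expects a non-ASCII byte immediately after the shift character to be rejected, however lenient the decoder is about ASCII. A minimal Python 3 sketch of the same expectation, using a bytes literal where the Python 2 test used a str (decode_raises is a hypothetical helper, not part of the test suite):

```python
# Python 3 spelling of the check in the traceback above.
def decode_raises(data):
    """Return True if decoding data as UTF-7 raises UnicodeDecodeError."""
    try:
        data.decode("utf-7")
    except UnicodeDecodeError:
        return True
    return False
```

With this helper, `decode_raises(b'+\xc1')` should be true, while a well-formed sequence such as `b'+-'` (an escaped plus sign) should decode without error.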
I updated the patch to python trunk. It was hard because "patch -p1"
Changes from utf7patch:
I didn't read the UTF-7 encoder or decoder code. I just updated the
(Oops, I stripped spaces in my last patch.)
A quick comment on the patch: it seems to invert (quite futilely I'd say)
This was my first contribution to Python. I don't know what the rules
The name "encodeSetO" was meaningless: the function encodes *all* the
Ditto for the "encodeWhiteSpace" name.
Here's a trunk patch with the meaning of those parameters reverted, and
Thanks for the update. Functions like PyUnicode_EncodeUTF7() are part of
Committed in r72283 and r72285. Thanks!