punycode codec ignores the error handler argument #56472
Comments
b'abc\xff'.decode('punycode', 'ignore') raises the same error as b'abc\xff'.decode('punycode', 'replace') or b'abc\xff'.decode('punycode'): the codec always behaves as if the strict error handler were in use and silently ignores the errors argument. The punycode codec, like idna, should raise an exception if the error handler is anything other than strict.
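A minimal sketch for checking this on a given interpreter. The helper name `probe` is made up for illustration, and the outcome for 'ignore' and 'replace' may differ between Python versions, so the script prints what it observes rather than assuming a result:

```python
def probe(data, errors="strict"):
    """Decode `data` as punycode; return ('ok', text) or ('error', exception name)."""
    try:
        return ("ok", data.decode("punycode", errors))
    except UnicodeError as exc:
        # UnicodeDecodeError is a subclass of UnicodeError, so both are caught here.
        return ("error", type(exc).__name__)


# Compare the three handlers on the invalid input from the report.
for handler in ("strict", "ignore", "replace"):
    print(handler, probe(b"abc\xff", handler))

# A well-formed input with an empty extended part, for contrast.
print(probe(b"abc-"))
```

If all three handlers report the same error, the errors argument has no effect on this input.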
base64_codec.py uses an assertion to check the error handler; it should use an if + raise instead, because assertions are stripped in optimized mode (python -O).
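The difference between the two styles can be sketched as follows. Both function names are hypothetical and are not taken from base64_codec.py or from the patch; the exception type and message are also just one plausible choice:

```python
def check_errors_with_assert(errors="strict"):
    # Under `python -O` this assert is compiled out, so any handler name passes.
    assert errors == "strict"


def check_errors_with_raise(errors="strict"):
    # An explicit check keeps working regardless of the optimization level.
    if errors != "strict":
        raise UnicodeError(f"unsupported error handler: {errors!r}")


check_errors_with_raise("strict")  # accepted silently
try:
    check_errors_with_raise("ignore")
except UnicodeError as exc:
    print("rejected:", exc)
```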
punycode_errors.patch:
I don't think that the change should be documented, because punycode has no section in Doc/library/codecs.rst, and I hope that nobody uses anything other than strict :-)
@martin: Can you review my patch?
What's the point of disallowing the replace error handler? That's a slightly incompatible change, isn't it?
Oh, I forgot to give a few more details. b'abc\xff-'.decode('punycode', 'ignore') and b'abc\xff-'.decode('punycode', 'replace') raise a UnicodeDecodeError: the error handler is simply ignored here. With my patch, b'abc\xff-'.decode('punycode', 'ignore') gives 'abc'. If I change the code to accept replace, b'abc\xff-'.decode('punycode', 'replace') also gives 'abc', but 'replace' doesn't work correctly when the part after "-" contains illegal byte sequences. For example, b'a\xff-\xffb\xffga\xff'.decode("punycode", "replace") gives 'a�', whereas I would expect 'a�é' or 'aé�'. b'a-bga\xff'.decode("punycode", "replace") gives 'aé', the same as b'a-bga'.decode("punycode", "replace"), whereas I would expect 'aé�' or something like that.
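To make the examples above easier to follow, here is a small sketch of how a punycode string splits into the two parts being discussed. The variable names are illustrative only. The part before the last "-" holds the ASCII characters literally, while the part after it is a run of base-36 digits describing which non-ASCII characters to insert and where, which is why a bad byte there has no obvious one-character replacement:

```python
text = "aé"
encoded = text.encode("punycode")

# Everything before the last "-" is the literal ASCII part; everything after
# it encodes the insertion of the non-ASCII characters.
basic, delimiter, extended = encoded.rpartition(b"-")

print(encoded)   # full encoded form
print(basic)     # literal ASCII part
print(extended)  # digits that encode where 'é' is inserted

# Round trip with the default (strict) handler.
assert encoded.decode("punycode") == text
```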
It's just that I'm unable to patch punycode decoder to support the replace handler. Do you want to "implement" it?
I don't think so, because I consider that the punycode decoder has never supported error handlers (other than strict) in Python 3. What do you think?
That's not my point: b"foo".decode("punycode", "replace") currently succeeds, but raises a UnicodeError under the patch.
I prefer not to touch punycode right now: it works, and there is no need to modify it.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.