msg200117 - (view) |
Author: Guillaume Lebourgeois (glebourgeois) |
Date: 2013-10-17 09:55 |
After fetching a web page with a wrongly declared encoding, converting its content with the codecs module crashes.
The issue can be reproduced this way:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml"
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
Original issue here : https://github.com/kennethreitz/requests/issues/1682
|
msg200132 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2013-10-17 14:54 |
The bytestring literal isn't valid. It starts with b" and later on has an unescaped " followed by more characters.
Also, the usual way to decode is to use the .decode method.
I get this:
>>> content = b"+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel=\"alternate\" type=\"application/rss+xml\""
>>> content.decode("utf-7", "strict")
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
content.decode("utf-7", "strict")
File "C:\Python33\lib\encodings\utf_7.py", line 12, in decode
return codecs.utf_7_decode(input, errors, True)
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-5: partial character in shift sequence
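
For comparison, here is a minimal sketch that runs the same literal through both error modes. It assumes an interpreter that already includes the fix for this issue; on an unpatched 3.3 the "replace" call is the one reported to raise SystemError. The helper name `try_decode` is purely illustrative.

```python
# The literal from the session above, split over two lines for readability.
content = (b"+1911' rel='stylesheet' type='text/css' />\n"
           b'<link rel="alternate" type="application/rss+xml"')


def try_decode(data, errors):
    """Return ('ok', text) on success or ('error', exception name) on failure."""
    try:
        return "ok", data.decode("utf-7", errors)
    except (UnicodeDecodeError, SystemError) as exc:
        return "error", type(exc).__name__


strict_result = try_decode(content, "strict")
replace_result = try_decode(content, "replace")

print(strict_result)      # strict mode rejects the malformed shift sequence
print(replace_result[0])  # replace mode should succeed on a fixed interpreter
```

On a fixed interpreter the malformed shift sequences should show up as U+FFFD replacement characters in the "replace" result instead of crashing.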
|
msg200133 - (view) |
Author: Guillaume Lebourgeois (glebourgeois) |
Date: 2013-10-17 15:07 |
My fault, bad paste. I should have written:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> codecs.utf_7_decode(content, "replace", True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: invalid maximum character passed to PyUnicode_New
|
msg200134 - (view) |
Author: Guillaume Lebourgeois (glebourgeois) |
Date: 2013-10-17 15:13 |
"Also, the usual way to decode by using the .decode method."
The original bug happened while using the requests library, so I have no control over the method used for decoding.
But if you had used the "replace" error handler with your approach, you would have gotten the same exception:
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
File "<stdin>", line 1, in <module>
File "/lib/python3.3/encodings/utf_7.py", line 12, in decode
return codecs.utf_7_decode(input, errors, True)
SystemError: invalid maximum character passed to PyUnicode_New
|
msg200135 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-10-17 15:41 |
Indeed, 'utf-7' and the 'replace' error handler don't get along in this case.
|
msg200136 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-10-17 15:41 |
That is, I can locally reproduce the behaviour Guillaume describes on the latest tip build.
|
msg200144 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-17 16:29 |
Here is a patch for 3.3+.
Other versions are affected too. They don't raise SystemError, but they produce an illegal Unicode string on wide builds.
E.g. in Python 2.7:
>>> 'a+/,+IKw-b'.decode('utf-7', 'replace')
u'a\ufffd\U003f20acb'
\U003f20ac is an illegal code point.
Since both the encoding and the encoded data can come from an external source, this can be used in security attacks.
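
To make the "illegal code" remark concrete: Unicode code points end at U+10FFFF, so 0x3F20AC cannot be a valid character. The sketch below checks that bound. `is_legal_code_point` is a hypothetical helper, not part of any patch attached to this issue. The split into 0x3F and 0x20AC is only an observation about the numbers (0x20AC is the character that `+IKw-` encodes, and 0x3F is the base64 value of `/`), not a confirmed description of the decoder's internals.

```python
import sys

MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def is_legal_code_point(value):
    """Return True if value lies inside the Unicode code point range."""
    return 0 <= value <= MAX_CODE_POINT


bad = 0x3F20AC

print(is_legal_code_point(bad))          # False: above U+10FFFF
print(hex((0x3F << 16) | 0x20AC))        # 0x3f20ac
print(sys.maxunicode == MAX_CODE_POINT)  # True on Python 3.3 and later

# On Python 3, chr() refuses values outside the legal range.
try:
    chr(bad)
except ValueError as exc:
    print("chr() rejected it:", exc)
```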
|
msg200253 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-18 13:47 |
And here is a patch for 2.7.
|
msg200263 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2013-10-18 14:33 |
2.6.9 doesn't produce a SystemError afaict:
Python 2.6.9rc1+ (unknown, Oct 18 2013, 10:29:22)
[GCC 4.4.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> content = b'+1911\' rel=\'stylesheet\' type=\'text/css\' />\n<link rel="alternate" type="application/rss+xml'
>>> content.decode("utf-7", "replace")
u'\ud7dd\ufffd rel=\'stylesheet\' type=\'text\ufffdcss\' \ufffd>\n<link rel="alternate" type="application\ufffdrss\uc669\ufffd'
|
msg200264 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2013-10-18 14:36 |
On Oct 18, 2013, at 02:33 PM, Barry A. Warsaw wrote:
>2.6.9 doesn't produce a SystemError afaict:
Please note that 2.6.9 is security only, so the threshold for worrying about
things is a remotely exploitable security vulnerability that cannot be
reasonably worked around in Python code.
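
As an illustration of what a workaround in Python code might look like for callers who cannot trust the declared encoding, here is a minimal sketch. It relies on the observation earlier in this thread that the "strict" handler raises UnicodeDecodeError for this input, where "replace" misbehaves. The name `safe_decode` and the latin-1 fallback are assumptions made for the example; neither requests nor the patches in this issue are known to do this.

```python
def safe_decode(data, declared_encoding, fallback="latin-1"):
    """Decode untrusted bytes, falling back when the declared codec fails.

    The declared codec is only tried in strict mode, so the problematic
    combination of utf-7 and the 'replace' handler is never used.
    """
    try:
        return data.decode(declared_encoding, "strict")
    except (UnicodeDecodeError, LookupError, SystemError):
        # Covers a wrong encoding label, an unknown codec name, and the
        # SystemError reported in this issue.
        return data.decode(fallback, "replace")


content = (b"+1911' rel='stylesheet' type='text/css' />\n"
           b'<link rel="alternate" type="application/rss+xml')

text = safe_decode(content, "utf-7")
print(text.splitlines()[0])
```

Latin-1 is used as the fallback here only because it maps every byte to a character and therefore cannot fail; a real application would pick a fallback that suits its data.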
|
msg200353 - (view) |
Author: Larry Hastings (larry) *  |
Date: 2013-10-19 01:24 |
Ping. Please fix before "beta 1".
|
msg200450 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-19 17:39 |
New changeset 214c0aac7540 by Serhiy Storchaka in branch '2.7':
Issue #19279: UTF-7 decoder no more produces illegal unicode strings.
http://hg.python.org/cpython/rev/214c0aac7540
New changeset f471f2f05621 by Serhiy Storchaka in branch '3.3':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/f471f2f05621
New changeset 7dde9c553f16 by Serhiy Storchaka in branch 'default':
Issue #19279: UTF-7 decoder no more produces illegal strings.
http://hg.python.org/cpython/rev/7dde9c553f16
|
msg200465 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-10-19 18:17 |
New changeset 73ab6aba24e5 by Serhiy Storchaka in branch '3.3':
Fixed tests for issue #19279.
http://hg.python.org/cpython/rev/73ab6aba24e5
|
msg201508 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-10-27 23:26 |
@Serhiy: What is the status of the issue?
|
msg201515 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-28 06:27 |
The bug is fixed in the maintenance releases. The maintainer of 3.2 can backport the fix to 3.2 if it is worth it.
|
msg207788 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-01-09 19:39 |
Georg, is this issue worth fixing in 3.2? If yes, use the patch against 2.7.
|
msg215458 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-04-03 17:00 |
> Georg, is this issue wort to be fixed in 3.2? If yes, use the patch against 2.7.
Ping?
|
msg222203 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-07-03 17:51 |
To repeat the question: do we or don't we fix this in 3.2?
|
msg222223 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-07-03 21:41 |
I suggest closing the issue. It's "just" another way to crash Python 3.2, like any other bug. Python 3.2 does not accept bug fixes anymore.
|
|
Date | User | Action | Args |
2022-04-11 14:57:52 | admin | set | github: 63478 |
2014-07-04 18:39:11 | serhiy.storchaka | set | title: UTF-7 can produce inconsistent Unicode string -> UTF-7 decoder can produce inconsistent Unicode string |
2014-07-04 18:38:35 | serhiy.storchaka | set | status: open -> closed title: UTF-7 to UTF-8 decoding crash -> UTF-7 can produce inconsistent Unicode string stage: patch review -> resolved resolution: fixed versions:
+ Python 2.7, Python 3.3, Python 3.4, - Python 3.2 |
2014-07-03 21:41:45 | vstinner | set | messages:
+ msg222223 |
2014-07-03 17:51:26 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages:
+ msg222203
|
2014-04-03 17:00:34 | vstinner | set | messages:
+ msg215458 |
2014-01-09 19:39:08 | serhiy.storchaka | set | messages:
+ msg207788 |
2013-11-22 07:09:49 | mcepl | set | nosy:
+ mcepl
|
2013-10-28 06:27:12 | serhiy.storchaka | set | messages:
+ msg201515 |
2013-10-27 23:26:38 | vstinner | set | messages:
+ msg201508 |
2013-10-22 17:31:20 | serhiy.storchaka | set | assignee: serhiy.storchaka -> versions:
- Python 2.7, Python 3.3, Python 3.4 |
2013-10-19 18:17:20 | python-dev | set | messages:
+ msg200465 |
2013-10-19 17:39:55 | python-dev | set | nosy:
+ python-dev messages:
+ msg200450
|
2013-10-19 01:24:47 | larry | set | messages:
+ msg200353 |
2013-10-18 14:40:57 | barry | set | versions:
- Python 2.6 |
2013-10-18 14:36:25 | barry | set | messages:
+ msg200264 |
2013-10-18 14:33:18 | barry | set | messages:
+ msg200263 |
2013-10-18 13:47:06 | serhiy.storchaka | set | files:
+ utf7_errors-2.7.patch
messages:
+ msg200253 |
2013-10-18 10:32:53 | piotr.dobrogost | set | nosy:
+ piotr.dobrogost
|
2013-10-17 16:29:57 | serhiy.storchaka | set | files:
+ utf7_errors.patch priority: normal -> release blocker type: crash -> security
versions:
+ Python 2.6, Python 2.7, Python 3.2 keywords:
+ patch nosy:
+ larry, benjamin.peterson, barry, georg.brandl
messages:
+ msg200144 stage: needs patch -> patch review |
2013-10-17 15:41:54 | ncoghlan | set | messages:
+ msg200136 |
2013-10-17 15:41:05 | ncoghlan | set | nosy:
+ ncoghlan messages:
+ msg200135
|
2013-10-17 15:13:00 | glebourgeois | set | messages:
+ msg200134 |
2013-10-17 15:07:30 | glebourgeois | set | messages:
+ msg200133 |
2013-10-17 14:54:02 | mrabarnett | set | nosy:
+ mrabarnett messages:
+ msg200132
|
2013-10-17 10:02:11 | vstinner | set | nosy:
+ vstinner
|
2013-10-17 09:57:27 | serhiy.storchaka | set | versions:
+ Python 3.4 nosy:
+ ezio.melotti, serhiy.storchaka
assignee: serhiy.storchaka components:
+ Unicode stage: needs patch |
2013-10-17 09:55:36 | glebourgeois | create | |