This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Integer overflow in unicode-escape decoder
Type: behavior Stage: resolved
Components: Interpreter Core, Unicode Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: benjamin.peterson, ezio.melotti, lemburg, mark.dickinson, pitrou, python-dev, serhiy.storchaka, skrah, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2012-10-26 22:55 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
decode_unicode_escape_overflow-3.3.patch serhiy.storchaka, 2012-11-09 14:41 review
decode_unicode_escape_overflow-3.2.patch serhiy.storchaka, 2012-11-09 14:41 review
decode_unicode_escape_overflow-2.7.patch serhiy.storchaka, 2012-11-09 14:41 review
Messages (27)
msg173902 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-26 22:55
Size of parsed Unicode character name casted to int in unicode-escape decoder.  This can cause integer overflow.
msg174169 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-30 00:48
If I understood correctly, (b'\\N{' + b'x' * (INT_MAX+1)) + '}').decode('unicode-decode') may crash? Did you try such string?
msg174190 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 09:39
(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + '}').decode('unicode-escape') may pass on platform with 32-bit int and more than 32-bit size_t if there is enough memory.

I don't have so much memory.
msg174381 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-31 22:08
I have 12 GB of RAM. Let's test.

$ ./python 
Python 3.4.0a0 (default:8573a86c11b5+, Oct 31 2012, 22:17:00) 
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

There is no crash, but it would be better to get a SyntaxError("(unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: unknown Unicode character name") instead.

I propose to only fix this issue in Python 3.4.
msg174383 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-31 22:35
> MemoryError

It's because you need >4GB for source bytes + at least >8GB (>12GB on Windows) for temporary UCS2 string.
msg174579 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-02 21:14
I don't know what to make of this, but...
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Win7 pro, 24 gb mem

>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
>>> len(y)
1
>>> y
'☺'
msg174585 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-02 21:30
Wow!

Do we need a test for this case?
msg174597 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-03 00:56
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**16 + b'}')
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    y=x.decode('unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-65557: unknown Unicode character name
>>> x=(b'\\N{WHITE SMILING FACE}')
>>> y=x.decode('unicode-escape')
>>> y
'☺'

A manageable number of extra spaces raises (I presume correctly), an unmagageable number are ignored (as it seems), is bizarre. Creating the long version took about 15 seconds on a fast machine, so test should be limited to test all (slowly) on machine with high memory.
msg175226 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-09 12:04
Tests would be good.  You could use test.support.bigmemtest.
msg175241 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-09 14:41
Here are patches for different Python versions. Test added. Victor, now you can try it on 12GB.  Unfortunately, I can't run the tests.
msg175832 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-17 23:42
Terry, can you measure how much memory tests really needed (3.2 and 3.3 should want different quantity)? Looks as I wrong in my assumptions.
msg175843 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-11-18 04:43
Serhiy, please be more specific as to 'measure' and 'how much' for what effect. I ran two examples, one ran (with error), the other raised.
msg175845 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-18 08:31
I add tests. Victor ran the test and got MemoryError. This means that I incorrectly calculated the minimal memory size for bigmem. This is unacceptable, the test should skip or pass. Only someone with enough memory for test can measure a minimal memory requirement (I don't know how to do this).

May be apply the fix without a test? You tested this manually and this test too cumbersome for regular automatic testing.
msg180332 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 09:46
New changeset 7625866f8127 by Serhiy Storchaka in branch '3.2':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/7625866f8127

New changeset 494d341e9143 by Serhiy Storchaka in branch '3.3':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/494d341e9143

New changeset 8488febf7d79 by Serhiy Storchaka in branch 'default':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/8488febf7d79
msg180333 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 09:49
New changeset f4d30d1a529e by Serhiy Storchaka in branch '2.7':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/f4d30d1a529e
msg180334 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-21 09:51
I rewrote the test in EAFP style.
msg180336 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 11:06
New changeset f84a6c89ccbc by Serhiy Storchaka in branch '3.2':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f84a6c89ccbc

New changeset 7c2aae472b27 by Serhiy Storchaka in branch '3.3':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/7c2aae472b27

New changeset f90d6ce49772 by Serhiy Storchaka in branch 'default':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f90d6ce49772

New changeset 38a10d0778d2 by Serhiy Storchaka in branch '2.7':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/38a10d0778d2
msg180348 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-21 18:30
New changeset ec3a35ab3659 by Serhiy Storchaka in branch '2.7':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/ec3a35ab3659

New changeset 6e0c3e4136b1 by Serhiy Storchaka in branch '3.2':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/6e0c3e4136b1

New changeset 0e622d2cbcf8 by Serhiy Storchaka in branch '3.3':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/0e622d2cbcf8

New changeset cdd1e60d31e5 by Serhiy Storchaka in branch 'default':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/cdd1e60d31e5
msg180552 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 00:14
I just ran the 2.7 tests while dealing with another issue, and
I'm getting a memory error or excessive swapping in test_ucn:


The statement

  x = b'\\N{SPACE' + b'x' * int(_testcapi.UINT_MAX + 1) + b'}'

uses over 8GB on my system, so I think that

  minsize=_testcapi.UINT_MAX + 1


is too low.
msg180556 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-25 08:18
New changeset fc21f8e83062 by Serhiy Storchaka in branch '2.7':
Don't run the test for issue #16335 when -M is not specified.
http://hg.python.org/cpython/rev/fc21f8e83062

New changeset e3d1b68d34e3 by Serhiy Storchaka in branch '3.2':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/e3d1b68d34e3

New changeset 43907b88ce93 by Serhiy Storchaka in branch '3.3':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/43907b88ce93

New changeset fcdb35b114ab by Serhiy Storchaka in branch 'default':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/fcdb35b114ab
msg180558 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 08:31
Bigmem test in 2.7 ran even if -M option is not specified and this causes the memory error. But memuse parameter should be increased (I tested with smaller sizes and found that 1 + 4 // len(u'\U00010000') is not enough, but 2 + 4 // len(u'\U00010000') is enough). Let's see if it helps.
msg180559 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-25 09:00
> Bigmem test in 2.7 ran even if -M option is not specified and this
> causes the memory error.

Ah, yes, that's because you should have used `size` instead
of `_testcapi.UINT_MAX` inside the test.
msg180560 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 09:43
> Ah, yes, that's because you should have used `size` instead
> of `_testcapi.UINT_MAX` inside the test.

This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0.
msg180561 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 09:55
The test is fixed here, thanks.

The limits appear to be different in 2.7 and 3.4:  In 2.7 the bigmem tests
are executed with -M x > 16G, in 3.4 with -M x >= 12G.

I don't know if that's deliberate, just mentioning it.
msg180563 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 10:20
Due to PEP 393 Python 3.3+ requires less memory for temporary output buffer. As for difference between ">" and ">=", the meaning of -M parameter a little differs in 2.7 and 3.x -- in 2.7 some overhead (5MiB) counted up.
msg180569 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-25 11:37
> Serhiy Storchaka added the comment:
> 
> > Ah, yes, that's because you should have used `size` instead
> > of `_testcapi.UINT_MAX` inside the test.
> 
> This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0.

Why so? Does it fail otherwise?
msg180570 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 11:57
The test passed in any case, but for different size it doesn't check that the bug is fixed. Due to the bug bytes b'\\N{SPACExxxxxxxxxxxx...xxx'}' decoded as b'\\N{SPACE'}' if the number of x-es divisible by (UINT_MAX + 1). In this case unicode-escape decoding doesn't failed when the bug not fixed and failed (as expected) when the bug fixed. For all other numbers (>0) the decoding fails as when the bug fixed so when the bug is not fixed. And for other numbers the test is not relevant.
History
Date User Action Args
2022-04-11 14:57:37adminsetgithub: 60539
2013-01-28 13:29:44serhiy.storchakasetstatus: open -> closed
2013-01-25 11:57:20serhiy.storchakasetmessages: + msg180570
2013-01-25 11:37:44pitrousetmessages: + msg180569
2013-01-25 10:20:52serhiy.storchakasetmessages: + msg180563
2013-01-25 09:55:17skrahsetmessages: + msg180561
2013-01-25 09:43:49serhiy.storchakasetmessages: + msg180560
2013-01-25 09:00:14pitrousetmessages: + msg180559
2013-01-25 08:31:42serhiy.storchakasetstatus: closed -> open

messages: + msg180558
2013-01-25 08:18:01python-devsetmessages: + msg180556
2013-01-25 00:14:46skrahsetnosy: + skrah
messages: + msg180552
2013-01-21 18:30:04python-devsetmessages: + msg180348
2013-01-21 11:06:21python-devsetmessages: + msg180336
2013-01-21 09:51:30serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg180334

stage: patch review -> resolved
2013-01-21 09:49:14python-devsetmessages: + msg180333
2013-01-21 09:46:20python-devsetnosy: + python-dev
messages: + msg180332
2013-01-07 18:37:35serhiy.storchakasetassignee: serhiy.storchaka
2012-11-18 08:31:55serhiy.storchakasetmessages: + msg175845
2012-11-18 04:43:55terry.reedysetmessages: + msg175843
2012-11-17 23:42:55serhiy.storchakasetmessages: + msg175832
2012-11-09 14:41:48serhiy.storchakasetfiles: + decode_unicode_escape_overflow-3.3.patch, decode_unicode_escape_overflow-3.2.patch, decode_unicode_escape_overflow-2.7.patch

messages: + msg175241
2012-11-09 14:41:43serhiy.storchakasetfiles: - decode_unicode_escape_overflow.patch
2012-11-09 12:04:45ezio.melottisetmessages: + msg175226
2012-11-03 00:56:56terry.reedysetmessages: + msg174597
2012-11-02 21:30:02serhiy.storchakasetmessages: + msg174585
2012-11-02 21:14:48terry.reedysetnosy: + terry.reedy
messages: + msg174579
2012-10-31 22:35:49serhiy.storchakasetmessages: + msg174383
2012-10-31 22:08:32vstinnersetmessages: + msg174381
2012-10-30 09:39:05serhiy.storchakasetmessages: + msg174190
2012-10-30 00:48:09vstinnersetmessages: + msg174169
2012-10-26 22:55:21serhiy.storchakacreate