msg173902 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-10-26 22:55 |
Size of parsed Unicode character name casted to int in unicode-escape decoder. This can cause integer overflow.
|
msg174169 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-10-30 00:48 |
If I understood correctly, (b'\\N{' + b'x' * (INT_MAX+1)) + '}').decode('unicode-decode') may crash? Did you try such string?
|
msg174190 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-10-30 09:39 |
(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + '}').decode('unicode-escape') may pass on platform with 32-bit int and more than 32-bit size_t if there is enough memory.
I don't have so much memory.
|
msg174381 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2012-10-31 22:08 |
I have 12 GB of RAM. Let's test.
$ ./python
Python 3.4.0a0 (default:8573a86c11b5+, Oct 31 2012, 22:17:00)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
There is no crash, but it would be better to get a SyntaxError("(unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: unknown Unicode character name") instead.
I propose to only fix this issue in Python 3.4.
|
msg174383 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-10-31 22:35 |
> MemoryError
It's because you need >4GB for source bytes + at least >8GB (>12GB on Windows) for temporary UCS2 string.
|
msg174579 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-11-02 21:14 |
I don't know what to make of this, but...
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Win7 pro, 24 gb mem
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**32 + b'}')
>>> len(x)
4294967318
>>> y=x.decode('unicode-escape')
>>> len(y)
1
>>> y
'☺'
|
msg174585 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-02 21:30 |
Wow!
Do we need a test for this case?
|
msg174597 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-11-03 00:56 |
>>> x=(b'\\N{WHITE SMILING FACE' + b'x' * 2**16 + b'}')
>>> y=x.decode('unicode-escape')
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
y=x.decode('unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-65557: unknown Unicode character name
>>> x=(b'\\N{WHITE SMILING FACE}')
>>> y=x.decode('unicode-escape')
>>> y
'☺'
A manageable number of extra spaces raises (I presume correctly), an unmagageable number are ignored (as it seems), is bizarre. Creating the long version took about 15 seconds on a fast machine, so test should be limited to test all (slowly) on machine with high memory.
|
msg175226 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2012-11-09 12:04 |
Tests would be good. You could use test.support.bigmemtest.
|
msg175241 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-09 14:41 |
Here are patches for different Python versions. Test added. Victor, now you can try it on 12GB. Unfortunately, I can't run the tests.
|
msg175832 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-17 23:42 |
Terry, can you measure how much memory tests really needed (3.2 and 3.3 should want different quantity)? Looks as I wrong in my assumptions.
|
msg175843 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2012-11-18 04:43 |
Serhiy, please be more specific as to 'measure' and 'how much' for what effect. I ran two examples, one ran (with error), the other raised.
|
msg175845 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2012-11-18 08:31 |
I add tests. Victor ran the test and got MemoryError. This means that I incorrectly calculated the minimal memory size for bigmem. This is unacceptable, the test should skip or pass. Only someone with enough memory for test can measure a minimal memory requirement (I don't know how to do this).
May be apply the fix without a test? You tested this manually and this test too cumbersome for regular automatic testing.
|
msg180332 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-21 09:46 |
New changeset 7625866f8127 by Serhiy Storchaka in branch '3.2':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/7625866f8127
New changeset 494d341e9143 by Serhiy Storchaka in branch '3.3':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/494d341e9143
New changeset 8488febf7d79 by Serhiy Storchaka in branch 'default':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/8488febf7d79
|
msg180333 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-21 09:49 |
New changeset f4d30d1a529e by Serhiy Storchaka in branch '2.7':
Issue #16335: Fix integer overflow in unicode-escape decoder.
http://hg.python.org/cpython/rev/f4d30d1a529e
|
msg180334 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-21 09:51 |
I rewrote the test in EAFP style.
|
msg180336 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-21 11:06 |
New changeset f84a6c89ccbc by Serhiy Storchaka in branch '3.2':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f84a6c89ccbc
New changeset 7c2aae472b27 by Serhiy Storchaka in branch '3.3':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/7c2aae472b27
New changeset f90d6ce49772 by Serhiy Storchaka in branch 'default':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/f90d6ce49772
New changeset 38a10d0778d2 by Serhiy Storchaka in branch '2.7':
Fix memory error in test_ucn.
http://hg.python.org/cpython/rev/38a10d0778d2
|
msg180348 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-21 18:30 |
New changeset ec3a35ab3659 by Serhiy Storchaka in branch '2.7':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/ec3a35ab3659
New changeset 6e0c3e4136b1 by Serhiy Storchaka in branch '3.2':
Add bigmemtest decorator to test of issue #16335.
http://hg.python.org/cpython/rev/6e0c3e4136b1
New changeset 0e622d2cbcf8 by Serhiy Storchaka in branch '3.3':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/0e622d2cbcf8
New changeset cdd1e60d31e5 by Serhiy Storchaka in branch 'default':
Use bigmemtest decorator for test of issue #16335.
http://hg.python.org/cpython/rev/cdd1e60d31e5
|
msg180552 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-25 00:14 |
I just ran the 2.7 tests while dealing with another issue, and
I'm getting a memory error or excessive swapping in test_ucn:
The statement
x = b'\\N{SPACE' + b'x' * int(_testcapi.UINT_MAX + 1) + b'}'
uses over 8GB on my system, so I think that
minsize=_testcapi.UINT_MAX + 1
is too low.
|
msg180556 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-01-25 08:18 |
New changeset fc21f8e83062 by Serhiy Storchaka in branch '2.7':
Don't run the test for issue #16335 when -M is not specified.
http://hg.python.org/cpython/rev/fc21f8e83062
New changeset e3d1b68d34e3 by Serhiy Storchaka in branch '3.2':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/e3d1b68d34e3
New changeset 43907b88ce93 by Serhiy Storchaka in branch '3.3':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/43907b88ce93
New changeset fcdb35b114ab by Serhiy Storchaka in branch 'default':
Increase the memory limit in the test for issue #16335.
http://hg.python.org/cpython/rev/fcdb35b114ab
|
msg180558 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-25 08:31 |
Bigmem test in 2.7 ran even if -M option is not specified and this causes the memory error. But memuse parameter should be increased (I tested with smaller sizes and found that 1 + 4 // len(u'\U00010000') is not enough, but 2 + 4 // len(u'\U00010000') is enough). Let's see if it helps.
|
msg180559 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-01-25 09:00 |
> Bigmem test in 2.7 ran even if -M option is not specified and this
> causes the memory error.
Ah, yes, that's because you should have used `size` instead
of `_testcapi.UINT_MAX` inside the test.
|
msg180560 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-25 09:43 |
> Ah, yes, that's because you should have used `size` instead
> of `_testcapi.UINT_MAX` inside the test.
This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0.
|
msg180561 - (view) |
Author: Stefan Krah (skrah) * |
Date: 2013-01-25 09:55 |
The test is fixed here, thanks.
The limits appear to be different in 2.7 and 3.4: In 2.7 the bigmem tests
are executed with -M x > 16G, in 3.4 with -M x >= 12G.
I don't know if that's deliberate, just mentioning it.
|
msg180563 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-25 10:20 |
Due to PEP 393 Python 3.3+ requires less memory for temporary output buffer. As for difference between ">" and ">=", the meaning of -M parameter a little differs in 2.7 and 3.x -- in 2.7 some overhead (5MiB) counted up.
|
msg180569 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-01-25 11:37 |
> Serhiy Storchaka added the comment:
>
> > Ah, yes, that's because you should have used `size` instead
> > of `_testcapi.UINT_MAX` inside the test.
>
> This test has sense only if size % (_testcapi.UINT_MAX + 1) == 0.
Why so? Does it fail otherwise?
|
msg180570 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-01-25 11:57 |
The test passed in any case, but for different size it doesn't check that the bug is fixed. Due to the bug bytes b'\\N{SPACExxxxxxxxxxxx...xxx'}' decoded as b'\\N{SPACE'}' if the number of x-es divisible by (UINT_MAX + 1). In this case unicode-escape decoding doesn't failed when the bug not fixed and failed (as expected) when the bug fixed. For all other numbers (>0) the decoding fails as when the bug fixed so when the bug is not fixed. And for other numbers the test is not relevant.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:37 | admin | set | github: 60539 |
2013-01-28 13:29:44 | serhiy.storchaka | set | status: open -> closed |
2013-01-25 11:57:20 | serhiy.storchaka | set | messages:
+ msg180570 |
2013-01-25 11:37:44 | pitrou | set | messages:
+ msg180569 |
2013-01-25 10:20:52 | serhiy.storchaka | set | messages:
+ msg180563 |
2013-01-25 09:55:17 | skrah | set | messages:
+ msg180561 |
2013-01-25 09:43:49 | serhiy.storchaka | set | messages:
+ msg180560 |
2013-01-25 09:00:14 | pitrou | set | messages:
+ msg180559 |
2013-01-25 08:31:42 | serhiy.storchaka | set | status: closed -> open
messages:
+ msg180558 |
2013-01-25 08:18:01 | python-dev | set | messages:
+ msg180556 |
2013-01-25 00:14:46 | skrah | set | nosy:
+ skrah messages:
+ msg180552
|
2013-01-21 18:30:04 | python-dev | set | messages:
+ msg180348 |
2013-01-21 11:06:21 | python-dev | set | messages:
+ msg180336 |
2013-01-21 09:51:30 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages:
+ msg180334
stage: patch review -> resolved |
2013-01-21 09:49:14 | python-dev | set | messages:
+ msg180333 |
2013-01-21 09:46:20 | python-dev | set | nosy:
+ python-dev messages:
+ msg180332
|
2013-01-07 18:37:35 | serhiy.storchaka | set | assignee: serhiy.storchaka |
2012-11-18 08:31:55 | serhiy.storchaka | set | messages:
+ msg175845 |
2012-11-18 04:43:55 | terry.reedy | set | messages:
+ msg175843 |
2012-11-17 23:42:55 | serhiy.storchaka | set | messages:
+ msg175832 |
2012-11-09 14:41:48 | serhiy.storchaka | set | files:
+ decode_unicode_escape_overflow-3.3.patch, decode_unicode_escape_overflow-3.2.patch, decode_unicode_escape_overflow-2.7.patch
messages:
+ msg175241 |
2012-11-09 14:41:43 | serhiy.storchaka | set | files:
- decode_unicode_escape_overflow.patch |
2012-11-09 12:04:45 | ezio.melotti | set | messages:
+ msg175226 |
2012-11-03 00:56:56 | terry.reedy | set | messages:
+ msg174597 |
2012-11-02 21:30:02 | serhiy.storchaka | set | messages:
+ msg174585 |
2012-11-02 21:14:48 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg174579
|
2012-10-31 22:35:49 | serhiy.storchaka | set | messages:
+ msg174383 |
2012-10-31 22:08:32 | vstinner | set | messages:
+ msg174381 |
2012-10-30 09:39:05 | serhiy.storchaka | set | messages:
+ msg174190 |
2012-10-30 00:48:09 | vstinner | set | messages:
+ msg174169 |
2012-10-26 22:55:21 | serhiy.storchaka | create | |