Message 92704 - Python tracker

Author	mwizard
Recipients	mwizard
Date	2009-09-16.17:38:14
Message-id	<1253122696.64.0.653718788287.issue6922@psf.upfronthosting.co.za>

Content
*** Prerequisites:
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32

*** Description:
The 'utf_32_le' and 'utf_32_be' codecs consume excessive memory when the
input data is malformed and the 'errors' argument to str.decode is
anything other than 'strict'.

*** Steps:
1. Start interpreter
2. Type:
   '\x01'.decode('utf_32_le', 'replace')
or
   '\x01'.decode('utf32', 'ignore')
or
   ('something'.encode('utf32') + '\x00').decode('utf32', 'ignore')
3. Execute

*** Notes:
1. It seems that any input that raises UnicodeDecodeError in 'strict'
mode causes a hang in 'ignore' or 'replace' mode.
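
Note 1 can be checked without risking the hang by decoding the same inputs in 'strict' mode only. The sketch below assumes Python 3, where the report's str.decode calls become bytes.decode. The helper name strict_decode_fails is hypothetical and not part of the report.

```python
def strict_decode_fails(data, encoding):
    """Return True if decoding `data` in 'strict' mode raises UnicodeDecodeError."""
    try:
        data.decode(encoding, "strict")
    except UnicodeDecodeError:
        return True
    return False


# The three inputs from the Steps section, written as Python 3 bytes objects.
REPORTED_INPUTS = [
    (b"\x01", "utf_32_le"),
    (b"\x01", "utf32"),
    ("something".encode("utf32") + b"\x00", "utf32"),
]

for data, encoding in REPORTED_INPUTS:
    print(encoding, repr(data), strict_decode_fails(data, encoding))
```

Every input that reports True here is a candidate for the hang described in this report when the same call is repeated with 'ignore' or 'replace'.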

*** Expected result:
1. An AssertionError raised on "assert errors == 'strict'", just as
bz2_codec does, if utf32 input cannot be partially decoded at all.
2. The same behaviour that 'utf8' and 'utf16' implement for such cases.
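
For comparison, here is a minimal sketch of the lenient behaviour asked for in item 2, assuming Python 3 and an interpreter that does not have this bug. On an affected build the utf_32_le line would presumably hang as described in this report, so do not run it there. decode_lenient and SAMPLES are hypothetical names used only for illustration.

```python
def decode_lenient(data, encoding, errors):
    """Decode `data`, replacing or dropping undecodable bytes according to `errors`."""
    if errors not in ("ignore", "replace"):
        raise ValueError("errors must be 'ignore' or 'replace'")
    return data.decode(encoding, errors)


# One malformed input per codec: an invalid start byte for UTF-8,
# and a single byte (truncated code unit) for UTF-16 and UTF-32.
SAMPLES = [
    (b"\xff", "utf8"),
    (b"\x01", "utf_16_le"),
    (b"\x01", "utf_32_le"),
]

for data, encoding in SAMPLES:
    replaced = decode_lenient(data, encoding, "replace")
    ignored = decode_lenient(data, encoding, "ignore")
    print(encoding, ascii(replaced), ascii(ignored))
```

On an unaffected interpreter, 'replace' should substitute U+FFFD for the bad bytes and 'ignore' should drop them, so all three codecs return promptly with a short string.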

*** Received result:
1. The interpreter hangs, uses up to 100% of one CPU core and starts to
consume RAM.
2. The process grows until it has consumed all the RAM it can get (this
takes up to several minutes on my machine).
3. It then produces the following traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\encodings\utf_32_be.py", line 11, in decode
    return codecs.utf_32_be_decode(input, errors, True)
MemoryError
4. Sometimes the traceback is printed but the text "MemoryError" is not,
leaving a blank line in its place.
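
Because the failure mode is a hang followed by memory exhaustion, it may be safer to try the reproducer in a child process with a timeout than in an interactive session. The following is only a sketch under assumptions: it targets Python 3.7 or later, run_guarded and SNIPPET are hypothetical names, and it bounds run time only, not memory use.

```python
import subprocess
import sys

# Python 3 form of the first reproducer; ascii() keeps the output ASCII-only.
SNIPPET = "import sys; sys.stdout.write(ascii(b'\\x01'.decode('utf_32_le', 'replace')))"


def run_guarded(snippet, timeout_s=10.0):
    """Run `snippet` in a fresh interpreter and return (status, output).

    status is 'ok', 'error' (non-zero exit code) or 'timeout' (child killed).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the hang ends here.
        return ("timeout", "")
    if proc.returncode != 0:
        return ("error", proc.stderr)
    return ("ok", proc.stdout)


status, output = run_guarded(SNIPPET)
print(status, output)
```

A 'timeout' status would correspond to the hang reported above, while 'ok' with a short output suggests the interpreter under test is not affected.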

History
Date	User	Action	Args
2009-09-16 17:38:16	mwizard	set	recipients: + mwizard
2009-09-16 17:38:16	mwizard	set	messageid: <1253122696.64.0.653718788287.issue6922@psf.upfronthosting.co.za>
2009-09-16 17:38:15	mwizard	link	issue6922 messages
2009-09-16 17:38:14	mwizard	create