
Author mwizard
Recipients mwizard
Date 2009-09-16.17:38:14
Message-id <1253122696.64.0.653718788287.issue6922@psf.upfronthosting.co.za>
Content
*** Prerequisites:
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32

*** Description:
The 'utf_32_le' and 'utf_32_be' codecs consume excessive memory when the
input data is malformed and the 'errors' argument to str.decode is
anything other than 'strict'.

*** Steps:
1. Start the interpreter.
2. Type one of the following:
   '\x01'.decode('utf_32_le', 'replace')
or
   '\x01'.decode('utf32', 'ignore')
or
   ('something'.encode('utf32') + '\x00').decode('utf32', 'ignore')
3. Execute
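
For convenience, the three inputs above can be collected into a small
script. This is only a sketch: the names SAMPLES and run_repro are
hypothetical and not part of the report, and the literals use the b'' and
u'' prefixes so the same file should also load on Python 3. On an
affected interpreter the first decode call is expected to hang as
described below, so run it with care.

```python
# Hypothetical helper names; only the inputs and codec names come from the report.
SAMPLES = [
    (b'\x01', 'utf_32_le', 'replace'),
    (b'\x01', 'utf32', 'ignore'),
    (u'something'.encode('utf32') + b'\x00', 'utf32', 'ignore'),
]


def run_repro(samples=SAMPLES):
    """Decode each malformed sample with its non-strict error handler."""
    results = []
    for data, codec, errors in samples:
        # On an affected build this call may never return.
        results.append(data.decode(codec, errors))
    return results


if __name__ == '__main__':
    for text in run_repro():
        print(repr(text))
```

If the interpreter is not affected, the script should finish immediately
and print three short strings.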

*** Notes:
1. It seems that any input that raises UnicodeDecodeError in 'strict'
mode causes a hang in 'ignore' or 'replace' mode.
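
A minimal sketch of how one might check whether a given input falls into
that category: it only ever decodes in 'strict' mode, so it should be safe
to run even on an affected interpreter. The function name fails_in_strict
is an assumption made for illustration.

```python
def fails_in_strict(data, codec):
    """Return True if decoding data with codec raises UnicodeDecodeError."""
    try:
        data.decode(codec, 'strict')
    except UnicodeDecodeError:
        return True
    return False


if __name__ == '__main__':
    # The same inputs as in the steps above, checked in 'strict' mode only.
    print(fails_in_strict(b'\x01', 'utf_32_le'))
    print(fails_in_strict(b'\x01', 'utf32'))
    print(fails_in_strict(u'something'.encode('utf32') + b'\x00', 'utf32'))
```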

*** Expected result:
1. An AssertionError raised by "assert errors == 'strict'", just as
bz2_codec does, if utf32 cannot be partially decoded at all.
2. The same behaviour that 'utf8' and 'utf16' implement for such cases.
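
For comparison, here is a rough sketch of feeding truncated input to the
'utf8' and 'utf_16_le' codecs with the same error handlers. The sample
bytes and the name lenient_decode are assumptions chosen for illustration,
not taken from the report; the point is only that these decodes return
promptly instead of hanging.

```python
# Illustrative inputs (not from the report).
TRUNCATED_UTF8 = b'\xe2\x82'  # first two bytes of a three-byte UTF-8 sequence
TRUNCATED_UTF16 = b'\x01'     # half of a two-byte UTF-16 code unit


def lenient_decode(data, codec):
    """Decode data with each non-strict handler and collect the results."""
    return dict(
        (errors, data.decode(codec, errors))
        for errors in ('ignore', 'replace')
    )


if __name__ == '__main__':
    print(lenient_decode(TRUNCATED_UTF8, 'utf8'))
    print(lenient_decode(TRUNCATED_UTF16, 'utf_16_le'))
```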

*** Received result:
1. The interpreter hangs, uses up to 100% of one CPU core and starts to
consume RAM.
2. The process grows until it has consumed all the RAM it can get (this
takes up to several minutes on my machine).
3. It then produces the following traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\encodings\utf_32_be.py", line 11, in decode
    return codecs.utf_32_be_decode(input, errors, True)
MemoryError
4. Sometimes the traceback is printed but the text "MemoryError" is not,
leaving a blank line in its place.
History
Date                 User     Action  Args
2009-09-16 17:38:16  mwizard  set     recipients: + mwizard
2009-09-16 17:38:16  mwizard  set     messageid: <1253122696.64.0.653718788287.issue6922@psf.upfronthosting.co.za>
2009-09-16 17:38:15  mwizard  link    issue6922 messages
2009-09-16 17:38:14  mwizard  create