
Author ajung
Recipients ajung
Date 2009-12-20.16:39:49
Message-id <1261327192.12.0.403194301804.issue7551@psf.upfronthosting.co.za>
In-reply-to
Content
We encountered some pretty bizarre behavior in Python 2.4.6 while encoding a unicode string
'data' of roughly 600 million characters:

Python 2.4.6 (8GB RAM, 64 bit)

(Pdb) type(data)
<type 'unicode'>

(Pdb) len(data)
601794657

(Pdb) data2=data.encode('utf-8')
*** SystemError: Negative size passed to PyString_FromStringAndSize

Assuming that this has something to do with a 512MB limit:

(Pdb) data2=data[:512*1024*1024].encode('utf-8')
*** SystemError: Negative size passed to PyString_FromStringAndSize

Same error. Now with a slice of 256M - 1 characters:

(Pdb) data2=data[:(256*1024*1024)-1].encode('utf-8')
OverflowError
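
A quick arithmetic check of the limit guess above. This is only a sketch under an unconfirmed assumption: that the UTF-8 encoder sizes its output buffer as 4 bytes per character and holds that size in a signed 32-bit C int. The helper names as_int32 and worst_case_utf8_size are made up for illustration and are not part of CPython.

```python
def as_int32(value):
    """Wrap an integer the way a signed 32-bit C int would."""
    value &= 0xFFFFFFFF
    if value >= 0x80000000:
        value -= 0x100000000
    return value


def worst_case_utf8_size(n_chars):
    """Assumed buffer size: 4 bytes per character, truncated to 32 bits."""
    return as_int32(n_chars * 4)


# The three slice lengths tried in the pdb session.
sizes = [601794657, 512 * 1024 * 1024, 256 * 1024 * 1024 - 1]
results = [(n, worst_case_utf8_size(n)) for n in sizes]

for n, size in results:
    print('%d characters -> assumed buffer size %d' % (n, size))
```

Under that assumption the first two lengths wrap to a negative size, which would be consistent with the "Negative size passed" message, while the third stays positive, so the OverflowError would need a different explanation.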

Cross-check on a different Linux box (4GB RAM, 4GB swap, 64 bit):

ajung@blackmoon:~> python2.4
Python 2.4.5 (#1, Jun  9 2008, 10:35:12) 
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> data = u'x'*601794657
>>> data2= data.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
MemoryError
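
To compare the two machines with the same input, the checks above could be wrapped in a small helper. This is a minimal sketch; try_encode is a hypothetical name, and the syntax is kept old enough that it should also run on Python 2.4. The lengths from this report need several GB of memory, so a small length is useful as a sanity check first.

```python
import sys


def try_encode(n_chars):
    """Build a unicode string of n_chars characters and report how encoding ends."""
    try:
        data = u'x' * n_chars
        encoded = data.encode('utf-8')
    except (SystemError, OverflowError, MemoryError):
        # Report only the exception class, e.g. 'MemoryError'.
        return sys.exc_info()[0].__name__
    return 'ok: %d bytes' % len(encoded)


# Small sanity check; replace 10 with the lengths from the report
# (601794657, 512*1024*1024, 256*1024*1024 - 1) on a machine with enough RAM.
print(try_encode(10))
```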

Where does this difference in behavior come from?