string.decode() fails on long strings #45862
s.decode("utf-8") sometimes silently truncates the result if s has more than 2E9 bytes; in other cases it raises an exception:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib64/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: utf_8_decode() argument 1 must be (unspecified), not str |
Can you attach a (small) example that demonstrates the bug? |
For instance:
Python 2.5.1 (r251:54863, Aug 30 2007, 16:15:51)
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
__[1] >>> s=" "*int(5E9)
6.050000 sec
__[1] >>> u=s.decode("utf-8")
4.710000 sec
__[1] >>> len(u)
705032704
__[2] >>> len(s)
5000000000
__[3] >>>
I would have expected both lengths to be 5E9. |
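The length reported in the session above is what you would get if the byte count were narrowed to 32 bits somewhere along the decoding path. That is an assumption read off the numbers, not something the session itself confirms. A minimal sketch of the arithmetic:

```python
# Assumed cause: the 5E9 byte length is reduced modulo 2**32,
# as an unsigned 32-bit truncation would do.
length = int(5E9)
truncated = length % 2**32
print(truncated)  # 705032704, the value of len(u) in the session
```

The match is exact, which makes a plain 32-bit wrap-around a plausible explanation for the silent truncation.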
An instance of the other problem:
Python 2.5.1 (r251:54863, Aug 30 2007, 16:15:51)
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
__[1] >>> s=" "*int(25E8)
2.990000 sec
__[1] >>> u=s.decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File
"/home/cl-home/eisele/lns-root-07/lib/python2.5/encodings/utf_8.py",
line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: utf_8_decode() argument 1 must be (unspecified), not str
__[1] >>> |
I don't have any 64-bit machine to test with. Look for the variables declared as "int count;". I suggest to replace it Shouldn't the compiler emit some warning in this case? |
Here is a patch, with a unit test (I was surprised that test_bigmem.py But I still don't have access to any 64bit machine. |
Thanks a lot for the patch, which indeed seems to solve the issue. |
|
Tried |
|
Thanks for the hints on giving the maximal available size explicitly, |
minsize=_2G + 2 should trigger your second problem (where the size wraps
to a negative number). Then 7G is "enough" for the test to run. |
Yes, indeed, thanks for pointing this out. What else needs to be done to make sure your patch finds its way to the |
Nothing I suppose. It appears like an inconsistency in the source code, |
Committed revision 59241. Will backport after the buildbots run the test. |
Committed revision 59244 in release25-maint. |