This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: integer overflow in encoding unicode
Type: security Stage: resolved
Components: Versions: Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, benjamin.peterson, georg.brandl, pkt, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2014-09-29 21:01 by pkt, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
poc_encode_latin1.py pkt, 2014-09-29 21:01
codecs_error_handlers_overflow_tests.patch serhiy.storchaka, 2014-11-26 11:27 Attempt to test integer overflow review
Messages (17)
msg227837 - (view) Author: paul (pkt) Date: 2014-09-29 21:01
# static PyObject *
# unicode_encode_ucs1(PyObject *unicode,
#                     const char *errors,
#                     unsigned int limit)
# {
#     ...
#     while (pos < size) {
#       ...
#             case 4: /* xmlcharrefreplace */
#                 /* determine replacement size */
#                 for (i = collstart, repsize = 0; i < collend; ++i) {
#                     Py_UCS4 ch = PyUnicode_READ(kind, data, i);
#                     ...
#                     else if (ch < 100000)
# 1                       repsize += 2+5+1;
#                     ...
#                 }
# 2               requiredsize = respos+repsize+(size-collend);
#                 if (requiredsize > ressize) {
#                     ...
#                     if (_PyBytes_Resize(&res, requiredsize))
#                     ...
#                 }
#                 /* generate replacement */
#                 for (i = collstart; i < collend; ++i) {
# 3                   str += sprintf(str, "&#%d;", PyUnicode_READ(kind, data, i)); 
#                 }
# 
# 1. ch=0xffff<100000, so repsize = (number of unicode chars in string)*8
#    =2^29*2^3=2^32 == 0 (mod 2^32)
# 2. respos==0, collend==0, so requiredsize=repsize==0, so the destination buffer
#    isn't resized
# 3. overwrite
msg227843 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-29 21:19
Looks very similar to issue22470.
msg227853 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-29 22:55
New changeset b2e68274aa8e by Benjamin Peterson in branch '2.7':
cleanup overflowing handling in unicode_decode_call_errorhandler and unicode_encode_ucs1 (closes #22518)
https://hg.python.org/cpython/rev/b2e68274aa8e

New changeset 3b7e93249700 by Benjamin Peterson in branch '2.7':
add NEWS note for #22518
https://hg.python.org/cpython/rev/3b7e93249700

New changeset 3c67d19c624f by Benjamin Peterson in branch '3.3':
cleanup overflowing handling in unicode_decode_call_errorhandler and unicode_encode_ucs1 (closes #22518)
https://hg.python.org/cpython/rev/3c67d19c624f

New changeset 88332ea4c140 by Benjamin Peterson in branch '3.3':
NEWS issue for #22518
https://hg.python.org/cpython/rev/88332ea4c140

New changeset 7dab27fffff2 by Benjamin Peterson in branch '3.4':
merge 3.3 (closes #22518)
https://hg.python.org/cpython/rev/7dab27fffff2

New changeset f86fde20e9ce by Benjamin Peterson in branch 'default':
merge 3.4 (closes #22518)
https://hg.python.org/cpython/rev/f86fde20e9ce
msg227908 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-30 13:05
> New changeset f86fde20e9ce by Benjamin Peterson in branch 'default':
> merge 3.4 (closes #22518)
> https://hg.python.org/cpython/rev/f86fde20e9ce

This changeset added ">>>> other". It looks like you commited a conflict.

-        if (requiredsize<2*outsize)
+        if (outsize <= PY_SSIZE_T_MAX/2 && requiredsize < 2*outsize)
             requiredsize = 2*outsize;

I'm not sure that this change is correct. Why not raising an exception on overflow?
msg227910 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-30 13:09
It would be nice to add a bigmem test to check that repr('\x00'*(2**30+1)) doesn't crash anymore.
msg227912 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-30 13:13
> It would be nice to add a bigmem test to check that repr('\x00'*(2**30+1)) doesn't crash anymore.

Ooops, wrong issue, the test is : ("\uffff" * (2**29)).encode("latin1", errors="xmlcharrefreplace").
msg227915 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-30 13:30
> I'm not sure that this change is correct. Why not raising an exception on
> overflow?

This is correct. This check prevents overflow.
msg227916 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-30 13:39
> This is correct. This check prevents overflow.

Oh, I didn't understand that "requiredsize = 2*outsize;" is only used for performances, to overallocate the buffer. So I agree that it's fine to not overallocate if it would overflow.
msg227917 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2014-09-30 13:46
Benjamin, could you make a patch for 3.2 as well?
msg227923 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-30 14:01
> Ooops, wrong issue, the test is : ("\uffff" * (2**29)).encode("latin1",
> errors="xmlcharrefreplace").

("\uffff" * (sys.maxsize//8+1)).encode("latin1", errors="xmlcharrefreplace")

or

("\xff" * (sys.maxsize//6+1)).encode("ascii", errors="xmlcharrefreplace")
msg228447 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-10-04 11:24
New changeset 3f7519f633ed by Serhiy Storchaka in branch '2.7':
Issue #22518: Fixed integer overflow issues in "backslashreplace" and
https://hg.python.org/cpython/rev/3f7519f633ed

New changeset ec9b7fd246b6 by Serhiy Storchaka in branch '3.4':
Issue #22518: Fixed integer overflow issues in "backslashreplace",
https://hg.python.org/cpython/rev/ec9b7fd246b6

New changeset 2df4cc31c36e by Serhiy Storchaka in branch 'default':
Issue #22518: Fixed integer overflow issues in "backslashreplace",
https://hg.python.org/cpython/rev/2df4cc31c36e
msg228448 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-10-04 11:53
New changeset d1be1f355f59 by Serhiy Storchaka in branch '2.7':
Fixed compilation error introduced in 3f7519f633ed (issue #22518).
https://hg.python.org/cpython/rev/d1be1f355f59
msg228455 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-10-04 13:09
New changeset 51317c9786f5 by Serhiy Storchaka in branch '3.3':
Issue #22518: Fixed integer overflow issues in "backslashreplace",
https://hg.python.org/cpython/rev/51317c9786f5
msg228456 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-04 13:11
Sorry for noise, these changes are related to issue22470.
msg231679 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 19:52
Do you want to add a bigmem test or close this issue Benjamin?
msg231681 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-11-25 20:41
I wouldn't object if you had a patch.
msg231704 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-26 11:27
Integer overflow errors were fixed in 4 error handlers: surrogatepass, backslashreplace, namereplace, and xmlcharrefreplace. Is is hard to write general robust tests. In worst cases they requires more than sys.maxsize or even sys.maxsize*2 memory and for sure fail with MemoryError. It is possible to write a test for xmlcharrefreplace, but it will not robust, after changes of implementation details it could raise MemoryError instead of OverflowError after consuming all address space.

So I suggest close this issue without tests. Such tests are useless.
History
Date User Action Args
2022-04-11 14:58:08adminsetgithub: 66708
2014-12-10 19:03:11benjamin.petersonsetstatus: open -> closed
resolution: fixed
2014-11-26 11:27:28serhiy.storchakasetfiles: + codecs_error_handlers_overflow_tests.patch
keywords: + patch
messages: + msg231704
2014-11-25 20:41:55benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg231681
2014-11-25 19:52:22serhiy.storchakasetmessages: + msg231679
2014-10-04 13:11:16serhiy.storchakasetmessages: + msg228456
2014-10-04 13:09:56python-devsetmessages: + msg228455
2014-10-04 11:53:02python-devsetmessages: + msg228448
2014-10-04 11:24:02python-devsetmessages: + msg228447
2014-09-30 14:01:31serhiy.storchakasetmessages: + msg227923
2014-09-30 13:46:56georg.brandlsetnosy: + georg.brandl
messages: + msg227917
2014-09-30 13:41:39vstinnersettype: crash -> security
2014-09-30 13:39:24vstinnersetmessages: + msg227916
2014-09-30 13:30:17serhiy.storchakasetmessages: + msg227915
2014-09-30 13:13:47vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
2014-09-30 13:13:37vstinnersetmessages: + msg227912
2014-09-30 13:09:21vstinnersetmessages: + msg227910
2014-09-30 13:05:37vstinnersetnosy: + vstinner
messages: + msg227908
2014-09-29 23:21:05Arfreversetnosy: + Arfrever
2014-09-29 22:55:13python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg227853

resolution: fixed
stage: resolved
2014-09-29 22:51:20benjamin.petersonsetversions: + Python 3.3
2014-09-29 22:51:03benjamin.petersonsetversions: + Python 2.7, Python 3.5
2014-09-29 21:19:07serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg227843
2014-09-29 21:01:25pktcreate