New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integer overflow in zipimport.c #64082
Comments
MSVC complains about "conversion from 'Py_ssize_t' to 'long', possible loss of data" in zipimport.c. header_offset is a Py_ssize_t but fseek() only takes a long. On 64bit Windows Py_ssize_t is a 64bit data type but long is still a 32bit data type. It's safe to assume that the header will be smaller than 2 GB for the next couple of years. :) |
read_directory() uses fseek() and ftell() which don't support offset larger than LONG_MAX (2 GB on 32-bit system). I don't know if it's an issue. What happens if the file is longer? "header_offset += arc_offset;" can overflow or not? This instuction looks weird. header_position = ftell(fp);
...
header_offset = get_long((unsigned char *)endof_central_dir + 16);
arc_offset = header_position - header_offset - header_size;
header_offset += arc_offset; If I computed correctly, the final line can be replaced with: arc_offset = header_position - header_offset - header_size;
header_offset = header_position - header_size; (It is weird to reuse header_position for two different values, a new variable may be added.) Instead of checking that "header_offset > LONG_MAX", it may be safer to check that:
If all these values must be positive according to ZIP format, get_long() may be replaced with get_ulong() to simplify these checks. |
See also zipfile.py which is probably more correct than zipimport.c: zipfile uses for example "L" format for struct.unpack (unsigned long) to decode header fields. |
Yes, these fields are unsingned. |
zipimport.c makes no attempt to support zip files larger than 2GiB or zip64 files. |
Here is a work-in-progress patch. PyMarshal_ReadShortFromFile() and PyMarshal_ReadLongFromFile() are still wrong: new Unsigned version should be added to marshal.c. I don't know if a C cast to unsigned is enough because long can be larger than 32-bit (ex: on Linux 64-bit):
#if SIZEOF_LONG > 4
/* Sign extension for 64-bit machines */
x |= -(x & 0x80000000L);
#endif I didn't test my patch. Anyone interested to finish the patch? |
comments added to the patch. |
Ping. |
Here is revised patch. It addresses Gregory's comments, uses properly integer types and converters for all values, and adds additional checks for integer overflows and ZIP file validity. As a side effect the performance can be increased due to less memory allocations and IO operations. |
Sorry, the patch contained parts of the advanced patch that will be submitted in separate issue. |
Synchronized with current sources. |
Updated patch addresses Victor's comments and adds (mandatory now) parenthesis. Thank you Victor. |
Serhiy Storchaka added the comment:
Do you mean braces {...}? The new patch looks good to me, thanks for taking all my comments in account ;-) |
New changeset 687be1cbe587 by Serhiy Storchaka in branch '3.5': New changeset f4631dc56ecf by Serhiy Storchaka in branch 'default': New changeset cebcd2fd3e1f by Serhiy Storchaka in branch '2.7': |
This seems to be causing an infinite loop in 2.7, in test_cmd_line_script.CmdLineTest.test_module_in_package_in_zipfile(): test_module_in_package_in_zipfile (test.test_cmd_line_script.CmdLineTest) ... ^C |
New changeset 82ee3c24bb86 by Serhiy Storchaka in branch '2.7': |
Thank you Martin. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: