Message 160655 - Python tracker


Author	meador.inge
Recipients	alanmcintyre, eric.araujo, loewis, mark.dickinson, meador.inge, pleed, serhiy.storchaka, terry.reedy, ubershmekel
Date	2012-05-14.18:50:40
Message-id	<CAK1QoophRwwEmmaiSgsBX514MJOVrATSvzgdsohr6AzJ8CHJbQ@mail.gmail.com>
In-reply-to	<1337016833.3422.28.camel@raxxla>

Content

On Mon, May 14, 2012 at 12:31 PM, Serhiy Storchaka
<report@bugs.python.org> wrote:

> Serhiy Storchaka <storchaka@gmail.com> added the comment:
>
>> This is definitely *not* a padding issue.
>
> This is definitely a padding issue. All uncompressed files are located
> so that the data starts with a 4-byte boundary (1190+30+15+1=1236, 27486
> +30+17+3=27536, etc). This is, probably, allows the use of mmap for the
> resources.

So?  Someone may be using the extra fields to pad things, but for the
purposes of this issue that is completely irrelevant.  We only care about
the proper structure of the file.  Besides, without a clear reference to
source code or a specification, any hypothesis of padding is hearsay.

Did you look at the decoding I sent?  The extra field length is clearly
reported as one, and the contents of the extra field are set to '\x00'.
An extra field of size one is the actual problem, not padding.
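
For anyone who wants to reproduce that kind of decoding, here is a minimal
sketch of reading the extra field out of a local file header.  It is not
the tool I used; the function name and the sample bytes are made up for
illustration, and it assumes the fixed 30-byte local header layout from
APPNOTE.TXT.

```python
import struct

# Fixed-size part of a ZIP local file header (30 bytes, little-endian):
# signature, version, flags, method, mod time, mod date,
# crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def read_local_header(buf, offset=0):
    """Return (filename, extra) for the local file header at offset.

    Hypothetical helper for illustration only.
    """
    fields = LOCAL_HEADER.unpack_from(buf, offset)
    if fields[0] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    name_len, extra_len = fields[9], fields[10]
    start = offset + LOCAL_HEADER.size
    name = buf[start:start + name_len]
    extra = buf[start + name_len:start + name_len + extra_len]
    return name, extra


# Made-up header: an entry named "a.txt" whose extra field is one byte.
sample = (
    LOCAL_HEADER.pack(b"PK\x03\x04", 20, 0, 0, 0, 0, 0, 0, 0, 5, 1)
    + b"a.txt"
    + b"\x00"
)
name, extra = read_local_header(sample)
print(name, len(extra), extra)
```

A one-byte extra field like this cannot hold even the 4-byte header that
every extra field record starts with.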

>> As Martin pointed out, the standard says that things must be in
>> multiples of 4-bytes.
>
> More precisely, the extra field must have at least 4-bytes length to fit
> a header. The standard is insufficiently defined in terms of what would
> happen if the rest of the field is less than 4 bytes (this is hidden
> behind by ellipsis).

How is it insufficiently defined at all?  It says [1]:

          In order to allow different programs and different types
          of information to be stored in the 'extra' field in .ZIP
          files, the following structure should be used for all
          programs storing data in this field:

          header1+data1 + header2+data2 . . .

          Each header should consist of:

            Header ID - 2 bytes
            Data Size - 2 bytes

          Note: all fields stored in Intel low-byte/high-byte order.

The ellipsis is just a standard convention for indicating a repeating
pattern.  Extra fields which are not multiples of four bytes are not
properly formed.
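
As a rough sketch of what that structure implies for a reader (assuming
nothing beyond the quoted text; `parse_extra` is a hypothetical name, not
something in zipfile), a strict parser walks header+data pairs and rejects
anything left over:

```python
import struct


def parse_extra(extra):
    """Split a ZIP extra field into (header_id, data) records.

    Hypothetical strict parser: raises ValueError when the bytes do not
    follow the header1+data1 + header2+data2 ... layout.
    """
    records = []
    pos = 0
    while pos < len(extra):
        # Each record starts with a 2-byte Header ID and 2-byte Data Size.
        if len(extra) - pos < 4:
            raise ValueError("truncated record header at offset %d" % pos)
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("record data runs past end of extra field")
        records.append((header_id, extra[pos:pos + size]))
        pos += size
    return records


print(parse_extra(b"\x01\x00\x02\x00ab"))  # one record with 2 data bytes
try:
    parse_extra(b"\x00")  # the one-byte extra field from this issue
except ValueError as exc:
    print("ill-formed:", exc)
```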

>>   So the record is non-portable.
>
> De jure the record is non-portable. De facto the record is portable
> (many other tools supports it). But even if it does not portable, we are
> dealing with the expansion of the zip format, which is very easy support
> for reading.

Like I said before, I am all for dropping extra fields we cannot
interpret.  However, let us be clear that, with respect to the standard
we are implementing, zip files constructed like this are ill-formed.
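
To make "dropping extra fields we cannot interpret" concrete, here is one
possible lenient approach, sketched under my own assumptions.  It is not a
proposed patch, and `keep_well_formed_extra` is a hypothetical name: keep
the leading records that parse cleanly and discard whatever trails them.

```python
import struct


def keep_well_formed_extra(extra):
    """Return the longest prefix of extra made of complete records.

    Hypothetical lenient helper: trailing bytes that cannot form a full
    header+data record are silently dropped instead of raising.
    """
    pos = 0
    while len(extra) - pos >= 4:
        _, size = struct.unpack_from("<HH", extra, pos)
        if pos + 4 + size > len(extra):
            break  # declared data size overruns the field; stop here
        pos += 4 + size
    return extra[:pos]


print(keep_well_formed_extra(b"\x00"))                     # nothing usable
print(keep_well_formed_extra(b"\x01\x00\x02\x00ab\x00"))   # stray byte dropped
```

The trade-off is that a reader built this way accepts archives that the
standard, as quoted above, does not describe as well-formed.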

[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT

History
Date	User	Action	Args
2012-05-14 18:50:41	meador.inge	set	recipients: + meador.inge, loewis, terry.reedy, mark.dickinson, alanmcintyre, eric.araujo, ubershmekel, serhiy.storchaka, pleed
2012-05-14 18:50:40	meador.inge	link	issue14315 messages
2012-05-14 18:50:40	meador.inge	create