
Author meador.inge
Recipients alanmcintyre, eric.araujo, loewis, mark.dickinson, meador.inge, pleed, serhiy.storchaka, terry.reedy, ubershmekel
Date 2012-05-14.18:50:40
Message-id <CAK1QoophRwwEmmaiSgsBX514MJOVrATSvzgdsohr6AzJ8CHJbQ@mail.gmail.com>
In-reply-to <1337016833.3422.28.camel@raxxla>
Content
On Mon, May 14, 2012 at 12:31 PM, Serhiy Storchaka
<report@bugs.python.org> wrote:

> Serhiy Storchaka <storchaka@gmail.com> added the comment:
>
>> This is definitely *not* a padding issue.
>
> This is definitely a padding issue. All uncompressed files are located
> so that the data starts with a 4-byte boundary (1190+30+15+1=1236, 27486
> +30+17+3=27536, etc). This is, probably, allows the use of mmap for the
> resources.

So?  Someone may be using the extra fields to pad things, but for the
purpose of this issue that is completely irrelevant.  We only care about
the proper structure of the file.  Besides, without a clear reference to
source code or a specification, any hypothesis of padding is hearsay.

Did you look at the decoding I sent?  The extra field length is clearly
reported as one, and the contents of the extra field are set to '\x00'.
The extra field of size one is the actual problem, not padding.
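
As a rough illustration (this is not the decoding sent earlier), the extra
field of a local file header can be pulled out with a few lines of Python.
The helper name extra_field_of and the sample bytes are made up for this
sketch; only the 30-byte header layout is taken from APPNOTE.TXT.

```python
import struct

# Local file header layout per APPNOTE.TXT (little-endian, 30 bytes):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def extra_field_of(buf, offset=0):
    """Return the raw extra field bytes of the local header at `offset`."""
    fields = LOCAL_HEADER.unpack_from(buf, offset)
    if fields[0] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    name_len, extra_len = fields[9], fields[10]
    start = offset + LOCAL_HEADER.size + name_len
    return buf[start:start + extra_len]


# Hypothetical record shaped like the one under discussion:
# a 15-byte file name followed by a one-byte extra field.
name = b"res/example.png"
header = LOCAL_HEADER.pack(b"PK\x03\x04", 10, 0, 0, 0, 0, 0, 0, 0, len(name), 1)
sample = header + name + b"\x00"

print(extra_field_of(sample))
```

A one-byte result like this cannot hold even a single Header ID / Data Size
pair, which is the structural problem being described.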

>> As Martin pointed out, the standard says that things must be in
>> multiples of 4-bytes.
>
> More precisely, the extra field must have at least 4-bytes length to fit
> a header. The standard is insufficiently defined in terms of what would
> happen if the rest of the field is less than 4 bytes (this is hidden
> behind by ellipsis).

How is it insufficiently defined at all?  It says [1]:

          In order to allow different programs and different types
          of information to be stored in the 'extra' field in .ZIP
          files, the following structure should be used for all
          programs storing data in this field:

          header1+data1 + header2+data2 . . .

          Each header should consist of:

            Header ID - 2 bytes
            Data Size - 2 bytes

          Note: all fields stored in Intel low-byte/high-byte order.

The ellipsis is just a standard convention for indicating a repeating
pattern.  Extra fields which are not multiples of four bytes are not
properly formed.
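
To make the header1+data1 pattern concrete, here is a minimal sketch of a
walker over an extra field.  The function name parse_extra and the header
ID 0x1234 are hypothetical, and this is not the code zipfile uses; it only
assumes the 2-byte Header ID and 2-byte Data Size layout quoted above.

```python
import struct


def parse_extra(extra):
    """Split an extra field into (header_id, data) records.

    Returns (records, leftover).  A non-empty leftover means the field
    does not follow the header+data pattern.
    """
    records = []
    pos = 0
    # Each record needs at least a 4-byte header: ID (2) + size (2).
    while len(extra) - pos >= 4:
        header_id, size = struct.unpack_from("<HH", extra, pos)
        end = pos + 4 + size
        if end > len(extra):
            break  # declared data size runs past the end of the field
        records.append((header_id, extra[pos + 4:end]))
        pos = end
    return records, extra[pos:]


# A record with a made-up header ID and four bytes of data parses cleanly.
good = struct.pack("<HH4s", 0x1234, 4, b"abcd")
print(parse_extra(good))

# A one-byte extra field yields no records and one byte of leftover.
print(parse_extra(b"\x00"))
```

A reader that wants to be lenient could simply discard the leftover bytes
instead of raising an error.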

>>   So the record is non-portable.
>
> De jure the record is non-portable. De facto the record is portable
> (many other tools supports it). But even if it does not portable, we are
> dealing with the expansion of the zip format, which is very easy support
> for reading.

Like I said before, I am all for dropping extra fields we cannot
interpret.  However, let us be clear that, with respect to the standard
we are implementing, zip files constructed like this are ill-formed.

[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
History
Date User Action Args
2012-05-14 18:50:41 meador.inge set recipients: + meador.inge, loewis, terry.reedy, mark.dickinson, alanmcintyre, eric.araujo, ubershmekel, serhiy.storchaka, pleed
2012-05-14 18:50:40 meador.inge link issue14315 messages
2012-05-14 18:50:40 meador.inge create