classification
Title: Work with an extra field of gzip and zip files
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: dmi.baranov, nikratio, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2013-04-09 15:03 by serhiy.storchaka, last changed 2018-07-13 12:03 by serhiy.storchaka.

Files
File name Uploaded
gzip_extra.diff serhiy.storchaka, 2013-11-16 19:18
zipfile_extra.diff serhiy.storchaka, 2013-11-16 19:19
README.dz serhiy.storchaka, 2013-11-16 19:19
README.zip serhiy.storchaka, 2013-11-16 19:20
Messages (4)
msg186423 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-09 15:03
Gzip files can contain an extra field, and some applications use it to extend the gzip format. The current GzipFile implementation ignores this field on input and provides no way to create a new file with an extra field.

I propose saving the extra field data as a GzipFile attribute on reading, and adding a new parameter to the GzipFile constructor for creating a new file with an extra field.
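Because GzipFile offers no way to write an extra field, a test file has to be assembled by hand. The following is a minimal sketch that builds a single gzip member with the FEXTRA flag set, following the RFC 1952 header layout. The helper name `gzip_with_extra` is hypothetical and is not part of the gzip module or of the attached patches.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04  # FLG bit 2 in RFC 1952


def gzip_with_extra(payload: bytes, extra: bytes) -> bytes:
    """Hypothetical helper: build one gzip member carrying an extra field."""
    # ID1, ID2, CM (8 = deflate), FLG, MTIME, XFL, OS (255 = unknown)
    header = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    # XLEN (2 bytes, little-endian) followed by the raw extra field bytes
    header += struct.pack("<H", len(extra)) + extra
    # Negative wbits selects a raw deflate stream without a zlib wrapper
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC32 of the uncompressed data and its size modulo 2**32
    trailer = struct.pack("<LL", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return header + body + trailer


blob = gzip_with_extra(b"hello world", b"RA\x02\x00ab")
# The stdlib reader skips over the extra field and returns only the payload.
print(gzip.decompress(blob))
```

The payload round-trips, which shows that the reader tolerates the field; the extra bytes themselves are only visible by inspecting the header directly.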
msg190295 - (view) Author: Dmi Baranov (dmi.baranov) * Date: 2013-05-29 12:07
I'll be glad to do it, but I have some questions to discuss.

First, about the FEXTRA format: it consists of a series of subfields [1], and the extra field currently used in Lib/test/test_gzip.py :: test_read_with_extra is slightly incorrect, at least for anyone following the format from RFC 1952. Do you have any real samples with an extra field?
Should we parse the subfields here (I have already asked Jean-Loup Gailly, maintainer of the registry of subfield IDs, for the current registry values and am waiting for a reply), or just provide the extra header as a byte string?
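For reference, the subfield layout in RFC 1952 section 2.3.1.1 is a two-byte ID (SI1, SI2), a two-byte little-endian length, and then that many bytes of data. A minimal sketch of a parser under that reading follows; `parse_gzip_extra` is a hypothetical name and not the interface of the attached patch.

```python
import struct


def parse_gzip_extra(extra: bytes):
    """Hypothetical helper: split a FEXTRA payload into (id, data) pairs."""
    subfields = []
    pos = 0
    while pos < len(extra):
        if len(extra) - pos < 4:
            raise ValueError("truncated subfield header")
        subfield_id = extra[pos:pos + 2]          # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)  # LEN
        pos += 4
        data = extra[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated subfield data")
        subfields.append((subfield_id, data))
        pos += length
    return subfields
```

Raising on truncated input is a design choice of this sketch; returning the raw byte string alongside the parsed pairs would let callers handle files that do not follow the subfield layout.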

Next, about GzipFile's public interface: GzipFile(...).extra looks ugly. Should I extend this ticket to support all metadata headers? FNAME, FCOMMENT, FHCRC, etc. are read correctly now, but there is no way to access them from outside (and no way to create a file with FCOMMENT or FHCRC).

E.g., something like this:
GzipFile(...).metadata.FNAME == 'sample.gz'
GzipFile(..., extra=b'AP6Test', comment='comment')
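As an illustration of what such a metadata object would need to collect, here is a rough sketch that reads the optional header fields of the first gzip member in the order RFC 1952 defines them (FEXTRA, FNAME, FCOMMENT, FHCRC). The function name `read_gzip_metadata` and the dictionary keys are assumptions made for this example, not a proposed API.

```python
import struct

FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def read_gzip_metadata(data: bytes) -> dict:
    """Hypothetical helper: extract optional fields from a gzip header."""
    magic, method, flags, mtime, xfl, os_code = struct.unpack_from(
        "<HBBLBB", data, 0)
    if magic != 0x8B1F:  # bytes 0x1f 0x8b read as little-endian uint16
        raise ValueError("not a gzip stream")
    meta = {"mtime": mtime, "extra": None, "name": None,
            "comment": None, "header_crc": None}
    pos = 10  # the fixed part of the header is 10 bytes long
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2
        meta["extra"] = data[pos:pos + xlen]
        pos += xlen
    # FNAME and FCOMMENT are zero-terminated byte strings
    for key, flag in (("name", FNAME), ("comment", FCOMMENT)):
        if flags & flag:
            end = data.index(b"\x00", pos)
            meta[key] = data[pos:end]
            pos = end + 1
    if flags & FHCRC:
        (meta["header_crc"],) = struct.unpack_from("<H", data, pos)
    return meta
```

The values are left as bytes here; decoding FNAME and FCOMMENT (RFC 1952 specifies ISO 8859-1) is a separate interface decision.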


[1] http://tools.ietf.org/html/rfc1952#section-2.3.1.1
msg190301 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-29 12:44
I have an almost finished patch, but I have doubts about the interface. It can be discussed. ZIP file entries have a similar extra field, and I'm planning to add a similar feature to the zipfile module too.

Here are preliminary patches.
msg203077 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-16 19:24
Some examples:

>>> import zipfile
>>> z = zipfile.ZipFile('README.zip')
>>> z.filelist[0].extra
b'UT\x05\x00\x03\xe0\xc3\x87Rux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00'
>>> z.filelist[0].extra_map
<zipfile.ExtraMap object at 0xb6fe8bec>
>>> list(z.filelist[0].extra_map.items())
[(21589, b'\x03\xe0\xc3\x87R'), (30837, b'\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00')]
>>> import gzip
>>> gz = gzip.open('README.dz')
>>> gz.extra_bytes
b''
>>> gz.extra_map
<gzip.ExtraMap object at 0xb6fd04ac>
>>> list(gz.extra_map.items())
[]
>>> gz.read(1)
b'T'
>>> gz.extra_bytes
b'RA\x08\x00\x01\x00\xcb\xe3\x01\x00T\x0b'
>>> list(gz.extra_map.items())
[(b'RA', b'\x01\x00\xcb\xe3\x01\x00T\x0b')]
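The `extra_map` attributes above come from the attached patches. The ZIP pairs can be reproduced from the raw `ZipInfo.extra` bytes, which the zipfile module already exposes, by reading each record as a two-byte little-endian header ID, a two-byte little-endian size, and then the data. A minimal sketch, with `parse_zip_extra` as a hypothetical helper name:

```python
import struct


def parse_zip_extra(extra: bytes):
    """Hypothetical helper: split ZIP extra bytes into (header_id, data)."""
    fields = []
    pos = 0
    # Stop when fewer than 4 bytes remain (no room for another record header)
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        fields.append((header_id, extra[pos:pos + size]))
        pos += size
    return fields


raw = (b"UT\x05\x00\x03\xe0\xc3\x87R"
       b"ux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00")
print(parse_zip_extra(raw))
```

The header IDs 21589 and 30837 are simply the ASCII pairs `UT` and `ux` read as little-endian 16-bit integers, whereas the gzip example keeps the subfield ID as the two-byte string `b'RA'`.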
History
Date User Action Args
2018-07-13 12:03:05 serhiy.storchaka set versions: + Python 3.8, - Python 3.4
2014-01-24 05:23:00 nikratio set nosy: + nikratio
2013-11-16 19:24:33 serhiy.storchaka set messages: + msg203077
    stage: needs patch -> patch review
2013-11-16 19:20:24 serhiy.storchaka set files: + README.zip
2013-11-16 19:19:58 serhiy.storchaka set files: + README.dz
2013-11-16 19:19:13 serhiy.storchaka set files: + zipfile_extra.diff
2013-11-16 19:18:40 serhiy.storchaka set files: + gzip_extra.diff
2013-11-16 19:17:55 serhiy.storchaka set files: - zip_extra.diff
2013-11-16 19:17:43 serhiy.storchaka set files: - gzip_extra.diff
2013-05-29 12:45:13 serhiy.storchaka set files: + zip_extra.diff
2013-05-29 12:44:36 serhiy.storchaka set files: + gzip_extra.diff
    keywords: + patch
    messages: + msg190301
    title: Work with an extra field of gzip files -> Work with an extra field of gzip and zip files
2013-05-29 12:07:24 dmi.baranov set nosy: + dmi.baranov
    messages: + msg190295
2013-04-09 15:03:01 serhiy.storchaka create