classification
Title: Wrong OS header on file created by gzip module
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: ddorda, ned.deily, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-07-15 21:14 by ddorda, last changed 2016-07-17 15:14 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
gzip_os_header_fix_and_tests.patch ddorda, 2016-07-15 21:14 OS header fix patch (and tests) review
gzip_os_header_fix_and_tests_mac_fix.patch ddorda, 2016-07-16 08:57 Patch fixed for Mac OS's review
Messages (6)
msg270515 - (view) Author: Dor Dankner (ddorda) * Date: 2016-07-15 21:14
The gzip module generates files with wrong OS header, by putting "Unknown" OS, instead of checking and filling the user's OS.
From the gzip RFC (rfc1952): "This identifies the type of file system on which compression took place. This may be useful in determining end-of-line convention for text files."

The following patch contains a fix that fills the current OS flag (and testcase ;) )

* The bug is relevant to python 2.x too, but I did not test the patch on it.
** also, I did not run the testcase on Win/Mac, however it should work.
msg270520 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-07-15 23:12
It appears that not including the OS value in the header is probably more of a feature rather than a bug considering how long the gzip module has been around with this behavior.  Have you found any real-life cases where it makes a difference?

In any case, the os.platform value 'mac' referred to ancient Mac OS Classic (Mac OS 9 and earlier), platforms current Pythons do not support.  Current Mac OS X systems are platform 'darwin' and would use the same gzip value as platform 'posix'.
msg270549 - (view) Author: Dor Dankner (ddorda) * Date: 2016-07-16 08:57
It does not seem like a feature after you see this issue #27521.
The OS was just hardcoded, the same way the compression level was hardcoded. you may say it's a feature too :P

The way I see it, the gzip module should at least work like the original GNU gzip, if not following the RFC (which I believe is even better).

Anyways i verified that OS X fills a UNIX OS header, and fixed my code (patch included) :)
msg270559 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-16 13:07
This is a harder call than #27521.  There (IUUC), the *wrong* compression level was specified.  Here, 'unknown' is a valid value.  The biggest disadvantage of changing this is the maintenance burden of keeping it up to date with platform changes (eg: we're eventually adding android support, and andriod (may not be?) posix.  Without a defined benefit, I'm -0.5 on changing this.
msg270607 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-07-17 05:32
This byte identifies the type of file system. How can you determine it? Note that different filesystems can be used on the same OS. FAT still widely used on Windows (and what about exFAT, should it be considered as a flavour of FAT or separate type?), virtually any file system can be mounted on Linux.

See 2.3.1.2 in RFC 1952:

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others).

         [...] a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.

Since the gzip module never sets the FTEXT flag, the value of the OS field is useless.
msg270610 - (view) Author: Dor Dankner (ddorda) * Date: 2016-07-17 05:52
I'm convinced.
Thank you all for your comments :)
History
Date User Action Args
2016-07-17 15:14:52r.david.murraysetresolution: rejected
stage: resolved
2016-07-17 05:52:11ddordasetstatus: open -> closed

messages: + msg270610
2016-07-17 05:32:54serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg270607
2016-07-16 13:07:12r.david.murraysetnosy: + r.david.murray
messages: + msg270559
2016-07-16 08:57:17ddordasetfiles: + gzip_os_header_fix_and_tests_mac_fix.patch

messages: + msg270549
2016-07-15 23:12:52ned.deilysetnosy: + ned.deily
messages: + msg270520
2016-07-15 21:14:53ddordacreate