classification
Title: imghdr doesn't recognize variant jpeg formats
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Claudiu.Popa, ezio.melotti, intgr, jcea, joril, kovid, mvignali, r.david.murray, vstinner
Priority: normal Keywords: patch

Created on 2012-11-20 10:21 by joril, last changed 2018-07-18 12:18 by gov_vj.

Files
File name               Uploaded                   Description
peanuts15.jpg           joril, 2012-11-20 10:21    JPEG including an ICC profile
imghdr_icc_jpeg.patch   joril, 2012-11-20 17:17    patch against hg head
Pull Requests
URL       Status   Linked
PR 8322   open     gov_vj, 2018-07-18 12:18
Repositories containing patches
https://bitbucket.org/intgr/cpython
Messages (13)
msg175984 - (view) Author: Joril (joril) Date: 2012-11-20 10:21
imghdr doesn't support jpegs that include an ICC Profile.
This is because imghdr looks for "JFIF" at a fixed position near the beginning of the file, but the ICC_PROFILE segment pushes it further into the file.
(The ICC spec is here http://www.color.org/specification/ICC1v43_2010-12.pdf, annex B)
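To illustrate the failure described above, here is a minimal sketch. It assumes the check is the fixed-offset comparison on bytes 6 to 10 that appears in the tests quoted later in this issue. The function name and both headers are hypothetical and hand-built, not taken from imghdr or from peanuts15.jpg.

```python
def offset_based_check(h):
    """Fixed-offset check: expects b'JFIF' or b'Exif' at bytes 6..9."""
    if h[6:10] in (b'JFIF', b'Exif'):
        return 'jpeg'
    return None


# Synthetic header: SOI, APP0 marker, 2-byte length, then "JFIF\0".
jfif_header = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'

# Synthetic header: SOI, APP2 marker, 2-byte length, then "ICC_PROFILE\0".
# The length value is arbitrary; only the byte positions matter here.
icc_header = b'\xff\xd8\xff\xe2\x00\x10ICC_PROFILE\x00'

print(offset_based_check(jfif_header))  # recognized
print(offset_based_check(icc_header))   # not recognized: bytes 6..9 are b'ICC_'
```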
msg175985 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-20 11:15
Can you provide a patch?
msg175986 - (view) Author: Joril (joril) Date: 2012-11-20 11:16
I can try, yes. I'll add one ASAP
msg176009 - (view) Author: Joril (joril) Date: 2012-11-20 17:17
Here it is. It is against the latest hg version; should I write one for 2.7 too?
msg176010 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-20 17:31
Thanks for the patch.

> should I write one for 2.7 too?

Not necessary, 2.7 only gets bug fixes.
OTOH it would be nice to have some tests for this new feature (and for the module in general), but there doesn't seem to be any Lib/test/test_imghdr.py file.  The module itself seems to contain some kind of tests at the end, though.
msg176045 - (view) Author: Joril (joril) Date: 2012-11-21 08:19
It looks like that test just walks a directory recursively while trying to identify its files; there's no "classic" test of the "this is a JPEG, is it detected correctly?" type.
msg184742 - (view) Author: Kovid Goyal (kovid) Date: 2013-03-20 06:23
The attached patch is insufficient; for example, it fails on http://nationalpostnews.files.wordpress.com/2013/03/budget.jpeg?w=300&h=1571

Note that the linux file utility identifies a file as "JPEG Image data" if the first two bytes of the file are \xff\xd8.

A slightly stricter test that catches more jpeg files:

def test_jpeg(h, f):
    if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and b'JFIF' in h[:32]):
        return 'jpeg'
msg198034 - (view) Author: (intgr) * Date: 2013-09-18 20:30
I vote we forget about JFIF/Exif headers and only use \xff\xd8 to identify the file. These headers are optional, and there are tons of files out in the wild without them, for example: https://coverartarchive.org/release/5044b557-a9ed-4a74-b763-e20580ced85d/3354872309.jpg

Proposed patch at https://bitbucket.org/intgr/cpython/commits/012cde305316e22a999d674a0a009200d3e76fdb
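A minimal sketch of the two-byte check proposed above, written with the (h, f) signature used by the other test functions in this issue. The name detect_jpeg_by_soi and the sample bytes are hypothetical; this is not the content of the linked patch.

```python
def detect_jpeg_by_soi(h, f=None):
    """Identify JPEG data by the SOI marker alone (first two bytes)."""
    if h[:2] == b'\xff\xd8':
        return 'jpeg'
    return None


# Synthetic data with no JFIF/Exif header: SOI followed directly by
# a quantization-table (DQT) marker.
headerless = b'\xff\xd8\xff\xdb\x00\x43\x00'

print(detect_jpeg_by_soi(headerless))
print(detect_jpeg_by_soi(b'\x89PNG\r\n\x1a\n'))
```

The trade-off is that any data starting with these two bytes is reported as JPEG, which matches the behaviour of the file utility described in msg184742.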
msg220345 - (view) Author: Claudiu Popa (Claudiu.Popa) * (Python triager) Date: 2014-06-12 12:54
Using \xff\xd8 sounds good to me.
msg220346 - (view) Author: Kovid Goyal (kovid) Date: 2014-06-12 13:09
FYI, the test I currently use in calibre, which has not failed so far for millions of users:

def test_jpeg(h, f):    
    if (h[6:10] in (b'JFIF', b'Exif')) or (h[:2] == b'\xff\xd8' and (b'JFIF' in h[:32] or b'8BIM' in h[:32])):
        return 'jpeg'
msg220409 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-06-13 01:11
Issue 21230 reports a parallel problem with recognizing photoshop images.

We need a patch with tests covering the variant types we know about.  I don't have a strong opinion on the simple two byte test versus the more complex test in msg220346, but following 'file' makes sense to me.
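As a rough sketch of what coverage of the known variants could look like, the table below runs three of the checks discussed in this issue against hand-built headers. All function names and sample bytes are hypothetical; real tests would use actual image files such as the one attached to this issue.

```python
SOI = b'\xff\xd8'  # JPEG start-of-image marker


def check_offset(h):
    """Fixed-offset check on bytes 6..9."""
    return h[6:10] in (b'JFIF', b'Exif')


def check_signature_scan(h):
    """The check from msg220346: offset check, or SOI plus a known tag."""
    return check_offset(h) or (
        h[:2] == SOI and (b'JFIF' in h[:32] or b'8BIM' in h[:32]))


def check_soi(h):
    """The two-byte check."""
    return h[:2] == SOI


# Synthetic headers, one per variant mentioned in this issue.
SAMPLES = {
    'jfif': SOI + b'\xff\xe0\x00\x10JFIF\x00',
    'exif': SOI + b'\xff\xe1\x00\x10Exif\x00\x00',
    'icc': SOI + b'\xff\xe2\x00\x10ICC_PROFILE\x00',
    'photoshop': SOI + b'\xff\xed\x00\x20Photoshop 3.0\x008BIM',
    'bare': SOI + b'\xff\xdb\x00\x43\x00',
    'png': b'\x89PNG\r\n\x1a\n',
}


def compare(samples=SAMPLES):
    """Return {name: (offset, scan, soi)} for each sample header."""
    return {name: (check_offset(h), check_signature_scan(h), check_soi(h))
            for name, h in samples.items()}


for name, row in compare().items():
    print(f'{name:10} offset={row[0]!s:5} scan={row[1]!s:5} soi={row[2]!s:5}')
```

On these synthetic inputs, only the two-byte check accepts every JPEG variant while still rejecting the PNG signature.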
msg221185 - (view) Author: Martin Vignali (mvignali) Date: 2014-06-21 18:44
I'm okay with just testing the first two bytes; it's the method we currently use for our internal tools.

But it may be worth adding another test to detect incomplete files (created, for example, when a camera makes a recording error). This is very useful to detect, because an incomplete JPEG file crashes a lot of applications.

We use this patch to imghdr:

--------------------------------------
def test_jpeg(h, f):
    """JPEG data in JFIF or Exif format"""
    # Reject empty files and files that do not start with the SOI marker.
    if not h.startswith(b'\xff\xd8'):
        return None
    if f:
        # Testing a file: also check for the EOI marker at the end.
        f.seek(-2, 2)
        if f.read(2).endswith(b'\xff\xd9'):
            return 'jpeg'
    else:
        # Testing only the header: assume a valid JPEG without checking the end of the file.
        return 'jpeg'
-------------------------------------
msg221214 - (view) Author: Kovid Goyal (kovid) Date: 2014-06-22 03:39
You cannot assume the file-like object passed to imghdr is seekable. And IMO it is not the job of imghdr to check file validity, especially since it does not do that for all formats.
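To make the seekability concern concrete, here is a small sketch of an end-of-file check that declines to answer for streams that cannot seek. The helper ends_with_eoi and the NonSeekableStream class are hypothetical, written only for this illustration; the sketch also assumes the stream holds at least two bytes.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical pipe-like stream: readable, but not seekable."""

    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, b):
        return self._buf.readinto(b)


def ends_with_eoi(f):
    """Return True/False depending on whether the stream ends with the
    JPEG EOI marker, or None when that cannot be checked without seeking."""
    if not f.seekable():
        return None
    pos = f.tell()
    try:
        f.seek(-2, io.SEEK_END)
        return f.read(2) == b'\xff\xd9'
    finally:
        f.seek(pos)  # leave the stream where the caller had it


data = b'\xff\xd8\xff\xdb\x00\x43\x00\xff\xd9'  # synthetic bytes, not a real image
print(ends_with_eoi(io.BytesIO(data)))
print(ends_with_eoi(NonSeekableStream(data)))
```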
History
Date User Action Args
2018-07-18 12:18:26  gov_vj          set   stage: test needed -> patch review
                                           pull_requests: + pull_request7860
2014-09-26 08:12:37  Claudiu.Popa    set   stage: patch review -> test needed
2014-06-22 03:39:43  kovid           set   messages: + msg221214
2014-06-21 18:44:22  mvignali        set   nosy: + mvignali
                                           messages: + msg221185
2014-06-13 01:11:51  r.david.murray  set   nosy: + r.david.murray
                                           messages: + msg220409
                                           title: imghdr doesn't support jpegs with an ICC profile -> imghdr doesn't recognize variant jpeg formats
2014-06-13 01:02:26  r.david.murray  link  issue21230 superseder
2014-06-12 13:09:27  kovid           set   messages: + msg220346
2014-06-12 13:04:30  vstinner        set   nosy: + vstinner
2014-06-12 12:54:30  Claudiu.Popa    set   messages: + msg220345
2014-06-12 12:44:05  Claudiu.Popa    set   nosy: + Claudiu.Popa
2014-06-12 12:43:54  Claudiu.Popa    set   versions: + Python 3.5, - Python 3.4
2013-09-18 20:30:26  intgr           set   nosy: + intgr
                                           messages: + msg198034
                                           hgrepos: + hgrepo210
2013-03-20 06:23:29  kovid           set   nosy: + kovid
                                           messages: + msg184742
2012-11-24 01:10:26  jcea            set   nosy: + jcea
2012-11-21 08:19:12  joril           set   messages: + msg176045
2012-11-20 17:31:24  ezio.melotti    set   stage: needs patch -> patch review
                                           messages: + msg176010
                                           components: + Library (Lib)
                                           versions: + Python 3.4, - Python 2.7
2012-11-20 17:17:39  joril           set   files: + imghdr_icc_jpeg.patch
                                           keywords: + patch
                                           messages: + msg176009
2012-11-20 11:16:16  joril           set   messages: + msg175986
                                           versions: + Python 2.7, - Python 3.4
2012-11-20 11:15:14  ezio.melotti    set   versions: + Python 3.4, - Python 2.7
                                           nosy: + ezio.melotti
                                           messages: + msg175985
                                           stage: needs patch
2012-11-20 10:21:20  joril           create