classification
Title: Support for DICOM image file format in imghdr module
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, claw, pam, terry.reedy, trrhodes
Priority: normal Keywords: patch

Created on 2021-01-13 00:28 by claw, last changed 2021-04-12 16:48 by trrhodes.

Pull Requests
URL Status Linked
PR 24227 open trrhodes, 2021-01-16 12:42
Messages (7)
msg384989 - Author: Charles Law (claw) * Date: 2021-01-13 00:28
DICOM is a file format used frequently in medical imaging (it is also a communications protocol). It has been in use since the 1980s and is still widely used by modern medical equipment.

It has a well defined format: http://dicom.nema.org/dicom/2013/output/chtml/part10/chapter_7.html

This proposal is to add a check to the imghdr module that detects DICOM files, so that imghdr.what() returns 'dicom' when a DICOM file is found.
msg385131 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-01-16 00:46
Can you submit a patch, or post a test_dicom function here?
msg385139 - Author: Ross Rhodes (trrhodes) * Date: 2021-01-16 13:27
Hello Charles,

Following the format provided, I've opened a PR to implement your proposal. Feedback welcome.
msg385594 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2021-01-24 22:49
Copying my comment from GitHub:

> I think DICOM is too specific to be added to the stdlib. I'd prefer
> improving documentation of imghdr.tests to make adding custom file
> types clearer.
msg386658 - Author: Ross Rhodes (trrhodes) * Date: 2021-02-08 20:28
Looking for input from other contributors here. Naturally, with a PR already open, I'm inclined to keep these changes, but if the majority agree that it is too specific a format, I'm happy to hear alternative suggestions.

Ross
msg390792 - Author: Pierre-Alain Moret (pam) Date: 2021-04-11 19:23
The DICOM format is indeed very widely used in the medical field, and to me it deserves to be added to the stdlib. I do not see why it is more specific than the rast format, which is included. Moreover, it should be easy to add: even though the complete format is very complex, covering all the medical modalities, it is enough to test the first 132 bytes of the image, which should be:
b'\x00' * 128 + b'DICM', i.e. 128 null bytes followed by b'DICM'.
Of course, that is not enough to guarantee a valid DICOM image, but the same is true of other formats.
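The exact comparison described above can be sketched as follows, next to a looser variant that checks only the prefix. Both function names are hypothetical. The difference matters because DICOM Part 10 requires the 128-byte preamble to be all zeros only when it is not used, so a file whose preamble carries application data would fail the strict check but pass the prefix-only one.

```python
# 128-byte preamble of zeros followed by the 4-byte prefix, as quoted above.
ZERO_PREAMBLE_SIGNATURE = b'\x00' * 128 + b'DICM'


def is_dicom_strict(h):
    """Match the first 132 bytes exactly (zero preamble + b'DICM')."""
    return h[:132] == ZERO_PREAMBLE_SIGNATURE


def is_dicom_prefix_only(h):
    """Ignore the preamble contents and look only at bytes 128..131."""
    return h[128:132] == b'DICM'


zero_preamble = b'\x00' * 128 + b'DICM'
# Hypothetical file whose preamble holds non-zero application data.
custom_preamble = b'\x01' * 128 + b'DICM'

print(is_dicom_strict(zero_preamble), is_dicom_prefix_only(zero_preamble))      # True True
print(is_dicom_strict(custom_preamble), is_dicom_prefix_only(custom_preamble))  # False True
```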

For example, with this simple corrupted JPEG image:
imghdr.what('dummy', h=b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\xff\xd9')
'jpeg' is returned.
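The result follows from how shallow these signature checks are. The sketch below is an approximation of the JFIF/Exif part of imghdr's JPEG test, which looks for b'JFIF' or b'Exif' at offset 6; `looks_like_jpeg` is a hypothetical name. The corrupted header above contains no image data at all, yet it passes because only bytes 6 to 9 are inspected.

```python
def looks_like_jpeg(h):
    """Approximation of imghdr's JFIF/Exif check: only bytes 6..9 are examined."""
    return h[6:10] in (b'JFIF', b'Exif')


corrupted = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\xff\xd9'
print(len(corrupted))              # 18
print(looks_like_jpeg(corrupted))  # True: nothing after offset 10 is checked
```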

That is why I strongly advocate in favor of adding DICOM format in imghdr.

Pierre-Alain Moret
msg390866 - Author: Ross Rhodes (trrhodes) * Date: 2021-04-12 16:48
The PR to support DICOM based on the first 132 bytes is already open. It has been marked “stale” since I haven’t had any feedback on GitHub yet.
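Checking the first 132 bytes also means reading that many: `imghdr.what()` reads only the first 32 bytes before running its tests, so a DICOM check needs either a larger read or its own read from the file. A hypothetical helper for the latter, which peeks at the header of an open binary file and then restores the stream position (as `imghdr.what()` does for file objects), could look like this:

```python
import io


def peek_header(fileobj, size=132):
    """Read up to `size` bytes from a binary file object, then rewind.

    Hypothetical helper: the stream position is restored so callers
    can continue reading from where they started.
    """
    location = fileobj.tell()
    try:
        return fileobj.read(size)
    finally:
        fileobj.seek(location)


def is_dicom_file(fileobj):
    """True if bytes 128..131 of the stream are b'DICM'."""
    return peek_header(fileobj)[128:132] == b'DICM'


stream = io.BytesIO(b'\x00' * 128 + b'DICM' + b'\x02\x00\x00\x00')
print(is_dicom_file(stream))         # True
print(stream.tell())                 # 0, position restored
print(len(peek_header(stream, 32)))  # 32: too short to reach offset 128
```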
History
Date User Action Args
2021-04-12 16:48:53 trrhodes set messages: + msg390866
2021-04-11 19:23:52 pam set nosy: + pam
messages: + msg390792
2021-02-08 20:28:53 trrhodes set messages: + msg386658
2021-01-24 22:49:41 berker.peksag set nosy: + berker.peksag
messages: + msg385594
2021-01-16 13:27:33 trrhodes set messages: + msg385139
2021-01-16 12:42:35 trrhodes set keywords: + patch
nosy: + trrhodes
pull_requests: + pull_request23050
stage: needs patch -> patch review
2021-01-16 00:46:51 terry.reedy set nosy: + terry.reedy
messages: + msg385131
stage: needs patch
2021-01-13 00:28:46 claw create