classification
Title: plistlib doesn't skip whitespace in XML format detection
Type: behavior Stage: patch review
Components: Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ned.deily, ronaldoussoren, serhiy.storchaka, shaneg
Priority: normal Keywords: patch

Created on 2019-06-19 23:18 by shaneg, last changed 2019-07-14 16:13 by ronaldoussoren.

Pull Requests
URL Status Linked Edit
PR 14266 closed Kriyszig, 2019-06-20 16:26
Messages (7)
msg346089 - (view) Author: Shane G (shaneg) Date: 2019-06-19 23:18
plistlib in Python 3.7.3 (and earlier) does not autodetect plist data as XML if it contains whitespace before the "<?xml" declaration.  Apple tools (and code), however, do.

Suggestion:  lstrip() the header bytes before comparison

Source Link: https://github.com/python/cpython/blob/3.7/Lib/plistlib.py#L493
msg346090 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-06-19 23:59
Thanks for the report.  Would you be interested in providing a pull request with a fix?
msg347912 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-07-14 13:19
I don't agree with calling lstrip() before checking which format is used because leading whitespace is invalid for binary plist files (and plutil agrees with me on that).
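
To illustrate the point about binary plists, here is a minimal sketch. It assumes only the public plistlib API; the `parses` helper is a hypothetical name introduced for this example. Binary plists begin with the magic bytes `bplist00`, so any leading whitespace shifts the magic and the data is no longer recognised.

```python
import plistlib


def parses(data: bytes) -> bool:
    """Hypothetical helper: True if plistlib.loads accepts the bytes."""
    try:
        plistlib.loads(data)
        return True
    except Exception:
        # plistlib signals unrecognised data with an exception.
        return False


# Serialise a small dict in the binary format.
binary = plistlib.dumps({"key": "value"}, fmt=plistlib.FMT_BINARY)

print(binary[:8])              # the binary plist magic
print(parses(binary))          # well-formed binary plist
print(parses(b"  " + binary))  # same data with two leading spaces
```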
msg347920 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-14 14:30
lstrip() would not work with UTF-16 encoded plist files, nor with files that start with a BOM.

Who produces plist files with leading whitespace?
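
A small sketch of why a bare lstrip() on the header bytes is not enough. This is not plistlib's actual detection code; the `XML_PREFIX` constant and the sample header text are assumptions made for illustration. `bytes.lstrip()` only removes ASCII whitespace bytes, so a BOM or the NUL padding bytes of UTF-16 stop it before it reaches the XML declaration.

```python
import codecs

# Assumed prefix for illustration; not copied from plistlib.
XML_PREFIX = b"<?xml"

# An XML declaration preceded by a newline and two spaces.
text = '\n  <?xml version="1.0" encoding="UTF-8"?>'

candidates = {
    "utf-8": text.encode("utf-8"),
    "utf-8 with BOM": codecs.BOM_UTF8 + text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
    "utf-16-be with BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
}

# Apply the naive "lstrip then compare" check to each encoding.
detected = {
    name: data.lstrip().startswith(XML_PREFIX)
    for name, data in candidates.items()
}

for name, ok in detected.items():
    print(f"{name}: {ok}")
```

Only the plain UTF-8 case is detected; the BOM-prefixed and UTF-16 variants are left unmatched by this naive check.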
msg347921 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-07-14 14:40
@shaneg, could you elaborate on why you created this issue?

I don't know of tooling that would generate such files, and it is highly unlikely that Apple's system tooling/libraries would do so.
msg347923 - (view) Author: Shane G (shaneg) Date: 2019-07-14 16:10
This issue was created because I ran across a plist like this when parsing entitlements in an IPA.  I assume that this happened by some unusual step in the toolchain when building the application.

To some other points:
* agreed lstrip()ing just the key would not work (unfortunately I suggested this before actually coding up a workaround for my case).
* agreed that binary plists should not have any stripping.
* I have not tried testing Apple tools (e.g. plutil) against XML plists with BOMs before any leading whitespace.
msg347924 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-07-14 16:13
You can always strip the file yourself :-)

Ignoring leading whitespace for XML would be fairly invasive due to the way the code is set up. I'm not set against it, but I'm not in favour of complicating the implementation for an edge case like this.
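
The "strip the file yourself" workaround might look like the following sketch. `loads_lenient` is a hypothetical helper, not part of plistlib. It assumes the data is either a binary plist or ASCII/UTF-8 XML without a BOM, and it strips leading whitespace only when the remainder clearly starts like XML, so binary plists are passed through untouched.

```python
import plistlib


def loads_lenient(data: bytes):
    """Hypothetical helper: tolerate leading whitespace before XML plists.

    Binary plists and anything else that does not start like XML after
    stripping are handed to plistlib.loads unchanged.
    """
    stripped = data.lstrip()
    if stripped.startswith((b"<?xml", b"<plist")):
        data = stripped
    return plistlib.loads(data)


# Usage: an XML plist with whitespace in front of the declaration.
xml = plistlib.dumps({"get-task-allow": True})
print(loads_lenient(b"\n \t" + xml))
```

This keeps the special case in the caller's code rather than in plistlib's format detection, and it does not attempt to handle BOM-prefixed or UTF-16 input.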
History
Date User Action Args
2019-07-14 16:13:50 ronaldoussoren set messages: + msg347924
2019-07-14 16:10:14 shaneg set messages: + msg347923
2019-07-14 14:40:43 ronaldoussoren set messages: + msg347921
2019-07-14 14:30:45 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg347920
2019-07-14 13:19:46 ronaldoussoren set messages: + msg347912
2019-06-20 16:26:29 Kriyszig set keywords: + patch; stage: patch review; pull_requests: + pull_request14098
2019-06-19 23:59:55 ned.deily set nosy: + ronaldoussoren, ned.deily; messages: + msg346090; versions: + Python 3.8, Python 3.9
2019-06-19 23:18:35 shaneg create