classification
Title: plistlib fails to parse bplist with 0x80 UID values
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: ronaldoussoren Nosy List: SilentGhost, bigfootjon, ronaldoussoren, serhiy.storchaka, slo.sleuth
Priority: normal Keywords: patch

Created on 2016-04-07 01:34 by slo.sleuth, last changed 2018-03-28 12:56 by bigfootjon.

Files
File name Uploaded Description
plistlib_uid.diff slo.sleuth, 2016-04-07 01:33 diff file with proposed UID patch
issue26707.diff SilentGhost, 2016-04-07 07:51 review
plistlib_uid.diff slo.sleuth, 2016-04-07 20:54 diff file with updated UID patch
cat.plist bigfootjon, 2018-03-03 17:59
plist_hack.py bigfootjon, 2018-03-13 17:29
Pull Requests
URL Status Linked
PR 5922 open python-dev, 2018-02-27 05:37
Messages (17)
msg262974 - (view) Author: John Lehr (slo.sleuth) Date: 2016-04-07 01:33
plistlib raises an invalid file exception when loading properly formed binary plists that contain UID (0x80) values.  The binary files were checked for well-formedness with plutil.

Comments at line 706 state that the value is defined but not used in plists, and the object is not handled.  However, I find UIDs frequently in bplists, e.g., in iOS Snapchat application files.  I have attached a proposed patch, tested on these files, which can now be parsed successfully with the _read_object method of the _BinaryPlistParser class.

My proposed patch is pasted below for others' consideration while waiting for the issue to be resolved.

706,707c706,708
<         # tokenH == 0x80 is documented as 'UID' and appears to be used for
<         # keyed-archiving, not in plists.
---
>         elif tokenH == 0x80: #UID
>             s = self._get_size(tokenL)
>             return self._fp.read(s).decode('ascii')

Thanks for your consideration.
msg262984 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-04-07 07:51
Here is a version of the patch suitable for Rietveld. John, could you perhaps provide an example file that uses UID values?

Also, the code is identical to the handling of the 0x50 token; perhaps it could be folded into that branch.
msg262985 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-07 08:46
A UID is an int rather than bytes, and I would use a special UID type.

According to Apple's sources [1], the size of UID data is tokenL+1, not self._get_size(tokenL).

[1] http://www.opensource.apple.com/source/CF/CF-1153.18/CFBinaryPList.c
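
To make the layout concrete, here is a minimal sketch of decoding a single UID object along the lines described above: the high nibble of the marker byte identifies the type (0x80), the low nibble plus one gives the payload length, and the payload is read as a big-endian unsigned integer. The function name read_uid is hypothetical and is not part of plistlib.

```python
import io


def read_uid(fp):
    """Decode one UID object from a stream positioned at its marker byte.

    Hypothetical helper for illustration; not part of plistlib.
    """
    marker = fp.read(1)[0]
    token_h = marker & 0xF0  # object type
    token_l = marker & 0x0F  # for UIDs: payload length minus one
    if token_h != 0x80:
        raise ValueError("not a UID marker: 0x%02x" % marker)
    size = token_l + 1
    payload = fp.read(size)
    if len(payload) != size:
        raise ValueError("truncated UID payload")
    return int.from_bytes(payload, "big")


# 0x80 -> one payload byte; 0x81 -> two payload bytes.
print(read_uid(io.BytesIO(b"\x80\x2a")))      # 42
print(read_uid(io.BytesIO(b"\x81\x01\x00")))  # 256
```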
msg262999 - (view) Author: John Lehr (slo.sleuth) Date: 2016-04-07 18:14
I’m sorry, but the files in which I detected the problem cannot be circulated.  I will try to create a test account on Snapchat and generate some test data, but I can’t do this anytime soon.

msg263000 - (view) Author: John Lehr (slo.sleuth) Date: 2016-04-07 18:15
I’m glad you found it in the Apple specification.  I looked, but missed it.  I would absolutely defer to you on your assessment of the decoding.

msg263003 - (view) Author: John Lehr (slo.sleuth) Date: 2016-04-07 20:54
Based on the format specification pointed to by Serhiy, perhaps this is a better patch, correcting the size from the previous submission and treating the UID as an integer:

706,707c706,708
<         # tokenH == 0x80 is documented as 'UID' and appears to be used for
<         # keyed-archiving, not in plists.
---
>         elif tokenH == 0x80:  # UID
>             s = self._get_size(tokenL + 1)
>             return int.from_bytes(self._fp.read(s), 'big')

I have compared the output of OS X plutil with that of plistlib.load() with this patch applied, and the values are identical for UID fields.
msg263016 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2016-04-08 10:53
How can you create plist files that contain UID values using Apple's APIs?

The plist library is meant to be interoperable with files created using Apple's APIs for creating plist files, and I didn't find an API that created UID values at the time.
msg263021 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-04-08 11:53
The size of UID data is just tokenL + 1, not self._get_size(tokenL + 1).

FYI, the code for supporting UIDs and other types was proposed in issue14455, but was excluded from the final patch.
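
One way to see why the distinction matters: for sized objects such as strings and data, a low nibble of 0xF conventionally means that the real length follows as a separate integer object. The sketch below models that rule with a simplified stand-in, sized_length, which is an assumption about how _get_size behaves and not a copy of the plistlib code, and contrasts it with the UID rule quoted above. Both function names are hypothetical.

```python
import io


def sized_length(fp, token_l):
    """Simplified model of the length rule for strings/data (assumption).

    A low nibble of 0xF means the length is stored in a following
    integer object; any other value is the length itself.
    """
    if token_l == 0xF:
        int_marker = fp.read(1)[0]
        nbytes = 1 << (int_marker & 0x3)  # 1, 2, 4 or 8 bytes
        return int.from_bytes(fp.read(nbytes), "big")
    return token_l


def uid_length(token_l):
    """UID rule as stated in the thread: always low nibble + 1."""
    return token_l + 1


# For small nibbles the two approaches happen to agree.
small_agrees = sized_length(io.BytesIO(b""), 3 + 1) == uid_length(3)
print(small_agrees)

# With token_l == 14 they diverge: 14 + 1 == 0xF triggers the
# extended-length path, which consumes payload bytes as if they
# encoded a length.
payload = io.BytesIO(b"\x10\x05" + bytes(13))
wrong = sized_length(payload, 14 + 1)
right = uid_length(14)
print(wrong, right)
```

Whether UIDs that wide occur in real files is a separate question; the point is only that the UID length never goes through the extended-length encoding.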
msg312987 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-02-27 09:25
Note that I'm still -1 on merging this patch because it is unclear how to create such plist files using public Apple APIs.

P.S. The low-level code for creating and reading binary plist files appears to be used for more than just plist archives.
msg313189 - (view) Author: Jon Janzen (bigfootjon) * Date: 2018-03-03 17:59
Hello,

I have attached a file extracted from the database of the 2Do App for iOS and macOS. The file contains information about tags used in the app.

plistlib cannot currently parse this file because it lacks the ability to read byte 0x80 (UID).

I believe the documentation for generating this type of file can be found at: https://developer.apple.com/documentation/foundation/nskeyedarchiver
msg313240 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-03-05 09:29
@bigfootjon: Cocoa keyed archives are not plist files.
msg313246 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-05 12:25
But they use the plist format for serialization (just as plists themselves use the XML format).

https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/NSKeyedArchiver.swift

Direct support for keyed archives would be better implemented in a third-party package. But we can provide support for the low-level operations.

To distinguish UIDs from integers, and to be able to create plist files containing UIDs, we need a special-purpose class, plistlib.UID. It will be a simple wrapper around int with a few methods: __index__(), __repr__(), __reduce__().
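
A minimal sketch of such a wrapper, assuming only the three methods named above, might look like the following. The non-negative check, __eq__ and __hash__ are additions for illustration and are not taken from the proposal.

```python
class UID:
    """Thin wrapper marking an integer as a plist UID (illustrative sketch)."""

    def __init__(self, data):
        if not isinstance(data, int):
            raise TypeError("data must be an int")
        if data < 0:
            # Assumption: UIDs are unsigned.
            raise ValueError("UIDs must be non-negative")
        self.data = data

    def __index__(self):
        # Lets the object be used wherever an integer index is accepted.
        return self.data

    def __repr__(self):
        return "%s(%r)" % (self.__class__.__name__, self.data)

    def __reduce__(self):
        # Supports pickling and copying.
        return self.__class__, (self.data,)

    def __eq__(self, other):
        # A UID only compares equal to another UID, never to a bare int.
        if not isinstance(other, UID):
            return NotImplemented
        return self.data == other.data

    def __hash__(self):
        return hash(self.data)


uid = UID(7)
print(repr(uid))                               # UID(7)
print([10, 20, 30, 40, 50, 60, 70, 80][uid])   # 80
print(uid == 7)                                # False
```

Keeping UID distinct from int in comparisons is what lets a reader or writer tell the two apart when round-tripping a file.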
msg313253 - (view) Author: Jon Janzen (bigfootjon) * Date: 2018-03-05 15:55
@serhiy.storchaka: I've implemented a UID wrapper class.

I've also updated the parser and writer classes to support the UID wrapper. The implementations for reading/writing XML UID tags match the implementations given by Apple's plutil distributed with macOS:

UID(x) becomes {'CF$UID': int(x)}
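
As a rough illustration of that mapping, the sketch below converts a UID-like value to the dictionary form before handing it to plistlib's XML writer. The helper name uid_to_xml_value is hypothetical, and a plain int stands in for the wrapper class here.

```python
import operator
import plistlib


def uid_to_xml_value(uid):
    """Return the dictionary form used for a UID in XML output.

    Hypothetical helper; operator.index() accepts any object that
    defines __index__, such as an int or a UID wrapper.
    """
    return {"CF$UID": operator.index(uid)}


value = uid_to_xml_value(5)
xml = plistlib.dumps(value, fmt=plistlib.FMT_XML)
print(value)
print(xml.decode("utf-8"))
```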
msg313751 - (view) Author: Jon Janzen (bigfootjon) * Date: 2018-03-13 14:33
Ping
msg313752 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-03-13 14:39
I'm still not too happy about supporting UIDs in plistlib, especially because I'm not sure that's all that's needed. AFAIK I removed more types that the underlying encoder supported from plistlib because those are never used in plist files.

The Swift encoder for keyed archives is probably not the code that's actually used on the OS; AFAIK that is still Objective-C code.

P.S. I changed the version selection to 3.8; adding support for UIDs would be a feature change and is not suitable for backports.
msg313764 - (view) Author: Jon Janzen (bigfootjon) * Date: 2018-03-13 17:29
Support for keyed archives is not limited to the Swift implementation I linked to. They have been supported on Mac OS X since 10.2 (long before Swift came around). The documentation (https://developer.apple.com/documentation/foundation/nskeyedarchiver?language=objc) shows that NSKeyedArchiver can only output in plist format, since outputFormat is of type NSPropertyListFormat (allowing output in either XML or binary).

The other unimplemented binary token types (URL, UUID, set, ordset) are not used by NSKeyedArchiver (see the "Encoding Data and Objects" section of the documentation mentioned above), so there's no concern that supporting 0x80 (UID) will suddenly necessitate implementing the others. If you feel that implementing them is necessary in order to accept the patch, I would be happy to try.

I certainly have a use case (reading to-do list data from the 2Do app), and the creator of this issue wanted to read Snapchat data files.

Currently, I am using a hot-patched plistlib._BinaryPlistParser to read the data I need (see the attached snippet). I would rather not do that, but if you think the scope of my use case does not warrant inclusion in the standard library, then I'll just have to deal with that.
msg314586 - (view) Author: Jon Janzen (bigfootjon) * Date: 2018-03-28 12:56
Ping
History
Date User Action Args
2018-03-28 12:56:42 bigfootjon set messages: + msg314586
2018-03-13 17:29:20 bigfootjon set files: + plist_hack.py; messages: + msg313764
2018-03-13 14:52:52 serhiy.storchaka set type: behavior -> enhancement
2018-03-13 14:39:35 ronaldoussoren set messages: + msg313752; versions: + Python 3.8, - Python 3.5, Python 3.6
2018-03-13 14:33:55 bigfootjon set messages: + msg313751
2018-03-05 15:55:52 bigfootjon set messages: + msg313253
2018-03-05 12:25:19 serhiy.storchaka set messages: + msg313246
2018-03-05 09:29:41 ronaldoussoren set messages: + msg313240
2018-03-03 17:59:03 bigfootjon set files: + cat.plist; nosy: + bigfootjon; messages: + msg313189
2018-02-27 09:25:45 ronaldoussoren set messages: + msg312987
2018-02-27 05:37:52 python-dev set pull_requests: + pull_request5693
2016-04-08 11:53:19 serhiy.storchaka set messages: + msg263021
2016-04-08 10:53:28 ronaldoussoren set messages: + msg263016
2016-04-07 20:54:10 slo.sleuth set files: + plistlib_uid.diff; messages: + msg263003
2016-04-07 19:39:45 serhiy.storchaka set assignee: ronaldoussoren
2016-04-07 18:15:55 slo.sleuth set messages: + msg263000
2016-04-07 18:14:10 slo.sleuth set messages: + msg262999
2016-04-07 08:46:31 serhiy.storchaka set messages: + msg262985
2016-04-07 07:51:26 SilentGhost set files: + issue26707.diff; versions: + Python 3.6; nosy: + SilentGhost; messages: + msg262984; stage: patch review
2016-04-07 04:05:25 serhiy.storchaka set nosy: + ronaldoussoren, serhiy.storchaka; type: crash -> behavior
2016-04-07 01:34:01 slo.sleuth create