Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plistlib fails to parse bplist with 0x80 UID values #70894

Closed
slosleuth mannequin opened this issue Apr 7, 2016 · 19 comments
Closed

plistlib fails to parse bplist with 0x80 UID values #70894

slosleuth mannequin opened this issue Apr 7, 2016 · 19 comments
Assignees
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@slosleuth
Copy link
Mannequin

slosleuth mannequin commented Apr 7, 2016

BPO 26707
Nosy @ronaldoussoren, @serhiy-storchaka, @bigfootjon
PRs
  • bpo-26707: make plistlib read UID Key-Archive data #5922
  • bpo-26707: enable plistlib to read UID keys #12153
  • Files
  • plistlib_uid.diff: diff file with proposed UID patch
  • issue26707.diff
  • plistlib_uid.diff: diff file with updated UID patch
  • cat.plist
  • plist_hack.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ronaldoussoren'
    closed_at = <Date 2019-05-15.20:15:41.815>
    created_at = <Date 2016-04-07.01:34:01.338>
    labels = ['3.8', 'type-feature', 'library']
    title = 'plistlib fails to parse bplist with 0x80 UID values'
    updated_at = <Date 2019-05-15.20:15:41.814>
    user = 'https://bugs.python.org/slosleuth'

    bugs.python.org fields:

    activity = <Date 2019-05-15.20:15:41.814>
    actor = 'serhiy.storchaka'
    assignee = 'ronaldoussoren'
    closed = True
    closed_date = <Date 2019-05-15.20:15:41.815>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2016-04-07.01:34:01.338>
    creator = 'slo.sleuth'
    dependencies = []
    files = ['42388', '42390', '42395', '47468', '47483']
    hgrepos = []
    issue_num = 26707
    keywords = ['patch']
    message_count = 19.0
    messages = ['262974', '262984', '262985', '262999', '263000', '263003', '263016', '263021', '312987', '313189', '313240', '313246', '313253', '313751', '313752', '313764', '314586', '337062', '342599']
    nosy_count = 5.0
    nosy_names = ['ronaldoussoren', 'SilentGhost', 'serhiy.storchaka', 'slo.sleuth', 'bigfootjon']
    pr_nums = ['5922', '12153']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue26707'
    versions = ['Python 3.8']

    @slosleuth
    Copy link
    Mannequin Author

    slosleuth mannequin commented Apr 7, 2016

    libplist raises an invalid file exception on loading properly formed binary plists containing UID (0x80) values. The binary files were tested for form with plutil.

    Comments at line 706 state the value is defined but not in use in plists, and the object is not handled. However, I find them frequently in bplists, e.g., iOS Snapchat application files. I have attached a proposed patch that I have tested on these files and can now successfully parse them with the _read_object method in the _BinaryPlistParser class.

    My proposed patch is pasted below for others consideration while waiting for the issue to be resolved.

    706,707c706,708
    < # tokenH == 0x80 is documented as 'UID' and appears to be used for
    < # keyed-archiving, not in plists.
    ---

        elif tokenH == 0x80: #UID
            s = self.\_get_size(tokenL)
            return self.\_fp.read(s).decode('ascii')
    

    Thanks for your consideration.

    @slosleuth slosleuth mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Apr 7, 2016
    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 7, 2016
    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Apr 7, 2016

    Here is the version of the patch suitable for the Rietveld. John, could you perhaps provide an example file that uses UID values?

    Also, the code is identical to handling of 0x50 token, perhaps it could be incorporated into it.

    @serhiy-storchaka
    Copy link
    Member

    UID is rather int than bytes. And I would use a special UID type.

    According to Apple's sources [1], the size of UID data is tokenL+1, not self._get_size(tokenL).

    [1] http://www.opensource.apple.com/source/CF/CF-1153.18/CFBinaryPList.c

    @slosleuth
    Copy link
    Mannequin Author

    slosleuth mannequin commented Apr 7, 2016

    I’m sorry, but the files in which I detected the problem cannot be circulated. I will try to create a test account on Snapchat and generate some test data, but I can’t do this anytime soon.

    On Apr 7, 2016, at 12:51 AM, SilentGhost <report@bugs.python.org> wrote:

    SilentGhost added the comment:

    Here is the version of the patch suitable for the Rietveld. John, could you perhaps provide an example file that uses UID values?

    Also, the code is identical to handling of 0x50 token, perhaps it could be incorporated into it.

    ----------
    nosy: +SilentGhost
    stage: -> patch review
    versions: +Python 3.6
    Added file: http://bugs.python.org/file42390/issue26707.diff


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue26707\>


    @slosleuth
    Copy link
    Mannequin Author

    slosleuth mannequin commented Apr 7, 2016

    I’m glad you found it in the Apple specification. I looked, but missed it. I would absolutely defer to you on your assessment of the decoding.

    On Apr 7, 2016, at 1:46 AM, Serhiy Storchaka <report@bugs.python.org> wrote:

    Serhiy Storchaka added the comment:

    UID is rather int than bytes. And I would use a special UID type.

    According to Apple's sources [1], the size of UID data is tokenL+1, not self._get_size(tokenL).

    [1] http://www.opensource.apple.com/source/CF/CF-1153.18/CFBinaryPList.c

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue26707\>


    @slosleuth
    Copy link
    Mannequin Author

    slosleuth mannequin commented Apr 7, 2016

    Based on the format specification pointed to by Serhiy, perhaps this a better patch, correcting size from previous patch submission and treating:

    706,707c706,708
    < # tokenH == 0x80 is documented as 'UID' and appears to be used for
    < # keyed-archiving, not in plists.
    ---

        elif tokenH == 0x80:  # UID
            s = self.\_get_size(tokenL + 1)
            return int.from_bytes(self.\_fp.read(s), 'big')
    

    I have compared output with OS X plutil and plistlib.load() with this patch and the values are identical for UID fields.

    @ronaldoussoren
    Copy link
    Contributor

    How can you create plist files that contain UID values using Apple's APIs?

    The plist library is meant to be interoperable with files created using Apple's APIs for creating plist files, and I didn't find an API that created UID values at the time.

    @serhiy-storchaka
    Copy link
    Member

    The size of UID data is just tokenL + 1, not self._get_size(tokenL + 1).

    FYI, the code for support UIDs and other types was proposed in bpo-14455, but was excluded from the final patch.

    @ronaldoussoren
    Copy link
    Contributor

    Note that I'm still -1 on merging this patch because it is unclear how to create such plist files using public Apple APIs.

    P.S. The low-level code for creating and reading binary plist files appears to be used for more than juist plist archives.

    @Bigfootjon
    Copy link
    Mannequin

    Bigfootjon mannequin commented Mar 3, 2018

    Hello,

    I have attached a file extracted from the database of the 2Do App for iOS and macOS. The file contains information about tags used in the app.

    plistlib cannot currently parse this file because it lacks the ability to read byte 0x80 (UID).

    I believe the documentation for generating these type of files can be found at: https://developer.apple.com/documentation/foundation/nskeyedarchiver

    @ronaldoussoren
    Copy link
    Contributor

    @bigfootjon: Cocoa keyed archives are not plist files.

    @serhiy-storchaka
    Copy link
    Member

    But they use the plist format for serialization (as plists theirself use the XML format).

    https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/NSKeyedArchiver.swift

    Direct support of keyed archives would be better to implement in third-party package. But we can provide the support for low-level operations.

    For distinguishing UIDs from integers and for being able to create plist files containing UIDs we need a special purposed class plist.UID. It will be a simpler wrapper around int with few methods: __index__(), __repr__(), __reduce__().

    @Bigfootjon
    Copy link
    Mannequin

    Bigfootjon mannequin commented Mar 5, 2018

    @serhiy.storchaka: I've implemented a UID wrapper class

    I've also updated the parser and writer classes to support the UID wrapper. The implementations for reading/writing XML UID tags match the implementations given by Apple's plutil distributed with macOS:

    UID(x) becomes {'CF$UID': int(x)}

    @Bigfootjon
    Copy link
    Mannequin

    Bigfootjon mannequin commented Mar 13, 2018

    Ping

    @ronaldoussoren
    Copy link
    Contributor

    I'm still not too happy about supporting UIDs in plistlib, especially because I'm not sure it that's all that's needed. AFAIK I removed more types that the underlying encoder supported from plistlib because those are never used in plist files.

    The swift encoder for keyed archives is probably not the code that's actually used on the OS, AFAIK that still is Objective-C code.

    P.S. I changed the version selection to 3.8, adding support for UIDs would be a feature change and not suited for back ports.

    @ronaldoussoren ronaldoussoren added the 3.8 only security fixes label Mar 13, 2018
    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 13, 2018
    @Bigfootjon
    Copy link
    Mannequin

    Bigfootjon mannequin commented Mar 13, 2018

    Support for KeyedArchives are not limited to the Swift implementation I linked to. They have been supported since Mac OS X since 10.2 (long before Swift came around). The documentation (https://developer.apple.com/documentation/foundation/nskeyedarchiver?language=objc) shows that NSKeyedArchive can only output in plist format since outputFormat is of type NSPropertyListFormat (allowing to output in either XML or binary).

    The other unimplemented binary token types (URL, UUID, set, ordset) are not used under NSKeyedArchive (see the "Encoding Data and Objects" section of the documentation mentioned above) so there's no concern that supporting 0x80 (UID) will suddenly necessitate implementing the other unimplemented types. If you feel that it would be necessary to implement them in order to accept the patch I would be happy to try and implement them.

    I know I certainly have an use case (reading to-do list data from the 2Do app) and the creator of this bug wanted to read SnapChat data files.

    Currently, I am using a hot-patched plistlib._BinaryPlistParser to read the data I need (see attached for a snippet) and I would rather not do that, but if you think my use case scope does not warrant inclusion in the standard library then I'll just have to deal with that.

    @Bigfootjon
    Copy link
    Mannequin

    Bigfootjon mannequin commented Mar 28, 2018

    Ping

    @Bigfootjon
    Copy link
    Mannequin

    Bigfootjon mannequin commented Mar 4, 2019

    I recently upgraded my python version and my hot-patch broke due to changes in bpo-32072 (GH-4455).

    It reminded me of this b.p.o., and after reading through the messages to remind myself where the patch stood I realized that my tone friendly towards the end was not overly.

    @ronaldoussoren, I apologize if I offended.

    In other news, I re-implemented the patch (and filed a new pull-request) due to the following:

    • Things got spread out over too many commits
    • NSKeyedArchiver only uses binary mode, so I removed the XML compatibility
    • I also wrote a few tests to verify the UID implementation

    @serhiy-storchaka
    Copy link
    Member

    New changeset c981ad1 by Serhiy Storchaka (Jon Janzen) in branch 'master':
    bpo-26707: Enable plistlib to read UID keys. (GH-12153)
    c981ad1

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants