Issue14455
Created on 2012-03-30 21:56 by d9pouces, last changed 2013-04-01 12:32 by d9pouces.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit |
| plistlib.py | d9pouces, 2012-03-30 21:56 | New plistlib implementation, with read-support of the three plist formats | |
| context.diff | d9pouces, 2012-03-30 22:50 | | |
| plistlib_ext.patch | serhiy.storchaka, 2012-03-31 07:55 | Ported to Python3 and cleaned | review |
| plistlib_with_test.diff | d9pouces, 2012-04-08 08:31 | | review |
| Messages (17) | |||
|---|---|---|---|
| msg157152 - (view) | Author: d9pouces (d9pouces) * | Date: 2012-03-30 21:56 | |
Hi,
Plist files actually come in three flavors: XML, binary, and now (starting with Mac OS X 10.7 Lion) JSON. The plistlib.readPlist function can only read XML plist files, so it cannot read the binary and JSON ones. The binary format is open and described by Apple (http://opensource.apple.com/source/CF/CF-550/CFBinaryPList.c).
Here is the diff (against the Python 2.7 implementation of plistlib) to transparently read both the binary and JSON formats. The API of plistlib remains unchanged, since format detection is done by plistlib.readPlist. An InvalidFileException is raised for a malformed binary file.
57,58c57
<     "Plist", "Data", "Dict",
<     "InvalidFileException",
---
>     "Plist", "Data", "Dict"
64d62
< import json
66d63
< import os
68d64
< import struct
81,89c77,78
<     header = pathOrFile.read(8)
<     pathOrFile.seek(0)
<     if header == '<?xml ve' or header[2:] == '<?xml ': #XML plist file, without or with BOM
<         p = PlistParser()
<         rootObject = p.parse(pathOrFile)
<     elif header == 'bplist00': #binary plist file
<         rootObject = readBinaryPlistFile(pathOrFile)
<     else: #json plist file
<         rootObject = json.load(pathOrFile)
---
>     p = PlistParser()
>     rootObject = p.parse(pathOrFile)
195,285d183
<
< # timestamp 0 of binary plists corresponds to 1/1/2001 (year of Mac OS X 10.0), instead of 1/1/1970.
< MAC_OS_X_TIME_OFFSET = (31 * 365 + 8) * 86400
<
< class InvalidFileException(ValueError):
<     def __str__(self):
<         return "Invalid file"
<     def __unicode__(self):
<         return "Invalid file"
<
< def readBinaryPlistFile(in_file):
<     """
<     Read a binary plist file, following the description of the binary format: http://opensource.apple.com/source/CF/CF-550/CFBinaryPList.c
<     Raise InvalidFileException in case of error, otherwise return the root object, as usual
<     """
<     in_file.seek(-32, os.SEEK_END)
<     trailer = in_file.read(32)
<     if len(trailer) != 32:
<         return InvalidFileException()
<     offset_size, ref_size, num_objects, top_object, offset_table_offset = struct.unpack('>6xBB4xL4xL4xL', trailer)
<     in_file.seek(offset_table_offset)
<     object_offsets = []
<     offset_format = '>' + {1: 'B', 2: 'H', 4: 'L', 8: 'Q', }[offset_size] * num_objects
<     ref_format = {1: 'B', 2: 'H', 4: 'L', 8: 'Q', }[ref_size]
<     int_format = {0: (1, '>B'), 1: (2, '>H'), 2: (4, '>L'), 3: (8, '>Q'), }
<     object_offsets = struct.unpack(offset_format, in_file.read(offset_size * num_objects))
<     def getSize(token_l):
<         """ return the size of the next object."""
<         if token_l == 0xF:
<             m = ord(in_file.read(1)) & 0x3
<             s, f = int_format[m]
<             return struct.unpack(f, in_file.read(s))[0]
<         return token_l
<     def readNextObject(offset):
<         """ read the object at offset. May recursively read sub-objects (content of an array/dict/set) """
<         in_file.seek(offset)
<         token = in_file.read(1)
<         token_h, token_l = ord(token) & 0xF0, ord(token) & 0x0F #high and low parts
<         if token == '\x00':
<             return None
<         elif token == '\x08':
<             return False
<         elif token == '\x09':
<             return True
<         elif token == '\x0f':
<             return ''
<         elif token_h == 0x10: #int
<             result = 0
<             for k in xrange((2 << token_l) - 1):
<                 result = (result << 8) + ord(in_file.read(1))
<             return result
<         elif token_h == 0x20: #real
<             if token_l == 2:
<                 return struct.unpack('>f', in_file.read(4))[0]
<             elif token_l == 3:
<                 return struct.unpack('>d', in_file.read(8))[0]
<         elif token_h == 0x30: #date
<             f = struct.unpack('>d', in_file.read(8))[0]
<             return datetime.datetime.utcfromtimestamp(f + MAC_OS_X_TIME_OFFSET)
<         elif token_h == 0x80: #data
<             s = getSize(token_l)
<             return in_file.read(s)
<         elif token_h == 0x50: #ascii string
<             s = getSize(token_l)
<             return in_file.read(s)
<         elif token_h == 0x60: #unicode string
<             s = getSize(token_l)
<             return in_file.read(s * 2).decode('utf-16be')
<         elif token_h == 0x80: #uid
<             return in_file.read(token_l + 1)
<         elif token_h == 0xA0: #array
<             s = getSize(token_l)
<             obj_refs = struct.unpack('>' + ref_format * s, in_file.read(s * ref_size))
<             return map(lambda x: readNextObject(object_offsets[x]), obj_refs)
<         elif token_h == 0xC0: #set
<             s = getSize(token_l)
<             obj_refs = struct.unpack('>' + ref_format * s, in_file.read(s * ref_size))
<             return set(map(lambda x: readNextObject(object_offsets[x]), obj_refs))
<         elif token_h == 0xD0: #dict
<             result = {}
<             s = getSize(token_l)
<             key_refs = struct.unpack('>' + ref_format * s, in_file.read(s * ref_size))
<             obj_refs = struct.unpack('>' + ref_format * s, in_file.read(s * ref_size))
<             for k, o in zip(key_refs, obj_refs):
<                 key = readNextObject(object_offsets[k])
<                 obj = readNextObject(object_offsets[o])
<                 result[key] = obj
<             return result
<         raise InvalidFileException()
<     return readNextObject(object_offsets[top_object])
<
|
|||
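The header-based format detection from the first message can be sketched for Python 3, where a file opened in binary mode yields bytes. This is a minimal sketch and not the patch itself: the name `sniff_plist_format` and the returned labels are hypothetical, only a UTF-8 BOM is handled (an assumption), and, as in the original patch, anything that is neither XML nor binary is simply assumed to be JSON.

```python
import io


def sniff_plist_format(fp):
    """Guess the plist flavor from the first 8 bytes, then rewind the file."""
    header = fp.read(8)
    fp.seek(0)
    if header.startswith(b"bplist00"):
        return "binary"
    # Assumption: only a UTF-8 BOM is recognized; UTF-16/UTF-32 input is not.
    if header.startswith(b"<?xml") or header.startswith(b"\xef\xbb\xbf<?xml"):
        return "xml"
    # Fallback used by the original patch: everything else is treated as JSON.
    return "json"


print(sniff_plist_format(io.BytesIO(b"bplist00\x00\x00")))
```

The caller would then dispatch to the matching parser; because the file is rewound, each parser sees the stream from the start.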
| msg157154 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2012-03-30 22:14 | |
Thanks for the patch. Could you upload it as a context diff? |
|||
| msg157155 - (view) | Author: d9pouces (d9pouces) * | Date: 2012-03-30 22:50 | |
Here is the new patch. I assumed that you meant to use diff -c instead of the raw diff command. |
|||
| msg157159 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2012-03-30 23:31 | |
Hmm. Apparently what I meant was -u instead of -c (unified diff). I just use the 'hg diff' command myself, which does the right thing :) Of course, to do that you need to have a checkout. (We can probably use the context diff.) |
|||
| msg157166 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2012-03-31 07:55 | |
This patch is for Python 2. New features are accepted only for Python 3.3+. I ported the patch, but since I have no Mac, I can't check it. To date code was specified incorrectly. The length of integers was calculated incorrectly. To convert integers, you can use int.from_bytes. Object identity was not preserved. I'm not sure that the XML detection is sufficient: it should consider UTF-16 and UTF-32, with and without a BOM. Tests are needed. I also cleaned up and modernized the code a bit. I believe that it should be rewritten in a more object-oriented style. It would also be worth implementing a writer. |
|||
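The suggestion to use int.from_bytes can be illustrated as follows. This is a sketch under stated assumptions: the helper name `read_plist_int` is hypothetical, the integer is assumed to occupy 2**token_l bytes in big-endian order (the size rule given in Apple's CFBinaryPList.c), and the value is treated as unsigned for simplicity.

```python
import io


def read_plist_int(fp, token_l):
    """Read a big-endian integer stored in 2**token_l bytes."""
    size = 1 << token_l  # assumption: low nibble n means 2**n bytes
    data = fp.read(size)
    if len(data) != size:
        raise ValueError("truncated integer")
    # Assumption: unsigned interpretation; real files may need signed handling.
    return int.from_bytes(data, "big")


print(read_plist_int(io.BytesIO(b"\x01\x00"), 1))
```

By contrast, the loop in the original patch reads `(2 << token_l) - 1` bytes, which gives 1, 3, 7 and 15 bytes for token_l values 0 to 3.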
| msg157506 - (view) | Author: d9pouces (d9pouces) * | Date: 2012-04-04 21:06 | |
storchaka > I'm trying to take care of your remarks. So, I'm working on more object-oriented code, with both write and read functions. I just need to write some test cases. IMHO, we should add a new parameter to the writePlist function, to allow the use of the binary or the JSON plist format instead of the default XML one. |
|||
| msg157668 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2012-04-06 16:30 | |
Keep it simple: if a few functions work, there is no need at all to add classes. Before doing more work though I suggest you wait for the feedback of the Mac maintainers. |
|||
| msg157669 - (view) | Author: Ronald Oussoren (ronaldoussoren) * | Date: 2012-04-06 16:44 | |
I (as one of the Mac maintainers) like the new functionality, but would like to see some changes:
1) as others have noted it is odd that binary and json plists can be read but not written
2) there need to be tests, and I'd add two or even three sets of tests:
a. tests that read pre-generated files in the various formats
(tests that we're compatible with the format generated by Apple)
b. tests that use Apple tools to generate plists in various formats,
and check that the library can read them
(these tests would be skipped on platforms other than OSX)
c. if there are read and write functions: check that the writer
generates files that can be read back in.
3) there is a new public function for reading binary plist files,
I'd keep that private and add a "format" argument to readPlist
when there is a need for forcing the usage of a specific format
(and to mirror the (currently hypothetical) format argument for
writePlist).
Don't worry about rearchitecting plistlib; it might need work in that regard, but that need not be part of this issue and it makes the changes harder to review. I'm also far from convinced that a redesign of the code is needed.
|
|||
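The round-trip test in item 2c can be sketched with the plistlib API that ships in Python 3.4 and later (`plistlib.dumps`, `plistlib.loads`, `FMT_XML`, `FMT_BINARY`). Note that this is the API of current Python releases, not necessarily the interface of the patch under discussion; the helper name `roundtrip` and the sample data are hypothetical.

```python
import datetime
import plistlib


def roundtrip(value, fmt):
    """Serialize value in the given plist format and parse it back."""
    return plistlib.loads(plistlib.dumps(value, fmt=fmt))


sample = {
    "name": "example",
    "count": 3,
    "enabled": True,
    "blob": b"\x00\x01\x02",
    "items": [1, 2.5, "three"],
    "created": datetime.datetime(2012, 3, 30, 21, 56),
}

# The writer passes the check if the reader recovers an equal object.
for fmt in (plistlib.FMT_XML, plistlib.FMT_BINARY):
    assert roundtrip(sample, fmt) == sample
```

Comparing parsed objects rather than raw bytes keeps the test independent of how a particular writer orders or lays out the data.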
| msg157687 - (view) | Author: d9pouces (d9pouces) * | Date: 2012-04-06 20:34 | |
I'm working on a class, BinaryPlistParser, which allows both reading and writing binary files.
I've also added a parameter fmt to writePlist and readPlist, to specify the format ('json', 'xml1' or 'binary1', using XML by default). These constants are used by Apple for its plutil program.
I'm now working on integrating these three formats into test_plistlib.py. However, JSON is less expressive than the other two, since it cannot handle dates.
|
|||
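The limitation mentioned above, that JSON cannot represent dates, is easy to observe with Python's json module, which raises TypeError for datetime (and bytes) values. A minimal sketch, with the hypothetical helper name `json_can_encode`:

```python
import datetime
import json


def json_can_encode(value):
    """Return True if json.dumps accepts the value without a custom encoder."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


print(json_can_encode({"count": 1}))
print(json_can_encode({"created": datetime.datetime(2012, 4, 6, 20, 34)}))
```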
| msg157781 - (view) | Author: d9pouces (d9pouces) * | Date: 2012-04-08 08:31 | |
Here is the new patch, which allows reading and writing binary, JSON and XML plist files. It includes both the plistlib.py and test/test_plistlib.py patches. The JSON format does not allow dates and data, so XML is used by default to write files. I use the json library to write JSON plist files, but its output is slightly different from the Apple default output: the keys of dictionaries are in a different order. Thus, I removed the test_appleformattingfromliteral test for JSON files. Similarly, my binary writer does not write the same binary files as the Apple library: my library writes the content of compound objects (dicts, lists and sets) before the object itself, while Apple writes the object before its content. Copying the Apple behavior results in some additional weird lines of code, for little benefit. Thus, I also removed the test_appleformattingfromliteral test for binary files. The other tests are run for all three formats. |
|||
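When two writers differ only in dictionary key order, as described above for the JSON output, one option is to compare the parsed documents instead of the serialized text. A minimal sketch with the hypothetical helper `same_json_document`; it is not part of the patch:

```python
import json


def same_json_document(text_a, text_b):
    """Compare two JSON texts by value, ignoring dictionary key order."""
    return json.loads(text_a) == json.loads(text_b)


ours = '{"a": 1, "b": [1, 2]}'
theirs = '{"b": [1, 2], "a": 1}'
print(same_json_document(ours, theirs))
```

List order still matters in this comparison, which is the desired behavior for plist arrays.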
| msg164620 - (view) | Author: Mark Grandi (markgrandi) | Date: 2012-07-03 20:05 | |
Hi, I noticed in the latest message that d9pouces posted that "JSON format does not allow dates and data, so XML is used by default to write files." The XML version of plists also does not really 'support' those types, and they are converted as follows:
NSData -> Base64 encoded data
NSDate -> ISO 8601 formatted string
(from http://en.wikipedia.org/wiki/Property_list#Mac_OS_X)
So really it should be the same thing when converting to JSON, no? |
|||
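The conversion proposed in the message above (data as Base64, dates as ISO 8601 strings) could be applied to JSON output through the `default` hook of `json.dumps`. This is only a sketch of the idea: the function name `plist_json_default` is hypothetical, the trailing "Z" assumes the datetime is in UTC, and nothing here is known to match what Apple's tools do.

```python
import base64
import datetime
import json


def plist_json_default(obj):
    """Encode bytes as Base64 text and datetimes as ISO 8601 strings."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    if isinstance(obj, datetime.datetime):
        # Assumption: naive datetimes are treated as UTC.
        return obj.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise TypeError("cannot serialize %r" % type(obj).__name__)


text = json.dumps(
    {"blob": b"\x00\x01\x02", "created": datetime.datetime(2012, 7, 3, 20, 5)},
    default=plist_json_default,
    sort_keys=True,
)
print(text)
```

The conversion is lossy: when the JSON is read back, both values are plain strings, so a reader cannot tell them apart from ordinary text without extra information.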
| msg165438 - (view) | Author: d9pouces (d9pouces) * | Date: 2012-07-14 09:46 | |
plutil (Apple's command-line tool to convert plist files from one format to another) returns an error if you try to convert an XML plist with dates to JSON. |
|||
| msg168974 - (view) | Author: Mark Grandi (markgrandi) | Date: 2012-08-24 03:13 | |
Where are you even seeing these JSON property lists? I just checked the most recent documentation for NSPropertyListSerialization, and they have not updated the enum for NSPropertyListFormat. It seems that if even Apple doesn't support writing JSON property lists with its own APIs, then we shouldn't worry about supporting it?
see: https://developer.apple.com/library/ios/#documentation/Cocoa/Reference/Foundation/Classes/NSPropertyListSerialization_Class/Reference/Reference.html
enum {
    NSPropertyListOpenStepFormat = kCFPropertyListOpenStepFormat,
    NSPropertyListXMLFormat_v1_0 = kCFPropertyListXMLFormat_v1_0,
    NSPropertyListBinaryFormat_v1_0 = kCFPropertyListBinaryFormat_v1_0
}; NSPropertyListFormat; typedef NSUInteger NSPropertyListFormat; |
|||
| msg169000 - (view) | Author: Ronald Oussoren (ronaldoussoren) * | Date: 2012-08-24 11:42 | |
plutil(1) supports writing the JSON format. That said, the open-source parts of CoreFoundation on opensource.apple.com don't support reading or writing JSON files. I'm therefore -1 w.r.t. adding support for JSON-formatted plist files; support for JSON can be added when Apple actually supports it in the system libraries and hence the format is stable. |
|||
| msg169100 - (view) | Author: Mark Grandi (markgrandi) | Date: 2012-08-24 23:51 | |
Are any more changes needed to the code that is already posted as a patch in this bug report? Or have the changes you wanted to see in msg157669 not happened yet? |
|||
| msg185734 - (view) | Author: Ronald Oussoren (ronaldoussoren) * | Date: 2013-04-01 12:23 | |
d9pouces: are you willing to sign a contributor agreement? The agreement is needed before we can add these changes to the stdlib, and I'd like to do that for the 3.4 release. More information on the contributor agreement: http://www.python.org/psf/contrib/contrib-form/ |
|||
| msg185736 - (view) | Author: d9pouces (d9pouces) * | Date: 2013-04-01 12:32 | |
I just signed this agreement. Thanks for accepting this patch! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2013-04-01 12:32:29 | d9pouces | set | messages: + msg185736 |
| 2013-04-01 12:23:26 | ronaldoussoren | set | messages: + msg185734 |
| 2012-08-24 23:51:56 | markgrandi | set | messages: + msg169100 |
| 2012-08-24 11:42:42 | ronaldoussoren | set | messages: + msg169000 versions: + Python 3.4, - Python 3.3 |
| 2012-08-24 03:13:07 | markgrandi | set | messages: + msg168974 |
| 2012-07-14 09:46:53 | d9pouces | set | messages: + msg165438 |
| 2012-07-03 20:05:16 | markgrandi | set | nosy: + markgrandi messages: + msg164620 |
| 2012-04-08 08:31:26 | d9pouces | set | files: + plistlib_with_test.diff messages: + msg157781 |
| 2012-04-06 20:34:28 | d9pouces | set | messages: + msg157687 |
| 2012-04-06 16:44:17 | ronaldoussoren | set | messages: + msg157669 |
| 2012-04-06 16:30:16 | eric.araujo | set | nosy: + eric.araujo messages: + msg157668 |
| 2012-04-04 21:06:31 | d9pouces | set | messages: + msg157506 |
| 2012-04-02 15:37:22 | jrjsmrtn | set | nosy: + jrjsmrtn |
| 2012-03-31 07:55:15 | serhiy.storchaka | set | files: + plistlib_ext.patch nosy: + serhiy.storchaka messages: + msg157166 |
| 2012-03-31 04:19:10 | ned.deily | set | nosy: + ned.deily |
| 2012-03-30 23:31:51 | r.david.murray | set | messages: + msg157159 |
| 2012-03-30 22:50:26 | d9pouces | set | files: + context.diff keywords: + patch messages: + msg157155 |
| 2012-03-30 22:14:53 | r.david.murray | set | versions: + Python 3.3, - Python 2.7 nosy: + r.david.murray messages: + msg157154 stage: patch review |
| 2012-03-30 21:56:18 | d9pouces | create | |
