classification
Title: plistlib rejects strings containing control characters
Type: behavior
Stage: test needed
Components: Library (Lib), macOS
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: ronaldoussoren
Nosy List: Behdad.Esfahbod, MLModel, loewis, ronaldoussoren
Priority: low
Keywords:

Created on 2010-12-18 19:39 by MLModel, last changed 2015-04-23 18:33 by Behdad.Esfahbod.

Files
File name Uploaded Description
com.apple.Terminal.plist MLModel, 2010-12-18 19:39 output of plutil -convert xml1 ~/Library/Preferences/com.apple.Terminal.plist -o ~/tmp/com.apple.Terminal.plist
Messages (7)
msg124311 - (view) Author: Mitchell Model (MLModel) Date: 2010-12-18 19:39
plistlib rejects control characters found in XML plists that Apple's 'plutil lint' accepts. I have attached my Terminal preferences as an example. (plistlib accepts the contents of the default Terminal preferences file)
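A minimal sketch of the failure, not taken from the attached file. It assumes Python 3.4 or later, where `plistlib.loads` exists (the versions listed on this issue used the older `readPlist*` functions), and the plist content and the `load_or_error` helper are made up for illustration:

```python
import plistlib
from xml.parsers.expat import ExpatError

# Hypothetical minimal plist: one string value containing ESC (0x1B).
BAD_PLIST = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<plist version="1.0"><dict>'
    b'<key>Title</key><string>a\x1bb</string>'
    b'</dict></plist>'
)
# Same document with the control character removed.
GOOD_PLIST = BAD_PLIST.replace(b"\x1b", b"")


def load_or_error(data):
    """Return the parsed plist, or the ExpatError raised while parsing."""
    try:
        return plistlib.loads(data)
    except ExpatError as exc:
        return exc


good = load_or_error(GOOD_PLIST)
bad = load_or_error(BAD_PLIST)
print(good)                 # parsed dictionary
print(type(bad).__name__)   # exception type from the XML parser
```

The error comes from the underlying expat parser rather than from plistlib's own validation, which is why the same file can pass Apple's tools and still fail here.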
msg124313 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-18 20:00
This is tricky. It's clearly ill-formed XML, so I'm not sure that this needs to be bug-compatible with Apple's implementation.
msg124325 - (view) Author: Mitchell Model (MLModel) Date: 2010-12-18 22:14
I can see where that does make it tricky. (I also tried reading the plist after opening the file as binary, but no luck.) The problem here, of course, is that the only reason for the existence of this library is to read Apple's plist files, however XML-invalid some may be. (It is only a small number of my very many .plist files that have invalid characters -- I just happened to pick one of them to try to access in order to print a simple summary of its contents.) I guess since the plist is read using xml.parsers.expat, there's not much that can be done, and it wouldn't be worth anyone's time to hack around this for plistlib, especially since nearly all .plist files appear to be conforming. Thanks for the clarification.
msg124437 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-12-21 17:00
I agree with Martin that this is a tricky one. 

The file is problematic because it is invalid XML[1]. However, Apple's tools are perfectly happy to process the file and, as Mitchell notes, plistlib exists to interoperate with Apple's plist files.

I'm therefore reopening the issue, but with a low priority. It is unlikely that I'll work on this in the near future though. 

Replacing all control characters by entities before trying to parse the Plist XML would likely be the best way forward. A patch (including testcases) would definitely be appreciated.

BTW, I've checked that Apple's Cocoa libraries will read the file, so this is not just a bug in the xml1 output formatter of plutil.

Using PyObjC:
>>> from Foundation import NSDictionary
>>> d = NSDictionary.dictionaryWithContentsOfFile_('com.apple.Terminal.plist')



[1] It is invalid XML because it contains control characters which are invalid according to the XML specification (<http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char>).
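For reference, the `Char` production in XML 1.0 allows only tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. A small sketch for locating offending characters in a decoded plist follows; the function names `is_xml_char` and `find_invalid` are hypothetical helpers, not part of plistlib:

```python
def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid(text):
    """Return (index, character) pairs for characters XML 1.0 forbids."""
    return [(i, ch) for i, ch in enumerate(text) if not is_xml_char(ord(ch))]


# ESC (0x1B) is rejected; tab (0x09) is allowed.
sample = "a\x1bb\tc"
print(find_invalid(sample))
```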
msg124608 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-12-24 20:36
Mitchell: 2.6 is closed to revision except for security issues
msg124625 - (view) Author: Mitchell Model (MLModel) Date: 2010-12-25 01:27
Thanks for letting me know (and with a personalized message, yet!). I wasn't paying attention -- I verified that the problem exists in 2.7 and 3.1 and I just dragged 3.1 down to 2.6. Although I've been working furiously in Python for the past six months, I haven't been writing or teaching so I haven't been combing the documentation or testing examples or using obscure or forgotten features, which together are the source of nearly all my bug reports. So I just automatically did what I used to do. I understand the issue and difference; I just didn't realize 2.6 was closed -- I didn't really mean that this should be fixed in anything other than the current or even next release of anything.
msg241874 - (view) Author: Behdad Esfahbod (Behdad.Esfahbod) Date: 2015-04-23 18:33
> Replacing all control characters by entities before trying to parse the Plist XML would likely be the best way forward. 

That wouldn't work.  Control characters are disallowed in XML's character set, so they are invalid even if input as entities.

Unfortunately this causes a lot of trouble for clients [0], because it means that XML cannot represent the full Unicode repertoire.  I'm curious about alternatives.  Perhaps the expat module can be extended to allow recovering from this if the client chooses to...

[0] eg. https://github.com/behdad/fonttools/issues/249
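A short sketch of the point about entities, using `xml.parsers.expat` directly (the parser plistlib relies on). The element name and the `parses` helper are hypothetical; the documents are made up for illustration:

```python
from xml.parsers import expat


def parses(doc):
    """Return True if expat accepts doc as a complete document."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except expat.ExpatError:
        return False
    return True


raw_control = b"<string>a\x1bb</string>"    # literal ESC byte
ref_control = b"<string>a&#27;b</string>"   # ESC as a character reference
ref_tab = b"<string>a&#9;b</string>"        # tab is a legal XML character

for doc in (raw_control, ref_control, ref_tab):
    print(doc, parses(doc))
```

Only the tab reference is accepted: a character reference must itself refer to a legal XML character, so rewriting control characters as `&#N;` before parsing does not make the document well-formed.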
History
Date User Action Args
2015-04-23 18:33:14 Behdad.Esfahbod set nosy: + Behdad.Esfahbod; messages: + msg241874
2012-05-07 16:59:53 ezio.melotti set versions: + Python 3.3, - Python 3.1
2010-12-25 01:27:14 MLModel set messages: + msg124625
2010-12-24 22:35:34 terry.reedy set nosy: - terry.reedy
2010-12-24 20:36:29 terry.reedy set versions: + Python 3.2, - Python 2.6; nosy: + terry.reedy; messages: + msg124608; resolution: wont fix ->
2010-12-21 17:00:35 ronaldoussoren set status: closed -> open; priority: normal -> low; messages: + msg124437; stage: resolved -> test needed
2010-12-20 03:13:00 r.david.murray set status: open -> closed; resolution: wont fix; stage: resolved
2010-12-18 22:14:45 MLModel set messages: + msg124325
2010-12-18 20:00:28 loewis set nosy: + loewis; messages: + msg124313
2010-12-18 19:39:58 MLModel create