Author scoder
Recipients effbot, eli.bendersky, mark, nikratio, scoder
Date 2014-01-19.09:01:29
Message-id <1390122089.49.0.406288540494.issue9521@psf.upfronthosting.co.za>
In-reply-to
Content
When you write "XML PI", do you mean the XML declaration? At least that's what Mark used in his original example.

ET avoids writing the XML declaration out when it is not necessary, i.e. for UTF-8 compatible encodings. IMHO that's perfectly ok and definitely not an incorrect behaviour.
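
To make the declaration behaviour concrete, here is a minimal sketch using only the standard library. The element name and text are made up for illustration. It compares serialising to "utf-8" with serialising to "iso-8859-1", and then forces the declaration through the `xml_declaration` argument of `ElementTree.write()`.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, only for illustration.
root = ET.Element("root")
root.text = "caf\u00e9"

# UTF-8 output: ET leaves the XML declaration out by default.
utf8_bytes = ET.tostring(root, encoding="utf-8")

# Non-UTF-8 output: the declaration is written, because a reader
# needs it to know how to decode the bytes.
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")

# The declaration can still be requested explicitly for UTF-8.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
forced_bytes = buf.getvalue()

print(utf8_bytes.startswith(b"<?xml"))
print(latin1_bytes.startswith(b"<?xml"))
print(forced_bytes.startswith(b"<?xml"))
```

Note that the check is on the encoding name as given, so a spelling other than "utf-8" or "us-ascii" may well produce a declaration.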

As for processing instructions (what you used in your test case patch), making them appear in the tree by default would be a behavioural change that might break existing ET code.

Note that lxml keeps PIs in the tree by default, unless you configure its parser explicitly with "remove_pis=True".

There is also a "remove_comments=True" in lxml. ET simply discards comments when parsing IIRC.

http://lxml.de/parsing.html#parser-options
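
For comparison, ET's default parsing behaviour described above (PIs and comments are dropped) can be checked with a short sketch like the following. The input document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with one PI, one comment and one element inside the root.
xml_text = "<root><?target data?><!--note--><child/></root>"

root = ET.fromstring(xml_text)

# Only the regular element survives in the tree with the default parser.
child_tags = [el.tag for el in root]
print(child_tags)
```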

IMHO, both behaviours are ok, with lxml having a tendency towards keeping the data as it came in rather than trying to find the easiest possible way for the user to work with the tree. PIs and comments are a bit 'special' to work with.

A fix could be to add the same two keyword arguments to ET's parser, but make them default to True (as opposed to False in lxml), so that users can opt in to keeping PIs and comments when they need them.
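
The keyword names suggested here are lxml's, and ET's parser does not accept them. As a related sketch, assuming Python 3.8 or later, the standard library's `TreeBuilder` takes `insert_comments` and `insert_pis` options that default to False, which gives a similar opt-in. The input document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with one PI, one comment and one element inside the root.
xml_text = "<root><?target data?><!--note--><child/></root>"

# Assumes Python 3.8+: ask the tree builder to keep PIs and comments.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
parser = ET.XMLParser(target=builder)
root = ET.fromstring(xml_text, parser=parser)

kept = list(root)
for node in kept:
    # PIs and comments use the factory functions as their tag,
    # which is what makes them a bit 'special' to work with.
    if node.tag is ET.ProcessingInstruction:
        print("PI:", node.text)
    elif node.tag is ET.Comment:
        print("comment:", node.text)
    else:
        print("element:", node.tag)
```

Code that iterates over children and expects `tag` to be a string would have to handle these nodes, which is the compatibility concern with changing the default.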
History
Date                 User    Action  Args
2014-01-19 09:01:29  scoder  set     recipients: + scoder, effbot, mark, eli.bendersky, nikratio
2014-01-19 09:01:29  scoder  set     messageid: <1390122089.49.0.406288540494.issue9521@psf.upfronthosting.co.za>
2014-01-19 09:01:29  scoder  link    issue9521 messages
2014-01-19 09:01:29  scoder  create