Message 208464 - Python tracker


Author	scoder
Recipients	effbot, eli.bendersky, mark, nikratio, scoder
Date	2014-01-19.09:01:29
Message-id	<1390122089.49.0.406288540494.issue9521@psf.upfronthosting.co.za>

Content

When you write "XML PI", do you mean the XML declaration? At least that's what Mark used in his original example.

ET avoids writing the declaration out when it is not necessary, i.e. for UTF-8 compatible encodings. IMHO that's perfectly ok and definitely not an incorrect behaviour.
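
To make the declaration behaviour concrete, here is a minimal sketch using only the standard library. The element name and text are made up for the example, and exactly which encodings suppress the declaration is an implementation detail, so read the comments as what I'd expect from current CPython rather than as a guarantee.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("doc")  # hypothetical sample element
root.text = "caf\u00e9"

# UTF-8 output: ET leaves the XML declaration out, since a parser
# would assume this encoding anyway.
utf8_bytes = ET.tostring(root, encoding="utf-8")

# A non-UTF-8 encoding needs the declaration so the bytes can be decoded.
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")

# The declaration can still be requested explicitly when writing a tree.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
forced_bytes = buf.getvalue()

print(utf8_bytes)
print(latin1_bytes)
print(forced_bytes)
```

Anyone who always needs the declaration can pass xml_declaration=True as in the last call.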

As for processing instructions (what you used in your test case patch), making them appear in the tree by default would be a behavioural change that might break existing ET code.
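
For reference, a small sketch of what the default ET parser does with PIs and comments today. The sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one comment and one PI inside the root element.
SAMPLE = "<root><!-- note --><?target data?><child/></root>"

root = ET.fromstring(SAMPLE)

# With the default parser only the real element shows up as a child;
# the comment and the PI are dropped during parsing.
child_tags = [el.tag for el in root]
print(child_tags)
```

Code that iterates over root and assumes every child has a string tag relies on this, which is why changing the default could break it.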

Note that lxml keeps PIs in the tree by default, unless you configure its parser explicitly with "remove_pis=True".

There is also a "remove_comments=True" in lxml. ET simply discards comments when parsing IIRC.

http://lxml.de/parsing.html#parser-options
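
A rough sketch of the two lxml parser options mentioned above. lxml is a third-party package, so the import is guarded; count_children is a hypothetical helper and the sample document is made up for the example.

```python
try:
    from lxml import etree  # third-party; may not be installed
except ImportError:
    etree = None

# Hypothetical sample: one comment and one PI inside the root element.
SAMPLE = b"<root><!-- note --><?target data?><child/></root>"


def count_children(remove: bool) -> int:
    """Parse SAMPLE with lxml and return the number of nodes under the root."""
    parser = etree.XMLParser(remove_pis=remove, remove_comments=remove)
    return len(etree.fromstring(SAMPLE, parser))


if etree is not None:
    print(count_children(remove=False))  # comment, PI and <child/> are kept
    print(count_children(remove=True))   # only <child/> remains
```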

IMHO, both behaviours are ok, with lxml having a tendency towards keeping the data as it came in rather than trying to find the easiest possible way for the user to work with the tree. PIs and comments are a bit 'special' to work with.

A fix could be to add the same two keyword arguments to ET's parser, but make them default to True (as opposed to False in lxml), so that users can turn the removal off when they need PIs or comments in the tree.
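
As a point of comparison, and not the keyword arguments proposed here: as far as I know, Python 3.8 and later offer a similar opt-in through TreeBuilder's insert_comments and insert_pis flags. A minimal sketch, assuming Python 3.8+ and an invented sample document:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one comment and one PI inside the root element.
SAMPLE = "<root><!-- note --><?target data?><child/></root>"

# Opt in to keeping comments and PIs; both flags default to False,
# so existing code keeps seeing the old behaviour.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
parser = ET.XMLParser(target=builder)
root = ET.fromstring(SAMPLE, parser=parser)

print(len(root))     # number of nodes under the root, comment and PI included
print(root[0].text)  # text of the comment node
```

The opt-in default matches the concern above: nothing changes for code that does not ask for the extra nodes.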
