Message 94853 - Python tracker


Author	ezio.melotti
Recipients	devon, effbot, ezio.melotti, moriyoshi
Date	2009-11-02.22:06:33
Message-id	<1257199596.62.0.384083201988.issue7139@psf.upfronthosting.co.za>

Content

If I understood correctly, the correct behavior while reading attribute
values is:
  * literal newlines (\n or \r) and tabs (\t) should be collapsed and
converted to a space
  * newlines (&#xA; or &#xD;) and tabs (&#x9;) written as entities should
be converted to their literal equivalents (\n, \r and \t)

(See http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html#charescaping)

This should be ok in both xml.dom.minidom and etree.
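
A minimal sketch of the reading side, taking attribute values as the
case in question (that is where XML parsers normalize literal
whitespace). The element and attribute names are made up for
illustration; only standard library calls are used.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Literal newline and tab inside an attribute value.
literal_xml = '<root attr="a\nb\tc"/>'
# The same characters written as character references.
escaped_xml = '<root attr="a&#xA;b&#x9;c"/>'

# etree: literal whitespace is normalized to spaces,
# character references come back as the real characters.
et_literal = ET.fromstring(literal_xml).get("attr")
et_escaped = ET.fromstring(escaped_xml).get("attr")

# minidom goes through the same expat parser.
dom_literal = minidom.parseString(literal_xml).documentElement.getAttribute("attr")
dom_escaped = minidom.parseString(escaped_xml).documentElement.getAttribute("attr")

print(repr(et_literal), repr(et_escaped))
print(repr(dom_literal), repr(dom_escaped))
```

Both parsers should report 'a b c' for the literal form and 'a\nb\tc'
for the escaped form.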


When writing, however, literal newlines and tabs that are written as
they are (\n, \r and \t) can't be read back during the parsing phase,
because they are collapsed and converted to a space. They should
therefore be converted to entities (&#xA;, &#xD; and &#x9;)
automatically, but this could be incompatible with the current behavior
(i.e. \n, \r or \t that are now written and then collapsed to a space
during parsing would become significant).
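
To illustrate the round-trip problem, here is a rough sketch that builds
the markup by hand instead of going through a serializer, since what the
etree and minidom writers emit may depend on the Python version. It
assumes an attribute value; the sample value and element name are
hypothetical. xml.sax.saxutils.escape leaves whitespace alone, while
quoteattr turns \n, \r and \t into character references.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

value = "line1\nline2\tend"

# Written "as is": the newline and tab end up literally in the markup.
naive_xml = '<root attr="%s"/>' % escape(value)
# Written with character references (quoteattr adds the quotes itself).
escaped_xml = "<root attr=%s/>" % quoteattr(value)

naive_roundtrip = ET.fromstring(naive_xml).get("attr")
escaped_roundtrip = ET.fromstring(escaped_xml).get("attr")

print(repr(naive_roundtrip))    # whitespace replaced by spaces
print(repr(escaped_roundtrip))  # original value recovered
```

Only the escaped form survives the round trip, which is also why
switching the writers to it would change what existing readers see.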

Moriyoshi, can you confirm that what I said is correct and the problem
is similar to the one described in #5752?
I also closed #6492 as a duplicate of this.

History
Date	User	Action	Args
2009-11-02 22:06:36	ezio.melotti	set	recipients: + ezio.melotti, effbot, devon, moriyoshi
2009-11-02 22:06:36	ezio.melotti	set	messageid: <1257199596.62.0.384083201988.issue7139@psf.upfronthosting.co.za>
2009-11-02 22:06:34	ezio.melotti	link	issue7139 messages
2009-11-02 22:06:33	ezio.melotti	create