Message 338269 - Python tracker


Author	scoder
Recipients	Mariatta, dfrojas, eli.bendersky, lukasz.langa, matrixise, nedbat, rhettinger, scoder, serhiy.storchaka, sivert, taleinat, vstinner
Date	2019-03-18.17:44:30
Message-id	<1552931070.91.0.413439137288.issue34160@roundup.psfhosted.org>
In-reply-to

Content

Victor, as much as I appreciate backwards compatibility, I really don't think it's a big deal in this case. In fact, it might not even apply.

My (somewhat educated) gut feeling is that most users simply won't care or won't even notice the change. Of those who do (or have to) care, many are better off fixing their code so that it does not rely on an entirely arbitrary (sorted by name) attribute order than getting the old behaviour back. And for those few who really need attributes to be sorted by name, there's the recipe I posted, which works on all ElementTree implementations out there with all currently supported CPython versions.
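For readers without the earlier message at hand, a recipe of that kind could look roughly like the sketch below. It is not necessarily identical to the one posted in the thread; the function name `sort_attributes` and the sample document are only illustrative. It rewrites each element's attribute dict in name order before serialising.

```python
import xml.etree.ElementTree as ET


def sort_attributes(root):
    """Reorder the attributes of every element under root, in place, by name."""
    for el in root.iter():
        attrib = el.attrib
        if len(attrib) > 1:
            # Rebuild the dict so that its insertion order is the sorted order.
            items = sorted(attrib.items())
            attrib.clear()
            attrib.update(items)


# Illustrative document; the attributes are deliberately not in name order.
root = ET.fromstring('<root b="2" a="1"><child z="26" y="25"/></root>')
sort_attributes(root)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Note that this modifies the tree, which is the objection quoted next: code that iterates over the attributes afterwards will see them in sorted order.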

> This recipe does modify the document and so changes the behaviour of the application when it iterates on attributes later

This is actually a very rare thing. If I were to make up numbers, I'd guess that some 99% of the applications do XML serialisation as the last thing and then throw away the tree afterwards, without touching it (or its attributes) again. And the remaining cases are most probably covered by the "don't need to care" type of users. I don't think we should optimise for 0.05% of our user base by providing a new API option for them. Especially in ElementTree, which decidedly aims to be simple.

The example that Ned gave refers to a very specific and narrow case: comparing serialised XML, at the byte level, in tests. He was very lucky that ElementTree was so stable over the last 10 Python releases that the output did not change at all. That is not something that an XML library needs to guarantee. There is some ambiguity in XML for everything that's outside of the XML Information set, and there is a good reason why the W3C has tackled this ambiguity with an explicit and *separate* specification: C14N. So, when you write:

> Many XML parsers rely on the order of attributes

It's certainly not many parsers, and could even be close to none. The order of attributes is explicitly excluded from the XML Information set:

https://www.w3.org/TR/xml-infoset/#omitted

Despite this, cases where the order of the attributes matters to the *application* are not unheard of. But for them, attributes sorted by their name are most likely the problem and not a solution. Raymond mentioned one such example. Sorting attributes by their name really only fulfils a single purpose: to get reproducible output in cases where the order does *not* matter. For all the (few) cases where the order *does* matter, it gets in the way.
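For the reproducible-output case, such as comparing serialised XML in tests, C14N is the tool meant for the job. As a minimal sketch, assuming Python 3.8 or later, where `xml.etree.ElementTree.canonicalize()` is available, two documents that differ only in attribute order can be compared like this (the sample documents are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Two serialisations of the same information, differing only in attribute order.
doc_a = '<root b="2" a="1"><child z="26" y="25"/></root>'
doc_b = '<root a="1" b="2"><child y="25" z="26"/></root>'

# canonicalize() returns the C14N 2.0 form as a string when no output file is given.
canonical_a = ET.canonicalize(doc_a)
canonical_b = ET.canonicalize(doc_b)

# The plain strings differ, the canonical forms do not.
print(doc_a == doc_b)
print(canonical_a == canonical_b)
```

Comparing canonical forms keeps such tests independent of whatever attribute order the serialiser happens to produce.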

But by removing the sorting, as this change does, we still get predictable output due to dict ordering. So this use case is still covered. It's just not necessarily the same output as before, because now the ordering is entirely in the hands of the users. Meaning, those users who *do* care can now actually influence the ordering, which was very difficult and hackish to achieve before. We are allowing users to remove these hacks, not forcing them to add new ones.
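A small sketch of what "in the hands of the users" means in practice, assuming a Python version that includes this change (3.8 or later); the element and attribute names are made up for illustration:

```python
import xml.etree.ElementTree as ET

el = ET.Element("point")
el.set("y", "2")
el.set("x", "1")

# With the sorting removed, attributes are written in insertion order.
as_inserted = ET.tostring(el, encoding="unicode")

# Users who want a specific order can rebuild the attribute dict in that order.
wanted_order = ("x", "y")
ordered = {name: el.attrib[name] for name in wanted_order}
el.attrib.clear()
el.attrib.update(ordered)
reordered = ET.tostring(el, encoding="unicode")

print(as_inserted)
print(reordered)
```

On older versions both calls produce the name-sorted form, since the serialiser sorts regardless of insertion order.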

History
Date	User	Action	Args
2019-03-18 17:44:30	scoder	set	recipients: + scoder, rhettinger, vstinner, taleinat, nedbat, eli.bendersky, lukasz.langa, serhiy.storchaka, matrixise, sivert, Mariatta, dfrojas
2019-03-18 17:44:30	scoder	set	messageid: <1552931070.91.0.413439137288.issue34160@roundup.psfhosted.org>
2019-03-18 17:44:30	scoder	link	issue34160 messages
2019-03-18 17:44:30	scoder	create