classification
Title: XML version is ignored
Type: behavior Stage: resolved
Components: XML Versions: Python 3.11
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, eli.bendersky, iritkatriel, loewis, scoder, tkuhn
Priority: normal Keywords:

Created on 2013-11-29 13:33 by tkuhn, last changed 2021-06-18 11:21 by scoder. This issue is now closed.

Messages (4)
msg204720 - Author: Tobias Kuhn (tkuhn) Date: 2013-11-29 13:33
The first line of an XML file should be something like this:

    <?xml version='1.0' encoding='UTF-8'?>

The XML parser of xml.sax, however, seems to ignore the value of "version":

    <?xml version='X' encoding='UTF-8'?>

This should give an error, but it doesn't. It's not a very serious problem, but to be standards-compliant the parser should reject such a declaration.

I experienced this bug in the rdflib package:

    https://github.com/RDFLib/rdflib/issues/347
msg204737 - Author: Stefan Behnel (scoder) (Python committer) Date: 2013-11-29 16:10
If (as I assume) XML 1.1 isn't supported, then rejecting anything but "1.0" would be correct.

This is not for Py2.7 anymore, though, I guess; it's more something to fix for 3.4.
msg396044 - Author: Irit Katriel (iritkatriel) (Python committer) Date: 2021-06-18 10:57
Reproduced in 3.11:

    >>> import xml.sax
    >>> xml.sax.parseString("<?xml version='X' encoding='UTF-8'?><root>blah</root>", xml.sax.ContentHandler())
    >>>
msg396048 - Author: Stefan Behnel (scoder) (Python committer) Date: 2021-06-18 11:21
After reading up a bit: version "X" should probably be rejected, whereas "1.[0-9]+" is, according to the spec, meant to be accepted by a 1.0 parser as well:

https://www.w3.org/TR/REC-xml/#sec-prolog-dtd

"""
When an XML 1.0 processor encounters a document that specifies a 1.x version number other than '1.0', it will process it as a 1.0 document. This means that an XML 1.0 processor will accept 1.x documents provided they do not use any non-1.0 features.
"""
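The rule in the quoted passage can be expressed as a small check. This is a hypothetical helper, not an API of xml.sax or expat; it assumes the VersionNum ::= '1.' [0-9]+ production from the XML 1.0 (Fifth Edition) grammar and models a processor that only implements 1.0:

```python
import re

# Assumed grammar rule: VersionNum ::= '1.' [0-9]+ (XML 1.0, Fifth Edition).
# [0-9] is used instead of \d, which would also match non-ASCII digits.
_VERSION_NUM = re.compile(r"1\.[0-9]+")


def effective_version(declared: str) -> str:
    """Hypothetical helper: the version a 1.0-only processor would use.

    Any 1.x version is processed as 1.0, following the quoted spec text;
    a value that does not match VersionNum at all (e.g. "X") is rejected.
    """
    if _VERSION_NUM.fullmatch(declared) is None:
        raise ValueError(f"invalid XML version: {declared!r}")
    return "1.0"
```

With this sketch, effective_version("1.1") returns "1.0", while effective_version("X") raises ValueError.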

However, this is less an issue with the SAX framework than with the underlying parser, which is expat. I'm not sure why expat doesn't care about the version.
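One way to look at what expat itself does is to call it directly through xml.parsers.expat, which the default xml.sax reader is built on. The sketch below (the helper name declared_xml_version is made up for illustration) registers an XmlDeclHandler, which expat calls with the version, encoding and standalone values from the XML declaration. Consistent with the reproduction above, it suggests that expat hands the version string through without validating it, so a document declaring version 'X' parses without error.

```python
from typing import Optional
from xml.parsers import expat


def declared_xml_version(data: bytes) -> Optional[str]:
    """Hypothetical helper: return the version from the XML declaration.

    Returns None if the document has no XML declaration. Expat raises
    ExpatError for malformed documents, but (as observed in this issue)
    it does not check the version value itself.
    """
    seen = []

    def on_xml_decl(version, encoding, standalone):
        # Called by expat only if the document starts with <?xml ...?>.
        seen.append(version)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True marks this as the final chunk
    return seen[0] if seen else None
```

An application that wants the stricter behaviour could call such a helper before handing the same bytes to xml.sax.parseString and reject unexpected versions itself, at the cost of parsing the document twice.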

Personally, I don't really care. There are only two XML versions, 1.0 and 1.1, and an XML 1.x parser is supposed to deal with both of them nicely. Anyone who declares some other version probably does so deliberately, and wrongly. As long as the rest is XML, I don't see a reason to reject such an input document.

I'll close this as "won't fix", since there is no practical effect, it would need effort, and it doesn't look like anyone has cared in almost 8 years.
History
Date                 User              Action  Args
2021-06-18 11:21:16  scoder            set     status: open -> closed
                                               resolution: wont fix
                                               messages: + msg396048
                                               stage: resolved
2021-06-18 10:57:13  iritkatriel       set     nosy: + iritkatriel
                                               messages: + msg396044
                                               versions: + Python 3.11, - Python 2.7
2013-11-29 16:10:42  scoder            set     messages: + msg204737
2013-11-29 15:32:28  pitrou            set     nosy: + loewis
2013-11-29 15:31:08  serhiy.storchaka  set     nosy: + scoder, christian.heimes, eli.bendersky
2013-11-29 13:33:09  tkuhn             create