classification
Title: minidom xmlns not handling spaces in xmlns attribute value field
Type: behavior
Stage: resolved
Components: XML
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amathew, hfischer, mstepniowski, python-dev, r.david.murray, terry.reedy
Priority: normal
Keywords: patch

Created on 2011-05-30 21:06 by hfischer, last changed 2014-04-20 04:49 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
test.xml hfischer, 2011-05-30 21:06
minidom_space_char_in_namespace.patch amathew, 2013-04-13 18:22
minidom_space_char_in_namespace_with_test.patch mstepniowski, 2014-04-14 15:39
minidom_space_char_in_namespace_unsupported.patch mstepniowski, 2014-04-15 09:00
Messages (10)
msg137329 - Author: Herm Fischer (hfischer) Date: 2011-05-30 21:06
Minidom raises an exception if there's a space anywhere in the URI of an xmlns, but it is legal (but terrible practice) to have spaces in URIs.  I think this should work or politely raise a syntax error.  E.g., this fails:  xmlns:abc="http:abc.com/de f g/hi/j k".

The attached XML file from an end user has this xmlns:

  xmlns:verrels=" http://xbrl.org/2010/versioning-relationship-sets"

which causes minidom to raise a ValueError exception, instead of a sensible syntax error message.  
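A minimal reproduction might look like the sketch below. The element name abc:foo and the helper try_parse are illustrative assumptions, not taken from the attached file; the URI is the one quoted above. Which exception appears depends on the Python and Expat versions in use: minidom itself raises ValueError, while newer Expat releases (2.4.5 and later, as far as is known) reject such a declaration in the parser and surface an ExpatError instead, so the sketch catches both.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical minimal document: a prefixed element whose namespace URI
# contains spaces (the URI is the one quoted in the report).
BAD_XML = '<element xmlns:abc="http:abc.com/de f g/hi/j k"><abc:foo/></element>'


def try_parse(xml_text):
    """Return None if parsing succeeds, else (exception name, message)."""
    try:
        minidom.parseString(xml_text)
    except (ValueError, ExpatError) as exc:
        return type(exc).__name__, str(exc)
    return None


# Show which exception (if any) this interpreter raises for the bad document.
outcome = try_parse(BAD_XML)
print(outcome)
```

The same helper returns None for a document whose namespace URI has no spaces.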

The relevant Python code is expatbuilder.py, function _parse_ns_name, which does not have an elif for len(parts) != 2 (to raise a syntax error which identifies the bad construct).
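To see why a space in the URI breaks the split, here is a simplified sketch. It assumes, as minidom's namespace handling does, that Expat reports a qualified name as the URI, the local name and optionally the prefix joined by a single space. The function name split_expat_name and the error wording are hypothetical; this is not the code in expatbuilder.py.

```python
def split_expat_name(name):
    """Split an Expat-style 'uri localname [prefix]' name.

    Hypothetical helper that only mirrors the idea described above.
    """
    parts = name.split(" ")
    if len(parts) == 3:
        uri, localname, prefix = parts
    elif len(parts) == 2:
        uri, localname = parts
        prefix = None
    else:
        # Extra parts mean the URI itself contained a space, so the
        # name cannot be split unambiguously.
        raise ValueError("cannot split namespace-qualified name: %r" % name)
    return uri, localname, prefix
```

With this sketch, split_expat_name("http://abc.com/ foo abc") yields the URI, local name and prefix, while a name built from the URI in the report produces six parts and hits the explicit ValueError branch instead of failing during tuple unpacking.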
msg137609 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-03 22:58
SyntaxErrors refer to Python syntax errors; they are raised during parsing of *Python* code. An error in the value given to a Python function sensibly raises a ValueError unless a module does something more specific.

From the xml.dom doc
"DOM Level 2 recommendation defines a single exception, DOMException"
One subclass is "exception xml.dom.SyntaxErr -- Raised when an invalid or illegal string is specified." which would be appropriate here.

However, "The xml.dom.minidom module is essentially a DOM 1.0-compatible DOM with some DOM 2 features (primarily namespace features)." In particular, "DOMException is currently not supported in xml.dom.minidom. Instead, xml.dom.minidom uses standard Python exceptions such as TypeError and AttributeError." or ValueError.

An improved error report could go into 2.7/3.2.
A change in minidom spec to use DOMException would be a feature request for 3.3 or later (and a bigger project -- code welcome). For the moment, I am assuming that you are requesting the former.

A Python exception is not a crash. A crash is a Segmentation Fault (*nix) or 'Your program stopped unexpectedly' (Windows).
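The distinction between the exception classes mentioned above can be checked directly. The short sketch below only inspects classes that xml.dom exports; the variable names are illustrative.

```python
import xml.dom

# xml.dom.SyntaxErr is a DOMException subclass carrying the DOM error code.
is_dom_exception = issubclass(xml.dom.SyntaxErr, xml.dom.DOMException)
has_syntax_code = xml.dom.SyntaxErr.code == xml.dom.SYNTAX_ERR

# It is unrelated to the built-in SyntaxError (errors in Python source)
# and to ValueError, which is what minidom raises in this report.
unrelated_to_builtin = not issubclass(xml.dom.SyntaxErr, SyntaxError)
unrelated_to_value_error = not issubclass(xml.dom.SyntaxErr, ValueError)

print(is_dom_exception, has_syntax_code,
      unrelated_to_builtin, unrelated_to_value_error)
```

Code that catches ValueError around a minidom parse would therefore not catch xml.dom.SyntaxErr, which is part of why switching minidom to DOMException would be a larger change.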
msg186780 - Author: (amathew) * Date: 2013-04-13 18:22
I added a more descriptive error message for invalid namespaces. I agree that it would be great to eventually move to DOMExceptions.
msg187979 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-28 13:11
Thanks for the patch.  It would be nice to have a test before we commit this.  The tests should use assertRaisesRegex to look for something specific to this error...probably the word 'syntax'...in the error text.

On the other hand, if the spaces are technically legal, is calling it a syntax error appropriate?  Perhaps the message should instead say something like "spaces in URIs is not supported"?
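A test along the lines suggested here might look like the following sketch. The class and method names are hypothetical and this is not the test from the attached patches. It accepts either a ValueError from minidom or an ExpatError from the underlying parser (newer Expat releases appear to reject the declaration themselves), and only requires that the message mention "syntax".

```python
import unittest
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class SpacesInNamespaceURITest(unittest.TestCase):
    # Hypothetical test case; names are illustrative, not from the patch.

    def test_space_in_xmlns_value_is_reported(self):
        bad_xml = ('<element xmlns:abc="http:abc.com/de f g/hi/j k">'
                   '<abc:foo/></element>')
        # Either exception type is acceptable, as long as the message
        # contains the word "syntax" (matched case-insensitively).
        with self.assertRaisesRegex((ValueError, ExpatError), r"(?i)syntax"):
            minidom.parseString(bad_xml)
```

Saved in a module, this can be run with python -m unittest. On an interpreter without the improved message, the bare unpacking error would not match the pattern and the test would fail.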
msg187992 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-04-28 16:07
'unsupported syntax' would be more accurate, but I agree that saying what it is that is unsupported is even better.
msg216099 - Author: Marek Stepniowski (mstepniowski) * Date: 2014-04-14 15:39
Added test to amathew's patch.
msg216221 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-04-14 21:13
Thanks.  Could you also change 'Invalid syntax' to 'Unsupported syntax', per the last bit of the discussion between Terry and me?
msg216282 - Author: Marek Stepniowski (mstepniowski) * Date: 2014-04-15 09:00
I agree that "Unsupported syntax" is a more accurate message. Changed in the newest patch.
msg216897 - Author: Roundup Robot (python-dev) Date: 2014-04-20 04:48
New changeset 13c1c5e3d2ee by R David Murray in branch '3.4':
#12220: improve minidom error when URI contains spaces.
http://hg.python.org/cpython/rev/13c1c5e3d2ee

New changeset 3e67d923a0df by R David Murray in branch 'default':
Merge: #12220: improve minidom error when URI contains spaces.
http://hg.python.org/cpython/rev/3e67d923a0df
msg216898 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-04-20 04:49
Thanks, amathew and Marek.
History
Date User Action Args
2014-04-20 04:49:39 r.david.murray set status: open -> closed; resolution: fixed; messages: + msg216898; stage: needs patch -> resolved
2014-04-20 04:48:40 python-dev set nosy: + python-dev; messages: + msg216897
2014-04-15 09:00:27 mstepniowski set files: + minidom_space_char_in_namespace_unsupported.patch; messages: + msg216282
2014-04-14 21:13:10 r.david.murray set messages: + msg216221
2014-04-14 15:39:57 mstepniowski set files: + minidom_space_char_in_namespace_with_test.patch; nosy: + mstepniowski; messages: + msg216099
2013-04-28 16:07:21 terry.reedy set messages: + msg187992
2013-04-28 13:11:55 r.david.murray set nosy: + r.david.murray; messages: + msg187979; versions: + Python 3.4, - Python 3.2
2013-04-13 18:22:01 amathew set files: + minidom_space_char_in_namespace.patch; nosy: + amathew; messages: + msg186780; keywords: + patch
2011-06-03 22:58:04 terry.reedy set nosy: + terry.reedy; messages: + msg137609
2011-05-30 23:42:07 ned.deily link issue11612 superseder
2011-05-30 23:41:53 ned.deily set type: crash -> behavior; stage: needs patch
2011-05-30 21:06:13 hfischer create