This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: xmlrpc library returns string which contain null ( \x00 )
Type: behavior
Stage: patch review
Components: XML
Versions: Python 3.7, Python 3.6, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Alex Corcoles, Steven.Hartland, effbot, fredrikhl, iritkatriel, loewis, pitrou, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2010-01-17 19:59 by Steven.Hartland, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description
xmlrpc_byte_string.patch vstinner, 2010-01-21 02:15
xmlrpc_dump_invalid_string-2.7_2.patch serhiy.storchaka, 2013-05-24 16:36 Patch for 2.7
Messages (13)
msg97972 - Author: Steven Hartland (Steven.Hartland) Date: 2010-01-17 19:59
When a SimpleXMLRPCServer returns data that includes strings containing \x00, the strings are sent as-is, which produces invalid XML.

The expected result is that the data should be treated as binary and base64 encoded.

The bug appears to be in the core xmlrpc library, which relies on type(value) to determine the data type. This returns str for a string even if it includes the null character.
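
The report can be reproduced with a short round trip. This is a minimal sketch that assumes Python 3 (xmlrpc.client rather than the xmlrpclib of the original report); round_trips is a hypothetical helper written for this illustration, not part of the library.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def round_trips(value):
    """Return True if value survives dumps() followed by loads()."""
    try:
        xml = xmlrpc.client.dumps((value,))
        (result,), _ = xmlrpc.client.loads(xml)
    except (ExpatError, ValueError):
        # Either the marshaller refused the value or the parser rejected the XML.
        return False
    return result == value


plain_ok = round_trips("hello")
null_ok = round_trips("a\x00b")  # a NUL character is not allowed in XML
```

Under these assumptions plain_ok should be True and null_ok False, since the parser rejects the NUL character.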
msg98095 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-21 02:09
Marshaller.dump_string() encodes a byte string in <string>...</string> using the escape() function. A byte string can also be encoded in base64 using <base64>...</base64>. This is described in the XML-RPC specification, but I don't know whether all XML-RPC implementations understand this type.
http://www.xmlrpc.com/spec

Should we change the default type to base64, or only fall back to base64 if the byte string cannot be encoded in XML? Testing whether a byte string can be encoded in XML can be slow, and setting the default type to base64 may cause compatibility issues :-/
msg98096 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-01-21 02:15
Here is an example patch using the following test:

   all(32 <= ord(byte) <= 127 for byte in value)

I don't know how much slower the patch is, but at least it doesn't raise an "ExpatError: not well-formed (invalid token): ...".
msg98097 - Author: Steven Hartland (Steven.Hartland) Date: 2010-01-21 02:26
One thing that springs to mind: how valid is that test when applied to UTF-8 data?
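
To make the UTF-8 question concrete, here is a rough Python 3 rendering of the test from the patch (an assumption: the original patch targets Python 2 byte strings, and is_printable_ascii is a name invented for this sketch). It shows which inputs the check would push to base64.

```python
def is_printable_ascii(data: bytes) -> bool:
    # Python 3 equivalent of the test above: iterating over bytes yields
    # ints, so ord() is not needed.
    return all(32 <= byte <= 127 for byte in data)


ascii_result = is_printable_ascii(b"hello")
# UTF-8 encoded text outside ASCII fails the check (every byte is >= 128).
utf8_result = is_printable_ascii("\u20ac".encode("utf-8"))
# So does text containing tab, newline or carriage return, which XML allows.
newline_result = is_printable_ascii(b"line1\nline2")
```

In other words, a check of this shape would treat any non-ASCII UTF-8 text, and even multi-line ASCII text, as binary.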
msg189782 - Author: Mark Lawrence (BreamoreBoy) * Date: 2013-05-21 20:32
Even if the original patch is valid, it will need reworking, as xmlrpclib isn't in Python 3; the code is now in xmlrpc/client. It also looks as if dump_string has been renamed dump_unicode.
msg189801 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-22 09:29
I don't really understand the issue. If you want to pass binary data (rather than unicode text), you should use a Binary object as explained in the docs:
http://docs.python.org/2/library/xmlrpclib.html#binary-objects
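
A minimal sketch of the Binary approach, assuming Python 3, where the class lives in xmlrpc.client and loads() accepts use_builtin_types (the payload value is made up for illustration):

```python
import xmlrpc.client

payload = b"a\x00b"  # arbitrary bytes, including a NUL

# Wrapping the bytes in Binary makes the marshaller emit a <base64> element.
xml = xmlrpc.client.dumps((xmlrpc.client.Binary(payload),))

# By default the value comes back as a Binary wrapper ...
(wrapped,), _ = xmlrpc.client.loads(xml)

# ... or as plain bytes when use_builtin_types=True is passed.
(raw,), _ = xmlrpc.client.loads(xml, use_builtin_types=True)
```

Here wrapped.data and raw both hold the original bytes, so the NUL survives the round trip.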
msg189803 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-05-22 11:28
The original report really includes two parts:
a) when a string containing \0 is marshalled, ill-formed XML is produced
b) the expected behavior is that base64 is used

IMO: While a) is correct, b) is not. Antoine is correct that xmlrpclib.Binary should be used if you want to transmit binary data. Consequently, an Error should be reported if an attempt is made to produce ill-formed XML.

OTOH, ill-formed XML can also be produced when sending a byte string that does not match the encoding declaration. Because of that, I propose to close this by documenting the limitations, rather than changing the code.
msg189808 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-22 12:11
The limitation is already documented:

"""However, it’s the caller’s responsibility to ensure that the string is free of characters that aren’t allowed in XML, such as the control characters with ASCII values between 0 and 31 (except, of course, tab, newline and carriage return); failing to do this will result in an XML-RPC request that isn’t well-formed XML. If you have to pass arbitrary bytes via XML-RPC, use the bytes class or the Binary wrapper class described below."""

Here is a patch which forbids creating ill-formed XML.
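
The attached patch itself is not reproduced here. As a rough illustration of what such a check could look like, the sketch below rejects any character outside the XML 1.0 Char production; check_xml_text and _ILLEGAL_XML_CHARS are hypothetical names, and this is not necessarily how the patch implements it.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def check_xml_text(value: str) -> str:
    """Return value unchanged, or raise ValueError if it cannot appear in XML."""
    match = _ILLEGAL_XML_CHARS.search(value)
    if match:
        raise ValueError(
            "character %r at position %d is not allowed in XML"
            % (match.group(), match.start())
        )
    return value


try:
    check_xml_text("a\x00b")
    rejected = False
except ValueError:
    rejected = True
```

Tab, newline and carriage return pass the check, matching the exception noted in the documentation quoted above.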
msg189822 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-05-22 15:02
Serhiy: The patch fixes the OP's concern, but not the extended concern about producing ill-formed XML (at least not for 2.7). If the string contains non-UTF-8 data, yet the XML declaration says UTF-8, it's still ill-formed, and not caught by your patch.

I wonder whether xmlrpclib.Error would be a better exception than ValueError (although ValueError is also plausible); either way, the case should be documented.
msg189831 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-22 18:07
Indeed, 2.7 needs more work. Here is a patch for 2.7.

UnicodeError (which subclasses ValueError) can be raised implicitly here, which is why I think ValueError is a good exception.

I would be very grateful for your help with the documentation.
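
The exception hierarchy behind that argument can be checked directly. This short sketch (Python 3 syntax, variable names chosen for illustration) shows that a handler for ValueError also catches the encoding error:

```python
# UnicodeError derives from ValueError, so "except ValueError" covers both
# an explicit validity check and an implicit encoding failure.
is_subclass = issubclass(UnicodeError, ValueError)

try:
    "\u20ac".encode("iso-8859-1")  # the euro sign is not in Latin-1
except ValueError as exc:
    caught = type(exc).__name__
```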
msg189851 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-05-23 07:02
I'm still skeptical that a new exception should be introduced in 2.7.x or 3.3 (might this break existing setups?). I suggest asking the release manager for a decision.

But if this is done, then I propose to add the following text to ServerProxy:

versionchanged (2.7.6): Sending strings with characters that are ill-formed in XML (e.g. \x00) now raises ValueError.
msg189919 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-24 16:36
While updating the tests, I found some related errors.

XML-RPC doesn't work in the general case for non-UTF-8 encodings:

>>> import xmlrpclib
>>> xmlrpclib.dumps(('\u20ac',), encoding='iso-8859-1')
'<params>\n<param>\n<value><string>\\u20ac</string></value>\n</param>\n</params>\n'
>>> xmlrpclib.dumps((u'\u20ac',), encoding='iso-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/xmlrpclib.py", line 1085, in dumps
    data = m.dumps(params)
  File "/usr/lib/python2.7/xmlrpclib.py", line 632, in dumps
    dump(v, write)
  File "/usr/lib/python2.7/xmlrpclib.py", line 654, in __dump
    f(self, value, write)
  File "/usr/lib/python2.7/xmlrpclib.py", line 700, in dump_unicode
    value = value.encode(self.encoding)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac' in position 0: ordinal not in range(256)

We should use the 'xmlcharrefreplace' error handler.
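
For reference, this is what the 'xmlcharrefreplace' handler does on its own, shown as a small Python 3 sketch with made-up sample strings (it does not show the marshaller change itself): characters the target encoding cannot represent become numeric character references instead of raising UnicodeEncodeError.

```python
euro = "\u20ac"

# U+20AC is 8364 in decimal, so it is written as the reference &#8364;.
encoded = euro.encode("iso-8859-1", errors="xmlcharrefreplace")

# Characters that Latin-1 can represent are encoded normally.
mixed = "caf\u00e9 costs 3\u20ac".encode("iso-8859-1", errors="xmlcharrefreplace")
```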

Non-ASCII strings are passed as Unicode strings (this should be documented).

>>> xmlrpclib.loads(xmlrpclib.dumps(('\xe2\x82\xac',)))
((u'\u20ac',), None)

'\r' and '\r\n' are deserialized as '\n'.

>>> xmlrpclib.loads(xmlrpclib.dumps(('\r',)))
(('\n',), None)
>>> xmlrpclib.loads(xmlrpclib.dumps(('\r\n',)))
(('\n',), None)
msg407580 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-03 12:22
2.7 is no longer relevant, and it looks like these examples are working now:

>>> xmlrpc.client.dumps(('\u20ac',), encoding='iso-8859-1')
'<params>\n<param>\n<value><string>€</string></value>\n</param>\n</params>\n'
>>> xmlrpc.client.dumps((u'\u20ac',), encoding='iso-8859-1')
'<params>\n<param>\n<value><string>€</string></value>\n</param>\n</params>\n'

There is possibly still a documentation enhancement to make regarding non-ASCII strings. This is what I get now with Serhiy's examples:

>>> xmlrpc.client.loads(xmlrpc.client.dumps(('\xe2\x82\xac',)))
(('â\x82¬',), None)
>>> xmlrpc.client.loads(xmlrpc.client.dumps(('\r',)))
(('\n',), None)
>>> xmlrpc.client.loads(xmlrpc.client.dumps(('\r\n',)))
(('\n',), None)
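
The line-ending results above follow from XML parsers normalizing '\r\n' and '\r' to '\n'. If the exact characters matter, one possible workaround (a sketch assuming Python 3 and that both ends agree to exchange the text as UTF-8 bytes) is to send the value as bytes, which are marshalled as base64:

```python
import xmlrpc.client

text = "one\r\ntwo"

# Sent as a string: the parser normalizes the line ending.
(as_text,), _ = xmlrpc.client.loads(xmlrpc.client.dumps((text,)))

# Sent as bytes: the value is base64 encoded and comes back unchanged.
xml = xmlrpc.client.dumps((text.encode("utf-8"),))
(as_bytes,), _ = xmlrpc.client.loads(xml, use_builtin_types=True)
restored = as_bytes.decode("utf-8")
```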
History
Date User Action Args
2022-04-11 14:56:56 admin set github: 51976
2021-12-03 12:22:16 iritkatriel set nosy: + iritkatriel; messages: + msg407580
2018-09-05 11:39:06 fredrikhl set nosy: + fredrikhl
2017-07-15 09:44:29 Alex Corcoles set versions: + Python 3.5, Python 3.6, Python 3.7
2017-07-15 09:44:14 Alex Corcoles set nosy: + Alex Corcoles
2017-07-12 16:34:07 serhiy.storchaka link issue30909 superseder
2016-01-20 10:20:00 serhiy.storchaka link issue10066 superseder
2014-02-03 18:27:19 BreamoreBoy set nosy: - BreamoreBoy
2013-05-25 15:48:15 serhiy.storchaka set nosy: + effbot
2013-05-24 16:44:00 serhiy.storchaka set files: - xmlrpc_dump_invalid_string-2.7.patch
2013-05-24 16:43:17 serhiy.storchaka set files: - xmlrpc_dump_invalid_string.patch
2013-05-24 16:36:45 serhiy.storchaka set files: + xmlrpc_dump_invalid_string-2.7_2.patch; messages: + msg189919
2013-05-23 07:02:48 loewis set messages: + msg189851
2013-05-22 18:07:29 serhiy.storchaka set files: + xmlrpc_dump_invalid_string-2.7.patch; messages: + msg189831
2013-05-22 15:02:59 loewis set messages: + msg189822
2013-05-22 12:11:04 serhiy.storchaka set files: + xmlrpc_dump_invalid_string.patch; versions: + Python 2.7, Python 3.3, Python 3.4, - Python 2.6; nosy: + serhiy.storchaka; messages: + msg189808; stage: test needed -> patch review
2013-05-22 11:28:09 loewis set messages: + msg189803
2013-05-22 09:29:48 pitrou set nosy: + pitrou; messages: + msg189801
2013-05-21 20:32:52 BreamoreBoy set nosy: + BreamoreBoy; messages: + msg189782
2010-01-21 02:26:22 Steven.Hartland set messages: + msg98097
2010-01-21 02:15:02 vstinner set files: + xmlrpc_byte_string.patch; keywords: + patch; messages: + msg98096
2010-01-21 02:09:11 vstinner set nosy: + vstinner; messages: + msg98095
2010-01-17 20:17:37 brian.curtin set priority: normal; nosy: + loewis; stage: test needed
2010-01-17 19:59:27 Steven.Hartland create