msg97972 - (view) |
Author: Steven Hartland (Steven.Hartland) |
Date: 2010-01-17 19:59 |
When a SimpleXMLRPCServer returns data that includes strings containing \x00, the null character is passed through unchanged in the response, which is invalid XML.
The expected result is that the data should be treated as binary and base64 encoded.
The bug appears to be in the core xmlrpc library which relies on type( value ) to determine the data type. This returns str for a string even if it includes the null char.
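The behaviour described above can be reproduced with a short sketch. It is written against Python 3's xmlrpc.client rather than the Python 2 xmlrpclib the report concerns, and the helper name survives_round_trip is made up for illustration.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def survives_round_trip(value):
    """Return True if `value` can be marshalled and parsed back unchanged.

    Hypothetical helper for illustration, not part of the standard library.
    """
    try:
        payload = xmlrpc.client.dumps((value,))
        (result,), _method = xmlrpc.client.loads(payload)
    except (ExpatError, ValueError):
        # ExpatError: the marshalled document was not well-formed XML.
        # ValueError: the marshaller itself rejected the value.
        return False
    return result == value


print(survives_round_trip("plain text"))   # ordinary text
print(survives_round_trip("nul\x00byte"))  # string containing \x00
```

The marshaller dispatches on the Python type only, so the content of the string plays no part in choosing between a string element and a base64 element.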
|
msg98095 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-01-21 02:09 |
Marshaller.dump_string() encodes a byte string in <string>...</string> using the escape() function. A byte string can also be encoded in base64 using <base64>...</base64>. This is described in the XML-RPC specification, but I don't know whether all XML-RPC implementations understand this type.
http://www.xmlrpc.com/spec
Should we change the default type to base64, or only fall back to base64 if the byte string cannot be encoded in XML? Testing whether a byte string can be encoded in XML can be slow, and making base64 the default type may cause compatibility issues :-/
|
msg98096 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2010-01-21 02:15 |
Here is an example patch that uses the following test:
all(32 <= ord(byte) <= 127 for byte in value)
I don't know how much slower the patch is, but at least it doesn't raise an "ExpatError: not well-formed (invalid token): ...".
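The fallback idea can be sketched as follows. This is not the attached patch: it is a Python 3 adaptation in which iterating over bytes yields integers, so ord() is dropped, and the function names is_printable_ascii and marshal_bytes are hypothetical.

```python
import xmlrpc.client


def is_printable_ascii(value: bytes) -> bool:
    # Same range as the test quoted above, applied to Python 3 bytes.
    return all(32 <= byte <= 127 for byte in value)


def marshal_bytes(value: bytes):
    """Send safe byte strings as text, everything else as base64.

    Hypothetical wrapper illustrating the fallback, not the real patch.
    """
    if is_printable_ascii(value):
        return value.decode("ascii")
    return xmlrpc.client.Binary(value)


safe_xml = xmlrpc.client.dumps((marshal_bytes(b"hello"),))
unsafe_xml = xmlrpc.client.dumps((marshal_bytes(b"a\x00b"),))
print("<string>" in safe_xml, "<base64>" in unsafe_xml)
```

Note that the range test also sends tab, newline and carriage return down the base64 path, even though those characters are legal in XML.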
|
msg98097 - (view) |
Author: Steven Hartland (Steven.Hartland) |
Date: 2010-01-21 02:26 |
One thing that springs to mind: how valid is that test when applied to UTF-8 data?
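To make the question concrete, here is a small sketch that applies the same range test to UTF-8 encoded text. It assumes Python 3 bytes, and passes_patch_test is a made-up name.

```python
def passes_patch_test(value: bytes) -> bool:
    # Range test from the proposed patch, adapted to Python 3 bytes.
    return all(32 <= byte <= 127 for byte in value)


ascii_text = "price: 5 EUR".encode("utf-8")
euro_text = "price: 5 \u20ac".encode("utf-8")  # euro sign becomes 3 bytes >= 0x80

print(passes_patch_test(ascii_text))
print(passes_patch_test(euro_text))
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so any non-ASCII text would fail the test and be sent as base64 even though it is perfectly valid in an XML document declared as UTF-8.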
|
msg189782 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2013-05-21 20:32 |
Even if the original patch is valid, it will need reworking: xmlrpclib isn't in Python 3, and the code is now in xmlrpc/client. It also looks as if dump_string has been renamed dump_unicode.
|
msg189801 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-05-22 09:29 |
I don't really understand the issue. If you want to pass binary data (rather than unicode text), you should use a Binary object as explained in the docs:
http://docs.python.org/2/library/xmlrpclib.html#binary-objects
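A minimal sketch of the Binary approach, written for Python 3's xmlrpc.client (the linked page documents the equivalent xmlrpclib.Binary in Python 2). The payload value is an arbitrary example.

```python
import xmlrpc.client

raw = b"header\x00payload"

# Wrapping the bytes in Binary makes the marshaller emit <base64>...</base64>.
xml = xmlrpc.client.dumps((xmlrpc.client.Binary(raw),))

# By default, loads() hands back a Binary object; the bytes are in .data.
(decoded,), _method = xmlrpc.client.loads(xml)

print("<base64>" in xml)
print(decoded.data == raw)
```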
|
msg189803 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2013-05-22 11:28 |
The original report really includes two parts:
a) when a string containing \0 is marshalled, ill-formed XML is produced
b) the expected behavior is that base64 is used
IMO: While a) is correct, b) is not. Antoine is correct that xmlrpclib.Binary should be used if you want to transmit binary data. Consequently, an Error should be reported if an attempt is made to produce ill-formed XML.
OTOH, ill-formed XML can also be produced when sending a byte string that does not match the encoding declaration. Because of that, I propose to close this by documenting the limitations, rather than changing the code.
|
msg189808 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-05-22 12:11 |
The limitation is already documented:
"""However, it’s the caller’s responsibility to ensure that the string is free of characters that aren’t allowed in XML, such as the control characters with ASCII values between 0 and 31 (except, of course, tab, newline and carriage return); failing to do this will result in an XML-RPC request that isn’t well-formed XML. If you have to pass arbitrary bytes via XML-RPC, use the bytes class or the Binary wrapper class described below."""
Here is a patch which forbids creating ill-formed XML.
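The kind of check such a change implies might look like the sketch below. It is not the attached patch: the character ranges are taken from the Char production of XML 1.0, and the names _ILLEGAL_XML_CHARS and check_xml_text are hypothetical.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def check_xml_text(value: str) -> str:
    """Return `value` unchanged, or raise ValueError if XML cannot carry it.

    Hypothetical helper for illustration only.
    """
    match = _ILLEGAL_XML_CHARS.search(value)
    if match is not None:
        raise ValueError(
            "character %r at position %d is not allowed in XML"
            % (match.group(), match.start())
        )
    return value


print(check_xml_text("tab\tand newline\n are fine"))
try:
    check_xml_text("nul\x00byte")
except ValueError as exc:
    print("rejected:", exc)
```

A caller could run this over each string before marshalling and switch to Binary when it raises.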
|
msg189822 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2013-05-22 15:02 |
Serhiy: The patch fixes the OP's concern, but not the extended concern about producing ill-formed XML (at least not for 2.7). If the string contains non-UTF-8 data, yet the XML declaration says UTF-8, it's still ill-formed, and not caught by your patch.
I wonder whether xmlrpclib.Error would be a better exception than ValueError (although ValueError is also plausible); either way, the case should be documented.
|
msg189831 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-05-22 18:07 |
Indeed, 2.7 needs more work. Here is a patch for 2.7.
UnicodeError (which subclasses ValueError) can be raised implicitly here, that is why I think ValueError is a good exception.
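The class relationship mentioned here is easy to confirm. In this small sketch, try_encode is a made-up helper that shows a single `except ValueError` clause also catching the implicit encoding failure.

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or the name of the exception that was caught."""
    try:
        return text.encode(encoding)
    except ValueError as exc:  # also catches UnicodeEncodeError
        return type(exc).__name__


print(issubclass(UnicodeError, ValueError))
print(try_encode("abc", "iso-8859-1"))
print(try_encode("\u20ac", "iso-8859-1"))
```

Callers that already guard against encoding errors with `except ValueError` would therefore handle an explicit ValueError from the marshaller in the same place.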
I would be very grateful for your help with the documentation.
|
msg189851 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2013-05-23 07:02 |
I'm still skeptical that a new exception should be introduced in 2.7.x, or 3.3 (might this break existing setups?). I suggest asking the release manager for a decision.
But if this is done, then I propose to add the following text to ServerProxy:
versionchanged (2.7.6): Sending strings with characters that are ill-formed in XML (e.g. \x00) now raises ValueError.
|
msg189919 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-05-24 16:36 |
While updating the tests I found some related errors.
XML-RPC doesn't work in the general case for non-UTF-8 encodings:
>>> import xmlrpclib
>>> xmlrpclib.dumps(('\u20ac',), encoding='iso-8859-1')
'<params>\n<param>\n<value><string>\\u20ac</string></value>\n</param>\n</params>\n'
>>> xmlrpclib.dumps((u'\u20ac',), encoding='iso-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/xmlrpclib.py", line 1085, in dumps
    data = m.dumps(params)
  File "/usr/lib/python2.7/xmlrpclib.py", line 632, in dumps
    dump(v, write)
  File "/usr/lib/python2.7/xmlrpclib.py", line 654, in __dump
    f(self, value, write)
  File "/usr/lib/python2.7/xmlrpclib.py", line 700, in dump_unicode
    value = value.encode(self.encoding)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac' in position 0: ordinal not in range(256)
We should use the 'xmlcharrefreplace' error handler.
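As a sketch of what that error handler does, shown here with Python 3 str.encode rather than inside the 2.7 marshaller: characters the target encoding cannot represent are replaced by numeric XML character references instead of raising.

```python
euro = "\u20ac"

# Latin-1 has no euro sign, so the handler substitutes a character reference.
encoded = euro.encode("iso-8859-1", errors="xmlcharrefreplace")

# Characters that Latin-1 can represent are encoded normally.
mixed = "caf\xe9 \u20ac".encode("iso-8859-1", errors="xmlcharrefreplace")

print(encoded)
print(mixed)
```

An XML parser resolves the reference back to U+20AC, so the receiver still sees the original character.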
Non-ASCII strings are passed as Unicode strings (this should be documented).
>>> xmlrpclib.loads(xmlrpclib.dumps(('\xe2\x82\xac',)))
((u'\u20ac',), None)
'\r' and '\r\n' are deserialized as '\n'.
>>> xmlrpclib.loads(xmlrpclib.dumps(('\r',)))
(('\n',), None)
>>> xmlrpclib.loads(xmlrpclib.dumps(('\r\n',)))
(('\n',), None)
|
msg407580 - (view) |
Author: Irit Katriel (iritkatriel) * |
Date: 2021-12-03 12:22 |
2.7 is no longer relevant, and it looks like these examples are working now:
>>> xmlrpc.client.dumps(('\u20ac',), encoding='iso-8859-1')
'<params>\n<param>\n<value><string>€</string></value>\n</param>\n</params>\n'
>>> xmlrpc.client.dumps((u'\u20ac',), encoding='iso-8859-1')
'<params>\n<param>\n<value><string>€</string></value>\n</param>\n</params>\n'
There is possibly still a documentation enhancement to make regarding non-ascii strings. This is what I get now with Serhiy's examples:
>>> xmlrpc.client.loads(xmlrpc.client.dumps(('\xe2\x82\xac',)))
(('â\x82¬',), None)
>>> xmlrpc.client.loads(xmlrpc.client.dumps(('\r',)))
(('\n',), None)
>>> xmlrpc.client.loads(xmlrpc.client.dumps(('\r\n',)))
(('\n',), None)
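The line-ending behaviour follows from XML's end-of-line normalization in the parser. A hedged sketch for Python 3 that contrasts a plain string with the same bytes wrapped in Binary (round_trip is a hypothetical helper):

```python
import xmlrpc.client


def round_trip(value):
    """Marshal a single parameter and parse it back (illustrative helper)."""
    (result,), _method = xmlrpc.client.loads(xmlrpc.client.dumps((value,)))
    return result


# Sent as <string>: the XML parser normalizes "\r\n" to "\n".
text_result = round_trip("line1\r\nline2")

# Sent as <base64>: the bytes arrive exactly as they were sent.
binary_result = round_trip(xmlrpc.client.Binary(b"line1\r\nline2")).data

print(repr(text_result))
print(repr(binary_result))
```

When exact line endings matter, Binary (or bytes) is therefore the safer transport.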
|
|
Date | User | Action | Args |
2022-04-11 14:56:56 | admin | set | github: 51976 |
2021-12-03 12:22:16 | iritkatriel | set | nosy:
+ iritkatriel messages:
+ msg407580
|
2018-09-05 11:39:06 | fredrikhl | set | nosy:
+ fredrikhl
|
2017-07-15 09:44:29 | Alex Corcoles | set | versions:
+ Python 3.5, Python 3.6, Python 3.7 |
2017-07-15 09:44:14 | Alex Corcoles | set | nosy:
+ Alex Corcoles
|
2017-07-12 16:34:07 | serhiy.storchaka | link | issue30909 superseder |
2016-01-20 10:20:00 | serhiy.storchaka | link | issue10066 superseder |
2014-02-03 18:27:19 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2013-05-25 15:48:15 | serhiy.storchaka | set | nosy:
+ effbot
|
2013-05-24 16:44:00 | serhiy.storchaka | set | files:
- xmlrpc_dump_invalid_string-2.7.patch |
2013-05-24 16:43:17 | serhiy.storchaka | set | files:
- xmlrpc_dump_invalid_string.patch |
2013-05-24 16:36:45 | serhiy.storchaka | set | files:
+ xmlrpc_dump_invalid_string-2.7_2.patch
messages:
+ msg189919 |
2013-05-23 07:02:48 | loewis | set | messages:
+ msg189851 |
2013-05-22 18:07:29 | serhiy.storchaka | set | files:
+ xmlrpc_dump_invalid_string-2.7.patch
messages:
+ msg189831 |
2013-05-22 15:02:59 | loewis | set | messages:
+ msg189822 |
2013-05-22 12:11:04 | serhiy.storchaka | set | files:
+ xmlrpc_dump_invalid_string.patch versions:
+ Python 2.7, Python 3.3, Python 3.4, - Python 2.6 nosy:
+ serhiy.storchaka
messages:
+ msg189808
stage: test needed -> patch review |
2013-05-22 11:28:09 | loewis | set | messages:
+ msg189803 |
2013-05-22 09:29:48 | pitrou | set | nosy:
+ pitrou messages:
+ msg189801
|
2013-05-21 20:32:52 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages:
+ msg189782
|
2010-01-21 02:26:22 | Steven.Hartland | set | messages:
+ msg98097 |
2010-01-21 02:15:02 | vstinner | set | files:
+ xmlrpc_byte_string.patch keywords:
+ patch messages:
+ msg98096
|
2010-01-21 02:09:11 | vstinner | set | nosy:
+ vstinner messages:
+ msg98095
|
2010-01-17 20:17:37 | brian.curtin | set | priority: normal nosy:
+ loewis
stage: test needed |
2010-01-17 19:59:27 | Steven.Hartland | create | |