Issue 39875
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2020-03-06 11:45 by henrik242, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (10)
msg363502 | Author: (henrik242) | Date: 2020-03-06 11:45

curl correctly posts data to Solr:

    $ curl -v 'http://solr.example.no:12699/solr/my_coll/update?commit=true' \
        --data '<add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>'

The Solr query log says:

    [20200306T111354,131] [my_coll_shard1_replica_n85] webapp=/solr path=/update params={commit=true} status=0 QTime=96

I'm trying to do the same thing with Python:

    >>> import urllib.request
    >>> data = '<add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>'
    >>> url = 'http://solr.example.no:12699/solr/my_coll/update?commit=true'
    >>> req = urllib.request.Request(url=url, data=data.encode('utf-8'), method='POST')
    >>> res = urllib.request.urlopen(req)

But now the Solr query log shows that the POST data has been added to the query parameter string:

    [20200306T112358,780] [my_coll_shard1_replica_n87] webapp=/solr path=/update params={commit=true&<add><doc><field+name="key">KEY__9927.1</field><field+name%3D"value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>} status=0 QTime=30

What is happening here?

    $ python3 -VV
    Python 3.7.6 (default, Dec 30 2019, 19:38:26)
    [Clang 11.0.0 (clang-1100.0.33.16)]
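A quick way to see where urllib.request keeps the payload, without sending anything, is to inspect the Request object itself. This is a sketch using only documented Request attributes (selector, host, data, get_method()); the URL and XML are copied from the report above, and no connection is made. The helper name build_request is hypothetical.

```python
import urllib.request

URL = "http://solr.example.no:12699/solr/my_coll/update?commit=true"
XML = ('<add><doc><field name="key">KEY__9927.1</field><field name="value">'
       '{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>')


def build_request(url: str, xml: str) -> urllib.request.Request:
    """Hypothetical helper: wrap the XML payload in a POST Request."""
    # The body must be bytes; a str body raises TypeError once the request is opened.
    return urllib.request.Request(url=url, data=xml.encode("utf-8"), method="POST")


req = build_request(URL, XML)
print(req.selector)   # /solr/my_coll/update?commit=true (path plus original query only)
print(len(req.data))  # 138, the payload lives in the body, not in the URL
```

The client side never merges the body into the query string; the 138 bytes here match the Content-Length in the captures posted later in this thread.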
msg363503 | Author: Adrian Petrescu (apetresc) | Date: 2020-03-06 13:14

This is not a bug; you've just misunderstood the urllib API. If you want to pass POST data as a payload, it's the second `data` parameter to `urlopen`: https://bugs.python.org/?@action=confrego&otk=KX9AqsI0JnOLkplIY1AGKXAmDKa38COy
msg363504 | Author: Adrian Petrescu (apetresc) | Date: 2020-03-06 13:16

(Oops, that was a bad paste! I meant this link: https://docs.python.org/2/library/urllib.html#urllib.urlopen)
msg363508 | Author: (henrik242) | Date: 2020-03-06 14:04

But why can't the payload be in the Request object? From the API docs:

    class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

    data must be an object specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data. The supported object types include bytes, file-like objects, and iterables.

https://docs.python.org/3.7/library/urllib.request.html#urllib.request.Request
msg363510 | Author: (henrik242) | Date: 2020-03-06 14:05

Further:

    method should be a string that indicates the HTTP request method that will be used (e.g. 'HEAD'). If provided, its value is stored in the method attribute and is used by get_method(). The default is 'GET' if data is None or 'POST' otherwise.
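The defaulting rule quoted above can be checked directly with Request.get_method(). A small sketch; the placeholder URL http://example.invalid/update and the helper name method_for are assumptions for illustration only, and nothing is sent.

```python
import urllib.request

URL = "http://example.invalid/update"  # placeholder, never contacted


def method_for(data=None, method=None) -> str:
    """Hypothetical helper: the HTTP method urllib.request would use."""
    return urllib.request.Request(URL, data=data, method=method).get_method()


print(method_for())                            # GET: no data
print(method_for(data=b"<add/>"))              # POST: data given, no explicit method
print(method_for(method="HEAD"))               # HEAD: an explicit method is used as-is
print(method_for(data=b"<add/>", method="PUT"))  # PUT: explicit method wins over the default
```

So passing method='POST' together with data, as in the original report, is redundant but harmless.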
msg363511 | Author: (henrik242) | Date: 2020-03-06 14:15

Also, it seems that urllib.request.urlopen just creates a similar Request object when given a data parameter:

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # accept a URL or a Request object
        if isinstance(fullurl, str):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.data = data

From https://github.com/python/cpython/blob/3.7/Lib/urllib/request.py#L507 via https://github.com/python/cpython/blob/3.7/Lib/urllib/request.py#L222
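The normalisation in OpenerDirector.open can be observed without a network by giving an opener a handler whose http_open returns the prepared Request instead of sending it. CaptureHandler and capture are hypothetical helpers written for this sketch; they subclass the documented HTTPHandler so that its request preprocessing (which fills in default headers) still runs before the request would normally go out.

```python
import urllib.request


class CaptureHandler(urllib.request.HTTPHandler):
    """Hypothetical handler: keeps HTTPHandler's request preprocessing
    but returns the prepared Request instead of opening a connection."""

    def http_open(self, req):
        return req


def capture(url_or_request, data=None):
    """Run a URL or Request through an opener and return the prepared Request."""
    opener = urllib.request.OpenerDirector()
    opener.add_handler(CaptureHandler())
    # OpenerDirector.open accepts either a URL string plus data, or a Request.
    return opener.open(url_or_request, data)


URL = "http://solr.example.no:12699/solr/my_coll/update?commit=true"
BODY = b"<add/>"

a = capture(URL, BODY)                          # like urlopen(url, data)
b = capture(urllib.request.Request(URL, BODY))  # like urlopen(Request(url, data))
for req in (a, b):
    print(req.get_method(), req.selector, req.data, req.get_header("Content-type"))
```

Both routes produce the same prepared request, including the application/x-www-form-urlencoded default that HTTPHandler adds when no Content-type is set.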
msg363513 | Author: (henrik242) | Date: 2020-03-06 14:25

The following gives the same failing result too :(

    >>> import urllib.request
    >>> data = '<add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>'
    >>> url = 'http://solr.example.no:12699/solr/my_coll/update?commit=true'
    >>> res = urllib.request.urlopen(url, data.encode('utf-8'))

I guess I'll have to whip out Wireshark and see what's going on.
msg363515 | Author: (henrik242) | Date: 2020-03-06 14:44

Here's the Wireshark output. It seems that urllib adds a "Connection: close" header, which curl doesn't. Solr doesn't seem to like that.

Curl message:

    POST /solr/my_coll/update?commit=true HTTP/1.1
    Host: solr.example.no:12699
    User-Agent: curl/7.64.1
    Accept: */*
    Content-Length: 138
    Content-Type: application/x-www-form-urlencoded

    <add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>

Python message:

    POST /solr/my_coll/update?commit=true HTTP/1.1
    Accept-Encoding: identity
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 138
    Host: solr.example.no:12699
    User-Agent: Python-urllib/3.7
    Connection: close

    <add><doc><field name="key">KEY__9927.1</field><field name="value">{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>
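The same headers can be seen without Wireshark by pointing urllib at a throwaway local HTTP server that records what it receives. This sketch uses only the standard library (http.server, threading, urllib.request); the loopback address, path and body are stand-ins, record_post is a hypothetical helper, and proxies are disabled with an empty ProxyHandler so the request really goes to 127.0.0.1.

```python
import http.server
import threading
import urllib.request


def record_post(path: str, body: bytes, headers=None) -> dict:
    """Hypothetical helper: POST body to a one-shot local server, return what it saw."""
    received = {}

    class Recorder(http.server.BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            received["path"] = self.path
            received["headers"] = {k.lower(): v for k, v in self.headers.items()}
            received["body"] = self.rfile.read(length)
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *args):  # keep the console quiet
            pass

    server = http.server.HTTPServer(("127.0.0.1", 0), Recorder)
    server.timeout = 10  # handle_request gives up if no client ever connects
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        url = "http://127.0.0.1:%d%s" % (server.server_address[1], path)
        request = urllib.request.Request(url, data=body, headers=headers or {})
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=10) as response:
            response.read()
    finally:
        worker.join()
        server.server_close()
    return received


got = record_post("/solr/my_coll/update?commit=true", b"<add/>")
print(got["path"])                     # query string arrives unchanged
print(got["headers"]["content-type"])  # urllib's default for POST bodies
print(got["headers"]["connection"])    # urllib asks the server to close
```

Passing headers={"Connection": "keep-alive"} does not change the last line: urllib's HTTP handler overwrites Connection with "close" just before sending, which matches the behaviour the next message points to.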
msg363546 | Author: (henrik242) | Date: 2020-03-06 20:34

Root cause for this seems to be https://bugs.python.org/issue12849
msg363696 | Author: (henrik242) | Date: 2020-03-09 06:58

Solved! The problem was Solr, which has special handling of POSTed data when the User-Agent starts with 'curl/': https://github.com/apache/lucene-solr/blob/40661489cd590947f513e553a20707d0c82b82e5/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L782

In all other cases Solr expects the Content-Type to be text/xml. Setting that with urllib.request makes the request work fine:

    >>> req = urllib.request.Request(url, data.encode('utf-8'), headers={'Content-Type': 'text/xml'})
    >>> res = urllib.request.urlopen(req)

A big thanks to https://stackoverflow.com/a/60586102/13365 for figuring this out.
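The fix amounts to one explicit header. Below is a minimal sketch of building that request, reusing the URL and payload from the report; nothing is sent. Request normalises header names with str.capitalize(), so the header is stored as 'Content-type', and HTTPHandler only fills in its form-encoded default when no Content-type header is already present. The helper name solr_update_request is hypothetical.

```python
import urllib.request

URL = "http://solr.example.no:12699/solr/my_coll/update?commit=true"
XML = ('<add><doc><field name="key">KEY__9927.1</field><field name="value">'
       '{"result":0,"jobId":"9459695","jobNumber":"9927.1"}</field></doc></add>')


def solr_update_request(url: str, xml: str) -> urllib.request.Request:
    """Hypothetical helper: a POST that Solr should parse as an XML update."""
    return urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml"},  # stored under the key 'Content-type'
    )


req = solr_update_request(URL, XML)
print(req.get_header("Content-type"))  # text/xml
print(req.header_items())              # [('Content-type', 'text/xml')]
```

Sending it is then just urllib.request.urlopen(req), which needs a reachable Solr server.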
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:59:27 | admin | set | github: 84056
2020-03-09 06:58:17 | henrik242 | set | status: open -> closed; resolution: not a bug; messages: + msg363696; stage: resolved
2020-03-06 20:34:35 | henrik242 | set | messages: + msg363546
2020-03-06 14:44:41 | henrik242 | set | messages: + msg363515
2020-03-06 14:25:57 | henrik242 | set | messages: + msg363513
2020-03-06 14:15:27 | henrik242 | set | messages: + msg363511
2020-03-06 14:05:44 | henrik242 | set | messages: + msg363510
2020-03-06 14:04:23 | henrik242 | set | messages: + msg363508
2020-03-06 13:16:04 | apetresc | set | messages: + msg363504
2020-03-06 13:14:20 | apetresc | set | nosy: + apetresc; messages: + msg363503
2020-03-06 11:45:28 | henrik242 | create |