classification
Title: urlencode of a None value uses the string 'None'
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Claudiu.Popa, Joshua.Johnston, ezio.melotti, jayaddison, orsenthil, piotr.dobrogost, r.david.murray, taleinat
Priority: normal Keywords: patch

Created on 2013-08-27 19:54 by Joshua.Johnston, last changed 2020-10-18 00:02 by jayaddison. This issue is now closed.

Files
File name Uploaded Description Edit
urllib.patch Claudiu.Popa, 2013-09-15 20:35 review
urllib_null_value.patch Claudiu.Popa, 2013-09-15 21:09 review
test.html Joshua.Johnston, 2013-09-16 14:48
Pull Requests
URL Status Linked Edit
PR 19945 closed jayaddison, 2020-05-05 19:26
PR 19949 closed jayaddison, 2020-05-06 00:15
Messages (31)
msg196314 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2013-08-27 19:54
This is strange behavior. When you encode nulls in other languages you don't get the string 'null' you usually get an empy string. Shouldn't str(None) == ''?

If not str(None) == 'None' and the string representation of a None value should not match a known string

>>> from urllib import urlencode
>>> urlencode({'josh': None})
'josh=None'
msg196318 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-27 20:47
In Python the str of a None value is indeed 'None', just as the str of a True value is 'True'.  Unless the protocol to which you are encoding supports null values, you shouldn't be using None values in the input to the serialization.  If you want an empty string in the output, use an empty string in the input.

Now, that said, it seems to me that while it is not (apparently) RFC compliant, query strings do have a natural way to support null values: a name without a value.  It would therefore be convenient if urlencode supported null values by turning something like {'josh': None, 'fred': 'abc'} into:

    josh&fred=abc

That is what I would expect it to do, so I'd be in favor of that enhancement.

I note that parse_qs and parse_qsl don't handle this case either, which surprises me since I think such value-less query string parameters are fairly common.
msg196319 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2013-08-27 21:00
Hi David,

That is what I would expect it to do as well. I'm relatively new to Python but this is causing all kinds of problems with oauth signing using ims_lti_py as well as my own code using urlencode.
msg197831 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-09-15 20:35
Hello. I attached a simple patch for the urlencode case. I'll try to make one for parse_qs(l) as well.
msg197835 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-09-15 21:09
Added patch for parse_qsl as well.
msg197839 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-09-15 22:36
The patch looks good, but I have doubt in the claim that, this new behavior is actually a right thing to do.

RDM wrote:

Now, that said, it seems to me that while it is not (apparently) RFC compliant, query strings do have a natural way to support null values: a name without a value.  It would therefore be convenient if urlencode supported null values by turning something like {'josh': None, 'fred': 'abc'} into:

    josh&fred=abc


It is correctly recognized that it is not RFC Compliant. A reference to this behavior exhibited by the software may be helpful. The `application/x-www-form-urlencoded` type always looks for key=value kind of query string only. And changes submitted by patch will break for folks who think "a=None" is actually what they expected when the sent {a:None}.

I think, some present instances where this behavior is exhibited will be good to note.
msg197851 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-16 01:13
Thank you for working on the patches, Claudiu, but...

The backward compatibility concern is valid.

Furthermore, I did a bunch of googling looking for examples.  I did not turn up any examples of APIs that were documented to use parameters without '='...all the cases I looked at that mentioned "parameter without a value" specified the "xxx=" form.

Given that, and thinking about it further, it appears to me that a parameter with no '=' should be treated the same as one that has an '=' but no value...that is, the same as using an empty string.  It would be *logical* for it to be a "nul" value, but in fact the http RFCs have no concept of a "nul" value, so it is in fact out of place.  

Sorry to have sent people down the wrong path here.

So I'm back to saying that one should not use None in an application to represent no value for a query parameter, but rather the empty string.  The url parsing code should turn a parameter without an '=' into an empty string, I think, but again there are backward compatibility concerns there.  There may be code that unintentionally depends on them being discarded.

So, I *think* we could, and should, make them not ignored, but turned into empty strings, in 3.4, following the first part in Postel's law ("be generous in what you accept").  But following the second half of Postel's law ("be strict in what you generate") we really shouldn't generate parameters without even an '=', given that they are not RFC compliant.
msg197900 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-09-16 14:18
No problem, David, working on these patches is just an occasion for me to learn more and be useful at the same time.
msg197904 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2013-09-16 14:48
I know that languages like php will treat ?josh= the same as ?josh

Using the attached test form in Google Chrome, you will see the data passed as josh= when empty in both GET and POST requests. This is probably the way to go, key=str(value) if value is not None else ''
msg198560 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-09-29 00:23
Hi Joshua, I did not setup a php server to test the html form that you uploaded. We cannot test this in browsers because HTML does not have None as values, it is simple empty strings and that is correct. 
I think that the cost of making this change is higher than the value that it will give and I am inclined to close this bug report as wont fix. If you strongly disagree, please feel free to reopen it (and can possibly discuss at python-dev). Thank you!
msg198633 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2013-09-29 19:31
Hi Senthil,
You can open the html file with a browser and inspect the data posting to itself without a web server running. That is how I tested.
msg198634 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2013-09-29 19:33
I still believe that since None represents the absence of a value it should not be urlencoded as the string 'None'. I am not sure what they best way to url encode it is, but I know that 'None' is not it.
msg198701 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-30 14:28
No, Senthil is correct.

My original liking for this idea came from my mistaken impression that a value without an '=' was different from a value with an '='.  But clearly the practice in the industry (the de facto standard) is that they are the same, and indicate that the value of the parameter is the empty string.

So, urls do *not* have any way of representing a null value (as differentiated from an empty string).  All url values are strings.  This means that the correct value to use for a python value is always its string representation, which in the case of None is "None", just as the correct representation of a Python boolean value is "True" or "False" (since http also does not have any specifically boolean type, only string values).
msg198705 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2013-09-30 14:59
I agree with True == 'True' and False == 'False' but None should be empty since it represents the absence of a value akin to null, Nil, etc in other languages. Ultimately it is not my decision make so I can only agree to disagree.

Thanks for having this discussion with me. So far the python community has been a pleasant change from what I am used to!
msg198706 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-09-30 15:12
If we were making this decision de novo, we might decide it that way.  However, the current behavior has been in place for a long time, so backward compatibility concerns raise the bar high enough that the costs of the change outweigh any benefits, even if we got general agreement that there was a benefit :)
msg210488 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2014-02-07 16:23
I'm sorry to reopen this but after it biting me quite a few times more I still cannot think of a valid use-case for this behavior that someone would be depending on 'None' being passed.

I think your backwards compatibility concerns are artificial. Can anyone describe a use-case that depended on arg=None being passed in a query string?

I am sure that anyone who is encountering this behavior is treating the string 'None' as None when encountered in a request query string.

Consider this example usage. A website presents a user with a form to search their twitter followers using the twitter api https://api.twitter.com/1.1/friends/ids.json

Form fields optional
screen_name: [________]
(assume more fields)

Handler gets the form post and builds the dict for the search query string.

# User entered nothing so params = {'screen_name': None, ..more fields}
params = {k: self.request.get(k, None) for k in self.request.GET}

url = "https://api.twitter.com/1.1/friends/ids.json?" + urllib.urlencode(params)

print url
"https://api.twitter.com/1.1/friends/ids.json?screen_name=None"

This would cause the twitter search api to look for your friends with None in their screen name. Not exactly what you'd expect right?
msg210492 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-07 16:48
What is returning a field dictionary containing None instead of an empty string?
msg210498 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2014-02-07 17:25
In this exact example it would be an empty string. It was a fake setup to illustrate a real problem.

This is the important part:

params = dict(screen_name=None,count=300)
url = "https://api.twitter.com/1.1/friends/ids.json?" + urllib.urlencode(params)
print url  # "https://api.twitter.com/1.1/friends/ids.json?screen_name=None&count=300"

screen_name=None is not the behavior you would want.

Another example is in webapp2's uri_for function which uses urlencode internally.
ref: http://stackoverflow.com/questions/7081250/webapp2-jinja2-how-can-i-get-uri-for-working-in-jinja2-views

If you try to use uri_for in your jinja2 template you must jump through hoops like:

<script>
{% if screen_name %}
   var url = '{{ uri_for('something', screen_name=screen_name) }}';
{% else %}
   var url = '{{ uri_for('something') }}';
{% endif %}
</script>
msg210499 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-07 17:27
None of those problems exist if you correctly use the empty string to indicate an empty string value instead of trying to use None.
msg210531 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2014-02-07 19:21
If this was a function to encode a dict into something then I would see your point and agree. urlencode is specifically designed to work within the domain or URIs. In this domain, it is acceptable to have an empty value for a key in a query string. None is a representation of nothing, empty, null, the absence of a value. Therefore you would expect a function in the domain of URIs to construct a valid URI component when you specifically tell it to use None. Valid is up to you, either ignore the key-value pair completely, or use key[=&] to represent the empty value.

Take Requests as an example that gets it right:

    >>> import requests
    >>> requests.get('http://www.google.com/', params={'key': None}).url
    u'http://www.google.com/'
    >>> requests.get('http://www.google.com/', params={'key': ''}).url
    u'http://www.google.com/?key='
msg210537 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-07 19:56
No, the domain of URIs does not have *any* concept of a null value.  It only has the concept of a string being empty or not empty (or the key not existing at all...ie: it doesn't exist in your params dict).

You are trying to map a Python concept (the singleton object None, which is used for a variety of purposes) into a domain that does not have an equivalent concept.

That said, following the requests model of omitting the key if the value is None is something we could consider as a convenience enhancement.  But we would again run into Python's strict backward compatibility policy.  I could imagine some programmer building an internal web service that turns the string 'None' back into a Python None value.  The fact that it would have to be an internal thing would mean we'd never hear about it...until we broke it :)

Whether or not adding this feature would require a new keyword argument to urlencode is a judgment call.  It might be an acceptable change in a feature release.
msg210542 - (view) Author: Joshua Johnston (Joshua.Johnston) Date: 2014-02-07 20:25
While the RFC makes no mention of empty values either way, it has become standard practice to either omit the key-value completely or pass a key (optional = sign) by itself in these situations so I would consider that as standard behavior.

While I stand by my position that the function is broken in regards to None, I do not have the clout to make this change without support from the community so I will leave it at that.

>>>Whether or not adding this feature would require a new keyword argument to urlencode is a judgment call.
>>>It might be an acceptable change in a feature release.
I do believe that it should be changed in a future release and a new keyword argument would suffice except for when the abstraction level is high enough to shield you from being able to specify this new argument.

>>I could imagine some programmer building an internal web service that 
>>turns the string 'None' back into a Python None value.  The fact that it 
>>would have to be an internal thing would mean we'd never hear about 
>>it...until we broke it :)
I was talking to someone about the same thing during lunch today. This whole bug report came about because I was finding 'None' string values in our app engine datastore because of optional parameters with None values being encoded as 'None'.
msg245811 - (view) Author: Piotr Dobrogost (piotr.dobrogost) Date: 2015-06-25 15:10
This problem came out in the bug "Cannot make URL query string with a parameter without a value (https://github.com/kennethreitz/requests/issues/2651) raised in urllib3 project (which itself is being used by Requests library mentioned in this bug).
msg245827 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-25 20:28
I see no reason for this issue to be open.  As suggested on the linked bug, the value should be passed as an empty string (which will produce, eg "&foo=&a='1'", which is the "correct" format for such a parameter.  Unless someone can point to a "real" web server that does something different with "&foo" than with "&foo=", there is no reason to make a change to Python.
msg368193 - (view) Author: James Addison (jayaddison) * Date: 2020-05-05 18:27
Chiming in here to add that I'd appreciate the ability to render 'standalone' (i.e. no '=') query-string keys in order to distinguish between absence-of-value and empty-string situations.

The backwards-compatibility concerns in here are genuine, so perhaps this could be introduced as an argument to urlencode with a disabled default value, allowing developers to opt-in.

>> Unless someone can point to a "real" web server that does something different with "&foo" than with "&foo=", there is no reason to make a change to Python.

There's a popular nodejs library that makes this serialization distinction explicit: https://github.com/sindresorhus/query-string#falsy-values

I've developed a Python 3.7-based set of commits[1] to address this issue.  I haven't yet opened this as a pull request since I see that Python 3.7 is in maintenance/bugfix mode[2].

In case a new urlencode flag would fall under the category of feature, I'll aim to develop a subsequent set of commits against the master development branch soon.

[1] - https://github.com/jayaddison/cpython/compare/3.7..9555467

[2] - https://devguide.python.org/#status-of-python-branches
msg368301 - (view) Author: James Addison (jayaddison) * Date: 2020-05-06 21:47
The pair of pull requests below implement None-preserving urlencode and parse_qs* via a default-disabled flag 'standalone_keys'.

- https://bugs.python.org/pull_request19259
- https://bugs.python.org/pull_request19264

(they're also already linked with this issue, thanks to the neat GitHub/BPO integration)

A benefit of the proposed serialization changes is that developers can opt-in to a scheme in which "{'a': None}" and "{'a': ''}" do not collide to the same encoded representation.

Would it be possible to re-open this issue for discussion?
msg368727 - (view) Author: James Addison (jayaddison) * Date: 2020-05-12 12:09
NB: There appears to be some relevant discussion in https://github.com/whatwg/url/issues/469
msg378826 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-17 19:51
Thanks for the PRs, James!

I've closed the PRs for now, to avoid having people spend time reviewing them while this issue is closed as "wontfix". That said, if further discussion changes that decision, the PRs could be reopened.
msg378838 - (view) Author: James Addison (jayaddison) * Date: 2020-10-17 22:30
No problem, and thanks for the heads-up Tal!  I'll raise this as a topic on python-dev if it still seems worth pursuing, after collecting some more thoughts about it.
msg378839 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2020-10-17 22:34
Hi James, I will give another look at it tonight or latest by Sunday PST ,
since I was involved in this and PRs. If we can make a decision within this
context, great, otherwise we can open it up to python-dev

On Sat, Oct 17, 2020 at 3:30 PM James Addison <report@bugs.python.org>
wrote:

>
> James Addison <jay+bpo@jp-hosting.net> added the comment:
>
> No problem, and thanks for the heads-up Tal!  I'll raise this as a topic
> on python-dev if it still seems worth pursuing, after collecting some more
> thoughts about it.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue18857>
> _______________________________________
>
msg378841 - (view) Author: James Addison (jayaddison) * Date: 2020-10-18 00:02
Thanks Senthil; please take your time.  This isn't urgent, and would likely benefit from further standardization of the URL query string and/or form-encoded data formats (outside the Python project) to achieve consensus.

A fully-considered answer at a later date would probably sit more comfortably with me than one that has any sense of time pressure.
History
Date User Action Args
2020-10-18 00:02:43jayaddisonsetmessages: + msg378841
2020-10-17 22:34:35orsenthilsetmessages: + msg378839
2020-10-17 22:30:19jayaddisonsetmessages: + msg378838
2020-10-17 19:51:08taleinatsetnosy: + taleinat
messages: + msg378826
2020-05-12 12:09:27jayaddisonsetmessages: + msg368727
2020-05-06 21:47:36jayaddisonsetmessages: + msg368301
2020-05-06 00:15:05jayaddisonsetpull_requests: + pull_request19264
2020-05-05 19:26:02jayaddisonsetpull_requests: + pull_request19259
2020-05-05 18:27:54jayaddisonsetnosy: + jayaddison
messages: + msg368193
2015-06-25 20:28:44r.david.murraysetstatus: open -> closed

messages: + msg245827
2015-06-25 15:10:49piotr.dobrogostsetnosy: + piotr.dobrogost
messages: + msg245811
2014-02-07 20:25:27Joshua.Johnstonsetmessages: + msg210542
2014-02-07 19:56:55r.david.murraysetmessages: + msg210537
versions: + Python 3.5, - Python 3.4
2014-02-07 19:21:06Joshua.Johnstonsetmessages: + msg210531
2014-02-07 17:27:34r.david.murraysetmessages: + msg210499
2014-02-07 17:25:39Joshua.Johnstonsetmessages: + msg210498
2014-02-07 16:48:27r.david.murraysetmessages: + msg210492
2014-02-07 16:23:26Joshua.Johnstonsetstatus: closed -> open

messages: + msg210488
2013-09-30 15:12:13r.david.murraysetmessages: + msg198706
2013-09-30 14:59:16Joshua.Johnstonsetmessages: + msg198705
2013-09-30 14:28:56r.david.murraysetstatus: open -> closed

messages: + msg198701
2013-09-29 19:33:50Joshua.Johnstonsetstatus: closed -> open

messages: + msg198634
2013-09-29 19:31:40Joshua.Johnstonsetmessages: + msg198633
2013-09-29 00:23:54orsenthilsetstatus: open -> closed
resolution: wont fix
messages: + msg198560

stage: needs patch -> resolved
2013-09-16 14:48:56Joshua.Johnstonsetfiles: + test.html

messages: + msg197904
2013-09-16 14:18:44Claudiu.Popasetmessages: + msg197900
2013-09-16 01:13:35r.david.murraysetmessages: + msg197851
2013-09-15 22:36:50orsenthilsetmessages: + msg197839
2013-09-15 21:09:50Claudiu.Popasetfiles: + urllib_null_value.patch

messages: + msg197835
2013-09-15 20:35:53Claudiu.Popasetfiles: + urllib.patch

nosy: + Claudiu.Popa
messages: + msg197831

keywords: + patch
2013-09-04 19:09:56ezio.melottisetnosy: + ezio.melotti
2013-08-27 21:00:54Joshua.Johnstonsetmessages: + msg196319
2013-08-27 20:47:25r.david.murraysetversions: + Python 3.4, - Python 2.7
nosy: + r.david.murray

messages: + msg196318

type: behavior -> enhancement
stage: needs patch
2013-08-27 20:18:05pitrousetnosy: + orsenthil
2013-08-27 19:54:15Joshua.Johnstoncreate