
classification
Title: urlparse.parse_qs should take argument for query separator
Type: enhancement
Stage:
Components:
Versions: Python 3.5

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Kobi Gana, jacobtylerwalls, kc, luiz.poleto, orsenthil, r.david.murray, ruben.orduz
Priority: normal
Keywords:

Created on 2014-01-03 19:39 by ruben.orduz, last changed 2022-04-11 14:57 by admin.

Files
parse_querystring.py: uploaded by Kobi Gana, 2019-02-17 10:29 (example of output)
Messages (15)
msg207237 - (view) Author: Ruben D. Orduz (ruben.orduz) Date: 2014-01-03 19:39
Currently urlparse.parse_qs (http://hg.python.org/cpython/file/2.7/Lib/urlparse.py#l150) assumes and uses ';' as a query string separator, with no way to override that. Several web service APIs use ';' as a list separator (e.g. [URL]?fruits=lemon;lime&family=citrus). Although ';' seems like a sensible choice for a default, there should be a way to override it.
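To illustrate the request, here is a minimal sketch of parsing with a caller-chosen separator. The helper name `parse_qs_with_separator` and its `separator` parameter are hypothetical, written only for this example; this is not the stdlib implementation.

```python
from urllib.parse import unquote_plus


def parse_qs_with_separator(qs, separator="&"):
    """Hypothetical helper: split a query string on one caller-chosen separator."""
    result = {}
    for chunk in qs.split(separator):
        if not chunk:
            continue  # skip empty pieces such as a trailing separator
        name, _, value = chunk.partition("=")
        # Decode %XX escapes and '+' after splitting, then group values by name.
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


parsed = parse_qs_with_separator("fruits=lemon;lime&family=citrus", separator="&")
print(parsed)  # {'fruits': ['lemon;lime'], 'family': ['citrus']}
```

With '&' as the only separator, the ';' stays inside the value, which is what the API in the example expects.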
msg207241 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-01-03 20:01
As an enhancement, this could only go into 3.5.
msg207242 - (view) Author: Ruben D. Orduz (ruben.orduz) Date: 2014-01-03 20:08
So, are you suggesting I should change this to a different type if I want it in 2.7.x, or leave it for release in 3.5 and then submit a patch to backport it to 2.7.x? I apologize, I'm not sure how the workflow works in these cases. Thanks.
msg207243 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-01-03 20:23
I'm saying that this is a change that can be made only in 3.5. If you want to submit a patch here for 2.7 for other people to use, that's fine, but it won't get applied.
msg207244 - (view) Author: Ruben D. Orduz (ruben.orduz) Date: 2014-01-03 20:24
Ah, gotcha. I think I will leave as is then. Thanks for clarifying.
msg207261 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-01-04 00:27
If you could point to the RFC that lists the characters which can be used as valid query string separators, we can include that list. (Of course, in 3.5.)
msg207262 - (view) Author: Ruben D. Orduz (ruben.orduz) Date: 2014-01-04 00:30
Senthil,

The RFC can be found here: http://tools.ietf.org/html/rfc3986#section-2.2
msg263491 - (view) Author: Luiz Poleto (luiz.poleto) * Date: 2016-04-15 13:09
If this bug is to be moved forward, we should consider this:

RFC 3986 defines that a query can contain any of these characters:
/?:@-._~!$&'()*+,;= ALPHA DIGIT %HH (encoded octet)

But it does not define how the data should be interpreted, leaving that to the naming authority and the URI scheme (although http/https doesn't specify it either; see RFC 7230).

OTOH, parse_qs (both on 2.x and 3.x) is very specific that the query string is of type application/x-www-form-urlencoded, which defines that the name is separated from the value by '=' and that name/value pairs are separated from each other by '&'. The use of ';' to separate the pairs is only something HTTP server implementors are encouraged to support.

Adding support for all the characters specified by RFC 3986 could be a challenge, since there is no fixed scheme and the naming authority can use them freely. So perhaps we could add a parameter to enable/disable ';' as a pair separator?
msg263798 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2016-04-20 05:52
Luiz,

The original question was about introducing a parameter to override the query string separator ';'.

If we go with enable or disable, then we should provide another option for the query string separator.

The OP provided one example of a query string which had '&' as a separator along with ';'. I wonder how that should be parsed.

The pointer to the RFC makes me think that it is alright to provide an option to 'override' the default separator instead of providing an enable/disable.

I would like to hear opposite thoughts on this.
msg263842 - (view) Author: Luiz Poleto (luiz.poleto) * Date: 2016-04-20 14:08
Based on the example provided by the OP, it appears that he would expect the output to be:
{'family': ['citrus'], 'fruits': ['lemon;lime']}

Since the W3C recommendation for the application/x-www-form-urlencoded type specifies using '&' to separate the parameters in the query string (';' is not mentioned there), I recommended a parameter for disabling the use of ';' as a separator (but '&' will still be the separator to be used).

The only thing I see against using the RFC is that although it specifies which characters are valid in a query string, it does not define how they should be used; that is done by W3C's application/x-www-form-urlencoded and it is very specific about using '&' as a separator.
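A rough sketch of the enable/disable idea, for comparison. The function name `parse_qs_toggle` and the flag `semicolon_separates` are made up for this illustration and are not part of any stdlib API.

```python
import re
from urllib.parse import unquote_plus


def parse_qs_toggle(qs, semicolon_separates=True):
    """Hypothetical sketch: '&' always separates pairs; ';' only when the flag is on."""
    pattern = r"[&;]" if semicolon_separates else r"&"
    result = {}
    for chunk in re.split(pattern, qs):
        if not chunk:
            continue  # ignore empty pieces between adjacent separators
        name, _, value = chunk.partition("=")
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


query = "fruits=lemon;lime&family=citrus"
with_semicolon = parse_qs_toggle(query)
without_semicolon = parse_qs_toggle(query, semicolon_separates=False)

print(with_semicolon)     # {'fruits': ['lemon'], 'lime': [''], 'family': ['citrus']}
print(without_semicolon)  # {'fruits': ['lemon;lime'], 'family': ['citrus']}
```

With the flag off, the result matches the output expected for the OP's example; with it on, 'lime' is split off as a separate, valueless name.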
msg263843 - (view) Author: Ruben D. Orduz (ruben.orduz) Date: 2016-04-20 14:16
Hi all,

OP here. My intent was to optionally pass a separator parameter, _not_ enable/disable toggle.
msg335768 - (view) Author: Kobi Gana (Kobi Gana) Date: 2019-02-17 10:29
Hi all,

Please consider the following case:
The url - http://hostname.domain/mypage.asp?fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22

The Query as string - fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22

The expected pairs - 
1. fields=id
2. query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22

The actual output -
1. ('fields', 'id')
2. ('query', '"((release={id=1004})')
3. ('(sprint={id=1040})', '')
4. ('(team={id=1004})', '')
5. ("(severity={id='list_node.severity.urgent'})", '')
6. ('!phase={id IN \'phase.defect.closed\',\'phase.defect.duplicate\',\'phase.defect.rejected\'})"', '')
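The splitting that produces output like this can be sketched as follows. This is only an approximation of the behaviour described in this issue, not the actual urlparse code; the helper name `split_on_amp_and_semicolon` is hypothetical, and the query is a shortened version of the one above to keep the example readable.

```python
import re
from urllib.parse import unquote_plus


def split_on_amp_and_semicolon(qs):
    """Hypothetical re-creation: treat both '&' and ';' as pair separators."""
    pairs = []
    for chunk in re.split(r"[&;]", qs):
        if not chunk:
            continue
        # Only a literal '=' separates name from value; '%3D' is decoded later.
        name, _, value = chunk.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


# Shortened form of the query string from this message (an assumption for brevity).
query = "fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040})%22"
pairs = split_on_amp_and_semicolon(query)
for pair in pairs:
    print(pair)
```

Because the literal ';' characters inside the 'query' value are not percent-encoded, they are treated as separators, so the value is cut at the first ';' and the remainder becomes a name with an empty value.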
msg335782 - (view) Author: kc (kc) * Date: 2019-02-17 17:44
W3C allows both constructs, ampersand and semicolon.
https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2

Especially servlet containers and servers running CGI programs often use semicolons as a separator.

I would suggest splitting on either ampersands OR semicolons, giving priority to ampersands.

For example the query strings:

?fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22

?fruits=lemon;lime&family=citrus

should be parsed with & separators only.

The modified example without & character:
?fruits=lemon;family=citrus

can be parsed with semicolon as a separator because it contains both '=' and ';' but no '&' characters.
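The priority rule described here could be sketched like this. The function name `pick_separator` is hypothetical, and the rule is only my reading of the proposal, not an agreed design.

```python
def pick_separator(qs):
    """Hypothetical heuristic: prefer '&'; use ';' only if there is no '&'
    and the string contains both '=' and ';'."""
    if "&" in qs:
        return "&"
    if ";" in qs and "=" in qs:
        return ";"
    return "&"  # nothing to split on; fall back to the default


mixed = pick_separator("fruits=lemon;lime&family=citrus")
semicolon_only = pick_separator("fruits=lemon;family=citrus")

print(mixed)           # &
print(semicolon_only)  # ;
```

One consequence of such a heuristic is that the same ';' is interpreted differently depending on whether an '&' appears anywhere else in the string, which callers may find surprising.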
msg335801 - (view) Author: Kobi Gana (Kobi Gana) Date: 2019-02-18 09:01
We are on the same page, and we should also consider marking this as a defect.

Thanks

msg397208 - (view) Author: Jacob Walls (jacobtylerwalls) * Date: 2021-07-09 19:06
Greetings. I believe this is mooted by #42967 as well as changes even prior to that.

https://bugs.python.org/issue42967
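For reference, a short example of the behaviour after that change, assuming a Python version that includes it (3.10 and later; as far as I know it was also backported to the security releases of older branches). There, urllib.parse.parse_qs and parse_qsl accept a `separator` keyword that defaults to '&'.

```python
from urllib.parse import parse_qs

# The default separator is '&', so ';' stays inside the value.
default_split = parse_qs("fruits=lemon;lime&family=citrus")

# Opt in to ';' as the single pair separator.
semicolon_split = parse_qs("fruits=lemon;family=citrus", separator=";")

print(default_split)    # {'fruits': ['lemon;lime'], 'family': ['citrus']}
print(semicolon_split)  # {'fruits': ['lemon'], 'family': ['citrus']}
```

Only one separator is used per call, so a string mixing '&' and ';' is never split on both.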
History
Date User Action Args
2022-04-11 14:57:56 admin set github: 64315
2021-07-09 19:06:06 jacobtylerwalls set nosy: + jacobtylerwalls; messages: + msg397208
2019-02-18 09:01:52 Kobi Gana set messages: + msg335801
2019-02-17 17:44:38 kc set nosy: + kc; messages: + msg335782
2019-02-17 10:29:44 Kobi Gana set files: + parse_querystring.py; nosy: + Kobi Gana; messages: + msg335768
2016-04-20 14:16:35 ruben.orduz set messages: + msg263843
2016-04-20 14:08:42 luiz.poleto set messages: + msg263842
2016-04-20 05:52:22 orsenthil set messages: + msg263798
2016-04-15 13:09:52 luiz.poleto set nosy: + luiz.poleto; messages: + msg263491
2014-01-04 00:30:48 ruben.orduz set messages: + msg207262
2014-01-04 00:27:23 orsenthil set nosy: + orsenthil; messages: + msg207261
2014-01-03 20:24:30 ruben.orduz set messages: + msg207244
2014-01-03 20:23:05 r.david.murray set messages: + msg207243
2014-01-03 20:08:41 ruben.orduz set messages: + msg207242
2014-01-03 20:01:06 r.david.murray set nosy: + r.david.murray; messages: + msg207241; versions: + Python 3.5, - Python 2.7
2014-01-03 19:39:19 ruben.orduz create