msg207237 - (view) |
Author: Ruben D. Orduz (ruben.orduz) |
Date: 2014-01-03 19:39 |
Currently, urlparse.parse_qs (http://hg.python.org/cpython/file/2.7/Lib/urlparse.py#l150) treats ';' as a query string separator, with no way to override that. Several web service APIs use ';' as a list separator within a value (e.g. [URL]?fruits=lemon;lime&family=citrus). Although ';' seems like a sensible choice for a default, there should be a way to override it.
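To make the reported behavior concrete, here is a minimal sketch that emulates the 2.7-era pair splitting, where both '&' and ';' act as separators. The helper name legacy_split_pairs is hypothetical and is not part of urlparse; it only mimics the splitting step and does no percent-decoding.

```python
def legacy_split_pairs(qs):
    """Split a query string on both '&' and ';' (emulation, not the stdlib API)."""
    return [s2 for s1 in qs.split('&') for s2 in s1.split(';')]


pairs = legacy_split_pairs("fruits=lemon;lime&family=citrus")
print(pairs)  # ['fruits=lemon', 'lime', 'family=citrus']
```

The value 'lemon;lime' is broken apart, and 'lime' ends up as a separate field with no value.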
|
msg207241 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-01-03 20:01 |
As an enhancement, this could only go into 3.5.
|
msg207242 - (view) |
Author: Ruben D. Orduz (ruben.orduz) |
Date: 2014-01-03 20:08 |
So, are you suggesting I should change the issue to a different type if I want this in 2.7.x, or leave it for release in 3.5 and then submit a patch to backport it to 2.7.x? I apologize, I'm not sure how the workflow works in these cases. Thanks.
|
msg207243 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2014-01-03 20:23 |
I'm saying that this is a change that can be made only in 3.5. If you want to submit a patch here for 2.7 for other people to use, that's fine, but it won't get applied.
|
msg207244 - (view) |
Author: Ruben D. Orduz (ruben.orduz) |
Date: 2014-01-03 20:24 |
Ah, gotcha. I think I will leave it as is then. Thanks for clarifying.
|
msg207261 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2014-01-04 00:27 |
If you could point to the RFC that states the list of characters that can be used as valid query string separators, we can include that list (in 3.5, of course).
|
msg207262 - (view) |
Author: Ruben D. Orduz (ruben.orduz) |
Date: 2014-01-04 00:30 |
Senthil,
The RFC can be found here: http://tools.ietf.org/html/rfc3986#section-2.2
|
msg263491 - (view) |
Author: Luiz Poleto (luiz.poleto) * |
Date: 2016-04-15 13:09 |
If this bug is to be moved forward, we should consider this:
RFC 3986 defines that a query can contain any of these characters:
/?:@-._~!$&'()*+,;= ALPHA DIGIT %HH (encoded octet)
But it does not define how the data should be interpreted, leaving that to the naming authority and the URI scheme (although http/https doesn't specify it either; see RFC 7230).
OTOH, parse_qs (in both 2.x and 3.x) is very specific that the query string is of type application/x-www-form-urlencoded, which defines that the name is separated from the value by '=' and that name/value pairs are separated from each other by '&'; supporting ';' as a pair separator is only a suggestion to HTTP server implementors.
Adding support for all the characters specified by RFC 3986 could pose a challenge, since there is no fixed scheme and the naming authority can use them freely. Perhaps we could instead add a parameter to enable/disable ';' as a pair separator?
|
msg263798 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2016-04-20 05:52 |
Luiz,
The original question was about introducing a parameter to override the query string separator ';'.
If we go with enable/disable, then we should also provide another option for the query string separator.
The OP provided one example of a query string that had '&' as a separator along with ';'. I wonder how that should be parsed.
The pointer to the RFC makes me think that it is all right to provide an option to 'override' the default separator instead of providing an enable/disable toggle.
I would like to hear opposing views on this.
|
msg263842 - (view) |
Author: Luiz Poleto (luiz.poleto) * |
Date: 2016-04-20 14:08 |
Based on the example provided by the OP, it appears that he would expect
the output to be:
{'family': ['citrus'], 'fruits': ['lemon;lime']}
Since the W3C recommendation for the application/x-www-form-urlencoded type
specifies using '&' to separate the parameters in the query string (';' is
not mentioned there), I recommended a parameter for disabling the use of
';' as a separator ('&' would still be the separator used).
The only thing I see against using the RFC is that although it specifies
which characters are valid in a query string, it does not define how they
should be used; that is done by W3C's application/x-www-form-urlencoded and
it is very specific about using '&' as a separator.
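As an illustration of the enable/disable idea, here is a minimal sketch of a parser in which '&' always separates pairs and ';' does so only when a flag is set. The function name parse_pairs and the allow_semicolon parameter are hypothetical and do not exist in the standard library. Unlike parse_qs, this sketch keeps fields with blank values.

```python
from urllib.parse import unquote_plus


def parse_pairs(qs, allow_semicolon=True):
    """Hypothetical parser: '&' always splits pairs; ';' only if enabled."""
    chunks = qs.split('&')
    if allow_semicolon:
        # Further split each '&'-delimited chunk on ';'.
        chunks = [part for chunk in chunks for part in chunk.split(';')]
    result = {}
    for chunk in chunks:
        if not chunk:
            continue  # skip empty segments such as those from '&&'
        name, _, value = chunk.partition('=')
        result.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return result


print(parse_pairs("fruits=lemon;lime&family=citrus", allow_semicolon=False))
# {'fruits': ['lemon;lime'], 'family': ['citrus']}
```

With the flag disabled, the output matches the result the OP appears to expect.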
|
msg263843 - (view) |
Author: Ruben D. Orduz (ruben.orduz) |
Date: 2016-04-20 14:16 |
Hi all,
OP here. My intent was to optionally pass a separator parameter, _not_ an enable/disable toggle.
|
msg335768 - (view) |
Author: Kobi Gana (Kobi Gana) |
Date: 2019-02-17 10:29 |
Hi all,
Please consider the following case:
The url - http://hostname.domain/mypage.asp?fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
The Query as string - fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
The expected pairs -
1. fields=id
2. query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
The actual output -
1. ('fields', 'id')
2. ('query', '"((release={id=1004})')
3. ('(sprint={id=1040})', '')
4. ('(team={id=1004})', '')
5. ("(severity={id='list_node.severity.urgent'})", '')
6. ('!phase={id IN \'phase.defect.closed\',\'phase.defect.duplicate\',\'phase.defect.rejected\'})"', '')
|
msg335782 - (view) |
Author: kc (kc) * |
Date: 2019-02-17 17:44 |
W3C allows both constructs, ampersand and semicolon.
https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
Especially servlet containers and servers running CGI programs often use semicolons as a separator.
I would suggest parsing on either ampersands OR semicolons, giving priority to ampersands.
For example the query strings:
?fields=id&query=%22((release%3D{id%3D1004});(sprint%3D{id%3D1040});(team%3D{id%3D1004});(severity%3D{id%3D%27list_node.severity.urgent%27});!phase%3D{id+IN+%27phase.defect.closed%27,%27phase.defect.duplicate%27,%27phase.defect.rejected%27})%22
?fruits=lemon;lime&family=citrus
should be parsed with & separators only.
The modified example without & character:
?fruits=lemon;family=citrus
can be parsed with semicolon as a separator because it contains both '=' and ';' but no '&' characters.
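The priority rule described above can be sketched as follows. The helper names choose_separator and split_pairs are hypothetical; this is only one reading of the proposal, not an implementation from the standard library.

```python
def choose_separator(qs):
    """Prefer '&' when present; fall back to ';' only if there is no '&'."""
    if '&' in qs:
        return '&'
    if ';' in qs and '=' in qs:
        return ';'
    return '&'


def split_pairs(qs):
    """Split a query string into raw 'name=value' chunks using one separator."""
    qs = qs.lstrip('?')
    return [p for p in qs.split(choose_separator(qs)) if p]


print(split_pairs("?fruits=lemon;lime&family=citrus"))
# ['fruits=lemon;lime', 'family=citrus']
print(split_pairs("?fruits=lemon;family=citrus"))
# ['fruits=lemon', 'family=citrus']
```

One limitation of this heuristic: a single-pair query whose value contains ';' (for example 'q=a;b') would still be split on the semicolon.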
|
msg335801 - (view) |
Author: Kobi Gana (Kobi Gana) |
Date: 2019-02-18 09:01 |
We are on the same page, and we should also consider marking this as a defect.
Thanks
|
msg397208 - (view) |
Author: Jacob Walls (jacobtylerwalls) * |
Date: 2021-07-09 19:06 |
Greetings. I believe this is mooted by #42967 as well as changes even prior to that.
https://bugs.python.org/issue42967
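For reference, a short example of the behavior after #42967, which added a separator parameter to urllib.parse.parse_qs and parse_qsl and made '&' the only default separator. This assumes an interpreter that includes that change (Python 3.10 or later, or a patched earlier release).

```python
from urllib.parse import parse_qs

# Default separator is '&', so ';' stays inside the value.
print(parse_qs("fruits=lemon;lime&family=citrus"))
# {'fruits': ['lemon;lime'], 'family': ['citrus']}

# A caller that wants ';' as the pair separator passes it explicitly.
print(parse_qs("fruits=lemon;family=citrus", separator=';'))
# {'fruits': ['lemon'], 'family': ['citrus']}
```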
|
|
Date | User | Action | Args
2022-04-11 14:57:56 | admin | set | github: 64315
2021-07-09 19:06:06 | jacobtylerwalls | set | nosy: + jacobtylerwalls; messages: + msg397208
2019-02-18 09:01:52 | Kobi Gana | set | messages: + msg335801
2019-02-17 17:44:38 | kc | set | nosy: + kc; messages: + msg335782
2019-02-17 10:29:44 | Kobi Gana | set | files: + parse_querystring.py; nosy: + Kobi Gana; messages: + msg335768
2016-04-20 14:16:35 | ruben.orduz | set | messages: + msg263843
2016-04-20 14:08:42 | luiz.poleto | set | messages: + msg263842
2016-04-20 05:52:22 | orsenthil | set | messages: + msg263798
2016-04-15 13:09:52 | luiz.poleto | set | nosy: + luiz.poleto; messages: + msg263491
2014-01-04 00:30:48 | ruben.orduz | set | messages: + msg207262
2014-01-04 00:27:23 | orsenthil | set | nosy: + orsenthil; messages: + msg207261
2014-01-03 20:24:30 | ruben.orduz | set | messages: + msg207244
2014-01-03 20:23:05 | r.david.murray | set | messages: + msg207243
2014-01-03 20:08:41 | ruben.orduz | set | messages: + msg207242
2014-01-03 20:01:06 | r.david.murray | set | nosy: + r.david.murray; messages: + msg207241; versions: + Python 3.5, - Python 2.7
2014-01-03 19:39:19 | ruben.orduz | create |