Issue2637
Created on 2008-04-15 15:09 by tlesher, last changed 2008-05-06 22:54 by thomaspinckney3.
| msg65518 |
Author: Tim Lesher (tlesher) |
Date: 2008-04-15 15:09 |
|
The urllib.quote docstring implies that it quotes only characters in RFC
2396's "reserved" set.
However, urllib.quote currently escapes all characters except those in
an "always_safe" list, which consists of alphanumerics and three
punctuation characters, "_.-".
This behavior is contrary to the RFC, which defines "unreserved"
characters as alphanumerics plus "mark" characters, or "-_.!~*'()".
The RFC also says:
    "Unreserved characters can be escaped without changing the semantics
    of the URI, but this should not be done unless the URI is being used
    in a context that does not allow the unescaped character to appear."
This seems to imply that "always_safe" should correspond to the RFC's
"unreserved" set of "alphanum" | "mark".
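To make the difference concrete, here is a minimal sketch of quoting that leaves the RFC 2396 "mark" characters unescaped. It assumes Python 3, where urllib.quote lives at urllib.parse.quote; the helper name quote_rfc2396 and the constant RFC2396_MARK are hypothetical and not part of urllib. It only widens the "safe" argument and does not change the library's built-in always-safe list.

```python
from urllib.parse import quote  # Python 3 home of the function discussed here

# The "mark" characters from RFC 2396, as listed in the report above.
RFC2396_MARK = "-_.!~*'()"


def quote_rfc2396(s, safe="/"):
    """Hypothetical helper: percent-encode s, leaving RFC 2396
    unreserved characters (alphanumerics plus marks) unescaped."""
    # Alphanumerics are never escaped by quote(); adding the mark
    # characters to `safe` covers the rest of the unreserved set.
    return quote(s, safe=safe + RFC2396_MARK)


if __name__ == "__main__":
    sample = "it's (a) test!*~"
    print(quote(sample))          # library default behavior
    print(quote_rfc2396(sample))  # marks left alone, space still escaped
```

With this helper only the spaces in the sample are escaped; the default call escapes more, and exactly which characters depends on the Python version.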
|
| msg66339 |
Author: Tom Pinckney (thomaspinckney3) |
Date: 2008-05-06 22:54 |
|
It also looks like urllib.quote (and quote_plus) do not properly handle
unicode strings. urllib.urlencode() properly converts unicode strings to
UTF-8 encoded strings before calling urllib.quote() on them.
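The encode-then-quote approach described above can be sketched as follows. This assumes Python 3's urllib.parse.quote, which accepts bytes; in the Python 2 urllib this issue was filed against, the analogous call would be urllib.quote(text.encode("utf-8")). The helper name quote_utf8 is hypothetical.

```python
from urllib.parse import quote


def quote_utf8(text, safe="/"):
    """Hypothetical helper: encode text to UTF-8 explicitly, then
    percent-encode the resulting bytes."""
    # Encoding first means each non-ASCII character becomes one or more
    # UTF-8 bytes, and each of those bytes is emitted as a %XX escape.
    return quote(text.encode("utf-8"), safe=safe)


if __name__ == "__main__":
    print(quote_utf8("caf\u00e9"))  # "é" is the two UTF-8 bytes C3 A9
```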
|
|
| Date | User | Action | Args |
| 2008-05-06 22:54:56 | thomaspinckney3 | set | nosy: + thomaspinckney3; messages: + msg66339 |
| 2008-04-15 15:09:10 | tlesher | create | |
|