
Author mgiuca
Recipients gvanrossum, janssen, jimjjewett, loewis, mgiuca, orsenthil, pitrou, thomaspinckney3
Date 2008-08-07.14:59:55
Following Guido and Antoine's reviews, I've written a new patch which
fixes *most* of the issues raised. The ones I didn't fix are noted
below and have been commented on at the review site. Note: I intend to
address all of these issues after some discussion.

Outstanding issues raised by the reviews:

Should unquote accept a bytes/bytearray as well as a str?
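One possible shape for this option, sketched with the standard urllib.parse.unquote_to_bytes (which accepts str or bytes) and a hypothetical wrapper named unquote_any. This only illustrates the question; it is not what the patch does:

```python
from urllib.parse import unquote, unquote_to_bytes


def unquote_any(string, encoding="utf-8", errors="replace"):
    """Hypothetical helper: percent-decode str, bytes or bytearray input.

    Bytes-like input is turned into raw octets first, then decoded with
    the same encoding/errors pair that unquote() uses for a str.
    """
    if isinstance(string, (bytes, bytearray)):
        # unquote_to_bytes works on the raw octets; decode afterwards.
        return unquote_to_bytes(bytes(string)).decode(encoding, errors)
    return unquote(string, encoding=encoding, errors=errors)


print(unquote_any("caf%C3%A9"))           # café (str input)
print(unquote_any(b"caf%C3%A9"))          # café (bytes input)
print(unquote_any(bytearray(b"%41%42")))  # AB (bytearray input)
```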

Should encode_rfc2231 with charset=None accept strings with non-ASCII
characters, and just encode them to UTF-8?
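The two candidate behaviours can be compared on top of urllib.parse.quote. The helper name rfc2231_quote and its utf8_fallback flag are hypothetical, made up for this comparison; only the value-quoting step is shown, not the charset'language' prefix that RFC 2231 adds when a charset is given:

```python
from urllib.parse import quote


def rfc2231_quote(value, charset=None, utf8_fallback=False):
    """Hypothetical sketch: percent-encode an RFC 2231 parameter value.

    With charset=None there are two candidate behaviours:
      * strict (utf8_fallback=False): only ASCII is allowed, so any
        non-ASCII character raises UnicodeEncodeError;
      * lenient (utf8_fallback=True): non-ASCII text is encoded as UTF-8.
    """
    if charset is None:
        charset = "utf-8" if utf8_fallback else "ascii"
    # safe="" so that "/" is also percent-encoded inside the value.
    return quote(value, safe="", encoding=charset)


print(rfc2231_quote("report.txt"))                # report.txt
print(rfc2231_quote("café", utf8_fallback=True))  # caf%C3%A9
try:
    rfc2231_quote("café")  # strict mode rejects non-ASCII
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```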

Does RFC 2965 let me get away with changing the test case to expect
UTF-8? (I'm pretty sure it doesn't care what encoding is used).

Should quote raise a TypeError if given a bytes with encoding/errors
arguments? (Motivation: TypeError is what you usually raise if you
supply too many args to a function).
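A minimal sketch of the TypeError option, written as a hypothetical wrapper quote_checked around the standard quote_from_bytes rather than as the patch itself:

```python
from urllib.parse import quote_from_bytes


def quote_checked(string, safe="/", encoding=None, errors=None):
    """Hypothetical sketch of quote() raising TypeError for bytes + encoding.

    encoding/errors only make sense when there is text to encode, so passing
    them together with bytes is treated like passing a superfluous argument.
    """
    if isinstance(string, str):
        string = string.encode(encoding or "utf-8", errors or "strict")
    else:
        if encoding is not None:
            raise TypeError("encoding is not supported for bytes input")
        if errors is not None:
            raise TypeError("errors is not supported for bytes input")
    return quote_from_bytes(string, safe)


print(quote_checked("a b/é"))      # a%20b/%C3%A9
print(quote_checked(b"a b/\xe9"))  # a%20b/%E9
try:
    quote_checked(b"abc", encoding="latin-1")
except TypeError as exc:
    print("TypeError:", exc)
```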

(As discussed above) Should quote accept safe characters outside the
ASCII range (thereby potentially producing invalid URIs)?
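To illustrate the concern, here is a hypothetical naive_quote (not urllib.parse.quote) that leaves unreserved and safe characters untouched and UTF-8 percent-encodes everything else. Making "é" safe yields a string containing a raw non-ASCII character, while RFC 3986 URIs are restricted to ASCII:

```python
# Unreserved characters from RFC 3986, section 2.3: these are never encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def naive_quote(text, safe="/"):
    """Hypothetical sketch: keep unreserved and safe characters as-is and
    percent-encode the UTF-8 octets of every other character."""
    out = []
    for ch in text:
        if ch in UNRESERVED or ch in safe:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


ok = naive_quote("café")
bad = naive_quote("café", safe="/é")
print(ok, ok.isascii())    # caf%C3%A9 True
print(bad, bad.isascii())  # café False
```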


Commit log for patch8:

Fix for issue 3300.

urllib.parse.unquote: Added "encoding" and "errors" optional arguments,
allowing the caller to determine the decoding of percent-encoded octets.
As per RFC 3986, default is "utf-8" (previously implicitly decoded as
ISO-8859-1). Also fixed a bug in which mixed-case hex digits (such as
"%aF") weren't being decoded at all.
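These parameters can be checked against urllib.parse.unquote as it shipped in Python 3, where the defaults are encoding="utf-8" and errors="replace"; the values in the comments assume those defaults. The byte %E9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own:

```python
from urllib.parse import unquote

# UTF-8 is the default decoding for percent-encoded octets.
print(unquote("caf%C3%A9"))                   # café
# The caller can choose another encoding, e.g. the old ISO-8859-1 behaviour.
print(unquote("caf%E9", encoding="latin-1"))  # café
# A lone 0xE9 byte is invalid UTF-8; errors="replace" yields U+FFFD.
print(unquote("caf%E9") == "caf\ufffd")       # True
# Mixed-case hex digits decode the same as upper-case ones.
print(unquote("%aF", encoding="latin-1") == unquote("%AF", encoding="latin-1"))  # True
```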

urllib.parse.quote: Added "encoding" and "errors" optional arguments,
allowing the caller to determine the encoding of non-ASCII characters
before being percent-encoded. Default is "utf-8" (previously characters
in range(128, 256) were encoded as ISO-8859-1, and characters above that
as UTF-8). Also, characters/bytes with values of 128 or above are no
longer allowed to be "safe". quote now also accepts either bytes or str.
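For comparison, this is how quote behaves in the released Python 3 urllib.parse: str input is encoded as UTF-8 by default, encoding/errors select another codec and error handler, and bytes input is percent-encoded octet by octet:

```python
from urllib.parse import quote

print(quote("é"))                      # %C3%A9 (UTF-8 by default)
print(quote("é", encoding="latin-1"))  # %E9 (caller-chosen encoding)
print(quote("€"))                      # %E2%82%AC (above U+00FF, still UTF-8)
print(quote("€", encoding="latin-1", errors="replace"))  # %3F ("?" replacement)
print(quote(b"\xe9 x"))                # %E9%20x (bytes are quoted as-is)
```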

Added functions urllib.parse.quote_from_bytes and
urllib.parse.unquote_to_bytes. All quote/unquote functions are now
exported from the module.
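The two byte-level functions are inverses of each other for arbitrary octets, which a round trip over all 256 byte values shows (standard library calls only, as in released Python 3):

```python
import urllib.parse
from urllib.parse import quote_from_bytes, unquote_to_bytes

all_bytes = bytes(range(256))
encoded = quote_from_bytes(all_bytes, safe="")  # a pure-ASCII str
assert encoded.isascii()
assert unquote_to_bytes(encoded) == all_bytes  # the round trip is lossless

# Both functions are part of the module's public interface.
print("quote_from_bytes" in urllib.parse.__all__,
      "unquote_to_bytes" in urllib.parse.__all__)  # True True
```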

Doc/library/urllib.parse.rst: Updated docs on quote and unquote to
reflect new interface, added quote_from_bytes and unquote_to_bytes.

Lib/test/ Added many new test cases testing encoding
and decoding Unicode strings with various encodings, as well as testing
the new functions.

Lib/test/, Lib/test/,
Lib/test/ Updated and added test cases to deal with
UTF-8-encoded URIs.

Lib/email/ Calls urllib.parse.quote and urllib.parse.unquote
with encoding="latin-1", to preserve existing behaviour (which the whole
email module is dependent upon).
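Why latin-1 preserves the old behaviour: ISO-8859-1 maps each of the 256 byte values to exactly one code point, so decoding percent-escapes with it never fails and round-trips arbitrary octets, whereas UTF-8 rejects many byte sequences. This sketch checks that property with the standard functions; it is not the email module's code:

```python
from urllib.parse import quote, unquote

# Every byte value 0..255 as a one-character str under ISO-8859-1.
text = bytes(range(256)).decode("latin-1")

# latin-1 on both sides gives a lossless round trip.
escaped = quote(text, safe="", encoding="latin-1")
assert unquote(escaped, encoding="latin-1") == text

# With the UTF-8 default, a lone high byte such as %E9 cannot be decoded
# and is replaced by U+FFFD, so the original octet is lost.
print(unquote("%E9") == "\ufffd")                    # True
print(unquote("%E9", encoding="latin-1") == "\xe9")  # True
```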