Issue 1349732: urllib.urlencode provides two features in one param

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42559

classification

Title:	urllib.urlencode provides two features in one param
Type:	enhancement	Stage:	resolved
Components:	Documentation, Library (Lib)	Versions:	Python 2.7

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	docs@python, georg.brandl, mike_j_brown, orsenthil, salty-horse, slinkp, terry.reedy
Priority:	normal	Keywords:	easy

Created on 2005-11-06 21:58 by salty-horse, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg60833	Author: Ori Avtalion (salty-horse) *	Date: 2005-11-06 21:58
Using the 2.4 distribution.

It seems that urlencode knows how to handle unicode input with quote_plus and ascii encoding, but it only does that when doseq is True.

1) There's no mention of that useful feature in the documentation.
2) If I want to encode unicode data without doseq's feature, there's no way to do so. Although it's rare to use doseq's intended function, they shouldn't be connected.

Shouldn't values be checked with _is_unicode and handled correctly in both modes of doseq? One reason I see that might make the unicode check cause problems is that the comment says "preserve old behavior" when doseq is False. Could such a check affect the behaviour of old code? If it can, the unicode handling could be another optional parameter.

Also, the docstring is really unclear as to the purpose of doseq. Can a small example be added? (I saw no PEP guidelines for how examples should be given in docstrings, or if they're even allowed, so perhaps this fits just the regular documentation.)

With query={"key": ("val1", "val2")}
    doseq=1 yields: key=val1&key=val2
    doseq=0 yields: key=%28%27val1%27%2C+%27val2%27%29

After the correct solution is settled, I'll gladly submit a patch with the fixes.
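
The doseq example in the message above can be reproduced with a minimal sketch. It assumes Python 3, where the function lives at urllib.parse.urlencode rather than the Python 2 urllib.urlencode under discussion; the variable names are illustrative only.

```python
from urllib.parse import urlencode

# Assumption: Python 3's urllib.parse.urlencode, not Python 2's urllib.urlencode.
query = {"key": ("val1", "val2")}

# doseq=True: each element of the sequence becomes its own key=value pair.
expanded = urlencode(query, doseq=True)

# doseq=False (the default): the whole tuple is passed through str()
# and the resulting text is percent-encoded as a single value.
collapsed = urlencode(query, doseq=False)

print(expanded)   # key=val1&key=val2
print(collapsed)  # key=%28%27val1%27%2C+%27val2%27%29
```
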
msg60834	Author: Mike Brown (mike_j_brown)	Date: 2005-12-29 23:32
I understand why the implementation is the way it is. I agree that it is not documented as ideally as it could be. I also agree with your implication that ASCII-range unicode input should be acceptable (and converted to ASCII bytes internally before percent-encoding), regardless of doseq. I would not go so far as to say non-ASCII-range unicode should be accepted, since safe conversion to bytes before percent-encoding would not be possible.

However, I was unable to reproduce your observation that doseq=0 results in urlencode not knowing how to handle unicode. The object is just passed to str(). Granted, that's not quite the same as when doseq=1, where unicode objects are specifically run through .encode('us-ascii','replace'), but I wouldn't characterize it as not knowing how to handle ASCII-range unicode. The results for ASCII-range unicode are the same.

If you're going to make things more consistent, I would actually tighten up the doseq=1 behavior, replacing

    v = quote_plus(v.encode("ASCII","replace"))

with

    v = quote_plus(v.encode("ASCII","strict"))

and then mention in the docs that any object type is acceptable as a key or value, but if unicode is passed, it must be all ASCII-range characters; if there is a risk of characters above \u007f being passed, then the caller should convert the unicode to str beforehand.

As for doseq's purpose and documentation, the doseq=1 behavior is ideal for almost all situations, since it takes care not to treat str or unicode as a sequence of separate 1-character values. AFAIK, the only reason it isn't the default is for backward compatibility. It was introduced in Python 2.0.1 and was trying to retain compatibility with code written for Python 1.5.2 through 2.0.0. I suggest deprecating it and making doseq=1 behavior the default, if others (MvL?) approve.
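
The difference between the "replace" and "strict" error handlers proposed above can be sketched as follows. This is an illustration under assumptions, not the urllib source: it uses Python 3's urllib.parse.quote_plus, and quote_ascii is a hypothetical helper introduced only for this example.

```python
from urllib.parse import quote_plus


def quote_ascii(value, errors):
    """Hypothetical helper: encode text as ASCII, then percent-encode the bytes."""
    return quote_plus(value.encode("ascii", errors))


# ASCII-range text gives the same result under either error handler.
same_for_ascii = quote_ascii("a b", "replace") == quote_ascii("a b", "strict")

# "replace" silently turns each non-ASCII character into "?" (encoded as %3F).
lossy = quote_ascii("caf\u00e9", "replace")

# "strict" raises instead of producing a lossy result.
try:
    quote_ascii("caf\u00e9", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

print(same_for_ascii, lossy, strict_raised)
```

With "strict", the caller finds out about non-ASCII input immediately and can choose an encoding explicitly before calling the function.
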
msg60835	Author: Ori Avtalion (salty-horse) *	Date: 2005-12-30 16:10
> However, I was unable to reproduce your observation that
> doseq=0 results in urlencode not knowing how to handle
> unicode.

I had given urlencode a Hebrew unicode string, and "".encode() could not convert it to ascii:

    s_unicode = u'\u05d1\u05d3\u05d9\u05e7\u05d4'
    print urllib.urlencode({"key":s_unicode}, 0)

As I notice now, the line:

    >>> urllib.urlencode({"key":s_unicode}, 1)
    key=%3F%3F%3F%3F%3F

does not raise an exception but produces an incorrect result. The correct way to call it is like this:

    >>> urllib.urlencode({"key":s_unicode.encode("iso8859_8")}, 1)
    key=%E1%E3%E9%F7%E4

So, in addition to your suggestion, I think the documentation should explicitly state that unicode strings will be treated as us-ascii.

What about my suggestion of an example for doseq's behaviour in the docstring?
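
For comparison, a sketch of the same Hebrew example under Python 3, which is an assumption since the thread is about Python 2. Pre-encoding to bytes mirrors the call shown above; the encoding keyword of urllib.parse.urlencode is a Python 3 feature and is not available in the Python 2 function.

```python
from urllib.parse import urlencode

# Same code points as in the message above (Python 3 str literals are unicode).
s_unicode = "\u05d1\u05d3\u05d9\u05e7\u05d4"

# Pre-encoding to bytes, as in the message: the bytes are percent-encoded as-is.
from_bytes = urlencode({"key": s_unicode.encode("iso8859_8")}, doseq=True)

# Python 3 only: name the encoding and let urlencode do the conversion.
# The encoding is applied the same way with or without doseq.
with_doseq = urlencode({"key": s_unicode}, doseq=True, encoding="iso8859_8")
without_doseq = urlencode({"key": s_unicode}, doseq=False, encoding="iso8859_8")

print(from_bytes)  # key=%E1%E3%E9%F7%E4
```
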
msg109824	Author: Terry J. Reedy (terry.reedy) *	Date: 2010-07-10 06:35
"put something somewhere" will not get action. Please suggest specific wording and a specific place to put it and mark it TEXT or PATCH or something so a doc person can find it. I am assuming that this does not apply to 3.x.
msg110311	Author: Senthil Kumaran (orsenthil) *	Date: 2010-07-14 18:53
This was fixed as part of Issue8788. Closing this.
msg269642	Author: Paul Winkler (slinkp) *	Date: 2016-07-01 03:59
This was marked as a duplicate of http://bugs.python.org/issue8788 but the doc changes in that issue, and the current docs for 2.7, do not mention anything related to handling of unicode nor how `doseq` affects unicode-related behavior. If we can agree on wording, does 2.7 still get documentation fixes?
msg269644	Author: Terry J. Reedy (terry.reedy) *	Date: 2016-07-01 04:42
2.7 can get doc changes if a core developer wants to push it.

History
Date	User	Action	Args
2022-04-11 14:56:13	admin	set	github: 42559
2016-07-01 04:42:23	terry.reedy	set	messages: + msg269644 versions: - Python 2.6
2016-07-01 03:59:12	slinkp	set	nosy: + slinkp messages: + msg269642
2010-07-14 18:53:56	orsenthil	set	status: open -> closed resolution: duplicate messages: + msg110311 stage: test needed -> resolved
2010-07-10 06:35:34	terry.reedy	set	versions: + Python 2.7 nosy: + terry.reedy, docs@python messages: + msg109824 assignee: georg.brandl -> docs@python
2009-04-22 18:48:01	ajaksu2	set	keywords: + easy stage: test needed
2009-02-12 18:25:59	ajaksu2	set	nosy: + orsenthil type: enhancement versions: + Python 2.6, - Python 2.7
2009-02-09 00:36:07	ajaksu2	set	nosy: + georg.brandl assignee: georg.brandl components: + Documentation versions: + Python 2.7, - Python 2.4
2005-11-06 21:58:35	salty-horse	create