Title: urlparse "caches" parses regardless of encoding
Type: Stage:
Components: Unicode Versions: Python 2.4, Python 2.5
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: alexandre.vassalotti, kkinder, lemburg, palfrey
Priority: normal Keywords:

Created on 2005-10-04 17:57 by kkinder, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg26504 - (view) Author: Ken Kinder (kkinder) Date: 2005-10-04 17:57
The issue can be summarized with this code:

>>> urlparse.urlparse(u'')
(u'http', u'', u'/doc', '', '', '')
>>> urlparse.urlparse('')
(u'http', u'', u'/doc', '', '', '')

Once the urlparse library has "cached" a URL, it returns
the cached result regardless of the datatype of the
argument. Notice that in the second call to urlparse, I
passed it a STRING and got back UNICODE objects.

This can be quite confusing when, as a developer, you
think you've already encoded all your objects, you use
urlparse, and all of a sudden you have unicode objects
again, when you expected to have strings.
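
The behaviour described above is what one would expect from a cache keyed on the URL's value alone, since in Python 2 u'abc' and 'abc' compare equal and hash equal. The following is only a rough sketch of that mechanism, not urlparse's actual code. It is written for Python 3, so a hypothetical str subclass UText stands in for Python 2's unicode type, and make_parser and its toy parse are illustrative names. Adding the argument's type to the cache key is one possible way to avoid the mix-up; the change actually made in the tracker is not shown in this thread.

```python
# UText is a hypothetical stand-in for Python 2's unicode type:
# it compares and hashes equal to a plain str with the same contents.
class UText(str):
    pass


def make_parser(key_includes_type):
    """Return a toy cached parser; the flag selects how the cache key is built."""
    cache = {}

    def parse(url):
        # Keying on the value alone lets equal-but-differently-typed inputs collide.
        key = (url, type(url)) if key_includes_type else url
        if key in cache:
            return cache[key]
        # Toy "parse": split off the scheme, keeping the input's type.
        scheme, _, rest = url.partition("://")
        result = (type(url)(scheme), type(url)(rest))
        cache[key] = result
        return result

    return parse


url = "http://example.org/doc"  # placeholder URL, not the one from the report

by_value = make_parser(key_includes_type=False)
by_value(UText(url))            # first call caches UText results
leaked = by_value(url)          # plain str in, cached UText results out

by_value_and_type = make_parser(key_includes_type=True)
by_value_and_type(UText(url))
clean = by_value_and_type(url)  # separate cache entry, plain str results
```

With the value-only key, the second call returns the objects cached by the first call, so their type follows whichever call came first. With the type in the key, each input type gets its own cache entry.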
msg26505 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2007-01-13 19:25
Unassigning: I don't use urlparse, so can't comment.
msg58345 - (view) Author: Tom Parker (palfrey) Date: 2007-12-10 13:35
Also affects Python 2.5.1 (tested on Debian python2.5 package version
msg58541 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2007-12-13 17:58
Fixed in r59480.
