This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urlparse has a duplicate of urllib.unquote
Type: Stage: resolved
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: adamnelson, brett.cannon, ezio.melotti, mgiuca, orsenthil, r.david.murray
Priority: low Keywords: patch

Created on 2010-03-15 01:06 by mgiuca, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
urlparse-unquote.patch mgiuca, 2010-03-15 01:06 urlparse.py patch
urlparse-unquote-newmodule.patch mgiuca, 2010-03-15 02:56 Moves both versions of unquote to a separate module
Messages (9)
msg101075 - (view) Author: Matt Giuca (mgiuca) Date: 2010-03-15 01:06
urlparse contains a complete copy of the urllib.unquote function. This is extremely nasty code duplication -- I have two patches pending on urllib.unquote (#8135 and #8136) and I only just realised that I missed urlparse.unquote!

The reason given for this is:
"Cannot use directly from urllib as it would create circular reference.
urllib uses urlparse methods ( urljoin)"

I don't see that as a reason for code duplication. The fix is to make a local import of unquote in parse_qsl, like this:

def parse_qsl(qs, keep_blank_values=0, strict_parsing=0):
    from urllib import unquote

I am aware that this possibly violates PEP 8 (all imports should be at the top of the module), but I'd say this is the lesser of two evils.

A patch is attached. Commit log: "urlparse: Removed duplicate of urllib.unquote. Replaced with a local import."
msg101083 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-03-15 02:26
If I understand correctly, the problem with doing an import in a function is that there is an import lock which can cause various problem depending on how an application imports various things.  I've added Brett to the nosy list to see if he has a definitive opinion on this.

Note that this code is not duplicated in the py3 codebase.  It is probably best to just live with the code duplication in 2.7.  People may be importing unquote from the "wrong" module, and removing it would break their code.
msg101084 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-03-15 02:34
Yes, the reason for having the unquote verbatim was it created a circular reference otherwise. This was done when parse_qsl was moved from cgi module to the urlparse module. I would also like to know more on what RDM points out, tough I have seem some modules having local imports.
msg101086 - (view) Author: Matt Giuca (mgiuca) Date: 2010-03-15 02:56
If this patch is rejected, then at the very least, the urllib.unquote function needs a comment at the top explaining that it is duplicated in urlparse, so any changes should be made to both.

Note that urlparse.unquote is not a documented function, or in the __all__ export list, so people *shouldn't* be using it. But OK, I'll accept that some might.

If there is a problem with some kind of race condition importing (I don't see how there could be, but I'll accept it if someone confirms), or with people using urlparse.unquote directly, then I'd propose an alternate solution which removes the circular dependency entirely: Move unquote into a separate module _urlunquote, which is imported by both urllib and urlparse. No code breakage.

Patch attached. Commit log: "Fixed duplication of urllib.unquote in urlparse. Moved function to a separate module _urlunquote."
msg101143 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-03-16 01:19
So David's right that local commits are bad from a threading perspective. If you happen to have an import trigger a thread which itself triggers an import you will lock up the interpreter. Typically this is avoided by not importing anything in a thread and not launching a thread based on an import. But some people do and they eventually get bit by calling a function that they didn't realize would trigger an import as a side-effect.

Since the code duplication has been solved in Python 3 I say add a comment about the duplication and be done with it for ease of development and complete backwards-compatibility.
msg101147 - (view) Author: Matt Giuca (mgiuca) Date: 2010-03-16 02:35
What about the alternative (newmodule) patch? That doesn't have threading issues, or break backwards compatibility.
msg101152 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2010-03-16 03:32
You could break it out into a module, but that feels like overkill for some minor code duplication that is not going to be a problem once we stop caring about Python 2.x. I personally wouldn't bother going that far.
msg106437 - (view) Author: AdamN (adamnelson) Date: 2010-05-25 14:34
I would vote to close this and focus any code cleanliness work on 3.x.  The deep import is more trouble than it's worth.
msg106440 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-25 15:21
I synced the two versions in r81518 and added a comment to both about updating the other.
History
Date User Action Args
2022-04-11 14:56:58adminsetgithub: 52390
2010-05-25 15:21:46r.david.murraysetstatus: open -> closed
resolution: wont fix
messages: + msg106440

stage: patch review -> resolved
2010-05-25 14:34:51adamnelsonsetnosy: + adamnelson
messages: + msg106437
2010-03-16 03:32:31brett.cannonsetpriority: normal -> low

messages: + msg101152
2010-03-16 02:35:15mgiucasetmessages: + msg101147
2010-03-16 01:19:39brett.cannonsetmessages: + msg101143
2010-03-15 02:56:05mgiucasetfiles: + urlparse-unquote-newmodule.patch

messages: + msg101086
2010-03-15 02:34:32orsenthilsetnosy: + orsenthil
messages: + msg101084
2010-03-15 02:26:40r.david.murraysetnosy: + r.david.murray, brett.cannon
messages: + msg101083
2010-03-15 01:10:15ezio.melottisetpriority: normal
nosy: + ezio.melotti

stage: patch review
2010-03-15 01:06:47mgiucacreate