This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urlparse and usernames containing @
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: gvanrossum, ocroquette, orsenthil
Priority: normal Keywords: patch

Created on 2007-12-25 14:22 by ocroquette, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
urlparse_issue1698.patch orsenthil, 2007-12-26 17:24
issue1698-withtests.patch orsenthil, 2008-01-04 04:17
issue6898-r25-maint.patch orsenthil, 2008-01-08 02:42
Messages (12)
msg58990 - (view) Author: Olivier Croquette (ocroquette) Date: 2007-12-25 14:22
Some servers allow the @ character is usernames. It gives URLs like:
ftp://user@xyz@host/dir

user@xyz could for example by an email address.

I am not sure if this is RFC compliant. What's sure is that is makes
trouble with urlparse: 


>>> from urlparse import urlparse
>>> p = urlparse("ftp://user@host1@host2/dir") 
>>> print p.username
user
>>> print p.hostname
host1@host2

By using rsplit instead of split in lib/python2.5/urlparse.py, the
problem can be solved.
msg58995 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-12-26 14:43
Hi ocroquette,

Even though I have seen ftp sites requesting email addresses as user
names, I would rather put this issue for discussion as this not per any RFC.

urlparse is based upon RFC1808 and it points to RFC1738 for the URL
Syntax. RFC1738 specifically says (line 225) that:

The user name (and password), if present, are followed by a commercial
at-sign "@". Within the user and password field, any ":", "@" or "/"
must be encoded.



>>> url = "ftp://user@xyz@host/dir"
>>> from urlparse import urlparse
>>> p = urlparse(url)
>>> print p.username
user
>>> print p.hostname
xyz@host
>>> corr_url = "ftp://user%40xyz@host/dir"
>>> q = urlparse(corr_url)
>>> print q.username
user%40xyz
>>> print q.hostname
host
>>>

So, it is upon the user/client to encode the @sign in the user name, if
he has to be compliant with RFC. 
urlparse is not at fault here.
What shall we do with this issue?
msg58996 - (view) Author: Olivier Croquette (ocroquette) Date: 2007-12-26 16:38
Hi!

Thanks for the reply!

The problem right now is that urlparse parses silently an URL which is
not compliant, but does the wrong thing with it (since usernames can
contain @, and hostname can not, it's a more logical thing to parse from
the right using rsplit instead of split).

I see two possibilities to address that:

1. return a parse error if the URL contains two raw @
This way users and app developers will notice the problem rapidly

2. still accept this malformed URLs, but do what the user expects

Both solutions seem to me better than the current behaviour, so I would
say a change is necessary anyways.

PS: will urlparse transform an encoded "@" in the username when
.username is called, or does the application have to decode explicitely?

Olivier
msg58997 - (view) Author: Olivier Croquette (ocroquette) Date: 2007-12-26 16:40
See also the related bug on duplicity:
http://savannah.nongnu.org/bugs/?21475
msg58998 - (view) Author: Olivier Croquette (ocroquette) Date: 2007-12-26 16:45
And about the decoding, sorry, it's clear from your snippets that
urlparse doesn't do it:
>>> print q.username
user%40xyz

Maybe it should do it, I am not sure. What do you think? It would save
work for the module user.
msg59001 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-12-26 17:24
> Olivier Croquette added the comment:
> 
> 
> The problem right now is that urlparse parses silently an URL which is
> not compliant, but does the wrong thing with it (since usernames can
> contain @, and hostname can not, it's a more logical thing to parse from
> the right using rsplit instead of split).
> 
> I see two possibilities to address that:
> 
> 1. return a parse error if the URL contains two raw @
> This way users and app developers will notice the problem rapidly
> 
> 2. still accept this malformed URLs, but do what the user expects
> 
> Both solutions seem to me better than the current behaviour, so I would
> say a change is necessary anyways.

I am inclined towards the option 2, wherein even if the non-conformant URL is
given, which has "@" in the user name, the urlparse should be able atleast
recognize the hostname correctly. In this case, using rsplit instead of split
does the trick.

Attached is the patch against the trunk, which adds this. I notice duplicity
project also retorting to the same.

Unless anyone has an objection, we should commit this change.
msg59002 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-12-26 17:29
> 
> Olivier Croquette added the comment:
> 
> And about the decoding, sorry, it's clear from your snippets that
> urlparse doesn't do it:
> >>> print q.username
> user%40xyz
> 
> Maybe it should do it, I am not sure. What do you think? It would save
> work for the module user.

No, urlparse does not have any method for encoding/decoding.
It has been left to urllib2 in urlencode, urldecode, quote, unquote etc.
Strangely though, user should use urllib2 for that purpose.

And yeah, there is been an effort to merge the urlparse and urllib2
functionality and I am slowly upto it. :)
Before more requests come this way, I guess I have to show some progress.
msg59109 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-02 23:14
Could you please add a unit test for this?  Then I can check it in.

I'm thinking this could safely be fixed in 2.5.2.
msg59213 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-01-04 04:17
Hi Guido,
I have added the unit tests, tested it on my system and created a new patch
combining the previous one against the trunk.
You can verify and check in this. 

Thank you,
Senthil
msg59275 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-05 01:23
Committed revision 59726.

This is in the trunk (2.6).  Do you need this backported to 2.5.2 as well?
msg59518 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-01-08 02:42
Thank you, Yes having it in the 2.5.2 would be a good idea. The bug
report is also on 2.5 only.
I have attached the patch for the release25-maint branch.

Thanks,
Senthil
msg59519 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-01-08 02:45
The patch against the trunk could also have been applied to 2.5. There
is NO change in the trunk and 2.4-maint patch. 
Thanks,
Senthil
History
Date User Action Args
2022-04-11 14:56:29adminsetgithub: 46039
2008-01-08 02:45:02orsenthilsetmessages: + msg59519
2008-01-08 02:42:31orsenthilsetfiles: + issue6898-r25-maint.patch
messages: + msg59518
2008-01-05 01:23:01gvanrossumsetstatus: open -> closed
keywords: + patch
messages: + msg59275
2008-01-04 04:17:57orsenthilsetfiles: + issue1698-withtests.patch
messages: + msg59213
2008-01-02 23:14:19gvanrossumsetassignee: gvanrossum
resolution: accepted
messages: + msg59109
nosy: + gvanrossum
2007-12-26 17:29:46orsenthilsetmessages: + msg59002
2007-12-26 17:24:55orsenthilsetfiles: + urlparse_issue1698.patch
messages: + msg59001
2007-12-26 16:45:00ocroquettesetmessages: + msg58998
2007-12-26 16:40:19ocroquettesetmessages: + msg58997
2007-12-26 16:38:23ocroquettesetmessages: + msg58996
2007-12-26 14:43:33orsenthilsetnosy: + orsenthil
messages: + msg58995
2007-12-25 14:22:11ocroquettecreate