Message 114847 - Python tracker


Author	baikie
Recipients	baikie, ezio.melotti, lemburg, loewis, vstinner
Date	2010-08-24.22:59:29
Message-id	<20100824225934.GA4097@dbwatson.ukfsn.org>
In-reply-to	<4C72FF16.6040800@v.loewis.de>

Content

> > It's about environments, not applications
> 
> Still, my question remains. Is it a theoretical problem (i.e. one
> of your imagination), or a real one (i.e. one you observed in real
> life, without explicitly triggering it)? If real: what was the
> specific environment, and what was the specific host name?

Yes, I did reproduce the problem on my own system (Ubuntu 8.04).
No, it is not from a real application, nor do I know anyone with
their network configured like this (except possibly Dan "djbdns"
Bernstein: http://cr.yp.to/djbdns/idn.html ).

I reported this bug to save anyone who *is* in such an
environment from crashing applications and erroneous name
resolution.

> > That means that when a decoded hostname contains a non-ASCII
> > character which is not prohibited by IDNA/Nameprep, that string
> > will, when used in a subsequent call, not refer to the hostname
> > that was actually received, because it will be re-encoded using a
> > different codec.
> 
> Again, I fail to see the problem in this. It won't happen in
> real life. However, if you worried that this could be abused,
> I think it should decode host names as ASCII, not as UTF-8.
> Then it will be symmetric again (IIUC).

That would be an improvement.  The idea of the patches I posted
is to combine this with the existing surrogateescape mechanism,
which handles situations like this perfectly well.  I don't see
how getting a UnicodeError is better than getting a string with
some lone surrogates in it.  My understanding of PEP 383 is that
getting the lone surrogates is in fact the better outcome.
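
The three behaviours discussed above can be sketched in plain Python.
This is only an illustration using the standard codecs, not the code
from the posted patches; the hostname bytes are a hypothetical example
of what a resolver might return, and the variable names are made up
for the sketch.

```python
# Hypothetical raw hostname bytes (UTF-8 encoded, non-ASCII), standing in
# for what a resolver might hand back; not taken from the issue itself.
raw = b"b\xc3\xbccher.example"

# 1. Decode as UTF-8, then re-encode with the IDNA codec, as a later call
#    taking a str hostname would. The bytes that come out differ from the
#    bytes that went in, so the string no longer names the original host.
decoded_utf8 = raw.decode("utf-8")
reencoded = decoded_utf8.encode("idna")
asymmetric = reencoded != raw

# 2. Decode as strict ASCII: symmetric, but non-ASCII bytes raise
#    UnicodeDecodeError (a subclass of UnicodeError).
try:
    raw.decode("ascii")
    strict_error = None
except UnicodeError as exc:
    strict_error = exc

# 3. Decode as ASCII with the surrogateescape error handler (PEP 383):
#    each undecodable byte becomes a lone surrogate, and encoding with the
#    same codec and handler restores the original bytes exactly.
escaped = raw.decode("ascii", "surrogateescape")
round_trip = escaped.encode("ascii", "surrogateescape")
```

In this sketch only the third variant both avoids the exception and
gets back the bytes that were received, which is the property the
reply is arguing for.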

History
Date	User	Action	Args
2010-08-24 22:59:31	baikie	set	recipients: + baikie, lemburg, loewis, vstinner, ezio.melotti
2010-08-24 22:59:29	baikie	link	issue9377 messages
2010-08-24 22:59:29	baikie	create