Message 142069 - Python tracker


Author	pitrou
Recipients	Arfrever, ezio.melotti, jkloth, mrabarnett, pitrou, r.david.murray, tchrist, terry.reedy
Date	2011-08-14.17:36:41
Message-id	<1313343278.3574.15.camel@localhost.localdomain>
In-reply-to	<25916.1313340867@chthon>

Content

> > The UTF-8 codec described by RFC 2279 didn't say so, so, since our
> > codec was following RFC 2279, it was producing valid UTF-8.  With RFC
> > 3629 a number of things changed in a non-backward compatible way.
> > Therefore we couldn't just change the behavior of the UTF-8 codec nor
> > rename it to something else in Python 2.  We had to wait till Python 3
> > in order to fix it.
> 
> I'm a bit confused on this.  You no longer fix bugs in Python 2?

In general, we try not to introduce changes that have a high probability
of breaking existing code, especially when what is being "fixed" is a
minor issue which almost nobody complains about.

This is even truer for stable branches, and Python 2 is very much a
stable branch now (no more feature releases after 2.7).

> That's why I say that you are out of conformance by having encoders and decoders of UTF
> streams tolerate noncharacters.  You are not allowed to call something a UTF and do
> non-UTF things with it, because this is in violation of conformance requirement C2.

Perhaps, but it is not Python's fault if the IETF and the Unicode
consortium have disagreed on what UTF-8 should be. I'm not sure what
people called "UTF-8" when support for it was first introduced in
Python, but you can't blame us for maintaining a consistent behaviour
across releases.
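
A minimal sketch, assuming a current Python 3 interpreter, of the codec
behaviour under discussion: the strict UTF-8 codec accepts a noncharacter
such as U+FFFE but rejects a lone surrogate such as U+D800 unless the
"surrogatepass" error handler is requested. The helper name
utf8_roundtrip is hypothetical and used only for illustration.

```python
def utf8_roundtrip(text, errors="strict"):
    """Encode text to UTF-8 and decode it back.

    Returns the encoded bytes on success, or the name of the
    UnicodeError subclass raised by the codec on failure.
    (Hypothetical helper, for illustration only.)
    """
    try:
        data = text.encode("utf-8", errors)
        data.decode("utf-8", errors)
        return data
    except UnicodeError as exc:
        return type(exc).__name__


noncharacter = "\ufffe"    # a Unicode noncharacter
lone_surrogate = "\ud800"  # a lone (unpaired) surrogate code point

print(utf8_roundtrip(noncharacter))                     # accepted by the strict codec
print(utf8_roundtrip(lone_surrogate))                   # rejected by the strict codec
print(utf8_roundtrip(lone_surrogate, "surrogatepass"))  # accepted only on request
```

Returning the exception name instead of re-raising keeps the three cases
easy to compare side by side.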

History
Date	User	Action	Args
2011-08-14 17:36:42	pitrou	set	recipients: + pitrou, terry.reedy, jkloth, ezio.melotti, mrabarnett, Arfrever, r.david.murray, tchrist
2011-08-14 17:36:42	pitrou	link	issue12729 messages
2011-08-14 17:36:41	pitrou	create