
Author loewis
Recipients alexandre.vassalotti, dddibagh, georg.brandl, lemburg, loewis, mawbid
Date 2008-10-21.23:34:45
Message-id <48FE6713.7080100@v.loewis.de>
In-reply-to <1224629775.93.0.909801174269.issue2980@psf.upfronthosting.co.za>
Content
> I read the PEP,
> which serves as a specification of raw unicode escape (at least for the
> decoding bit) and the reference documentation.

Which PEP specifically? PEP 263 only mentions the unicode-escape
encoding in its problem statement, i.e. as a pre-existing thing.
It doesn't specify it, nor does it give a rationale for why it behaves
the way it does.

> Then I read the source
> trying to map between specified behavior in the documentation and the
> implementation in the source code. When it comes to the part which
> causes the problem with non-ASCII characters, it is difficult to follow.

What code are you looking at, and where do you find it difficult to
follow? Maybe you are also confusing the "unicode-escape" codec with
the "raw-unicode-escape" codec.

> Or in other words: what is the high level reason why the codec won't
> escape \x80 in my test program?

The raw-unicode-escape codec? It was designed to support parsing of
Python 2.0 source code, and of "raw" unicode strings (ur"") in
particular. In Python 2.0, you only needed to escape characters above
U+00FF; Latin-1 characters didn't need escaping. Python itself only
relied on the decoding direction. That the codec chooses not to escape
Latin-1 characters on encoding is an arbitrary choice (I guess); it's
still symmetric with decoding.
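
For illustration, here is a minimal sketch of the behaviour described above. It assumes a current Python 3 interpreter rather than the 2.x releases under discussion, and the sample strings are made up for the example. It compares how the two codecs treat Latin-1 characters and a character beyond U+00FF.

```python
# Sketch only: assumes Python 3; the sample strings are illustrative.
latin1_text = "caf\u00e9 \x80"   # every code point is below U+0100
beyond_latin1 = "\u20ac"         # EURO SIGN, above U+00FF

# raw-unicode-escape passes Latin-1 code points through as single bytes
# and only writes a \uXXXX escape for the code point above U+00FF.
raw_latin1 = latin1_text.encode("raw-unicode-escape")
raw_euro = beyond_latin1.encode("raw-unicode-escape")

# unicode-escape, by contrast, escapes the non-ASCII Latin-1 code points.
esc_latin1 = latin1_text.encode("unicode-escape")

print(raw_latin1)   # b'caf\xe9 \x80'
print(raw_euro)     # b'\\u20ac'
print(esc_latin1)   # b'caf\\xe9 \\x80'

# Decoding reverses the encoding for this input.
round_trip = raw_latin1.decode("raw-unicode-escape")
print(round_trip == latin1_text)   # True
```

The unescaped `\x80` byte in the first result is the effect asked about in the quoted question.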

Even though the choice was arbitrary, you shouldn't change it now,
because people may rely on how this codec works.

> What makes you think that the problem cannot be fixed without changing
> the existing pickle format 0?

Applications might rely on what was implemented rather than what was
specified. If they had implemented their own pickle readers, such
readers might break if the pickle format is changed. In principle, even
the old pickle readers of Python 2.0..2.6 might break if the format
changes in 2.7 - we would have to go back and check that they don't
break (although I do believe that they would work fine).
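
To see what an independent reader of the format would have to cope with, the sketch below dumps a one-character string with protocol 0. It assumes a current Python 3 interpreter, whose pickle module may differ in detail from the 2.x versions discussed here, and the variable names are chosen only for this example.

```python
# Sketch only: assumes Python 3's pickle module; names are illustrative.
import pickle

value = "\x80"
data = pickle.dumps(value, protocol=0)

# Protocol 0 is often described as a text format, yet the payload
# carries the Latin-1 code point as a raw byte instead of an escape.
has_non_ascii_byte = any(byte > 0x7F for byte in data)
print(data)
print(has_non_ascii_byte)

# The standard reader still round-trips the value.
restored = pickle.loads(data)
print(restored == value)
```

A reader that treats protocol 0 data as pure ASCII would trip over that byte, while one written against the current output would need re-checking if the escaping were changed.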

So I personally don't see a problem with fixing this, but it appears
MAL does (for whatever reasons - I can't quite buy the performance
argument). OTOH, I don't feel that this issue deserves enough of
my time to actually implement anything.

So contributions are welcome. If you find that the patch meets
resistance, you also need to write a PEP, and ask for BDFL
pronouncement.
History
Date User Action Args
2008-10-21 23:34:46 loewis set recipients: + loewis, lemburg, georg.brandl, alexandre.vassalotti, mawbid, dddibagh
2008-10-21 23:34:45 loewis link issue2980 messages
2008-10-21 23:34:45 loewis create