Author lealanko
Recipients JYMEN, gvanrossum, lealanko, loewis
Date 2007-10-24.10:14:48
SpamBayes Score 0.00065286
Marked as misclassified No
Message-id <>
The port is certainly not yet "complete" in any sense. I have only fixed
the most obvious places where explicit conversion between ASCII/Unicode
values and platform-specific characters is required. There are a number
of remaining issues, some of which cannot be fixed without major
rehauls. The point of this first release is just to allow other
interested people to chime in, to test the patch, and to suggest what
should be done with it. The latter has certainly happened. :)

I have no great interest in whether the patch ever gets incorporated
into the main Python distribution. I do think, though, that it's a good
idea to make the relationship between characters and Unicode values more
explicit in the code in any case, and my patch shouldn't affect the
behavior on any other platforms.

Guido's comment about networking code is quite accurate, but the problem
is social, not technical: there is already networking code that assumes
that 8-bit string literals represent ASCII strings, and there is already
text-processing code that assumes that 8-bit string literals represent
"text" as found in ordinary text files on the platform. There is no
reliable way to make both kinds of code work on a platform whose native
encoding is not ASCII-compatible. In this sense, it is indeed impossible
to port Python 2.x to an EBCDIC platform "completely", so that all
existing code would continue to do "the right thing" without modifications.

However, Py3k presents a fresh start, and one where this particular
problem is gone, since string literals are no longer associated with a
particular encoding, and bytes literals explicitly represent the ASCII
values of the characters in the literal expression. Then text-processing
code will likely use string literals, and it easy to make the default
encoding platform-specific when transferring data between local text
files and string objects. As far as I can see, EBCDIC shouldn't pose any
special problems then.

From what I read in PEP 3120 and the Py3k docs, there seems to be some
confusion regarding source encoding issues.

Firstly, Python source code is fundamentally _text_. For instance, a
string literal is delimited by single quote or double quote characters.
Characters themselves are abstract entities that have no inherent
numeric values, although we can name them with e.g. Unicode code points,
so we can say that the string delimiters are characters represented by
the code points U+0022 and U+0027.

What PEP 3120 specifies is a mechanism for mapping octet sequences into
these abstract characters. If this is made part of the language
specification, it presumably means that a conformant Py3k source file
must start as UTF-8 at least until an encoding declaration is
encountered. Further, a conformant Py3k implementation must accept such
UTF-8 source files and decode them as specified in the PEP.

So far so good. however, there is nothing to prevent an implementation
from providing (as an extension) a facility to allow _other_ kinds of
source as well. "There is no room for platform-specific derivations" is
an arbitrary restriction: there are certainly quite a number of ways to
support both UTF-8 and CP1047 source on z/OS: for instance, the
filesystem allows storing the encoding of a text file as metadata.

Moreover, there is a semantics-preserving mapping from UTF-8 source
files to CP1047 source files: since non-ASCII characters can only appear
in comments an string literals, and comments have no semantics, it
suffices to \u-escape the exotic characters in string literals. Hence
all Python source can be represented as native text on an EBCDIC

Of course you can declare that support for such extensions would be
heretical and no EBCDIC source file would be True Python Source and no
EBCDIC implementation would be a True Python Implementation, but I don't
really care. Python 3000 _can_ be ported to z/OS much better than 2.x,
and it probably will, even if you don't like it. Oh the wonders of open
source. :)
Date User Action Args
2007-10-24 10:14:51lealankosetspambayes_score: 0.00065286 -> 0.00065286
recipients: + lealanko, gvanrossum, loewis, JYMEN
2007-10-24 10:14:51lealankosetspambayes_score: 0.00065286 -> 0.00065286
messageid: <>
2007-10-24 10:14:51lealankolinkissue1298 messages
2007-10-24 10:14:48lealankocreate