classification
Title: Support for z/OS and EBCDIC.
Type: enhancement Stage:
Components: Build, Distutils, Extension Modules, Interpreter Core, Library (Lib), Unicode Versions: Python 2.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: JYMEN, gvanrossum, lealanko, loewis
Priority: normal Keywords:

Created on 2007-10-18 17:14 by lealanko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                  Uploaded                     Description
python-20071018-zos.patch  lealanko, 2007-10-18 17:14
Messages (12)
msg56532 - Author: Lauri Alanko (lealanko) Date: 2007-10-18 17:14
The attached patch, based on Jean-Yves Mengant's work, is against svn
head, and adds support for z/OS in particular, and non-ASCII platforms
in general. Further details are in a separate mail to python-dev, which
I will send shortly.
msg56535 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-18 17:41
How important is z/OS?  I'm very skeptical of the viability of any OS
that uses an encoding that is not a superset of ASCII.
msg56548 - Author: Lauri Alanko (lealanko) Date: 2007-10-19 07:10
The character set of EBCDIC is a superset of the character set of
ASCII. In fact CP1047, the variant used on z/OS, has the same
character set as Latin-1. Only the encoding is completely
different.
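
To illustrate the distinction between character set and encoding, here
is a minimal Python 3 sketch. It uses the standard `cp037` codec as a
stand-in for CP1047 (an assumption of this example, since it does not
rely on a `cp1047` codec being available); the variable names are
illustrative only.

```python
# Stand-in EBCDIC code page; CP1047 itself is not assumed to be installed.
CODEC = "cp037"

# Character set: decode every possible byte and collect the characters.
latin1_chars = {chr(i) for i in range(256)}
ebcdic_chars = set(bytes(range(256)).decode(CODEC))
same_repertoire = ebcdic_chars == latin1_chars

# Encoding: the same character is stored as a different byte value.
a_ascii = "A".encode("ascii")   # byte 0x41
a_ebcdic = "A".encode(CODEC)    # byte 0xC1

print("same character set:", same_repertoire)
print("'A' in ASCII:", a_ascii.hex(), "| 'A' in EBCDIC:", a_ebcdic.hex())
```

The repertoire check compares sets of characters, while the last two
lines compare the bytes used to store one of them.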

As a non-ASCII platform, z/OS is certainly challenging for people
used to modern conventions, and that is exactly why a familiar
and easy-to-use tool like Python is so valuable there. As for
viability, there are some obvious difficulties with Python's
handling of source encodings, but as long as you restrict
yourself to the ASCII _character set_ in your source code, the
vast majority of things seem to work fine with my patch.

There are more details in my mail to python-dev, which doesn't
seem to have appeared yet. I'm not a subscriber, so it's probably
pending moderation somewhere. (I hope "The list address accepts
e-mail from non-members" is still correct information.)
msg56549 - Author: Lauri Alanko (lealanko) Date: 2007-10-19 07:12
How do you measure importance? Z/OS is not important to many
people in the world, but to those to whom it is important, it is
_very_ important, in a very tangible way. It was certainly
important enough for someone to port Python to it. :)
msg56553 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-19 14:02
> How do you measure importance? Z/OS is not important to many
> people in the world, but to those to whom it is important, it is
> _very_ important, in a very tangible way. It was certainly
> important enough for someone to port Python to it. :)

But is it important enough to cause a lot of work for the maintainers
of Python, not just once (reviewing your mega-patch) but also in the
future (making sure that the Z/OS support doesn't break)? We have
accepted mega-patches for minority OS'es in the past, and our
experience has unfortunately been that the contributors of such
patches inevitably lose interest and the Python core developers are
stuck with maintaining the patch -- or ripping it out, which is just
as much work but at least promises that there will be no more work
related to this issue in the future.

I strongly recommend an alternative: the Z/OS community should
maintain the patch set themselves. That way the burden of keeping it
working falls on those who benefit. It also makes it possible to decide
not to upgrade to a newer version of Python because there aren't
enough benefits. This is done for example by Nokia for its port to
S60.

> The character set of EBCDIC is a superset of the character set of
> ASCII. In fact CP1047, the variant used on z/OS, has the same
> character set as Latin-1. Only the encoding is completely
> different.

And there's the crux -- too much code (not just in the core but also
in the library and in 3rd party code) assumes that the ASCII
*encoding* is used in 8-bit strings. Breaking this will break tons of
stuff. Glancing at your code it seems that you haven't tried the
socket module or the higher-level internet modules to contact web
servers on the internet...
msg56577 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-19 23:51
FYI, I checked the moderation queue for python-dev and didn't find your
message.  You might want to resend.
msg56647 - Author: Lauri Alanko (lealanko) Date: 2007-10-22 13:32
Further comments on the port can be found at:
http://mail.python.org/pipermail/python-dev/2007-October/074991.html
msg56667 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-10-23 05:32
I'm marking the patch as rejected, but leaving it open. It seems clear
that it cannot be incorporated into Python because of the maintenance
issues (the only reasonable way to incorporate it would be if a
long-time Python contributor steps forward and offers to maintain it,
which seems unlikely).

I'm leaving it open for the moment so people can easily find it. I
encourage you to find some new home for the patch, e.g. by submitting it
to PyPI (or to some System z community page if there is one); at that
point, this issue should be closed.

If the patch is still around five years from now, and still maintained,
I might be interested in stepping forward to support it (assuming I am
still a Python contributor at this point).
msg56676 - Author: Jean-Yves MENGANT (JYMEN) Date: 2007-10-23 09:46
Let me contribute to this discussion of the z/OS port.

I originally made the Python 2.2 and 2.4 ports for the z/OS platform,
and asked the Python community to link to my pages as the support point
for z/OS at that time.

Lauri got in touch with me a couple of weeks ago, asking whether I was
planning a port of 2.5. Since I was waiting for 2.6 before starting a
new port, he went ahead and made the 2.5 port happen now.

As for how important z/OS is, let me argue the point. Even though z/OS
is a proprietary IBM OS that has been around for decades, it will be
there for a long time yet, since it occupies a very specific niche in
the OS market. And since IBM has made the migration path to Unix very
difficult in order to keep its revenue, migrating those systems to
plain vanilla Unixes is a nightmare. As a result, every big US or
European company today has a z/OS system somewhere.
Also, even though z/OS is proprietary and EBCDIC-based, it has a
reasonable POSIX.5-compliant subsystem and a decent C/C++ compiler,
which makes porting Python not too complex.

From a scripting standpoint, three scripting languages are available
on the platform today: REXX (Mike Cowlishaw's scripting language), Perl,
and Python.

So keeping an up-to-date version of Python on this platform makes
sense, and also increases usage of the Python language.

I am still happy to continue supporting the z/OS port, and I perfectly
understand that fully integrating the z/OS idiosyncrasies into the
Python main branch creates maintainability problems. But some of the
changes included in Lauri's patch are not z/OS-specific and simply
increase the portability of the Python core to EBCDIC platforms (z/OS
and OS/400).

So my opinion is that the problem can be split into two parts:

1 General improvement patches to the Python core, which can be
incorporated into the main branch and which may not be too complicated
to maintain there.

2 z/OS idiosyncrasies (mainly making the autoconf/automake and build
scripts compliant with z/OS); this can be handled separately by z/OS
Python specialists who have access to a z/OS mainframe for testing.

I am happy to keep making part 2 available on the z/OS Python port
pages, with the help of other contributors like Lauri, and to give them
credit on the port page. So I propose to integrate Lauri's patch into
the current 2.5.1 and to provide a modified, z/OS-compliant source
tarball containing the modified autoconf/automake and dynamic loading
code.

Finally, I should emphasize two complementary arguments:

- The z/OS port has been used in industrial products (including at the
company I work for today) and helps promote the Python language on
important non-Unix platforms, showing the extreme portability of the
language.
- Even the IBM labs in Boulder (Colorado) got in touch with me in order
to integrate the port into one of their projects.
msg56683 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-10-23 17:14
Jean-Yves, please understand that no amount of discussion can likely
change Guido's or my view on this patch. We both fully understand the
relevance of OS/390, and *still* reject it, for the reasons discussed.

Besides, integration into 2.5.1 is not possible, as it would violate our
maintenance policy of not integrating new features into bug fix (2.x.y)
releases. Integrating it into 2.6 might be technically possible, but
could be a waste of time since 2.x will shortly (i.e. within a few
years) reach the end of its life. I doubt that the patch as it stands
will work correctly on 3.x (as *that* stands).

As you seem to be proposing that supporting EBCDIC will be "easy", just
try to port the patch to 3.x to see how this assumption is wrong. In
Python 3.x, Python source code *cannot* be interpreted as EBCDIC,
without an encoding declaration, since the language specification says
that the source code is UTF-8; there is no room for platform-specific
derivations from that default. Also consider Guido's discussion of the
networking code; unless you can report that httplib and ftplib work
correctly, I doubt that the port is really complete.
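
As a rough illustration of why EBCDIC source cannot simply be read
under a UTF-8 default, the following Python 3 sketch encodes a one-line
program with `cp037` (a stand-in for the z/OS code page, which is an
assumption of this example) and then tries to decode the bytes as
UTF-8.

```python
# A trivial program, as text.
source_text = "print('hello')\n"

# The bytes a native EBCDIC editor would store (cp037 is a stand-in).
ebcdic_bytes = source_text.encode("cp037")

# Attempt to read those bytes under the UTF-8 default.
try:
    ebcdic_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print("first byte:", hex(ebcdic_bytes[0]))
print("valid UTF-8:", decodes_as_utf8)
```

The EBCDIC byte for `p` lies above 0x7F and is not a valid UTF-8 start
byte, so the decode fails on the first character.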

So I think the only choice is to maintain this port outside of the
Python source tree, for a few more years. If you plan to contribute it
again to the Python core some day, please keep track of all the
individual contributors, as we will then require copyright agreements
from everyone.
msg56704 - Author: Lauri Alanko (lealanko) Date: 2007-10-24 10:14
The port is certainly not yet "complete" in any sense. I have only fixed
the most obvious places where explicit conversion between ASCII/Unicode
values and platform-specific characters is required. There are a number
of remaining issues, some of which cannot be fixed without major
overhauls. The point of this first release is just to allow other
interested people to chime in, to test the patch, and to suggest what
should be done with it. The latter has certainly happened. :)

I have no great interest in whether the patch ever gets incorporated
into the main Python distribution. I do think, though, that it's a good
idea to make the relationship between characters and Unicode values more
explicit in the code in any case, and my patch shouldn't affect the
behavior on any other platforms.

Guido's comment about networking code is quite accurate, but the problem
is social, not technical: there is already networking code that assumes
that 8-bit string literals represent ASCII strings, and there is already
text-processing code that assumes that 8-bit string literals represent
"text" as found in ordinary text files on the platform. There is no
reliable way to make both kinds of code work on a platform whose native
encoding is not ASCII-compatible. In this sense, it is indeed impossible
to port Python 2.x to an EBCDIC platform "completely", so that all
existing code would continue to do "the right thing" without modifications.
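
The conflict can be sketched in Python 3 by making the two
interpretations explicit. The same literal text is turned into bytes
twice: once as a network protocol expects it, and once as a native
EBCDIC text file would store it. The `cp037` code page is a stand-in,
and the SMTP-style greeting is only an illustrative example.

```python
# One piece of literal text, as a programmer would type it.
literal = "HELO example.org\r\n"

# What networking code expects an 8-bit literal to contain.
wire_bytes = literal.encode("ascii")

# What text-processing code expects on an EBCDIC platform (stand-in codec).
native_bytes = literal.encode("cp037")

# A peer that assumes ASCII cannot make sense of the native bytes.
try:
    native_bytes.decode("ascii")
    peer_understands = True
except UnicodeDecodeError:
    peer_understands = False

print("wire:  ", wire_bytes.hex())
print("native:", native_bytes.hex())
print("peer can read native bytes as ASCII:", peer_understands)
```

A single 8-bit string literal can hold only one of these two byte
sequences, which is why both kinds of code cannot be satisfied at once.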

However, Py3k presents a fresh start, and one where this particular
problem is gone, since string literals are no longer associated with a
particular encoding, and bytes literals explicitly represent the ASCII
values of the characters in the literal expression. Then text-processing
code will likely use string literals, and it is easy to make the default
encoding platform-specific when transferring data between local text
files and string objects. As far as I can see, EBCDIC shouldn't pose any
special problems then.
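
A minimal Python 3 sketch of this separation follows. The bytes literal
carries ASCII values regardless of platform, while the text file is
written and read with an explicitly chosen encoding. Here `cp037`
stands in for a platform default, and the file name is hypothetical.

```python
import os
import tempfile

# Bytes literal: the values are the ASCII codes of the characters.
request = b"GET / HTTP/1.0\r\n"

# Text: no encoding is attached until it crosses an I/O boundary.
text = "hello, world\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "native.txt")  # hypothetical file name
    # newline="" disables newline translation so the bytes are predictable.
    with open(path, "w", encoding="cp037", newline="") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "r", encoding="cp037", newline="") as f:
        back = f.read()

print("request starts with:", list(request[:3]))
print("stored bytes differ from ASCII:", raw != text.encode("ascii"))
print("text round-trips:", back == text)
```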

From what I read in PEP 3120 and the Py3k docs, there seems to be some
confusion regarding source encoding issues.

Firstly, Python source code is fundamentally _text_. For instance, a
string literal is delimited by single quote or double quote characters.
Characters themselves are abstract entities that have no inherent
numeric values, although we can name them with e.g. Unicode code points,
so we can say that the string delimiters are characters represented by
the code points U+0022 and U+0027.

What PEP 3120 specifies is a mechanism for mapping octet sequences into
these abstract characters. If this is made part of the language
specification, it presumably means that a conformant Py3k source file
must start as UTF-8 at least until an encoding declaration is
encountered. Further, a conformant Py3k implementation must accept such
UTF-8 source files and decode them as specified in the PEP.

So far so good. However, there is nothing to prevent an implementation
from providing (as an extension) a facility to allow _other_ kinds of
source as well. "There is no room for platform-specific derivations" is
an arbitrary restriction: there are certainly quite a number of ways to
support both UTF-8 and CP1047 source on z/OS: for instance, the
filesystem allows storing the encoding of a text file as metadata.

Moreover, there is a semantics-preserving mapping from UTF-8 source
files to CP1047 source files: since non-ASCII characters can only appear
in comments and string literals, and comments have no semantics, it
suffices to \u-escape the exotic characters in string literals. Hence
all Python source can be represented as native text on an EBCDIC
platform.
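
The mapping can be sketched in Python 3 as follows. This is a naive
illustration under stated assumptions: it escapes every non-ASCII
character in the file (including those in comments, where the escape is
harmless), it assumes no raw string literals or non-ASCII identifiers,
and it uses `cp037` as a stand-in for CP1047. The function name
`to_ebcdic_source` is hypothetical.

```python
def to_ebcdic_source(source: str, codec: str = "cp037") -> bytes:
    """Escape non-ASCII characters, then store the result as EBCDIC bytes."""
    # backslashreplace turns e.g. U+00E9 into \xe9 and U+20AC into \u20ac.
    ascii_only = source.encode("ascii", "backslashreplace").decode("ascii")
    return ascii_only.encode(codec)


# The outer escapes are resolved here, so `original` holds real
# non-ASCII characters inside its string literal.
original = 's = "caf\u00e9 \u20ac"\nresult = (len(s), s)\n'

native = to_ebcdic_source(original)
recovered = native.decode("cp037")

# Run both versions and compare what they compute.
ns_original, ns_recovered = {}, {}
exec(original, ns_original)
exec(recovered, ns_recovered)
same_result = ns_original["result"] == ns_recovered["result"]

print("recovered source is pure ASCII:", all(ord(c) < 128 for c in recovered))
print("same result:", same_result)
```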

Of course you can declare that support for such extensions would be
heretical and no EBCDIC source file would be True Python Source and no
EBCDIC implementation would be a True Python Implementation, but I don't
really care. Python 3000 _can_ be ported to z/OS much better than 2.x,
and it probably will, even if you don't like it. Oh the wonders of open
source. :)
msg56708 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-24 14:32
I have no desire or time to continue this discussion.  The ASCII
assumption will be ingrained as deeply or deeper in 3.0 than in 2.x,
just like 8-bit bytes and 2's complement.  The computer industry has
chosen, and there just isn't any incentive to invent abstractions for
properties that are constant in 99.999999% of all practical situations.
History
Date                 User        Action  Args
2022-04-11 14:56:27  admin       set     github: 45639
2007-10-24 14:32:49  gvanrossum  set     status: open -> closed; messages: + msg56708
2007-10-24 10:14:51  lealanko    set     messages: + msg56704
2007-10-23 17:14:06  loewis      set     messages: + msg56683
2007-10-23 09:46:55  JYMEN       set     nosy: + JYMEN; messages: + msg56676
2007-10-23 05:32:10  loewis      set     nosy: + loewis; resolution: rejected; messages: + msg56667
2007-10-22 13:32:24  lealanko    set     messages: + msg56647
2007-10-19 23:51:22  gvanrossum  set     messages: + msg56577
2007-10-19 14:02:25  gvanrossum  set     messages: + msg56553
2007-10-19 07:12:24  lealanko    set     messages: + msg56549
2007-10-19 07:10:51  lealanko    set     messages: + msg56548
2007-10-18 17:41:54  gvanrossum  set     nosy: + gvanrossum; messages: + msg56535
2007-10-18 17:14:11  lealanko    create