Issue 1298: Support for z/OS and EBCDIC.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45639

classification

Title:	Support for z/OS and EBCDIC.
Type:	enhancement	Stage:
Components:	Build, Distutils, Extension Modules, Interpreter Core, Library (Lib), Unicode	Versions:	Python 2.6

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	JYMEN, gvanrossum, lealanko, loewis
Priority:	normal	Keywords:

Created on 2007-10-18 17:14 by lealanko, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
python-20071018-zos.patch	lealanko, 2007-10-18 17:14

Messages (12)
msg56532 - (view)	Author: Lauri Alanko (lealanko)	Date: 2007-10-18 17:14
The attached patch, based on Jean-Yves Mengant's work, is against svn head, and adds support for z/OS in particular, and non-ASCII platforms in general. Further details are in a separate mail to python-dev, which I will send shortly.
msg56535 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-18 17:41
How important is z/OS? I'm very skeptical of the viability of any OS that uses an encoding that is not a superset of ASCII.
msg56548 - (view)	Author: Lauri Alanko (lealanko)	Date: 2007-10-19 07:10
The character set of EBCDIC is a superset of the character set of ASCII. In fact CP1047, the variant used on z/OS, has the same character set as Latin-1. Only the encoding is completely different. As a non-ASCII platform, z/OS is certainly challenging for people used to modern conventions, and that is exactly why a familiar and easy-to-use tool like Python is so valuable there. As for viability, there are some obvious difficulties with Python's handling of source encodings, but as long as you restrict yourself to the ASCII _character set_ in your source code, the vast majority of things seem to work fine with my patch. There are more details in my mail to python-dev, which doesn't seem to have appeared yet. I'm not a subscriber, so it's probably pending moderation somewhere. (I hope "The list address accepts e-mail from non-members" is still correct information.)
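The distinction drawn above between a character set and its encoding can be illustrated with a short Python 3 sketch. It is only an approximation of the z/OS situation: the standard library does not ship a CP1047 codec as far as I know, so the related EBCDIC code page `cp037` stands in for it here, on the assumption that it behaves the same way for this purpose.

```python
# Sketch only: cp037 is a stand-in for CP1047, which z/OS actually uses.

# Decode every possible byte value under both encodings and compare the
# resulting sets of characters (the "character set" in Lauri's sense).
latin1_chars = set(bytes(range(256)).decode("latin-1"))
ebcdic_chars = set(bytes(range(256)).decode("cp037"))
same_repertoire = latin1_chars == ebcdic_chars

# The same text gets different byte values under the two encodings,
# even though every character in it is plain ASCII.
text = "Hello, z/OS"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")
round_trip = ebcdic_bytes.decode("cp037")

print("same repertoire:", same_repertoire)
print("ASCII bytes :", ascii_bytes.hex())
print("EBCDIC bytes:", ebcdic_bytes.hex())
print("round trip ok:", round_trip == text)
```

The repertoire comparison is the "superset" claim; the differing byte strings are the "only the encoding is different" claim.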
msg56549 - (view)	Author: Lauri Alanko (lealanko)	Date: 2007-10-19 07:12
How do you measure importance? Z/OS is not important to many people in the world, but to those to whom it is important, it is _very_ important, in a very tangible way. It was certainly important enough for someone to port Python to it. :)
msg56553 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-19 14:02
> How do you measure importance? Z/OS is not important to many
> people in the world, but to those to whom it is important, it is
> _very_ important, in a very tangible way. It was certainly
> important enough for someone to port Python to it. :)

But is it important enough to cause a lot of work for the maintainers of Python, not just once (reviewing your mega-patch) but also in the future (making sure that the Z/OS support doesn't break)? We have accepted mega-patches for minority OS'es in the past, and our experience has unfortunately been that the contributors of such patches inevitably lose interest and the Python core developers are stuck with maintaining the patch -- or ripping it out, which is just as much work but at least promises that there will be no more work related to this issue in the future.

I strongly recommend an alternative: the Z/OS community should maintain the patch set themselves. That way the burden of keeping it working falls on those who benefit. It also makes it possible to decide not to upgrade to a newer version of Python because there aren't enough benefits. This is done for example by Nokia for its port to S60.

> The character set of EBCDIC is a superset of the character set of
> ASCII. In fact CP1047, the variant used on z/OS, has the same
> character set as Latin-1. Only the encoding is completely
> different.

And there's the crux -- too much code (not just in the core but also in the library and in 3rd party code) assumes that the ASCII encoding is used in 8-bit strings. Breaking this will break tons of stuff. Glancing at your code it seems that you haven't tried the socket module or the higher-level internet modules to contact web servers on the internet...
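The networking concern can be made concrete with a small Python 3 sketch. It is not taken from the patch: the request line, the helper `looks_like_http_get`, and the use of `cp037` as a stand-in for the z/OS code page are all assumptions made for illustration. It simulates what a peer on the network would see if protocol text were emitted in the platform's native encoding rather than in ASCII.

```python
# Hypothetical illustration; cp037 stands in for the native EBCDIC code page.
REQUEST_LINE = "GET / HTTP/1.0\r\n\r\n"

# What Internet protocols expect on the wire.
wire_bytes = REQUEST_LINE.encode("ascii")

# What the same text looks like as native 8-bit text on an EBCDIC platform.
native_bytes = REQUEST_LINE.encode("cp037")


def looks_like_http_get(data: bytes) -> bool:
    """Mimic a server that matches the ASCII bytes of the method name."""
    return data.startswith(b"GET ")


print("ASCII request recognised :", looks_like_http_get(wire_bytes))
print("EBCDIC request recognised:", looks_like_http_get(native_bytes))
```

Library code that builds protocol messages from 8-bit string literals has the same problem whenever those literals are stored in the native encoding.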
msg56577 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-19 23:51
FYI, I checked the moderation queue for python-dev and didn't find your message. You might want to resend.
msg56647 - (view)	Author: Lauri Alanko (lealanko)	Date: 2007-10-22 13:32
Further comments on the port can be found at: http://mail.python.org/pipermail/python-dev/2007-October/074991.html
msg56667 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-10-23 05:32
I'm marking the patch as rejected, but leave it open. It seems clear that it cannot be incorporated into Python because of the maintenance issues (the only reasonable way to incorporate it would be if a long-time Python contributor steps forward and offers to maintain it, which seems unlikely). I'm leaving it open for the moment so people can easily find it. I encourage you to find some new home for the patch, e.g. by submitting it to PyPI (or to some System z community page if there is one); at this point, it should be closed. If the patch is still around five years from now, and still maintained, I might be interested in stepping forward to support it (assuming I am still a Python contributor at this point).
msg56676 - (view)	Author: Jean-Yves MENGANT (JYMEN)	Date: 2007-10-23 09:46
Let me contribute to this discussion about the z/OS port. I initially made the Python 2.2 and 2.4 ports for the z/OS platform and asked the Python community to link to my pages as support for z/OS. Lauri got in touch with me a couple of weeks ago, asking whether I was planning to port 2.5. Since I was waiting for 2.6 before starting a new port, he went ahead and made the 2.5 port happen now.

On how important z/OS is: even though z/OS is a proprietary IBM OS that has been around for decades, it will be there for a long time, since it occupies a very specific niche in the OS market. And since IBM has heavily hampered the migration path to Unix in order to keep its revenue, migrating those systems to plain vanilla Unixes is a nightmare. As a result, every big US or European company today has a z/OS system somewhere.

Next, even though z/OS is proprietary and EBCDIC-based, it has a reasonable POSIX.5-compliant subsystem and a decent C/C++ compiler, which makes porting Python not too complex.

From a scripting standpoint, three scripting languages are available on it today: REXX (Mike Cowlishaw's scripting language), Perl, and Python. So keeping an up-to-date version of Python on this platform also makes sense as a way to increase usage of the Python language.

Next, I am still happy to continue supporting the z/OS port, and I perfectly understand that fully integrating the z/OS idiosyncrasies into the Python main branch creates maintainability problems. But some of the changes included in Lauri's patch are not z/OS-specific and simply increase the portability of the Python kernel to EBCDIC platforms (z/OS and OS/400).

So my opinion is that the problem can be split into two parts:

1. General improvement patches to the Python kernel, which can be incorporated into the Python kernel and which may not be too complicated to maintain on the main branch.
2. z/OS idiosyncrasies (mainly making the autoconf/automake and build scripts compliant with z/OS); this can be done by z/OS Python specialists who have access to a z/OS mainframe for testing.

I am happy to continue to make part 2 available on the z/OS Python port pages, with the help of other contributors like Lauri, and to give them credit on the z/OS port page. So I propose to integrate Lauri's patch into the current 2.5.1 and to provide a modified z/OS-compliant source tar containing the modified autoconf/automake and dynamic loading stuff.

Finally, I should emphasize two complementary arguments:

- The z/OS port has been used in industrial products (including at the company I work for today) and helps promote the Python language on important non-Unix platforms, showing the extreme portability of the language.
- Even the IBM labs in Boulder (Colorado) got in touch with me in order to integrate the port into one of their projects.
msg56683 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2007-10-23 17:14
Jean-Yves, please understand that no amount of discussion can likely change Guido's or my view on this patch. We both fully understand the relevance of OS/390, and still reject it, for the reasons discussed.

Besides, integration into 2.5.1 is not possible, as it would violate our maintenance policy of not integrating new features into bug fix (2.x.y) releases. Integrating it into 2.6 might be technically possible, but could be a waste of time since 2.x will shortly (i.e. within a few years) reach the end of its life.

I doubt that the patch as it stands will work correctly on 3.x (as that stands). As you seem to be proposing that supporting EBCDIC will be "easy", just try to port the patch to 3.x to see how this assumption is wrong. In Python 3.x, Python source code cannot be interpreted as EBCDIC, without an encoding declaration, since the language specification says that the source code is UTF-8; there is no room for platform-specific derivations from that default. Also consider Guido's discussion of the networking code; unless you can report that httplib and ftplib work correctly, I doubt that the port is really complete.

So I think the only choice is to maintain this port outside of the Python source tree, for a few more years. If you plan to contribute it again to the Python core some day, please keep track of all the individual contributors, as we will then require copyright agreements from everyone.
msg56704 - (view)	Author: Lauri Alanko (lealanko)	Date: 2007-10-24 10:14
The port is certainly not yet "complete" in any sense. I have only fixed the most obvious places where explicit conversion between ASCII/Unicode values and platform-specific characters is required. There are a number of remaining issues, some of which cannot be fixed without major overhauls. The point of this first release is just to allow other interested people to chime in, to test the patch, and to suggest what should be done with it. The latter has certainly happened. :)

I have no great interest in whether the patch ever gets incorporated into the main Python distribution. I do think, though, that it's a good idea to make the relationship between characters and Unicode values more explicit in the code in any case, and my patch shouldn't affect the behavior on any other platforms.

Guido's comment about networking code is quite accurate, but the problem is social, not technical: there is already networking code that assumes that 8-bit string literals represent ASCII strings, and there is already text-processing code that assumes that 8-bit string literals represent "text" as found in ordinary text files on the platform. There is no reliable way to make both kinds of code work on a platform whose native encoding is not ASCII-compatible. In this sense, it is indeed impossible to port Python 2.x to an EBCDIC platform "completely", so that all existing code would continue to do "the right thing" without modifications.

However, Py3k presents a fresh start, and one where this particular problem is gone, since string literals are no longer associated with a particular encoding, and bytes literals explicitly represent the ASCII values of the characters in the literal expression. Then text-processing code will likely use string literals, and it is easy to make the default encoding platform-specific when transferring data between local text files and string objects. As far as I can see, EBCDIC shouldn't pose any special problems then.
From what I read in PEP 3120 and the Py3k docs, there seems to be some confusion regarding source encoding issues. Firstly, Python source code is fundamentally _text_. For instance, a string literal is delimited by single quote or double quote characters. Characters themselves are abstract entities that have no inherent numeric values, although we can name them with e.g. Unicode code points, so we can say that the string delimiters are characters represented by the code points U+0022 and U+0027.

What PEP 3120 specifies is a mechanism for mapping octet sequences into these abstract characters. If this is made part of the language specification, it presumably means that a conformant Py3k source file must start as UTF-8 at least until an encoding declaration is encountered. Further, a conformant Py3k implementation must accept such UTF-8 source files and decode them as specified in the PEP.

So far so good. However, there is nothing to prevent an implementation from providing (as an extension) a facility to allow _other_ kinds of source as well. "There is no room for platform-specific derivations" is an arbitrary restriction: there are certainly quite a number of ways to support both UTF-8 and CP1047 source on z/OS: for instance, the filesystem allows storing the encoding of a text file as metadata.

Moreover, there is a semantics-preserving mapping from UTF-8 source files to CP1047 source files: since non-ASCII characters can only appear in comments and string literals, and comments have no semantics, it suffices to \u-escape the exotic characters in string literals. Hence all Python source can be represented as native text on an EBCDIC platform. Of course you can declare that support for such extensions would be heretical and no EBCDIC source file would be True Python Source and no EBCDIC implementation would be a True Python Implementation, but I don't really care.
Python 3000 _can_ be ported to z/OS much better than 2.x, and it probably will, even if you don't like it. Oh the wonders of open source. :)
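The \u-escaping idea from the message above can be sketched in Python 3 roughly as follows. This is an illustration, not part of the patch: `escape_non_ascii` is a hypothetical helper, it is applied to the text of a single ordinary (non-raw, non-bytes) string literal rather than to a whole source file, and `cp037` again stands in for CP1047.

```python
import ast


def escape_non_ascii(literal_source: str) -> str:
    """Rewrite non-ASCII characters in a string literal as \\u / \\U escapes.

    Hypothetical helper: assumes an ordinary str literal (not raw, not bytes).
    """
    pieces = []
    for ch in literal_source:
        code_point = ord(ch)
        if code_point < 0x80:
            pieces.append(ch)                       # ASCII passes through
        elif code_point <= 0xFFFF:
            pieces.append("\\u%04x" % code_point)   # 4-digit escape
        else:
            pieces.append("\\U%08x" % code_point)   # 8-digit escape
    return "".join(pieces)


# Source text of a literal containing non-ASCII characters.
original_literal = "'naïve café'"
escaped_literal = escape_non_ascii(original_literal)

# The escaped form is pure ASCII, so it can be stored in an EBCDIC code page.
native_source = escaped_literal.encode("cp037")

# Both spellings evaluate to the same string value.
same_value = ast.literal_eval(original_literal) == ast.literal_eval(escaped_literal)
print(escaped_literal, same_value)
```

A real converter would need a tokenizer to find the literals and would have to handle raw strings and comments separately; the sketch covers only the core step.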
msg56708 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-24 14:32
I have no desire or time to continue this discussion. The ASCII assumption will be ingrained as deeply or deeper in 3.0 than in 2.x, just like 8-bit bytes and 2's complement. The computer industry has chosen, and there just isn't any incentive to invent abstractions for properties that are constant in 99.999999% of all practical situations.

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45639
2007-10-24 14:32:49	gvanrossum	set	status: open -> closed messages: + msg56708
2007-10-24 10:14:51	lealanko	set	messages: + msg56704
2007-10-23 17:14:06	loewis	set	messages: + msg56683
2007-10-23 09:46:55	JYMEN	set	nosy: + JYMEN messages: + msg56676
2007-10-23 05:32:10	loewis	set	nosy: + loewis resolution: rejected messages: + msg56667
2007-10-22 13:32:24	lealanko	set	messages: + msg56647
2007-10-19 23:51:22	gvanrossum	set	messages: + msg56577
2007-10-19 14:02:25	gvanrossum	set	messages: + msg56553
2007-10-19 07:12:24	lealanko	set	messages: + msg56549
2007-10-19 07:10:51	lealanko	set	messages: + msg56548
2007-10-18 17:41:54	gvanrossum	set	nosy: + gvanrossum messages: + msg56535
2007-10-18 17:14:11	lealanko	create