
classification
Title: Encode to EBCDIC doesn't take into account conversion table irregularities
Type: behavior Stage:
Components: Unicode Versions: Python 3.6
process
Status: open Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Vladimir Filippov, ezio.melotti, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-06-07 11:41 by Vladimir Filippov, last changed 2022-04-11 14:58 by admin.

Messages (3)
msg295329 - Author: Vladimir Filippov (Vladimir Filippov) Date: 2017-06-07 11:41
These 4 symbols are encoded incorrectly to EBCDIC (codec cp500): "![]|". The correct conversion table for these symbols is described in https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_Conversion_Table_Irregularities.html

This code:
--------------------
# 'text' is used instead of 'ascii' to avoid shadowing the built-in ascii()
text = '![]|'
print("ASCII:  " + bytes(text, 'ascii').hex())
res = text.encode('cp500')
print("EBCDIC: " + res.hex())
--------------------
on Python 3.6.1 produces this output:
--------------------
ASCII:  215b5d7c
EBCDIC: 4f4a5abb
--------------------

Expected encoding (from IBM's table):
! - 5A
[ - AD
] - BD
| - 4F

Workaround: apply this translation table to the bytes after encoding:
bytes.maketrans(b'\x4F\x4A\x5A\xBB', b'\x5A\xAD\xBD\x4F')
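
A minimal sketch of how the workaround table could be applied with bytes.translate(). The helper name encode_with_workaround is hypothetical and not part of the original report. Note that the table is not a permutation: 0xAD and 0xBD already stand for other characters in cp500, so the remapped bytes cannot be decoded back unambiguously.

```python
# Translation table from the workaround above: it remaps the four cp500
# byte values for '!', '[', ']' and '|' to the values listed in IBM's table.
CP500_TO_IBM = bytes.maketrans(b'\x4F\x4A\x5A\xBB', b'\x5A\xAD\xBD\x4F')


def encode_with_workaround(text):
    """Encode with cp500, then remap the four irregular bytes.

    Hypothetical helper for illustration only.
    """
    return text.encode('cp500').translate(CP500_TO_IBM)


print("EBCDIC: " + encode_with_workaround('![]|').hex())
```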
msg295336 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-06-07 12:22
The cp500 codec in Python is generated from the table ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT .

There are several EBCDIC code pages. EBCDIC-compatible encodings supported in Python are: cp037, cp273, cp424, cp500, cp875, cp1026 and cp1140. Three of them, cp037, cp424 and cp1140, encode '!' to b'\x5A' and '|' to b'\x4F'.
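
The differences between these code pages can be checked directly. The following is a rough sketch that encodes each of the four characters with every codec listed above; the function name encode_table is an assumption made for illustration, and a character that a codec cannot encode is reported as None rather than raising.

```python
# EBCDIC-compatible codecs named in the message above.
EBCDIC_CODECS = ["cp037", "cp273", "cp424", "cp500", "cp875", "cp1026", "cp1140"]


def encode_table(chars="![]|", codecs=EBCDIC_CODECS):
    """Return {codec: {char: hex byte or None}} (hypothetical helper)."""
    table = {}
    for name in codecs:
        row = {}
        for ch in chars:
            try:
                row[ch] = ch.encode(name).hex().upper()
            except UnicodeEncodeError:
                # The character has no mapping in this code page.
                row[ch] = None
        table[name] = row
    return table


for name, row in encode_table().items():
    print(name, row)
```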
msg295352 - Author: Vladimir Filippov (Vladimir Filippov) Date: 2017-06-07 15:45
According to ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT the symbols [ and ] have different codes in cp037 (not 0xAD and 0xBD):
0xBA	0x005B	#LEFT SQUARE BRACKET
0xBB	0x005D	#RIGHT SQUARE BRACKET

It looks like ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT was created based on https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.adref.doc/topics/r_deeadvrf_ASCII_to_EBCDIC.html
But this note was ignored: "This translation is not bidirectional. Some EBCDIC characters cannot be translated to ASCII and some conversion irregularities exist in the table. For more information, see Conversion table irregularities." Additionally, this line from CP500.TXT:
0xBB	0x007C	#VERTICAL LINE
has no source in IBM's table.

Example from z/OS mainframe:
-------------------
bash-4.3$ iconv -f 819 -t 1047 -T ascii.txt > ebcdic.txt
bash-4.3$ ls -T *.txt
t ISO8859-1   T=on  ascii.txt
t IBM-1047    T=on  ebcdic.txt
bash-4.3$ cat ascii.txt
![]|bash-4.3$ od -h ascii.txt
0000000000    21  5B  5D  7C
0000000004
bash-4.3$ cat ebcdic.txt
![]|bash-4.3$ od -h ebcdic.txt
0000000000    5A  AD  BD  4F
0000000004
-------------------
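
As a rough cross-check, the bytes produced on z/OS for IBM-1047 can be decoded with the codecs Python ships. This sketch assumes only the four byte values shown in the od output above; the helper name decode_zos_bytes is hypothetical.

```python
# Bytes copied from the "od -h ebcdic.txt" output above (IBM-1047 on z/OS).
ZOS_BYTES = bytes.fromhex('5AADBD4F')


def decode_zos_bytes(codec):
    """Decode the z/OS sample with the given Python codec (illustrative helper)."""
    return ZOS_BYTES.decode(codec)


for codec in ('cp037', 'cp500'):
    # ascii() keeps non-ASCII results printable on any terminal.
    print(codec, ascii(decode_zos_bytes(codec)))
```

With cp037 the first and last bytes decode to '!' and '|', but the two middle bytes do not decode to the square brackets, which is consistent with the CP037.TXT lines quoted above.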
History
Date                 User               Action  Args
2022-04-11 14:58:47  admin              set     github: 74771
2017-06-07 15:45:17  Vladimir Filippov  set     status: pending -> open; messages: + msg295352
2017-06-07 12:31:40  serhiy.storchaka   set     status: open -> pending; resolution: not a bug
2017-06-07 12:22:00  serhiy.storchaka   set     nosy: + serhiy.storchaka; messages: + msg295336
2017-06-07 11:41:40  Vladimir Filippov  create