Message291207
hz is a Simplified Chinese codec that has been available in Python since around 2004.
However, the hz encoder has a serious bug: it forgets to escape ~
>>> 'hi~'.encode('hz')
b'hi~' # the correct output should be b'hi~~'
As a result, we can't finish a roundtrip:
>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete multibyte
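For reference, here is a minimal sketch of the escaping the encoder should perform, assuming RFC 1843's rule that a literal ~ is written as ~~. The helper name hz_encode_escaped is hypothetical and not part of Python. It sidesteps the bug by encoding the pieces between tildes separately and joining them with ~~:

```python
def hz_encode_escaped(text):
    """Encode text as HZ, writing each literal '~' as the escape b'~~'.

    Hypothetical workaround helper, not part of the standard library.
    """
    # No piece contains '~', so the codec's missing escape is never hit.
    # Each piece also ends back in ASCII mode, where b'~~' is a valid escape.
    pieces = [piece.encode('hz') for piece in text.split('~')]
    return b'~~'.join(pieces)


encoded = hz_encode_escaped('hi~')
print(encoded)               # b'hi~~'
print(encoded.decode('hz'))  # hi~
```

With the tilde escaped this way, the existing decoder accepts the output and the roundtrip completes.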
In all these years, no one has reported this bug, so I think it would be pretty safe to remove the hz codec.
FYI:
The HZ codec is a 7-bit wrapper for GB2312 that was formerly in common use in email and USENET postings. It was designed in 1989 by Fung Fung Lee and subsequently codified in 1995 as RFC 1843.
It was popular on USENET, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.
https://en.wikipedia.org/wiki/HZ_(character_encoding)
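To illustrate what "7-bit wrapper" means, here is a rough sketch, assuming the usual HZ layout in which GB2312 byte pairs have their high bit cleared and are bracketed by ~{ and ~}. The function name hz_wrap_gb2312 is made up for this example, and it only handles text made up entirely of GB2312 double-byte characters:

```python
def hz_wrap_gb2312(text):
    """Wrap pure GB2312 text in HZ shift sequences (illustrative only).

    Hypothetical helper: it rejects ASCII input, so the '~' escaping
    question never arises here.
    """
    gb = text.encode('gb2312')
    if any(b < 0x80 for b in gb):
        raise ValueError('only GB2312 double-byte characters are supported')
    # Clear the high bit of every byte so the payload stays 7-bit clean,
    # then bracket it with the "enter GB mode" / "back to ASCII" escapes.
    return b'~{' + bytes(b & 0x7F for b in gb) + b'~}'


wrapped = hz_wrap_gb2312('中文')
print(wrapped)                          # b'~{VPND~}'
print(wrapped == '中文'.encode('hz'))    # True
```

Every byte of the result is below 0x80, which is what made the encoding usable on transports that were not 8-bit clean.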
Do other languages have an hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]
[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html

Date | User | Action | Args
2017-04-06 03:42:17 | malin | set | recipients: + malin, vstinner, ezio.melotti, xiang.zhang
2017-04-06 03:42:17 | malin | set | messageid: <1491450137.17.0.519356226961.issue30003@psf.upfronthosting.co.za>
2017-04-06 03:42:17 | malin | link | issue30003 messages
2017-04-06 03:42:16 | malin | create |