Author malin
Recipients ezio.melotti, malin, vstinner, xiang.zhang
Date 2017-04-06.03:42:16
Message-id <1491450137.17.0.519356226961.issue30003@psf.upfronthosting.co.za>
In-reply-to
Content
hz is a Simplified Chinese codec, available in Python since around 2004.

However, the hz encoder has a serious bug: it does not escape ~ (RFC 1843 requires a literal ~ to be written as ~~).
>>> 'hi~'.encode('hz')
b'hi~'    # the correct output should be b'hi~~'

As a result, the encoder's output cannot be decoded again, so a round trip fails:
>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete multibyte
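
For reference, here is a minimal sketch of the escaping rule the encoder is missing. The helper name hz_escape_ascii is hypothetical (it is not part of Python or of any patch), and it assumes ASCII-only input; real GB2312 text would also need the ~{ and ~} shift sequences.

```python
def hz_escape_ascii(text: str) -> bytes:
    """Hypothetical helper: HZ-encode ASCII-only text by doubling each '~'."""
    # Assumption: input is pure ASCII, so no ~{ ... ~} shift sequences are needed.
    return text.encode('ascii').replace(b'~', b'~~')


encoded = hz_escape_ascii('hi~')
print(encoded)                # the escaped bytes
print(encoded.decode('hz'))   # decoded with the stdlib hz decoder
```

With the tilde doubled, the existing hz decoder accepts the bytes and returns the original string, which is the round trip the encoder should provide.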

In all these years, no one has reported this bug, so I think it is pretty safe to remove the hz codec.

FYI:
The HZ codec is a 7-bit wrapper for GB2312 that was formerly in common use in email and USENET postings. It was designed in 1989 by Fung Fung Lee and codified in 1995 as RFC 1843.

It was popular on USENET, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.

https://en.wikipedia.org/wiki/HZ_(character_encoding)
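
A small sketch of what "7-bit wrapper for GB2312" means in practice, using only the stdlib codecs. The sample string is my own choice; the check assumes the encoder emits the GB2312 bytes with the high bit cleared after a ~{ shift, as RFC 1843 describes.

```python
text = '中文'  # sample GB2312 text, chosen only for illustration

hz_bytes = text.encode('hz')
gb_bytes = text.encode('gb2312')

# Every byte of the HZ form fits in 7 bits.
is_seven_bit = all(b < 0x80 for b in hz_bytes)

# The payload is the GB2312 byte sequence with the high bit cleared.
stripped = bytes(b & 0x7F for b in gb_bytes)

print(hz_bytes)
print(is_seven_bit, stripped in hz_bytes)
print(hz_bytes.decode('hz') == text)
```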

Do other languages have an hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]

[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html
History
Date User Action Args
2017-04-06 03:42:17 malin set recipients: + malin, vstinner, ezio.melotti, xiang.zhang
2017-04-06 03:42:17 malin set messageid: <1491450137.17.0.519356226961.issue30003@psf.upfronthosting.co.za>
2017-04-06 03:42:17 malin link issue30003 messages
2017-04-06 03:42:16 malin create