http://bugs.python.org/review/14874/diff/4980/Objects/unicodeobject.c
File Objects/unicodeobject.c (right):
http://bugs.python.org/review/14874/diff/4980/Objects/unicodeobject.c#newcode...
Objects/unicodeobject.c:7722: if (!PyUnicode_Check(string) || !PyUnicode_GET_LENGTH(string)) {
On 2012/06/16 18:44:00, AntoinePitrou wrote:
> Perhaps this is relaxing the length requirement a bit too much?
The main reason for this relaxation is 257-character strings: a 257th character,
U+FFFE, is appended to widen the string to UCS2. Further down in the code, the
hard-coded 256-character limit is replaced by a variable length. With these
changes, the code works for strings of any length. Unmapped characters are simply
ignored (lines 7801-7803). A string shorter than 256 characters therefore defines
the same encoding as that string padded to length 256 with U+FFFE characters.
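To make the equivalence concrete, here is a pure-Python model of building a reverse (character to byte) map from a charmap decoding table. It is a sketch under stated assumptions, not the C code in Objects/unicodeobject.c: the function name build_encoding_map is hypothetical, it returns a plain dict, and it treats U+FFFE as the only "unmapped" marker, skipping such entries as described above.

```python
# Hypothetical pure-Python model (NOT CPython's C implementation) of turning a
# charmap decoding table (index = byte value, char = decoded character) into
# a {character: byte value} encoding map.
UNDEFINED = "\ufffe"  # assumption: U+FFFE marks a byte value with no mapping


def build_encoding_map(decoding_table: str) -> dict:
    """Map each defined character to its byte value; U+FFFE entries are skipped."""
    if not decoding_table:
        # Mirrors the relaxed check: only an empty table is rejected.
        raise ValueError("decoding table must be a non-empty string")
    encoding_map = {}
    for byte_value, char in enumerate(decoding_table):
        if char == UNDEFINED:
            continue  # unmapped entry: simply ignored
        encoding_map[char] = byte_value
    return encoding_map


short_table = "ABC"
# The same table padded to 256 entries with U+FFFE.
padded_table = short_table + UNDEFINED * (256 - len(short_table))
# A 257-character table whose extra U+FFFE only widens the string to UCS2.
widened_table = padded_table + UNDEFINED

print(build_encoding_map(short_table) == build_encoding_map(padded_table))   # True
print(build_encoding_map(short_table) == build_encoding_map(widened_table))  # True
```

Because undefined entries contribute nothing, the table's length only matters up to its last defined character, which is why a fixed 256-character requirement is not needed in this model.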
http://bugs.python.org/review/14874/diff/4980/Tools/unicode/gencodec.py
File Tools/unicode/gencodec.py (right):
http://bugs.python.org/review/14874/diff/4980/Tools/unicode/gencodec.py#newco...
Tools/unicode/gencodec.py:105: if not isinstance(enc, tuple) and enc < 256:
On 2012/06/16 18:44:00, AntoinePitrou wrote:
> Why this change?
gencodec.py has not been used or updated for a long time, and it now simply fails
on many character mappings under Python 3. In Python 3, tuples and integers
cannot be compared, and parsecodes() can return either a number or a tuple (for
multibyte encodings). Without this change, the script crashes on, for example,
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/GURMUKHI.TXT.
The script contains a number of other bugs that prevent it from handling all
character mappings; I've corrected only the most essential ones. Fixing
gencodec.py fully is a separate issue.
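The Python 3 failure can be reproduced in isolation. A minimal sketch, assuming (as described above) that each parsed value is either an int or a tuple of ints for a multibyte mapping; the helper name is_single_byte is hypothetical and only mirrors the guard on line 105.

```python
def is_single_byte(enc):
    """Return True for an integer code below 256; tuples (multibyte) never qualify."""
    # isinstance() is tested first, so `and` short-circuits before any
    # tuple-versus-int ordering comparison can be attempted.
    return not isinstance(enc, tuple) and enc < 256


# Without the guard, Python 3 refuses to order a tuple against an int.
try:
    unguarded = (0x0A, 0x0B) < 256
except TypeError as exc:
    print("TypeError:", exc)

print(is_single_byte(0x41))          # True
print(is_single_byte(0x1234))        # False
print(is_single_byte((0x0A, 0x0B)))  # False
```

In Python 2 the unguarded comparison silently returned a result based on type ordering, which is why the original condition went unnoticed until the script was run under Python 3.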
Issue 14874: Faster charmap decoding (Closed)
Created 1 year ago by storchaka
Modified 11 months, 1 week ago
Reviewers: AntoinePitrou