msg122549 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-27 20:29 |
$ ../../python.exe gencodec.py MAPPINGS/VENDORS/MISC/ build/
converting APL-ISO-IR-68.TXT to build/apl_iso_ir_68.py and build/apl_iso_ir_68.mapping
converting ATARIST.TXT to build/atarist.py and build/atarist.mapping
converting CP1006.TXT to build/cp1006.py and build/cp1006.mapping
converting CP424.TXT to build/cp424.py and build/cp424.mapping
Traceback (most recent call last):
  File "gencodec.py", line 421, in <module>
    convertdir(*sys.argv[1:])
  File "gencodec.py", line 391, in convertdir
    pymap(mappathname, map, dirprefix + codefile,name,comments)
  File "gencodec.py", line 355, in pymap
    code = codegen(name,map,encodingname,comments)
  File "gencodec.py", line 268, in codegen
    precisions=(4, 2))
  File "gencodec.py", line 152, in python_mapdef_code
    mappings = sorted(map.items())
TypeError: unorderable types: NoneType() < int()
It does appear to have been updated for 3.x:
$ python2.7 gencodec.py MAPPINGS/VENDORS/MISC/ build/
Traceback (most recent call last):
  File "gencodec.py", line 35, in <module>
    UNI_UNDEFINED = chr(0xFFFE)
ValueError: chr() arg not in range(256)
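A minimal sketch of why that line indicates a 3.x port: in Python 3, chr() accepts any Unicode code point, whereas Python 2's chr() is limited to range(256). Only UNI_UNDEFINED comes from the traceback; the other name is illustrative.

```python
# Same assignment as line 35 of gencodec.py in the traceback above.
# Python 3's chr() returns a one-character str for any code point
# below 0x110000, so this succeeds.
UNI_UNDEFINED = chr(0xFFFE)

# Values past the end of the Unicode range are still rejected.
try:
    chr(0x110000)
    rejected = False
except ValueError:
    rejected = True
```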
|
msg122559 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-27 21:15 |
Attached patch addresses the issue by using -1 instead of None for missing codes. Comparison of generated encoding files to those in Lib/encodings shows only whitespace changes except one which appears to be a change on the unicode.org side:
diff -b build/koi8_u.py ../../Lib/encodings/koi8_u.py
1c1
< """ Python Character Mapping Codec koi8_u generated from 'MAPPINGS/VENDORS/MISC/KOI8-U.TXT' with gencodec.py.
---
> """ Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
221c221
< '\u0491' # 0xAD -> CYRILLIC SMALL LETTER GHE WITH UPTURN
---
> '\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
237c237
< '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
---
> '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
308d307
<
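The idea behind the patch can be sketched as follows, assuming the map being sorted has keys that mix None with ints. The dictionary contents are hypothetical and not taken from gencodec.py; the name MISSING_CODE follows the suggestion made later in this thread.

```python
# Integer sentinel standing in for "no code", instead of None.
MISSING_CODE = -1

# Hypothetical miniature map whose keys mix ints with None.
encoding_map = {0x20AC: 0x80, None: 0x81, 0x201A: 0x82}

try:
    sorted(encoding_map.items())
    sort_failed = False
except TypeError:
    # Python 3 refuses to order None against an int.
    sort_failed = True

# Swapping None for the integer sentinel makes every key orderable.
patched = {
    (MISSING_CODE if key is None else key): value
    for key, value in encoding_map.items()
}
ordered = sorted(patched.items())
```

With a negative sentinel the missing entry sorts ahead of every real code point, which is one reason an out-of-range integer is a convenient placeholder.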
|
msg122565 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2010-11-27 22:09 |
Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> Attached patch addresses the issue by using -1 instead of None for missing codes. Comparison of generated encoding files to those in Lib/encodings shows only whitespace changes except one which appears to be a change on the unicode.org side:
Please use a global constant instead of the literal -1, e.g. MISSING_CODE.
Thanks.
> diff -b build/koi8_u.py ../../Lib/encodings/koi8_u.py
> 1c1
> < """ Python Character Mapping Codec koi8_u generated from 'MAPPINGS/VENDORS/MISC/KOI8-U.TXT' with gencodec.py.
> ---
>> """ Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
> 221c221
> < '\u0491' # 0xAD -> CYRILLIC SMALL LETTER GHE WITH UPTURN
> ---
>> '\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
> 237c237
> < '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
> ---
>> '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
> 308d307
> <
That's just a comment and doesn't change the semantics of the codec.
|
msg122585 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-27 23:02 |
Attached patch uses MISSING_CODE as Marc suggested. There are still errors, apparently because parsecodes() may return either an int or a tuple. I think only the mac encodings are affected, so I would like to commit the current patch before tackling this issue.
$ ../../python.exe gencodec.py MAPPINGS/VENDORS/APPLE/ build/ mac_
converting ARABIC.TXT to build/mac_arabic.py and build/mac_arabic.mapping
converting CELTIC.TXT to build/mac_celtic.py and build/mac_celtic.mapping
converting CENTEURO.TXT to build/mac_centeuro.py and build/mac_centeuro.mapping
converting CHINSIMP.TXT to build/mac_chinsimp.py and build/mac_chinsimp.mapping
Traceback (most recent call last):
  File "gencodec.py", line 424, in <module>
    convertdir(*sys.argv[1:])
  File "gencodec.py", line 394, in convertdir
    pymap(mappathname, map, dirprefix + codefile,name,comments)
  File "gencodec.py", line 358, in pymap
    code = codegen(name,map,encodingname,comments)
  File "gencodec.py", line 271, in codegen
    precisions=(4, 2))
  File "gencodec.py", line 155, in python_mapdef_code
    mappings = sorted(map.items())
TypeError: unorderable types: tuple() < int()
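One possible way around the mixed int/tuple ordering, shown only as a sketch: sort with a key function that wraps lone ints in a 1-tuple. This is not the fix applied in this issue; as_tuple and the sample mapping are hypothetical.

```python
def as_tuple(code):
    """Wrap a lone int in a 1-tuple so every key compares as a tuple."""
    return code if isinstance(code, tuple) else (code,)

# Hypothetical keys: single code points plus one multi-code-point sequence.
mapping = {0x00C4: 0x80, (0x0041, 0x030A): 0x81, 0x00C7: 0x82}

try:
    sorted(mapping.items())
    mixed_sort_failed = False
except TypeError:
    # Python 3 cannot order a tuple against an int.
    mixed_sort_failed = True

# Normalising the key only for comparison leaves the stored keys untouched.
ordered = sorted(mapping.items(), key=lambda item: as_tuple(item[0]))
```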
|
msg122586 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-27 23:03 |
Please ignore Makefile changes in the patch.
|
msg122829 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-29 16:57 |
Martin,
I believe you were the last to update the unicode database. (See r85371.) Did you use python2.x to generate it, or do you have your own private copy of these tools?
I noticed that genwincodecs.bat refers to c:\python26\python in the 2.7 branch and c:\python30\python in py3k. Could this be an indication that these tools are out of date?
What is the plan for maintaining these tools? Should fixes be done in 2.7 and the 3.x version be generated by 2to3? Or should fixes go to py3k and be backported to 2.7 when they don't add new features?
|
msg122837 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2010-11-29 18:21 |
gencodec.py is only rarely used, namely when adding new codecs based
on Unicode mapping files.
It is not run regularly on the files from ftp.unicode.org and only
updated on demand.
AFAIK, it was last used on Python2 and never on Python3, hence the
errors you find with it.
BTW: You appear to have a comma appended to the constant, that doesn't
belong there:
+# Placeholder for a missing codepoint
+MISSING_CODE = -1,
+
Perhaps that's causing the second error you are seeing.
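To illustrate the syntax point only (it makes no claim about what caused the error): a trailing comma turns the right-hand side into a one-element tuple. The names with_comma and without_comma are illustrative.

```python
# With the trailing comma the constant is the tuple (-1,), not the int -1.
MISSING_CODE = -1,
with_comma = MISSING_CODE

# Without the comma it is the intended integer sentinel.
MISSING_CODE = -1
without_comma = MISSING_CODE
```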
|
msg122842 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-29 18:36 |
On Mon, Nov 29, 2010 at 1:21 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
> BTW: You appear to have a comma appended to the constant, that doesn't
> belong there:
>
> +# Placeholder for a missing codepoint
> +MISSING_CODE = -1,
> +
>
> Perhaps that's causing the second error you are seeing.
No, that comma was a left-over from the attempt to fix the
mac_chinsimp error. The trace that I reported was generated with
MISSING_CODE = -1. I am replacing the patch.
Is it ok to commit a partial fix? It may take longer to fix the mac error.
|
msg122843 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2010-11-29 18:37 |
Alexander Belopolsky wrote:
>
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> On Mon, Nov 29, 2010 at 1:21 PM, Marc-Andre Lemburg
> <report@bugs.python.org> wrote:
> ..
>> BTW: You appear to have a comma appended to the constant, that doesn't
>> belong there:
>>
>> +# Placeholder for a missing codepoint
>> +MISSING_CODE = -1,
>> +
>>
>> Perhaps that's causing the second error you are seeing.
>
> No, that comma was a left-over from the attempt to fix the
> mac_chinsimp error. The trace that I reported was generated with
> MISSING_CODE = -1. I am replacing the patch.
>
> Is it ok to commit a partial fix? It may take longer to fix the mac error.
Sure, we won't need that script anytime soon and if we do, we
can just as well use the Python2 version.
|
msg122850 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-29 18:52 |
On Mon, Nov 29, 2010 at 1:38 PM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
> Sure, we won't need that script anytime soon and if we do, we
> can just as well use the Python2 version.
That may not be true. I compared the 2.7 and py3k versions and the latter
has some new features:
* unidata_version changed from 5.2.0 to 6.0.0
* Unihan data is read from zip file
* added processing of DerivedCoreProperties
These changes don't affect gencodec.py, but it may be inconvenient to
run makeunicodedata.py and gencodec.py using different versions of
Python.
I'll check that all non-mac encodings are correctly generated before committing.
|
msg122858 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2010-11-29 19:48 |
> These changes don't affect gencodec.py, but it may be inconvenient to
> run makeunicodedata.py and gencodec.py using different versions of
> Python.
As MAL explains: these are completely unrelated, independent tools,
and gencodec isn't run more than once per decade (or so). I only ever
run makeunicodedata, and I have been using Python 3 to run it.
The mappings are not supposed to ever change once produced. In
particular, new versions of Unicode cannot affect them, since the
existing characters all map fine to existing code points, which will
not change their meaning per Unicode stability criteria.
|
msg122916 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-30 16:57 |
Committed in revision 86891. Keeping open to address Mac issue.
|
msg202543 - (view) |
Author: A.M. Kuchling (akuchling) * |
Date: 2013-11-10 18:24 |
For the Mac issue, we could just delete the mapping files before processing them. I've attached a patch that modifies the Makefile.
|
msg233902 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-01-13 05:57 |
Here is a new version of Kuchling’s patch. I restored some mapping files which do not give any errors (including the mac_turkish codec, which is actually documented), and removed both readme files.
|
msg406955 - (view) |
Author: Irit Katriel (iritkatriel) * |
Date: 2021-11-24 19:59 |
I don't think Martin's patch has been applied. Is it needed?
|
Date | User | Action | Args |
2022-04-11 14:57:09 | admin | set | github: 54761 |
2021-11-24 21:35:33 | vstinner | set | nosy: - vstinner |
2021-11-24 19:59:51 | iritkatriel | set | nosy: + iritkatriel; messages: + msg406955 |
2015-01-13 05:57:30 | martin.panter | set | files: + 10552-remove-apple-files-v2.txt; versions: + Python 3.4; nosy: + martin.panter, vstinner; messages: + msg233902; components: + Unicode |
2014-12-31 16:22:37 | akuchling | set | nosy: - akuchling |
2014-06-29 23:08:51 | belopolsky | set | nosy: + ronaldoussoren, ned.deily, hynek |
2014-06-29 23:07:44 | belopolsky | set | assignee: belopolsky -> |
2013-11-10 18:24:50 | akuchling | set | files: + 10552-remove-apple-files.txt; nosy: + akuchling; messages: + msg202543 |
2010-12-30 22:14:16 | georg.brandl | unlink | issue7962 dependencies |
2010-11-30 16:57:48 | belopolsky | set | nosy: lemburg, loewis, belopolsky, ezio.melotti; messages: + msg122916; priority: normal -> low; assignee: belopolsky; components: + macOS; stage: commit review -> needs patch |
2010-11-29 20:22:31 | belopolsky | unlink | issue10575 dependencies |
2010-11-29 19:48:38 | loewis | set | messages: + msg122858 |
2010-11-29 18:52:32 | belopolsky | set | messages: + msg122850 |
2010-11-29 18:37:58 | lemburg | set | messages: + msg122843 |
2010-11-29 18:36:58 | belopolsky | set | files: - issue10552a.diff |
2010-11-29 18:36:46 | belopolsky | set | files: + issue10552a.diff; messages: + msg122842 |
2010-11-29 18:21:55 | lemburg | set | messages: + msg122837 |
2010-11-29 16:57:45 | belopolsky | set | messages: + msg122829 |
2010-11-29 16:45:33 | belopolsky | link | issue10575 dependencies |
2010-11-27 23:03:04 | belopolsky | set | messages: + msg122586 |
2010-11-27 23:02:25 | belopolsky | set | files: + issue10552a.diff; messages: + msg122585; stage: commit review |
2010-11-27 22:16:02 | ezio.melotti | set | nosy: + ezio.melotti |
2010-11-27 22:09:48 | lemburg | set | messages: + msg122565 |
2010-11-27 21:15:09 | belopolsky | set | files: + issue10552.diff; nosy: + loewis; messages: + msg122559; keywords: + patch |
2010-11-27 20:31:17 | belopolsky | link | issue7962 dependencies |
2010-11-27 20:29:09 | belopolsky | create | |