Issue20046
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-12-21 19:20 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
locale_optimize_aliases.patch | serhiy.storchaka, 2013-12-21 19:20 | review | ||
locale_optimize_aliases_2.patch | serhiy.storchaka, 2013-12-26 20:19 | review | ||
locale_optimize_aliases_3.patch | serhiy.storchaka, 2013-12-26 21:43 | review | ||
locale_optimize_aliases_4.patch | serhiy.storchaka, 2013-12-26 22:19 | review |
Messages (13) | |||
---|---|---|---|
msg206769 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-21 19:20 | |
Proposed patch removes over 400 entities from locale alias tables. They are redundant because they can be calculated on fly. Also it enables utf8 aliases. Now this adds not hundreds of redundant aliases, but only 8 new locales: + 'be_bg.utf8': 'bg_BG.UTF-8', + 'c.utf8': 'en_US.UTF-8', + 'en_dl.utf8': 'en_DL.UTF-8', + 'en_zw.utf8': 'en_ZS.UTF-8', + 'pa_pk.utf8': 'pa_PK.UTF-8', + 'sr_yu.utf8': 'sr_RS.UTF-8', + 'te_in.utf8': 'te_IN.UTF-8', + 'zh_sg.utf8': 'zh_SG.UTF-8', |
|||
msg206770 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-21 19:33 | |
Example. 'br_fr': 'br_FR.ISO8859-1', - 'br_fr.iso88591': 'br_FR.ISO8859-1', - 'br_fr.iso885914': 'br_FR.ISO8859-14', - 'br_fr.iso885915': 'br_FR.ISO8859-15', - 'br_fr.iso885915@euro': 'br_FR.ISO8859-15', - 'br_fr.utf8@euro': 'br_FR.UTF-8', - 'br_fr@euro': 'br_FR.ISO8859-15', Only one of 7 br_fr entities are left. For br_fr.iso88591, br_fr.iso885914 and br_fr.iso885915 just replaced encoding of base br_fr locale. For br_fr.iso885915@euro and br_fr.utf8@euro the @euro modifier is dropped because ISO8859-15 and UTF-8 already contains the euro character. For br_fr@euro default ISO8859-1 encoding replaced to ISO8859-15 and the @euro modifier is dropped. So now the table contains only base entities which map lang_country to lang_country.encoding and special cases for deprecated and obscure aliases. |
|||
msg206787 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-12-21 21:38 | |
On 21.12.2013 20:33, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > > Example. > > 'br_fr': 'br_FR.ISO8859-1', > - 'br_fr.iso88591': 'br_FR.ISO8859-1', > - 'br_fr.iso885914': 'br_FR.ISO8859-14', > - 'br_fr.iso885915': 'br_FR.ISO8859-15', > - 'br_fr.iso885915@euro': 'br_FR.ISO8859-15', > - 'br_fr.utf8@euro': 'br_FR.UTF-8', > - 'br_fr@euro': 'br_FR.ISO8859-15', > > Only one of 7 br_fr entities are left. For br_fr.iso88591, br_fr.iso885914 and br_fr.iso885915 just replaced encoding of base br_fr locale. For br_fr.iso885915@euro and br_fr.utf8@euro the @euro modifier is dropped because ISO8859-15 and UTF-8 already contains the euro character. For br_fr@euro default ISO8859-1 encoding replaced to ISO8859-15 and the @euro modifier is dropped. > > So now the table contains only base entities which map lang_country to lang_country.encoding and special cases for deprecated and obscure aliases. Looks like an interesting approach, but I'd like to think about it a day or two. Some notes: * the patch seems to include some unrelated changes, e.g. the devanagari fixes and a few new mappings * the optimize step is called twice for some reason - is this intended ? if yes, please add a comment why this is done * the patch would need some tests to make sure that the removed aliases indeed still map to the correct C locale strings Thanks, -- Marc-Andre Lemburg eGenix.com |
|||
msg206796 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-21 23:38 | |
> * the patch seems to include some unrelated changes, e.g. the > devanagari fixes and a few new mappings May be. In any case I have added issue20027 as dependency. New mappings were added when enable UTF-8 locales in makelocalealias.py (I can split this in separate issue). > * the optimize step is called twice for some reason - is this > intended ? if yes, please add a comment why this is done Actually we should call it in a loop while the size of the table is decreased. > * the patch would need some tests to make sure that the removed > aliases indeed still map to the correct C locale strings Currently the makelocalealias.py is such manual test. test_locale contains multiple tests for different locales. I'll add several new cases. |
|||
msg206932 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-12-26 00:36 | |
I thought some more about this approach. I'm +1 on it. The locale lookup is not time critical, so the table optimization makes sense. Nice idea, Serhiy ! On 22.12.2013 00:38, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > >> * the patch seems to include some unrelated changes, e.g. the >> devanagari fixes and a few new mappings > > May be. In any case I have added issue20027 as dependency. New mappings were > added when enable UTF-8 locales in makelocalealias.py (I can split this in > separate issue). I think it's better to apply the patches separately, if that's possible. >> * the optimize step is called twice for some reason - is this >> intended ? if yes, please add a comment why this is done > > Actually we should call it in a loop while the size of the table is decreased. Ok, that makes sense. Still, please add a comment on why this is necessary. >> * the patch would need some tests to make sure that the removed >> aliases indeed still map to the correct C locale strings > > Currently the makelocalealias.py is such manual test. test_locale contains > multiple tests for different locales. I'll add several new cases. Great. Thanks, -- Marc-Andre Lemburg eGenix.com |
|||
msg206959 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-26 20:19 | |
Here is updated patch. It doesn't include any additions to locale alias table (including devanagari). Added several tests cases (many other test cases for removed aliases already exist). optimize() is called only once, looks as second run has no effect until UTF-8 aliases are enabled. |
|||
msg206964 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-12-26 21:09 | |
On 26.12.2013 21:19, Serhiy Storchaka wrote: > > Here is updated patch. It doesn't include any additions to locale alias table (including devanagari). Added several tests cases (many other test cases for removed aliases already exist). optimize() is called only once, looks as second run has no effect until UTF-8 aliases are enabled. Looks good. Could you add a test for the optimization function ? It should make sure that a complete set of now removed locale names are properly optimized away, e.g. 'nl_nl': 'nl_NL.ISO8859-1', - 'nl_nl.88591': 'nl_NL.ISO8859-1', - 'nl_nl.iso88591': 'nl_NL.ISO8859-1', - 'nl_nl.iso885915': 'nl_NL.ISO8859-15', - 'nl_nl.iso885915@euro': 'nl_NL.ISO8859-15', - 'nl_nl.utf8@euro': 'nl_NL.UTF-8', - 'nl_nl@euro': 'nl_NL.ISO8859-15', and 'ja_jp': 'ja_JP.eucJP', - 'ja_jp.ajec': 'ja_JP.eucJP', 'ja_jp.euc': 'ja_JP.eucJP', - 'ja_jp.eucjp': 'ja_JP.eucJP', - 'ja_jp.iso-2022-jp': 'ja_JP.JIS7', - 'ja_jp.iso2022jp': 'ja_JP.JIS7', - 'ja_jp.jis': 'ja_JP.JIS7', - 'ja_jp.jis7': 'ja_JP.JIS7', 'ja_jp.mscode': 'ja_JP.SJIS', 'ja_jp.pck': 'ja_JP.SJIS', - 'ja_jp.sjis': 'ja_JP.SJIS', - 'ja_jp.ujis': 'ja_JP.eucJP', This should then cover all the Latin-1 encodings and also an Asian one. Thanks, -- Marc-Andre Lemburg eGenix.com |
|||
msg206966 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-26 21:43 | |
> Could you add a test for the optimization function ? I have no ideas. The optimization function is a part of the makelocalealias.py which ran manually very rarely (last time 3.5 year ago). It isn't exposed outside the script and I'm not sure that it worths the complication of the testing. > It should make sure that a complete set of now removed locale > names are properly optimized away, e.g. > > 'nl_nl': 'nl_NL.ISO8859-1', > - 'nl_nl.88591': 'nl_NL.ISO8859-1', > - 'nl_nl.iso88591': 'nl_NL.ISO8859-1', > - 'nl_nl.iso885915': 'nl_NL.ISO8859-15', > - 'nl_nl.iso885915@euro': 'nl_NL.ISO8859-15', > - 'nl_nl.utf8@euro': 'nl_NL.UTF-8', > - 'nl_nl@euro': 'nl_NL.ISO8859-15', These cases are covered by test_english and test_euro_modifier. > 'ja_jp': 'ja_JP.eucJP', > - 'ja_jp.ajec': 'ja_JP.eucJP', > 'ja_jp.euc': 'ja_JP.eucJP', > - 'ja_jp.eucjp': 'ja_JP.eucJP', > - 'ja_jp.iso-2022-jp': 'ja_JP.JIS7', > - 'ja_jp.iso2022jp': 'ja_JP.JIS7', > - 'ja_jp.jis': 'ja_JP.JIS7', > - 'ja_jp.jis7': 'ja_JP.JIS7', > 'ja_jp.mscode': 'ja_JP.SJIS', > 'ja_jp.pck': 'ja_JP.SJIS', > - 'ja_jp.sjis': 'ja_JP.SJIS', > - 'ja_jp.ujis': 'ja_JP.eucJP', Here is a test which includes tests for japanese cases end tests for the euc encoding (it maps to different encodings depending on language). |
|||
msg206967 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-12-26 21:47 | |
On 26.12.2013 22:43, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > >> Could you add a test for the optimization function ? > > I have no ideas. The optimization function is a part of the makelocalealias.py > which ran manually very rarely (last time 3.5 year ago). It isn't exposed > outside the script and I'm not sure that it worths the complication of the > testing. I probably wasn't clear: I meant some tests that show that the alias definitions (on the left) from the X11 file are actually mapped to the correct alias locales (on the right). >> It should make sure that a complete set of now removed locale >> names are properly optimized away, e.g. >> >> 'nl_nl': 'nl_NL.ISO8859-1', >> - 'nl_nl.88591': 'nl_NL.ISO8859-1', >> - 'nl_nl.iso88591': 'nl_NL.ISO8859-1', >> - 'nl_nl.iso885915': 'nl_NL.ISO8859-15', >> - 'nl_nl.iso885915@euro': 'nl_NL.ISO8859-15', >> - 'nl_nl.utf8@euro': 'nl_NL.UTF-8', >> - 'nl_nl@euro': 'nl_NL.ISO8859-15', > > These cases are covered by test_english and test_euro_modifier. Ok. >> 'ja_jp': 'ja_JP.eucJP', >> - 'ja_jp.ajec': 'ja_JP.eucJP', >> 'ja_jp.euc': 'ja_JP.eucJP', >> - 'ja_jp.eucjp': 'ja_JP.eucJP', >> - 'ja_jp.iso-2022-jp': 'ja_JP.JIS7', >> - 'ja_jp.iso2022jp': 'ja_JP.JIS7', >> - 'ja_jp.jis': 'ja_JP.JIS7', >> - 'ja_jp.jis7': 'ja_JP.JIS7', >> 'ja_jp.mscode': 'ja_JP.SJIS', >> 'ja_jp.pck': 'ja_JP.SJIS', >> - 'ja_jp.sjis': 'ja_JP.SJIS', >> - 'ja_jp.ujis': 'ja_JP.eucJP', > > Here is a test which includes tests for japanese cases end tests for the euc > encoding (it maps to different encodings depending on language). Thanks. I think this is good enough now. |
|||
msg206970 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-26 22:19 | |
> I probably wasn't clear: I meant some tests that show that the > alias definitions (on the left) from the X11 file are actually mapped to the > correct alias locales (on the right). This is how the optimization function works. It updates alias table with the X11 file, and then removes the alias entities one by one and checks that the alias definition are mapped to to correct alias locales. Here is a patch which adds a self-check in the makelocalealias.py script. |
|||
msg206971 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-12-26 22:47 | |
On 26.12.2013 23:19, Serhiy Storchaka wrote: > > Serhiy Storchaka added the comment: > >> I probably wasn't clear: I meant some tests that show that the >> alias definitions (on the left) from the X11 file are actually mapped to the >> correct alias locales (on the right). > > This is how the optimization function works. It updates alias table with the > X11 file, and then removes the alias entities one by one and checks that the > alias definition are mapped to to correct alias locales. > > Here is a patch which adds a self-check in the makelocalealias.py script. Thanks, this makes that part of the implementation waterproof as well. Good to go, I guess. |
|||
msg206972 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-12-26 22:57 | |
New changeset 63bc68d7f449 by Serhiy Storchaka in branch 'default': Issue #20046: Locale alias table no longer contains entities which can be http://hg.python.org/cpython/rev/63bc68d7f449 |
|||
msg206973 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2013-12-26 23:00 | |
Thank you Marc-Andre for your reviews. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:55 | admin | set | github: 64245 |
2013-12-26 23:00:11 | serhiy.storchaka | set | status: open -> closed messages: + msg206973 assignee: serhiy.storchaka resolution: fixed stage: patch review -> resolved |
2013-12-26 22:57:05 | python-dev | set | nosy:
+ python-dev messages: + msg206972 |
2013-12-26 22:47:30 | lemburg | set | messages: + msg206971 |
2013-12-26 22:19:51 | serhiy.storchaka | set | files:
+ locale_optimize_aliases_4.patch messages: + msg206970 |
2013-12-26 21:47:36 | lemburg | set | messages: + msg206967 |
2013-12-26 21:43:12 | serhiy.storchaka | set | files:
+ locale_optimize_aliases_3.patch messages: + msg206966 |
2013-12-26 21:09:16 | lemburg | set | messages: + msg206964 |
2013-12-26 20:19:14 | serhiy.storchaka | set | files:
+ locale_optimize_aliases_2.patch messages: + msg206959 versions: + Python 3.4 |
2013-12-26 00:36:03 | lemburg | set | messages: + msg206932 |
2013-12-21 23:38:34 | serhiy.storchaka | set | messages: + msg206796 |
2013-12-21 21:38:44 | lemburg | set | messages: + msg206787 |
2013-12-21 19:33:26 | serhiy.storchaka | set | messages: + msg206770 |
2013-12-21 19:22:08 | serhiy.storchaka | set | dependencies: + Fixed support for Indian locales, Fix makelocalealias.py for Python 3 |
2013-12-21 19:20:57 | serhiy.storchaka | create |