msg319569 - (view) |
Author: Prawin Phichitnitikorn (winvinc) |
Date: 2018-06-15 00:39 |
The error
"Current thread 0x0000238c (most recent call first): Fatal Python error: Py_Initialize: can’t initialize sys standard streams LookupError: unknown encoding: 874"
is caused by a missing mapping for the 874 encoding in encodings\aliases.py
|
msg319570 - (view) |
Author: Steven D'Aprano (steven.daprano) * |
Date: 2018-06-15 01:09 |
Please don't post screenshots of text, they make it difficult for the blind and visually impaired to contribute. Instead, please copy and paste the error message into the body of your bug report. (Which I see you have done, which makes the screenshot unnecessary.)
Just reporting the error message alone is not very useful, we also should see the context of what you were doing when the error occurred.
|
msg319589 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2018-06-15 07:12 |
@steven: Lib/encodings/aliases.py contains aliases for a (largish) number of encoding names, including both "cpXXXX" and "XXXX" for most Windows code pages. For code page 874, only the name "cp874" can be used and not "874", which apparently causes problems.
@Prawin: have you added an alias to aliases.py to check if adding an alias would fix the problem you're having?
|
msg319590 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-15 07:53 |
It seems like the following code pages have a Python codec (Lib/encodings/cpXXX.py) but lack an alias in Lib/encodings/aliases.py:
[720, 737, 856, 874, 875, 1006, 65001]
Would someone volunteer to write a pull request for that? It should be easy.
Example of a correct alias in Lib/encodings/aliases.py:
# cp1252 codec
'1252' : 'cp1252',
'windows_1252' : 'cp1252',
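A list like the one above can be reproduced with a short script. This is only a sketch: `missing_numeric_aliases` is a hypothetical helper name, and the result depends on the Python version being run, so no output is shown.

```python
import encodings
import pkgutil
import re
from encodings.aliases import aliases


def missing_numeric_aliases():
    """Return code page numbers that have a cpXXX codec module
    but no bare 'XXX' entry in the alias table."""
    missing = []
    for module in pkgutil.iter_modules(encodings.__path__):
        match = re.fullmatch(r"cp(\d+)", module.name)
        if match and match.group(1) not in aliases:
            missing.append(int(match.group(1)))
    return sorted(missing)


if __name__ == "__main__":
    print(missing_numeric_aliases())
```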
|
msg319611 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-15 12:33 |
I have added the aliases as per comment by @vstinner https://bugs.python.org/msg319590 . I have used https://docs.python.org/3.8/library/codecs.html#standard-encodings as a reference to see if there are any additional aliases to add with respect to the second column. I am a beginner in contributing to cpython and hence please let me know if I have missed something or any way to test this.
PR : https://github.com/python/cpython/pull/7705
Thanks
|
msg319612 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2018-06-15 12:44 |
Could you also add a documentation update and a news entry?
The section on standard encodings mentions aliases for standard encodings, and IMHO the new aliases should be added to that page.
Creating a new entry is described here: https://devguide.python.org/committing/?highlight=blurb#what-s-new-and-news-entries
|
msg319613 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-15 13:12 |
Thanks @ronaldoussoren for the links. I have added an entry using blurb tool and updated the docs at Doc/library/codecs.rst with relevant aliases.
Thanks
|
msg319617 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2018-06-15 14:30 |
Why only these code pages? There are other cpXXXX encodings that don't have the XXXX alias.
Maybe add logic to encodings.search_function() that maps XXXX to cpXXXX if it is all digits? Maybe even map ibmXXXX and windows_XXXX to cpXXXX, but this would create false aliases like ibm1252 and windows_437.
|
msg319716 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-16 05:14 |
While going through Lib/encodings/aliases.py, I found certain all-digit entries that don't correspond to a cpXXXX codec. I think the search function is used not only for encodings that start with 'cp', so adding the logic might require checks for extra cases.
Sample cases :
'936' : 'gbk'
'8859' : 'latin_1'
'646' : 'ascii'
I also have limited knowledge of encodings/__init__.py, so correct me if I am wrong on the above.
Thanks.
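The sample cases can be found mechanically. The snippet below is a sketch that scans the alias table for all-digit names whose target is not the matching cpXXXX codec; `irregular_numeric_aliases` is a hypothetical name chosen for illustration.

```python
from encodings.aliases import aliases


def irregular_numeric_aliases():
    """Return all-digit aliases that do NOT map to 'cp' + the same digits."""
    return {
        alias: target
        for alias, target in aliases.items()
        if alias.isdigit() and target != "cp" + alias
    }


if __name__ == "__main__":
    for alias, target in sorted(irregular_numeric_aliases().items()):
        print(alias, "->", target)
```

Any blanket "digits mean cpXXXX" rule would have to leave these entries alone.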
|
msg319717 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2018-06-16 05:20 |
Of course entries in the alias table should have a precedence.
|
msg319725 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-16 07:37 |
Thanks @serhiy.storchaka. I looked into the code, and it seems the resolution is done in `search_function` at Lib/encodings/__init__.py. The encoding name is first normalized, and the normalized name is then looked up in `aliases`, the dictionary where I added the alias. If it is not found, '.' is replaced with '_' and the lookup is tried again. I assume this is the place to add the check: if aliased_encoding is None after both attempts and norm_encoding is all digits, prepend "cp" to norm_encoding and check again against the `aliases` dictionary. Unfortunately, print and pdb don't work inside the function, and I don't know how to test this change or write test cases for it.
Any pointers will be highly helpful.
Thanks
|
msg319728 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2018-06-16 07:51 |
It is easy to test it. Encoding/decoding with '874' should give the same result as with 'cp874'.
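One way to express that check is sketched below. `codecs_agree` is a hypothetical helper, not part of the codecs module; it compares how two encoding names decode every single-byte value and treats an unknown name as a mismatch rather than raising.

```python
import codecs


def codecs_agree(name_a, name_b):
    """Return True if both names decode each of the 256 byte values identically.

    Returns False if either name is unknown to the codec registry.
    """
    try:
        codecs.lookup(name_a)
        codecs.lookup(name_b)
    except LookupError:
        return False
    data = bytes(range(256))
    return data.decode(name_a, errors="replace") == data.decode(name_b, errors="replace")


if __name__ == "__main__":
    # Whether '874' resolves depends on the interpreter's alias table.
    print(codecs_agree("874", "cp874"))
```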
|
msg319733 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-16 12:01 |
I am able to verify the newly added aliases using the below assert statement
assert codecs.encode('a', '874') == codecs.encode('a', 'cp874')
I am stuck on the part where this could be patched in search_function, and I hope this is the approach @serhiy.storchaka was suggesting. After the usual logic, I check whether aliased_encoding is None and the normalized encoding is all digits; if so, I prepend 'cp' and call search_function again. That way, cases like '936' first go through the alias table, which has higher precedence, and for other cases the 'cpXXXX' codec is returned even though no alias entry is present.
I tested it by removing the newly added '874' entry from aliases.py and confirming that 'cp874' is returned instead of an error. Because the name passed to the second call starts with 'cp', it is no longer all digits, so invalid names cannot cause infinite recursion.
Thanks
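The lookup order described above can be sketched outside the interpreter. This is not the attached patch and not the real `search_function`: `resolve_encoding_name` is a hypothetical stand-alone helper that only resolves names, assuming the alias table is consulted first and the "cp" prefix is tried last.

```python
import codecs
import encodings
from encodings.aliases import aliases


def resolve_encoding_name(name):
    """Resolve an encoding name, giving alias-table entries precedence.

    Falls back to 'cp' + digits only when the name is all digits and no
    alias exists. Returns None if the fallback codec does not exist.
    """
    norm = encodings.normalize_encoding(name.lower())
    target = aliases.get(norm) or aliases.get(norm.replace(".", "_"))
    if target is not None:
        return target
    if norm.isascii() and norm.isdigit():
        candidate = "cp" + norm
        try:
            codecs.lookup(candidate)
        except LookupError:
            return None
        return candidate
    return norm


if __name__ == "__main__":
    for name in ("936", "646", "1252", "874", "99999"):
        print(name, "->", resolve_encoding_name(name))
```

Because the fallback name starts with "cp", it can never re-enter the all-digits branch.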
|
msg319877 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2018-06-18 11:19 |
I'm not convinced that adding code to search_function is the right solution for this.
BTW, I'm also not sure yet why this error happens. Does Windows return a code page number as the preferred encoding when the io module looks for one? If so, wouldn't it be better to correct the encoding name there (from the code page number to a string with a "cp" prefix)?
|
msg319881 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-18 13:16 |
I think if we can get a confirmation from @Prawin that adding an alias fixed the issue or a minimal test case then it will be helpful. The minimal I can come up with is as below :
import codecs
# Fails unless the '874' alias is added; other cases like '1252' pass because their alias exists
assert codecs.encode('a', '874') == codecs.encode('a', 'cp874')
# The same assertion passes with the search_function patch even without the alias, since the patch prepends 'cp'
assert codecs.encode('a', '874') == codecs.encode('a', 'cp874')
Thanks
|
msg319883 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2018-06-18 13:37 |
Confirmation that the patch actually fixes the problem would be nice, but I'd still like to understand why Python tries to use an encoding with the name "874" as this might lead to a nicer solution to the problem.
BTW, there is some discussion of this issue on the python-ideas mailing list.
|
msg319926 - (view) |
Author: Prawin Phichitnitikorn (winvinc) |
Date: 2018-06-19 03:57 |
Sorry for the late reply.
For me, the problem was resolved by adding
# cp874 codec
'874' : 'cp874',
to the aliases.py file.
|
msg319927 - (view) |
Author: Karthikeyan Singaravelan (xtreak) * |
Date: 2018-06-19 05:00 |
Thanks @prawin for the confirmation. There is a mailing list discussion at https://groups.google.com/forum/#!topic/python-ideas/Ny1RN9wY0cI and it seems this is related to the Thai language locale. Feel free to add any further input, for example on whether it is reproducible on other machines with a Thai locale. There is a PR that adds the alias along with other missing items, but I will wait for others to chime in to see if there is a better solution.
Thanks.
|
msg319949 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2018-06-19 10:21 |
In particular, we're interested in the following information:
* What OS is installed on your machine?
* What locale (country/language) is configured?
* What does "import locale; print(locale._getdefaultlocale())" print?
|
msg319950 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2018-06-19 10:25 |
* Do you use a regular Python interpreter or one embedded in another program?
|
msg319976 - (view) |
Author: Ronald Oussoren (ronaldoussoren) * |
Date: 2018-06-19 15:01 |
@Serhiy: The screenshot suggests that this is a regular Python install.
|
msg319978 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2018-06-19 15:17 |
Prawin Phichitnitikorn: "But for me I'm resolve by adding (...)"
Ok, so can you please give the value of:
* sys.stdin.encoding
* sys.stdout.encoding
* sys.stderr.encoding
* os.device_encoding(0)
* os.device_encoding(1)
* os.device_encoding(2)
* locale.getpreferredencoding(False)
Maybe also the .errors attribute of sys.stdin, sys.stdout and sys.stderr.
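The requested values can be gathered in one go. The script below is a sketch; `collect_encoding_info` is a hypothetical helper name, and it tolerates missing standard streams (for example under pythonw) by reporting None.

```python
import locale
import os
import sys


def collect_encoding_info():
    """Collect the stream, device, and locale encodings asked for above."""
    info = {}
    for fd, name in enumerate(("stdin", "stdout", "stderr")):
        stream = getattr(sys, name, None)
        info[f"sys.{name}.encoding"] = getattr(stream, "encoding", None)
        info[f"sys.{name}.errors"] = getattr(stream, "errors", None)
        try:
            # None when the descriptor is not connected to a terminal.
            info[f"os.device_encoding({fd})"] = os.device_encoding(fd)
        except OSError:
            info[f"os.device_encoding({fd})"] = None
    info["locale.getpreferredencoding(False)"] = locale.getpreferredencoding(False)
    return info


if __name__ == "__main__":
    for key, value in collect_encoding_info().items():
        print(f"{key}: {value!r}")
```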
|
msg320424 - (view) |
Author: Inada Naoki (methane) * |
Date: 2018-06-25 16:06 |
When I searched for "Unknown encoding 874", I found that some people ran into this trouble with an Anaconda installation.
I don't know what the Anaconda setup does, but this will not happen on normal CPython.
Since Python 3.6, we use UTF-8 by default on Windows for the filesystem encoding and the console encoding.
|
msg320425 - (view) |
Author: Inada Naoki (methane) * |
Date: 2018-06-25 16:19 |
I grepped for PYTHONIOENCODING and found this line:
https://github.com/conda/conda/blob/082fe8fd7458ecd9dd7547749039f4b1f06d76db/conda/activate.py#L726
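The effect of this variable can be observed by starting a child interpreter with it set. This is a sketch under the assumption, suggested but not stated outright in this thread, that the environment set PYTHONIOENCODING to a bare code page number such as "874". `run_with_ioencoding` is a hypothetical helper, and whether "874" is accepted depends on the interpreter's alias table.

```python
import os
import subprocess
import sys


def run_with_ioencoding(value):
    """Start a child interpreter with PYTHONIOENCODING=value.

    Returns (returncode, stdout text, stderr text).
    """
    env = dict(os.environ, PYTHONIOENCODING=value)
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out = proc.stdout.decode("ascii", "replace").strip()
    err = proc.stderr.decode("ascii", "replace")
    return proc.returncode, out, err


if __name__ == "__main__":
    # A name the codec registry cannot resolve makes startup fail.
    print(run_with_ioencoding("874"))
```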
|
msg320426 - (view) |
Author: Inada Naoki (methane) * |
Date: 2018-06-25 16:37 |
I found the original pull request and issue report:
https://github.com/conda/conda/pull/4558
https://github.com/ContinuumIO/anaconda-issues/issues/1410
|
msg320429 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2018-06-25 16:50 |
Thank you Inada-san! Seems this issue can be closed as a third party issue.
|
|
Date | User | Action | Args |
2022-04-11 14:59:01 | admin | set | github: 78046 |
2018-07-12 13:50:50 | vstinner | set | status: open -> closed; resolution: third party; stage: patch review -> resolved |
2018-06-25 16:50:54 | serhiy.storchaka | set | messages: + msg320429 |
2018-06-25 16:37:33 | methane | set | messages: + msg320426 |
2018-06-25 16:19:21 | methane | set | messages: + msg320425 |
2018-06-25 16:06:23 | methane | set | messages: + msg320424 |
2018-06-25 15:59:09 | methane | set | nosy: + methane |
2018-06-19 15:17:06 | vstinner | set | messages: + msg319978 |
2018-06-19 15:01:25 | ronaldoussoren | set | messages: + msg319976 |
2018-06-19 10:25:28 | serhiy.storchaka | set | messages: + msg319950 |
2018-06-19 10:21:29 | ronaldoussoren | set | messages: + msg319949 |
2018-06-19 05:00:47 | xtreak | set | messages: + msg319927 |
2018-06-19 03:57:25 | winvinc | set | messages: + msg319926 |
2018-06-18 13:37:56 | ronaldoussoren | set | messages: + msg319883 |
2018-06-18 13:16:22 | xtreak | set | messages: + msg319881 |
2018-06-18 11:19:40 | ronaldoussoren | set | messages: + msg319877 |
2018-06-16 12:01:24 | xtreak | set | files: + 33865.patch; messages: + msg319733 |
2018-06-16 07:51:58 | serhiy.storchaka | set | messages: + msg319728 |
2018-06-16 07:37:51 | xtreak | set | messages: + msg319725 |
2018-06-16 05:20:34 | serhiy.storchaka | set | messages: + msg319717 |
2018-06-16 05:14:55 | xtreak | set | messages: + msg319716 |
2018-06-15 14:30:05 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg319617 |
2018-06-15 13:12:20 | xtreak | set | messages: + msg319613 |
2018-06-15 12:44:55 | ronaldoussoren | set | messages: + msg319612 |
2018-06-15 12:33:09 | xtreak | set | nosy: + xtreak; messages: + msg319611 |
2018-06-15 12:28:43 | xtreak | set | keywords: + patch; stage: patch review; pull_requests: + pull_request7321 |
2018-06-15 07:53:42 | vstinner | set | keywords: + easy; messages: + msg319590; title: unknown encoding: 874 -> [EASY] Missing code page aliases: "unknown encoding: 874" |
2018-06-15 07:12:37 | ronaldoussoren | set | nosy: + ronaldoussoren; messages: + msg319589 |
2018-06-15 01:09:00 | steven.daprano | set | nosy: + steven.daprano; messages: + msg319570 |
2018-06-15 00:39:11 | winvinc | create | |