classification
Title: Improve encoding alias handling in locale coercion tests
Type: Stage: resolved
Components: Tests Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: kulikjak, miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2019-06-19 07:57 by kulikjak, last changed 2019-07-02 11:24 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 11195 closed kulikjak, 2019-06-19 07:57
PR 14285 closed kulikjak, 2019-06-21 12:55
PR 14443 closed kulikjak, 2019-06-28 12:00
PR 14447 merged kulikjak, 2019-06-28 14:44
PR 14449 merged kulikjak, 2019-06-28 14:59
PR 14552 merged miss-islington, 2019-07-02 10:48
Messages (9)
msg346025 - Author: Jakub Kulik (kulikjak) * Date: 2019-06-19 07:57
Locale coercion tests on Solaris are failing because the 646 ASCII alias is not recognized. Adding it to the _handle_output_variations function fixes this problem.

This was changed/fixed in Python 3.8 and later, where aliases are correctly translated to their canonical Python codec name, so no patch is needed there.
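The following is an illustrative sketch, not code from the patch. Python's codec registry already maps the ASCII spellings mentioned in this issue to one canonical codec name, and `codecs.lookup(name).name` returns that name. The alias list is taken from this thread and is not exhaustive.

```python
import codecs

# ASCII spellings mentioned in this thread; "646" is the alias that
# Solaris reports, according to the issue description.
ASCII_ALIASES = ["ascii", "ANSI_X3.4-1968", "US-ASCII", "646"]

for alias in ASCII_ALIASES:
    # codecs.lookup() returns a CodecInfo object; its .name attribute
    # is the canonical Python codec name for the given alias.
    canonical = codecs.lookup(alias).name
    print(alias, "->", canonical)
```

Every printed line should end in `-> ascii`, because all four spellings are registered aliases of Python's `ascii` codec.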
msg346463 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-25 00:55
test_c_locale_coercion should use "codecs.lookup(encoding).name" to get the normalized name of an encoding, rather than the fragile code:

        data = data.replace(b"ANSI_X3.4-1968", b"ascii")
        data = data.replace(b"US-ASCII", b"ascii")

The proposed pattern is even more dangerous:

         data = data.replace(b"646", b"ascii")

I'm not sure where encodings should be normalized. Maybe around _check_child_encoding_details().

For the PR, please write it for the master branch.
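This minimal sketch shows why a blind `bytes.replace()` is risky: it rewrites every occurrence of the byte sequence, not only the encoding name. The `pid=... encoding=...` output format and the `normalize_encoding_name()` helper are hypothetical and exist only for illustration. The real test output looks different.

```python
import codecs


def normalize_encoding_name(name):
    """Return the canonical Python codec name for an encoding alias.

    Hypothetical helper for illustration: it is applied to a single
    encoding name, never to the whole output.
    """
    return codecs.lookup(name).name


# Hypothetical child-process output in which "646" also appears inside
# an unrelated field (a made-up process ID).
output = b"pid=16460 encoding=646"

# A blind substitution corrupts the unrelated digits as well.
print(output.replace(b"646", b"ascii"))  # b'pid=1ascii0 encoding=ascii'

# Normalizing only the extracted encoding field leaves the rest intact.
fields = dict(item.split("=", 1) for item in output.decode("ascii").split())
fields["encoding"] = normalize_encoding_name(fields["encoding"])
print(fields)  # {'pid': '16460', 'encoding': 'ascii'}
```

Looking up one extracted name with `codecs.lookup()` cannot touch unrelated parts of the output. A global byte substitution can.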
msg346592 - Author: Jakub Kulik (kulikjak) * Date: 2019-06-26 08:09
I just added it the same way the existing aliases were handled, but I see why the current solution is not the best. Also, I wanted to push this into 3.7 only, as this problem is not present in 3.8 (as discussed in PR 11195, which was incorrectly opened against master).

Just to be sure: what you propose is to rewrite current replaces to use "codecs.lookup(encoding).name" instead and then push it into the master?
Ok, I will look into it.
msg346602 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-26 11:44
>  what you propose is to rewrite current replaces to use "codecs.lookup(encoding).name" instead and then push it into the master?

I suggest removing the code that does the .replace() and instead normalizing the encoding when checking for the expected encoding (near .assertEqual()). I still see the .replace() code in master, so yes, the code should be changed in master first:

    @staticmethod
    def _handle_output_variations(data):
        """Adjust the output to handle platform specific idiosyncrasies

        * Some platforms report ASCII as ANSI_X3.4-1968
        * Some platforms report ASCII as US-ASCII
        * Some platforms report UTF-8 instead of utf-8
        """
        data = data.replace(b"ANSI_X3.4-1968", b"ascii")
        data = data.replace(b"US-ASCII", b"ascii")
        data = data.lower()
        return data
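Here is one possible way to follow this suggestion, offered as a sketch under assumptions. The mixin, the `assertSameEncoding()` helper, and `ExampleTest` are hypothetical names and are not the code that was merged. The idea is to compare canonical codec names at assertion time instead of rewriting the child process output beforehand.

```python
import codecs
import unittest


class EncodingAssertionsMixin:
    """Hypothetical mixin: compare encodings by canonical codec name
    instead of rewriting the raw output with bytes.replace()."""

    def assertSameEncoding(self, actual, expected):
        # codecs.lookup() raises LookupError for unknown names, so a
        # genuinely unexpected encoding shows up as a test error.
        self.assertEqual(codecs.lookup(actual).name,
                         codecs.lookup(expected).name)


class ExampleTest(EncodingAssertionsMixin, unittest.TestCase):
    def test_ascii_aliases(self):
        # ASCII spellings reported by different platforms all match "ascii".
        for reported in ("ANSI_X3.4-1968", "US-ASCII", "646"):
            with self.subTest(reported=reported):
                self.assertSameEncoding(reported, "ascii")

    def test_utf8_case(self):
        # "UTF-8" and "utf-8" resolve to the same codec.
        self.assertSameEncoding("UTF-8", "utf-8")
```

Any unittest runner can execute these tests, for example `python -m unittest` on the module that contains them. With this design, new aliases need no special cases, because the codec registry resolves them.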
msg346821 - Author: Jakub Kulik (kulikjak) * Date: 2019-06-28 14:41
In Python 3.8+, encoding names are always normalized, so no output-variation handling is needed and the code can be removed.

Python 3.7 (and possibly earlier versions) can report variations of encoding names; these should be handled with codecs.lookup().
msg347129 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-02 10:46
New changeset c53173aa00689aa1be17ce5406289718f6b30532 by Victor Stinner (Jakub Kulík) in branch '3.7':
bpo-37335: Fix test_c_locale_coercion to handle any ASCII alias (GH-14449)
https://github.com/python/cpython/commit/c53173aa00689aa1be17ce5406289718f6b30532
msg347130 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-02 10:48
New changeset 61bf97e91620e020939d57a36918ab22579920ff by Victor Stinner (Jakub Kulík) in branch 'master':
bpo-37335, test_c_locale_coercion: Remove unnecessary code (GH-14447)
https://github.com/python/cpython/commit/61bf97e91620e020939d57a36918ab22579920ff
msg347133 - Author: miss-islington (miss-islington) Date: 2019-07-02 11:18
New changeset 518dc94e423398f7b0b5fd7bd5b84f138618e68e by Miss Islington (bot) in branch '3.8':
bpo-37335, test_c_locale_coercion: Remove unnecessary code (GH-14447)
https://github.com/python/cpython/commit/518dc94e423398f7b0b5fd7bd5b84f138618e68e
msg347134 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-02 11:24
Thanks, Jakub Kulik. test_c_locale_coercion should pass again on the 3.7, 3.8 and master branches on Solaris.
History
Date                 User            Action  Args
2019-07-02 11:24:31  vstinner        set     messages: + msg347134
2019-07-02 11:21:34  kulikjak        set     status: open -> closed
                                             resolution: fixed
                                             stage: patch review -> resolved
2019-07-02 11:18:43  miss-islington  set     nosy: + miss-islington
                                             messages: + msg347133
2019-07-02 10:48:50  miss-islington  set     pull_requests: + pull_request14370
2019-07-02 10:48:31  vstinner        set     messages: + msg347130
2019-07-02 10:46:04  vstinner        set     messages: + msg347129
2019-06-28 14:59:54  kulikjak        set     pull_requests: + pull_request14265
2019-06-28 14:44:14  kulikjak        set     pull_requests: + pull_request14263
2019-06-28 14:41:23  kulikjak        set     versions: + Python 3.8, Python 3.9
                                             messages: + msg346821
                                             title: Fix unexpected ASCII aliases in locale coercion tests. -> Improve encoding alias handling in locale coercion tests
2019-06-28 12:00:51  kulikjak        set     pull_requests: + pull_request14259
2019-06-28 11:51:45  kulikjak        set     title: Add 646 ASCII alias to locale coercion tests. -> Fix unexpected ASCII aliases in locale coercion tests.
2019-06-26 11:44:08  vstinner        set     messages: + msg346602
2019-06-26 08:09:50  kulikjak        set     messages: + msg346592
2019-06-25 00:55:12  vstinner        set     nosy: + vstinner
                                             messages: + msg346463
2019-06-21 12:55:35  kulikjak        set     pull_requests: + pull_request14107
2019-06-19 07:57:51  kulikjak        set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request14063
2019-06-19 07:57:30  kulikjak        create