Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve encoding alias handling in locale coercion tests #81516

Closed
kulikjak mannequin opened this issue Jun 19, 2019 · 9 comments
Closed

Improve encoding alias handling in locale coercion tests #81516

kulikjak mannequin opened this issue Jun 19, 2019 · 9 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes tests Tests in the Lib/test dir

Comments

@kulikjak
Copy link
Mannequin

kulikjak mannequin commented Jun 19, 2019

BPO 37335
Nosy @vstinner, @miss-islington, @kulikjak
PRs
  • bpo-37335: Add alias for ASCII encoding #11195
  • [3.7] bpo-37335: Add alias for ASCII encoding #14285
  • bpo-37335: Fix unexpected ASCII aliases in locale coercion tests #14443
  • bpo-37335: Remove unnecessary code from c locale coercion tests #14447
  • [3.7] bpo-37335: Fix unexpected ASCII aliases in locale coercion tests #14449
  • [3.8] bpo-37335, test_c_locale_coercion: Remove unnecessary code (GH-14447) #14552
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-07-02.11:21:34.048>
    created_at = <Date 2019-06-19.07:57:30.797>
    labels = ['3.8', '3.7', 'tests', '3.9']
    title = 'Improve encoding alias handling in locale coercion tests'
    updated_at = <Date 2019-07-02.11:24:31.594>
    user = 'https://github.com/kulikjak'

    bugs.python.org fields:

    activity = <Date 2019-07-02.11:24:31.594>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-07-02.11:21:34.048>
    closer = 'kulikjak'
    components = ['Tests']
    creation = <Date 2019-06-19.07:57:30.797>
    creator = 'kulikjak'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37335
    keywords = ['patch']
    message_count = 9.0
    messages = ['346025', '346463', '346592', '346602', '346821', '347129', '347130', '347133', '347134']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'miss-islington', 'kulikjak']
    pr_nums = ['11195', '14285', '14443', '14447', '14449', '14552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue37335'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @kulikjak
    Copy link
    Mannequin Author

    kulikjak mannequin commented Jun 19, 2019

    Locale coercion tests on Solaris are failing because 646 ASCII alias is not recognized. Its addition into the _handle_output_variations function fixes this problem.

    This was changed/fixed in Python 3.8 and later, where aliases are correctly translated to their canonical Python codec name so no patch is needed there.

    @kulikjak kulikjak mannequin added 3.7 (EOL) end of life tests Tests in the Lib/test dir labels Jun 19, 2019
    @vstinner
    Copy link
    Member

    test_c_locale_coerce should use "codecs.lookup(encoding).name" to get the normalized name of an encoding, rather than fragile:

            data = data.replace(b"ANSI_X3.4-1968", b"ascii")
            data = data.replace(b"US-ASCII", b"ascii")

    the proposed pattern is even more dangerous:

             data = data.replace(b"646", b"ascii")

    I'm not sure where encodings should be normalized. Maybe around _check_child_encoding_details().

    For the PR, please write it for the master branch.

    @kulikjak
    Copy link
    Mannequin Author

    kulikjak mannequin commented Jun 26, 2019

    I just added that in the way it was already there but I see why the current solution is not the best. Also I wanted to push this into 3.7 only as this problem is not present in 3.8 (as discussed in the PR 11195 opened incorrectly against the master).

    Just to be sure: what you propose is to rewrite current replaces to use "codecs.lookup(encoding).name" instead and then push it into the master?
    Ok, I will look into it.

    @vstinner
    Copy link
    Member

    what you propose is to rewrite current replaces to use "codecs.lookup(encoding).name" instead and then push it into the master?

    I suggest to remove the code which does the .replace(), but instead normalize the encoding when checking for the expected encoding (near .assertEqual()). I still see the .replace() code in master, so yeah, the code should first be changed in master:

        @staticmethod
        def _handle_output_variations(data):
            """Adjust the output to handle platform specific idiosyncrasies
        * Some platforms report ASCII as ANSI_X3.4-1968
        * Some platforms report ASCII as US-ASCII
        * Some platforms report UTF-8 instead of utf-8
        """
        data = data.replace(b"ANSI_X3.4-1968", b"ascii")
        data = data.replace(b"US-ASCII", b"ascii")
        data = data.lower()
        return data
    

    @kulikjak kulikjak mannequin changed the title Add 646 ASCII alias to locale coercion tests. Fix unexpected ASCII aliases in locale coercion tests. Jun 28, 2019
    @kulikjak
    Copy link
    Mannequin Author

    kulikjak mannequin commented Jun 28, 2019

    Python 3.8+ encodings are always normalized and thus no output variations handling is necessary (the code is no longer necessary).

    Python 3.7 (and possibly lower) can have variations in encodings - that should be fixed with codecs.lookup functions.

    @kulikjak kulikjak mannequin added 3.8 only security fixes 3.9 only security fixes labels Jun 28, 2019
    @kulikjak kulikjak mannequin changed the title Fix unexpected ASCII aliases in locale coercion tests. Improve encoding alias handling in locale coercion tests Jun 28, 2019
    @vstinner
    Copy link
    Member

    vstinner commented Jul 2, 2019

    New changeset c53173a by Victor Stinner (Jakub Kulík) in branch '3.7':
    bpo-37335: Fix test_c_locale_coercion to handle any ASCII alias (GH-14449)
    c53173a

    @vstinner
    Copy link
    Member

    vstinner commented Jul 2, 2019

    New changeset 61bf97e by Victor Stinner (Jakub Kulík) in branch 'master':
    bpo-37335, test_c_locale_coercion: Remove unnecessary code (GH-14447)
    61bf97e

    @miss-islington
    Copy link
    Contributor

    New changeset 518dc94 by Miss Islington (bot) in branch '3.8':
    bpo-37335, test_c_locale_coercion: Remove unnecessary code (GH-14447)
    518dc94

    @kulikjak kulikjak mannequin closed this as completed Jul 2, 2019
    @vstinner
    Copy link
    Member

    vstinner commented Jul 2, 2019

    Thanks Jakub Kulik. test_c_locale_coercion should pass again on 3.7, 3.8 and master branches on Solaris.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes tests Tests in the Lib/test dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants