Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A typo in a constant in cp932 codec #79375

Closed
izbyshev mannequin opened this issue Nov 8, 2018 · 13 comments
Closed

A typo in a constant in cp932 codec #79375

izbyshev mannequin opened this issue Nov 8, 2018 · 13 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@izbyshev
Copy link
Mannequin

izbyshev mannequin commented Nov 8, 2018

BPO 35194
Nosy @vstinner, @methane, @serhiy-storchaka, @izbyshev, @miss-islington
PRs
  • bpo-35194: Fix a wrong constant in cp932 codec #10420
  • [3.7] bpo-35194: Fix a wrong constant in cp932 codec (GH-10420) #10423
  • [3.6] bpo-35194: Fix a wrong constant in cp932 codec (GH-10420) #10424
  • bpo-35194: cjkcodec: check the encoded value is not truncated #10432
  • [2.7] bpo-35194: Fix a wrong constant in cp932 codec (GH-10420) #10433
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-29.07:48:59.564>
    created_at = <Date 2018-11-08.22:55:24.583>
    labels = ['extension-modules', '3.8', 'type-bug', '3.7']
    title = 'A typo in a constant in cp932 codec'
    updated_at = <Date 2019-03-29.07:49:13.054>
    user = 'https://github.com/izbyshev'

    bugs.python.org fields:

    activity = <Date 2019-03-29.07:49:13.054>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-29.07:48:59.564>
    closer = 'methane'
    components = ['Extension Modules']
    creation = <Date 2018-11-08.22:55:24.583>
    creator = 'izbyshev'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35194
    keywords = ['patch']
    message_count = 13.0
    messages = ['329489', '329500', '329501', '329503', '329504', '329508', '329514', '329515', '329516', '329520', '329521', '329595', '339100']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'methane', 'serhiy.storchaka', 'izbyshev', 'miss-islington']
    pr_nums = ['10420', '10423', '10424', '10432', '10433']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35194'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7', 'Python 3.8']

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Nov 8, 2018

    UBSan with -fsanitize=implicit-integer-truncation found a suspicious one:
    /scratch2/izbyshev/cpython/Modules/cjkcodecs/_codecs_jp.c:43:17: runtime error: implicit conversion from type 'unsigned int' of value 4294966013 (32-bit, unsigned) to type 'unsigned char' changed the value to 253 (8-bit, unsigned)

    Indeed, the wrong constant was used (the correct one is used in corresponding decoder code at

    OUTCHAR(0xf8f1 - 0xfd + c);
    ). In this case the truncation was harmless because only the lowest byte of the wrong result was used, and it was correct. But it probably makes sense to fix it if only to reduce noise from UBSan.

    All Python versions are affected, but I've marked 3.8 only since I'm not sure what the policy for backporting such changes is.

    @izbyshev izbyshev mannequin added 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Nov 8, 2018
    @miss-islington
    Copy link
    Contributor

    New changeset 7a69cf4 by Miss Islington (bot) (Alexey Izbyshev) in branch 'master':
    bpo-35194: Fix a wrong constant in cp932 codec (GH-10420)
    7a69cf4

    @serhiy-storchaka
    Copy link
    Member

    Maybe add asserts in OUTBYTE1() and similar macros to prevent similar errors in future?

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 9, 2018
    @miss-islington
    Copy link
    Contributor

    New changeset 49ee41f by Miss Islington (bot) in branch '3.6':
    bpo-35194: Fix a wrong constant in cp932 codec (GH-10420)
    49ee41f

    @miss-islington
    Copy link
    Contributor

    New changeset 22234f1 by Miss Islington (bot) in branch '3.7':
    bpo-35194: Fix a wrong constant in cp932 codec (GH-10420)
    22234f1

    @vstinner
    Copy link
    Member

    vstinner commented Nov 9, 2018

    Maybe add asserts in OUTBYTE1() and similar macros to prevent similar errors in future?

    I like the idea. Make sure that: 0 <= ch <= 255?

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Nov 9, 2018

    I've added 'assert' to macros. Since 'typeof' seems to be disallowed in Python, I've used 'unsigned int' as the type of an intermediate variable.
    Another alternative is 'assert(0 <= (c) && (c) <= 255)', but 'c' will be evaluated several times.

    @serhiy-storchaka
    Copy link
    Member

    Converting to unsigned int can unnecessary widen the value or truncate higher bits. On other side, testing "0 <= (c)" can emit a compiler warning if c is unsigned.

    Maybe test "Py_CHARMASK(c) == (c)"?

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Nov 9, 2018

    Maybe test "Py_CHARMASK(c) == (c)"?

    This is a good alternative if multiple evaluation of 'c' is acceptable. Though I'd prefer '(unsigned char)c == c' for this style of fixing because it is bit closer to what happens in '((*outbuf)[i]) = c'.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Nov 9, 2018

    I've checked than other macros in Modules/cjkcodecs/cjkcodecs.h don't avoid multiple argument evaluation (e.g. OUTCHAR2, _TRYMAP_ENC), so I've changed 'assert' to a variant of what Serhiy suggested.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Nov 9, 2018

    OUTCHAR2 is a wrong example. Other examples are NEXT_IN, NEXT_OUT.

    @serhiy-storchaka
    Copy link
    Member

    New changeset 0d16526 by Serhiy Storchaka (Alexey Izbyshev) in branch '2.7':
    [2.7] bpo-35194: Fix a wrong constant in cp932 codec. (GH-10420) (GH-10433)
    0d16526

    @methane methane closed this as completed Mar 29, 2019
    @methane
    Copy link
    Member

    methane commented Mar 29, 2019

    New changeset 5f45979 by Inada Naoki (Alexey Izbyshev) in branch 'master':
    bpo-35194: cjkcodec: check the encoded value is not truncated (GH-10432)
    5f45979

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants