classification
Title: A typo in a constant in cp932 codec
Type: behavior Stage: patch review
Components: Extension Modules Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: izbyshev, miss-islington, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2018-11-08 22:55 by izbyshev, last changed 2018-11-10 05:47 by serhiy.storchaka.

Pull Requests
URL Status Linked Edit
PR 10420 merged izbyshev, 2018-11-08 22:57
PR 10423 merged miss-islington, 2018-11-09 07:12
PR 10424 merged miss-islington, 2018-11-09 07:12
PR 10432 open izbyshev, 2018-11-09 13:39
PR 10433 merged izbyshev, 2018-11-09 14:07
Messages (12)
msg329489 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2018-11-08 22:55
UBSan with -fsanitize=implicit-integer-truncation found a suspicious one:
/scratch2/izbyshev/cpython/Modules/cjkcodecs/_codecs_jp.c:43:17: runtime error: implicit conversion from type 'unsigned int' of value 4294966013 (32-bit, unsigned) to type 'unsigned char' changed the value to 253 (8-bit, unsigned)

Indeed, the wrong constant was used (the correct one is used in corresponding decoder code at https://github.com/python/cpython/blob/fd512d76456b65c529a5bc58d8cfe73e4a10de7a/Modules/cjkcodecs/_codecs_jp.c#L105). In this case the truncation was harmless because only the lowest byte of the wrong result was used, and it was correct. But it probably makes sense to fix it if only to reduce noise from UBSan.

All Python versions are affected, but I've marked 3.8 only since I'm not sure what the policy for backporting such changes is.
msg329500 - (view) Author: miss-islington (miss-islington) Date: 2018-11-09 07:12
New changeset 7a69cf47a9bbc95f95fd67c982bff121b2a903cb by Miss Islington (bot) (Alexey Izbyshev) in branch 'master':
bpo-35194: Fix a wrong constant in cp932 codec (GH-10420)
https://github.com/python/cpython/commit/7a69cf47a9bbc95f95fd67c982bff121b2a903cb
msg329501 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-09 07:23
Maybe add asserts in OUTBYTE1() and similar macros to prevent similar errors in future?
msg329503 - (view) Author: miss-islington (miss-islington) Date: 2018-11-09 07:33
New changeset 49ee41f1c3934aa095e32fa751cdf3ba641ae34b by Miss Islington (bot) in branch '3.6':
bpo-35194: Fix a wrong constant in cp932 codec (GH-10420)
https://github.com/python/cpython/commit/49ee41f1c3934aa095e32fa751cdf3ba641ae34b
msg329504 - (view) Author: miss-islington (miss-islington) Date: 2018-11-09 07:35
New changeset 22234f1375d28803074405497ea61315fb37240d by Miss Islington (bot) in branch '3.7':
bpo-35194: Fix a wrong constant in cp932 codec (GH-10420)
https://github.com/python/cpython/commit/22234f1375d28803074405497ea61315fb37240d
msg329508 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-11-09 10:21
> Maybe add asserts in OUTBYTE1() and similar macros to prevent similar errors in future?

I like the idea. Make sure that: 0 <= ch <= 255?
msg329514 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2018-11-09 13:44
I've added 'assert' to macros. Since 'typeof' seems to be disallowed in Python, I've used 'unsigned int' as the type of an intermediate variable. 
Another alternative is 'assert(0 <= (c) && (c) <= 255)', but 'c' will be evaluated several times.
msg329515 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-09 13:50
Converting to unsigned int can unnecessary widen the value or truncate higher bits. On other side, testing "0 <= (c)" can emit a compiler warning if c is unsigned.

Maybe test "Py_CHARMASK(c) == (c)"?
msg329516 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2018-11-09 13:58
> Maybe test "Py_CHARMASK(c) == (c)"?

This is a good alternative if multiple evaluation of 'c' is acceptable. Though I'd prefer '(unsigned char)c == c' for this style of fixing because it is bit closer to what happens in '((*outbuf)[i]) = c'.
msg329520 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2018-11-09 14:27
I've checked than other macros in Modules/cjkcodecs/cjkcodecs.h don't avoid multiple argument evaluation (e.g. OUTCHAR2, _TRYMAP_ENC), so I've changed 'assert' to a variant of what Serhiy suggested.
msg329521 - (view) Author: Alexey Izbyshev (izbyshev) * Date: 2018-11-09 14:35
OUTCHAR2 is a wrong example. Other examples are NEXT_IN, NEXT_OUT.
msg329595 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-10 05:47
New changeset 0d165262d949440e5aea6533b10e19e4cd5cf12d by Serhiy Storchaka (Alexey Izbyshev) in branch '2.7':
[2.7] bpo-35194: Fix a wrong constant in cp932 codec. (GH-10420) (GH-10433)
https://github.com/python/cpython/commit/0d165262d949440e5aea6533b10e19e4cd5cf12d
History
Date User Action Args
2018-11-10 05:47:15serhiy.storchakasetmessages: + msg329595
2018-11-09 14:35:21izbyshevsetmessages: + msg329521
2018-11-09 14:27:12izbyshevsetmessages: + msg329520
2018-11-09 14:07:58izbyshevsetpull_requests: + pull_request9707
2018-11-09 13:58:03izbyshevsetmessages: + msg329516
2018-11-09 13:50:06serhiy.storchakasetmessages: + msg329515
2018-11-09 13:44:40izbyshevsetmessages: + msg329514
2018-11-09 13:39:51izbyshevsetpull_requests: + pull_request9706
2018-11-09 10:21:57vstinnersetmessages: + msg329508
2018-11-09 07:35:09miss-islingtonsetmessages: + msg329504
2018-11-09 07:33:13miss-islingtonsetmessages: + msg329503
2018-11-09 07:23:19serhiy.storchakasetmessages: + msg329501
versions: + Python 2.7, Python 3.6, Python 3.7
2018-11-09 07:12:41miss-islingtonsetpull_requests: + pull_request9705
2018-11-09 07:12:34miss-islingtonsetpull_requests: + pull_request9704
2018-11-09 07:12:10miss-islingtonsetnosy: + miss-islington
messages: + msg329500
2018-11-08 22:57:37izbyshevsetkeywords: + patch
stage: patch review
pull_requests: + pull_request9700
2018-11-08 22:55:24izbyshevcreate