
classification
Title: unicodedata_UCD_lookup() has theoretical buffer overflow
Type: behavior
Stage: patch review
Components: Extension Modules
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, christian.heimes, ezio.melotti, lemburg, pitrou, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2015-04-18 22:32 by christian.heimes, last changed 2022-04-11 14:58 by admin.

Files
File name, Uploaded
unicode_name_maxlen.patch, christian.heimes, 2015-04-18 22:32
unicode_name_maxlen_trunc.patch, serhiy.storchaka, 2015-12-19 23:12
Messages (2)
msg241461 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2015-04-18 22:32
Coverity has found a potential buffer overflow in the unicodedata module. The function unicodedata_UCD_lookup() calls _getcode(), which calls _cmpname(). _cmpname() copies data into a fixed-size buffer of length NAME_MAXLEN. Neither lookup() nor _getcode() limits name_length to NAME_MAXLEN. Therefore the buffer could theoretically overflow.

In practice the buffer overflow can't be exploited because Tools/unicode/makeunicodedata.py already limits the maximum name length. I would still like to fix the bug because it is low-hanging fruit. In most versions of Python the code already checks that name_length fits in INT_MAX.

CID 1295028 (#1 of 1): Out-of-bounds access (OVERRUN)
overrun-call: Overrunning callee's array of size 256 by passing argument (int)name_length (which evaluates to 2147483647) in call to _getcode
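
The missing bound can be modelled in a few lines of Python. This is only a sketch of the kind of guard discussed above, not the contents of unicode_name_maxlen.patch: the wrapper name guarded_lookup is hypothetical, NAME_MAXLEN = 256 is assumed from the array size in the Coverity report, and the exact comparison and error text used by the real C code may differ.

```python
import unicodedata

# Assumed value, taken from "array of size 256" in the Coverity report.
NAME_MAXLEN = 256


def guarded_lookup(name: str) -> str:
    """Hypothetical wrapper: reject over-long names before searching.

    The C code works on the UTF-8 bytes of the name, so the bound is
    applied to the encoded length, not to the number of characters.
    """
    encoded = name.encode("utf-8")
    if len(encoded) > NAME_MAXLEN:
        # No real character name is this long, so the lookup must fail.
        raise KeyError("name too long")
    return unicodedata.lookup(name)
```

With a guard of this shape, guarded_lookup("LATIN SMALL LETTER A") still returns "a", while a 257-byte name is rejected before any fixed-size buffer would be involved.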
msg256744 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-12-19 23:12
Currently the error message virtually always contains the name (unless its UTF-8 representation is longer than INT_MAX bytes). With unicode_name_maxlen.patch, the message no longer contains the name when the name is a few hundred, or even just a few dozen, characters long.

The proposed patch makes the error message always contain the name, truncated to NAME_MAXLEN bytes.

>>> import unicodedata
>>> name = ''.join(map(chr, range(0x2c80, 0x2ce4)))
>>> unicodedata.lookup(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: "undefined character name 'ⲀⲁⲂⲃⲄⲅⲆⲇⲈⲉⲊⲋⲌⲍⲎⲏⲐⲑⲒⲓⲔⲕⲖⲗⲘⲙⲚⲛⲜⲝⲞⲟⲠⲡⲢⲣⲤⲥⲦⲧⲨⲩⲪⲫⲬⲭⲮⲯⲰⲱⲲⲳⲴⲵⲶⲷⲸⲹⲺⲻⲼⲽⲾⲿⳀⳁⳂⳃⳄⳅⳆⳇⳈⳉⳊⳋⳌⳍⳎⳏⳐⳑⳒⳓⳔ�...'"
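
The trailing replacement character in the message above is consistent with a cut at a byte boundary rather than a character boundary. The following sketch reproduces the arithmetic in pure Python; it assumes NAME_MAXLEN is 256 and that the truncated bytes are decoded with replacement of incomplete sequences, which may not match the patch's C code exactly.

```python
# Assumed value; the real constant lives in the generated C headers.
NAME_MAXLEN = 256

# Same 100-character name as in the session above.
name = ''.join(map(chr, range(0x2c80, 0x2ce4)))
encoded = name.encode('utf-8')       # each of these characters takes 3 bytes

# Cut at NAME_MAXLEN bytes: 85 whole characters (255 bytes) plus the
# first byte of the 86th character.
truncated = encoded[:NAME_MAXLEN]

# The dangling lead byte cannot be decoded and becomes U+FFFD.
shown = truncated.decode('utf-8', errors='replace')

print(len(name), len(encoded), len(shown))
print(shown[-1] == '\ufffd')
```

Under these assumptions the name is 100 characters and 300 bytes long, and the decoded prefix has 86 characters: the first 85 of the name followed by one U+FFFD.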
History
Date User Action Args
2022-04-11 14:58:15 admin set github: 68185
2015-12-19 23:12:32 serhiy.storchaka set files: + unicode_name_maxlen_trunc.patch; messages: + msg256744; components: + Extension Modules; versions: + Python 3.6, - Python 3.3, Python 3.4
2015-04-18 22:32:38 christian.heimes create