Issue 28563: Arbitrary code execution in gettext.c2py

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/72749

classification

Title:	Arbitrary code execution in gettext.c2py
Type:	security	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.7, Python 3.6, Python 3.3, Python 3.4, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	serhiy.storchaka	Nosy List:	Carl Ekerot, Tim.Graham, christian.heimes, loewis, python-dev, serhiy.storchaka, xiang.zhang
Priority:	deferred blocker	Keywords:	patch

Created on 2016-10-30 16:58 by Carl Ekerot, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
gettext_c2py.patch	xiang.zhang, 2016-11-04 06:50		review
gettext_c2py_func.patch	Carl Ekerot, 2016-11-04 15:05	Based on Xiang Zhang's patch
gettext_c2py_v2.patch	xiang.zhang, 2016-11-06 17:37		review
gettext-parse-plural.patch	serhiy.storchaka, 2016-11-07 17:14		review
gettext-parse-plural-2.patch	serhiy.storchaka, 2016-11-08 08:58		review
gettext-non-integer-plural.patch	serhiy.storchaka, 2016-11-14 15:53		review
gettext-non-integer-plural-2.patch	serhiy.storchaka, 2016-11-14 16:32		review

Pull Requests
URL	Status	Linked	Edit
PR 552	closed	dstufft, 2017-03-31 16:36

Messages (32)
msg279734 - (view)	Author: Carl Ekerot (Carl Ekerot) *	Date: 2016-10-30 16:58
The c2py-function in the gettext module is seriously flawed in many ways due to its use of eval to create a plural function: return eval('lambda n: int(%s)' % plural) My first discovery was that nothing prevents an input plural string that resembles a function call: gettext.c2py("n()")(lambda: os.system("sh")) This is of course a low risk bug, since it requires control of both the plural function string and the argument. Gaining arbitrary code execution using only the plural function string requires that the security checks are bypassed. The security checks utilize the tokenize module and makes sure that no NAME-tokens that are not "n" exist in the string. However, it does not check if the parser succeeds without any token.ERRORTOKEN being present. Hence, the following input will pass the security checks: gettext.c2py( '"(eval(foo) && ""' )(0) ----> 1 gettext.c2py( '"(eval(foo) && ""' )(0) gettext.pyc in <lambda>(n) NameError: global name 'foo' is not defined It will pass since it recognizes the entire input as a STRING token, and subsequently fails with an ERRORTOKEN. Passing a string in the argument to eval will however break the exploit since the parser will read the start-of-string in the eval argument as end-of-string, and the eval argument will be read as a NAME-token. Instead of passing a string to eval, we can build a string from characters in the docstrings available in the context of the gettext module: gettext.c2py('"(eval(' 'os.__doc__[152:155] + ' # os. 'os.__doc__[46:52] + ' # system 'os.__doc__[318] + ' # ( 'os.__doc__[55] + ' # ' 'os.__doc__[10] + ' # s 'os.__doc__[42] + ' # h 'os.__doc__[55] + ' # ' 'os.__doc__[329]' # ) ') && ""')(0) This will successfully spawn a shell in Python 2.7.11. Bonus: With the new string interpolation in Python 3.7, exploiting gettext.c2py becomes trivial: gettext.c2py('f"{os.system(\'sh\')}"')(0) The tokenizer will recognize the entire format-string as just a string, thus bypassing the security checks. To mitigate these vulnerabilities, eval should be avoided by implementing a custom parser for the gettext plural function DSL.
msg280037 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-04 06:50
gettext_c2py.patch tries to avoid the problem. It still uses eval but manually parse the expression using tokens extracted from gettext. Tests are passed. But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'.
msg280048 - (view)	Author: Carl Ekerot (Carl Ekerot) *	Date: 2016-11-04 15:05
It doesn't solve the case when an identifier or number is used as a function: >>> import os >>> gettext.c2py("n()")(lambda: os.system("sh")) $ 0 >>> gettext.c2py("1()")(0) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<string>", line 1, in <lambda> TypeError: 'int' object is not callable This is more of an unintended behavior than a security issue. Xiang Zhang: I've created a patch based on yours which handles the above case. I've also added a corresponding test case. Imo it would be even better if we could avoid eval. One possible (and safe) way would be to construct a safe subset of Python using the ast module. This would however still require that the C-style syntax is translated to Python. As you mention, there are issues parsing and translating nested ternary operators, and I doubt it will be possible to cover all cases with the regex replace utilized today.
msg280082 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-04 20:47
> But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'. What if make repeated replacements with regular expression r'([^?:]?)\?([^?:]?):([^?:]*)'?
msg280084 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-04 20:58
> It doesn't solve the case when an identifier or number is used as a function: In the first case we should convert an argument to integer. ns = {} exec('''if True: def func(arg): n = int(arg) return {} '''.format(plural), ns) return ns['func'] Or raise an exception if argument is not integer. In the second case I think we can just left it as is. This is not valid C expression. Or we can add try/except in above code and convert TypeError to ValueError if this is more preferable exception.
msg280103 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-05 07:37
> gettext.c2py("n()")(lambda: os.system("sh")) > gettext.c2py("1()")(0) Empty parentheses should be disallowed. Function calls are not allowed in plural expression. And non-integer argument should be disallowed either, just as Serhiy's example shows. > What if make repeated replacements with regular expression r'([^?:]?)\?([^?:]?):([^?:]*)'? How does it work for '1?2:3?4:5'? :-( I am considering a parser.
msg280104 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-05 08:29
> How does it work for '1?2:3?4:5'? '1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5' But there are other problems. Precedence of some operators is different in C and Python. Chained comparison in Python cause different result that in C (e.g. '2 == 2 == 2'). Seems there is no other way besides a simple parser. gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed. It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue. I'm going to commit a variant of gettext_c2py.patch before 3.6 release if there will be not better patches.
msg280107 - (view)	Author: Carl Ekerot (Carl Ekerot) *	Date: 2016-11-05 09:56
Verified gettext.c2py with gettext_c2py.patch applied agains the plural forms actually used in localization, listed over at http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html. I tested all of the none-trivial forms, and from what I can tell they generate valid syntax and are correct.
msg280112 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-05 13:36
> '1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5' This is not right. It's right associative so it should be 1?2:(3?4:5) -> 1?2:(4 if 3 else 5) -> 2 if 1 else (4 if 3 else 5) > It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue. Agree. But I am interested in trying. > gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed. It gets drawbacks so I don't include tests. I'll add in next try.
msg280118 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2016-11-05 17:49
The gettext module might be vulnerable to f-string attacks, too. Also see #18317.
msg280119 - (view)	Author: Carl Ekerot (Carl Ekerot) *	Date: 2016-11-05 18:00
> The gettext module might be vulnerable to f-string attacks It is. See the example in the first comment: gettext.c2py('f"{os.system(\'sh\')}"')(0) This vulnerability seems to be solved in Xiang's patch. The DoS aspect is interesting though, since there's no constraints against malicious use of the power-operator, say "999..9". This too would be solved by implementing a parser with only simple arithmetics.
msg280120 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2016-11-05 18:17
Argh, sorry. I meant to write "The gettext module might be vulnerable to more than f-string attacks.". May I suggest that you have a look at my old patch? It uses an AST visitor to inspect the AST of a gettext plural expression. It allows only a limited set of AST types as well as limited amount of expressions. I consider it a superior solution and a fix for more generic attacks. I haven't tested my patch with f-strings yet. It either refuses f-strings FormattedValue already or can be easily modified to reject it.
msg280122 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-05 19:21
Christian, I think our patches are quite similar in function. They only allow limited tokens. > I consider it a superior solution and a fix for more generic attacks Mine now still allows **. But it can be easily fixed. But both our patches still translate a C expression to Python and still suffer from nested ternary operator and different semantics between C and Python, e.g. (2==2==2 as Serhiy notes). :-( I plan to try a simple parser.
msg280157 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-06 17:37
gettext_c2py_v2.patch implements a simple C expression parser. More tests are included. Carl, hope you are willing to test it.
msg280191 - (view)	Author: Carl Ekerot (Carl Ekerot) *	Date: 2016-11-07 08:37
Judging by the code, this seems to be a much more rigid implementation. I've only run the unit tests and some variations of my initial examples, and everything seems to work as intended. Will look at it more closely this afternoon. One thing that caught my attention in the patch is that gettext.c2py is removed entirely. I know that this function is not exposed in the docs or in __all__, but it still has "public scope" and it's likely used directly in the wild (googling the signature confirms this). Perhaps it should give a DeprectationWarning and delegate to _Plural?
msg280196 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-07 09:13
> Perhaps it should give a DeprectationWarning and delegate to _Plural? I hold a conservative opinion about this. c2py in my mind should be a inner help method. It's not documented so if there are users using it, they are risking changes. And as reported here, it's buggy and dangerous.
msg280211 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-07 17:14
Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names. Here is simpler implementation with using only one recursive function.
msg280276 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-08 03:40
> Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names. It's fine. That's a Pratt parser. Yes, the names are strange. Your patch looks more simpler. I left several comments.
msg280287 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-08 08:58
Just for reference: https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y Yet one security issue is that too deep recursion can cause MemoryError or even a crash in Python compiler. My patch creates too much nested parenthesis. Updated patch minimize using parenthesis to minimum and adds guards against too deep recursion.
msg280289 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-08 10:01
LGTM. And I expect there could be a comment about the special decimal number.
msg280293 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-08 11:18
What a comment you need Xiang? Isn't existing comment enough?
msg280299 - (view)	Author: Carl Ekerot (Carl Ekerot) *	Date: 2016-11-08 12:37
Looks good to me. It behaves as intended on every input I can think of.
msg280307 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-08 13:13
> What a comment you need Xiang? Isn't existing comment enough? Serhiy, I mean the case a number starting with 0, e.g. 0123. The plural form is a C expression and in C 0123 is an octal number. c2py now interprets it as a decimal number.
msg280337 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-11-08 19:28
New changeset e0cc3fadd7b3 by Serhiy Storchaka in branch '2.7': Issue #28563: Fixed possible DoS and arbitrary code execution when handle https://hg.python.org/cpython/rev/e0cc3fadd7b3 New changeset 7e66c5dc4218 by Serhiy Storchaka in branch '3.3': Issue #28563: Fixed possible DoS and arbitrary code execution when handle https://hg.python.org/cpython/rev/7e66c5dc4218 New changeset 730cf8cdf02c by Serhiy Storchaka in branch '3.4': Issue #28563: Fixed possible DoS and arbitrary code execution when handle https://hg.python.org/cpython/rev/730cf8cdf02c New changeset e0dd4cc9661a by Serhiy Storchaka in branch '3.5': Issue #28563: Fixed possible DoS and arbitrary code execution when handle https://hg.python.org/cpython/rev/e0dd4cc9661a New changeset 7ea64ab9a5c8 by Serhiy Storchaka in branch '3.6': Issue #28563: Fixed possible DoS and arbitrary code execution when handle https://hg.python.org/cpython/rev/7ea64ab9a5c8 New changeset 4ca9a4f96edc by Serhiy Storchaka in branch 'default': Issue #28563: Fixed possible DoS and arbitrary code execution when handle https://hg.python.org/cpython/rev/4ca9a4f96edc
msg280780 - (view)	Author: Tim Graham (Tim.Graham) *	Date: 2016-11-14 15:03
Hi, this broke a couple tests with Django because it's passing number as a float rather than an integer. For example: ====================================================================== ERROR: test_localized_formats (template_tests.filter_tests.test_filesizeformat.FunctionTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tim/code/django/tests/template_tests/filter_tests/test_filesizeformat.py", line 28, in test_localized_formats self.assertEqual(filesizeformat(1023), '1023\xa0Bytes') File "/home/tim/code/django/django/template/defaultfilters.py", line 895, in filesizeformat value = ungettext("%(size)d byte", "%(size)d bytes", bytes_) % {'size': bytes_} File "/home/tim/code/django/django/utils/translation/__init__.py", line 91, in ungettext return _trans.ungettext(singular, plural, number) File "/home/tim/code/django/django/utils/translation/trans_real.py", line 385, in ngettext return do_ntranslate(singular, plural, number, 'ngettext') File "/home/tim/code/django/django/utils/translation/trans_real.py", line 372, in do_ntranslate return getattr(t, translation_function)(singular, plural, number) File "/home/tim/code/cpython/Lib/gettext.py", line 441, in ngettext tmsg = self._catalog[(msgid1, self.plural(n))] File "<string>", line 4, in func ValueError: Plural value must be an integer, got 1023.0. Can you advise if this patch could be adapted or if Django should adapt? By the way, I used this patch to get a more useful error message: diff --git a/Lib/gettext.py b/Lib/gettext.py index 7032efa..2076a3f 100644 --- a/Lib/gettext.py +++ b/Lib/gettext.py @@ -185,7 +185,7 @@ def c2py(plural): exec('''if True: def func(n): if not isinstance(n, int): - raise ValueError('Plural value must be an integer.') + raise ValueError('Plural value must be an integer, got %%s. return int(%s) ''' % result, ns) return ns['func']
msg280781 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-11-14 15:29
GNU gettext only allows the plural value(n) to be an integer(unsigned long int). Django deliberately converts the value to long in filesizeformat. Any reason not to use int?
msg280784 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-14 15:53
Proposed patch makes gettext accepting non-integer numbers. But DeprecationWarning is emitted if this number have fractional part (because in this case the result of current code can be different from the result of old code). I think Django tests should be passed without errors and warnings. In future versions (probably in 3.7) this warning will be replaced with TypeError or ValueError, and new warning will be added for all non-integer numbers.
msg280789 - (view)	Author: Tim Graham (Tim.Graham) *	Date: 2016-11-14 16:23
Thanks, that does fix that first test. There is one more that still fails: $ python -Wall tests/runtests.py humanize_tests.tests.HumanizeTests.test_i18n_intword Testing against Django installed in '/home/tim/code/django/django' with up to 3 processes Creating test database for alias 'default'... Creating test database for alias 'other'... /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2 tmsg = self._catalog[(msgid1, self.plural(n))] /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2 tmsg = self._catalog[(msgid1, self.plural(n))] F ====================================================================== FAIL: test_i18n_intword (humanize_tests.tests.HumanizeTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tim/code/django/tests/humanize_tests/tests.py", line 133, in test_i18n_intword self.humanize_tester(test_list, result_list, 'intword') File "/home/tim/code/django/tests/humanize_tests/tests.py", line 39, in humanize_tester msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result)) AssertionError: '1,2 Million' != '1,2 Millionen' : intword test failed, produced '1,2 Million', should've produced '1,2 Millionen' I think the relevant code is https://github.com/django/django/blob/cbae4d31847d75d889815bfe7c04af035f45e28d/django/contrib/humanize/templatetags/humanize.py#L60-L63. I'm not too familiar with this code, but I'll try to get a better explanation if it's not clear to you.
msg280791 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-14 16:32
What about following patch?
msg280792 - (view)	Author: Tim Graham (Tim.Graham) *	Date: 2016-11-14 16:49
Yes, that fixes the second test. Current warning (with stacklevel=3): /home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.29 tmsg = self._catalog[(msgid1, self.plural(n))] Possibly the stacklevel should instead be 4: /home/tim/code/django/django/utils/translation/trans_real.py:373: DeprecationWarning: Plural value must be an integer, got 1.29 return getattr(t, translation_function)(singular, plural, number)
msg280800 - (view)	Author: Roundup Robot (python-dev)	Date: 2016-11-14 17:31
New changeset 5b6cde995b3a by Serhiy Storchaka in branch '3.3': Issue #28563: Make plural form selection more lenient and accepting https://hg.python.org/cpython/rev/5b6cde995b3a New changeset 6ca91a14a555 by Serhiy Storchaka in branch '2.7': Issue #28563: Make plural form selection more lenient and accepting https://hg.python.org/cpython/rev/6ca91a14a555 New changeset f78a05cda5aa by Serhiy Storchaka in branch '3.4': Issue #28563: Make plural form selection more lenient and accepting https://hg.python.org/cpython/rev/f78a05cda5aa New changeset 6a2754055ff8 by Serhiy Storchaka in branch '3.5': Issue #28563: Make plural form selection more lenient and accepting https://hg.python.org/cpython/rev/6a2754055ff8 New changeset 4c201c65ce5d by Serhiy Storchaka in branch '3.6': Issue #28563: Make plural form selection more lenient and accepting https://hg.python.org/cpython/rev/4c201c65ce5d New changeset d0efb8532589 by Serhiy Storchaka in branch 'default': Issue #28563: Make plural form selection more lenient and accepting https://hg.python.org/cpython/rev/d0efb8532589
msg280802 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-11-14 17:39
Yes, you are right, the stacklevel should be 4. Thank you for your report and testing Tim. Committed the patch without any deprecation. Will add a deprecation in separate issue.

History
Date	User	Action	Args
2022-04-11 14:58:38	admin	set	github: 72749
2017-03-31 16:36:31	dstufft	set	pull_requests: + pull_request1039
2016-11-14 17:39:32	serhiy.storchaka	set	status: open -> closed messages: + msg280802 stage: patch review -> resolved
2016-11-14 17:31:44	python-dev	set	messages: + msg280800
2016-11-14 16:49:45	Tim.Graham	set	messages: + msg280792
2016-11-14 16:32:05	serhiy.storchaka	set	files: + gettext-non-integer-plural-2.patch messages: + msg280791
2016-11-14 16:23:08	Tim.Graham	set	messages: + msg280789
2016-11-14 15:53:29	serhiy.storchaka	set	status: closed -> open files: + gettext-non-integer-plural.patch messages: + msg280784 stage: resolved -> patch review
2016-11-14 15:29:07	xiang.zhang	set	messages: + msg280781
2016-11-14 15:03:39	Tim.Graham	set	nosy: + Tim.Graham messages: + msg280780
2016-11-08 19:32:58	serhiy.storchaka	link	issue18317 superseder
2016-11-08 19:29:34	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2016-11-08 19:28:55	python-dev	set	nosy: + python-dev messages: + msg280337
2016-11-08 13:13:36	xiang.zhang	set	messages: + msg280307
2016-11-08 12:37:38	Carl Ekerot	set	messages: + msg280299
2016-11-08 11:18:34	serhiy.storchaka	set	messages: + msg280293
2016-11-08 10:01:30	xiang.zhang	set	messages: + msg280289
2016-11-08 08:58:07	serhiy.storchaka	set	files: + gettext-parse-plural-2.patch messages: + msg280287
2016-11-08 03:40:06	xiang.zhang	set	messages: + msg280276
2016-11-07 17:14:33	serhiy.storchaka	set	files: + gettext-parse-plural.patch messages: + msg280211
2016-11-07 09:13:09	xiang.zhang	set	messages: + msg280196
2016-11-07 08:37:15	Carl Ekerot	set	messages: + msg280191
2016-11-06 21:55:53	serhiy.storchaka	set	assignee: serhiy.storchaka stage: patch review
2016-11-06 17:37:47	xiang.zhang	set	files: + gettext_c2py_v2.patch messages: + msg280157
2016-11-05 19:21:44	xiang.zhang	set	messages: + msg280122
2016-11-05 18:17:49	christian.heimes	set	messages: + msg280120
2016-11-05 18:00:31	Carl Ekerot	set	messages: + msg280119
2016-11-05 17:49:56	christian.heimes	set	nosy: + christian.heimes messages: + msg280118
2016-11-05 13:36:53	xiang.zhang	set	messages: + msg280112
2016-11-05 09:56:13	Carl Ekerot	set	messages: + msg280107
2016-11-05 08:29:03	serhiy.storchaka	set	priority: high -> deferred blocker messages: + msg280104
2016-11-05 07:37:11	xiang.zhang	set	messages: + msg280103
2016-11-04 20:58:07	serhiy.storchaka	set	messages: + msg280084
2016-11-04 20:47:51	serhiy.storchaka	set	messages: + msg280082
2016-11-04 15:05:47	Carl Ekerot	set	files: + gettext_c2py_func.patch messages: + msg280048
2016-11-04 06:50:57	xiang.zhang	set	files: + gettext_c2py.patch keywords: + patch messages: + msg280037
2016-10-31 11:23:54	xiang.zhang	set	nosy: + xiang.zhang
2016-10-30 18:17:07	serhiy.storchaka	set	priority: normal -> high nosy: + serhiy.storchaka
2016-10-30 17:25:30	SilentGhost	set	nosy: + loewis
2016-10-30 16:58:41	Carl Ekerot	create