This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-18.22:52:45
SpamBayes Score 0.0
Marked as misclassified No
Message-id <CAMpsgwav+2G-BZ4fw_7HU9zev-L2zk4Jwzsb_UBpjn-TNPDtCA@mail.gmail.com>
In-reply-to <4F1716A6.7070903@egenix.com>
Content
> Don't you think that the number of corrections you have to apply in order
> to get the tests working again shows how much impact such a change would
> have in real-world applications ?

Let see the diffstat:

 Doc/using/cmdline.rst                       |    7
 Include/pythonrun.h                         |    2
 Include/unicodeobject.h                     |    6
 Lib/json/__init__.py                        |    4
 Lib/os.py                                   |   17 -
 Lib/packaging/create.py                     |    7
 Lib/packaging/tests/test_create.py          |   18 -
 Lib/test/mapping_tests.py                   |    2
 Lib/test/regrtest.py                        |    5
 Lib/test/test_builtin.py                    |    1
 Lib/test/test_dis.py                        |   36 ++-
 Lib/test/test_gdb.py                        |   11 -
 Lib/test/test_inspect.py                    |    1
 Lib/test/test_os.py                         |   35 ++-
 Lib/test/test_set.py                        |   25 ++
 Lib/test/test_unicode.py                    |   39 ++++
 Lib/test/test_urllib.py                     |   16 -
 Lib/test/test_urlparse.py                   |    6
 Lib/tkinter/test/test_ttk/test_functions.py |    2
 Makefile.pre.in                             |    1
 Modules/posixmodule.c                       |  126 ++-----------
 Objects/unicodeobject.c                     |   20 +-
 PCbuild/pythoncore.vcproj                   |    4
 Python/pythonrun.c                          |    3
 Python/random.c                             |  268 ++++++++++++++++++++++++++++
 25 files changed, 488 insertions(+), 174 deletions(-)

Except Lib/packaging/create.py, all other changes are related to the
introduction of the randomized hash function, or fix tests... Even
Lib/packaging/create.py change is related to fixing tests. The test
can be changed differently, but I like the idea of having always the
same output in packaging (e.g. it is more readable for the user if
files are sorted).

I expected to have to do something on multiprocessing, but nope, it
doesn't care of the hash value.

So I expect something similar in applications: no change in the
applications, but a lot of hacks/tricks in tests.

> Perhaps we should start to think about a compromise: make both the
> collision counting and the hash seeding optional and let the user
> decide which option is best.

I don't think that we need two fixes for a single vulnerability (in
the same Python version), one is enough. If we decide to count
collisions, the randomized hash idea can be simply dropped. But we may
use a different fix for Python 3.3 and for stable versions (e.g. count
collisions for stable versions and use randomized hash for 3.3).

> BTW: The patch still includes the unnecessary _Py_unicode_hash_secret.suffix
> which needlessly complicates the code and doesn't any additional
> protection against hash value collisions

How does it complicate the code? It adds an extra XOR to hash(str) and
4 or 8 bytes in memory, that's all. It is more difficult to compute
the secret from hash(str) output if there is a prefix *and* a suffix.
If there is only a prefix, knowning a single hash(str) value is just
enough to retrieve directly the secret.
.
> I don't think it affects more than 0.01% of applications/users :)

It would help to try a patched Python on a real world application like
Django to realize how much code is broken (or not) by a randomized
hash function.
History
Date User Action Args
2012-01-18 22:52:46vstinnersetrecipients: + vstinner, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-18 22:52:46vstinnerlinkissue13703 messages
2012-01-18 22:52:45vstinnercreate