Author dmalcolm
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, haypo, jcea, lemburg, loewis, mark.dickinson, merwok, neologix, pitrou, skorgu, skrah, terry.reedy, tim.peters, v+python, zbysz
Date 2012-01-28.05:13:36
Message-id <1327727572.2219.95.camel@surprise>
In-reply-to <1327719792.36.0.734414179886.issue13703@psf.upfronthosting.co.za>
Content
On Sat, 2012-01-28 at 03:03 +0000, Benjamin Peterson wrote:
> Benjamin Peterson <benjamin@python.org> added the comment:
> 
> For the record, Barry and I agreed on what we'll be doing for stable releases [1]. David says he should have a patch soon.
> 
> [1] http://mail.python.org/pipermail/python-dev/2012-January/115892.html

I'm attaching what I've got so far (need sleep).

Attached patch is for 3.1 and adds opt-in hash randomization.

It's based on haypo's work: random-8.patch (thanks haypo!), with
additional changes as seen in my backport of that to 2.7:
http://bugs.python.org/issue13703#msg151847

* The randomization is off by default, and must be enabled by setting
a new environment variable PYTHONHASHRANDOMIZATION to a non-empty
string. (If it is set, PYTHONHASHSEED also still works, if provided, in
the same way as in haypo's patch.)

* All of the various "Py_hash_t" become "long" again (Py_hash_t was
added in 3.2: issue9778)

* I expanded the randomization from just PyUnicodeObject to also cover
PyBytesObject, and the types within datetime.

* It doesn't cover numeric types; see my explanation in msg151847; also
see http://bugs.python.org/issue13703#msg151870

* It doesn't yet cover the embedded copy of expat.

* I moved the hash tests from test_unicode.py to test_hash.py

* I tweaked the wording of the descriptions of the envvars in
cmdline.rst and the manpage

* I've tested it on a 32-bit box, and it successfully protects against
one set of test data (four cases: assembling then reading back items by
key for a dict vs. a set, and for bytes vs. str, with 200000 distinct
items of data which all have hash() == 0 in an unmodified build; each
case takes about 1.5 seconds on this --with-pydebug build, vs. on the
order of hours without the patch).

* I haven't yet benchmarked it

* Only tested on Linux (Fedora x86_64 and i686).  I don't know the
impact on Windows (e.g. startup time without the env vars vs. with
them).
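
As a rough illustration of the opt-in rule in the first bullet above, here
is a minimal Python model of how the two environment variables interact.
The helper name hash_randomization_config is hypothetical, and it assumes
PYTHONHASHSEED holds a plain integer; the actual patch implements this in
C, and the exact seed parsing in haypo's patch is not modeled here.

```python
import os


def hash_randomization_config(environ=None):
    """Return (enabled, seed) following the opt-in rule described above.

    Hypothetical helper for illustration only; it is not part of the patch.
    """
    if environ is None:
        environ = os.environ

    # Randomization is opt-in: it stays off unless the variable is set
    # to a non-empty string.
    if not environ.get("PYTHONHASHRANDOMIZATION", ""):
        return (False, None)

    # PYTHONHASHSEED is only consulted once randomization is enabled.
    seed = environ.get("PYTHONHASHSEED", "")
    if seed:
        # Assumption: the seed is a plain decimal integer.
        return (True, int(seed))

    # None stands for "no fixed seed given, choose one at random".
    return (True, None)
```

Under this model, setting PYTHONHASHSEED alone has no effect: the seed
only matters once PYTHONHASHRANDOMIZATION is also set.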

I'm seeing one failing test:
======================================================================
FAIL: test_clear_dict_in_ref_cycle (__main__.ModuleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/david/coding/python-hg/cpython-3.1-hash-randomization/Lib/test/test_module.py", line 79, in test_clear_dict_in_ref_cycle
    self.assertEqual(destroyed, [1])
AssertionError: Lists differ: [] != [1]
Files
File name Uploaded
optin-hash-randomization-for-3.1-dmalcolm-2012-01-27-001.patch dmalcolm, 2012-01-28.05:13:27