This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author terry.reedy
Recipients lemburg, luis@luispedro.org, methane, rhettinger, serhiy.storchaka, terry.reedy, tim.peters, twouters, vstinner
Date 2018-02-16.19:57:30
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1518811050.87.0.467229070634.issue32846@psf.upfronthosting.co.za>
In-reply-to
Content
Luís, what you really need for the problem you outline is an immutable set with one operation, 'in', aside from create and delete.  If specialized to strings only, it could internally stores only character sequences, rather than Python objects.  I suspect someone has written such a class (in C), but did not find anything on pypi.python.org.

Lacking that, the following experiment suggests that you might be able to restore near O(n) time by partitioning the set of keys into a collection of sets.  My 16M set took 6.48 and 1.80 seconds.  Times 4 is 25.9 and 7.2.  My 64M set took 28 and 19 seconds, but 4 16M sets take 26.3 and 7.5 seconds, only about 5% more than the x4 target.

print(timeit.Timer(
    's = [{str(n) for n in range(i*1000000, (i+16)*1000000)}'
	        ' for i in (0, 16, 32, 48)]')
      .timeit(number=1))
print(timeit.Timer(
    'del s',
    's = [{str(n) for n in range(i*1000000, (i+16)*1000000)}'
	        ' for i in (0, 16, 32, 48)]')
      .timeit(number=1))
History
Date User Action Args
2018-02-16 19:57:30terry.reedysetrecipients: + terry.reedy, lemburg, tim.peters, twouters, rhettinger, vstinner, luis@luispedro.org, methane, serhiy.storchaka
2018-02-16 19:57:30terry.reedysetmessageid: <1518811050.87.0.467229070634.issue32846@psf.upfronthosting.co.za>
2018-02-16 19:57:30terry.reedylinkissue32846 messages
2018-02-16 19:57:30terry.reedycreate