
Author terry.reedy
Recipients rhettinger, terry.reedy, vstinner
Date 2018-02-15.00:23:33
Message-id <1518654213.84.0.467229070634.issue32846@psf.upfronthosting.co.za>
Content
https://metarabbit.wordpress.com/2018/02/05/pythons-weak-performance-matters/, a blog post on CPython speed, claims "deleting a set of 1 billion strings takes >12 hours".  (No other details are provided.)

I don't have the 100+ gigabytes of RAM needed to confirm this, but with an installed 64-bit 3.7.0b1 on Win10 with 12 gigabytes, I confirmed that there is pronounced super-linear growth in string set deletion time (unlike with an integer set).  At least half of the RAM was available.

      Seconds to create and delete sets
millions    integers        strings
of items  create delete  create delete
   1         .08    .02     .36    .08
   2         .15    .03     .75    .17
   4         .30    .06    1.55    .36
   8         .61    .12    3.18    .76
  16        1.22    .24    6.48   1.80  < slightly more than double
  32        2.4     .50   13.6    5.56  < more than triple
  64        4.9    1.04   28     19     < nearly quadruple
 128       10.9    2.25    <too large>
 100                      56     80     < quadruple with 1.5 x size
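
The annotations in the table can be checked by dividing each string-set delete time by the delete time at half the size. The following is a minimal sketch that uses only the string delete timings listed above; the helper name doubling_ratios is made up for illustration.

```python
# String-set delete times in seconds, copied from the table above,
# keyed by millions of items.
str_delete = {1: 0.08, 2: 0.17, 4: 0.36, 8: 0.76, 16: 1.80, 32: 5.56, 64: 19.0}


def doubling_ratios(times):
    """Return {size: time(size) / time(size // 2)} for every size whose half is also present."""
    return {n: times[n] / times[n // 2] for n in sorted(times) if n // 2 in times}


ratios = doubling_ratios(str_delete)
for n, r in ratios.items():
    print(f"{n:3d}M: {r:.2f}x the delete time of {n // 2}M")
```

With linear scaling every ratio would be close to 2. The ratios stay near 2.1 up to 8 million items, then rise to about 2.4, 3.1 and 3.4 at 16, 32 and 64 million.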

For 100 million strings, I got about the same 56 and 80 seconds when timing with a clock, without the timeit gc suppression.  I interrupted the 128M string run after several minutes.  Even if there is swapping to disk during creation, I would not expect it during deletion.

The timeit code:

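import timeit

# timeit disables garbage collection while timing by default.
# Each measurement runs once (number=1).  For the 'del s' timings the set is
# built in the setup string, so only the deletion itself is timed.

# Integer sets: 1 to 128 million items.
for i in (1, 2, 4, 8, 16, 32, 64, 128):
    build = f's = {{n for n in range({i}*1000000)}}'
    print(i, 'int')
    print(timeit.Timer(build).timeit(number=1))           # create
    print(timeit.Timer('del s', build).timeit(number=1))  # delete

# String sets: 1 to 100 million items (128 million was too large).
for i in (1, 2, 4, 8, 16, 32, 64, 100):
    build = f's = {{str(n) for n in range({i}*1000000)}}'
    print(i, 'str')
    print(timeit.Timer(build).timeit(number=1))           # create
    print(timeit.Timer('del s', build).timeit(number=1))  # delete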
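
The clock-timed comparison without timeit's gc suppression could also be made inside timeit itself, since garbage collection can be re-enabled as the first statement of the setup string. The following is a sketch under that assumption, not the code used for the numbers above; the function name time_str_set and its parameters are hypothetical.

```python
import timeit


def time_str_set(n_items, gc_enabled=False):
    """Return (create_seconds, delete_seconds) for a set of n_items strings.

    timeit turns garbage collection off while timing; with gc_enabled=True
    the setup string turns it back on before the timed statement runs.
    """
    build = f's = {{str(n) for n in range({n_items})}}'
    gc_setup = 'import gc; gc.enable()' if gc_enabled else 'pass'
    create = timeit.Timer(build, gc_setup).timeit(number=1)
    # The set is built in setup, so only 'del s' is timed.
    delete = timeit.Timer('del s', f'{gc_setup}; {build}').timeit(number=1)
    return create, delete


# Small size so the example runs quickly; the table above uses millions.
for gc_on in (False, True):
    create, delete = time_str_set(100_000, gc_enabled=gc_on)
    print(f'gc enabled={gc_on}: create {create:.3f}s, delete {delete:.3f}s')
```

If the two modes give similar delete times at large sizes, as the clock timing at 100 million strings suggests, the cyclic garbage collector is unlikely to be the cause of the super-linear growth.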

Raymond, I believe you monitor the set implementation, and I know Victor is interested in timing and performance.