
Author luis@luispedro.org
Recipients lemburg, luis@luispedro.org, methane, rhettinger, serhiy.storchaka, terry.reedy, tim.peters, twouters, vstinner
Date 2018-02-16.15:47:43
Message-id <1518796064.36.0.467229070634.issue32846@psf.upfronthosting.co.za>
In-reply-to
Content
Original poster here.

The benchmark is artificial, but the problem setting is not. I had a real problem that looked roughly like this:

    interesting = set(line.strip() for line in open(...))
    for line in open(...):
        key, rest = line.split('\t', 1)
        if key in interesting:
            process(rest)

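Deleting the set (when it goes out of scope) accounted for a significant chunk of the run time. Surprisingly, deleting a very large set takes much longer than creating it: in the measurements below, deletion is slower than creation for every N from 200000000 upward.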

Here are my controlled measurements (created with the attached script, which itself uses Jug http://jug.rtfd.io and assumes a file `input.txt` is present).


           N   create (s)   delete (s)
           1         0.00         0.00
          10         0.00         0.00
         100         0.00         0.00
        1000         0.00         0.00
       10000         0.01         0.00
      100000         0.15         0.01
     1000000         1.14         0.12
    10000000        11.44         2.24
   100000000       126.41        70.34
   200000000       198.04       258.44
   300000000       341.27       646.81
   400000000       522.70      1044.97
   500000000       564.07      1744.54
   600000000      1380.04      3364.06
   700000000      1797.82      3300.20
   800000000      1294.20      4410.22
   900000000      3045.38      7646.59
  1000000000      3467.31     11042.97
  1100000000      5162.35     13750.22
  1200000000      6581.90     12544.67
  1300000000      1612.60     17204.67
  1400000000      1788.13     23772.84
  1500000000      6922.16     25068.49
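
For readers who want to try this at a small scale, here is a minimal sketch of this kind of measurement. It is not the attached script: it does not use Jug, it generates keys in memory instead of reading `input.txt`, and the function name `time_set_lifecycle` is made up for illustration. It assumes CPython, where dropping the last reference to an object frees it immediately.

```python
import time


def time_set_lifecycle(n):
    """Return (create_seconds, delete_seconds) for a set of n distinct strings.

    Hypothetical helper for illustration; keys are generated in memory
    rather than read from a file.
    """
    start = time.perf_counter()
    interesting = set(str(i) for i in range(n))
    create_s = time.perf_counter() - start

    start = time.perf_counter()
    # Assumption: on CPython this drops the last reference, so the set
    # and its strings are deallocated right here.
    del interesting
    delete_s = time.perf_counter() - start
    return create_s, delete_s


if __name__ == "__main__":
    print(f"{'N':>12}{'create (s)':>13}{'delete (s)':>13}")
    for n in (1_000, 10_000, 100_000):
        create_s, delete_s = time_set_lifecycle(n)
        print(f"{n:>12}{create_s:>13.2f}{delete_s:>13.2f}")
```

The sizes in the loop are deliberately small so the sketch finishes quickly. At those sizes both timings are tiny, so the effect in the table should not be expected to show up until N is large enough to put real pressure on memory.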