Author nh2
Recipients nh2, pitrou, serhiy.storchaka
Date 2017-12-31.19:33:56
Message-id <1514748837.06.0.467229070634.issue32453@psf.upfronthosting.co.za>
Content
Serhiy, did you run your benchmark on an SSD or a spinning disk?

The coreutils bug mentions that the problem is seek times.

My tests on a spinning disk with 400k files suggest that `rmtree()` is indeed ~16-17x slower than `rm -r` (2m49s to 3m8s versus 10.8s):

    # time (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch)

    real  0m0.722s
    user  0m0.032s
    sys 0m0.680s

    # time rm -rf dirtest/

    real  0m0.519s
    user  0m0.074s
    sys 0m0.437s

    # time (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch)

    real  0m0.693s
    user  0m0.039s
    sys 0m0.659s

    # time python -c 'import shutil; shutil.rmtree("dirtest")'

    real  0m0.756s
    user  0m0.225s
    sys 0m0.499s

    # time (mkdir dirtest && cd dirtest && seq 1 100000 | xargs touch)

    real  0m0.685s
    user  0m0.032s
    sys 0m0.658s

    # time python3 -c 'import shutil; shutil.rmtree("dirtest")'

    real  0m0.965s
    user  0m0.424s
    sys 0m0.528s

    # time (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch)

    real  0m4.249s
    user  0m0.098s
    sys 0m2.804s

    # time rm -rf dirtest/

    real  0m10.782s
    user  0m0.265s
    sys 0m2.213s

    # time (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch)

    real  0m5.236s
    user  0m0.107s
    sys 0m2.832s

    # time python -c 'import shutil; shutil.rmtree("dirtest")'

    real  3m8.006s
    user  0m1.323s
    sys 0m3.929s

    # time (mkdir dirtest && cd dirtest && seq 1 400000 | xargs touch)

    real  0m4.671s
    user  0m0.097s
    sys 0m2.832s

    # time python3 -c 'import shutil; shutil.rmtree("dirtest")'

    real  2m49.476s
    user  0m2.196s
    sys 0m3.695s

The tests were run with coreutils `rm` 8.28, Python 2.7.14 and Python 3.6.3, on ext4 (rw,relatime,data=ordered) on a dmraid RAID1 across two WDC_WD4000FYYZ disks (WD 4 TB Enterprise).
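For anyone who wants to repeat this on their own storage, here is a rough Python equivalent of the shell procedure above. It is only a sketch: the helper names (`make_flat_dir`, `time_rmtree`, `time_rm_rf`) are made up for illustration, files are created with Python rather than `touch`, the `rm -rf` timing includes process start-up, and, like the transcript, it does not drop the page cache between runs.

```python
import os
import shutil
import subprocess
import time


def make_flat_dir(parent: str, n_files: int) -> str:
    """Create parent/dirtest containing empty files named 1..n_files.

    Roughly mirrors `mkdir dirtest && cd dirtest && seq 1 N | xargs touch`.
    """
    path = os.path.join(parent, "dirtest")
    os.mkdir(path)
    for i in range(1, n_files + 1):
        # Mode "x" creates the file and fails if it already exists.
        with open(os.path.join(path, str(i)), "x"):
            pass
    return path


def time_rmtree(parent: str, n_files: int) -> float:
    """Wall-clock seconds for shutil.rmtree() on a freshly created flat directory."""
    path = make_flat_dir(parent, n_files)
    start = time.perf_counter()
    shutil.rmtree(path)
    return time.perf_counter() - start


def time_rm_rf(parent: str, n_files: int) -> float:
    """Wall-clock seconds for `rm -rf`, including process start-up."""
    path = make_flat_dir(parent, n_files)
    start = time.perf_counter()
    subprocess.run(["rm", "-rf", path], check=True)
    return time.perf_counter() - start
```

For example, `time_rmtree("/mnt/disk-under-test", 400_000)` (the mount point is a placeholder) should be run on the device you actually care about, since the effect depends on the storage underneath.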

Also note that deleting 100k files takes under a second with both `rm -r` (0.5s) and the Pythons (0.8 to 1.0s), but deleting 4x as many files takes ~20x longer with `rm -r` and ~175-250x longer with the Pythons.
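These ratios can be recomputed from the `real` times in the transcripts. The dictionary below simply copies those numbers (converting `3m8.006s` and `2m49.476s` to seconds); the function names are illustrative only.

```python
# "real" (wall-clock) times in seconds, copied from the transcripts above.
REAL_TIMES = {
    "rm -rf": {100_000: 0.519, 400_000: 10.782},
    "python2 rmtree": {100_000: 0.756, 400_000: 3 * 60 + 8.006},
    "python3 rmtree": {100_000: 0.965, 400_000: 2 * 60 + 49.476},
}


def growth_factor(times: dict) -> float:
    """How many times longer deleting 400k files took than deleting 100k."""
    return times[400_000] / times[100_000]


def slowdown_vs_rm(tool: str, n_files: int) -> float:
    """How many times slower `tool` was than `rm -rf` at the same file count."""
    return REAL_TIMES[tool][n_files] / REAL_TIMES["rm -rf"][n_files]


for tool, times in REAL_TIMES.items():
    print(f"{tool:<15} 4x the files -> {growth_factor(times):6.1f}x the time")
for tool in ("python2 rmtree", "python3 rmtree"):
    print(f"{tool:<15} vs rm -rf at 400k files: {slowdown_vs_rm(tool, 400_000):5.1f}x")
```

This gives a growth factor of about 21x for `rm -rf` versus about 249x (Python 2.7) and 176x (Python 3.6) for `rmtree`, and puts `rmtree` at roughly 17x and 16x slower than `rm -rf` at 400k files.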

There is clearly some size threshold below which we benefit from caching.
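If seek times are indeed the culprit, one experiment is to change the order in which entries are unlinked. As far as I know, GNU `rm` (through gnulib's fts) sorts the entries of large directories by inode number before removing them, on the theory that inode order tracks on-disk layout better than the hash-based `readdir()` order of ext4 directories. The function below is a hypothetical sketch of that idea in Python, not a proposed `shutil` patch, and it has not been benchmarked here: unlike `shutil.rmtree()` it has no `onerror` handling, no protection against symlink races, and it recurses once per directory level.

```python
import os


def rmtree_inode_order(path: str) -> None:
    """Hypothetical rmtree variant: remove each directory's entries in
    ascending inode-number order. Symlinks are unlinked, never followed."""
    with os.scandir(path) as it:
        # Materialize the listing so it can be sorted by inode number.
        entries = sorted(it, key=lambda e: e.inode())
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            rmtree_inode_order(entry.path)  # real subdirectory: recurse
        else:
            os.unlink(entry.path)  # regular file, symlink, etc.
    os.rmdir(path)  # directory is empty now
```

Whether this closes the gap with `rm -r` would have to be measured with the 400k-file setup above. On POSIX, `os.DirEntry.inode()` is filled in from `readdir()`, so the sort itself should not add `stat()` calls.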