Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make shutil.copytree use os.scandir to take advantage of cached is_(dir|file|symlink) #77595

Closed
andresdelfino opened this issue May 3, 2018 · 5 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir

Comments

@andresdelfino
Copy link
Contributor

BPO 33414
Nosy @vstinner, @giampaolo, @serhiy-storchaka, @andresdelfino
PRs
  • bpo-33414: Use os.scandir to take advantage of cached is_(dir|file|symlink) #6697
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-05-09.10:17:47.168>
    created_at = <Date 2018-05-03.04:09:33.488>
    labels = ['3.8', 'library']
    title = 'Make shutil.copytree use os.scandir to take advantage of cached is_(dir|file|symlink)'
    updated_at = <Date 2018-05-30.03:52:53.566>
    user = 'https://github.com/andresdelfino'

    bugs.python.org fields:

    activity = <Date 2018-05-30.03:52:53.566>
    actor = 'giampaolo.rodola'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-05-09.10:17:47.168>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2018-05-03.04:09:33.488>
    creator = 'adelfino'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33414
    keywords = ['patch']
    message_count = 5.0
    messages = ['316108', '316111', '316122', '316279', '316298']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'giampaolo.rodola', 'serhiy.storchaka', 'adelfino']
    pr_nums = ['6697']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33414'
    versions = ['Python 3.8']

    @andresdelfino
    Copy link
    Contributor Author

    Right now we are using os.path.is(dir|file|symlink) in shutil.copytree, but taking advantage of os.scandir's is_(dir|file|symlink) seems the way to go.

    I'll make a PR with to start the discussion with a proof of concept.

    @andresdelfino andresdelfino added 3.8 only security fixes stdlib Python modules in the Lib dir labels May 3, 2018
    @serhiy-storchaka
    Copy link
    Member

    Could you show an example that exposes a significant overhead of the current implementation?

    There were reasons of using os.scandir() in os.walk() and glob.glob(). You may scan thousands of files and performs a heavy I/O only with few of them. There was smaller reason of using it in shutil.rmtree() -- removing an entry from the directory is fast, but likely you will spent much more time for creating a tree.

    But I don't know reasons of using it in shutil.copytree() in which you need to performs a heavy I/O with every file and directory.

    @andresdelfino
    Copy link
    Contributor Author

    To be frank, I just searched the tree for uses of listdir() combined with isdir()/isfile()/issymlink(). Thought that using scandir() would make sense, and didn't think of a reason for not using it. That being said, I cannot state a case and I'll be happy to close the PR if there's no need for this :)

    @serhiy-storchaka
    Copy link
    Member

    The code with using scandir() is more complex and is different enough from the code with using listdir() for having significant risk of introducing bugs. Also using scandir() introduces a risk of leaking file descriptors.

    We don't rewrite the code without good reasons. If you propose performance enhancement, please provide benchmark results that expose the benefit of this change. If it is not large enough, it is not worth to do.

    Actually your change looks making the code slower: it reads the directory twice.

    @andresdelfino
    Copy link
    Contributor Author

    I believe you are right about being conservative with changing working code, specially when there's no proof about performance being improved.

    I think its best to close this ticket. I'd do it myself, but I know what resolution applies here.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants