Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preventing errors of simultaneous access in zipfile #60773

Closed
serhiy-storchaka opened this issue Nov 28, 2012 · 10 comments
Closed

Preventing errors of simultaneous access in zipfile #60773

serhiy-storchaka opened this issue Nov 28, 2012 · 10 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 16569
Nosy @jcea, @dw, @serhiy-storchaka
Files
  • zipfile_simultaneous.patch
  • patch: Representative modification to zipfile.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2014-12-03.07:32:37.932>
    created_at = <Date 2012-11-28.14:21:51.432>
    labels = ['type-feature', 'library']
    title = 'Preventing errors of simultaneous access in zipfile'
    updated_at = <Date 2014-12-03.07:32:37.931>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2014-12-03.07:32:37.931>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2014-12-03.07:32:37.932>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2012-11-28.14:21:51.432>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['28147', '37172']
    hgrepos = []
    issue_num = 16569
    keywords = ['patch']
    message_count = 10.0
    messages = ['176544', '176546', '176548', '176849', '176852', '176856', '177006', '177037', '230998', '232071']
    nosy_count = 5.0
    nosy_names = ['jcea', 'alanmcintyre', 'kasal', 'dw', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16569'
    versions = ['Python 3.4']

    @serhiy-storchaka
    Copy link
    Member Author

    If the ZipFile was created by passing in a file-like object as the first argument to the constructor, then simultaneous reading or writing of different file results in an non-consistent state. There is a warning about this in the documentation. The proposed patch forces this condition, raising the early exception if you attempt to simultaneously access.

    I'm not sure whether it's worth apply to older versions.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Nov 28, 2012
    @jcea
    Copy link
    Member

    jcea commented Nov 28, 2012

    I am -0 to this. We can't prevent programmers for shotting in the foot.

    @serhiy-storchaka
    Copy link
    Member Author

    Reading from closed ZipFile or reading from ZipFile opened for write already forbidden. This is a preventing of the same kind.

    @kasal
    Copy link
    Mannequin

    kasal mannequin commented Dec 3, 2012

    I agree that reading from a file open for write should be forbidden, no matter whether ZipFile was called with fp or a name.

    Actually, it is not yet forbidden, and two of the tests in the zipfile.py test suite do actually rely on this misfeature.
    The first chunk in the patch http://bugs.python.org/file24624/Proposed-fix-of-issue14099-second.patch contains a fix for this bug in test suite.

    OTOH, decompressing several files for a given zip file simultaneously does not sound that bad. You know, with all the current file managers, people look at a zip as if it were kind of a directory.

    @serhiy-storchaka
    Copy link
    Member Author

    Actually, it is not yet forbidden, and two of the tests in the zipfile.py test suite do actually rely on this misfeature.

    Indeed. I missed that.

    Actually these tests work by accident, due to the fact that the contents of the zipfile is placed in the file object buffer.

    OTOH, decompressing several files for a given zip file simultaneously does not sound that bad. You know, with all the current file managers, people look at a zip as if it were kind of a directory.

    I agree, but I'm afraid it's impossible to do without performance regression due to seek before every read. And for now ZipFile is not support simultaneous reading when external file object used. Also ZipFile is not thread-safe in any case. You can open several ZipFiles for simultaneous reading.

    @kasal
    Copy link
    Mannequin

    kasal mannequin commented Dec 3, 2012

    but I'm afraid it's impossible to do without performance regression due to seek before every read.

    I agree that this is key question.

    I would hope that the performance hit wouldn't be so bad, unless there are actually two decompressions running concurrently.
    So we can have an implementation that is generally correct, though some use scenarios result in slow execution.

    OTOH, if the seek() call were a problem even if the new position is the same as the old one, they can be optimized out by a simple wrapper around fp.

    @jcea
    Copy link
    Member

    jcea commented Dec 5, 2012

    Seek can be very cheap. Anybody could actually measure it???

    @serhiy-storchaka
    Copy link
    Member Author

    Seek can be very cheap. Anybody could actually measure it???

    I am waiting for an updated patch for bpo-14099 to make benchmarks.

    @serhiy-storchaka serhiy-storchaka self-assigned this Dec 29, 2012
    @dw
    Copy link
    Mannequin

    dw mannequin commented Nov 11, 2014

    Compared to the cost of everything else ZipExtFile must do (e.g. 4kb string concatenation in a loop, zlib), its surprising that lseek() would measurable at all.

    The attached file 'patch' is the minimal change I tested. It represents, in terms of computation and system call overhead, all required to implement the "seek before read" solution to simultaneous access. On OSX, churning over ever member of every ZIP in my downloads directory (about 400mb worth), this change results in around 0.9% overhead compared to the original module.

    Subsequently I'm strongly against the patch here. It is in effect papering over an implementation deficiency of the current zipfile module, one that could easily and cheaply be addressed.

    (My comment on this ticket is in the context of the now-marked-duplicate bpo-22842).

    @serhiy-storchaka
    Copy link
    Member Author

    Closed in favor of bpo-14099.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants