Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pickle.load expects an object that implements readinto #83862

Closed
NathanGoldbaum mannequin opened this issue Feb 19, 2020 · 9 comments
Closed

pickle.load expects an object that implements readinto #83862

NathanGoldbaum mannequin opened this issue Feb 19, 2020 · 9 comments
Assignees
Labels
3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@NathanGoldbaum
Copy link
Mannequin

NathanGoldbaum mannequin commented Feb 19, 2020

BPO 39681
Nosy @pitrou, @ambv, @miss-islington
PRs
  • bpo-39681: Fix C pickle regression with minimal file-like objects #18592
  • [3.8] bpo-39681: Fix C pickle regression with minimal file-like objects (GH-18592) #18630
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pitrou'
    closed_at = <Date 2020-02-24.13:06:05.266>
    created_at = <Date 2020-02-19.01:04:48.988>
    labels = ['3.8', 'type-bug', 'library', '3.9']
    title = 'pickle.load expects an object that implements readinto'
    updated_at = <Date 2020-02-24.16:19:04.853>
    user = 'https://bugs.python.org/NathanGoldbaum'

    bugs.python.org fields:

    activity = <Date 2020-02-24.16:19:04.853>
    actor = 'Nathan.Goldbaum'
    assignee = 'pitrou'
    closed = True
    closed_date = <Date 2020-02-24.13:06:05.266>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2020-02-19.01:04:48.988>
    creator = 'Nathan.Goldbaum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39681
    keywords = ['patch']
    message_count = 9.0
    messages = ['362242', '362259', '362281', '362291', '362400', '362546', '362548', '362585', '362596']
    nosy_count = 4.0
    nosy_names = ['pitrou', 'lukasz.langa', 'Nathan.Goldbaum', 'miss-islington']
    pr_nums = ['18592', '18630']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39681'
    versions = ['Python 3.8', 'Python 3.9']

    @NathanGoldbaum
    Copy link
    Mannequin Author

    NathanGoldbaum mannequin commented Feb 19, 2020

    As of #7076, it looks like at least the C implementation of pickle.load expects the file argument to implement readinto:

    cpython/Modules/_pickle.c

    Lines 1617 to 1622 in ffd9753

    if (!self->readline || !self->readinto || !self->read) {
    if (!PyErr_Occurred()) {
    PyErr_SetString(PyExc_TypeError,
    "file must have 'read', 'readinto' and "
    "'readline' attributes");
    }

    This is a change in behavior relative to previous versions of Python and I don't see it mentioned in PEP-574 or in the pull request so I'm not sure why it was changed.

    This change breaks some PyTorch tests (see pytorch/pytorch#32289) and, at least one PyTorch user, although I don't have full details there.

    I can try to fix this on the PyTorch side but I first want to check that this was an intentional change on the Python side of things.

    @NathanGoldbaum NathanGoldbaum mannequin added 3.8 only security fixes stdlib Python modules in the Lib dir labels Feb 19, 2020
    @pitrou
    Copy link
    Member

    pitrou commented Feb 19, 2020

    Right, I think at some point during the PEP-574 implementation review, we decided that the input object had to be a proper file implementation
    (for example, deriving from io.BufferedIOBase).

    I think it should be easy for PyTorch to fix this internally (by deriving from the proper io base class), but we can also add a fix to pickle.

    @pitrou pitrou added 3.9 only security fixes labels Feb 19, 2020
    @NathanGoldbaum
    Copy link
    Mannequin Author

    NathanGoldbaum mannequin commented Feb 19, 2020

    In this case the tests are explicitly testing that a file-like object that does not implement readinto works with torch.load (which is using pickles under the hood). See https://github.com/pytorch/pytorch/blob/master/test/test_serialization.py#L416-L429, in particular the usage of FilelikeMock with has_readinto set to False.

    This goes back to pytorch/pytorch#5466, however I'm not sure offhand why it was decided to explicitly test file-like objects that don't implement readinto. I don't know how many consumers there are for this API in pytorch, but there is at least one user who has reported an error traceback caused by this change. I'm hoping to hear more to see whether there are any libraries that depend on pytorch and are implicitly depending on this functionality.

    Googling the error message that Python3.8 emits in this situation also leads me to this bug report in a different library: web2py/py4web#77

    @NathanGoldbaum
    Copy link
    Mannequin Author

    NathanGoldbaum mannequin commented Feb 19, 2020

    So I *think* I've pieced together what caused the user crash that originated in the flair library. It turns out that pickle.load, via torch.load, is getting passed an mmap.mmap.

    https://github.com/flairNLP/flair/blob/1d44abf35f1122d1e146c9a219b2cb5f98b41b40/flair/file_utils.py#L25-L36

    https://github.com/flairNLP/flair/blob/1d44abf35f1122d1e146c9a219b2cb5f98b41b40/flair/nn.py#L85-L86

    Since mmap doesn't implement readinto, pickle.load objects as of Python 3.8. This is new behavior in Python3.8, it used to be possible to load a memory-mapped pickle file.

    Short repro script:

    import pickle
    import mmap
    data = "some data"
    
    with open('my_data.pkl', 'wb') as f:
        pickle.dump(data, f)
    
    with open("my_data.pkl", "r+b") as f_in:
        mm = mmap.mmap(f_in.fileno(), 0)
    
    print(pickle.load(mm))

    On Python3.8, this script prints an error, on Python3.7 it prints "some data".

    @pitrou
    Copy link
    Member

    pitrou commented Feb 21, 2020

    Well, in the mmap case, this is trivially fixed by writing:

    with open("my_data.pkl", "r+b") as f_in:
        mm = mmap.mmap(f_in.fileno(), 0)
    
    print(pickle.loads(memoryview(mm)))
    

    It will also be faster this way, since the pickle will be read directly from memory instead of going through read() calls.

    This makes me think that perhaps this issue isn't as bad: it forces people to abandon bad idioms :-)

    @pitrou pitrou self-assigned this Feb 21, 2020
    @pitrou pitrou added the type-bug An unexpected behavior, bug, or error label Feb 21, 2020
    @pitrou pitrou self-assigned this Feb 21, 2020
    @pitrou pitrou added the type-bug An unexpected behavior, bug, or error label Feb 21, 2020
    @ambv
    Copy link
    Contributor

    ambv commented Feb 23, 2020

    New changeset 9f37872 by Antoine Pitrou in branch 'master':
    bpo-39681: Fix C pickle regression with minimal file-like objects (bpo-18592)
    9f37872

    @ambv
    Copy link
    Contributor

    ambv commented Feb 23, 2020

    New changeset b19f7ec by Miss Islington (bot) in branch '3.8':
    bpo-39681: Fix C pickle regression with minimal file-like objects (GH-18592) (bpo-18630)
    b19f7ec

    @pitrou
    Copy link
    Member

    pitrou commented Feb 24, 2020

    The regression is fixed in git master and in the 3.8 branch. However, I would still advice against the idiom of handling a mmap object as a regular file when using pickle.

    @pitrou pitrou closed this as completed Feb 24, 2020
    @pitrou pitrou closed this as completed Feb 24, 2020
    @NathanGoldbaum
    Copy link
    Mannequin Author

    NathanGoldbaum mannequin commented Feb 24, 2020

    Thank you for the fix! Yes I'm planning to file an issue with flair about this and patch this use case in PyTorch itself.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants