Author carlosfranzreb
Recipients carlosfranzreb
Date 2021-06-15.10:26:56
Message-id <1623752817.02.0.343634449589.issue44424@roundup.psfhosted.org>
In-reply-to
Content
I am trying to lazily load items from a compressed file hosted on Zenodo. My goal is to yield the items iteratively without storing the file on my computer. My problem is that an EOFError occurs right after the first non-empty line is read. How can I overcome this issue?

Here is my code:

    import requests as req
    import json
    from bz2 import BZ2Decompressor


    def lazy_load(file_url):
        """Stream the file at file_url and yield decompressed text chunks."""
        dec = BZ2Decompressor()
        with req.get(file_url, stream=True) as res:
            for chunk in res.iter_content(chunk_size=1024):
                data = dec.decompress(chunk).decode('utf-8')
                yield data


    if __name__ == "__main__":
        with open('credentials.json') as f:
            creds = json.load(f)
        url = 'https://zenodo.org/api/records/'
        id = '4617285'
        filename = '10.Papers.nt.bz2'
        res = req.get(f'{url}{id}', params={'access_token': creds['zenodo_token']})
        for file in res.json()['files']:
            if file['key'] == filename:
                for item in lazy_load(file['links']['self']):
                    pass  # do something with 'item'

The error I get is the following:

    Traceback (most recent call last):
      File ".\mag_loader.py", line 51, in <module>
        for item in lazy_load(file['links']['self']):
      File ".\mag_loader.py", line 18, in lazy_load
        data = dec.decompress(chunk)
    EOFError: End of stream already reached

To run the code you need a Zenodo access token, for which you need an account. Once you have logged in, you can create the token here: https://zenodo.org/account/settings/applications/tokens/new/
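
For reference, the same error can be reproduced locally without a Zenodo token. The sketch below assumes that the cause is a file made of several concatenated bzip2 streams; this has not been verified for the file above. A single `BZ2Decompressor` handles only one stream and raises `EOFError` when it is fed more data after that stream has ended. `iter_decompressed` is a hypothetical helper name, and the two-line payload is made-up test data. The helper starts a fresh decompressor whenever `eof` is set and passes it the leftover bytes from `unused_data`; an incremental UTF-8 decoder is used so that a multi-byte character split across chunks does not break decoding.

```python
import bz2
import codecs


def iter_decompressed(chunks):
    """Yield decoded text from an iterable of bz2-compressed byte chunks.

    Assumes the input may consist of several concatenated bzip2 streams.
    """
    decompressor = bz2.BZ2Decompressor()
    # Incremental decoder: safe when a multi-byte character spans two chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        while chunk:
            if decompressor.eof:
                # The previous stream has ended; start a new one.
                decompressor = bz2.BZ2Decompressor()
            raw = decompressor.decompress(chunk)
            # Bytes found after the end of a stream are kept in unused_data.
            chunk = decompressor.unused_data if decompressor.eof else b""
            text = decoder.decode(raw)
            if text:
                yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Made-up test data: two independently compressed streams, concatenated.
payload = bz2.compress("first line\n".encode("utf-8"))
payload += bz2.compress("second line\n".encode("utf-8"))
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]

# A single decompressor fails once the first stream has ended.
single = bz2.BZ2Decompressor()
error = None
try:
    for chunk in chunks:
        single.decompress(chunk)
except EOFError as exc:
    error = exc
print("single decompressor:", repr(error))

# Restarting the decompressor at each stream boundary reads everything.
result = "".join(iter_decompressed(chunks))
print(result, end="")
```

In `lazy_load`, the same helper could be fed with `res.iter_content(chunk_size=1024)` instead of the in-memory `chunks` list. If the file turns out to be a single stream, the restart branch is never taken and the helper behaves like a plain incremental decompressor.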