Title: Item Count Error in Shelf
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, jessembacon
Priority: normal Keywords:

Created on 2019-06-12 00:13 by jessembacon, last changed 2019-06-13 16:29 by SilentGhost. This issue is now closed.

File name            Uploaded                        Description
KeyCount.png         jessembacon, 2019-06-12 00:13   Screen shot of exercise
ShelfKeys.png        jessembacon, 2019-06-12 15:25   Data Missing from Shelf
Python Proof.ipynb   jessembacon, 2019-06-12 21:33   Jupyter Notebook with comments
(name missing)       jessembacon, 2019-06-12 21:34   Test_Script
Messages (7)
msg345290 - Author: Jesse Bacon (jessembacon) Date: 2019-06-12 00:13
I have loaded the 2019 National Vulnerability Database (NVD) from NIST, which includes 3989 JSON documents, and placed this data in a shelf. When I run len(db.keys()) I get 3658, yet len(set(cves)) == 3989 evaluates to True.

When I extract the data from the shelf, I get the right number of records, 3989. I tested on Python 3.7.3 and Python 3.6.5. I am concerned this is going to ruin a metric in a security report. For example, a risk exposure report may use the number of keys in a yearly vulnerability database as the baseline for a risk calculation that is contrasted with the number of patched CVEs.

msg345291 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-06-12 00:16
Please do not post images: we can't copy and paste from them, and they're unfriendly to visually impaired users.

Can you create code that reproduces this? A small example with no external dependencies would be best. Please attach the reproducer as a text file.
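A minimal, dependency-free reproducer of the kind requested here might look like the sketch below. It is an illustration, not the reporter's attached script: the synthetic CVE-style keys, their payloads, and the helper name `check_shelf_roundtrip` are assumptions; only the record count (3989) comes from the report. It writes every record to a fresh shelf, reopens it read-only, and compares `len(db)` and key membership against the number of unique keys written.

```python
import os
import shelve
import tempfile

# Hypothetical stand-in for the NVD feed: synthetic CVE-style IDs with small
# dict payloads. Only the count (3989) comes from the report; the data is invented.
N_RECORDS = 3989
records = {
    f"CVE-2019-{i:04d}": {"id": i, "summary": f"synthetic record {i}"}
    for i in range(1, N_RECORDS + 1)
}


def check_shelf_roundtrip(records, path):
    """Write every record to a new shelf, reopen it read-only, and return
    (len(db), number of keys found, whether all stored values match)."""
    with shelve.open(path, flag="n") as db:  # "n": always start from an empty db
        for key, value in records.items():
            db[key] = value
    with shelve.open(path, flag="r") as db:
        stored = len(db)
        present = [key for key in records if key in db]
        matches = all(db[key] == records[key] for key in present)
    return stored, len(present), matches


with tempfile.TemporaryDirectory() as tmp:
    stored, retrieved, matches = check_shelf_roundtrip(
        records, os.path.join(tmp, "cve_2019_test")
    )

print(f"{len(records)} unique keys written")
print(f"len(db) after reopening: {stored}")
print(f"keys found by membership test: {retrieved}")
print(f"stored values match: {matches}")
```

On a correctly working build, the written, stored, and found counts should all be 3989; a smaller `len(db)` or missing keys would reproduce the symptom from msg345290 without the NVD download.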
msg345369 - Author: Jesse Bacon (jessembacon) Date: 2019-06-12 15:18
I am missing keys when extracting the data back out with today's NVD pull:
KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.6/ in __getitem__(self, key)
    110         try:
--> 111             value = self.cache[key]
    112         except KeyError:

KeyError: 'CVE-2019-1842'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-aeb8a14b4774> in <module>
      1 results = []
      2 for x in raw_cves:
----> 3     results.append(db[x])

~/anaconda3/lib/python3.6/ in __getitem__(self, key)
    111             value = self.cache[key]
    112         except KeyError:
--> 113             f = BytesIO(self.dict[key.encode(self.keyencoding)])
    114             value = Unpickler(f).load()
    115             if self.writeback:

KeyError: b'CVE-2019-1842'
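To list every absent ID rather than stopping at the first KeyError, one option is to test membership before indexing. `Shelf.__contains__` encodes the key with the shelf's `keyencoding` (UTF-8 by default) and checks the underlying dbm object, the same lookup that fails at line 113 of the traceback above. The helper `split_by_presence` and the sample keys other than CVE-2019-1842 are hypothetical.

```python
import os
import shelve
import tempfile


def split_by_presence(db, expected_keys):
    """Partition expected_keys into (found, missing) without raising KeyError.

    `key in db` goes through Shelf.__contains__, which encodes the key with
    db.keyencoding and checks the underlying dbm mapping directly.
    """
    found, missing = [], []
    for key in expected_keys:
        if key in db:
            found.append(key)
        else:
            missing.append(key)
    return found, missing


# Hypothetical demo: CVE-2019-1842 is deliberately never written.
with tempfile.TemporaryDirectory() as tmp:
    with shelve.open(os.path.join(tmp, "demo_shelf")) as db:
        db["CVE-2019-0001"] = {"id": 1}
        db["CVE-2019-0002"] = {"id": 2}
        found, missing = split_by_presence(
            db, ["CVE-2019-0001", "CVE-2019-0002", "CVE-2019-1842"]
        )

print("missing:", missing)  # missing: ['CVE-2019-1842']
```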
msg345381 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-06-12 16:56
This still isn't an example we can copy and paste to reproduce, so I'm going to be unable to help you. Sorry.

Again: please don't post images, for the reasons I previously stated.
msg345412 - Author: Jesse Bacon (jessembacon) Date: 2019-06-12 21:39

The interpreter said something about passing a negative value when I converted db.keys() to a list. I have attached a script in .txt format and a Jupyter notebook for further analysis. I apologize for posting images; I just saw your note. I'll go ahead and look at the shelve source while you determine whether this information is sufficient. Thank you for your time.
msg345419 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2019-06-12 22:06
After fixing a missing import (import urllib.request), this is what I get:

$ /usr/local/bin/python3.6 
Fetching nvdcve-1.0-2019.json.gz
Storing Gzipped File
Loading JSON Content
4275 records
4275 unique records
Creating Shelve: cve_2019.shelf
Assembling Big Dictionary of 2019 Data in shelve
shelve reports 4275 unique records
Extracting data by keys from shelve
4275 extracted records
Number of missing records 0
data match

Are you seeing failures?

This is on a python3.6 that I compiled from source on an old Fedora box.

What OS are you using?
msg345525 - Author: Jesse Bacon (jessembacon) Date: 2019-06-13 16:11
I was using the Anaconda distribution on OS X. It failed for both 3.6 and 3.7. I removed Anaconda and compiled Python from source, and the script executed correctly regardless of whether "--enable-optimizations" was set. Anaconda claims to be geared towards scientists, so this is alarming. Thank you for your time.
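The thread does not pin down the root cause. Because `shelve.open` delegates storage to whichever `dbm` backend `dbm.open` selects on a given installation, one hedged way to compare the Anaconda and source builds is to ask `dbm.whichdb` which backend wrote a shelf file. As a possible, unconfirmed workaround, the sketch also builds a shelf on `dbm.dumb` explicitly so that both installations use the same pure-Python storage layer. The helper `open_portable_shelf` and the file names are hypothetical.

```python
import dbm
import dbm.dumb
import os
import shelve
import tempfile


def open_portable_shelf(path, flag="c"):
    """Open a shelf backed by dbm.dumb, the pure-Python dbm implementation
    that ships with every CPython build, instead of whatever backend
    dbm.open would pick on this installation."""
    return shelve.Shelf(dbm.dumb.open(path, flag))


with tempfile.TemporaryDirectory() as tmp:
    # Default behaviour: shelve.open lets dbm choose the backend.
    default_path = os.path.join(tmp, "default_shelf")
    with shelve.open(default_path) as db:
        db["CVE-2019-1842"] = {"id": 1842}
    default_backend = dbm.whichdb(default_path)

    # Explicit backend: identical storage layer on every build.
    dumb_path = os.path.join(tmp, "dumb_shelf")
    with open_portable_shelf(dumb_path) as db:
        db["CVE-2019-1842"] = {"id": 1842}
    dumb_backend = dbm.whichdb(dumb_path)
    with open_portable_shelf(dumb_path, flag="r") as db:
        roundtrip = db["CVE-2019-1842"]

print("default backend:", default_backend)  # depends on the build and Python version
print("explicit backend:", dumb_backend)
print("round trip:", roundtrip)
```

Running the first half on each installation shows whether they store shelves differently. `dbm.dumb` is intended as a slow fallback, so pinning it trades speed for consistency between builds.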
Date                 User          Action  Args
2019-06-13 16:29:52  SilentGhost   set     status: open -> closed
                                           stage: resolved
2019-06-13 16:13:03  jessembacon   set     resolution: third party
2019-06-13 16:11:27  jessembacon   set     messages: + msg345525
2019-06-12 22:06:49  eric.smith    set     messages: + msg345419
2019-06-12 21:39:46  jessembacon   set     messages: + msg345412
2019-06-12 21:34:45  jessembacon   set     files: +
2019-06-12 21:33:58  jessembacon   set     files: + Python Proof.ipynb
2019-06-12 16:56:44  eric.smith    set     messages: + msg345381
2019-06-12 15:25:50  jessembacon   set     files: + ShelfKeys.png
2019-06-12 15:18:14  jessembacon   set     messages: + msg345369
2019-06-12 00:16:01  eric.smith    set     nosy: + eric.smith
                                           messages: + msg345291
2019-06-12 00:13:05  jessembacon   create