Message 375688 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	rhpvorderman
Recipients	gregory.p.smith, jack1142, jfrances21, rhpvorderman, scoder
Date	2020-08-20.06:43:17
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1597905798.04.0.584497168286.issue41566@roundup.psfhosted.org>
In-reply-to

Content
> Within the stdlib, I'd focus only on using things that can be used in a 100% api compatible way with the existing modules. > Otherwise creating a new module and putting it up on PyPI to expose the functionality from the libraries you want makes sense and will be easier to make available to everyone on existing Python versions rather than waiting on getting something into CPython. Agreed. 100% backwards compatibility is more important than speed. And getting it available to users is faster as a module than implementing it in CPython. I already have a PR open in the xopen module to implement the use of the igzip program. Implementing a module like the gzip module in CPython will be more work, but I will certainly consider it. Thanks for the suggestion! > There is a caveat to using any of these: how well maintained and security vetted are all of the code paths in the implementation? zlib proper gets massive security attention. Its low rate of change and staleness are a feature. I didn't consider that. So I looked at the CVE page for ZLIB. The latest issues are from 2017. Before that 2005. This is the 2017 report: https://wiki.mozilla.org/images/0/09/Zlib-report.pdf. Note how it states that the old compiler support etc. are a basis for vulnerabilities. Precisely zlib-ng did get rid of these parts. On the other hand, Mozilla notes that Zlib is a great candidate for periodic security audits, precisely for the same reasons you mention. > FWIW I tend to avoid software provided by Intel given any other choice. I know the feeling. They rightfully have a very bad reputation for things they did in the past. But this particular code is open source and compilable with open source compilers. Part of it is written in Assembly, to get the speed advantage. I benchmarked it on my AMD processor and I too get enormous speed advantages out of it. > even less about Intels self serving arm-ignoring oddity. They do provide instructions to build for arm. Right on their README. https://github.com/intel/isa-l#building-isa-l. I think it is very unfair to be so dismissive just because Intel pays the developers. This is good work, which speeds up bioinformatics workloads, which in turn helps us to help more patients. On the whole I think the arguments to make a module are very strong. So I think that is the appropriate way forward. I'd love everyone to switch to more efficient deflate algorithms, but CPython may not be the right project to drive this change. At least this discussion is now here as a reference for other people who are curious about improving this performance aspect.

> Within the stdlib, I'd focus only on using things that can be used in a 100% api compatible way with the existing modules.

> Otherwise creating a new module and putting it up on PyPI to expose the functionality from the libraries you want makes sense and will be easier to make available to everyone on existing Python versions rather than waiting on getting something into CPython.

Agreed. 100% backwards compatibility is more important than speed. And getting it available to users is faster as a module than implementing it in CPython. 

I already have a PR open in the xopen module to implement the use of the igzip program. Implementing a module like the gzip module in CPython will be more work, but I will certainly consider it. Thanks for the suggestion! 

> There is a caveat to using any of these: how well maintained and security vetted are all of the code paths in the implementation?  zlib proper gets massive security attention.  Its low rate of change and staleness are a feature.

I didn't consider that. So I looked at the CVE page for ZLIB. The latest issues are from 2017. Before that 2005. This is the 2017 report: https://wiki.mozilla.org/images/0/09/Zlib-report.pdf. 
Note how it states that the old compiler support etc. are a basis for vulnerabilities. Precisely zlib-ng did get rid of these parts. On the other hand, Mozilla notes that Zlib is a great candidate for periodic security audits, precisely for the same reasons you mention.

> FWIW I tend to avoid software provided by Intel given any other choice.

I know the feeling. They rightfully have a very bad reputation for things they did in the past. But this particular code is open source and compilable with open source compilers. Part of it is written in Assembly, to get the speed advantage. I benchmarked it on my AMD processor and I too get enormous speed advantages out of it.

>  even less about Intels self serving arm-ignoring oddity.

They *do* provide instructions to build for arm. Right on their README. https://github.com/intel/isa-l#building-isa-l. I think it is very unfair to be so dismissive just because Intel pays the developers. This is good work, which speeds up bioinformatics workloads, which in turn helps us to help more patients.

On the whole I think the arguments to make a module are very strong. So I think that is the appropriate way forward. I'd love everyone to switch to more efficient deflate algorithms, but CPython may not be the right project to drive this change. At least this discussion is now here as a reference for other people who are curious about improving this performance aspect.

History
Date	User	Action	Args
2020-08-20 06:43:18	rhpvorderman	set	recipients: + rhpvorderman, gregory.p.smith, scoder, jack1142, jfrances21
2020-08-20 06:43:18	rhpvorderman	set	messageid: <1597905798.04.0.584497168286.issue41566@roundup.psfhosted.org>
2020-08-20 06:43:18	rhpvorderman	link	issue41566 messages
2020-08-20 06:43:17	rhpvorderman	create