This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: bz2/lzma: Compressor/Decompressor objects are only initialized in __init__
Type: crash Stage: patch review
Components: Extension Modules Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Aaron1011, Oren Milman, ZackerySpytz, corona10, izbyshev, malin, martin.panter, nadeem.vawda, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-01-12 00:58 by martin.panter, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
fix-lzma-bz2-segfaults.patch Aaron1011, 2016-12-02 02:54 review
Pull Requests
URL Status Linked Edit
PR 7822 open ZackerySpytz, 2018-06-20 10:51
Messages (9)
msg233866 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-12 00:58
Noticed in a patch review around LZMModules/_lzmamodule.c:1055 that the C-level LZMADecompressor object is being initialized in an __init__() method. It crashes if you create the object with __new__() and never call __init__():

>>> from lzma import LZMADecompressor
>>> LZMADecompressor.__new__(LZMADecompressor).decompress(bytes())
Segmentation fault (core dumped)
[Exit 139]
msg233900 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-13 01:43
A patch for this might conflict with the LZMA patch for Issue 15955, so it would be simplest to wait for that issue to be resolved first
msg236355 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-21 10:03
Similar situation in the bzip module:

>>> BZ2Decompressor.__new__(BZ2Decompressor).decompress(bytes())
Segmentation fault (core dumped)
msg282216 - (view) Author: Aaron Hill (Aaron1011) * Date: 2016-12-02 02:54
I've upload a patch which should address the issue in both the lzma and bz2 modules.
msg282254 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-02 20:07
This is not the only issue. Calling __init__ multiple times causes leaking locks, bz2/lzma internal buffers, etc.

It looks to me that the most straightforward way is using __new__ instead of __init__.
msg320146 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-21 09:41
Calling directly __new__() is very unusual, so I'm not surprised that you found bugs. Calling __init__() twice is also an unusal practice. I concur with Serhiy: implementing new instead of init should prevent such bug. I recall that some objects decided instead to raise an exception if they are not fully initialized (if init has not been called), but using new instead of init seems to be safer approach if we can implement it.

In the meanwhile, I blocked by the memory leak (handle leak) in bz2 and lzma: bpo-33916. I proposed a very simple fix: PR 7843. I propose to apply this one since it's easy to backport it to all branches, whereas replacing init with new would be more difficult and risky to backport: I would suggest to only make this change in the master.

What do you think?
msg325698 - (view) Author: Alexey Izbyshev (izbyshev) * (Python triager) Date: 2018-09-18 23:51
bz2 in 2.7 is also affected.

Victor, do we want to fix the crash at all in stable branches? If yes, IMHO taking the slight risk of __init__ -> __new__ change is preferable to taking the trouble to implement the alternative backwards-compatible fix (init checks in all methods) for stable branches.
msg365920 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-04-07 17:20
The issue is not solved yet.
@ZackerySpytz would you like to finalize this issue?
msg408942 - (view) Author: Ma Lin (malin) * Date: 2021-12-20 03:23
These can be done in .__new__() method:
- create thread lock
- create (de)?compression context
- initialize (de)?compressor states

In .__init__() method, only set (de)?compression parameters. And prevent .__init__() method from being called multiple times. 

This mode works fine in my pyzstd module (A Python bindings to zstd library).
But I think very few people will encounter this problem, we can leave it.
History
Date User Action Args
2022-04-11 14:58:11adminsetgithub: 67413
2021-12-20 03:23:36malinsetnosy: + malin
messages: + msg408942
2021-12-19 14:57:17ajaksu2setversions: + Python 3.10, Python 3.11, - Python 3.7, Python 3.8
2020-04-07 17:20:19corona10setmessages: + msg365920
2020-04-07 17:18:57corona10setversions: + Python 3.9, - Python 2.7, Python 3.6
2020-04-07 17:18:25corona10setnosy: + corona10
2018-09-18 23:51:59izbyshevsetversions: + Python 2.7
nosy: + izbyshev
title: LZMADecompressor object is only initialized in __init__ -> bz2/lzma: Compressor/Decompressor objects are only initialized in __init__
messages: + msg325698

2018-09-18 23:09:12berker.peksaglinkissue34729 superseder
2018-06-21 09:41:25vstinnersetnosy: + vstinner
messages: + msg320146
2018-06-20 11:07:02ZackerySpytzsetnosy: + ZackerySpytz

versions: + Python 3.8
2018-06-20 10:51:47ZackerySpytzsetstage: needs patch -> patch review
pull_requests: + pull_request7430
2017-10-23 15:53:29serhiy.storchakasetnosy: + Oren Milman

versions: - Python 3.5
2016-12-02 20:07:54serhiy.storchakasetmessages: + msg282254
versions: + Python 3.6, Python 3.7, - Python 3.4
2016-12-02 02:54:26Aaron1011setfiles: + fix-lzma-bz2-segfaults.patch

nosy: + Aaron1011
messages: + msg282216

keywords: + patch
2015-02-21 10:03:13martin.pantersetmessages: + msg236355
2015-01-13 01:43:25martin.pantersetmessages: + msg233900
2015-01-12 06:43:14serhiy.storchakasetnosy: + nadeem.vawda, serhiy.storchaka
stage: needs patch

versions: + Python 3.5
2015-01-12 00:58:28martin.pantercreate