classification
Title: stdlib wrongly uses len() for bytes-like object
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: later
Dependencies:
Superseder:
Assigned To:
Nosy List: iritkatriel, malin, miss-islington, nadeem.vawda, pitrou, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2021-06-17 05:05 by malin, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 26764 merged malin, 2021-06-17 07:39
PR 26845 merged miss-islington, 2021-06-22 07:04
PR 26846 merged malin, 2021-06-22 07:35
PR 29468 merged malin, 2021-11-08 12:49
PR 31755 merged miss-islington, 2022-03-08 09:34
PR 31756 merged miss-islington, 2022-03-08 09:35
Messages (14)
msg395971 - (view) Author: Ma Lin (malin) * Date: 2021-06-17 05:05
Running this code raises an exception:

    import pickle
    import lzma
    import pandas as pd
    with lzma.open("test.xz", "wb") as file:
        pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)

The exception:

    Traceback (most recent call last):
      File "E:\testlen.py", line 7, in <module>
        pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
      File "D:\Python39\lib\lzma.py", line 234, in write
        self._pos += len(data)
    TypeError: object of type 'pickle.PickleBuffer' has no len()
    
The exception is raised in the lzma.LZMAFile.write() method:
https://github.com/python/cpython/blob/v3.10.0b2/Lib/lzma.py#L238
        
PickleBuffer doesn't have a .__len__ method. Is that intended?
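
The missing `__len__` can also be checked without pandas or lzma. The snippet below is a minimal sketch using only the standard library; the sample bytes and the variable names are illustrative assumptions, not part of the original report.

```python
import pickle

# Sample payload chosen only for illustration.
buf = pickle.PickleBuffer(b"abcdef")

# PickleBuffer supports the buffer protocol, but len() is not available on it.
try:
    len(buf)
    has_len = True
except TypeError:
    has_len = False

# The size in bytes can still be obtained through a memoryview of the buffer.
size = memoryview(buf).nbytes

print(has_len, size)
```

Any write() method that calls len() on its argument will therefore fail for such an object, even though the object itself is a valid bytes-like input.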
msg395973 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-17 05:36
Oh, LZMAFile.write() should not use len() directly on input data because it does not always work correctly with memoryview and other objects supporting the buffer protocol. It should use memoryview(data).nbytes or data = memoryview(data).cast('B') if other byte-oriented operations (indexing, slicing) are used. See for example Lib/gzip.py, Lib/_pyio.py, Lib/_compression.py, Lib/ssl.py, Lib/socketserver.py, Lib/wave.py.
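
To illustrate the point above, here is a small sketch of a file-like wrapper that counts bytes with `memoryview(data).nbytes` instead of `len(data)`. `CountingWriter` is a hypothetical class written for this example; it is not the code of LZMAFile or of any merged patch.

```python
import io
from array import array


class CountingWriter:
    """Hypothetical wrapper that tracks how many bytes have been written."""

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0

    def write(self, data):
        if isinstance(data, (bytes, bytearray)):
            # For bytes and bytearray, len() is already the size in bytes.
            length = len(data)
        else:
            # Accept any object that supports the buffer protocol.
            data = memoryview(data)
            length = data.nbytes
        self._raw.write(data)
        self._pos += length
        return length

    def tell(self):
        return self._pos


items = array("i", [1, 2, 3])
view = memoryview(items)

# len() counts items, nbytes counts bytes; they differ for multi-byte items.
assert len(view) == 3
assert view.nbytes == 3 * items.itemsize

sink = io.BytesIO()
writer = CountingWriter(sink)
writer.write(view)
writer.write(b"xy")
```

With `len(data)` in place of `nbytes`, the position counter would advance by 3 for the array view instead of its real size in bytes, and it would raise TypeError for objects such as PickleBuffer.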
msg395976 - (view) Author: Ma Lin (malin) * Date: 2021-06-17 06:26
Ok, I'm working on a PR.
msg396305 - (view) Author: Ma Lin (malin) * Date: 2021-06-22 05:28
I am checking all the .py files in the `Lib` folder.
hmac.py has two len() bugs:
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214

I think PR 26764 is ready. It fixes the len() bugs in bz2.py and lzma.py.
msg396309 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 07:04
New changeset bc6c12c72a9536acc96e7b9355fd69d1083a43c1 by Ma Lin in branch 'main':
bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764)
https://github.com/python/cpython/commit/bc6c12c72a9536acc96e7b9355fd69d1083a43c1
msg396334 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 13:57
New changeset 8bc26d8c9d092840054f57f9b4620de0d40d8423 by Ma Lin in branch '3.9':
bpo-44439: BZ2File.write()/LZMAFile.write() handle length correctly (GH-26846)
https://github.com/python/cpython/commit/8bc26d8c9d092840054f57f9b4620de0d40d8423
msg396335 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 13:59
Thank you for your contribution, Ma Lin.
msg396336 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 14:00
New changeset 01858fbe31e8e0185edfbd3f10172f7c61391c9d by Miss Islington (bot) in branch '3.10':
bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764) (GH-26845)
https://github.com/python/cpython/commit/01858fbe31e8e0185edfbd3f10172f7c61391c9d
msg405948 - (view) Author: Ma Lin (malin) * Date: 2021-11-08 13:11
Serhiy Storchaka:

Sorry, I found that the `zipfile` module also has this bug. It is fixed in PR 29468.

This bug was first reported and fixed by GitHub user `marcoffee`, so I listed him as a co-author. His work:
https://github.com/animalize/pyzstd/issues/4

The second commit fixes an omission from issue41735. It is a very simple fix, so I included it in PR 29468 as well.
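
For interpreters that do not yet include the zipfile fix, one possible caller-side workaround is to cast the data to a flat view of unsigned bytes before writing, so that `len()` and `nbytes` agree. This is only a sketch: the member name `data.bin` and the sample payload are assumptions made for illustration.

```python
import io
import zipfile
from array import array

# Sample payload with multi-byte items, chosen only for illustration.
payload = array("i", range(1000))
view = memoryview(payload)

archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    with zf.open("data.bin", "w") as dest:
        # After cast("B"), len() of the view equals its size in bytes,
        # so the recorded size is right however the writer computes it.
        dest.write(view.cast("B"))

with zipfile.ZipFile(archive) as zf:
    info = zf.getinfo("data.bin")
    restored = zf.read("data.bin")

print(info.file_size, view.nbytes)
```

The cast does not copy the data. It only changes how the same memory is described.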
msg414737 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-03-08 09:35
New changeset 36dd7396fcd26d8bf9919d536d05d7000becbe5b by Ma Lin in branch 'main':
bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468)
https://github.com/python/cpython/commit/36dd7396fcd26d8bf9919d536d05d7000becbe5b
msg414742 - (view) Author: miss-islington (miss-islington) Date: 2022-03-08 10:03
New changeset 21c5b3f73fb11fb0d3239971f72e8f0574a07245 by Miss Islington (bot) in branch '3.10':
bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468)
https://github.com/python/cpython/commit/21c5b3f73fb11fb0d3239971f72e8f0574a07245
msg414743 - (view) Author: miss-islington (miss-islington) Date: 2022-03-08 10:05
New changeset 0663ca17f5535178c083c6734fa52e40bd2db2de by Miss Islington (bot) in branch '3.9':
bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468)
https://github.com/python/cpython/commit/0663ca17f5535178c083c6734fa52e40bd2db2de
msg415528 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-03-18 21:36
Can this be closed now or is there anything else to do?
msg415549 - (view) Author: Ma Lin (malin) * Date: 2022-03-19 13:19
The `_Stream.write` method in tarfile.py also has this code:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434

But this bug will not be triggered, because this method is always called with bytes data.

The `_ConnectionBase.send_bytes` method in multiprocessing\connection.py can be micro-optimized:
https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L193
This can be done in another issue.

So I think this issue can be closed.
History
Date | User | Action | Args
2022-04-11 14:59:46 | admin | set | github: 88605
2022-03-19 13:19:59 | malin | set | status: pending -> closed; messages: + msg415549; stage: patch review -> resolved
2022-03-18 21:36:42 | iritkatriel | set | status: open -> pending; nosy: + iritkatriel; messages: + msg415528
2022-03-08 10:05:03 | miss-islington | set | messages: + msg414743
2022-03-08 10:03:56 | miss-islington | set | messages: + msg414742
2022-03-08 09:35:02 | serhiy.storchaka | set | messages: + msg414737
2022-03-08 09:35:01 | miss-islington | set | pull_requests: + pull_request29869
2022-03-08 09:34:57 | miss-islington | set | stage: resolved -> patch review; pull_requests: + pull_request29868
2021-11-08 13:11:40 | malin | set | status: closed -> open; resolution: fixed -> later; messages: + msg405948
2021-11-08 12:49:38 | malin | set | pull_requests: + pull_request27721
2021-06-22 14:00:01 | serhiy.storchaka | set | messages: + msg396336
2021-06-22 13:59:17 | serhiy.storchaka | set | status: open -> closed; components: + Library (Lib); versions: + Python 3.9, Python 3.10, Python 3.11; messages: + msg396335; type: behavior; resolution: fixed; stage: patch review -> resolved
2021-06-22 13:57:50 | serhiy.storchaka | set | messages: + msg396334
2021-06-22 07:35:50 | malin | set | pull_requests: + pull_request25427
2021-06-22 07:13:40 | christian.heimes | set | nosy: - christian.heimes
2021-06-22 07:04:47 | serhiy.storchaka | set | messages: + msg396309
2021-06-22 07:04:35 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request25426
2021-06-22 05:28:41 | malin | set | nosy: + christian.heimes; messages: + msg396305; title: PickleBuffer doesn't have __len__ method -> stdlib wrongly uses len() for bytes-like object
2021-06-17 07:39:57 | malin | set | keywords: + patch; stage: patch review; pull_requests: + pull_request25350
2021-06-17 06:26:03 | malin | set | messages: + msg395976
2021-06-17 05:36:11 | serhiy.storchaka | set | nosy: + serhiy.storchaka, nadeem.vawda; messages: + msg395973
2021-06-17 05:05:24 | malin | create |