classification
Title: stdlib wrongly uses len() for bytes-like object
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: malin, miss-islington, nadeem.vawda, pitrou, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2021-06-17 05:05 by malin, last changed 2021-06-22 14:00 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked
PR 26764 merged malin, 2021-06-17 07:39
PR 26845 merged miss-islington, 2021-06-22 07:04
PR 26846 merged malin, 2021-06-22 07:35
Messages (8)
msg395971 - (view) Author: Ma Lin (malin) * Date: 2021-06-17 05:05
Running this code raises an exception:

    import pickle
    import lzma
    import pandas as pd
    with lzma.open("test.xz", "wb") as file:
        pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)

The exception:

    Traceback (most recent call last):
      File "E:\testlen.py", line 7, in <module>
        pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
      File "D:\Python39\lib\lzma.py", line 234, in write
        self._pos += len(data)
    TypeError: object of type 'pickle.PickleBuffer' has no len()
    
The exception is raised in lzma.LZMAFile.write() method:
https://github.com/python/cpython/blob/v3.10.0b2/Lib/lzma.py#L238
        
PickleBuffer doesn't have a .__len__ method. Is this intended?
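The report above can probably be reproduced without pandas. The following is a minimal sketch, assuming that passing a pickle.PickleBuffer straight to LZMAFile.write() exercises the same code path as the protocol 5 dump; try_write is a hypothetical helper, not part of the original report. On an interpreter without the fix it should return the TypeError, and on a fixed one it should return the number of bytes written.

```python
import io
import lzma
import pickle


def try_write(payload):
    """Hypothetical helper: write payload via LZMAFile, return count or TypeError."""
    sink = io.BytesIO()  # in-memory target, so no test.xz file is created
    with lzma.LZMAFile(sink, "w") as xz_file:
        try:
            return xz_file.write(payload)
        except TypeError as exc:
            # Interpreters without the fix fail in len(data) inside write().
            return exc


result = try_write(pickle.PickleBuffer(b"x" * 1000))
print(result)
```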
msg395973 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-17 05:36
Oh, LZMAFile.write() should not call len() directly on its input data, because that does not always work correctly with memoryview and other objects supporting the buffer protocol. It should use memoryview(data).nbytes, or data = memoryview(data).cast('B') if other byte-oriented operations (indexing, slicing) are needed. See for example Lib/gzip.py, Lib/_pyio.py, Lib/_compression.py, Lib/ssl.py, Lib/socketserver.py, Lib/wave.py.
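To illustrate the difference described in this message, here is a small sketch comparing len(), .nbytes and .cast('B') on a buffer whose items are wider than one byte. The array contents and the byte_length helper are assumptions made for illustration only; they do not come from the stdlib modules listed above.

```python
import array
import pickle

words = array.array("I", [1, 2, 3])  # three multi-byte unsigned ints
view = memoryview(words)

item_count = len(view)    # number of items, not number of bytes
byte_count = view.nbytes  # item count multiplied by the item size
flat = view.cast("B")     # byte-oriented view over the same memory


def byte_length(data):
    """Hypothetical helper: size in bytes of any buffer-protocol object."""
    if isinstance(data, (bytes, bytearray)):
        return len(data)  # len() is already the byte count here
    return memoryview(data).nbytes


print(item_count, byte_count, len(flat))
print(byte_length(pickle.PickleBuffer(b"abcd")))
```

After the cast, len(flat) matches nbytes, so indexing and slicing operate on individual bytes.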
msg395976 - (view) Author: Ma Lin (malin) * Date: 2021-06-17 06:26
Ok, I'm working on a PR.
msg396305 - (view) Author: Ma Lin (malin) * Date: 2021-06-22 05:28
I am checking all the .py files in the `Lib` folder.
hmac.py has two len() bugs:
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212
https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214

I think PR 26764 is ready; it fixes the len() bugs in bz2.py and lzma.py.
msg396309 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 07:04
New changeset bc6c12c72a9536acc96e7b9355fd69d1083a43c1 by Ma Lin in branch 'main':
bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764)
https://github.com/python/cpython/commit/bc6c12c72a9536acc96e7b9355fd69d1083a43c1
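The general shape of such a fix can be sketched as follows. This is not the merged patch: CountingWriter is a hypothetical stand-in for a file wrapper that tracks its position, written only to show the memoryview(data).nbytes pattern suggested earlier in the thread.

```python
import io
import pickle


class CountingWriter:
    """Hypothetical wrapper that tracks how many bytes have been written."""

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0

    def write(self, data):
        if isinstance(data, (bytes, bytearray)):
            length = len(data)
        else:
            # Accept any object supporting the buffer protocol.
            data = memoryview(data)
            length = data.nbytes
        self._raw.write(data)
        self._pos += length  # advance by bytes, never by item count
        return length

    def tell(self):
        return self._pos


sink = io.BytesIO()
writer = CountingWriter(sink)
writer.write(b"ab")
writer.write(pickle.PickleBuffer(b"cde"))
print(writer.tell())
```

Keeping the bytes/bytearray fast path avoids creating a memoryview for the most common inputs.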
msg396334 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 13:57
New changeset 8bc26d8c9d092840054f57f9b4620de0d40d8423 by Ma Lin in branch '3.9':
bpo-44439: BZ2File.write()/LZMAFile.write() handle length correctly (GH-26846)
https://github.com/python/cpython/commit/8bc26d8c9d092840054f57f9b4620de0d40d8423
msg396335 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 13:59
Thank you for your contribution, Ma Lin.
msg396336 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-06-22 14:00
New changeset 01858fbe31e8e0185edfbd3f10172f7c61391c9d by Miss Islington (bot) in branch '3.10':
bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764) (GH-26845)
https://github.com/python/cpython/commit/01858fbe31e8e0185edfbd3f10172f7c61391c9d
History
Date User Action Args
2021-06-22 14:00:01 serhiy.storchaka set messages: + msg396336
2021-06-22 13:59:17 serhiy.storchaka set status: open -> closed
    components: + Library (Lib)
    versions: + Python 3.9, Python 3.10, Python 3.11
    messages: + msg396335
    type: behavior
    resolution: fixed
    stage: patch review -> resolved
2021-06-22 13:57:50 serhiy.storchaka set messages: + msg396334
2021-06-22 07:35:50 malin set pull_requests: + pull_request25427
2021-06-22 07:13:40 christian.heimes set nosy: - christian.heimes
2021-06-22 07:04:47 serhiy.storchaka set messages: + msg396309
2021-06-22 07:04:35 miss-islington set nosy: + miss-islington
    pull_requests: + pull_request25426
2021-06-22 05:28:41 malin set nosy: + christian.heimes
    messages: + msg396305
    title: PickleBuffer doesn't have __len__ method -> stdlib wrongly uses len() for bytes-like object
2021-06-17 07:39:57 malin set keywords: + patch
    stage: patch review
    pull_requests: + pull_request25350
2021-06-17 06:26:03 malin set messages: + msg395976
2021-06-17 05:36:11 serhiy.storchaka set nosy: + serhiy.storchaka, nadeem.vawda
    messages: + msg395973
2021-06-17 05:05:24 malin create