Issue 34629: Python3 regression for urllib(2).urlopen(...).fp for chunked http responses



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/78810

classification

Title:	Python3 regression for urllib(2).urlopen(...).fp for chunked http responses
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	jaswdr, orsenthil, tkruse, xtreak
Priority:	normal	Keywords:

Created on 2018-09-11 16:36 by tkruse, last changed 2022-04-11 14:59 by admin.

Files
File name	Uploaded	Description
urllib_issue.py	tkruse, 2018-09-11 16:36

Messages (2)
msg325025	Author: Thibault Kruse (tkruse)	Date: 2018-09-11 16:36
We had a problem running code that downloads files from GitHub when porting from Python 2.7 to Python 3.[3-7]. Not sure if this is a bug or not.

With the given code, in Python 3 a file downloaded via a chunked HTTP response will contain the chunk sizes when it is read through the undocumented fp attribute, urlopen(...).fp. In Python 2, only the chunk payload would make it into the file.

We assume that we can just use the urlopen response directly as a fix (without '.fp'), but thought it might still be nice to report the difference.

Short code:

    resp = urlopen('http://someurl')
    fhand = os.fdopen(fdesc, "wb")
    shutil.copyfileobj(resp.fp, fhand)  # using .fp here is the dodgy part
    fhand.close()

The attached script demonstrates the difference:

    $ python --version
    Python 2.7.15rc1
    $ python urllib_issue.py
    127.0.0.1 - - [12/Sep/2018 01:27:28] "GET /downloads/1.0.tar.gz HTTP/1.1" 200 -

    $ python3 --version
    Python 3.6.5
    $ python3 urllib_issue.py
    127.0.0.1 - - [12/Sep/2018 01:27:37] "GET /downloads/1.0.tar.gz HTTP/1.1" 200 -
    Traceback (most recent call last):
      File "urllib_issue.py", line 87, in <module>
        assert data == FILE_CONTENT, '%s, %s'%(len(FILE_CONTENT), len(data))
    AssertionError: 100000, 100493
    !!! BASH reports ERROR: shell returned 1
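The workaround mentioned in the report (copying from the response object itself rather than from its .fp attribute) could look roughly like the sketch below. The helper name copy_response_to_fd is hypothetical, and io.BytesIO stands in for the object returned by urlopen so that the example runs without network access. It relies on the assumption that, in Python 3, the response's own read() method returns the decoded body, while .fp exposes the underlying raw stream.

```python
import io
import os
import shutil
import tempfile


def copy_response_to_fd(resp, fdesc):
    """Copy the body of a file-like response into an open file descriptor.

    Hypothetical helper: `resp` is read through its own read() method,
    not through `resp.fp`, so any transfer-encoding handling done by the
    response object is preserved.
    """
    with os.fdopen(fdesc, "wb") as fhand:
        shutil.copyfileobj(resp, fhand)


# Stand-in for urlopen(...): any object with a read() method works here.
fake_resp = io.BytesIO(b"payload bytes")

fdesc, path = tempfile.mkstemp()
copy_response_to_fd(fake_resp, fdesc)

with open(path, "rb") as check:
    written = check.read()
os.remove(path)
```

With a real response, the call would be copy_response_to_fd(urlopen(url), fdesc); the with block also closes the file even if the copy raises.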
msg396823	Author: Jonathan Schweder (jaswdr) *	Date: 2021-07-01 18:31
Hello @tkruse, I did some research and found that with chunked transfer encoding [1], each chunk is preceded by its size in bytes. That is exactly what shows up if you check the content of a file downloaded with the example you provided [2].

So far, I would say that this is not a bug; it is just how the transfer encoding works.

[1]: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
[2]: https://gist.github.com/jaswdr/95b2adc519d986c00b17f6572d470f2a
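To illustrate why a raw read of a chunked body is longer than the payload, here is a minimal sketch of the chunk framing. The functions encode_chunked and decode_chunked are hypothetical and are not what http.client uses internally; they ignore trailers and do not emit chunk extensions. The chunk size of 40 is arbitrary and is not taken from the attached script.

```python
import io


def encode_chunked(payload: bytes, chunk_size: int) -> bytes:
    """Frame `payload` as a chunked body: hex size line, data, CRLF."""
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk)   # chunk size in hexadecimal
        out += chunk + b"\r\n"          # chunk data, then CRLF
    out += b"0\r\n\r\n"                 # zero-size chunk ends the body
    return bytes(out)


def decode_chunked(raw: bytes) -> bytes:
    """Strip the chunk framing and return only the payload bytes."""
    stream = io.BytesIO(raw)
    payload = bytearray()
    while True:
        size_line = stream.readline()
        # Anything after ';' on the size line is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        payload += stream.read(size)
        stream.read(2)                  # skip the CRLF after the data
    return bytes(payload)


payload = b"x" * 100
raw = encode_chunked(payload, 40)
decoded = decode_chunked(raw)
overhead = len(raw) - len(payload)      # bytes added by the framing
```

Writing raw to disk reproduces the symptom from the report (extra size lines mixed into the file), while decoded matches the original payload.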

History
Date	User	Action	Args
2022-04-11 14:59:05	admin	set	github: 78810
2022-01-16 18:48:30	iritkatriel	set	nosy: + orsenthil
2021-07-01 18:31:04	jaswdr	set	nosy: + jaswdr messages: + msg396823
2018-09-11 17:52:22	xtreak	set	nosy: + xtreak
2018-09-11 16:36:09	tkruse	create