This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Memory corruption with urllib.parse
Type: security Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: alexey@altlinux.org, imz, steven.daprano
Priority: normal Keywords:

Created on 2018-03-30 16:06 by alexey@altlinux.org, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
testcase.py alexey@altlinux.org, 2018-03-30 16:06 testcase for the issue
data.txt.gz alexey@altlinux.org, 2018-03-30 16:07 data.txt
Messages (7)
msg314688 - (view) Author: (alexey@altlinux.org) Date: 2018-03-30 16:06
There is a strange behavior while processing data in a "for" loop with urllib.parse.unquote() - looks like memory corruption - a list contains elements that have never been appended.

I'll explain the testcase.

I spotted the problem by checking for any remains of url encoding left in output_list. There were these strings with url encoding left - "bad_boys" dict in testcase. Now, when iterating through input_list (read from "data.txt"), I'm checking for those problematic entries and printing what is being appended to the output_list as well as all problematic (unwanted, "Bad Boys") and converted problematic entries ("Normal conversions") existing in the output_list. At some point, unwanted entries appear in output_list. The resulting output_list contains converted and unconverted problematic entries, though input_list's length equals output_list's length.

data.txt needs to be saved along with testcase.py, and then you can run testcase.py.

The output of running testcase.py:

Bad Boys are:
	 seil%2fturbo_firmware 140335684191552
	 intelligent_platforms_proficy_hmi%2fscada_cimplicity 140335684515920
	 seil%2fneu_2fe_plus_firmware 140335684536080
	 seil%2fb1_firmware 140335684134640
	 eil%2fx2_firmware 140335684191984
	 seil%2fx1_firmware 140335684190832
	 seil%2fx2_firmware 140335684190904
	 seil%2fx86_firmware 140335684192488

Input list length is: 17094 

Bad Boy detected
Element type: <class 'str'>
Convertation: seil%2fb1_firmware 140335679096848 >> seil/b1_firmware 140335681345768
Just appended: seil/b1_firmware 140335681345768
Normal conversions in output list:
	 seil/b1_firmware 140335681345768

Bad Boy detected
Element type: <class 'str'>
Convertation: seil%2fx1_firmware 140335679096920 >> seil/x1_firmware 140335681345840
Just appended: seil/x1_firmware 140335681345840
Normal conversions in output list:
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840

Bad Boy detected
Element type: <class 'str'>
Convertation: seil%2fx2_firmware 140335679096992 >> seil/x2_firmware 140335681345912
Just appended: seil/x2_firmware 140335681345912
Normal conversions in output list:
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840
	 seil/x2_firmware 140335681345912

Bad Boy detected
Element type: <class 'str'>
Convertation: seil%2fx86_firmware 140335679134936 >> seil/x86_firmware 140335681346704
Just appended: seil/x86_firmware 140335681346704
Normal conversions in output list:
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840
	 seil/x2_firmware 140335681345912
	 seil/x86_firmware 140335681346704
Bad Boys in output list:
	 eil%2fx2_firmware 140335681346272

Bad Boy detected
Element type: <class 'str'>
Convertation: seil%2fturbo_firmware 140335679200976 >> seil/turbo_firmware 140335679269456
Just appended: seil/turbo_firmware 140335679269456
Normal conversions in output list:
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840
	 seil/x2_firmware 140335681345912
	 seil/x86_firmware 140335681346704
	 seil/turbo_firmware 140335679269456
Bad Boys in output list:
	 eil%2fx2_firmware 140335681346272
	 seil%2fb1_firmware 140335679267800
	 seil%2fx1_firmware 140335679267872
	 seil%2fx2_firmware 140335679267944
	 seil%2fx86_firmware 140335679269384

Bad Boy detected
Element type: <class 'str'>
Convertation: seil%2fneu_2fe_plus_firmware 140335678867056 >> seil/neu_2fe_plus_firmware 140335680328928
Just appended: seil/neu_2fe_plus_firmware 140335680328928
Normal conversions in output list:
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840
	 seil/x2_firmware 140335681345912
	 seil/x86_firmware 140335681346704
	 seil/turbo_firmware 140335679269456
	 seil/neu_2fe_plus_firmware 140335680328928
Bad Boys in output list:
	 eil%2fx2_firmware 140335681346272
	 seil%2fb1_firmware 140335679267800
	 seil%2fx1_firmware 140335679267872
	 seil%2fx2_firmware 140335679267944
	 seil%2fx86_firmware 140335679269384
	 seil%2fturbo_firmware 140335679849576

Bad Boy detected
Element type: <class 'str'>
Convertation: intelligent_platforms_proficy_hmi%2fscada_cimplicity 140335678546800 >> intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376
Just appended: intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376
Normal conversions in output list:
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840
	 seil/x2_firmware 140335681345912
	 seil/x86_firmware 140335681346704
	 seil/turbo_firmware 140335679269456
	 seil/neu_2fe_plus_firmware 140335680328928
	 intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376
Bad Boys in output list:
	 eil%2fx2_firmware 140335681346272
	 seil%2fb1_firmware 140335679267800
	 seil%2fx1_firmware 140335679267872
	 seil%2fx2_firmware 140335679267944
	 seil%2fx86_firmware 140335679269384
	 seil%2fturbo_firmware 140335679849576
	 seil%2fneu_2fe_plus_firmware 140335678934512

FINAL RESULTS
Output list length is: 17094
Normal conversions in output list (Bad Boys -related):
	 seil/b1_firmware 140335681345768
	 seil/x1_firmware 140335681345840
	 seil/x2_firmware 140335681345912
	 seil/x86_firmware 140335681346704
	 seil/turbo_firmware 140335679269456
	 seil/neu_2fe_plus_firmware 140335680328928
	 intelligent_platforms_proficy_hmi/scada_cimplicity 140335681194376
Bad Boys in output list:
	 eil%2fx2_firmware 140335681346272
	 seil%2fb1_firmware 140335679267800
	 seil%2fx1_firmware 140335679267872
	 seil%2fx2_firmware 140335679267944
	 seil%2fx86_firmware 140335679269384
	 seil%2fturbo_firmware 140335679849576
	 seil%2fneu_2fe_plus_firmware 140335678934512
	 intelligent_platforms_proficy_hmi%2fscada_cimplicity 140335681195728
msg314692 - (view) Author: Ivan Zakharyaschev (imz) Date: 2018-03-30 16:35
I've tested this on a e2k machine (e2k is a VLIW/EPIC architecture, similar to ia64), where python3 3.5 has been compiled with a completely different proprietary compiler, and got the same bad result.

python3-3.5.1-alt7.3

(This has also been tested with the same results on an Ubuntu machine.)

So, this probably means that this is not a matter of wrong compiler optimizations.
msg314695 - (view) Author: Ivan Zakharyaschev (imz) Date: 2018-03-30 16:42
The bad results are also reproducible with:

# dpkg -l python3.2
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                                          Version                             Architecture                        Description
+++-=============================================================-===================================-===================================-===============================================================================================================================
ii  python3.2                                                     3.2.3-7                             ia64                                Interactive high-level object-oriented language (version 3.2)
msg314707 - (view) Author: Ivan Zakharyaschev (imz) Date: 2018-03-30 19:44
> a list contains elements that have never been appended

Why do we think that it's true?

1. The described result can be valid if there are "doubly quoted" strings in the input list.

2. Also, there could be a bug in unquote() which doesn't corrupt the list as a side effect, but simply returns a wrong result value, which is indeed appended to the output list at its turn.
msg314711 - (view) Author: Ivan Zakharyaschev (imz) Date: 2018-03-30 20:18
> 1. The described result can be valid if there are "doubly quoted" strings in the input list.

To check this I've modified your testcase.py:

diff --git a/testcase.py b/testcase.py
index b597205..fefcef2 100755
--- a/testcase.py
+++ b/testcase.py
@@ -53,6 +53,11 @@ for entry in input_list:
 	# Conveting
 	conv_entry=urllib.parse.unquote(entry)
 
+	if conv_entry in bad_boys:
+		flag=True
+		print("ABOUT TO APPEND A BAD BOY")
+		print("Conversion:",entry,id(entry),end=" ")
+
 	# Printing (debug)
 	if flag:
 		print(">>",conv_entry,id(conv_entry))
@@ -61,7 +66,7 @@ for entry in input_list:
 	output_list.append(conv_entry)
 
 	# Printing (debug)
-	if entry in bad_boys:
+	if flag:
 		print("Just appended:", output_list[-1], id(output_list[-1]))
 		flag=False	
 		for x in output_list:

Let's run it and look for my new message:

$ ./testcase.py | fgrep -A1 -e ABOUT
ABOUT TO APPEND A BAD BOY
Conversion: eil%252fx2_firmware 139920242273280 >> eil%2fx2_firmware 139920244465360
--
ABOUT TO APPEND A BAD BOY
Conversion: seil%252fb1_firmware 139920242325448 >> seil%2fb1_firmware 139920241899608
--
ABOUT TO APPEND A BAD BOY
Conversion: seil%252fx1_firmware 139920242325520 >> seil%2fx1_firmware 139920241899752
--
ABOUT TO APPEND A BAD BOY
Conversion: seil%252fx2_firmware 139920242325592 >> seil%2fx2_firmware 139920241899896
--
ABOUT TO APPEND A BAD BOY
Conversion: seil%252fx86_firmware 139920242356704 >> seil%2fx86_firmware 139920241901408
--
ABOUT TO APPEND A BAD BOY
Conversion: seil%252fturbo_firmware 139920242416280 >> seil%2fturbo_firmware 139920242993744
--
ABOUT TO APPEND A BAD BOY
Conversion: seil%252fneu_2fe_plus_firmware 139920242065856 >> seil%2fneu_2fe_plus_firmware 139920243007536
--
ABOUT TO APPEND A BAD BOY
Conversion: intelligent_platforms_proficy_hmi%252fscada_cimplicity 139920241710560 >> intelligent_platforms_proficy_hmi%2fscada_cimplicity 139920244342480
$ fgrep -ne 'seil%252fneu_2fe_plus_firmware' data.txt
15191:seil%252fneu_2fe_plus_firmware
$ 

So, there is indeed a "doubly" quoted data, at 15191 in data.txt.

This is not a bug, everything work correctly.

This issue needs to be closed.
msg314727 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-03-31 00:10
Alexey, I'm afraid I can't make heads or tails of your bug report. You've gone into a detailed description of the "bad results" you have, but you haven't explained *why* they're bad, or what you were expecting.

I'm afraid that I found understanding your test case code a hard slog. A much simpler test case showing input, expected output, and actual output is appreciated. It might help if you read this:

http://sscce.org/

Its written for Java programmers, but the same applies to any programming language.

Please don't go straight to assumptions about the cause of the bug ("memory corruption") until you can demonstrate that there actually is a bug.

Ivan has spent significant effort to understand your test case, thank you Ivan, and concludes:

> This is not a bug, everything work correctly.

so I'm going to close the bug report. If you disagree with Ivan's conclusion, please write back with reasons why you disagree.

It would also significantly help if you can provide a simpler test case that demonstrates expected results and actual results (and preferably input data small enough to fit inline in the testcase script, if possible).
msg314736 - (view) Author: Ivan Zakharyaschev (imz) Date: 2018-03-31 10:57
Actually, Alexey shared this problem with me orally first, and then asked to have a look at his report, and I felt that just describing the technical details about what is going on is not enough, and suggested to include a brief sentence which would state what kind of assumptions seemed to be broken to Alexey. And it was:

> a list contains elements that have never been appended.

And this statement actually appeared quite helpful for me: in a few hours after this statement was made, I got an idea that this statement might actually be not true in this case. And this quickly lead me to the demonstration what really happens here, and that it was correct Python behavior.

Without this brief formal statement of the broken assumptions, I had simply had no ideas (because I supposed that something was going wrong according to Alexey's words, and didn't question whether it is really wrong or we might expect these results under some conditions as correct results).

(Actually, I'm a fan of type-checking, and what is more, of languages with expressive types like Agda or Idris where the programmer is to make provable statements of the expected properties written down as types; sometimes, they can be derived automatically, but nevertheless my point is the attitude to programming, even without formal tools: think how the expected properties would be proved, how they would be decomposed into simpler statements during the proof. Well, this is offtopic here, so, Alexey, let's not continue this discussion here.)
History
Date User Action Args
2022-04-11 14:58:59adminsetgithub: 77367
2018-03-31 10:57:10imzsetmessages: + msg314736
2018-03-31 00:10:04steven.dapranosetstatus: open -> closed

nosy: + steven.daprano
messages: + msg314727

resolution: not a bug
stage: resolved
2018-03-30 20:18:26imzsetmessages: + msg314711
2018-03-30 19:44:48imzsetmessages: + msg314707
2018-03-30 16:42:07imzsetmessages: + msg314695
2018-03-30 16:35:14imzsetmessages: + msg314692
2018-03-30 16:12:12imzsetnosy: + imz
2018-03-30 16:07:46alexey@altlinux.orgsetfiles: + data.txt.gz
2018-03-30 16:06:55alexey@altlinux.orgcreate