Issue 1263: PEP 3137 patch - str8/str comparison should return false

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Unsupported provider

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45604

classification

Title:	PEP 3137 patch - str8/str comparison should return false
Type:	behavior	Stage:
Components:	Interpreter Core	Versions:	Python 3.0

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	brett.cannon	Nosy List:	brett.cannon, gvanrossum, thomaslee
Priority:	release blocker	Keywords:	patch

Created on 2007-10-11 12:20 by thomaslee, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
unicode-string-eq-false-r2.patch	thomaslee, 2007-10-11 12:20
unicode-string-eq-false-structmember-c-r1.patch	thomaslee, 2007-10-11 12:21
unicode-string-eq-false-r3.patch	thomaslee, 2007-10-11 12:28
unicode-string-eq-false-all-r4.patch	thomaslee, 2007-10-15 13:01
r4-revised.patch	brett.cannon, 2007-10-19 23:25
fix_test_compile.diff	brett.cannon, 2007-10-20 00:11
fix_test_format.diff	brett.cannon, 2007-10-20 00:44
fix_modulefinder.diff	brett.cannon, 2007-10-21 00:35
sqlite_fix.diff	brett.cannon, 2007-10-21 01:41
fix_test_str.diff	brett.cannon, 2007-10-21 01:52
fix_test_subprocess.diff	brett.cannon, 2007-10-21 02:15
fix_test_struct.diff	brett.cannon, 2007-10-22 07:38

Messages (16)
msg56343 - (view)	Author: Thomas Lee (thomaslee)	Date: 2007-10-11 12:20
The main patch - while exactly what is needed to make str8/str equality checks return False - breaks a bunch of tests due to PyString_* still being used elsewhere when it should be using PyUnicode. The second patch modifies structmember.c to use PyUnicode_* where it was previously using PyString_*, which fixes the first problem I stumbled across in trying to get test_unicode to run. Unfortunately, similar errors are present in Python/codecs.c and other places (maybe Python/modsupport.c too? not 100% sure yet) - these still need to be fixed!
msg56344 - (view)	Author: Thomas Lee (thomaslee)	Date: 2007-10-11 12:28
Oops - use unicode-string-eq-false-r3.patch, not unicode-string-eq-false-r2.patch.
msg56437 - (view)	Author: Thomas Lee (thomaslee)	Date: 2007-10-15 13:01
Hack to make Python/codecs.c use Unicode strings internally. I recognize the way I have fixed it here is probably not ideal (basically ripped out PyString_, replaced with a PyMem_Malloc/PyMem_Free call) but it fixes 10-12 tests that were failing with my earlier changes. If anybody can recommend a "nice" way to fix this, I'm all ears. The following still fail for me with this patch applied: -- test_compile This is due to PyString_/PyUnicode_/PyBytes_ confusion in the assembler struct (specifically: a_lnotab and a_bytecode) in Python/compile.c - tried replacing PyString_* calls with PyBytes_* calls, but this raises a TypeError because PyBytes is not hashable ... not sure what exactly is causing this. -- test_ctypes Looks like a simple case of ctypes using str8 instead of str. Appears to be an easy fix. -- test_modulefinder Failing because str8 >= str is now an invalid operation -- test_set This test needs some love. -- test_sqlite Not sure what's going on here. -- test_str This one is a little tricky: str8/str with __str__/__unicode__ ... how is this test supposed to behave with the fix in this patch? -- test_struct "unpack/pack not transitive" - what does that mean? -- test_subprocess Like modulefinder, this is probably just due to the use of str8 over str internally in the subprocess module. Likely to be an easy fix. The following tests fail for me irrespective of whether or not I have r4 of my patch applied: -- test_doctest -- test_email -- test_nis -- test_pty If anybody can give this new patch a try and let me know the result it would be much appreciated.
msg56452 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 17:31
I'll look at this at some point. One quick comment: the lnotab and bytecode should use PyString, which will be 'bytes' in 3.0a2. They must be immutable because code objects must be immutable (it must not be possible to modify an existing code object). On 10/15/07, Thomas Lee <report@bugs.python.org> wrote: > > > Thomas Lee added the comment: > > Hack to make Python/codecs.c use Unicode strings internally. I recognize > the way I have fixed it here is probably not ideal (basically ripped out > PyString_, replaced with a PyMem_Malloc/PyMem_Free call) but it fixes > 10-12 tests that were failing with my earlier changes. If anybody can > recommend a "nice" way to fix this, I'm all ears. > > The following still fail for me with this patch applied: > > -- test_compile > > This is due to PyString_/PyUnicode_/PyBytes_ confusion in the > assembler struct (specifically: a_lnotab and a_bytecode) in > Python/compile.c - tried replacing PyString_* calls with PyBytes_* > calls, but this raises a TypeError because PyBytes is not hashable ... > not sure what exactly is causing this. > > -- test_ctypes > Looks like a simple case of ctypes using str8 instead of str. Appears to > be an easy fix. > > -- test_modulefinder > Failing because str8 >= str is now an invalid operation > > -- test_set > This test needs some love. > > -- test_sqlite > Not sure what's going on here. > > -- test_str > > This one is a little tricky: str8/str with __str__/__unicode__ ... how > is this test supposed to behave with the fix in this patch? > > -- test_struct > "unpack/pack not transitive" - what does that mean? > > -- test_subprocess > Like modulefinder, this is probably just due to the use of str8 over str > internally in the subprocess module. Likely to be an easy fix. > > The following tests fail for me irrespective of whether or not I have r4 > of my patch applied: > > -- test_doctest > -- test_email > -- test_nis > -- test_pty > > If anybody can give this new patch a try and let me know the result it > would be much appreciated. > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue1263> > __________________________________ >
msg56566 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-19 21:49
I've committed the half of this patch that doesn't break any tests: the changes to codecs.c and structmember.c. Committed revision 58551. I'm seeking help getting the remaining unit tests to pass. (Thanks Thomas for the enumeration of the test failures!)
msg56576 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-19 23:25
The file I just uploaded is unicode-string-eq-false-all-r4.patch with the codecs.c and structmember.c parts of the patch removed.
msg56578 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-20 00:11
Attached is a patch to fix test_compile. Simple fix of turning an empty string into ``str8('')``.
msg56580 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-20 00:44
Attached a fix for test_format. It was testing string interpolation on both str8 and str and using a str for the comparison. Since string interpolation is going away for str8 once it becomes bytes I just removed the testing of str8. The failures I know of left are: test_modulefinder test_sqlite test_str test_struct test_subprocess
msg56614 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-21 00:35
Attached is a fix for modulefinder. It is an ugly hack as modulefinder took the numeric opcode numbers from dis and passed them to chr(). But that doesn't work since that returns Unicode. So I took those single characters and passed them to str8(). Once str8() has its constructor match bytes() then the chr() call can be ditched and the dis values can be tossed into a single-item list.
msg56615 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-21 01:41
Attached is a fix for sqlite3. First issue was that the dictionary that was being used to store converters was having keys in Python code as Unicode but being compared against str8 in C. The second issue was that when an object was serialized using __conform__ and a Unicode object was returned, it was being unserialized as a str8 no matter what type of argument was returned. That makes the most sense if only a single type is going to be returned, so I left it as such and fixed the test to decode str8 to UTF-8 if passed to __init__.
msg56617 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-21 01:52
Attached is a patch to fix test_str. Basically there were a bunch of redundant tests for making sure that calling str() on an object called it's __str__ method. str8 no longer is directly relevant since it is no longer an actual string type.
msg56618 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-21 02:10
Guido, what do you want to do about the struct module for the various string formats (i.e., c, s, p)? Should they return str8 or Unicode?
msg56619 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-21 02:15
Attached is a fix for test_subprocess. Simply had to change a call to str8() to str(). I am going to run the test suite, but that should leave only test_struct failing and that can be fixed as soon as Guido makes a call on whether str8 or str should be used for the string formats.
msg56624 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-21 16:02
> Guido, what do you want to do about the struct module for the various > string formats (i.e., c, s, p)? Should they return str8 or Unicode? Oh, tough call. I think they should return str8 (i.e. bytes after the rename) because the encoding isn't known. Even though this will break more code, since I'm sure there's lots of code out there that assumes they return "text".
msg56642 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-22 07:38
Attached is a fix for test_struct. All of the string tests now assume str8 is returned when arguments of bytes, str8 or str are given for the various string formats. All tests now pass. Re-assigning to myself to check everything in when it isn't so late at night. =)
msg56656 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-10-22 20:25
Everything committed in r58596. Thanks for the help, Thomas!

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45604
2008-01-06 22:29:45	admin	set	keywords: - py3k versions: Python 3.0
2007-10-22 20:25:29	brett.cannon	set	status: open -> closed resolution: accepted messages: + msg56656
2007-10-22 07:39:01	brett.cannon	set	priority: release blocker keywords: + py3k type: enhancement -> behavior
2007-10-22 07:38:37	brett.cannon	set	files: + fix_test_struct.diff assignee: gvanrossum -> brett.cannon messages: + msg56642
2007-10-21 16:02:44	gvanrossum	set	messages: + msg56624
2007-10-21 02:15:26	brett.cannon	set	files: + fix_test_subprocess.diff messages: + msg56619
2007-10-21 02:10:07	brett.cannon	set	messages: + msg56618
2007-10-21 01:52:23	brett.cannon	set	files: + fix_test_str.diff messages: + msg56617
2007-10-21 01:41:34	brett.cannon	set	files: + sqlite_fix.diff messages: + msg56615
2007-10-21 00:35:07	brett.cannon	set	files: + fix_modulefinder.diff messages: + msg56614
2007-10-20 00:44:24	brett.cannon	set	files: + fix_test_format.diff messages: + msg56580
2007-10-20 00:11:48	brett.cannon	set	files: + fix_test_compile.diff messages: + msg56578
2007-10-19 23:25:53	brett.cannon	set	files: + r4-revised.patch nosy: + brett.cannon messages: + msg56576
2007-10-19 22:32:42	brett.cannon	set	files: - unnamed
2007-10-19 21:49:42	gvanrossum	set	messages: + msg56566
2007-10-15 17:31:28	gvanrossum	set	files: + unnamed messages: + msg56452
2007-10-15 13:01:36	thomaslee	set	files: + unicode-string-eq-false-all-r4.patch messages: + msg56437
2007-10-11 19:28:37	loewis	set	keywords: + patch
2007-10-11 16:53:21	gvanrossum	set	assignee: gvanrossum nosy: + gvanrossum
2007-10-11 12:28:23	thomaslee	set	files: + unicode-string-eq-false-r3.patch messages: + msg56344
2007-10-11 12:21:08	thomaslee	set	files: + unicode-string-eq-false-structmember-c-r1.patch
2007-10-11 12:20:47	thomaslee	create