Message 103913 - Python tracker

Author	dmalcolm
Recipients	dmalcolm, loewis, pitrou, vstinner
Date	2010-04-21.21:29:01
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<1271885351.11.0.310328165413.issue8380@psf.upfronthosting.co.za>
In-reply-to

Content

I'm attaching a new version of the patch, for the py3k branch.

I changed my mind back about the breakpoint, using "id" and "builtin_id" as in my original patch. I prefer it since it has a single argument, which makes it very convenient to work with in the various tests - textiowrapper_write takes an args tuple, which makes things like corrupting the pointer slightly more tricky.

The big change here is that I've changed the output format throughout to try to emulate Python 3 literals: a PyLongObject instance is now printed as digits, without a trailing "L". I feel that the fact that gdb is running Python 2 is really just an implementation detail, and that the pretty-printer ought to print in a format reflecting the language being debugged.

This also removes the 'u' prefix from strings, and I've added tests for 'bytes' (which get a "b" prefix). I've also (I believe) correctly implemented Python 3's literal representation for empty and non-empty sets and frozensets (e.g. "{1, 2, 3}", as opposed to Python 2's "set([1, 2, 3])").
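To illustrate the target format, here is a minimal sketch of Python 3 style output for integers, sets and frozensets. It is not the code from the patch: the name py3_style_repr is hypothetical, the strings are built by hand rather than read from a remote process, and sorting the elements is my own assumption to make the output deterministic.

```python
def py3_style_repr(value):
    """Format a few builtin values the way Python 3 writes them.

    Illustrative only: the real pretty-printer reads objects out of the
    debugged process, whereas this works on ordinary Python values.
    """
    if isinstance(value, (set, frozenset)):
        is_frozen = isinstance(value, frozenset)
        if not value:
            # Python 3 has no literal for an empty set; "{}" is a dict.
            return 'frozenset()' if is_frozen else 'set()'
        try:
            # Assumption: sort for stable output; repr() itself does not.
            items = sorted(value)
        except TypeError:
            items = list(value)
        body = '{' + ', '.join(py3_style_repr(v) for v in items) + '}'
        return 'frozenset(%s)' % body if is_frozen else body
    if isinstance(value, int) and not isinstance(value, bool):
        # Plain digits, with no trailing "L" even for large values.
        return '%d' % value
    return repr(value)
```

For example, py3_style_repr(frozenset([1, 2, 3])) gives "frozenset({1, 2, 3})", and an empty set is rendered as "set()".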

More controversially, a PyUnicodeObject instance is printed using an emulation of Python 3's unicode_repr algorithm. As a result, gdb may write non-ASCII characters to sys.stdout, using the encoding of sys.stdout. This will only work if gdb's encoding is set to something that can cope with said characters:

Python 3.2a0 (py3k:80312M, Apr 21 2010, 17:00:02)
[GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> id('文字化け')

Breakpoint 1, builtin_id (self=<module at remote 0x7ffff7fd7df8>, v='文字化け') at Python/bltinmodule.c:912
912 return PyLong_FromVoidPtr(v);

Note the unicode characters in the rendering of "v" in the breakpoint.

I suspect that this is a change too far (for example, I'm assuming a UTF-8 locale).

Any suggestions on what the output should look like for the unicode case?

Would it be better if I coerce everything back to an escaped literal syntax that's encodable as ASCII? That would probably avoid encoding and locale issues, but lose immediate readability for people able to read non-ASCII scripts.
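One possible middle ground, sketched below, is to print the characters as-is when the output encoding can represent them and to fall back to ASCII escapes otherwise. This is only an illustration, not what the patch does: the name render_str is hypothetical, and it deliberately ignores the quote, backslash and control-character handling that a real unicode_repr emulation needs.

```python
def render_str(text, encoding):
    """Return a quoted rendering of text that is safe for encoding.

    Simplified sketch: quotes, backslashes and control characters in
    text are not escaped here.
    """
    try:
        # Probe whether the output stream could represent the text.
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Fall back to \xNN, \uNNNN or \UNNNNNNNN escapes, which are
        # always encodable as ASCII.
        escaped = text.encode('ascii', 'backslashreplace').decode('ascii')
        return "'" + escaped + "'"
    return "'" + text + "'"
```

With this, render_str('文字化け', 'utf-8') keeps the original characters, while render_str('文字化け', 'ascii') produces a sequence of \uNNNN escapes. In gdb the encoding argument would presumably come from sys.stdout, which may not always report one.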

All tests pass with both UCS2 and UCS4 builds on this Fedora 12 x86_64 box, building with --with-pydebug in both cases.

History
Date	User	Action	Args
2010-04-21 21:29:11	dmalcolm	set	recipients: + dmalcolm, loewis, pitrou, vstinner
2010-04-21 21:29:11	dmalcolm	set	messageid: <1271885351.11.0.310328165413.issue8380@psf.upfronthosting.co.za>
2010-04-21 21:29:09	dmalcolm	link	issue8380 messages
2010-04-21 21:29:08	dmalcolm	create