Message 180442 - Python tracker


Author	glyph
Recipients	arjennienhuis, benjamin.peterson, christian.heimes, eric.smith, exarkun, ezio.melotti, glyph, gvanrossum, loewis, martin.panter, pitrou, serhiy.storchaka, terry.reedy, uau, vstinner
Date	2013-01-23.01:03:22
Message-id	<3ABE017B-5DCC-4569-A7D5-625707002EA3@twistedmatrix.com>
In-reply-to	<1358897672.74.0.323995453244.issue3982@psf.upfronthosting.co.za>

Content
On Jan 22, 2013, at 3:34 PM, Terry J. Reedy <report@bugs.python.org> wrote:

> I presume this would mean adding 'if py3: out = out.encode()' after the formatting. As I said before, this works much better in 3.3+ than in 3.2-. Some actual numbers:

I'm glad that this operation has been optimized, but treating blocks of protocol data as text is a hackish workaround that still doesn't perform as well (even on 3.3+) as bytes formatting in 2.7.
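
For illustration, here is a minimal sketch of the two approaches being compared. The function names and the HTTP-style status line are hypothetical, chosen only as an example of protocol data. The second function relies on bytes %-formatting, which Python 3 only gained later, in 3.5 (PEP 461); on 3.3 and 3.4 only the first form is available.

```python
def format_then_encode(code, reason):
    """The workaround: build the message as text, then encode it to bytes."""
    line = "HTTP/1.1 %d %s\r\n" % (code, reason)
    # The encode step makes an extra copy and raises UnicodeEncodeError
    # if any non-ASCII text slipped into the message.
    return line.encode("ascii")


def format_bytes(code, reason):
    """Direct bytes formatting (Python 3.5+); reason must already be bytes."""
    return b"HTTP/1.1 %d %s\r\n" % (code, reason)


text_path = format_then_encode(200, "OK")
bytes_path = format_bytes(200, b"OK")
```

Both calls produce the same bytes; the difference is that the first one passes protocol data through the text type on the way.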

> [If speed is really an issue, we could make binary file/socket write methods unicode implementation aware. They could directly access the ascii (or latin-1) bytes in a unicode object, just as they do with a bytes object, and the extra copy could be skipped.]

Yes, speed is really an issue: this kind of message construction is on the critical path of many of the more popular protocols implemented with Twisted. But trying to work around the performance issue by pretending that strings are bytes will just give new life to old bugs. We've been loudly rejecting unicode from sockets for, I think, as long as Python has had unicode, and that's the way it should remain.
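
As a rough sketch of what "loudly rejecting" looks like, the snippet below uses the standard library's socket module on Python 3 rather than Twisted's own transport code. The helper name try_send is hypothetical. A plain socket accepts bytes and raises TypeError when handed a text string.

```python
import socket


def try_send(sock, payload):
    """Return True if the socket accepted the payload, False on TypeError."""
    try:
        sock.send(payload)
        return True
    except TypeError:
        # Python 3 sockets refuse str outright instead of guessing an encoding.
        return False


# A connected pair of sockets, so no network access is needed.
left, right = socket.socketpair()
try:
    sent_bytes = try_send(left, b"PING\r\n")
    sent_text = try_send(left, "PING\r\n")
finally:
    left.close()
    right.close()
```

Here sent_bytes is True and sent_text is False: the text string never reaches the wire, so an encoding mistake surfaces at the call site rather than as corrupted protocol data.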
