Message 180446 - Python tracker

Author	pitrou
Recipients	arjennienhuis, benjamin.peterson, christian.heimes, eric.smith, exarkun, ezio.melotti, glyph, gvanrossum, loewis, martin.panter, pitrou, serhiy.storchaka, terry.reedy, uau, vstinner
Date	2013-01-23.07:27:36
Message-id	<1358925897.3453.14.camel@localhost.localdomain>
In-reply-to	<44306121-4684-459E-AFAA-47B934BEF7DE@twistedmatrix.com>

Content

> > What I know from Twisted is there are many specific cases where, indeed,
> > binary protocol strings are formed by string formatting, e.g. in the FTP
> > implementation (and for good reason since those protocols are either ASCII
> > or an ASCII superset).
> 
> These protocols (SMTP, SIP, HTTP, IMAP, POP, FTP), are not ASCII (nor
> are they an "ASCII superset"); they are ASCII commands interspersed
> with binary data.

The "ASCII superset commands" part is clearly separated from the "binary
data" part. Your own LineReceiver is able to switch between "raw mode"
and "line mode"; one is text and the other is binary.

> In many cases - such as when expressing a length, or a checksum - you
> _must_ treat them as bytes, or you will emit incorrect data on the
> wire.

This is a non sequitur. You can perfectly well take the len() of some
*binary* data, format it using "%d" in a *string* Content-Length header,
then encode the headers using utf-8 (or whatever encoding scheme the
protocol mandates). Then at the end you concatenate the encoded headers
and the body. I'm sure you're already doing the moral equivalent of
this, except that the encoding step is absent.

So, yes, it is reasonably possible, and it even makes sense.
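
A minimal sketch of the approach described above, assuming an HTTP-style
response. The function name build_response, the status line, and the
default encoding are illustrative assumptions, not taken from any real
codebase:

```python
def build_response(body: bytes, encoding: str = "utf-8") -> bytes:
    """Build a response from text headers and a binary body (hypothetical helper)."""
    # The headers are text: format the length of the *binary* body with "%d".
    headers = "HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
    # Encode the text part once, then concatenate it with the raw body.
    return headers.encode(encoding) + body


response = build_response(b"\x00\xffpayload")
```

The length is computed from the bytes object, so it counts octets on the
wire; only the header text goes through the encoding step.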

> This is exactly why I do not particularly want bytes.format() to allow
> the presence of strs as formatted values, although that *would* make
> porting certain things easier.

At this point, I would remind you that I'm not against bytes.format(),
but I'd like it to be discussed in the open rather than on the bug tracker.

And, yes, starting that discussion is, IMO, the proponents' job :-)

> even disregarding compatibility with a 2.x codebase, b''.join() and
> b'' + b'' and (''.format()).encode('charmap') are all slower _and_
> more awkward than simply b''.format() or b''%.

How can existing constructions be slower than non-existing constructions
that don't have performance numbers at all?

Besides, if b''.join() is too slow, it deserves to be improved. Or
perhaps you should try bytearray instead, or even io.BytesIO.
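
For illustration, the three existing constructions mentioned above can be
sketched as follows. The chunk contents are made up, and no claim is made
here about which variant is faster; that would have to be measured:

```python
import io

# Hypothetical pieces of a protocol line, already in bytes form.
chunks = [b"220 ", b"ready", b"\r\n"]

# 1. b''.join(): build the result in a single call.
joined = b"".join(chunks)

# 2. bytearray: a mutable buffer that is appended to in place.
buf = bytearray()
for chunk in chunks:
    buf += chunk
from_bytearray = bytes(buf)

# 3. io.BytesIO: a file-like in-memory binary stream.
stream = io.BytesIO()
for chunk in chunks:
    stream.write(chunk)
from_bytesio = stream.getvalue()
```

All three produce the same bytes object; they differ only in how the
intermediate data is accumulated.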

History
Date	User	Action	Args
2013-01-23 07:27:37	pitrou	set	recipients: + pitrou, gvanrossum, loewis, terry.reedy, exarkun, vstinner, eric.smith, christian.heimes, benjamin.peterson, glyph, ezio.melotti, arjennienhuis, uau, martin.panter, serhiy.storchaka
2013-01-23 07:27:37	pitrou	link	issue3982 messages
2013-01-23 07:27:36	pitrou	create