Author pitrou
Recipients arjennienhuis, benjamin.peterson, christian.heimes, eric.smith, exarkun, ezio.melotti, glyph, gvanrossum, loewis, martin.panter, pitrou, serhiy.storchaka, terry.reedy, uau, vstinner
Date 2013-01-23.07:27:36
Message-id <1358925897.3453.14.camel@localhost.localdomain>
In-reply-to <44306121-4684-459E-AFAA-47B934BEF7DE@twistedmatrix.com>
Content
> > What I know from Twisted is there are many specific cases where, indeed,
> > binary protocol strings are formed by string formatting, e.g. in the FTP
> > implementation (and for good reason since those protocols are either ASCII
> > or an ASCII superset).
> 
> These protocols (SMTP, SIP, HTTP, IMAP, POP, FTP), are not ASCII (nor
> are they an "ASCII superset"); they are ASCII commands interspersed
> with binary data.

The "ASCII superset commands" part is clearly separated from the "binary
data" part. Your own LineReceiver is able to switch between "raw mode"
and "line mode"; one is text and the other is binary.
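A minimal sketch of that separation in plain Python follows. It is not Twisted's LineReceiver API. The class `ModalReceiver`, its methods, and the "Content-Length header plus blank line switches to raw mode" convention are hypothetical. They are chosen only to show header lines being decoded to str while the body stays bytes.

```python
class ModalReceiver:
    """Hypothetical parser: CRLF-terminated text lines, then a raw binary body."""

    def __init__(self, encoding="ascii"):
        self.encoding = encoding
        self.buffer = b""
        self.content_length = 0
        self.raw_remaining = 0   # > 0 means we are in raw (binary) mode
        self.lines = []          # decoded header lines (line mode, str)
        self.body = b""          # undecoded payload (raw mode, bytes)

    def set_raw_mode(self, length):
        self.raw_remaining = length

    def line_received(self, text):
        if text == "":
            # Blank line ends the headers; what follows is binary data.
            self.set_raw_mode(self.content_length)
        else:
            self.lines.append(text)
            name, _, value = text.partition(":")
            if name.strip().lower() == "content-length":
                self.content_length = int(value)

    def data_received(self, data):
        self.buffer += data
        while self.buffer:
            if self.raw_remaining:
                # Raw mode: pass bytes through untouched, never decode them.
                chunk = self.buffer[:self.raw_remaining]
                self.buffer = self.buffer[len(chunk):]
                self.body += chunk
                self.raw_remaining -= len(chunk)
                if self.raw_remaining:
                    return  # wait for the rest of the body
            else:
                # Line mode: split on CRLF and decode each line as text.
                line, sep, rest = self.buffer.partition(b"\r\n")
                if not sep:
                    return  # incomplete line, wait for more data
                self.buffer = rest
                self.line_received(line.decode(self.encoding))
```

Note that the body may contain bytes such as `\xff` that would fail ASCII decoding; they never reach `decode()` because raw mode bypasses it.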

> In many cases - such as when expressing a length, or a checksum - you
> _must_ treat them as bytes, or you will emit incorrect data on the
> wire.

This is a non-sequitur. You can perfectly well take the len() of some
*binary* data, format it using "%d" in a *string* Content-Length header,
then encode the headers using utf-8 (or whatever encoding scheme the
protocol mandates). Then at the end you concatenate the encoded headers
and the body. I'm sure you're already doing the moral equivalent of
this, except that the encoding step is absent.

So, yes, it is reasonably possible, and it even makes sense.
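A minimal sketch of that approach for an HTTP-style response follows. It assumes ASCII header lines, and the function name `build_response` and its header set are illustrative rather than taken from any library. The length is computed on the binary body, formatted into a str header, and only the header text is encoded before concatenation.

```python
def build_response(body: bytes, content_type: str = "application/octet-stream",
                   encoding: str = "ascii") -> bytes:
    # Hypothetical helper. The headers are plain text, built with str formatting;
    # len(body) counts bytes, which is what Content-Length needs.
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (content_type, len(body))
    )
    # Encode the text part once, using whatever encoding the protocol
    # mandates (ASCII is an assumption here), then append the binary body.
    return headers.encode(encoding) + body
```

For ASCII-only headers, encoding with "utf-8" instead would produce identical bytes.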

> This is exactly why I do not particularly want bytes.format() to allow
> the presence of strs as formatted values, although that *would* make
> porting certain things easier.

At this point, I would remind you that I'm not against bytes.format(),
but I'd like it to be discussed in the open rather than on the bug tracker.

And, yes, starting that discussion is, IMO, the proponents' job :-)

> even disregarding compatibility with a 2.x codebase, b''.join() and
> b'' + b'' and (''.format()).encode('charmap') are all slower _and_
> more awkward than simply b''.format() or b''%.

How can existing constructions be slower than non-existing constructions
that don't have performance numbers at all?

Besides, if b''.join() is too slow, it deserves to be improved. Or
perhaps you should try bytearray instead, or even io.BytesIO.
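For reference, here is a sketch of those three existing ways to assemble a bytes message from pieces. The sample chunks are made up, and no timing claims are made here (`timeit` would be the tool for comparing them).

```python
import io

# Illustrative pieces: ASCII command lines followed by some binary data.
parts = [b"USER anonymous\r\n", b"PASS guest\r\n", b"\x00\x01\x02"]

def with_join(chunks):
    # Collect all pieces, then concatenate them in a single pass.
    return b"".join(chunks)

def with_bytearray(chunks):
    # Mutable buffer that grows in place; convert to immutable bytes at the end.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    return bytes(buf)

def with_bytesio(chunks):
    # File-like in-memory buffer; getvalue() returns the accumulated bytes.
    out = io.BytesIO()
    for chunk in chunks:
        out.write(chunk)
    return out.getvalue()
```

All three return the same bytes object contents; which one to pick depends mostly on whether the pieces are produced all at once or incrementally.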
History
Date                 User    Action  Args
2013-01-23 07:27:37  pitrou  set     recipients: + pitrou, gvanrossum, loewis, terry.reedy, exarkun, vstinner, eric.smith, christian.heimes, benjamin.peterson, glyph, ezio.melotti, arjennienhuis, uau, martin.panter, serhiy.storchaka
2013-01-23 07:27:37  pitrou  link    issue3982 messages
2013-01-23 07:27:36  pitrou  create