Author glyph
Recipients arjennienhuis, benjamin.peterson, christian.heimes, eric.smith, exarkun, ezio.melotti, glyph, gvanrossum, loewis, martin.panter, pitrou, serhiy.storchaka, terry.reedy, uau, vstinner
Date 2013-01-23.00:59:29
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <44306121-4684-459E-AFAA-47B934BEF7DE@twistedmatrix.com>
In-reply-to <1358885448.3460.17.camel@localhost.localdomain>
Content
> Antoine Pitrou added the comment:
> The fact that "there are plenty of other Python applications that don't
> use Twisted which nevertheless need to emit formatted sequences of
> bytes" is *precisely* a good reason for this to be discussed more
> visibly.

I don't think anyone is opposing discussing it.  I don't personally think such a discussion would be useful, lots of points of view are represented on this ticket, but please feel free to raise it in whatever forum that you feel would be helpful.  (Even if I did object to that I don't see how I could stop you :)).

> I'm not sure what the "general case" is.

The "general case" that I'm referring to is the case of an application writing some protocol logic in terms of constructing some bytes objects and passing them to Twisted.  In other words, Twisted relied upon Python to provide a convenient way to assemble your bytes into protocol messages, and that was removed in 3.x.  We never provided one ourselves and I don't think it would be a particularly good idea to build that kind of basic string-manipulation functionality into Twisted rather than Python.

> What I know from Twisted is there are many specific cases where, indeed,
> binary protocol strings are formed by string formatting, e.g. in the FTP
> implementation (and for good reason since those protocols are either ASCII
> or an ASCII superset).

These protocols (SMTP, SIP, HTTP, IMAP, POP, FTP), are not ASCII (nor are they an "ASCII superset"); they are ASCII commands interspersed with binary data.  It makes sense to treat them as bytes, not text.  In many cases - such as when expressing a length, or a checksum - you _must_ treat them as bytes, or you will emit incorrect data on the wire.  By the time you're dealing with text - if you ever are - you're already somewhere in the body of the protocol, decorated with appropriate metadata.

But my point about the "general case" is that when implementing a *new* protocol with ASCII commands, or maintaining an existing one, bytes-object formatting is a convenient, expressive and performant way to express the interpolation of values in the protocol stream.

> As a workaround, it would probably be reasonable to make
> these protocols use str objects at the heart, and only convert to bytes
> after the formatting is done.

Protocols like SMTP (c.f. "8-bit MIME") and HTTP put binary data in-line; do you suggest that gzipped content be encoded as latin1 so it can squeeze into python 3's str type?  I thought the whole point of the porting pain here was to get a clean separation between bytes and text.  This is exactly why I do not particularly want bytes.format() to allow the presence of strs as formatted values, although that *would* make porting certain things easier.  It makes sense to do your encoding first, then interpolate.

> Code running on both 2.x and 3.x will *by construction* have some
> performance pessimizations inside it. It is inherent to that strategy.
> Not saying this is necessarily a problem, but you should be aware of it.

This is certainly true *now*, but it doesn't necessarily have to be.  Enhancements like this one could make this performance division go away.  In any case, the reason that ported code suffers from a performance penalty is because python 3 has no efficient way of doing this type of bytes construction; even disregarding compatibility with a 2.x codebase, b''.join() and b'' + b'' and (''.format()).encode('charmap') are all slower _and_ more awkward than simply b''.format() or b''%.
History
Date User Action Args
2013-01-23 00:59:30glyphsetrecipients: + glyph, gvanrossum, loewis, terry.reedy, exarkun, pitrou, vstinner, eric.smith, christian.heimes, benjamin.peterson, ezio.melotti, arjennienhuis, uau, martin.panter, serhiy.storchaka
2013-01-23 00:59:30glyphlinkissue3982 messages
2013-01-23 00:59:29glyphcreate