Author terry.reedy
Recipients arjennienhuis, benjamin.peterson, christian.heimes, eric.smith, exarkun, ezio.melotti, glyph, gvanrossum, loewis, martin.panter, pitrou, serhiy.storchaka, terry.reedy, uau, vstinner
Date 2013-01-22.23:34:32
Message-id <1358897672.74.0.323995453244.issue3982@psf.upfronthosting.co.za>
Content
>it would probably be reasonable to make these protocols use str objects at the heart, and only convert to bytes after the formatting is done.

I presume this would mean adding 'if py3: out = out.encode()' after the formatting. As I said before, this works much better in 3.3+ (where PEP 393 stores ASCII-only strings at one byte per character) than in 3.2 and earlier. Some actual numbers:

from timeit import timeit

# 'size' rather than 'len', so the builtin len() is not shadowed
for size in (0, 100, 1000, 10000, 100000):
    a = 'a' * size
    print(timeit("a.encode()", "from __main__ import a"))

Output:
0.19305401378265558
0.22193721412302575
0.2783227054755883
0.677596406192696
7.124387897799184

timeit runs the statement 1,000,000 times by default (number=1000000), so each total in seconds is also the time in microseconds per encoding. Of note: copying the bytes does not double the total time until the string reaches a few thousand characters. Would protocols be using .format on strings much longer than this?
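The arithmetic behind that observation can be checked directly. This is a small sketch that takes the five totals reported above (assumed to come from timeit's default number=1,000,000), converts them to microseconds per call, and compares each against the empty-string case, which approximates the fixed per-call overhead. The helper name per_call_us is purely illustrative.

```python
# Totals in seconds, copied from the timings above; the dict keys are string lengths.
totals = {
    0: 0.19305401378265558,
    100: 0.22193721412302575,
    1000: 0.2783227054755883,
    10000: 0.677596406192696,
    100000: 7.124387897799184,
}
NUMBER = 1_000_000  # timeit's default number of executions (assumed to be used above)


def per_call_us(total_s, number=NUMBER):
    """Convert a timeit total (seconds for `number` runs) to microseconds per call."""
    return total_s / number * 1e6


baseline = per_call_us(totals[0])  # cost of encoding an empty string: mostly call overhead
for size, total in totals.items():
    us = per_call_us(total)
    # A ratio of 2 or more means the copy at least doubles the empty-string cost.
    print(f"{size:>6} chars: {us:.3f} us/call, {us / baseline:.2f}x baseline")
```

The ratio stays below 2 at 1,000 characters and is above 2 at 10,000, which is consistent with "a few thousand chars" as the crossover point.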

[If speed is really an issue, we could make the binary file/socket write methods aware of the unicode implementation. They could directly access the ASCII (or Latin-1) bytes in a str object, just as they do with a bytes object, and skip the extra copy.]
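For comparison, this is what the str-first approach looks like today on Python 3: all formatting is done on str, then the result is encoded once, which creates the extra copy, before being written to a binary stream. This is a minimal sketch under stated assumptions: write_status_line is a hypothetical helper, an HTTP-style status line is just an example of a protocol line, ASCII is assumed as the wire encoding, and io.BytesIO stands in for a binary file or socket.

```python
import io


def write_status_line(stream, code, reason):
    """Hypothetical helper: build a protocol line as str, encode once, write bytes."""
    line = "HTTP/1.1 {} {}\r\n".format(code, reason)  # all formatting happens on str
    data = line.encode("ascii")  # the extra copy: a new bytes object is created
    stream.write(data)  # binary write methods accept bytes
    return len(data)


buf = io.BytesIO()
write_status_line(buf, 200, "OK")
print(buf.getvalue())
```

A line like this is only tens of characters long, far below the few-thousand-character range where the timings above show the encode step starting to dominate.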