
Author terry.reedy
Recipients arjennienhuis, benjamin.peterson, christian.heimes, eric.smith, exarkun, ezio.melotti, glyph, gvanrossum, loewis, martin.panter, pitrou, serhiy.storchaka, terry.reedy, uau, vstinner
Date 2013-01-22.23:34:32
Message-id <1358897672.74.0.323995453244.issue3982@psf.upfronthosting.co.za>
Content
>it would probably be reasonable to make these protocols use str objects at the heart, and only convert to bytes after the formatting is done.

I presume this would mean adding 'if py3: out = out.encode()' after the formatting. As I said before, this works much better in 3.3+ than in 3.2 and earlier. Some actual numbers:

from timeit import timeit

for size in (0, 100, 1000, 10000, 100000):
    a = 'a' * size
    print(timeit("a.encode()", "from __main__ import a"))

Output:
0.19305401378265558
0.22193721412302575
0.2783227054755883
0.677596406192696
7.124387897799184

Given timeit's default number = 1000000, these totals in seconds are also microseconds per encoding. Of note: copying the bytes does not double the total time until there are a few thousand chars (somewhere between 1000 and 10000 here). Would protocols be using .format for much more than this?
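To make the arithmetic explicit: timeit returns the total seconds for its default 1,000,000 executions, so dividing by one million gives seconds per call, which is the same number read as microseconds. The small sketch below simply re-derives that from the figures reported above (no new measurements) and finds the first length at which encoding takes at least twice the empty-string cost. The names `timings`, `per_call_us` and `baseline` are illustrative, not part of any API.

```python
# Totals copied from the measurements above: seconds for timeit's
# default 1,000,000 calls of a.encode(), keyed by len(a).
timings = {
    0: 0.19305401378265558,
    100: 0.22193721412302575,
    1000: 0.2783227054755883,
    10000: 0.677596406192696,
    100000: 7.124387897799184,
}

NUMBER = 1000000  # timeit's default number of executions


def per_call_us(total_seconds, number=NUMBER):
    """Convert a timeit total (seconds) into microseconds per call."""
    return total_seconds / number * 1e6


# Fixed per-call overhead, measured with the empty string.
baseline = per_call_us(timings[0])

for size, total in timings.items():
    us = per_call_us(total)
    # A ratio of 2 or more means copying the bytes costs at least
    # as much as the fixed call overhead.
    print("{:>6} chars: {:.3f} us/call, {:.2f}x empty".format(size, us, us / baseline))

first_doubled = min(size for size, total in timings.items()
                    if per_call_us(total) >= 2 * baseline)
print("time first doubles at", first_doubled, "chars")
```

With these numbers the ratio is about 1.44 at 1000 chars and about 3.5 at 10000, so the doubling point lands at 10000 among the sizes tested.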

[If speed is really an issue, we could make binary file/socket write methods aware of the unicode implementation. They could directly access the ASCII (or Latin-1) bytes in a unicode object, just as they do with a bytes object, and skip the extra copy.]
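For illustration only, the approach quoted at the top of this message (format with str, convert to bytes once formatting is done) might look roughly like the sketch below. `format_status_line` and the HTTP-style status line are hypothetical examples, and `PY3` stands in for the 'if py3' check. Under Python 2 a plain str template already yields a byte string, so no conversion is needed there. The sketch passes "ascii" explicitly, which raises UnicodeEncodeError on non-ASCII text. A bare .encode() would use UTF-8 instead.

```python
import sys

# Stand-in for the 'if py3' test mentioned above.
PY3 = sys.version_info[0] >= 3


def format_status_line(version, code, reason):
    """Hypothetical protocol helper: format as str, encode once at the end."""
    out = "HTTP/{} {} {}\r\n".format(version, code, reason)  # all formatting on str
    if PY3:
        out = out.encode("ascii")  # single str -> bytes conversion for the wire
    return out


print(format_status_line("1.1", 200, "OK"))
```

The only extra cost over formatting bytes directly is the final encode, which is the copy timed above.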