Message 165748 - Python tracker


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	eli.bendersky
Recipients	Arfrever, eli.bendersky, jcon, ncoghlan, pitrou, serhiy.storchaka, tshepang
Date	2012-07-18.09:32:41
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1342603963.23.0.290949025848.issue15381@psf.upfronthosting.co.za>
In-reply-to

Content

I wonder if this is a fair comparison, Serhiy. Strings are Unicode underneath, so each str object carries a larger fixed overhead than the equivalent bytes object (more data to copy around). Increasing the length of the strings changes the game because, thanks to PEP 393, the overhead for ASCII-only Unicode strings is constant:

>>> import sys
>>> sys.getsizeof('a')
50
>>> sys.getsizeof(b'a')
34
>>> sys.getsizeof('a' * 1000)
1049
>>> sys.getsizeof(b'a' * 1000)
1033
>>> 
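To check the constant-overhead claim on another build, here is a small sketch (not part of the original measurements). It assumes CPython, where `sys.getsizeof` reports the object header plus its payload. `fixed_overhead` subtracts one byte per element, which is only valid for bytes and ASCII-only str. `char_width` cancels the header by differencing two lengths, exposing the 1-, 2- or 4-byte storage PEP 393 picks from the widest character. Both helper names are illustrative, and the absolute numbers differ between Python versions, so no outputs are hard-coded here.

```python
import sys


def fixed_overhead(obj):
    # getsizeof minus one byte per element. Only meaningful for bytes and
    # ASCII-only str, which PEP 393 stores at one byte per character.
    return sys.getsizeof(obj) - len(obj)


def char_width(ch, n=1000):
    # The fixed header cancels out in the difference, leaving n * width.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


# Same fixed overhead regardless of length, for both str and bytes.
for n in (1, 10, 1000):
    print(n, fixed_overhead('a' * n), fixed_overhead(b'a' * n))

# Storage width per character: ASCII and Latin-1 use 1 byte, other BMP
# characters 2 bytes, characters outside the BMP 4 bytes.
for ch in ('a', '\xe9', '\u20ac', '\U0001F600'):
    print(repr(ch), char_width(ch))
```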

When re-running your tests with larger chunks, the results are quite interesting:

$ ./python -m timeit -s "import io; d=[b'a'*100,b'bb'*50,b'ccc'*50]*1000"  "b=io.BytesIO(); w=b.write"  "for x in d: w(x)"  "b.getvalue()"
1000 loops, best of 3: 509 usec per loop
$ ./python -m timeit -s "import io; d=['a'*100,'bb'*50,'ccc'*50]*1000"  "s=io.StringIO(); w=s.write"  "for x in d: w(x)"  "s.getvalue()"
1000 loops, best of 3: 282 usec per loop
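For anyone re-running this comparison from a script rather than the shell, a rough equivalent of the two `timeit` commands is sketched here. It uses the same data and, like the commands, times creating the buffer, writing every chunk and calling `getvalue()`. The helper names and the "best of 3" reduction are choices made for this sketch, and the timings will depend on the build and machine, so nothing here reproduces the figures quoted above.

```python
import io
import timeit

# Same data as the timeit commands: 3000 chunks of 100-150 characters.
BYTES_CHUNKS = [b'a' * 100, b'bb' * 50, b'ccc' * 50] * 1000
STR_CHUNKS = ['a' * 100, 'bb' * 50, 'ccc' * 50] * 1000


def fill_bytesio(chunks):
    # Mirrors "b=io.BytesIO(); w=b.write" + "for x in d: w(x)" + "b.getvalue()".
    b = io.BytesIO()
    w = b.write
    for x in chunks:
        w(x)
    return b.getvalue()


def fill_stringio(chunks):
    # Mirrors the StringIO variant of the same statement.
    s = io.StringIO()
    w = s.write
    for x in chunks:
        w(x)
    return s.getvalue()


def best_usec(func, chunks, number=1000, repeat=3):
    # Best of `repeat` runs, reported per call in microseconds,
    # matching the "best of 3 ... usec per loop" style of timeit's CLI.
    times = timeit.repeat(lambda: func(chunks), number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == '__main__':
    print(f"BytesIO:  {best_usec(fill_bytesio, BYTES_CHUNKS):.0f} usec per loop")
    print(f"StringIO: {best_usec(fill_stringio, STR_CHUNKS):.0f} usec per loop")
```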

So, it seems to me that BytesIO could use some optimization!

History
Date	User	Action	Args
2012-07-18 09:32:43	eli.bendersky	set	recipients: + eli.bendersky, ncoghlan, pitrou, Arfrever, tshepang, jcon, serhiy.storchaka
2012-07-18 09:32:43	eli.bendersky	set	messageid: <1342603963.23.0.290949025848.issue15381@psf.upfronthosting.co.za>
2012-07-18 09:32:42	eli.bendersky	link	issue15381 messages
2012-07-18 09:32:41	eli.bendersky	create