I saw a small regression relative to 4k when using a 64k buffer on one of my machines (dual-core amd64 Linux).  With 32k, every platform I tried (amd64 Linux, armv7l 32-bit Linux, 64-bit OS X 10.6) showed a dramatic improvement on the microbenchmark, approaching 50% less CPU use in many cases.
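
For anyone who wants to try a comparison along these lines, here is a rough sketch. It is not the microbenchmark behind the numbers above: it assumes a plain os.pipe fed by a writer thread is a reasonable stand-in for a subprocess pipe, and read_all is a hypothetical helper, not part of subprocess.

```python
import os
import threading
import time


def read_all(total_bytes, bufsize):
    """Push total_bytes through an OS pipe and read them back in bufsize chunks.

    Returns (bytes_read, cpu_seconds). Hypothetical helper for illustration;
    it does not exercise the real subprocess code path.
    """
    r, w = os.pipe()

    def writer():
        chunk = b"x" * 65536
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested.
                remaining -= os.write(w, chunk[:remaining])
        finally:
            os.close(w)  # signals EOF to the reader

    t = threading.Thread(target=writer)
    start = time.process_time()  # CPU time of the whole process, both threads
    t.start()
    got = 0
    while True:
        data = os.read(r, bufsize)
        if not data:  # empty read means the writer closed its end
            break
        got += len(data)
    t.join()
    cpu = time.process_time() - start
    os.close(r)
    return got, cpu


if __name__ == "__main__":
    for size in (4096, 32768, 65536):
        got, cpu = read_all(64 * 1024 * 1024, size)
        print("bufsize=%6d  read=%d bytes  cpu=%.3fs" % (size, got, cpu))
```

Results will vary a lot by platform and kernel pipe capacity, so treat any single run as indicative at best.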

I doubt applications will notice as much, since they're likely to be dominated by their own application code rather than the subprocess internals.

Re: 3.3 or not: true, but since it doesn't change any APIs and is minor, I did it anyway.  If you think it doesn't belong there, leave it to the release manager to back out.  This and the #19506 change should be invisible to users.
