Using a fixed version of your test script (on Python 3.3), I get the following numbers:

Starting multiproc...done in 2.1014609336853027 s.
Starting futures...done in 20.209479093551636 s.
Starting futures "fixed"...done in 2.026125907897949 s.

So there's roughly a 0.2 ms overhead per remote function call here (20 s / (100100000 - 100000000) calls).
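
For reference, the arithmetic behind that estimate, as a rough sketch. The variable names are mine, and 20 s is the rounded timing of the futures run above:

```python
# Number of remote calls, taken from the range bounds in the expression above.
calls = 100100000 - 100000000

# Rounded wall-clock time of the plain futures run, in seconds.
elapsed_s = 20

# Approximate per-call overhead, converted to milliseconds.
overhead_ms = elapsed_s / calls * 1000
print("calls:", calls, "overhead per call (ms):", overhead_ms)
```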

Can't your chunks() function use itertools.islice()?
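
Something along these lines is what I have in mind. It is only a sketch: I'm assuming chunks() takes an iterable and a chunk size, which may not match the signature in your script.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive tuples of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        # islice() pulls up to `size` items without materializing the input.
        chunk = tuple(islice(it, size))
        if not chunk:
            # The underlying iterator is exhausted.
            return
        yield chunk
```

This works on any iterable, including generators, and the last chunk is simply shorter when the input length is not a multiple of the size.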

Also, the default chunksize can't be anything other than 1, since your approach increases the latency of returning results.
