See the attached timings for sample(). The patched sample2 is at worst 4% slower than the original sample, and the optimized sample3 falls between sample and sample2. In any case the difference is pretty small, so I'm fine with Raymond's variant if he thinks it looks better.

Please note that 3.x also needs the patch.
