Here is a patch that improves coverage and addresses the uneven accuracy. Required accuracy is now specified in ulps. In most cases I have chosen 1 ulp, since this passed for me on x86 (and also on ARM), but this may be too ambitious.
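
For readers unfamiliar with the unit, here is a minimal sketch of one way to measure an error in ulps, assuming IEEE 754 double precision. The names to_ordinal and ulp_error are illustrative only and are not necessarily those used in the patch; NaNs and infinities are not handled.

```python
import struct

def to_ordinal(x):
    """Map a finite float to an integer so that adjacent floats
    map to adjacent integers (assumes IEEE 754 doubles)."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    n = struct.unpack('<q', struct.pack('<d', x))[0]
    # Negative floats have the sign bit set; remap them so that the
    # integer ordering matches the floating-point ordering.
    if n < 0:
        n = ~(n + 2**63)
    return n

def ulp_error(expected, got):
    """Number of representable doubles separating expected and got."""
    return abs(to_ordinal(expected) - to_ordinal(got))

def within_ulps(expected, got, ulps=1):
    """True if got is within the given number of ulps of expected."""
    return ulp_error(expected, got) <= ulps
```

With this definition, a tolerance of 1 ulp accepts only the expected value and its two immediate floating-point neighbours, which is why it may prove too strict on some platforms.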

I have also responded to the comment relating to erfc:
    # XXX Would be better to weaken this test only
    # for large x, instead of for all x.

I found I could not add the script I used to generate the additional test cases to Tools/scripts without causing test_tools to fail. (It complained of a missing dependency: the generator uses mpmath.)
