What problem is clean_surrogate_escapes() intended to solve? Could you please provide a user scenario or two?

A possible alternative implementation is:

def clean_surrogate_escapes(s):
    return s.encode('utf-8', 'surrogatepass').decode('utf-8', 'replace')

It can be faster for some data (for mostly ASCII text with rare surrogates it is very fast). For other data, 'utf-16' can be a better choice.
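
For illustration, here is a rough sketch of the two variants side by side, assuming Python 3.4+ (where 'surrogatepass' is accepted by the UTF-16 codec). The names clean_via_utf8, clean_via_utf16 and has_surrogates are made up for this comment and are not part of the patch. The number of U+FFFD characters produced per surrogate may differ between the two codecs, so the sketch only checks that no surrogates remain.

```python
def clean_via_utf8(s):
    # Hypothetical name: the UTF-8 round trip proposed above.
    return s.encode('utf-8', 'surrogatepass').decode('utf-8', 'replace')


def clean_via_utf16(s):
    # Hypothetical name: the same idea using the UTF-16 codec.
    # Encoding writes a BOM, which the 'utf-16' decoder consumes again.
    return s.encode('utf-16', 'surrogatepass').decode('utf-16', 'replace')


def has_surrogates(s):
    # True if any code point lies in the surrogate range U+D800..U+DFFF.
    return any('\ud800' <= ch <= '\udfff' for ch in s)


# Undecodable bytes become lone surrogates under 'surrogateescape'.
raw = b'abc\xff\xfedef'
text = raw.decode('utf-8', 'surrogateescape')

for clean in (clean_via_utf8, clean_via_utf16):
    result = clean(text)
    print(clean.__name__, ascii(result), has_surrogates(result))
```

Which variant is faster would have to be measured on representative data, for example with timeit.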
