One solution would be to add a second, OS X-specific copy of the UTF-8 decoder with surrogate escape built in. It should be much shorter than the full UTF-8 codec, and perhaps at least utf8_code_length could be shared between the two.
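
To make the idea concrete, here is a rough sketch in Python rather than C, only to illustrate the logic and not the actual patch. It assumes utf8_code_length is a per-lead-byte table of expected sequence lengths; `UTF8_CODE_LENGTH`, `MIN_FOR_LENGTH` and `decode_utf8_surrogateescape` are hypothetical names invented for this example. Any byte that does not start a well-formed sequence is mapped to a lone surrogate in U+DC80..U+DCFF and decoding resumes at the next byte.

```python
# Hypothetical stand-in for a shared utf8_code_length table: the expected
# sequence length for each possible lead byte, 0 if the byte can never
# start a sequence (continuation bytes, 0xC0, 0xC1, 0xF5..0xFF).
def _code_length(lead: int) -> int:
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


UTF8_CODE_LENGTH = tuple(_code_length(b) for b in range(256))

# Smallest code point that legitimately needs n bytes (rejects overlong forms).
MIN_FOR_LENGTH = {2: 0x80, 3: 0x800, 4: 0x10000}


def decode_utf8_surrogateescape(data: bytes) -> str:
    """Decode UTF-8, escaping each undecodable byte as U+DC00 + byte."""
    out = []
    i = 0
    size = len(data)
    while i < size:
        lead = data[i]
        n = UTF8_CODE_LENGTH[lead]
        ch = -1  # -1 means "no valid sequence starts here"
        if n == 1:
            ch = lead
        elif n and i + n <= size:
            tail = data[i + 1:i + n]
            if all(b & 0xC0 == 0x80 for b in tail):
                # Payload bits of the lead byte, then 6 bits per continuation.
                cp = lead & (0xFF >> (n + 1))
                for b in tail:
                    cp = (cp << 6) | (b & 0x3F)
                if (cp >= MIN_FOR_LENGTH[n]
                        and not 0xD800 <= cp <= 0xDFFF
                        and cp <= 0x10FFFF):
                    ch = cp
        if ch < 0:
            # Escape only the offending byte (always >= 0x80 here) and
            # retry from the following byte.
            out.append(chr(0xDC00 + lead))
            i += 1
        else:
            out.append(chr(ch))
            i += n
    return "".join(out)
```

For example, `decode_utf8_surrogateescape(b"ab\xff")` gives `"ab\udcff"`, and encoding the result with `.encode("utf-8", "surrogateescape")` gives back the original bytes. The whole decoder is one loop plus the table lookup, which is why sharing the table looks like the main reuse opportunity.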
