BTW, CPython does not use UTF-8 or UTF-16 in its internal representation of strings. It uses Latin-1, UCS-2 and UCS-4 (UTF-32), choosing the narrowest of the three that can hold every character in the string.
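
A quick way to see this from Python is to compare the sizes of two strings of different lengths and divide by the difference in length. This is only a sketch: it assumes CPython 3.3 or later, and the helper name `bytes_per_char` is made up for illustration.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per code point from the size difference of two strings."""
    # The fixed object header cancels out in the subtraction, leaving n * width.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

widths = {
    "ascii": bytes_per_char("a"),            # U+0061
    "latin1": bytes_per_char("\xe9"),        # U+00E9, needs 2 bytes in UTF-8
    "bmp": bytes_per_char("\u20ac"),         # U+20AC, needs 3 bytes in UTF-8
    "astral": bytes_per_char("\U0001F600"),  # U+1F600, a surrogate pair in UTF-16
}
print(widths)
```

On CPython this should report 1, 1, 2 and 4 bytes per character. With UTF-8 storage the second and third values would be 2 and 3, and with UTF-16 the first would be 2.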

What do the benchmarks show? Is your code always faster, and by how much? If it is slower for some data, for which data and by how much?
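
Something along these lines would answer that, timing the old and new code on one input per string kind. It is a rough sketch: `SAMPLES`, `speedups` and the two lambdas are placeholders I made up, so substitute the current and the patched implementation.

```python
import timeit

# Hypothetical inputs: pure ASCII plus one string for each internal kind.
SAMPLES = {
    "ascii": "a" * 10_000,
    "latin1": "\xe9" * 10_000,
    "ucs2": "\u20ac" * 10_000,
    "ucs4": "\U0001F600" * 10_000,
}

def speedups(baseline, candidate, samples=SAMPLES, number=200, repeat=5):
    """Return baseline_time / candidate_time per sample (> 1: candidate is faster)."""
    result = {}
    for name, text in samples.items():
        # Take the best of several runs to reduce noise from other processes.
        old = min(timeit.repeat(lambda: baseline(text), number=number, repeat=repeat))
        new = min(timeit.repeat(lambda: candidate(text), number=number, repeat=repeat))
        result[name] = old / new
    return result

# Stand-in callables only; they are not the code under review.
ratios = speedups(lambda s: s.encode("utf-8"), lambda s: s.encode("utf-16"))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Reporting the ratio per kind makes it obvious if the change helps one kind of string and hurts another.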
