Thanks Serhiy. The other issue noted a performance improvement from removing casefold, and I assumed that handling the regex on every call was inefficient. My bad: I didn't consider that moving the compilation to module level adds to import time, and I didn't measure it with -X importtime. I agree that the cost isn't worth it, given that the regex is used only inside b16decode. I will keep these factors in mind when doing similar work and try to do a more thorough analysis.
