Message 199508 - Python tracker


Author	vstinner
Recipients	barry, brett.cannon, christian.heimes, kristjan.jonsson, pitrou, ronaldoussoren, serhiy.storchaka, vstinner
Date	2013-10-11.21:35:33
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1381527333.51.0.257694807529.issue19219@psf.upfronthosting.co.za>
In-reply-to

Content
> (however, a quick test suggests that PyUnicode_DecodeUTF8 is quite slower)

It's surprising that PyUnicode_DecodeUTF8() is quite a bit slower than _PyUnicode_FromUCS1(). _PyUnicode_FromUCS1() calls ucs1lib_find_max_char() and then memcpy(). PyUnicode_DecodeUTF8() first tries ascii_decode(), which is very similar to ucs1lib_find_max_char().

The difference may be that _PyUnicode_FromUCS1() copies all bytes at once (memcpy()), whereas ascii_decode() copies bytes while it checks whether the string is ASCII.
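
To make the suspected difference concrete, here is a simplified sketch of the two strategies. It is not the CPython source: scan_then_copy() and copy_while_checking() are hypothetical names, and the real ucs1lib_find_max_char() and ascii_decode() are more elaborate than these byte-at-a-time loops. It only contrasts "scan, then one memcpy()" with "copy each byte as it is checked".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strategy A (hypothetical stand-in for the _PyUnicode_FromUCS1() path):
   scan the whole input first, then copy it with a single memcpy().
   Returns 1 if every byte is ASCII, 0 otherwise. */
static int scan_then_copy(const unsigned char *src, size_t n,
                          unsigned char *dst)
{
    unsigned char seen = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++)
        seen |= src[i];        /* collect the high bits, no early exit */
    memcpy(dst, src, n);       /* one bulk copy */
    return (seen & 0x80) == 0;
}

/* Strategy B (hypothetical stand-in for the ascii_decode() fast path):
   copy each byte while checking it, stopping at the first non-ASCII byte.
   Returns the number of bytes copied. */
static size_t copy_while_checking(const unsigned char *src, size_t n,
                                  unsigned char *dst)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < n && src[i] < 0x80) {
        dst[i] = src[i];       /* check and copy are interleaved */
        i++;
    }
    return i;
}
```

Timing both helpers on the same large ASCII buffer would be one way to test whether the interleaved copy explains the gap; no result is claimed here.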

History
Date	User	Action	Args
2013-10-11 21:35:33	vstinner	set	recipients: + vstinner, barry, brett.cannon, ronaldoussoren, pitrou, kristjan.jonsson, christian.heimes, serhiy.storchaka
2013-10-11 21:35:33	vstinner	set	messageid: <1381527333.51.0.257694807529.issue19219@psf.upfronthosting.co.za>
2013-10-11 21:35:33	vstinner	link	issue19219 messages
2013-10-11 21:35:33	vstinner	create