Message122816
On Mon, Nov 29, 2010 at 4:41 AM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
..
> It would be better to copy and iterate over the Unicode string first,
> replacing any decimal code points with ASCII ones and then call the
> UTF-8 encoder.
>
Good idea.
> The code as it stands is very inefficient, since it will most likely
> run the memcpy() part for every code point after the first non-ASCII
> decimal one.
>
I doubt there are measurable gains from this optimization, but doing
the conversion on Unicode characters results in a cleaner API. The new
patch, issue10557a.diff, implements
_PyUnicode_NormalizeDecimal(Py_UNICODE *s, Py_ssize_t length), which is
defined as follows:
/* Strip leading and trailing space and convert code points that have
   decimal digit property to the corresponding ASCII digit code point.

   Returns a new Unicode string on success, NULL on failure.
*/
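
As a rough illustration of the behaviour that comment describes, here is a
pure-Python model. It is only a sketch: normalize_decimal is a hypothetical
name, not something in the patch, and it assumes that str.strip() and
unicodedata.decimal() are close enough stand-ins for the whitespace and
decimal-digit checks done in the C code.

```python
import unicodedata


def normalize_decimal(s: str) -> str:
    """Hypothetical Python model of the normalization described above.

    Strips leading and trailing whitespace, then maps every code point
    that has a decimal digit value to the matching ASCII digit. All
    other code points are passed through unchanged.
    """
    out = []
    for ch in s.strip():
        # unicodedata.decimal() returns the default (None here) when
        # the character has no decimal digit value.
        digit = unicodedata.decimal(ch, None)
        out.append(ch if digit is None else chr(ord("0") + digit))
    return "".join(out)


# ARABIC-INDIC DIGIT ONE, TWO and FIVE, padded with spaces.
text = "  \u0661\u0662.\u0665 "
normalized = normalize_decimal(text)
value = float(normalized.encode("utf-8"))  # parser only sees ASCII bytes
```

Once the string has been normalized this way, the encoded result contains
only ASCII digits where decimal code points used to be, so a byte-oriented
parser can consume it directly.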
Note that I used the deprecated _PyUnicode_AsStringAndSize() in
floatobject.c not only because it is convenient, but also because I
believe that in the future numerical value parsers should be converted
to operate on Unicode characters. When this happens, the use of
_PyUnicode_AsStringAndSize() can be removed.

Date | User | Action | Args
2010-11-29 15:39:52 | belopolsky | set | recipients: + belopolsky, lemburg, mark.dickinson, vstinner, eric.smith, ezio.melotti, skrah
2010-11-29 15:39:49 | belopolsky | link | issue10557 messages
2010-11-29 15:39:49 | belopolsky | create |