
classification
Title: Multicharacter replacements in PyUnicode_TranslateCharmap
Type: enhancement Stage:
Components: None Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: doerwalter, lemburg, nobody, tim.peters
Priority: normal Keywords:

Created on 2001-01-04 17:50 by doerwalter, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
None doerwalter, 2001-01-04 17:50 None
Messages (9)
msg53079 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2001-01-04 17:50
This patch modifies Objects/unicodeobject.c/PyUnicode_TranslateCharmap,
so that the error

   PyErr_SetString(PyExc_NotImplementedError,
        "1-n mappings are currently not implemented");

no longer occurs. I.e.

   u"ab".translate({ord(u"a"): u"bbb", ord(u"b"): u"aaa"})

now works. It does this by growing the result buffer
exponentially whenever it runs out of space.
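
As a rough illustration of the growth strategy described above, here is a minimal Python sketch. It is not the C patch itself: the function name `translate_growing`, the list used as a stand-in for the output buffer, and the doubling factor are all assumptions made for the example.

```python
def translate_growing(source, mapping):
    """Translate `source`, doubling the output buffer whenever it fills up.

    Hypothetical sketch: a list of slots stands in for the C buffer.
    Returns the translated string and the number of reallocations.
    """
    buf = [""] * max(len(source), 1)  # initial guess: one output char per input char
    used = 0                          # number of slots filled so far
    reallocations = 0
    for ch in source:
        repl = mapping.get(ord(ch), ch)  # unmapped characters pass through
        needed = used + len(repl)
        if needed > len(buf):
            # Out of space: keep doubling until the replacement fits.
            new_size = len(buf)
            while new_size < needed:
                new_size *= 2
            buf.extend([""] * (new_size - len(buf)))
            reallocations += 1
        buf[used:needed] = repl  # copy the replacement characters in place
        used = needed
    return "".join(buf[:used]), reallocations
```

With a pure 1-1 mapping the initial guess is exact, so the only extra cost is the size comparison per character.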
msg53080 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-01-04 18:33
I like the idea, but the implementation needs some reworking:
the common case is 1-1 mapping so this should be as fast
as possible; extra size checks slow things down too much.

You can take a different approach, though:
leave things as they are and only add a special case for the 1-n
which does resizing depending on how many extra chars are inserted.
Then, as a final step, if resizing occurred, call _PyUnicode_Resize()
to cut down the allocated buffer to its true size.

-- Marc-Andre
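
One possible reading of the approach suggested above, sketched in Python rather than C. The name `translate_special_case` is hypothetical, a list stands in for the buffer, and the final `del` plays the role that _PyUnicode_Resize() would play in the C code. Treating an empty replacement string as a deletion is an assumption of this sketch.

```python
def translate_special_case(source, mapping):
    """Keep the 1-1 path free of size checks; resize only for 1-n replacements."""
    buf = [""] * len(source)  # exactly right if every mapping is 1-1
    used = 0
    for ch in source:
        repl = mapping.get(ord(ch), ch)
        if len(repl) == 1:
            # Fast path: no size check needed, because the buffer always has
            # at least one free slot per remaining input character.
            buf[used] = repl
            used += 1
            continue
        # Special case: grow by exactly the number of extra characters.
        extra = len(repl) - 1
        if extra > 0:
            buf.extend([""] * extra)
        buf[used:used + len(repl)] = repl
        used += len(repl)
    if used < len(buf):
        del buf[used:]  # final step: cut the buffer down to its true size
    return "".join(buf)
```

The fast path stays safe without a check because every 1-n replacement adds exactly as many slots as it consumes beyond the one already reserved.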
msg53081 - (view) Author: Nobody/Anonymous (nobody) Date: 2001-01-05 18:45
I'll check in a patch for this tomorrow which implements what I had 
in mind. The patch doesn't change the performance of the charmap 
codec.

Thanks,
-- Marc-Andre
msg53082 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-01-06 15:03
Checked in a different patch providing the same functionality.
Please see the CVS checkin message for details.
msg53083 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2001-01-05 17:07
The problem is that you can't know beforehand how long
the result string will be, i.e. whether any 1-n
replacements will actually happen.

It would be possible to do a loop through the replacement
strings and see if there are any that are longer than one character,
but even if there are, you don't know if they will really be used.

So you have three choices:
(1) You guess how much space you need and reallocate
when the space is not enough.
(2) You do a dry run of the algorithm to count how much
space you need, then run the algorithm a second time and
this time use the strings.
(3) You keep the strings in a list and join the list into
one string at the end.

For the case of 1-1 mapping the following will happen:

(1) The first allocation has exactly the right amount of space, 
there won't be any reallocations, but a size check for every
character will be done (which should be only a few assembler instructions).
The mapping will have to be accessed for every character
in the source string once.

(2) There will only be one allocation, but for every character in
the source string, the mapping has to be accessed twice, which
means calls to Python functions, exception handling etc.

(3) You have to make as many memory allocations as there are parts
of the final string that you create, including error handling etc.

I think (1) is clearly the fastest method.
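
The lookup cost in the comparison above can be made concrete with a small Python sketch. Everything here is hypothetical and for illustration only: `CountingMap` is a dict subclass that counts `get()` calls, `one_pass` stands in for choices (1) and (3), which both consult the mapping once per character, and `two_pass` stands in for choice (2).

```python
class CountingMap(dict):
    """A dict that counts how often it is consulted through get()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def get(self, key, default=None):
        self.lookups += 1
        return super().get(key, default)


def one_pass(source, mapping):
    # Choices (1) and (3): one mapping lookup per source character.
    # Joining the pieces stands in for whatever buffer handling is used.
    return "".join(mapping.get(ord(ch), ch) for ch in source)


def two_pass(source, mapping):
    # Choice (2): a dry run measures the result, a second run fills it in.
    size = sum(len(mapping.get(ord(ch), ch)) for ch in source)
    buf = [""] * size  # single allocation of exactly the right size
    pos = 0
    for ch in source:
        repl = mapping.get(ord(ch), ch)
        buf[pos:pos + len(repl)] = repl
        pos += len(repl)
    return "".join(buf)
```

For a source string of length n, `one_pass` performs n lookups and `two_pass` performs 2n, which is the trade-off between a single exact allocation and doubled mapping access.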
msg53084 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2001-06-07 10:09
The patch that was checked in changes
PyUnicode_DecodeCharmap and PyUnicode_EncodeCharmap, but
not PyUnicode_TranslateCharmap, where this functionality is
also useful, e.g. for

   u"<foo>".translate({ord("<"): u"&lt;", ord(">"): u"&gt;"})
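
For reference, the use case above looks like this in current Python 3, where `str.translate` accepts multi-character replacement strings. The variable names are illustrative only.

```python
# Map single code points to multi-character replacement strings.
table = {ord("<"): "&lt;", ord(">"): "&gt;"}
escaped = "<foo>".translate(table)
print(escaped)  # prints: &lt;foo&gt;

# str.maketrans builds the same table from single-character keys.
same_table = str.maketrans({"<": "&lt;", ">": "&gt;"})
```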
msg53085 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2001-06-07 12:32
Reopened. This should really be marked as a feature request,
but for some reason SF won't let me change the Data Type.
msg53086 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-08-09 21:02
Changed to Feature Requests, at MvL's request.
msg53087 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2002-09-04 20:37
This is implemented by the PEP 293 patch. Closing the 
request.
History
Date User Action Args
2022-04-10 16:03:35 admin set github: 33662
2001-01-04 17:50:43 doerwalter create