classification
Title: PyUnicode_EncodeDecimal: reject error handlers different than strict
Type: Stage:
Components: Unicode Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, loewis, skrah, vstinner
Priority: normal Keywords: patch

Created on 2011-11-22 12:24 by vstinner, last changed 2011-11-25 19:10 by vstinner. This issue is now closed.

Files
File name Uploaded Description
encode_decimal_errors.patch vstinner, 2011-11-22 12:24
Messages (4)
msg148111 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-22 12:24
Error handling of PyUnicode_EncodeDecimal() is broken by design. The caller has to allocate the output buffer but cannot know how large it must be: each error handler produces output of variable length, and there is no way to pass the size of the output buffer to the function.
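
To illustrate the variable output, here is a small sketch using str.encode() with the ASCII codec in Python 3. This is only an analogy: it does not call PyUnicode_EncodeDecimal(), and the assumption is that the error handlers behave comparably there.

```python
# Analogy only: str.encode() with the ASCII codec, not PyUnicode_EncodeDecimal().
text = "12\u20ac"  # three characters; U+20AC (euro sign) is not ASCII

handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
encoded = {name: text.encode("ascii", name) for name in handlers}

# Same three-character input, a different output length for each handler.
for name in handlers:
    print(name, encoded[name], len(encoded[name]))
```

The same input yields 2, 3, 9 and 8 bytes respectively, so a caller that sizes the buffer from the input length alone cannot be safe for every handler.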

I propose to raise a ValueError if the error handler is anything other than "strict", and to make this change in Python 2.7, 3.2 and 3.3.
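
A rough Python sketch of the proposed behaviour follows. It is not the attached patch: the name encode_decimal, the error messages, and the exact character mapping (whitespace to a space, Unicode decimal digits to ASCII digits, other Latin-1 characters copied unchanged) are assumptions made for illustration.

```python
import unicodedata


def encode_decimal(text, errors=None):
    """Hypothetical strict-only model of the proposed behaviour."""
    # Proposed change: anything other than NULL/"strict" is rejected up front.
    if errors is not None and errors != "strict":
        raise ValueError("unsupported error handler: %r" % (errors,))

    out = bytearray()
    for pos, ch in enumerate(text):
        if ch.isspace():
            out.append(0x20)  # any Unicode whitespace becomes an ASCII space
            continue
        digit = unicodedata.decimal(ch, -1)
        if digit >= 0:
            out.append(ord("0") + digit)  # Unicode decimal digit -> '0'..'9'
        elif 0 < ord(ch) < 256:
            out.append(ord(ch))  # other Latin-1 characters are copied as-is
        else:
            raise UnicodeEncodeError(
                "decimal", text, pos, pos + 1, "invalid decimal Unicode string"
            )
    return bytes(out)


print(encode_decimal("\u0663\u0664"))  # Arabic-Indic digits three and four
```

In this model every input character yields exactly one output byte or an exception, so a buffer of len(text) + 1 bytes is always sufficient.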

In the Python 2.7 code base, PyUnicode_EncodeDecimal() is always called with 
errors=NULL. In Python 3.x, the function is no longer called.

The attached patch is for Python 3.2.

See also issue #13093.
msg148115 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-22 13:09
I'm only using the function with the NULL error handler. If I had
to use 'xmlcharrefreplace', presumably I'd overallocate 'output'
for the worst case scenario: sizeof("&#4294967295") per encoded
character.
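
The worst-case arithmetic can be sketched as follows. The helper name is hypothetical, and comparing against what Python 3's str.encode() emits for 'xmlcharrefreplace' assumes that PyUnicode_EncodeDecimal() would produce references of the same form.

```python
# Hypothetical helper; the bound is the sizeof("&#4294967295") estimate above.
PER_CHAR = len("&#4294967295") + 1  # C sizeof counts the trailing NUL: 12 + 1


def worst_case_buffer_size(n_chars):
    """Bytes to reserve for n_chars input characters plus a final NUL."""
    return n_chars * PER_CHAR + 1


# What Python's own xmlcharrefreplace emits for the largest code point.
longest = "\U0010ffff".encode("ascii", "xmlcharrefreplace")

print(PER_CHAR, worst_case_buffer_size(4), longest, len(longest))
```

Under that assumption the longest real reference is 10 bytes, so 13 bytes per character is a safe but wasteful bound.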

It's hard to tell if people are using this feature. PyUnicode_EncodeDecimal()
has always been undocumented (#8646), but it is part of the official Unicode API.
msg148142 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-22 20:41
> I'm only using the function with the NULL error handler.

I don't think that anyone uses it with anything else. The function is used to prepare a string input for a function converting a string to an integer. I don't see how xmlcharrefreplace can be useful.
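
A short Python 3 illustration of the point: the built-in int() and float() accept Unicode decimal digits, whereas a string containing an XML character reference is not a number at all. The helper parses_as_int is hypothetical and exists only for this example.

```python
def parses_as_int(s):
    """Return True if int() accepts the string, False otherwise."""
    try:
        int(s)
        return True
    except ValueError:
        return False


arabic_indic = "\u0663\u0664"  # ARABIC-INDIC DIGIT THREE, DIGIT FOUR
print(int(arabic_indic))
print(float("\u0661.\u0665"))

# Text that went through xmlcharrefreplace can no longer be parsed.
replaced = "12\u20ac".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(replaced, parses_as_int(replaced))
```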
msg148349 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-25 19:10
Hum, I only changed PyUnicode_EncodeDecimal in Python 3.3; I prefer not to touch stable releases (2.7, 3.2).

New changeset a20fae95618c by Victor Stinner in branch 'default':
Close #13093: PyUnicode_EncodeDecimal() doesn't support error handlers
http://hg.python.org/cpython/rev/a20fae95618c

(Oops, I specified the wrong issue number: fixed in 9a712ad593bb)
History
Date User Action Args
2011-11-25 19:10:17 vstinner set status: open -> closed; resolution: fixed; messages: + msg148349
2011-11-22 20:41:15 vstinner set messages: + msg148142
2011-11-22 13:09:54 skrah set messages: + msg148115
2011-11-22 12:24:48 vstinner create