classification
Title: Remove unused "errors" argument from _PyUnicode_AsDefaultEncodedString()
Type: Stage:
Components: Unicode Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, lemburg
Priority: normal Keywords: patch

Created on 2010-06-06 18:28 by haypo, last changed 2011-03-02 01:06 by haypo. This issue is now closed.

Files
File name Uploaded Description
unicode_errors.patch haypo, 2010-06-06 18:28
utf8_defenc.patch haypo, 2010-06-09 10:46
Messages (5)
msg107204 - Author: STINNER Victor (haypo) (Python committer) Date: 2010-06-06 18:28
_PyUnicode_AsDefaultEncodedString() has two arguments: unicode (input string) and errors. If errors is not NULL, it calls Py_FatalError()!

The argument is useless: all functions call it with errors=NULL.

Attached patch removes the argument.
msg107282 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-06-07 21:37
STINNER Victor wrote:
> 
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
> 
> _PyUnicode_AsDefaultEncodedString() has two arguments: unicode (input string) and errors. If errors is not NULL, it calls Py_FatalError()!
> 
> The argument is useless: all functions call it with errors=NULL.
> 
> Attached patch removes the argument.

While it's an internal API, it's still public and we cannot
just remove the extra argument - we're in stable branch mode.

Since Python3 fixes the UTF-8 default encoding, it's better
to enhance PyUnicode_AsUTF8String() to cache the UTF-8
string in the Unicode object or simply return it directly
and then replace all uses of _PyUnicode_AsDefaultEncodedString()
with PyUnicode_AsUTF8String().

We should phase out use of _PyUnicode_AsDefaultEncodedString()
as well as the whole default encoding terminology altogether.

Please also add a documentation patch and a NEWS entry.
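
The caching idea suggested above can be illustrated with a minimal C sketch. This is a toy model, not CPython code: toy_str, toy_as_utf8() and toy_str_clear() are hypothetical names invented for this illustration, and a plain byte copy stands in for the real UTF-8 encoder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a Unicode object; NOT the CPython PyUnicodeObject. */
typedef struct {
    const char *text;   /* pretend this is the decoded string data */
    char *utf8_cache;   /* lazily filled encoded copy, owned by the object */
    int encode_calls;   /* counts how often the "encoder" actually ran */
} toy_str;

/* Hypothetical helper: encode once, then keep returning the cached buffer.
 * The caller must not free the result; the object owns it. */
static const char *toy_as_utf8(toy_str *s)
{
    assert(s != NULL && s->text != NULL);
    if (s->utf8_cache == NULL) {
        size_t n = strlen(s->text) + 1;
        s->utf8_cache = malloc(n);
        if (s->utf8_cache == NULL)
            return NULL;                 /* out of memory */
        memcpy(s->utf8_cache, s->text, n); /* stand-in for real encoding */
        s->encode_calls++;
    }
    return s->utf8_cache;
}

/* Release the cache when the object goes away. */
static void toy_str_clear(toy_str *s)
{
    free(s->utf8_cache);
    s->utf8_cache = NULL;
}
```

With this layout, repeated calls on the same object return the same pointer and run the encoder only once, which is the behaviour the cache is meant to provide.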
msg107380 - Author: STINNER Victor (haypo) (Python committer) Date: 2010-06-09 10:46
> Since Python3 fixes the UTF-8 default encoding, it's better
> to enhance PyUnicode_AsUTF8String() to cache the UTF-8
> string in the Unicode object

Right, that sounds like a great idea. Attached patch implements that: patch PyUnicode_AsUTF8String() and PyUnicode_AsEncodedString(). Does it look ok?

> replace all uses of _PyUnicode_AsDefaultEncodedString() 
> with PyUnicode_AsUTF8String()

I'm waiting for your approval of the first patch before working on the second part.
msg107385 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-06-09 11:13
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> Since Python3 fixes the UTF-8 default encoding, it's better
>> to enhance PyUnicode_AsUTF8String() to cache the UTF-8
>> string in the Unicode object
> 
> Right, that sounds like a great idea. Attached patch implements that: patch PyUnicode_AsUTF8String() and PyUnicode_AsEncodedString(). Does it look ok?

Looks good.

>> replace all uses of _PyUnicode_AsDefaultEncodedString() 
>> with PyUnicode_AsUTF8String()
> 
> I'm waiting for your approval of the first patch before working on the second part.

When replacing uses of _PyUnicode_AsDefaultEncodedString() with
PyUnicode_AsUTF8String() you have to take great care to decref
the object returned by the latter. Otherwise, we get huge memory
leaks.
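
The leak risk described above can be sketched with a toy reference-counting model in C. None of this is the CPython API: toy_obj, toy_encode_new_ref(), use_correctly() and use_and_leak() are hypothetical names, chosen only to show what happens when a caller receives a new reference and forgets to release it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object; NOT a real PyObject. */
typedef struct { int refcnt; } toy_obj;

static int live_objects = 0;   /* objects allocated and not yet freed */

static toy_obj *toy_new(void)
{
    toy_obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;             /* the caller owns this first reference */
    live_objects++;
    return o;
}

static void toy_decref(toy_obj *o)
{
    assert(o != NULL && o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o);
        live_objects--;
    }
}

/* Models an API that hands back a NEW reference on every call. */
static toy_obj *toy_encode_new_ref(void)
{
    return toy_new();
}

/* Correct caller: releases the reference once it is done. */
static int use_correctly(void)
{
    toy_obj *bytes = toy_encode_new_ref();
    if (bytes == NULL)
        return -1;
    /* ... use bytes here ... */
    toy_decref(bytes);
    return 0;
}

/* Buggy caller: never calls toy_decref(). The pointer is passed out
 * only so that a demo can observe the leak and clean up afterwards. */
static int use_and_leak(toy_obj **leaked_out)
{
    toy_obj *bytes = toy_encode_new_ref();
    if (bytes == NULL)
        return -1;
    /* ... use bytes here ... */
    *leaked_out = bytes;       /* missing toy_decref(bytes): one object leaks */
    return 0;
}
```

After use_correctly() returns, live_objects is back to its previous value; after use_and_leak() it has grown by one, and it would grow by one more on every further call.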
msg129842 - Author: STINNER Victor (haypo) (Python committer) Date: 2011-03-02 01:05
Fixed in Python 3.3: r88708 removes the errors argument of _PyUnicode_AsDefaultEncodedString(), and r88709 caches the result of str.encode().

> replace all uses of _PyUnicode_AsDefaultEncodedString()
> with PyUnicode_AsUTF8String()

It makes the code more complex because PyUnicode_AsUTF8String() increments the reference counter. I prefer to keep _PyUnicode_AsDefaultEncodedString().
History
Date User Action Args
2011-03-02 01:06:05 haypo set status: open -> closed
nosy: lemburg, haypo
resolution: fixed
2011-03-02 01:05:43 haypo set nosy: lemburg, haypo
messages: + msg129842
2010-06-09 11:13:59 lemburg set messages: + msg107385
title: Remove unused "errors" argument from _PyUnicode_AsDefaultEncodedString() -> Remove unused "errors" argument from _PyUnicode_AsDefaultEncodedString()
2010-06-09 10:46:56 haypo set files: + utf8_defenc.patch
messages: + msg107380
2010-06-07 21:37:05 lemburg set nosy: + lemburg
title: Remove unused "errors" argument from _PyUnicode_AsDefaultEncodedString() -> Remove unused "errors" argument from _PyUnicode_AsDefaultEncodedString()
messages: + msg107282
2010-06-06 18:28:40 haypo create