classification
Title: PyUnicode_FromFormatV() doesn't support utf-8 text
Type: enhancement
Stage: resolved
Components: Interpreter Core, Unicode
Versions: Python 3.4
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: PyUnicode_FromFormatV() doesn't handle non-ascii text correctly (issue 9769)
Assigned To:
Nosy List: chris.jerdonek, ezio.melotti, haypo, reingart
Priority: normal
Keywords: patch

Created on 2012-10-28 04:06 by reingart, last changed 2012-10-28 20:14 by chris.jerdonek. This issue is now closed.

Files
File name Uploaded Description
pyunicode_fromformat_utf8.patch reingart, 2012-10-28 04:06 PyUnicode_FromFormatV patch to use UTF-8
Messages (4)
msg173996 - Author: Mariano Reingart (reingart) Date: 2012-10-28 04:06
While working on an internationalization proposal <http://python.org.ar/pyar/TracebackInternationalizationProposal>,
I got stuck on #9769: multibyte encodings (like UTF-8) are not supported by PyUnicode_FromFormatV().

Apart from my proposal, I think UTF-8 should be supported for consistency with the other Unicode functions, such as PyUnicode_FromString() or even unicode_fromformat_arg().

Attached is a patch that:
- enhances the iterator to detect multibyte sequences, with sanity checks on start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests
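
To illustrate the kind of start/continuation byte check described in the first item, here is a minimal Python sketch. It is not the code from the attached patch (which is C); the function names utf8_sequence_length and split_utf8 are hypothetical, and the check is structural only (it does not reject overlong 3- and 4-byte forms or surrogates, which a full decoder must also handle).

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte, or 0 if invalid.

    Hypothetical helper for illustration; not taken from the patch.
    """
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (0xC0 and 0xC1 could only start overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped at U+10FFFF
    return 0      # stray continuation byte (10xxxxxx) or invalid lead


def split_utf8(data: bytes):
    """Yield each encoded character of `data` as a bytes slice.

    Raises ValueError on an invalid lead byte, a truncated sequence,
    or a byte that is not a continuation byte where one is expected.
    """
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0:
            raise ValueError(f"invalid start byte 0x{data[i]:02x} at offset {i}")
        seq = data[i:i + n]
        if len(seq) != n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Every byte after the lead must look like 10xxxxxx.
        if any((b & 0xC0) != 0x80 for b in seq[1:]):
            raise ValueError(f"invalid continuation byte at offset {i}")
        yield seq
        i += n
```

For example, list(split_utf8("añ".encode("utf-8"))) yields one slice per character, so a formatter walking the string can copy a whole multibyte sequence at once instead of treating each byte as a separate character.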

Hope it helps. This is my first patch for CPython and my C skills are a bit rusty, so please excuse any newbie glitches.
msg174022 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-10-28 10:02
Shouldn't this patch be attached to the referenced issue 9769 instead of creating a new issue?  Even the issue title is nearly the same:

9769:  PyUnicode_FromFormatV() doesn't handle non-ascii text correctly
16343: PyUnicode_FromFormatV() doesn't support utf-8 text
msg174072 - Author: Mariano Reingart (reingart) Date: 2012-10-28 20:01
I thought #9769 was closed (in fact, that patch was already applied).
Now, PyUnicode_FromFormatV() doesn't handle non-ascii text at all.
Maybe I misread the comments that suggested opening a new issue; sorry about that.
msg174073 - Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-10-28 20:14
Issue 9769 is still open.  It looks like there was some disagreement in the comments between Alexander and Victor as to whether a new issue should be created (since Victor had a different idea when first opening the issue), but it looks like Victor deferred to Alexander in his final comment.

I will close this issue as a duplicate, and if you could repost your patch on the other issue, that would be great.  The discussion there is relevant to your patch.
History
Date User Action Args
2012-10-28 20:14:48 chris.jerdonek set status: open -> closed; superseder: PyUnicode_FromFormatV() doesn't handle non-ascii text correctly; messages: + msg174073; resolution: duplicate; stage: resolved
2012-10-28 20:01:20 reingart set messages: + msg174072
2012-10-28 10:02:16 chris.jerdonek set nosy: + chris.jerdonek; messages: + msg174022
2012-10-28 04:42:27 r.david.murray set nosy: + haypo
2012-10-28 04:06:11 reingart create