classification
Title: PyUnicode_FromFormat("%V") decodes the byte string from ISO-8859-1
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: vstinner, ysj.ray
Priority: normal
Keywords: patch

Created on 2011-02-18 22:12 by vstinner, last changed 2011-03-01 22:49 by vstinner. This issue is now closed.

Files
File name Uploaded Description
issue11246.diff ysj.ray, 2011-02-22 03:55
Messages (5)
msg128816 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-02-18 22:12
While testing a patch fixing issue #7330, I found a bug in the %V format of PyUnicode_FromFormat(): it decodes the byte string from ISO-8859-1, whereas I would expect the string to be decoded from UTF-8, as with the "%s" format.
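
To make the symptom concrete, here is a minimal pure-Python sketch of what the two decodings produce for the same bytes. It does not call the C API; the variable names are illustrative only, and the "replace" error handler is assumed to match what "%s" uses.

```python
# UTF-8 encoding of the two characters U+4EBA U+6C11.
raw = b"\xe4\xba\xba\xe6\xb0\x91"

# What the buggy %V effectively did: one character per byte.
as_latin1 = raw.decode("iso-8859-1")

# What %s does, and what %V is expected to do.
as_utf8 = raw.decode("utf-8", "replace")

print(ascii(as_latin1))  # six Latin-1 characters, i.e. mojibake
print(ascii(as_utf8))    # the two intended characters
```

ISO-8859-1 never fails to decode, so the bug shows up as wrong text rather than as an error.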
msg128945 - Author: ysj.ray (ysj.ray) Date: 2011-02-21 07:21
Yes. %V should be a combination of %U and %s.

Here is a patch which fixes this problem.
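
As a rough model of "a combination of %U and %s": %V takes two arguments, a unicode object and a C string, and falls back to the C string only when the object is NULL. The sketch below is a hypothetical pure-Python stand-in (format_v is not a real API), assuming the fallback is decoded from UTF-8 with the "replace" error handler like %s.

```python
def format_v(obj, fallback):
    """Hypothetical model of the %V format of PyUnicode_FromFormat().

    obj      -- a str, or None to stand in for a NULL PyObject*
    fallback -- bytes standing in for the const char* argument
    """
    if obj is not None:
        # %U branch: the unicode object is used as is.
        return obj
    # %s branch: decode the C string from UTF-8, replacing invalid bytes.
    return fallback.decode("utf-8", "replace")


print(format_v("abc", b"xyz"))            # the object wins
print(ascii(format_v(None, b"abc\xff")))  # fallback, invalid byte replaced
```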
msg128990 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-02-21 20:24
+        text = PyUnicode_FromFormat(b'repr=%V', 'abcdef', b'abcdef')
+        self.assertEqual(text, 'repr=abcdef')

How do you know which argument is used? You should use different values instead, for example 'abc' and b'xyz'.

+        text = PyUnicode_FromFormat(b'repr=%V', None, '人民'.encode('UTF-8'))
+        self.assertEqual(text, 'repr=人民')

I prefer ASCII literals using \x or \u: '\xe4\xba\xe6\xb0\u2018'.

You should also add a test specific to the replace error handler, e.g. (None, b'abc\xff') => 'abc\ufffd'.
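
The three cases discussed above can be sketched against the real function through ctypes. This is only a sketch, not the committed test: it assumes CPython with the fix applied, and the from_format wrapper and the results list are illustrative names.

```python
from ctypes import c_char_p, py_object, pythonapi

# PyUnicode_FromFormat is variadic: declare only the fixed format argument.
_from_format = pythonapi.PyUnicode_FromFormat
_from_format.argtypes = (c_char_p,)
_from_format.restype = py_object


def from_format(fmt, *args):
    # str arguments must be passed as PyObject*; bytes become char*,
    # and None becomes a NULL pointer.
    cargs = tuple(py_object(a) if isinstance(a, str) else a for a in args)
    return _from_format(fmt, *cargs)


results = [
    # Distinct values show which argument was used.
    from_format(b"repr=%V", "abc", b"xyz"),
    # NULL object: the byte string is decoded from UTF-8.
    from_format(b"repr=%V", None, b"\xe4\xba\xba\xe6\xb0\x91"),
    # Invalid UTF-8 byte goes through the replace error handler.
    from_format(b"repr=%V", None, b"abc\xff"),
]
for text in results:
    print(ascii(text))
```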
msg129033 - Author: ysj.ray (ysj.ray) Date: 2011-02-22 03:55
Thanks haypo!

Here is the updated patch, following your comments.
msg129829 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-03-01 22:49
Fixed in Python 3.3 (r88697) and 3.2 (r88698). Thank you Ray.
History
Date                 User      Action  Args
2011-03-01 22:49:08  vstinner  set     status: open -> closed; messages: + msg129829; resolution: fixed; nosy: vstinner, ysj.ray
2011-02-22 03:55:46  ysj.ray   set     files: - issue11246.diff; nosy: vstinner, ysj.ray
2011-02-22 03:55:16  ysj.ray   set     files: + issue11246.diff; type: behavior; messages: + msg129033; nosy: vstinner, ysj.ray
2011-02-21 20:24:19  vstinner  set     nosy: vstinner, ysj.ray; messages: + msg128990
2011-02-21 07:22:33  ysj.ray   set     files: + issue11246.diff; nosy: vstinner, ysj.ray; keywords: + patch
2011-02-21 07:21:45  ysj.ray   set     nosy: vstinner, ysj.ray; messages: + msg128945
2011-02-18 22:12:30  vstinner  set     nosy: vstinner, ysj.ray; components: + Library (Lib)
2011-02-18 22:12:10  vstinner  create