classification
Title: Redundant code in PyUnicode_EncodeDecimal()
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: facundobatista, loewis, mark.dickinson, ned.deily, neologix, python-dev, rhettinger, skrah, vstinner
Priority: normal Keywords: needs review, patch

Created on 2011-10-02 15:39 by skrah, last changed 2011-11-29 01:16 by ned.deily. This issue is now closed.

Files
File name Uploaded Description Edit
encode_decimal_redundant_code.diff skrah, 2011-10-02 15:39
Messages (15)
msg144774 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-10-02 15:39
I can't see what this code is supposed to accomplish (see patch):

  while (collend < end) {
      if ((0 < *collend && *collend < 256) ||
          !Py_UNICODE_ISSPACE(*collend) ||
          Py_UNICODE_TODECIMAL(*collend))
          break;
  }


Since 'collend' and 'end' don't change in the loop, it would be
infinite if the 'if' condition evaluated to false. But the 'if'
condition is always true.
msg147360 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-09 18:02
The code is also present in Python 2.7 and 3.2.
msg147361 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-09 18:04
PyUnicode_EncodeDecimal() was used in Python 2 by int, long and complex constructors. In Python 3, the function is no more used: it has been replaced by _PyUnicode_TransformDecimalAndSpaceToASCII().
msg147362 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2011-11-09 18:31
Looking at it again, the intention was probably to increment
collend so that it points to the first non-garbage character
(or '\0'). If that's the case, the loop should be something
like this:

  while (collend < end) {
      if ((0 < *collend && *collend < 256) ||
          Py_UNICODE_ISSPACE(*collend) ||
          Py_UNICODE_TODECIMAL(*collend) >= 0)
          break;
      collend++;
  }


I can understand if no one wants to touch this given that
the function is deprecated, provided that you agree that
the existing code is redundant but harmless; i.e. it cannot
enter an infinite loop.
msg147370 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-09 20:22
AFAICT, this code was introduced in 0337dad8403e, implementing PEP 293 (see #432401). The intention clearly was what Stefan figured out: compute the list of unencodable characters, to pass the longest run of unencodable characters to the error handler.

Fixing this in this respect is fine with me for all branches.
msg148098 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-22 00:48
New changeset f29d7d597fae by Victor Stinner in branch '3.2':
Issue #13093: Fix error handling on PyUnicode_EncodeDecimal()
http://hg.python.org/cpython/rev/f29d7d597fae

New changeset bc53c11804ab by Victor Stinner in branch 'default':
(Merge 3.2) Issue #13093: Fix error handling on PyUnicode_EncodeDecimal()
http://hg.python.org/cpython/rev/bc53c11804ab
msg148099 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-22 00:52
New changeset 0cd197f13400 by Victor Stinner in branch '2.7':
Issue #13093: Fix error handling on PyUnicode_EncodeDecimal()
http://hg.python.org/cpython/rev/0cd197f13400
msg148101 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-22 00:53
I fixed the issue in Python 2.7, 3.2 and 3.3.

See also the changeset 849e9277906a (Python 3.3).
msg148133 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-22 17:56
New changeset 52fecdc1c5d8 by Charles-François Natali in branch 'default':
Issue #13093: Perform a real merge.
http://hg.python.org/cpython/rev/52fecdc1c5d8
msg148510 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-28 18:48
A 2.7 OS X buildbot segfaults in test_unicode since 0cd197f13400 :

http://www.python.org/dev/buildbot/all/builders/AMD64 Snow Leopard 2 2.7/builds/409/steps/test/logs/stdio

"""
test_unicode
make: *** [buildbottest] Segmentation fault
program finished with exit code 2
"""
msg148515 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-28 22:00
> A 2.7 OS X buildbot segfaults in test_unicode since 0cd197f13400
> http://www.python.org/dev/buildbot/all/builders/AMD64 Snow Leopard 2 2.7/builds/409/steps/test/logs/stdio

Hum, I'm unable to reproduce the crash on Linux or Mac OS X Tiger.
msg148516 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-11-28 22:28
I am able to reproduce the problem on 2.7 OS X 10.7 64-bit. unicode._encodedecimal is gobbling up memory.  Looks like length is incorrect.

Breakpoint 1, unicode_encodedecimal (self=0x0, args=0x1007ce0d0) at /Users/nad/Projects/PyDev/active/temp/u27-clang/Modules/_testcapimodule.c:1122
1122	    decimal = PyBytes_FromStringAndSize(NULL, decimal_length);
(gdb) p length
$1 = 4294967299
Current language:  auto; currently minimal
(gdb) p unicode
$2 = (Py_UNICODE *) 0x101827ab8
(gdb) p errors
$3 = 0x0
msg148526 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-28 23:50
New changeset 3ecddf168f1f by Victor Stinner in branch '2.7':
Issue #13093: Fix _testcapi.unicode_encodedecimal()
http://hg.python.org/cpython/rev/3ecddf168f1f
msg148527 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-28 23:51
> Looks like length is incorrect.

Oh ok, _testcapimodule.c is "sssize_t" safe in Python 3, not in Python 2. Can you please try with the last tip?
msg148532 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-11-29 01:16
That fixes it.
History
Date User Action Args
2011-11-29 01:16:19ned.deilysetstatus: open -> closed
resolution: fixed
messages: + msg148532

stage: needs patch -> resolved
2011-11-28 23:51:42vstinnersetmessages: + msg148527
2011-11-28 23:50:43python-devsetmessages: + msg148526
2011-11-28 22:28:07ned.deilysetnosy: + ned.deily
messages: + msg148516

resolution: fixed -> (no value)
stage: resolved -> needs patch
2011-11-28 22:00:24vstinnersetmessages: + msg148515
2011-11-28 18:48:01neologixsetstatus: closed -> open
nosy: + neologix
messages: + msg148510

2011-11-25 19:09:42vstinnersetmessages: - msg148348
2011-11-25 19:06:51python-devsetmessages: + msg148348
stage: patch review -> resolved
2011-11-22 17:56:55python-devsetmessages: + msg148133
2011-11-22 00:53:59vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg148101
2011-11-22 00:52:24python-devsetmessages: + msg148099
2011-11-22 00:48:14python-devsetnosy: + python-dev
messages: + msg148098
2011-11-09 20:22:39loewissetmessages: + msg147370
2011-11-09 18:31:23skrahsetmessages: + msg147362
2011-11-09 18:04:52vstinnersetnosy: + loewis
2011-11-09 18:04:46vstinnersetmessages: + msg147361
2011-11-09 18:02:49vstinnersetnosy: + vstinner

messages: + msg147360
versions: + Python 2.7, Python 3.2
2011-11-08 23:08:48ezio.melottisetnosy: + rhettinger, facundobatista, mark.dickinson
2011-10-02 15:39:15skrahcreate