classification
Title: codecs utf7 decoding error
Type: behavior Stage:
Components: Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, arnimar
Priority: normal Keywords:

Created on 2007-11-19 09:29 by arnimar, last changed 2007-11-21 01:10 by amaury.forgeotdarc. This issue is now closed.

Files
File name Uploaded Description Edit
utf7.py arnimar, 2007-11-19 09:29
test arnimar, 2007-11-19 09:29
Messages (3)
msg57629 - Author: Árni Már Jónsson (arnimar) Date: 2007-11-19 09:29
There is a UTF-7 decoding error when a shift sequence falls at a certain
position in the input. To reproduce, run the attached program on a
file containing the string:
"0123456789012345678901234567890123456789012345678901234567890123456789X+-".

The shift sequence starts at character 72. The culprit seems to be
StreamReader.read in codecs.py. The input is split at the 72-character
boundary, and the first decode call raises an exception because the shift
sequence is not terminated. The second call falls back by one character
and raises no exception, but the earlier exception is re-raised anyway,
apparently because there is no newline in the output (?).

The lines I don't understand, which are also the ones raising the exception, are:

if len(lines)<=1:
  raise
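
To make the boundary described above concrete, here is a minimal sketch that only slices the reproduction string; it does not call the codec at all. The 72-byte chunk size is taken from the report and is an assumption about how the reader splits its input, and the names `CHUNK_SIZE`, `first_chunk` and `rest` are illustrative, not from codecs.py.

```python
# Reproduction string from the report: 70 digits, "X", then the UTF-7 escape "+-".
data = b"0123456789" * 7 + b"X+-"

# Assumption: the reader pulls 72 bytes on its first read, as the report describes.
CHUNK_SIZE = 72

first_chunk = data[:CHUNK_SIZE]   # everything up to and including the "+"
rest = data[CHUNK_SIZE:]          # what is left for the second read

print(len(data))          # total length in bytes
print(first_chunk[-1:])   # last byte of the first chunk
print(rest)               # remainder after the split
```

The first chunk ends in a "+" that opens a shift sequence, and the "-" that closes it only arrives with the next read, which matches the unterminated shift sequence described in the report.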
msg57630 - Author: Árni Már Jónsson (arnimar) Date: 2007-11-19 09:29
Added a test file.
msg57722 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2007-11-21 01:10
The UTF-7 incremental decoder was indeed losing its state between two
chunks of data.
Corrected in r59076.
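
As a rough illustration of what "keeping state between two chunks" means, the sketch below feeds the reproduction string to a single incremental decoder in two pieces and compares the result with a one-shot decode. It is written for a current Python 3 interpreter rather than the 2.5 release in the report, and the helper name `decode_in_chunks` is hypothetical; it is not the code changed in r59076.

```python
import codecs

# Reproduction string from the report: the "+" is the 72nd byte.
data = b"0123456789" * 7 + b"X+-"


def decode_in_chunks(raw, split_at, encoding="utf-7"):
    """Decode raw in two pieces using one incremental decoder instance."""
    decoder = codecs.getincrementaldecoder(encoding)()
    # First piece: more input may follow, so the decoder must remember
    # any shift sequence that is still open at the end of this chunk.
    head = decoder.decode(raw[:split_at])
    # Second piece: final=True tells the decoder that no more input follows.
    tail = decoder.decode(raw[split_at:], final=True)
    return head + tail


chunked = decode_in_chunks(data, 72)   # split right after the "+"
whole = data.decode("utf-7")           # one-shot decode for comparison

print(chunked == whole)
```

When the decoder carries its state across the boundary, the split position does not change the decoded text, so the comparison should hold for any value of `split_at`.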
History
Date User Action Args
2007-11-21 01:10:13 amaury.forgeotdarc set status: open -> closed
resolution: fixed
messages: + msg57722
nosy: + amaury.forgeotdarc
2007-11-19 09:29:44 arnimar set files: + test
messages: + msg57630
2007-11-19 09:29:18 arnimar create