Title: UnicodeDecodeError when retrieving binary data from cgi.FieldStorage()
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1
Status: closed Resolution: duplicate
Dependencies: Superseder: cgi module cannot handle POST with multipart/form-data in 3.x
View: 4953
Assigned To: Nosy List: ezio.melotti, loveminix, r.david.murray, terry.reedy
Priority: normal Keywords:

Created on 2009-09-07 05:13 by loveminix, last changed 2010-01-10 17:16 by r.david.murray. This issue is now closed.

Messages (6)
msg92343 - Author: (loveminix) Date: 2009-09-07 05:13
The following cgi applet uploads a binary file to the server. It gets a
"UnicodeDecodeError" inside cgi.FieldStorage(). The same code works in
python 2.6.

#! /usr/bin/python3

import os, cgi;
import cgitb; cgitb.enable();

pathInfo = os.environ.get("PATH_INFO", "");
serverName = os.environ.get("SERVER_NAME", "");
scriptName = os.environ.get("SCRIPT_NAME", "");

if pathInfo == "":
    print("""\
Content-type: text/html

    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
        <form action="http://{0}{1}/upload" enctype="multipart/form-data" method="post">
            <input type="file" name="file"><br>
            <input type="submit" value="Upload">
""".format(serverName, scriptName)
);
elif pathInfo == "/upload":
    fieldStorage = cgi.FieldStorage();
    fileItem = fieldStorage["file"];
    if fileItem.filename != "":
        file = open("/tmp/test.txt", mode="wb");
    print("""\
Content-type: text/html

    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
""");
msg92360 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-09-07 11:51
Can you paste the traceback of the error?
msg92385 - Author: (loveminix) Date: 2009-09-07 19:32
Here is the traceback (the uploaded file is a PDF file):

UnicodeDecodeError Python 3.1.1: /home/chu7/software/bin/python3
Mon Sep 7 12:31:07 2009 

A problem occurred in a Python script. Here is the sequence of function 
calls leading up to the error, in the order they occurred.

 /home/chu7/web/cgi-bin/ in () 
     35 );
     36 elif pathInfo == "/upload":
=>   37     fieldStorage = cgi.FieldStorage();
     38     fileItem = fieldStorage["file"];
     39     if fileItem.filename != "":
fieldStorage undefined, cgi = <module 'cgi' 
from '/home/chu7/software/lib/python3.1/'>, cgi.FieldStorage = 
<class 'cgi.FieldStorage'> 
 /home/chu7/software/lib/python3.1/ in __init__(self=FieldStorage
(None, None, []), fp=None, headers={'content-length': '76784', 'content-
type': 'multipart/form-data; boundary=---------------------------
7d95563062a'}, outerboundary='', environ=<os._Environ object at 
0x8ee040c>, keep_blank_values=0, strict_parsing=0) 
    489             self.read_urlencoded()
    490         elif ctype[:10] == 'multipart/':
=>  491             self.read_multi(environ, keep_blank_values, 
    492         else:
    493             self.read_single()
self = FieldStorage(None, None, []), self.read_multi = <bound method 
FieldStorage.read_multi of FieldStorage(None, None, [])>, environ = 
<os._Environ object at 0x8ee040c>, keep_blank_values = 0, 
strict_parsing = 0 
 /home/chu7/software/lib/python3.1/ in read_multi
(self=FieldStorage(None, None, []), environ=<os._Environ object at 
0x8ee040c>, keep_blank_values=0, strict_parsing=0) 
    609         # Create bogus content-type header for proper multipart 
    610         parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % 
(self.type, ib))
=>  611         parser.feed(
    612         full_msg = parser.close()
    613         # Get subparts
parser = <email.feedparser.FeedParser object at 0x910daac>, parser.feed 
= <bound method FeedParser.feed of <email.feedparser.FeedParser object 
at 0x910daac>>, self = FieldStorage(None, None, []), self.fp = 
<_io.TextIOWrapper name='<stdin>' encoding='ANSI_X3.4-1968'>, = <built-in method read of _io.TextIOWrapper object at 
 /home/chu7/software/lib/python3.1/encodings/ in decode
(self=<encodings.ascii.IncrementalDecoder object at 0x8f6b74c>, 
----------------7d95563062a--\r\n', final=True) 
     24 class IncrementalDecoder(codecs.IncrementalDecoder):
     25     def decode(self, input, final=False):
=>   26         return codecs.ascii_decode(input, self.errors)[0]
     28 class StreamWriter(Codec,codecs.StreamWriter):
global codecs = <module 'codecs' 
from '/home/chu7/software/lib/python3.1/'>, 
codecs.ascii_decode = <built-in function ascii_decode>, input = b'------
--7d95563062a--\r\n', self = <encodings.ascii.IncrementalDecoder object 
at 0x8f6b74c>, self.errors = 'strict' 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 
158: ordinal not in range(128) 
      args = ('ascii', b'-----------------------------
158, 159, 'ordinal not in range(128)') 
      encoding = 'ascii' 
      end = 159 
      object = b'-----------------------------7d95563062a\r\nCo...\n----
      reason = 'ordinal not in range(128)' 
      start = 158 
      with_traceback = <built-in method with_traceback of 
UnicodeDecodeError object at 0x905bd2c>
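
The traceback above shows self.fp as an _io.TextIOWrapper around stdin with encoding 'ANSI_X3.4-1968' (an alias for ASCII), so reading it pushes the whole request body through the strict ASCII decoder. A minimal sketch of that failure mode, independent of the cgi module, might look like the following. The request body, the filename, and the helper names read_as_text and read_as_bytes are illustrative assumptions, not data from this report.

```python
import io

# Hypothetical multipart body: ASCII headers followed by a payload that
# contains non-ASCII bytes (0xc7 is the byte named in the error message).
BOUNDARY = b"-" * 27 + b"7d95563062a"
body = b"".join([
    b"--", BOUNDARY, b"\r\n",
    b'Content-Disposition: form-data; name="file"; filename="test.pdf"\r\n',
    b"Content-Type: application/pdf\r\n\r\n",
    b"%PDF-1.4\n%\xc7\xec\x8f\xa2\n",
    b"\r\n--", BOUNDARY, b"--\r\n",
])


def read_as_text(raw, encoding="ascii"):
    """Read raw bytes through a text wrapper, as a text-mode stdin would."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding)
    return stream.read()


def read_as_bytes(raw, encoding="ascii"):
    """Bypass decoding by reading the wrapper's underlying binary buffer."""
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding)
    return stream.buffer.read()


try:
    read_as_text(body)
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)                        # the decode failure, if any
print(read_as_bytes(body) == body)  # bytes survive when no decoding happens
```

Reading from the .buffer attribute avoids the decode step entirely, which is why binary uploads need a bytes-oriented read rather than a different text encoding.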
msg92470 - Author: (loveminix) Date: 2009-09-09 22:05
Is there an update on this? Let me know if more information is needed.
msg92520 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2009-09-11 19:41
0. When you have a problem, and you are not sure it is an error in the
interpreter or stdlib, try posting to python-list first. If you do not
soon get a definitive determination here, do that. This is not a
question-answering or user-code debugging list.

1. When posting code so others can read it, please leave off trailing
semi-colons. Python is not C. The unnecessary ';'s are distracting noise.

2. When posting problematic code, please strip it down to the minimum
necessary to show the problem. From what I understand, the minimum here is
"import cgi; d = cgi.FieldStorage()". The rest is distracting noise. 

3. When you get an error traceback, copy and paste *the whole traceback*
(especially when requested).

4. Post the exact interpreter version used.  Is this with 3.1.1, with
all the latest bug fixes? Even better, try with the latest version and
say so.

5. When the problem involves interacting with an external system (such
as os.environ), specify the OS. It often makes a difference.

In this case, your problem seems to be that the relevant field of
os.environ has a non-ASCII character. My impression is that this is not
intended to be allowed, at least for POSIX systems, but I am not sure.
It is possible that something in the module has not been updated
properly, but I do not know enough of the specs to say whether the
problem is your data or the library code.

You should determine the 'offending' string and print out its
representation (with repr(s)) and include that with any further post
here or on python-list. In general, include offending data (also
minimized) along with offending code.
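
The last suggestion, locating the offending string and printing its repr(), can be sketched with the start, end, and object attributes that UnicodeDecodeError carries (the same fields appear in the cgitb output earlier in this thread). The helper name offending_bytes, its context parameter, and the sample data are hypothetical and only illustrate the idea.

```python
def offending_bytes(raw, encoding="ascii", context=8):
    """Return (position, repr of the surrounding bytes) for the first
    undecodable byte, or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        hi = min(exc.end + context, len(raw))
        return exc.start, repr(raw[lo:hi])
    return None


# Hypothetical sample: ASCII headers followed by a few binary bytes.
sample = b"Content-Type: application/pdf\r\n\r\n%PDF-1.4\n%\xc7\xec\x8f\xa2\n"
report = offending_bytes(sample)
if report is not None:
    position, snippet = report
    print("first undecodable byte at offset", position, "in", snippet)
```

Posting the printed snippet rather than the whole upload keeps the report small while still showing exactly which bytes the decoder rejected.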
msg97523 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-01-10 17:16
This appears to be a duplicate of issue 4953.
Date                 User            Action  Args
2010-01-10 17:16:40  r.david.murray  set     status: open -> closed
                                             priority: normal
                                             superseder: cgi module cannot handle POST with multipart/form-data in 3.x
                                             nosy: + r.david.murray
                                             messages: + msg97523
                                             resolution: duplicate
                                             stage: resolved
2009-09-11 19:41:09  terry.reedy     set     nosy: + terry.reedy
                                             messages: + msg92520
2009-09-09 22:05:25  loveminix       set     messages: + msg92470
2009-09-07 19:32:47  loveminix       set     messages: + msg92385
2009-09-07 11:51:54  ezio.melotti    set     nosy: + ezio.melotti
                                             messages: + msg92360
2009-09-07 05:13:32  loveminix       create