classification
Title: cgi.py multipart/form-data
Type: performance Stage: needs patch
Components: Library (Lib) Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: flox, hynek, ishimoto, orsenthil, pitrou, r.david.murray, teyc
Priority: normal Keywords: easy

Created on 2006-12-07 09:18 by teyc, last changed 2012-12-27 10:28 by hynek.

Messages (6)
msg61046 - (view) Author: Chui Tey (teyc) * Date: 2006-12-07 09:18
Uploading large binary files using multipart/form-data can be very inefficient: the LF character occurs frequently in binary data, so read_lines_to_outerboundary calls readline() once per LF and loops far more often than necessary.

*** cgi.py.Py24	Thu Dec  7 18:46:13 2006
--- cgi.py	Thu Dec  7 16:38:04 2006
***************
*** 707,713 ****
          last = next + "--"
          delim = ""
          while 1:
!             line = self.fp.readline()
              if not line:
                  self.done = -1
                  break
--- 703,709 ----
          last = next + "--"
          delim = ""
          while 1:
!             line = self.fp_readline()
              if not line:
                  self.done = -1
                  break
***************
*** 729,734 ****
--- 730,753 ----
                  delim = ""
              self.__write(odelim + line)
  
+     def fp_readline(self):
+ 
+         tell   = self.fp.tell()
+         buffer = self.fp.read(1 << 17)
+         parts  = buffer.split("\n")
+         retlst = []
+         for part in parts:
+             if part.startswith("--"):
+                 if retlst:
+                     retval = "\n".join(retlst) + "\n"
+                 else:
+                     retval = part + "\n"
+                 self.fp.seek(tell + len(retval))
+                 return retval
+             else:
+                 retlst.append(part)
+         return buffer
+ 
      def skip_lines(self):
          """Internal: skip lines until outer boundary if defined."""
          if not self.outerboundary or self.done:


The patch reads the file in larger increments (blocks of 1 << 17 bytes, i.e. 128 KiB). For my 138 MB test file, it reduced parsing time from 168 seconds to 19 seconds.
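
For readers on Python 3, the idea of reading one large block and searching it for the delimiter, instead of calling readline() for every LF, might look roughly like the sketch below. It is not a port of the patch: read_until_boundary is a hypothetical helper, it assumes a seekable binary stream, and it assumes the delimiter is a bare LF followed by "--" and the boundary (real multipart bodies normally use CRLF).

```python
import io

CHUNK_SIZE = 1 << 17  # same block size as the patch above (128 KiB)


def read_until_boundary(fp, boundary, chunk_size=CHUNK_SIZE):
    """Return the bytes that precede b"\\n--" + boundary in fp.

    Hypothetical helper, not part of cgi.py. fp must be a seekable
    binary stream. On return, fp is positioned at the start of the
    delimiter, or at EOF if the delimiter never appears.
    """
    delimiter = b"\n--" + boundary
    keep = len(delimiter) - 1  # bytes held back in case the delimiter straddles two chunks
    pieces = []
    tail = b""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            # EOF without a delimiter: everything held back is payload.
            pieces.append(tail)
            break
        data = tail + chunk
        index = data.find(delimiter)
        if index != -1:
            pieces.append(data[:index])
            # Rewind so the next read starts at the delimiter itself.
            fp.seek(index - len(data), io.SEEK_CUR)
            break
        if len(data) > keep:
            pieces.append(data[:len(data) - keep])
            tail = data[-keep:]
        else:
            tail = data
    return b"".join(pieces)
```

The number of read() calls here depends on the file size and the block size, not on how many LF bytes the payload happens to contain.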

#------------ test script --------------------
import cgi
import os
import profile
import stat

def run():
    filename = 'body.txt'
    size = os.stat(filename)[stat.ST_SIZE]
    fp = open(filename,'rb')
    environ = {}
    environ["CONTENT_TYPE"]   = open('content_type.txt','rb').read()
    environ["REQUEST_METHOD"] = "POST"
    environ["CONTENT_LENGTH"] = str(size)

    fieldstorage = cgi.FieldStorage(fp, None, environ=environ)
    return fieldstorage

import hotshot, hotshot.stats
import time
if 1:
    t1 = time.time()
    prof = hotshot.Profile("bug1718.prof")
    # The hotshot profiler crashes on Windows XP
    # with the patch applied.
    #prof_results = prof.runcall(run)
    prof_results  = run()
    prof.close()
    t2 = time.time()
    print t2-t1
    if 0:
      for key in prof_results.keys():
        if len(prof_results[key].value)> 100:
            print key, prof_results[key].value[:80] + "..."
        else:
            print key, prof_results[key]

content_type.txt
----------------------------
multipart/form-data; boundary=----------ThIs_Is_tHe_bouNdaRY_$
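
The test script expects body.txt and content_type.txt to exist, but neither file is attached. A minimal Python 3 sketch for generating comparable inputs follows. The helper names, the field name "upload", the filename "blob.bin" and the random payload are all assumptions made for illustration; the original 138 MB test file is not reproduced here.

```python
import os

# Boundary copied from content_type.txt above.
BOUNDARY = b"----------ThIs_Is_tHe_bouNdaRY_$"


def build_multipart_body(payload, boundary=BOUNDARY,
                         field_name="upload", filename="blob.bin"):
    """Wrap payload in a single-part multipart/form-data body.

    field_name and filename are placeholder values, not taken from the report.
    """
    disposition = 'Content-Disposition: form-data; name="%s"; filename="%s"\r\n' % (
        field_name, filename)
    head = (
        b"--" + boundary + b"\r\n"
        + disposition.encode("ascii")
        + b"Content-Type: application/octet-stream\r\n"
        + b"\r\n"
    )
    tail = b"\r\n--" + boundary + b"--\r\n"
    return head + payload + tail


def write_test_inputs(directory, size=1 << 20):
    """Write body.txt and content_type.txt into directory; return the payload."""
    payload = os.urandom(size)  # random bytes, so LF shows up about once per 256 bytes
    with open(os.path.join(directory, "body.txt"), "wb") as f:
        f.write(build_multipart_body(payload))
    with open(os.path.join(directory, "content_type.txt"), "wb") as f:
        f.write(b"multipart/form-data; boundary=" + BOUNDARY)
    return payload
```

Calling write_test_inputs(".", size=138 * 1024 * 1024) would produce a body of roughly the size mentioned in the report, though timings on other hardware and Python versions will differ.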
msg110090 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-12 14:35
Chui Tey, does this issue still apply? If so, could you please provide a patch that follows the guidelines here:
python.org/dev/patches
msg114915 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-08-25 14:32
No reply to msg110090.
msg119535 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-25 02:24
I don't think it was appropriate to close this issue.
msg166021 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2012-07-21 13:14
It needs tests to demonstrate the issue in 3.x, and an updated patch.
msg178292 - (view) Author: Hynek Schlawack (hynek) * (Python committer) Date: 2012-12-27 10:28
It would be great if someone could port this patch to Python 3.4 and verify its effectiveness.
History
Date                 User            Action  Args
2012-12-27 10:28:14  hynek           set     keywords: + easy, - patch
                                             stage: test needed -> needs patch
                                             messages: + msg178292
                                             versions: + Python 3.4, - Python 3.2, Python 3.3
2012-07-30 14:46:10  ishimoto        set     nosy: + ishimoto
2012-07-21 13:42:41  pitrou          set     nosy: + orsenthil
2012-07-21 13:14:52  flox            set     versions: + Python 3.3
                                             nosy: + pitrou, hynek
                                             messages: + msg166021
                                             stage: patch review -> test needed
2010-10-25 02:24:47  r.david.murray  set     status: closed -> open
                                             versions: + Python 3.2, - Python 3.1, Python 2.7
                                             nosy: + r.david.murray, - BreamoreBoy
                                             messages: + msg119535
                                             resolution: wont fix ->
                                             stage: test needed -> patch review
2010-08-27 03:10:20  flox            set     nosy: + flox
2010-08-25 14:32:16  BreamoreBoy     set     status: open -> closed
                                             resolution: wont fix
                                             messages: + msg114915
2010-07-12 14:35:15  BreamoreBoy     set     nosy: + BreamoreBoy
                                             messages: + msg110090
2009-03-30 17:39:32  ajaksu2         set     keywords: + patch
                                             stage: test needed
                                             type: performance
                                             components: + Library (Lib), - Interpreter Core
                                             versions: + Python 3.1, Python 2.7
2006-12-07 09:18:18  teyc            create