Issue231249
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2001-02-06 12:59 by stadt, last changed 2022-04-10 16:03 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
cgi.py.txt | gvanrossum, 2001-04-11 20:20 | Patch for cgi.py |
Messages (15) | |||
---|---|---|---|
msg3217 - (view) | Author: Richard van de Stadt (stadt) | Date: 2001-02-06 12:59 | |
cgi.FieldStorage() is used to get the contents of a webform. It turns out that for each <input> line, a new temporary file is opened. This causes the script that is using cgi.FieldStorage() to reach the webserver's limit of number of opened files, as described by 'ulimit -n'. The standard value for Solaris systems seems to be 64, so webforms with that many <input> fields cannot be dealt with. A solution would seem to use the same temporary filename, since only a maxmimum one file is (temporarily) used at the same time. I did an "ls|wc -l" while the script was running, which showed only zeroes and ones. (I'm using Python for CyberChair, an online paper submission and reviewing system. The webform under discussion has one input field for each reviewer, stating the papers he or she is supposed to be reviewing. One conference that is using CyberChair has almost 140 reviewers. Their system's open file limit is 64. Using the same data on a system with an open file limit of 260 _is_ able to deal with this.) |
|||
msg3218 - (view) | Author: A.M. Kuchling (akuchling) * | Date: 2001-02-17 05:41 | |
I assume you mean 64 file upload fields, right? Can you provide a small test program that triggers the problem? |
|||
msg3219 - (view) | Author: Richard van de Stadt (stadt) | Date: 2001-02-18 02:37 | |
I do *not* mean file upload fields. I stumbled upon this with a webform that contains 141 'simple' input fields like the form you can see here (which 'only' contains 31 of those input fields): http://www.cyberchair.org/cgi-cyb/genAssignPageReviewerPapers.py (use chair/chair to login) When the maximum number of file descriptors used per process was increased to 160 (by the sysadmins), the problem did not occur anymore, and the webform could be processed. This was the error message I got: Traceback (most recent call last): File "/usr/local/etc/httpd/DocumentRoot/ICML2001/cgi-bin/submitAssignRP.py", line 144, in main File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 504, in __init__ File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 593, in read_multi File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 506, in __init__ File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 603, in read_single File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 623, in read_lines File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/cgi.py", line 713, in make_file File "/opt/python/2.0/sparc-sunos5.6/lib/python2.0/tempfile.py", line 144, in TemporaryFile OSError: [Errno 24] Too many open files: '/home/yara/brodley/icml2001/tmp/@26048.61' I understand why you assume that it would concern *file* uploads, but this is not the case. As I reported before, it turns out that for each 'simple' <input> field a temporary file is used in to transfer the contents to the script that uses the cgi.FieldStorage() method, even if no files are being uploaded. The problem is not that too many files are open at the same time (which is 1 at most). It is the *amount* of files that is causing the troubles. If the same temporary file would be used, this problem would probably not have happened. My colleague Fred Gansevles wrote a possible solution, but mentioned that this might introduce the need for protection against a 'symlink attack' (whatever that may be). This solution(?) concentrates on the open file descriptor's problem, while Fred suggests a redesign of FieldStorage() would probably be better. import os, tempfile AANTAL = 50 class TemporaryFile: def __init__(self): self.name = tempfile.mktemp("") open(self.name, 'w').close() self.offset = 0 def seek(self, offset): self.offset = offset def read(self): fd = open(self.name, 'w+b', -1) fd.seek(self.offset) data = fd.read() self.offset = fd.tell() fd.close() return data def write(self, data): fd = open(self.name, 'w+b', -1) fd.seek(self.offset) fd.write(data) self.offset = fd.tell() fd.close() def __del__(self): os.unlink(self.name) def add_fd(l, n) : map(lambda x,l=l: l.append(open('/dev/null')), range(n)) def add_tmp(l, n) : map(lambda x,l=l: l.append(TemporaryFile()), range(n)) def main (): import getopt, sys try: import resource soft, hard = resource.getrlimit (resource.RLIMIT_NOFILE) resource.setrlimit (resource.RLIMIT_NOFILE, (hard, hard)) except ImportError: soft, hard = 64, 1024 opts, args = getopt.getopt(sys.argv[1:], 'n:t') aantal = AANTAL tmp = add_fd for o, a in opts: if o == '-n': aantal = int(a) elif o == '-t': tmp = add_tmp print "aantal te gebruiken fd's:", aantal #dutch; English: 'number of fds to be used' print 'tmp:', tmp.func_name tmp_files = [] files=[] tmp(tmp_files, aantal) try: add_fd(files,hard) except IOError: pass print "aantal vrije gebruiken fd's:", len(files) #enlish: 'number of free fds' main() Running the above code: python ulimit.py [-n number] [-t] default number = 50, while using 'real' fd-s for temporary files. When using the '-t' flag 'smart' temporary files are used. Output: $ python ulimit.py aantal te gebruiken fd's: 50 tmp: add_fd aantal vrije gebruiken fd's: 970 $ python ulimit.py -t aantal te gebruiken fd's: 50 tmp: add_tmp aantal vrije gebruiken fd's: 1020 $ python ulimit.py -n 1000 aantal te gebruiken fd's: 1000 tmp: add_fd aantal vrije gebruiken fd's: 20 $ python ulimit.py -n 1000 -t aantal te gebruiken fd's: 1000 tmp: add_tmp aantal vrije gebruiken fd's: 1020 |
|||
msg3220 - (view) | Author: A.M. Kuchling (akuchling) * | Date: 2001-02-18 22:08 | |
Ah, I see; the traceback makes this much clearer. When you're uploading a file, everything in the form is sent as a MIME document in the body; every field is accompanied by a boundary separator and Content-Disposition header. In multipart mode, cgi.py copies each field into a temporary file. The first idea I had was to only use tempfiles for the actual upload field; unfortunately, that doesn't help because the upload field isn't special, and cgi.py has no way to know which it is ahead of time. Possible second approach: measure the size of the resulting file; if it's less than some threshold (1K? 10K?), read its contents into memory and close the tempfile. This means only the largest fields will require that a file descriptor be kept open. I'll explore this more after beta1. |
|||
msg3221 - (view) | Author: Kurt B. Kaiser (kbk) * | Date: 2001-02-19 07:32 | |
In the particular HTML form referenced it appears that a workaround might be to eliminate the enctype attribute in the <form> tag and take the application/x-www-form-urlencoded default since no files are being uploaded. When make_file creates the temporary files they are immediately unlinked. There is probably a brief period before the unlink is finalized during which the ls process might see a file; that would account for the output of ls | wc. It appears that the current cgi.py implementation leaves all the (hundreds of) files open until the cgi process releases the FieldStorage object or exits. My first thought was, if the filename recovered from the header is None have make_file create a StringIO object instead of a temp file. That way a temp file is only created when a file is uploaded. This is not inconsistent with the cgi.py docs. Unfortunately, RFC2388 4.4 states that a filename is not required to be sent, so it looks like your solution based on the size of the data received is the correct one. Below 1K you could copy the temp file contents to a StringIO and assign it to self.file, then explicitly close the temp file via its descriptor. If only I understood the module better ::-(( and had a way of tunnel testing it I might have had the temerity to offer a patch. (I'm away for a couple of weeks starting tomorrow.) |
|||
msg3222 - (view) | Author: A.M. Kuchling (akuchling) * | Date: 2001-04-10 17:54 | |
Logged In: YES user_id=11375 Unassigning so someone else can take a look at it. |
|||
msg3223 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2001-04-10 18:54 | |
Logged In: YES user_id=6380 I wish I'd heard about this sooner. It does seem a problem and it does make sense to use StringIO unless there's a lot of data. But we can't fix this in time for 2.1... |
|||
msg3224 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2001-04-11 20:20 | |
Logged In: YES user_id=6380 Here's a proposed patch. Can anyone think of a reason why this should not be checked in as part of 2.1? |
|||
msg3225 - (view) | Author: Kurt B. Kaiser (kbk) * | Date: 2001-04-12 17:05 | |
Logged In: YES user_id=149084 I have a thought on this, but it will be about 10 hours before I can submit it. |
|||
msg3226 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2001-04-12 18:59 | |
Logged In: YES user_id=6380 Uploading a new patch, more complicated. I don't like it as much. But it works even if the caller uses item.file.fileno(). |
|||
msg3227 - (view) | Author: Kurt B. Kaiser (kbk) * | Date: 2001-04-13 04:04 | |
Logged In: YES user_id=149084 The patch posted 11 Apr is a neat and compact solution! The only thing I can imagine would be a problem would be if a form had a large number of (small) fields which set the content-length attribute. I don't have an example of such, though. Text fields perhaps? If that was a realistic problem, a solution might be for make_file() to maintain a pool of temporary files; if the field (binary or not) turned out to be small a StringIO could be created and the temporary file returned to the pool. There are a couple of things I've been thinking about in cgi.py; the patch doesn't seem to change the situation one way or the other: There doesn't seem to be any RFC requirement that a file upload be accompanied by a content-length attribute, regardless of whether it is binary or ascii. In fact, some of the RFC examples I've seen omit it. If content-length is not specified, the upload will be processed by file.readline(). Can this cause problems for arbitrary binary files? |
|||
msg3228 - (view) | Author: Richard Jones (richard) * | Date: 2001-06-08 05:19 | |
Logged In: YES user_id=6405 I've just encountered this bug myself on Mac OS X. The default number for "ulimit -n" is 256, so you can imagine that it's a little worrying that I ran out :) As has been discussed, the multipart/form-data sumission sends a sub-part for every form name=value pair. I ran into the bug in cgi.py because I have a select list with >256 options - which I selected all entries in. This tips me over the 256 open file limit. I have two half-baked alternative suggestions for a solution: 1. use a single tempfile, opened when the multipart parsing is started. That tempfile may then be passed to the child FieldStorage instances and used by the parse_single calls. The child instances just keep track of their index and length in the tempfile. 2. use StringIO for parts of type "text/plain" and use a tempfile for all the rest. This has the problem that someone could cut-paste a core image into a text field though. I might have a crack at a patch for approach #1 this weekend... |
|||
msg3229 - (view) | Author: Douglas Bagnall (dbagnall) * | Date: 2001-06-09 22:06 | |
Logged In: YES user_id=107204 This has been causing me trouble too, on various machines. The patch from 2001-04-12 08:20 fixed the problem, but since then I haven't been able to upload files bigger than about 1k. I will try using 2.1 before I investigate that tho. Guido mentioned another more complicated, less likable, patch on 2001-04-13, which doesn't seem to have been uploaded. Or do I just not know where to look? |
|||
msg3230 - (view) | Author: Richard Jones (richard) * | Date: 2001-06-29 06:47 | |
Logged In: YES user_id=6405 Sorry for leaving this so long, but I wanted to say that I tried hacking a solution myself and gave up after it got too coplex. I have ended up adopting the solution in the patch here, and it's all working fine! |
|||
msg3231 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2001-06-29 13:06 | |
Logged In: YES user_id=6380 Thanks for reminding me of this patch. I've finally checked it in, so I can close the bug report! |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-10 16:03:42 | admin | set | github: 33858 |
2001-02-06 12:59:22 | stadt | create |