Issue9968
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010-09-28 11:58 by phep, last changed 2022-04-11 14:57 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
fix9968.patch | phep, 2011-07-29 22:42 | review |
Messages (16) | |||
---|---|---|---|
msg117512 - (view) | Author: Patrice Pillot (phep) | Date: 2010-09-28 11:58 | |
Hi, Presently, in cgi.FieldStorage, uploaded file are accessible through a file-like object created by a call to tempfile.TemporaryFile(), probably in order to ensure the file is deleted when the process terminates. The problem is that when one wants to save the data to some permanent location, one does not have other choice thant read() from this file descriptor then write() to the desired place, which means the file data shall be duplicated. This is greatly undesirable when the uploaded file is huge. Replacing the call to tempfile.TemporaryFile() by a call to tempfile.NamedTemporaryFile() (available from 2.3) would have the following advantages : 1) no impact on existing code, 2) saving a file would be as simple as invoking os.link(fieldstorage['filenamefield'].file.name, "/some/path") 3) There would be no need to duplicate data anymore. The sole real inconvenience of this change would be to update the documentation accordingly. tia, phep |
|||
msg117513 - (view) | Author: Patrice Pillot (phep) | Date: 2010-09-28 12:15 | |
Oops. I forgot to aknowledge the fact that presently cgi.FieldStorage class documentation (but not the cgi module documentation) tells about the possibility to override the make_file() method in a user subclass to change the present limitation (which is actually how I work around the problem presently). Yet, this does not (IMHO) render this feature request invalid. phep |
|||
msg117710 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2010-09-30 09:01 | |
Thanks for the report. Do you want to work on a patch? |
|||
msg117933 - (view) | Author: Patrice Pillot (phep) | Date: 2010-10-03 22:21 | |
Well, this is actually somewhat more complicated than what my first tests showed due to the way multipart/form-data is dealt with in FieldStorage.read_multi(). The solution I proposed last time only works if the uploaded file is passed as the first form field. I have to further study how things work in order to find some coherent solution. I may not have time to go back to this issue before next week unfortunately. phep |
|||
msg140673 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2011-07-19 15:00 | |
> Well, this is actually somewhat more complicated than what my first > tests showed due to the way multipart/form-data is dealt with in > FieldStorage.read_multi(). > > The solution I proposed last time only works if the uploaded file is > passed as the first form field. Can you copy here the tests you did and the error output you got? |
|||
msg141179 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-26 20:56 | |
Hi, This was test code :-/. It was not under revision control and unfortunately, as I feared then, it turned out I had to stop working shortly after my last message on the low-priority project that made me report this issue . Yet, I have a tarball of a presumably-not-so-different version of the code and I experimented with it this afternoon. Actually, I could not reproduce the problem I reported in msg117933. So, basically, the idea of replacing tempfile.TemporaryFile() by tempfile.NamedTemporaryFile() seems to really be valid, whatever the position of the form field. Yet I found another problem, quite reproducible that one, that seems to have some fame on Google. Due to the way cgi.__write() is written, contents of uploaded files with small sizes will be stored as StringIO. Consequently, sort-of calling fieldstorage['filenamefield'].file.name will fail. De facto, this may actually be the error I alluded to in msg117933 - I honestly cannot remember what I observed then: « c'est l'âge...». So the patch should then be one line longer: we would need to change __write() to add a check on the value of 'filename' : - if self.__file.tell() + len(line) > 1000: + if self.filename or self.__file.tell() + len(line) > 1000: This way, long textareas (for example), keep being stored on disk if this has any importance. There is yet one question I cannot answer: where does the 1ko barrier come from? Was it only chosen out of performance considerations, or was there another reason? As I am quite new to Python I fear I may miss something obvious. HTH |
|||
msg141186 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-26 21:42 | |
Oh, my... As we are working with python2.6 on our servers, I overlooked the fact that tempfile.NamedTemporaryFile was already used in python 3 (fixed in r57595, with not many explanations though). Yet, the problem with short length files (cf. msg141179) is still valid and the documentation has not yet been updated. |
|||
msg141187 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-26 21:53 | |
I got my head in a brown paper bag.... This is obviously not fixed... except in a locally edited checkout :-( Please be kind to the tired me, forget my last message and don't talk about this to my boss, will you ? It's closing time, really. |
|||
msg141234 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2011-07-27 15:26 | |
> where does the 1ko barrier come from? Was it only chosen out of > performance considerations [...] Most certainly. I’ll look at the history of the file later to try to find the developer who decided that. > tempfile.NamedTemporaryFile was already used in python 3 (fixed in > r57595, with not many explanations though). That’s a TemporaryFile, without a name. For more explanations, follow the parents of the changeset and you’ll find aee21e0c9a70 (referencing #1033) and cbd50ece3b61, where you can see an XXX note that is probably the “wish” referred to in the cryptic commit message. |
|||
msg141272 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-27 19:56 | |
>> where does the 1ko barrier come from? Was it only chosen out of >> performance considerations [...] > > Most certainly. I’ll look at the history of the file later to try to > find the developer who decided that. Guido van Rossum made the changes. Before that a temporary file was created for every form field. >> tempfile.NamedTemporaryFile was already used in python 3 (fixed in >> r57595, with not many explanations though). > > That’s a TemporaryFile, without a name. For more explanations, follow Yes, my sleepy eyes and drowsy brain fooled me more than once last night. Sigh... > the parents of the changeset and you’ll find aee21e0c9a70 > (referencing #1033) and cbd50ece3b61, where you can see an XXX > note that is probably the “wish” referred to in the cryptic commit > message. Thanks. A last question before trying to write the patch: In order for the change I propose to be interesting, one should let caller code decide where (on disk) the NamedTemporaryFile should be created since writing this on a different partition than the one the file will eventually reside on would ruin the whole trick. Also, I believe one can think of other reasons to give this freedom. Unfortunately, in order to do that, I can see no other solution than to change the FieldStorage constructor signature since the temp files are created as soon as the object is instantiated. So, we would end up with something along those lines: class FieldStorage(cgi.FieldStorage): def __init__(self, tempdir=tempfile.gettempdir()): self.tempdir = tempdir ... or a somewhat more convoluted form that would avoid importing tempfile needlessly. Do you think this would be acceptable ? |
|||
msg141292 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2011-07-28 13:49 | |
>>> where does the 1ko barrier come from? Was it only chosen out of >>> performance considerations [...] >> Most certainly. I’ll look at the history of the file later to try to >> find the developer who decided that. > Guido van Rossum made the changes. Before that a temporary file was > created for every form field. Do you have the changeset ID? > A last question [...] For a new feature, it’s okay to change signatures. Backward compatibility is preserved by appending the new argument at the end of the arguments lists, and making it optional. Given that tempfile respects the TMP and TMPDIR environment variables, do you think it would be possible for users to control the download dir for uploaded files by editing os.environ? This would require no change to cgi. If that’s not possible, then we’ll have to change the FieldStorage class. > Also, I believe one can think of other reasons to give this freedom. Can you think about some of them? > def __init__(self, tempdir=tempfile.gettempdir()): > or a somewhat more convoluted form that would avoid importing > tempfile needlessly. You could probably have a None argument that would be passed down to tempfile. |
|||
msg141407 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-29 18:40 | |
These are the changeset details: changeset: 18337:c2a60de91d2c branch: legacy-trunk user: Guido van Rossum <guido@python.org> date: Fri Jun 29 13:06:06 2001 +0000 summary: Solve SF bug #231249: cgi.py opens too many (temporary) files. You're right that we might use environment variables or tempfile.tempdir to attain the same goal but this would impact _all_ of the code executed during request data parsing given the monolithic construction of FieldStorage. This implies that the context of every call to tempfile members would be impacted during this process. Presently this is not a problem at all, but this looks fragile for future developments. On the other hand: 1) this has the advantage of not changing FieldStorage interface, 2) this alleviates us of wondering if passing to FieldStorage constructor all of the arguments to NamedTemporaryFile is a possibility worth considering ;-). After pondering this for a while I think the simpler is the better and I propose to add documentation to inform the reader that changing the temp directory through os.environ of tempfile.tempdir might worth consideration. As for other use cases for changing the temp directory, I thought about letting the user choose the FS of its choice with regard to performance or security (crypted FS) or even having the temp files created in a directory with 700 permissions. Last, you're perfectly right about the None argument. I fiddled last night with setting an environment to deploy and test a patched Python (I had some problem to understand what happened when I encountered 6755). This now works and the patch does not introduced any regression. I still have to add unit tests (I only tested with my embryonic cgi script) and update the documentation before to send the patch. I should be able to do that in a few days at most. |
|||
msg141414 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-29 22:42 | |
So, this is the patch. |
|||
msg141440 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2011-07-30 13:45 | |
> After pondering this for a while I think the simpler is the better > and I propose to add documentation to inform the reader that changing > the temp directory through os.environ of tempfile.tempdir might worth > consideration. Okay. Your patch has this doc change and also code changes. I’ve read in the tempfile module docstring that in order to control the directory, you have to set tempfile.tempdir before calling any tempfile function. I suspect this will not be okay for some applications. I wonder if the same limitation applies when one sets os.environ['TMP']. In the light of that, do you think the tempfile solution is enough, or do you want to get back to the earlier idea of adding an argument to FieldStorage? (One advantage of doc changes is that they can get committed to 2.7, 3.2 and 3.3 and get published immediately. We can anyway make a code change in 3.3 and a doc change for older versions.) > As for other use cases for changing the temp directory, I thought > about letting the user choose the FS of its choice with regard to > performance or security (crypted FS) or even having the temp files > created in a directory with 700 permissions. Excellent use cases. > I fiddled last night with setting an environment to deploy and test a > patched Python Are you aware of the developers’ guide? http://docs.python.org/devguide/ > (I had some problem to understand what happened when I encountered > 6755). What is 6755? |
|||
msg141472 - (view) | Author: Patrice Pillot (phep) | Date: 2011-07-31 17:58 | |
Le 30/07/2011 15:45, Éric Araujo a écrit : > > Éric Araujo<merwok@netwok.org> added the comment: > I’ve read in the tempfile module docstring that in order to control the > directory, you have to set tempfile.tempdir before calling any tempfile > function. I suspect this will not be okay for some applications. I > wonder if the same limitation applies when one sets os.environ['TMP']. You were right in pointing into this direction I forgot to mention in the doc part of the patch. I've had a look at tempfile implementation and made some tests and this led me to notice a documentation problem that might be worth considering (cf. infra). But first things first: Actually, after reading tempfile.py, I cannot see the reason of the tempfile docstring about tempdir, except maybe for performance reasons. Setting tempfile.tempdir manually should not be a problem in any case: direct access to the member or call to tempfile.gettempdir() - except maybe in the very improbable case where tempdir would first be set to some not None value then re-set to None and a nasty trick I overlooked is lurking inside of the gettempdir() concurrent-access lock... But, on the other hand, trying to tweak the system by using os.environ on TMP and such may well fail in a number of occasions since those variables won't be checked after the first call to tempfile.gettempdir() (except if tempfile.tempdir is reset to None). I wonder if the file docstring does not relate to this problem, as a matter of fact. > In the light of that, do you think the tempfile solution is enough, or do > you want to get back to the earlier idea of adding an argument to > FieldStorage? I can't think it of any reasonable use case that would make necessary to overload the interface right now provided the need to (maybe) set tempfile.tempdir is documented. At last, as a side note, during the tests I ran, I noticed one should really not use os.rename() since this raises an OSError (file not found) upon script termination (since the NamedTemporaryFile has been ... renamed, hey). As this problem does not concern the present issue, I did not deemed necessary to edit the documentation accordingly. Yet this might be debatable since the doc never explicitly says FieldStorage uses a (Named)TemporaryFile under the hood. Do you think one should state explicitly the coupling of those modules ? > (One advantage of doc changes is that they can get committed to 2.7, 3.2 > and 3.3 and get published immediately. We can anyway make a code change > in 3.3 and a doc change for older versions.) On this topic, I was wondering if the changes I propose have any chance of landing some day in 2.7 land - dunno Python workflow precisely enough. >> I fiddled last night with setting an environment to deploy and test a >> patched Python > Are you aware of the developers’ guide? > http://docs.python.org/devguide/ Yep, it helped me. It just took me some time to be sure to get it right, run the tests, ... (by the way, patchcheck.py seems to be broken). >> (I had some problem to understand what happened when I encountered >> 6755). > What is 6755? Sorry, I meant issue 6755. > ---------- title: Let cgi.FieldStorage have named uploaded file -> > cgi.FieldStorage: Give control about the directory used for uploads Well, actually, the useful feature in the patch is the possibility to have a named-hence-linkable uploaded file. Giving control on the upload directory is but a useful-while-quite-necessary sister-feature, isn't it ? > > _______________________________________ Python > tracker<report@bugs.python.org> <http://bugs.python.org/issue9968> > _______________________________________ > |
|||
msg146145 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2011-10-21 22:27 | |
I read somewhere that the fact that TemporaryFile does not expose its name is a security feature :/ > On this topic, I was wondering if the changes I propose have any chance of > landing some day in 2.7 land - dunno Python workflow precisely enough. Only bug fixes and doc fixes go into 2.7. This report is about a new feature, which has to target 3.3. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:07 | admin | set | github: 54177 |
2020-07-20 20:51:02 | Rhodri James | set | nosy:
- Rhodri James |
2019-08-03 14:56:02 | Rhodri James | set | nosy:
+ ethan.furman, Rhodri James |
2011-10-21 22:27:52 | eric.araujo | set | messages:
+ msg146145 title: cgi.FieldStorage: Give control about the directory used for uploads -> Let cgi.FieldStorage have named uploaded file |
2011-07-31 17:58:09 | phep | set | messages: + msg141472 |
2011-07-30 13:45:20 | eric.araujo | set | messages:
+ msg141440 title: Let cgi.FieldStorage have named uploaded file -> cgi.FieldStorage: Give control about the directory used for uploads |
2011-07-29 22:42:08 | phep | set | files:
+ fix9968.patch keywords: + patch messages: + msg141414 |
2011-07-29 18:40:45 | phep | set | messages: + msg141407 |
2011-07-28 13:49:15 | eric.araujo | set | messages: + msg141292 |
2011-07-27 19:56:51 | phep | set | messages: + msg141272 |
2011-07-27 15:26:15 | eric.araujo | set | messages: + msg141234 |
2011-07-26 21:53:57 | phep | set | messages: + msg141187 |
2011-07-26 21:42:52 | phep | set | messages: + msg141186 |
2011-07-26 20:56:47 | phep | set | messages: + msg141179 |
2011-07-19 15:00:19 | eric.araujo | set | messages: + msg140673 |
2011-07-02 13:36:29 | eric.araujo | set | versions: + Python 3.3, - Python 3.2 |
2010-10-03 22:21:55 | phep | set | messages: + msg117933 |
2010-09-30 09:01:33 | eric.araujo | set | versions:
+ Python 3.2 nosy: + eric.araujo, orsenthil messages: + msg117710 components: + Library (Lib), - Extension Modules stage: needs patch |
2010-09-28 12:15:32 | phep | set | messages: + msg117513 |
2010-09-28 11:58:14 | phep | create |