This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Let cgi.FieldStorage have named uploaded file
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eric.araujo, ethan.furman, orsenthil, phep
Priority: normal Keywords: patch

Created on 2010-09-28 11:58 by phep, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
fix9968.patch phep, 2011-07-29 22:42 review
Messages (16)
msg117512 - (view) Author: Patrice Pillot (phep) Date: 2010-09-28 11:58
Hi,

Presently, in cgi.FieldStorage, uploaded file are accessible through a file-like object created by a call to tempfile.TemporaryFile(), probably in order to ensure the file is deleted when the process terminates.

The problem is that when one wants to save the data to some permanent location, one does not have other choice thant read() from this file descriptor then write() to the desired place, which means the file data shall be duplicated. This is greatly undesirable when the uploaded file is huge.

Replacing the call to tempfile.TemporaryFile() by a call to tempfile.NamedTemporaryFile() (available from 2.3) would have the following advantages :
1) no impact on existing code,
2) saving a file would be as simple as invoking os.link(fieldstorage['filenamefield'].file.name, "/some/path")
3) There would be no need to duplicate data anymore.

The sole real inconvenience of this change would be to update the documentation accordingly.

tia,

phep
msg117513 - (view) Author: Patrice Pillot (phep) Date: 2010-09-28 12:15
Oops. I forgot to aknowledge the fact that presently cgi.FieldStorage class documentation (but not the cgi module documentation) tells about the possibility to override the make_file() method in a user subclass to change the present limitation (which is actually how I work around the problem presently).

Yet, this does not (IMHO) render this feature request invalid.

phep
msg117710 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-30 09:01
Thanks for the report.  Do you want to work on a patch?
msg117933 - (view) Author: Patrice Pillot (phep) Date: 2010-10-03 22:21
Well, this is actually somewhat more complicated than what my first tests showed due to the way multipart/form-data is dealt with in FieldStorage.read_multi().

The solution I proposed last time only works if the uploaded file is passed as the first form field.

I have to further study how things work in order to find some coherent solution.

I may not have time to go back to this issue before next week unfortunately.

phep
msg140673 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-19 15:00
> Well, this is actually somewhat more complicated than what my first
> tests showed due to the way multipart/form-data is dealt with in
> FieldStorage.read_multi().
>
> The solution I proposed last time only works if the uploaded file is
> passed as the first form field.

Can you copy here the tests you did and the error output you got?
msg141179 - (view) Author: Patrice Pillot (phep) Date: 2011-07-26 20:56
Hi,

This was test code :-/. It was not under revision control and 
unfortunately, as I feared then, it turned out I had to stop
working shortly after my last message on the low-priority
project that made me report this issue .

Yet, I have a tarball of a presumably-not-so-different version
of the code and I experimented with it this afternoon. Actually,
I could not reproduce the problem I reported in msg117933. So, 
basically, the idea of replacing tempfile.TemporaryFile() by
tempfile.NamedTemporaryFile() seems to really be valid, whatever
the position of the form field.

Yet I found another problem, quite reproducible that one, that
seems to have some fame on Google. Due to the way cgi.__write()
is written, contents of uploaded files with small sizes will be
stored as StringIO. Consequently, sort-of calling
fieldstorage['filenamefield'].file.name will fail. De facto,
this may actually be the error I alluded to in msg117933 - I 
honestly cannot remember what I observed then: « c'est
l'âge...».

So the patch should then be one line longer: we would need
to change __write() to add a check on the value of 'filename' :
-           if self.__file.tell() + len(line) > 1000:
+           if self.filename or self.__file.tell() + len(line) > 1000:

This way, long textareas (for example), keep being stored on
disk if this has any importance.

There is yet one question I cannot answer: where does the 1ko 
barrier come from? Was it only chosen out of performance
considerations, or was there another reason? As I am quite new 
to Python I fear I may miss something obvious.

HTH
msg141186 - (view) Author: Patrice Pillot (phep) Date: 2011-07-26 21:42
Oh, my...

As we are working with python2.6 on our servers, I overlooked the fact that tempfile.NamedTemporaryFile was already used in python 3 (fixed in r57595, with not many explanations though).

Yet, the problem with short length files (cf. msg141179) is still valid and the documentation has not yet been updated.
msg141187 - (view) Author: Patrice Pillot (phep) Date: 2011-07-26 21:53
I got my head in a brown paper bag.... 

This is obviously not fixed... except in a locally edited checkout :-(

Please be kind to the tired me, forget my last message and don't talk about this to my boss, will you ?

It's closing time, really.
msg141234 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-27 15:26
> where does the 1ko barrier come from? Was it only chosen out of
> performance considerations [...]

Most certainly.  I’ll look at the history of the file later to try to find the developer who decided that.

> tempfile.NamedTemporaryFile was already used in python 3 (fixed in
> r57595, with not many explanations though).

That’s a TemporaryFile, without a name.  For more explanations, follow the parents of the changeset and you’ll find aee21e0c9a70 (referencing #1033) and cbd50ece3b61, where you can see an XXX note that is probably the “wish” referred to in the cryptic commit message.
msg141272 - (view) Author: Patrice Pillot (phep) Date: 2011-07-27 19:56
>> where does the 1ko barrier come from? Was it only chosen out of 
>> performance considerations [...]
> 
> Most certainly.  I’ll look at the history of the file later to try to
> find the developer who decided that.

Guido van Rossum made the changes. Before that a temporary file was created for every form field.

>> tempfile.NamedTemporaryFile was already used in python 3 (fixed in 
>> r57595, with not many explanations though).
> 
> That’s a TemporaryFile, without a name.  For more explanations, follow

Yes, my sleepy eyes and drowsy brain fooled me more than once last
night. Sigh...

> the parents of the changeset and you’ll find aee21e0c9a70 
> (referencing #1033) and cbd50ece3b61, where you can see an XXX
> note that is probably the “wish” referred to in the cryptic commit
> message.

Thanks.

A last question before trying to write the patch: In order for the change I propose to be interesting, one should let caller code decide where (on disk) the NamedTemporaryFile should be created since writing this on a different partition than the one the file will eventually reside on would ruin the whole trick. Also, I believe one can think of other reasons to give this freedom.

Unfortunately, in order to do that, I can see no other solution than to change the FieldStorage constructor signature since the temp files are created as soon as the object is instantiated. So, we would end up with something along those lines:

class FieldStorage(cgi.FieldStorage):
    def __init__(self, tempdir=tempfile.gettempdir()):
        self.tempdir = tempdir
	...

or a somewhat more convoluted form that would avoid importing tempfile needlessly.

Do you think this would be acceptable ?
msg141292 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-28 13:49
>>> where does the 1ko barrier come from? Was it only chosen out of 
>>> performance considerations [...]
>> Most certainly.  I’ll look at the history of the file later to try to
>> find the developer who decided that.
> Guido van Rossum made the changes. Before that a temporary file was
> created for every form field.

Do you have the changeset ID?

> A last question [...]

For a new feature, it’s okay to change signatures.  Backward compatibility is preserved by appending the new argument at the end of the arguments lists, and making it optional.

Given that tempfile respects the TMP and TMPDIR environment variables, do you think it would be possible for users to control the download dir for uploaded files by editing os.environ?  This would require no change to cgi.  If that’s not possible, then we’ll have to change the FieldStorage class.

> Also, I believe one can think of other reasons to give this freedom.

Can you think about some of them?

>    def __init__(self, tempdir=tempfile.gettempdir()):
> or a somewhat more convoluted form that would avoid importing
> tempfile needlessly.

You could probably have a None argument that would be passed down to tempfile.
msg141407 - (view) Author: Patrice Pillot (phep) Date: 2011-07-29 18:40
These are the changeset details:
changeset:   18337:c2a60de91d2c
branch:      legacy-trunk
user:        Guido van Rossum <guido@python.org>
date:        Fri Jun 29 13:06:06 2001 +0000
summary:     Solve SF bug #231249: cgi.py opens too many (temporary) files.


You're right that we might use environment variables or tempfile.tempdir
to attain the same goal but this would impact _all_ of the code executed
during request data parsing given the monolithic construction of
FieldStorage. This implies that the context of every call to tempfile
members would be impacted during this process.  Presently this is not a
problem at all, but this looks fragile for future developments. 

On the other hand:
1) this has the advantage of not changing FieldStorage interface,
2) this alleviates us of wondering if passing to FieldStorage
constructor all of the arguments to NamedTemporaryFile is a possibility
worth considering ;-).

After pondering this for a while I think the simpler is the better and I
propose to add documentation to inform the reader that changing the
temp directory through os.environ of tempfile.tempdir might worth
consideration.

As for other use cases for changing the temp directory, I thought about
letting the user choose the FS of its choice with regard to
performance or security (crypted FS) or even having the temp files created in a directory with 700 permissions. 

Last, you're perfectly right about the None argument.

I fiddled last night with setting an environment to deploy and test a patched Python (I had some problem to understand what happened when I encountered 6755). This now works and the patch does not introduced any regression. I still have to add unit tests (I only tested with my embryonic cgi script) and update the documentation before to send the patch. I should be able to do that in a few days at most.
msg141414 - (view) Author: Patrice Pillot (phep) Date: 2011-07-29 22:42
So, this is the patch.
msg141440 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-30 13:45
> After pondering this for a while I think the simpler is the better
> and I propose to add documentation to inform the reader that changing
> the temp directory through os.environ of tempfile.tempdir might worth
> consideration.
Okay.  Your patch has this doc change and also code changes.

I’ve read in the tempfile module docstring that in order to control the directory, you have to set tempfile.tempdir before calling any tempfile function.  I suspect this will not be okay for some applications.  I wonder if the same limitation applies when one sets os.environ['TMP'].

In the light of that, do you think the tempfile solution is enough, or do you want to get back to the earlier idea of adding an argument to FieldStorage?

(One advantage of doc changes is that they can get committed to 2.7, 3.2 and 3.3 and get published immediately.  We can anyway make a code change in 3.3 and a doc change for older versions.)

> As for other use cases for changing the temp directory, I thought
> about letting the user choose the FS of its choice with regard to
> performance or security (crypted FS) or even having the temp files
> created in a directory with 700 permissions.
Excellent use cases.

> I fiddled last night with setting an environment to deploy and test a
> patched Python
Are you aware of the developers’ guide?  http://docs.python.org/devguide/

> (I had some problem to understand what happened when I encountered
> 6755).
What is 6755?
msg141472 - (view) Author: Patrice Pillot (phep) Date: 2011-07-31 17:58
Le 30/07/2011 15:45, Éric Araujo a écrit :
>
> Éric Araujo<merwok@netwok.org>  added the comment:
> I’ve read in the tempfile module docstring that in order to control the
> directory, you have to set tempfile.tempdir before calling any tempfile
> function.  I suspect this will not be okay for some applications.  I
> wonder if the same limitation applies when one sets os.environ['TMP'].

You were right in pointing into this direction I forgot to mention in the 
doc part of the patch. I've had a look at tempfile implementation and made 
some tests and this led me to notice a documentation problem that might be 
worth considering (cf. infra).

But first things first: Actually, after reading tempfile.py, I cannot see 
the reason of the tempfile docstring about tempdir, except maybe for 
performance reasons. Setting tempfile.tempdir manually should not be a 
problem in any case: direct access to the member or call to 
tempfile.gettempdir() - except maybe in the very improbable case where 
tempdir would first be set to some not None value then re-set to None and a 
nasty trick I overlooked is lurking inside of the gettempdir() 
concurrent-access lock...

But, on the other hand, trying to tweak the system by using os.environ on 
TMP and such may well fail in a number of occasions since those variables 
won't be checked after the first call to tempfile.gettempdir() (except if 
tempfile.tempdir is reset to None). I wonder if the file docstring does not 
relate to this problem, as a matter of fact.

> In the light of that, do you think the tempfile solution is enough, or do
> you want to get back to the earlier idea of adding an argument to
> FieldStorage?

I can't think it of any reasonable use case that would make necessary to 
overload the interface right now provided the need to (maybe) set 
tempfile.tempdir is documented.

At last, as a side note, during the tests I ran, I noticed one should really 
not use os.rename() since this raises an OSError (file not found) upon 
script termination (since the NamedTemporaryFile has been ... renamed, hey).

As this problem does not concern the present issue, I did not deemed 
necessary to edit the documentation accordingly. Yet this might be debatable 
since the doc never explicitly says FieldStorage uses a (Named)TemporaryFile 
under the hood. Do you think one should state explicitly the coupling of 
those modules ?

> (One advantage of doc changes is that they can get committed to 2.7, 3.2
> and 3.3 and get published immediately.  We can anyway make a code change
> in 3.3 and a doc change for older versions.)

On this topic, I was wondering if the changes I propose have any chance of 
landing some day in 2.7 land - dunno Python workflow precisely enough.

>> I fiddled last night with setting an environment to deploy and test a
>> patched Python
> Are you aware of the developers’ guide?
> http://docs.python.org/devguide/

Yep, it helped me. It just took me some time to be sure to get it right, run 
the tests, ... (by the way, patchcheck.py seems to be broken).

>> (I had some problem to understand what happened when I encountered
>> 6755).
> What is 6755?

Sorry, I meant issue 6755.

> ---------- title: Let cgi.FieldStorage have named uploaded file ->
> cgi.FieldStorage: Give control about the directory used for uploads

Well, actually, the useful feature in the patch is the possibility to have a 
named-hence-linkable uploaded file. Giving control on the upload directory 
is but a useful-while-quite-necessary sister-feature, isn't it ?

>
> _______________________________________ Python
> tracker<report@bugs.python.org> <http://bugs.python.org/issue9968>
> _______________________________________
>
msg146145 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-21 22:27
I read somewhere that the fact that TemporaryFile does not expose its name is a security feature :/

> On this topic, I was wondering if the changes I propose have any chance of 
> landing some day in 2.7 land - dunno Python workflow precisely enough.
Only bug fixes and doc fixes go into 2.7.  This report is about a new feature, which has to target 3.3.
History
Date User Action Args
2022-04-11 14:57:07adminsetgithub: 54177
2020-07-20 20:51:02Rhodri Jamessetnosy: - Rhodri James
2019-08-03 14:56:02Rhodri Jamessetnosy: + ethan.furman, Rhodri James
2011-10-21 22:27:52eric.araujosetmessages: + msg146145
title: cgi.FieldStorage: Give control about the directory used for uploads -> Let cgi.FieldStorage have named uploaded file
2011-07-31 17:58:09phepsetmessages: + msg141472
2011-07-30 13:45:20eric.araujosetmessages: + msg141440
title: Let cgi.FieldStorage have named uploaded file -> cgi.FieldStorage: Give control about the directory used for uploads
2011-07-29 22:42:08phepsetfiles: + fix9968.patch
keywords: + patch
messages: + msg141414
2011-07-29 18:40:45phepsetmessages: + msg141407
2011-07-28 13:49:15eric.araujosetmessages: + msg141292
2011-07-27 19:56:51phepsetmessages: + msg141272
2011-07-27 15:26:15eric.araujosetmessages: + msg141234
2011-07-26 21:53:57phepsetmessages: + msg141187
2011-07-26 21:42:52phepsetmessages: + msg141186
2011-07-26 20:56:47phepsetmessages: + msg141179
2011-07-19 15:00:19eric.araujosetmessages: + msg140673
2011-07-02 13:36:29eric.araujosetversions: + Python 3.3, - Python 3.2
2010-10-03 22:21:55phepsetmessages: + msg117933
2010-09-30 09:01:33eric.araujosetversions: + Python 3.2
nosy: + eric.araujo, orsenthil

messages: + msg117710

components: + Library (Lib), - Extension Modules
stage: needs patch
2010-09-28 12:15:32phepsetmessages: + msg117513
2010-09-28 11:58:14phepcreate