classification
Title: possible characters in temporary file name is too few
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, haypo, jwilk, merwok, ncoghlan, neologix, pitrou, planet36, python-dev, rhettinger
Priority: normal Keywords: easy, needs review, patch

Created on 2011-05-06 05:51 by planet36, last changed 2014-01-30 14:49 by jwilk. This issue is now closed.

Files
File name Uploaded Description Edit
tempfile.py.diff planet36, 2011-05-06 05:51 Patch to revert the character set of temporary file names to [A-Za-z0-9_] review
tempfile.py.diff planet36, 2011-05-10 02:54 Choose 8 random characters rather than 6. review
Messages (17)
msg135261 - (view) Author: Steve Ward (planet36) Date: 2011-05-06 05:51
In this revision <http://hg.python.org/cpython/rev/b6da97930f63>, the possible characters that comprise a temporary file decreased.  It was noted in this thread <http://mail.python.org/pipermail/python-dev/2010-November/105452.html>.  The old character set had more than 24 times as many possibilities! [1]

According to POSIX, the possible characters from which temporary file names should be constructed is [A-Za-z0-9._-].  [2]

The character set should be restored to [A-Za-z0-9_] at the least.  A patch is attached.

The practices of other platforms, libraries, and languages are listed below.  [3-12]



[1]
Old character set: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
(26+26+10+1) ^ 6
62523502209

Current character set: abcdefghijklmnopqrstuvwxyz0123456789_
(26+10+1) ^ 6
2565726409

62523502209 / 2565726409
24.36873315474378000994



[2] POSIX <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276>

3.276 Portable Filename Character Set

The set of characters from which portable filenames are constructed.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . _ -



[3] int mkstemp(char *template); <http://pubs.opengroup.org/onlinepubs/9699919799/functions/mkstemp.html>
The string in template should look like a filename with six trailing 'X' s; mkstemp() replaces each 'X' with a character from the portable filename character set.



[4] glibc <http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/posix/tempname.c;hb=HEAD>
/* These are the characters used in temporary filenames.  */
static const char letters[] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";



[5] gnulib <http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/tempname.c;hb=HEAD>
/* These are the characters used in temporary file names.  */
static const char letters[] =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";



[6] freebsd libc <http://svnweb.freebsd.org/base/head/lib/libc/stdio/mktemp.c?view=markup>
static const unsigned char padchar[] =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";



[7] openbsd libc <http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdio/mktemp.c?rev=HEAD;content-type=text%2Fplain>
#define TEMPCHARS	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"



[8] netbsd libc <http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/lib/libc/stdio/gettemp.c?rev=HEAD&content-type=text/plain>
(uses pid)



[9] perl <http://perl5.git.perl.org/perl.git/blob/HEAD:/cpan/File-Temp/Temp.pm>
my @CHARS = (qw/ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
                 a b c d e f g h i j k l m n o p q r s t u v w x y z
                 0 1 2 3 4 5 6 7 8 9 _
               /);



[10] php <http://svn.php.net/viewvc/php/php-src/trunk/main/php_open_temporary_file.c?view=markup>
(if Windows, call GetTempFileName; else if havel mkstemp, call it; else call mktemp)



[11] ruby <http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/lib/tmpdir.rb?view=markup>
(uses current date, pid, and a random number between 0 and 4294967296 represented in base 36 (lowercase))



[12] Windows <http://msdn.microsoft.com/en-us/library/aa364991.aspx>
<path>\<pre><uuuu>.TMP

Component	Meaning
<path>	Path specified by the lpPathName parameter
<pre>	First three letters of the lpPrefixString string
<uuuu>	Hexadecimal value of uUnique

Only the lower 16 bits of the uUnique parameter are used. This limits GetTempFileName to a maximum of 65,535 unique file names if the lpPathName and lpPrefixString parameters remain the same.
msg135264 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-05-06 06:31
- How is it an issue? is the number of combinations really important to you?
- Isn't there a greater risk of collisions on a case-insensitive filesystem?
msg135265 - (view) Author: Steve Ward (planet36) Date: 2011-05-06 08:00
It is an issue because...

1) Python falls out of line with most other popular systems and languages.

2) There was never a good reason to decrease the entropy in the first place.  If there was, the reason of case-sensitive files systems was not given.

3) If case-sensitivity was an issue, why wasn't it raised before?  For more than 8 years (2002-08-09 till 2010-11-09) the issue never warranted a change.

The initial commit is here.
http://svn.python.org/view?view=revision&revision=28113
msg135336 - (view) Author: Éric Araujo (merwok) * (Python committer) Date: 2011-05-06 17:15
FWIW: Conformity with popular systems is not a goal.
msg135602 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:05
Python should still work reliably even if the temp directory is on a case-insensitive filesystem. Since that isn't an easy thing to determine, -1 on restoring the uppercase characters to the filenames.

As for it not being fixed previously, it's a rare bug that only affects case-insensitive filesystems. Just because other groups have decided not to care about it (typically due to a heavy bias towards case-sensitive POSIX filesystems), that is no reason for us to ignore it once we're aware of it.

If we really wanted to increase the entropy, better to increase the number of random characters included in the name to 7 or 8 (with each additional character increasing the pool size by a factor of 37). After all, why perpetuate an arbitrary restriction that is due primarily to the difficulty of constructing nice string manipulation interfaces in C?
msg135603 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-09 16:09
If you would like more entry, use a longer filename:

>>> (26+10+1) ** 6 > (26+26+10+1) ** 6
False
>>> (26+10+1) ** 7 > (26+26+10+1) ** 6
True
msg135604 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:15
The number of characters in the filename is hardcoded in the guts of tempfile (in _RandomNameSequence.__next__ to be exact). Increasing that to 7 (or even 8) would be a straightforward change at the library level, but it can't be done from user code.
msg135605 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-09 16:20
Could someone explain me what's the risk on a case-insensitive filesystem?
Since files are created with O_CREAT|O_EXCL, I don't see where the problem is.
msg135607 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:34
Basically, adding the uppercase letters back in would provide no gain in entropy on a case-insensitive filesystem (as the upper and lower case letters will collide). Increasing the length of the random sequence would provide a consistent gain regardless of filesystem properties.
msg135609 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-05-09 16:37
+1 for increasing the number of random characters to 8
-1 for restoring uppercase letters
msg135611 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 16:48
Minor clarification: after my first comment, I realised that the original change in the character set didn't actually fix anything, it just propagated the effectively smaller sample space on case-insensitive filesystems to all filesystems. Increasing the number of random characters to 8 will more than compensate for that reduction while retaining the cross-platform consistency in behaviour.
msg135612 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-09 16:50
@Nick
I fully agree with you, but my point was that it doesn't make it less safe on case-insensitive filesystems.
Apart from that, it's of course better to increase the length of the random sequence.
msg135615 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-05-09 17:06
Oh, you mean the security risk if the temporary names can be guessed? My recollection is that it is more of a problem for temporary directories than it is for temporary files, since the module can't control how files inside a temp directory get created. It's been a long time since I read anything detailed on the threat models and attack scenarios mkstemp() and friends were designed to handle though, so Google may be a better source of answers on that front.
msg135644 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-09 20:49
If you care about security, you should not use the Python random module because it is not cryptographic. Oh oh, ssl doesn't expose RAND_bytes().
msg135646 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-09 20:54
> If you care about security, you should not use the Python random
> module because it is not cryptographic. Oh oh, ssl doesn't expose
> RAND_bytes().

Feel free to open a separate issue and even provide a patch!
msg195103 - (view) Author: Roundup Robot (python-dev) Date: 2013-08-13 23:28
New changeset de5077aca668 by Victor Stinner in branch 'default':
Close #12015: The tempfile module now uses a suffix of 8 random characters
http://hg.python.org/cpython/rev/de5077aca668
msg195104 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-08-13 23:35
With 8 lowercase characters, the entropy is 41.7 bits, whereas it is only 35.9 bits for 6 characters with uppercase and lowercase letters.

>>> math.log(26+26+10+1, 2) * 6 # (a-zA-Z0-9_) x 6
35.8636795409995
>>> math.log(26+10+1, 2) * 6 # (a-z0-9_) x 6
31.256720193773702
>>> math.log(26+10+1, 2) * 8 # (a-z0-9_) x 8
41.6756269250316

My changeset improves the entropy, it is now higher than with the old charset.

I don't know if it is enough or not to be safe. systemd creates a temporary directory per service. Linux 3.11 will add a new O_TMPFILE to open() which allow to create a file with no name. Using the flag should help to workaround the race condition attack. See #18673 for O_TMPFILE.
History
Date User Action Args
2014-01-30 14:49:11jwilksetnosy: + jwilk
2013-08-13 23:35:48hayposetmessages: + msg195104
2013-08-13 23:28:40python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg195103

resolution: fixed
stage: patch review -> resolved
2013-08-03 13:38:08neologixsetkeywords: + easy, needs review
stage: patch review
type: enhancement
versions: + Python 3.4, - Python 3.3
2011-05-10 02:54:07planet36setfiles: + tempfile.py.diff
2011-05-09 20:54:15pitrousetmessages: + msg135646
2011-05-09 20:49:22hayposetmessages: + msg135644
2011-05-09 17:06:36ncoghlansetmessages: + msg135615
2011-05-09 16:50:08neologixsetmessages: + msg135612
2011-05-09 16:48:15ncoghlansetmessages: + msg135611
2011-05-09 16:37:34rhettingersetmessages: + msg135609
2011-05-09 16:34:58ncoghlansetmessages: + msg135607
2011-05-09 16:20:52neologixsetnosy: + neologix
messages: + msg135605
2011-05-09 16:15:24ncoghlansetmessages: + msg135604
2011-05-09 16:09:03hayposetnosy: + haypo
messages: + msg135603
2011-05-09 16:05:26ncoghlansetmessages: + msg135602
2011-05-06 17:15:59merwoksetnosy: + ncoghlan, pitrou
2011-05-06 17:15:33merwoksetnosy: + merwok

messages: + msg135336
versions: - Python 3.1, Python 3.2, Python 3.4
2011-05-06 08:13:03amaury.forgeotdarcsetnosy: + rhettinger
2011-05-06 08:00:27planet36setmessages: + msg135265
2011-05-06 06:31:22amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg135264
2011-05-06 05:51:46planet36create