This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Inconsistent unicode repr for fileobject
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, kristjan.jonsson, loewis, terry.reedy
Priority: normal Keywords: needs review, patch

Created on 2009-01-13 09:21 by kristjan.jonsson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fileobject.patch kristjan.jonsson, 2009-01-13 09:21
test_file.patch kristjan.jonsson, 2009-01-13 10:02
Messages (17)
msg79726 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-13 09:21
The repr for fileobjects is inconsistent.  On windows:
f = file("tmp\\foo", "wb")
f
<open file 'tmp\foo', mode 'wb' at ...>
f = file(u"tmp\\foo", "wb")
f
<open file u'tmp\\foo', mode 'wb' at ...>

For unicode, the filename is a proper "repr", but for a string it is 
just the plain string, enclosed in single quotes.  Note that I consider 
the string case in error.  Like the repr of a list or tuple, it should 
be a string containin the repr of its components.

This leads to problems in the testsuite on vista.

I propose to change this with the patch provided, which always displays 
the repr of the filename.  It may have side effects in the testsuite, I 
will add a separate file for that.
msg79729 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-13 10:02
Adding patch for test_file
msg79761 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-13 18:38
> This leads to problems in the testsuite on vista.

To which problem specifically?
msg79797 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-13 23:30
Turns out it wasn't vista related at all, but related to TESTFN being 
tmp\@test
When this happens, test_file.py fails when testing the repr for a 
unicode file, because the repr contains u'tmp\\@test' whereas for a 
string filename it will contain 'tmp\@test'
msg79905 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-15 17:35
Just to clarify, 
Ascii files and unicode files will repr() as
<open file 'tmp\foo', mode 'wb' at ...>
<open file u'tmp\\foo', mode 'wb' at ...>
respectively.  In the former, the backslash isn't escaped, but in the 
latter it is.  The single quotes certainly imply repr, and if this were 
a list, it would repr as:
['tmp\\foo']
and so I think that the ascii syntax really ought to be
<open file 'tmp\\foo', mode 'wb' at ...>


The inconsistency shows up in the testsuite for test_file() when the 
TMPFN is tmp\@test (because test_support didn't succeed in creating a 
temp file @test) when it is checking the repr of a file opened with a 
unicode filename with code that corresponds exactly to code used for an 
string filename.

Any thoughts?
msg79910 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-15 18:12
> Any thoughts?

I still hadn't time to reproduce the problem, so I still don't
understand it. What specific failure (file and line number) occurs,
and what specific strings get compared? test_file.py has 600 lines,
and I can't guess which test is failing. (also, you say test_file();
if you really refer to a function - where is that function defined?)
msg79919 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-15 22:30
I thought I'd explained it in detail without having to invoke the 
testsuite.  If TEMPFN is "tmp\\@test" you get the following failure:
FAILED (failures=1)
test test_file failed -- Traceback (most recent call last):
  File "D:\pydev\python\trunk\lib\test\test_file.py", line 175, in 
testUnicodeOpen
    self.assert_(repr(f).startswith("<open file u'" + TESTFN))
AssertionError: None

It fails, because print repr(f) will start with:
<open file u'tmp\\@test


Essentially, when generating a repr of a fileobject with a string 
filename, the string filename is interpolated between the single colons.
But when it generates a repr of a fileobject with a unicode filename, 
the unicode name is  escaped and then interpolated.
This results in unicode filename fileobjects having a repr with escaped 
backslashes, while the string filename fileobjects have a repr where the 
backslashes appear unescaped between the colons.

My patch proposes to 
a) make sure that the repr(f) is a string containing the repr of the 
filename (be it unicode, string or whatever)
b) Fix the testsuite so that both this case and the one testing string 
filenames assumes a repr of the filename.
msg79921 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-15 22:46
> I thought I'd explained it in detail [...]

Well, so far, you didn't say that it was testUnicodeOpen that
failed. Now that I see that, I can understand the problem.

In your original message, I failed to see a problem. Yes, the repr
is inconsistent - but why is that a problem? (and then, why does
it actually cause real problems in the test suite on Vista - as
it turns out, the inconsistency causes no problems in the testsuite
at all)

> My patch proposes to 
> a) make sure that the repr(f) is a string containing the repr of the 
> filename (be it unicode, string or whatever)

I cannot understand how this contributes to solving the problem. -1.

> b) Fix the testsuite so that both this case and the one testing string 
> filenames assumes a repr of the filename.

As it is clearly the test suite that is broken, we should just fix the
test suite. If the two chunks in the test, only the second chunk is
relevant for fixing the problem.
msg79923 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-15 22:55
I think the testsuite just exposed a design fault in python or a bug, if 
you will.  When we by accident had a backslash in the filename, I 
noticed the inconsistency between
'tmp\@test' and u'tmp\\@test'.
One is a valid repr, the other isn't.  It is customary when producing a 
repr() of an object, to have it contain a repr() of the any inner 
object, such as a string.  'tmp\@test' looks like a valid python literal 
but if entered, would produce an incorrect result.

Fixing the testsuite only, will just keep this discrepancy (or incorrect 
string repr maybe) hidden for longer.

It is a different story, maybe, that thee test_file.py hasn't done more 
aggressive testing on filenames with backslashes.  This is far from 
being the only case that fails because of this.  Some is due to 
incorrectly coded tests.  Others are due to surprising python behaviour, 
like this one.
msg79926 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-15 23:03
> I think the testsuite just exposed a design fault in python or a bug, if 
> you will.

I disagree. The feature is intentional as-is, and well-designed.

> It is customary when producing a 
> repr() of an object, to have it contain a repr() of the any inner 
> object, such as a string.

Not at all. It is customary to invoke the nested object's repr, in
the hope that the result of repr will be a string that you can pass
to eval(). This is completely different, because there is no hope
that the result of repr() of a file can be eval()ed.

As the repr is a string, it is natural to just concatenate the pieces
of the repr, including the file name string. For Unicode file names,
this natural approach fails, because one needs to encode the string
first. Mark chose to use an escape representation, because this gives
plain ASCII.

> Fixing the testsuite only, will just keep this discrepancy (or incorrect 
> string repr maybe) hidden for longer.

Right. Changing the repr will break existing code, for no good reason.
msg79941 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-16 09:10
So you are saying that you see no problem with the repr of a fileobject 
being generated in different ways depending on the fileobject being 
unicode or string?
You see no danger of confusion when one is confonted with
<open file 'foo\bar' ...>
vs.
<open file u'foo\\bar' ...>
You don't think anyone will be surprised when they try to 
eval 'foo\bar' as they can with u'foo\\bar' and get nonsensical results?

I maintain that this is a bug in python that ought to be fixed and not 
hidden by changing the test suite.  If you are going to rely on using 
the repr of the fileobject to extract the filename, you better be able 
to do that in a consistent manner irrespective of the type of the 
underlying filename.
msg79942 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-16 09:12
And btw,
This can be fixed in fileobject.c without affecting the testsuite at 
all, since test_file.py has never tested filenames with backslashes 
except by accident.  It is therefore quite unlikely to cause any 
problems.
msg79945 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2009-01-16 10:02
I think path names should be printed with forward slashes on Windows as
well as elsewhere to avoid this type of confusion, and this.
>>> sb
'tmp\\foo'
>>> print(sb)
tmp\foo
(Of course I know why.)

Users have been advised on python-list by various people to enter
literal paths with forward slashes for the same reason.  Printing them
out that way would encourage and reinforce this.
msg79947 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2009-01-16 10:30
That's true, but a it's a different point entirely
You see, this is not a problem with backslashes only:
If you have any kind of control caracter or high bit character in your 
filename, you may be in for trouble, because interpolating that string 
verbatim into the repr can cause uforeseen problems when it is printed 
out.

(and forward slashes are't the same as backslashes on windows.  UNC 
paths (\\my-pc\my-share\...) need backslashes.)
msg79963 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2009-01-16 17:33
I think this is my fault.  I should have used the proper repr() of the
filename in the repr() of a file object from the beginning, then this
wouldn't have been a problem.

I think we should let this rest for Python 2.x (except for fixing the
test suite).  Fortunately this is gone in 3.x for any number of reasons.

Here's my "proof" that we should let it rest.

If nobody were parsing the repr() of file objects, then we could easily
fix this.  However, the reason this was brought up in the first place is
that people *are* parsing file object repr()s.  So now fixing it would
create backwards compatibility problems.  I'd rather live with an
inconsistency (until 2.x is dead) than introduce more backwards
incompatibility into 2.7.
msg79964 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-16 17:43
> So you are saying that you see no problem with the repr of a fileobject 
> being generated in different ways depending on the fileobject being 
> unicode or string?

My question is: What is the issue at hand? Is it what the subject says
(inconsistent representation), or is it what your first message says
(problems on Vista)? If the former, I'm opposed to fixing it. If the
latter, I'm in favor of the straight-forward fix to the test suite.
msg79965 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-16 17:44
> It is therefore quite unlikely to cause any 
> problems.

That is a false conclusion. The test suite cannot speak
for all the code out there that might break with the
incompatible change that you are proposing.
History
Date User Action Args
2022-04-11 14:56:44adminsetgithub: 49177
2009-01-16 17:44:27loewissetmessages: + msg79965
2009-01-16 17:43:32loewissetmessages: + msg79964
2009-01-16 17:33:27gvanrossumsetstatus: open -> closed
nosy: + gvanrossum
resolution: rejected
messages: + msg79963
keywords: patch, patch, needs review
2009-01-16 10:30:43kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79947
2009-01-16 10:02:44terry.reedysetkeywords: patch, patch, needs review
nosy: + terry.reedy
messages: + msg79945
2009-01-16 09:12:51kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79942
2009-01-16 09:10:41kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79941
2009-01-15 23:03:21loewissetmessages: + msg79926
2009-01-15 22:55:16kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79923
2009-01-15 22:46:11loewissetmessages: + msg79921
2009-01-15 22:30:40kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79919
2009-01-15 18:12:42loewissetmessages: + msg79910
2009-01-15 17:35:09kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79905
2009-01-13 23:30:07kristjan.jonssonsetkeywords: patch, patch, needs review
messages: + msg79797
2009-01-13 18:38:28loewissetnosy: + loewis
messages: + msg79761
2009-01-13 10:02:03kristjan.jonssonsetkeywords: patch, patch, needs review
files: + test_file.patch
messages: + msg79729
2009-01-13 09:21:36kristjan.jonssoncreate