msg79726 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-13 09:21 |
The repr for fileobjects is inconsistent. On windows:
f = file("tmp\\foo", "wb")
f
<open file 'tmp\foo', mode 'wb' at ...>
f = file(u"tmp\\foo", "wb")
f
<open file u'tmp\\foo', mode 'wb' at ...>
For unicode, the filename is a proper "repr", but for a string it is
just the plain string, enclosed in single quotes. Note that I consider
the string case in error. Like the repr of a list or tuple, it should
be a string containin the repr of its components.
This leads to problems in the testsuite on vista.
I propose to change this with the patch provided, which always displays
the repr of the filename. It may have side effects in the testsuite, I
will add a separate file for that.
|
msg79729 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-13 10:02 |
Adding patch for test_file
|
msg79761 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-01-13 18:38 |
> This leads to problems in the testsuite on vista.
To which problem specifically?
|
msg79797 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-13 23:30 |
Turns out it wasn't vista related at all, but related to TESTFN being
tmp\@test
When this happens, test_file.py fails when testing the repr for a
unicode file, because the repr contains u'tmp\\@test' whereas for a
string filename it will contain 'tmp\@test'
|
msg79905 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-15 17:35 |
Just to clarify,
Ascii files and unicode files will repr() as
<open file 'tmp\foo', mode 'wb' at ...>
<open file u'tmp\\foo', mode 'wb' at ...>
respectively. In the former, the backslash isn't escaped, but in the
latter it is. The single quotes certainly imply repr, and if this were
a list, it would repr as:
['tmp\\foo']
and so I think that the ascii syntax really ought to be
<open file 'tmp\\foo', mode 'wb' at ...>
The inconsistency shows up in the testsuite for test_file() when the
TMPFN is tmp\@test (because test_support didn't succeed in creating a
temp file @test) when it is checking the repr of a file opened with a
unicode filename with code that corresponds exactly to code used for an
string filename.
Any thoughts?
|
msg79910 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-01-15 18:12 |
> Any thoughts?
I still hadn't time to reproduce the problem, so I still don't
understand it. What specific failure (file and line number) occurs,
and what specific strings get compared? test_file.py has 600 lines,
and I can't guess which test is failing. (also, you say test_file();
if you really refer to a function - where is that function defined?)
|
msg79919 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-15 22:30 |
I thought I'd explained it in detail without having to invoke the
testsuite. If TEMPFN is "tmp\\@test" you get the following failure:
FAILED (failures=1)
test test_file failed -- Traceback (most recent call last):
File "D:\pydev\python\trunk\lib\test\test_file.py", line 175, in
testUnicodeOpen
self.assert_(repr(f).startswith("<open file u'" + TESTFN))
AssertionError: None
It fails, because print repr(f) will start with:
<open file u'tmp\\@test
Essentially, when generating a repr of a fileobject with a string
filename, the string filename is interpolated between the single colons.
But when it generates a repr of a fileobject with a unicode filename,
the unicode name is escaped and then interpolated.
This results in unicode filename fileobjects having a repr with escaped
backslashes, while the string filename fileobjects have a repr where the
backslashes appear unescaped between the colons.
My patch proposes to
a) make sure that the repr(f) is a string containing the repr of the
filename (be it unicode, string or whatever)
b) Fix the testsuite so that both this case and the one testing string
filenames assumes a repr of the filename.
|
msg79921 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-01-15 22:46 |
> I thought I'd explained it in detail [...]
Well, so far, you didn't say that it was testUnicodeOpen that
failed. Now that I see that, I can understand the problem.
In your original message, I failed to see a problem. Yes, the repr
is inconsistent - but why is that a problem? (and then, why does
it actually cause real problems in the test suite on Vista - as
it turns out, the inconsistency causes no problems in the testsuite
at all)
> My patch proposes to
> a) make sure that the repr(f) is a string containing the repr of the
> filename (be it unicode, string or whatever)
I cannot understand how this contributes to solving the problem. -1.
> b) Fix the testsuite so that both this case and the one testing string
> filenames assumes a repr of the filename.
As it is clearly the test suite that is broken, we should just fix the
test suite. If the two chunks in the test, only the second chunk is
relevant for fixing the problem.
|
msg79923 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-15 22:55 |
I think the testsuite just exposed a design fault in python or a bug, if
you will. When we by accident had a backslash in the filename, I
noticed the inconsistency between
'tmp\@test' and u'tmp\\@test'.
One is a valid repr, the other isn't. It is customary when producing a
repr() of an object, to have it contain a repr() of the any inner
object, such as a string. 'tmp\@test' looks like a valid python literal
but if entered, would produce an incorrect result.
Fixing the testsuite only, will just keep this discrepancy (or incorrect
string repr maybe) hidden for longer.
It is a different story, maybe, that thee test_file.py hasn't done more
aggressive testing on filenames with backslashes. This is far from
being the only case that fails because of this. Some is due to
incorrectly coded tests. Others are due to surprising python behaviour,
like this one.
|
msg79926 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-01-15 23:03 |
> I think the testsuite just exposed a design fault in python or a bug, if
> you will.
I disagree. The feature is intentional as-is, and well-designed.
> It is customary when producing a
> repr() of an object, to have it contain a repr() of the any inner
> object, such as a string.
Not at all. It is customary to invoke the nested object's repr, in
the hope that the result of repr will be a string that you can pass
to eval(). This is completely different, because there is no hope
that the result of repr() of a file can be eval()ed.
As the repr is a string, it is natural to just concatenate the pieces
of the repr, including the file name string. For Unicode file names,
this natural approach fails, because one needs to encode the string
first. Mark chose to use an escape representation, because this gives
plain ASCII.
> Fixing the testsuite only, will just keep this discrepancy (or incorrect
> string repr maybe) hidden for longer.
Right. Changing the repr will break existing code, for no good reason.
|
msg79941 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-16 09:10 |
So you are saying that you see no problem with the repr of a fileobject
being generated in different ways depending on the fileobject being
unicode or string?
You see no danger of confusion when one is confonted with
<open file 'foo\bar' ...>
vs.
<open file u'foo\\bar' ...>
You don't think anyone will be surprised when they try to
eval 'foo\bar' as they can with u'foo\\bar' and get nonsensical results?
I maintain that this is a bug in python that ought to be fixed and not
hidden by changing the test suite. If you are going to rely on using
the repr of the fileobject to extract the filename, you better be able
to do that in a consistent manner irrespective of the type of the
underlying filename.
|
msg79942 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-16 09:12 |
And btw,
This can be fixed in fileobject.c without affecting the testsuite at
all, since test_file.py has never tested filenames with backslashes
except by accident. It is therefore quite unlikely to cause any
problems.
|
msg79945 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2009-01-16 10:02 |
I think path names should be printed with forward slashes on Windows as
well as elsewhere to avoid this type of confusion, and this.
>>> sb
'tmp\\foo'
>>> print(sb)
tmp\foo
(Of course I know why.)
Users have been advised on python-list by various people to enter
literal paths with forward slashes for the same reason. Printing them
out that way would encourage and reinforce this.
|
msg79947 - (view) |
Author: Kristján Valur Jónsson (kristjan.jonsson) *  |
Date: 2009-01-16 10:30 |
That's true, but a it's a different point entirely
You see, this is not a problem with backslashes only:
If you have any kind of control caracter or high bit character in your
filename, you may be in for trouble, because interpolating that string
verbatim into the repr can cause uforeseen problems when it is printed
out.
(and forward slashes are't the same as backslashes on windows. UNC
paths (\\my-pc\my-share\...) need backslashes.)
|
msg79963 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2009-01-16 17:33 |
I think this is my fault. I should have used the proper repr() of the
filename in the repr() of a file object from the beginning, then this
wouldn't have been a problem.
I think we should let this rest for Python 2.x (except for fixing the
test suite). Fortunately this is gone in 3.x for any number of reasons.
Here's my "proof" that we should let it rest.
If nobody were parsing the repr() of file objects, then we could easily
fix this. However, the reason this was brought up in the first place is
that people *are* parsing file object repr()s. So now fixing it would
create backwards compatibility problems. I'd rather live with an
inconsistency (until 2.x is dead) than introduce more backwards
incompatibility into 2.7.
|
msg79964 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-01-16 17:43 |
> So you are saying that you see no problem with the repr of a fileobject
> being generated in different ways depending on the fileobject being
> unicode or string?
My question is: What is the issue at hand? Is it what the subject says
(inconsistent representation), or is it what your first message says
(problems on Vista)? If the former, I'm opposed to fixing it. If the
latter, I'm in favor of the straight-forward fix to the test suite.
|
msg79965 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-01-16 17:44 |
> It is therefore quite unlikely to cause any
> problems.
That is a false conclusion. The test suite cannot speak
for all the code out there that might break with the
incompatible change that you are proposing.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:44 | admin | set | github: 49177 |
2009-01-16 17:44:27 | loewis | set | messages:
+ msg79965 |
2009-01-16 17:43:32 | loewis | set | messages:
+ msg79964 |
2009-01-16 17:33:27 | gvanrossum | set | status: open -> closed nosy:
+ gvanrossum resolution: rejected messages:
+ msg79963 keywords:
patch, patch, needs review |
2009-01-16 10:30:43 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79947 |
2009-01-16 10:02:44 | terry.reedy | set | keywords:
patch, patch, needs review nosy:
+ terry.reedy messages:
+ msg79945 |
2009-01-16 09:12:51 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79942 |
2009-01-16 09:10:41 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79941 |
2009-01-15 23:03:21 | loewis | set | messages:
+ msg79926 |
2009-01-15 22:55:16 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79923 |
2009-01-15 22:46:11 | loewis | set | messages:
+ msg79921 |
2009-01-15 22:30:40 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79919 |
2009-01-15 18:12:42 | loewis | set | messages:
+ msg79910 |
2009-01-15 17:35:09 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79905 |
2009-01-13 23:30:07 | kristjan.jonsson | set | keywords:
patch, patch, needs review messages:
+ msg79797 |
2009-01-13 18:38:28 | loewis | set | nosy:
+ loewis messages:
+ msg79761 |
2009-01-13 10:02:03 | kristjan.jonsson | set | keywords:
patch, patch, needs review files:
+ test_file.patch messages:
+ msg79729 |
2009-01-13 09:21:36 | kristjan.jonsson | create | |