classification
Title: Add MLSD command support to ftplib
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: giampaolo.rodola Nosy List: George.Dhoore, SilentGhost, eric.smith, giampaolo.rodola, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2011-01-30 23:37 by giampaolo.rodola, last changed 2011-05-07 14:07 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
ftplib_mlsd.patch giampaolo.rodola, 2011-01-30 23:37 review
issue11072.diff SilentGhost, 2011-03-08 12:17 review
ftplib_mlsd_v2.patch giampaolo.rodola, 2011-03-09 22:35 review
ftplib_mlsd_v3.patch giampaolo.rodola, 2011-03-09 23:21 review
Messages (18)
msg127562 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-01-30 23:37
From RFC-3659:

   The MLST and MLSD commands are intended to standardize the file and
   directory information returned by the server-FTP process.  These
   commands differ from the LIST command in that the format of the
   replies is strictly defined although extensible.

The patch in attachment adds support for MLSD command.
This should ease the development of ftp clients which are forced to parse un-standardized LIST responses via dir() or retrlines() methods to obtain meaningful data for the directory listing.

Example:


>>> import ftplib
>>> from pprint import pprint as pp
>>> f = ftplib.FTP()
>>> f.connect("localhost")
>>> f.login("anonymous")
>>> ls = f.mlsd()
>>> pp(ls)
 {'modify': 20100814164724,
  'name': 'svnmerge.py',
  'perm': 'r',
  'size': 90850,
  'type': 'file',
  'unique': '80718b568'},
 {'modify': 20101207185033,
  'name': 'README',
  'perm': 'r',
  'size': 53731,
  'type': 'file',
  'unique': '80718aafe'},
 {'modify': 20100417183215,
  'name': 'install-sh',
  'perm': 'r',
  'size': 7122,
  'type': 'file',
  'unique': '80718b2d2'},
 {'modify': 20110129210053,
  'name': 'Include',
  'perm': 'el',
  'size': 4096,
  'type': 'dir',
  'unique': '8071a2bc4'}]
>>>
msg130300 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-03-08 00:13
Why the callback option?
Also, the tests don't appear to check the return value.
msg130302 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011-03-08 00:54
I agree that the callback isn't needed, and it reflects the older coding style of much of the library (such as in retrlines). Instead, I'd make this a generator, yielding each of the dicts. (Actually in some ideal rewrite of ftplib, the whole callback "feature" used by retrlines would go away except as a compatibility feature, and internally everything would use generators instead of callbacks.)

Aren't you modifying the state on the server (via "OPTS MLST"), and then if you make a subsequent call without specifying "facts" you'll be using the value of "facts" from the previous call to MLSD? I don't think that's desirable, but short of calling "OPTS MLST" every time (possibly with the results of an initial FEAT) there's no way around it.

Just today I saw a filename with "; " in the name. I think you want to use .partition(' ') isolate the facts from the filename, and add a test with such a filename.

I don't like the "isdigit" test. Shouldn't this decision be left to the caller? What if a fact happens to look like an integer some of the time, but not always? Maybe "unique" is a hex string. I don't think you'd want it converted to an int on the occasions where it was all decimal digits, but a string otherwise. Also look at your "modify" facts. Those are not very useful as ints.

I don't think you should invent a pseudo-fact "name", in case the standard ever uses that. I'd prefer if you yielded tuples of (filename, fact-dict).

MLSD_DATA has newlines in it, so you get "\r\n\n" sequences in your test data.

If a fact=value string does not have an '=', you silently ignore it. I'd rather this raise an exception and not pass silently. It's not compliant. You should have a test for this.

It's possible for a value to have an '=' in it, so you don't want to use .split('='), you should use .partition('='). You should add such a fact to the test data.

Thanks for working on this. It will be a great addition to ftplib.
msg130327 - (view) Author: SilentGhost (SilentGhost) Date: 2011-03-08 12:17
Here is a patch incorporating some of the changes proposed by Eric:

* drop callback, return generator of (filename, fact-dict)

> Aren't you modifying the state on the server (via "OPTS MLST"), and then if you make a subsequent call without specifying "facts" you'll be using the value of "facts" from the previous call to MLSD? I don't think that's desirable, but short of calling "OPTS MLST" every time (possibly with the results of an initial FEAT) there's no way around it.

This is the behaviour according to RFC. MLSD will return the set of facts, until a new OPTS MLST issued.

> I don't like the "isdigit" test. Shouldn't this decision be left to the caller? What if a fact happens to look like an integer some of the time, but not always? Maybe "unique" is a hex string. I don't think you'd want it converted to an int on the occasions where it was all decimal digits, but a string otherwise. Also look at your "modify" facts. Those are not very useful as ints.

Drop the isdigit test: some value such as timestamps ('modify') might be "digits" for one files and would remain strings for the others.

> If a fact=value string does not have an '=', you silently ignore it. I'd rather this raise an exception and not pass silently. It's not compliant. You should have a test for this.

I don't think it should raise an error, i'd rather just pass it to the caller. There is a wide variety of possible non-compliant responses. For example there is a rigid format for time stamps, do we have to check for that?

One thing, that my patch doesn't do at the moment, but I think would be useful is to normalise all fact names to lower case. What do you think?

I hope that MLSD_DATA now covers wider range of cases (it's from http://tools.ietf.org/html/rfc3659#section-7.7.4). Note that server MUST NOT return facts that weren't requested.

What would be the return values check here?
msg130471 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-03-09 22:35
Thanks for the great review Eric.
Patch in attachment provides the following changes:

- return a generator object
- remove callback parameter
- each yielded entry is a (name, {...}) tuple
- fix for ";" in file name
- fix for " " in file name
- fix for "=" in fact value
- raise on missing "=" in fact
- no longer converts numbers in integers
- all dictionary keys (fact names) are lower-cased
- extended test suite covering a lot of cases
- no new lines in MLSD_DATA
- updated doc with deprecation warnings for dir() and nslt() methods


> Aren't you modifying the state on the server (via "OPTS MLST"), 
> and then if you make a subsequent call without specifying "facts"
> you'll be using the value of "facts" from the previous call to MLSD? 

Correct. This is how OPTS/MLSD are supposed to work.
Not sure whether we can or *should* reset initial default facts.
I think we shouldn't.
msg130474 - (view) Author: SilentGhost (SilentGhost) Date: 2011-03-09 22:58
facts_found.strip(" ").rstrip(";")

strip is redundant since facts_found is a first element of partitioning by the same string. rstrip is wrong since you're potentially deleting more than one character (there is no test for that).

In your test you're checking whether returned facts contain every requested fact, this is not guaranteed by RFC. Also, not sure what the last for statement is supposed to mean. Is it a typo?
msg130476 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-03-09 23:21
You're right about r/strip(), thanks (new patch in attachment).

> In your test you're checking whether returned facts 
> contain every requested fact, this is not guaranteed by RFC.

Yes, but we're using a dummy FTP server returning static MLSD_DATA, so this is not important.

> Also, not sure what the last for statement 
> is supposed to mean. Is it a typo?

It makes sure whether in case of directory empty (see: no data returned) we don't enter in the "for" block.
Probably it's not even necessary because this is already tested by:
self.assertRaises(StopIteration, next, self.client.mlsd())
...but... whathever.
msg130500 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011-03-10 11:17
I'll give this a proper review in the next day or so (busy at PyCon).

Despite the fact that I typically hate changing values returned by the server, I agree on case-folding the fact names to lowercase upon reading them. The RFC clearly states they're case insensitive: "Fact names are case-insensitive.  Size, size, SIZE, and SiZe are the same fact." You should have the facts in MLSD_DATA be mixed case, to ensure they're being lowercased correctly.

I agree that there's nothing that can be done in the case where no facts are specified: the spec clearly says just use the values we've previously set. Caller beware.
msg130501 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-03-10 11:26
> You should have the facts in MLSD_DATA be mixed case, to ensure 
> they're being lowercased correctly.

This is already tested by:

+        # case sensitiveness
+        set_data('Type=type;TyPe=perm;UNIQUE=unique; name\r\n')
+        _name, facts = next(self.client.mlsd())
+        [self.assertTrue(x.islower()) for x in facts.keys()]
msg130502 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011-03-10 11:31
So it is. I told you I hadn't done a proper review! I was mainly trying to say I agree with lowercasing.

I'll shut up until I can read the whole patch.
msg135145 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-05-04 18:05
Eric, any further comments about the patch?
Can we go on and commit it?
msg135346 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011-05-06 17:41
Yes, I think this should be committed. I think the API is reasonable, which is the primary concern. If there are implementation bugs, they can be addressed as they're found.
msg135348 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-06 17:49
New changeset 56bce38b274f by Giampaolo Rodola' in branch 'default':
Issue #11072: added MLSD command (RFC-3659) support to ftplib.
http://hg.python.org/cpython/rev/56bce38b274f
msg135350 - (view) Author: SilentGhost (SilentGhost) Date: 2011-05-06 17:55
Has something prevent you from implementing suggestion provided in my review?
msg135351 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-05-06 18:11
Which ones in particular? msg130474?
msg135356 - (view) Author: SilentGhost (SilentGhost) Date: 2011-05-06 18:24
a review on rietveld http://bugs.python.org/review/11072/show
which as I confirmed with you specifically at #python-dev you received a notification of.
msg135358 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2011-05-06 18:48
In my view, aside from the documentation note they're all minor/styling-related changes but feel free to provide a patch if you want to and I'll commit it.
msg135463 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-07 14:07
New changeset 153bd8fc22c7 by Giampaolo Rodola' in branch 'default':
#11072- applying http://bugs.python.org/review/11072/show suggestions
http://hg.python.org/cpython/rev/153bd8fc22c7
History
Date User Action Args
2011-05-07 14:07:04python-devsetmessages: + msg135463
2011-05-06 18:48:07giampaolo.rodolasetmessages: + msg135358
2011-05-06 18:24:42SilentGhostsetmessages: + msg135356
2011-05-06 18:11:13giampaolo.rodolasetmessages: + msg135351
2011-05-06 17:55:35SilentGhostsetmessages: + msg135350
2011-05-06 17:51:01giampaolo.rodolasetstatus: open -> closed
resolution: fixed
assignee: giampaolo.rodola
components: + Library (Lib)
type: enhancement
stage: resolved
2011-05-06 17:49:14python-devsetnosy: + python-dev
messages: + msg135348
2011-05-06 17:41:56eric.smithsetmessages: + msg135346
2011-05-04 18:05:38giampaolo.rodolasetmessages: + msg135145
2011-03-10 11:31:27eric.smithsetnosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130502
2011-03-10 11:26:30giampaolo.rodolasetnosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130501
2011-03-10 11:17:28eric.smithsetnosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130500
2011-03-09 23:21:13giampaolo.rodolasetfiles: + ftplib_mlsd_v3.patch
nosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130476
2011-03-09 22:58:30SilentGhostsetnosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130474
2011-03-09 22:35:45giampaolo.rodolasetfiles: + ftplib_mlsd_v2.patch
nosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130471
2011-03-08 12:17:20SilentGhostsetfiles: + issue11072.diff
nosy: pitrou, eric.smith, giampaolo.rodola, SilentGhost, George.Dhoore
messages: + msg130327
2011-03-08 01:09:58SilentGhostsetnosy: + SilentGhost
2011-03-08 00:54:30eric.smithsetnosy: pitrou, eric.smith, giampaolo.rodola, George.Dhoore
messages: + msg130302
2011-03-08 00:38:01George.Dhooresetnosy: + George.Dhoore
2011-03-08 00:13:45pitrousetnosy: pitrou, eric.smith, giampaolo.rodola
messages: + msg130300
2011-01-31 02:18:05eric.smithsetnosy: + eric.smith
2011-01-30 23:37:29giampaolo.rodolacreate