Author vstinner
Recipients SebastianGPedersen, benjamin.peterson, corona10, eric.smith, ezio.melotti, giampaolo.rodola, inada.naoki, lemburg, vstinner
Date 2020-01-27.07:59:03
Message-id <1580111943.97.0.376202711348.issue39380@roundup.psfhosted.org>
In-reply-to
Content
I'm in favor of changing the default encoding to UTF-8, but it requires good documentation, in particular a documented way to change the encoding that works on Python 3.8 and 3.9 (see below).

--

The encoding is used to encode commands sent to the FTP server and to decode the server replies. I expect that most replies are basically letters, digits and spaces. I guess that the most problematic operations are:

* send user and password
* decode filenames of LIST command reply
* encode filename in STOR command
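
To make the problem concrete, here is a small sketch of what happens to a non-ASCII filename. The filename is made up for illustration, and latin-1 is used because it is the current ftplib default; nothing here talks to a real server.

```python
# Hypothetical filename containing one non-ASCII character.
filename = "caf\u00e9.txt"  # "café.txt"
command = "STOR " + filename

# The same command produces different bytes on the wire per encoding.
latin1_bytes = command.encode("latin-1")
utf8_bytes = command.encode("utf-8")

# A server that sends the name as UTF-8 in a LIST reply, read by a client
# that decodes with latin-1, yields a mangled name instead of an error.
reply_bytes = filename.encode("utf-8")
decoded_by_latin1_client = reply_bytes.decode("latin-1")

print(latin1_bytes)
print(utf8_bytes)
print(decoded_by_latin1_client)
```

The mismatch is silent in the decoding direction: latin-1 can decode any byte sequence, so the client gets a wrong filename rather than an exception.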

I expect that the original FTP protocol doesn't specify any encoding, and so FTP server implementations took some freedom. I would not be surprised to see ANSI code pages used by servers running on Windows.

Currently, encoding is a class attribute: it's not convenient to override it depending on the host. I would prefer to have a new parameter for the constructor.

Giampaolo:
> some servers may enable UTF-8 only if client explicitly sends "OPTS UTF-8 ON" first, but that is based on an draft RFC. Server implementors usually treat this command as a no-op and simply assume UTF-8 as the default.
> With that said, I am -1 about implementing logic based on FEAT/OPTS: that should be done before login, and at that point some servers may erroneously reject any command other than USER, PASS and ACCT. 

Oh. In this case, always sending "OPTS UTF-8 ON" just after the socket is connected sounds like a bad idea.


Sebastian:
> Since RFC 2640, the industry standard within FTP Clients is UTF-8 (see e.g. FileZilla here: https://wiki.filezilla-project.org/Character_Encoding, or WinSCP here: https://winscp.net/eng/docs/faq_utf8).

"Internationalization of the File Transfer Protocol" was published in 1999. It recommends UTF-8. Following an RFC recommendation is a good argument for changing the default encoding to UTF-8.
https://tools.ietf.org/html/rfc2640


Giampaolo:
> Personally I think it makes more sense to just use UTF-8 without going through a deprecation period

I concur. Deprecation is usually used for features which are going to be removed (module, function or function parameter). Here it's just about a default parameter value. I expect encoding="utf-8" to be the default in the constructor.

The annoying part is that Python 3.8 only has a class attribute. The simplest option seems to be to create an FTP object, modify its encoding attribute and *then* log in. Another option is to subclass the FTP class. IMO the worst option is to modify the ftplib.FTP.encoding attribute (monkey-patching the module).
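
A minimal sketch of the first option, assuming the object is created without a host so that nothing is sent before the encoding is set. The helper names make_ftp() and open_ftp() are hypothetical, not part of ftplib. I set the attribute before connect() rather than just before login(), since as far as I can tell connect() opens the reply reader with the encoding that is current at that point.

```python
import ftplib


def make_ftp(encoding):
    """Return an unconnected FTP object that uses the given encoding."""
    ftp = ftplib.FTP()       # no host argument, so no connection is made yet
    ftp.encoding = encoding  # instance attribute; the ftplib.FTP class is untouched
    return ftp


def open_ftp(host, user, passwd, encoding="utf-8"):
    """Hypothetical helper: connect and log in with an explicit encoding."""
    ftp = make_ftp(encoding)
    ftp.connect(host)        # replies are decoded with ftp.encoding from here on
    ftp.login(user, passwd)  # USER and PASS are encoded with ftp.encoding
    return ftp
```

Something like open_ftp("ftp.example.org", "user", "secret", encoding="cp1252") would then pick the encoding per host (the host and credentials are placeholders). Because only the instance is modified, other FTP objects in the same process keep their own encoding.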

I expect that most users use a username, password and filenames that are encodable to ASCII, and so will not notice the change to UTF-8. We can document a solution working on all Python versions to use a different encoding.
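
A quick way to check that expectation for a given set of values, as a sketch: ASCII-only text encodes to the same bytes in latin-1 and in UTF-8, so such users see no difference. The function name and the sample values are made up for illustration; str.isascii() requires Python 3.7 or newer.

```python
def unaffected_by_utf8_default(*values):
    """Return True if every value is pure ASCII.

    ASCII text encodes to identical bytes in latin-1 and UTF-8, so
    changing the default encoding cannot alter these values on the wire.
    """
    return all(value.isascii() for value in values)


# Hypothetical credentials and filename.
ascii_only = unaffected_by_utf8_default("anonymous", "secret", "report.txt")
with_accent = unaffected_by_utf8_default("anonymous", "secret", "caf\u00e9.txt")
```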
History
Date                 User      Action  Args
2020-01-27 07:59:04  vstinner  set     recipients: + vstinner, lemburg, eric.smith, giampaolo.rodola, benjamin.peterson, ezio.melotti, inada.naoki, corona10, SebastianGPedersen
2020-01-27 07:59:03  vstinner  set     messageid: <1580111943.97.0.376202711348.issue39380@roundup.psfhosted.org>
2020-01-27 07:59:03  vstinner  link    issue39380 messages
2020-01-27 07:59:03  vstinner  create