classification
Title: Deprecate usage of the Windows ANSI API in the nt module
Type: Stage:
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: flox, haypo, loewis, mhammond, python-dev, santa4nt, sbt
Priority: normal Keywords: patch

Created on 2011-11-09 00:10 by haypo, last changed 2011-11-16 22:44 by haypo. This issue is now closed.

Files
File name Uploaded
deprecate_win_bytes_api.patch haypo, 2011-11-09 00:10
deprecate_win_bytes_api-2.patch haypo, 2011-11-09 20:41
Messages (17)
msg147323 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-09 00:10
Attached patch deprecates the Windows ANSI API (bytes API) in the nt module. Use Unicode filenames instead of bytes filenames to not depend on the ANSI code page anymore and to support any Unicode filename.

The patch also changes os.link(), os.rename() and os.symlink() to not accept two filenames of different types: they now require two Unicode filenames or two bytes filenames. It is an expected change; I did it to simplify the source code. I can change it if necessary.
msg147325 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-09 00:12
New changeset 6bf07db23445 by Victor Stinner in branch 'default':
Issue #13374: Use Unicode filenames instead of bytes filenames
http://hg.python.org/cpython/rev/6bf07db23445
msg147326 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-09 00:20
The patch deprecates bytes filenames for the following functions:

nt._getfullpathname
nt._isdir
os.access
os.chdir
os.chmod
os.link
os.listdir
os.lstat
os.mkdir
os.open
os.rename
os.rmdir
os.stat
os.symlink
os.unlink
os.utime

Oh, I forgot a test for os.open(bytes).
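
For callers of the functions listed above, one way to stay off the deprecated path is to decode bytes filenames before handing them to os. The following is a minimal sketch, assuming the bytes are encoded with the filesystem encoding; as_text_path is a hypothetical helper and is not part of the patch.

```python
import os

def as_text_path(path):
    """Return path as str (hypothetical helper, not part of the patch)."""
    # os.fsdecode() returns str unchanged and decodes bytes using the
    # filesystem encoding and its error handler.
    return os.fsdecode(path)

# Both calls reach os.stat() with a str filename.
st_from_bytes = os.stat(as_text_path(os.fsencode(os.curdir)))
st_from_text = os.stat(as_text_path(os.curdir))
```

Decoding at the boundary keeps the rest of the program on str filenames, which is the direction the deprecation points to.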
msg147327 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-09 00:30
Functions like os.execv() or os.readlink() are not deprecated because the underlying C function really uses a bytes API (execv and readlink).
msg147356 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-09 15:51
> Functions like os.execv() or os.readlink() are not deprecated because 
> the underlying C function really uses a bytes API (execv and readlink).

Probably os.execv() should be implemented on Windows with _wexecv() instead of _execv().  Likewise for other functions which have "wide" versions.  Or maybe it wouldn't be worth the effort, since it would mean writing separate Windows implementations.

I don't know what you mean about os.readlink() though: the Windows implementation uses CreateFileW() and DeviceIoControl().
msg147357 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-09 16:36
> Probably os.execv() should be implemented on Windows with _wexecv() instead
> of _execv(). 

That's a different story. Would you like to implement it? If yes, please open a 
new issue.

> I don't know what you mean about os.readlink() though: the Windows
> implementation uses CreateFileW() and DeviceIoControl().

Oops, you are right. The Windows implementation only accepts Unicode, so no 
deprecation warning is needed here.
msg147371 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-09 20:41
Updated patch:

 * os.rename(), os.symlink(), os.link() accept (bytes, str) and (str, bytes) again
 * ensure that the warning is emitted after parsing arguments, not before (to not emit a warning if an int is passed instead of bytes or str)
 * add a test on os.open()
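
The ordering described in the second item can be illustrated in pure Python. This is only a sketch of the idea: the real change lives in the C argument parsing of posixmodule.c, and check_filename and its warning text are made up for illustration.

```python
import warnings

def check_filename(path):
    """Validate the argument type first, then warn about bytes.

    Hypothetical helper; the warning text is illustrative only.
    """
    if isinstance(path, str):
        return path
    if not isinstance(path, bytes):
        # Wrong type: fail before any deprecation warning is issued.
        raise TypeError(
            "filename must be str or bytes, not %s" % type(path).__name__)
    # Only a valid bytes argument reaches the deprecation warning.
    warnings.warn("bytes filenames are deprecated on Windows",
                  DeprecationWarning, stacklevel=2)
    return path
```

With this order, passing an int produces a TypeError and no DeprecationWarning.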
msg147372 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-09 20:53
> Probably os.execv() should be implemented on Windows with _wexecv()
> instead of _execv().  Likewise for other functions which have "wide"
> versions.  Or maybe it wouldn't be worth the effort, since it would
> mean writing separate Windows implementations.

Writing separate Windows versions has a long tradition in posixmodule.c,
so in principle it's fine. It still may not be worth the effort since
the function is deprecated in favor of the subprocess module. However,
if code was contributed in that direction, we would likely accept it.
msg147373 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-09 21:04
deprecate_win_bytes_api-2.patch:
 * test_os.py: catch_warning() should be moved into test_link_bytes()
 * the change on Py_GetFinalPathNameByHandleA may be done in another commit
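
As a rough sketch of the first point, the warning filter can be scoped to the single test that needs it. The names below are assumptions: link_bytes is a stand-in for calling os.link() with bytes on Windows, and LinkTests is not the class from test_os.py.

```python
import unittest
import warnings

def link_bytes(src, dst):
    # Hypothetical stand-in for os.link() called with bytes on Windows.
    warnings.warn("bytes filenames are deprecated",
                  DeprecationWarning, stacklevel=2)

class LinkTests(unittest.TestCase):
    def test_link_bytes(self):
        # The filter applies only inside this test and is restored on exit.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            link_bytes(b"src", b"dst")

    def test_link_bytes_warns(self):
        # A separate test can still assert that the warning is raised.
        with self.assertWarns(DeprecationWarning):
            link_bytes(b"src", b"dst")
```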
msg147529 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-11-12 21:36
I notice that the patch changes rename() and link() to use
win32_decode_filename() to coerce the filename to unicode before using
the "wide" win32 api.  (Previously, rename() first tried the wide api,
falling back to narrow if that failed; link() used wide if the args were
both unicode, narrow otherwise.  Some other functions like symlink()
already only use the wide api.)

Is this approach of coercing to unicode and only using the wide api
"blessed"?  I certainly think it should be.  If so then one can get
rid of lots of windows specific code.

And are we able to assume that on Windows we have access to wide libc
functions?  _wcsicmp(), _snwprintf(), _wputenv() are all used already,
so I guess we already make that assumption.  It looks like a lot of the
windows specific code attempts to reimplement basic libc functions using
the win32 api just to support unicode - presumably there was a time when
we could not assume that wide libc functions would be available.  Other functions like execv() and spawnv() were never given unicode support.
msg147537 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-13 00:16
> Is this approach of coercing to unicode and only using the wide api
> "blessed"?

It's not. If people use byte strings, they specifically ask for what
they get; Python shouldn't second-guess the data types.

> I certainly think it should be.  If so then one can get
> rid of lots of windows specific code.

How so? This entire handling of file names is windows specific;
dealing with different file name data types doesn't make it more
windows specific than it already is.

> And are we able to assume that on Windows we have access to wide libc
> functions?

Yes, but Python should avoid using them.

> _wcsicmp(), _snwprintf(), _wputenv() are all used already,
> so I guess we already make that assumption.  It looks like a lot of the
> windows specific code attempts to reimplement basic libc functions using
> the win32 api just to support unicode - presumably there was a time when
> we could not assume that wide libc functions would be available.

No:
a) we try to get rid of MS libc as much as possible. Ideally, some
   future version of Python will not rely on libc at all for Windows.
   If Microsoft had chosen to make the C library a system API,
   we would happily use it. Alas, they chose to make it an API of their
   compiler instead, so we really shouldn't use it.
b) the wide libc functions assume a 16-bit wchar_t type. This is not a
   good match for Python's unicode data type, which readily supports
   32-bit characters.
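
Point b) can be seen from Python itself. A small sketch: a code point outside the Basic Multilingual Plane needs two 16-bit units in UTF-16 but only one 32-bit unit. The character chosen here is arbitrary.

```python
# An arbitrary code point above U+FFFF.
ch = "\U0001F40D"

# Number of 16-bit code units needed in UTF-16 (a surrogate pair).
utf16_units = len(ch.encode("utf-16-le")) // 2

# Number of 32-bit code units needed in UTF-32.
utf32_units = len(ch.encode("utf-32-le")) // 4
```

Here utf16_units is 2 and utf32_units is 1, so a function that treats each 16-bit wchar_t as one character does not line up with Python's view of the string.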
msg147582 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-14 01:09
> I notice that the patch changes rename() and link() to use
> win32_decode_filename() to coerce the filename to unicode before using
> the "wide" win32 api.

Well, I did that to simplify the source code.

> (Previously, rename() first tried the wide api,
> falling back to narrow if that failed; link() used wide if the args were
> both unicode, narrow otherwise.  Some other functions like symlink()
> already only use the wide api.)

I can change my patch to mimic the previous behaviour: try Unicode-Unicode, 
or fall back to encoding both arguments to the filesystem encoding.

> Is this approach of coercing to unicode and only using the wide api
> "blessed"?  I certainly think it should be.  If so then one can get
> rid of lots of windows specific code.

Dropping the bytes API, so that bytes filenames are decoded in Python and only 
the Unicode Windows API is used, has been discussed before. There is no consensus 
on this topic: the status is that the bytes API is kept but deprecated. bytes 
filenames will continue to use the bytes Windows API.
msg147709 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-15 21:25
New changeset d42811b93357 by Victor Stinner in branch 'default':
Issue #13374: The Windows bytes API has been deprecated in the os module. Use
http://hg.python.org/cpython/rev/d42811b93357
msg147710 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-11-15 21:51
IIUC, it means that the library/application should not use the bytes API if it intends to be supported on major platforms.
msg147713 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-15 22:14
> IIUC, it means that the library/application should not use the bytes API if it intends to be supported on major platforms.

I think you misunderstand; it does not literally mean that. Instead, it
means that the library/application either must not use the bytes API at
all, or else make use of it conditional on non-Windows systems (i.e.
special-case Windows).
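
The second option, special-casing Windows, might look like the sketch below. It assumes the bytes are in the filesystem encoding; listdir_bytes is a hypothetical helper name.

```python
import os
import sys

def listdir_bytes(directory):
    """List a directory given as bytes and return bytes names.

    Hypothetical helper illustrating the Windows special case.
    """
    if sys.platform == "win32":
        # On Windows, go through the Unicode API and encode the results.
        names = os.listdir(os.fsdecode(directory))
        return [os.fsencode(name) for name in names]
    # Elsewhere, the bytes API is the native one and can be used directly.
    return os.listdir(directory)
```

Callers keep a bytes-in, bytes-out interface on every platform, while the deprecated bytes path is never taken on Windows.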
msg147731 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-15 23:33
New changeset afc716e463a1 by Victor Stinner in branch 'default':
Issue #13374: Skip deprecation tests for os.symlink() on Windows XP
http://hg.python.org/cpython/rev/afc716e463a1
msg147775 - (view) Author: Roundup Robot (python-dev) Date: 2011-11-16 22:42
New changeset 5f239b0ba819 by Victor Stinner in branch 'default':
Issue #13374: Deprecate os.getcwdb() on Windows
http://hg.python.org/cpython/rev/5f239b0ba819
History
Date User Action Args
2011-11-16 22:44:16 haypo set status: open -> closed; resolution: fixed
2011-11-16 22:42:48 python-dev set messages: + msg147775
2011-11-15 23:33:02 python-dev set messages: + msg147731
2011-11-15 22:14:37 loewis set messages: + msg147713
2011-11-15 21:51:00 flox set nosy: + flox; messages: + msg147710
2011-11-15 21:25:55 python-dev set messages: + msg147709
2011-11-14 01:09:24 haypo set messages: + msg147582
2011-11-13 00:16:33 loewis set messages: + msg147537
2011-11-12 21:36:11 sbt set messages: + msg147529
2011-11-09 21:54:38 santa4nt set nosy: + santa4nt
2011-11-09 21:04:22 haypo set messages: + msg147373
2011-11-09 20:53:37 loewis set messages: + msg147372
2011-11-09 20:41:16 haypo set files: + deprecate_win_bytes_api-2.patch; messages: + msg147371
2011-11-09 16:36:41 haypo set messages: + msg147357
2011-11-09 15:51:14 sbt set nosy: + sbt; messages: + msg147356
2011-11-09 00:38:24 haypo set nosy: + loewis, mhammond
2011-11-09 00:30:37 haypo set messages: + msg147327
2011-11-09 00:20:30 haypo set messages: + msg147326
2011-11-09 00:12:14 python-dev set nosy: + python-dev; messages: + msg147325
2011-11-09 00:10:06 haypo create