classification
Title: st_ino (unsigned long long) is casted to long long in posixmodule.c:_pystat_fromstructstat
Type: behavior Stage: resolved
Components: Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, haypo, martin.panter, mba, serhiy.storchaka, steve.dower, xdegaye, xiang.zhang
Priority: normal Keywords:

Created on 2017-02-22 12:24 by mba, last changed 2017-05-22 09:17 by xdegaye. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 557 merged haypo, 2017-03-08 08:04
PR 584 merged haypo, 2017-03-09 16:41
PR 1666 merged xdegaye, 2017-05-19 15:16
Messages (21)
msg288355 - (view) Author: Mba (mba) Date: 2017-02-22 12:24
ino_t is of type 'unsigned long' or 'unsigned long long'. Casting it to signed type means that negative values are incorrectly returned for large inodes.
msg289213 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-03-08 07:36
Any reason our _Py_stat_struct on Windows uses a signed type to represent st_ino?
msg289216 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-08 08:07
On Windows, _Py_attribute_data_to_stat() converts BY_HANDLE_FILE_INFORMATION to _Py_stat_struct, and then _pystat_fromstructstat() creates Python objects.

The file index in BY_HANDLE_FILE_INFORMATION is made of two DWORD, so yes, it's unsigned.

On Linux, stat.st_ino type is ino_t which is unsigned too.

So I created a pull request to fix the bug, even if I don't think that a filesystem produce inodes larger than 2^63-1. Not sure if it's worth it to backport the fix to Python 2.7, 3.5 and 3.6?
msg289221 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-08 09:24
I think it is worth to backport this at least to 3.6.
msg289288 - (view) Author: Mba (mba) Date: 2017-03-09 12:52
> I don't think that a filesystem produce inodes larger than 2^63-1

The problem is real: the large inode number was assigned to a file created by MacOS on a share exported via CIFS, and then stated by Linux using NFS export.
msg290247 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-24 22:37
New changeset 68d29809405dc766966b2b973b8597212fbc3dbd by Victor Stinner in branch '3.6':
bpo-29619: Convert st_ino using unsigned integer (#557) (#584)
https://github.com/python/cpython/commit/68d29809405dc766966b2b973b8597212fbc3dbd
msg290250 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-24 22:38
New changeset 0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c by Victor Stinner in branch 'master':
bpo-29619: Convert st_ino using unsigned integer (#557)
https://github.com/python/cpython/commit/0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c
msg292141 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-04-22 22:26
The compilation of Modules/posixmodule.c fails now on Android architecture 'x86' at api level 21, after the above change 0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c:

/opt/android-ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --sysroot=/opt/android-ndk/platforms/android-21/arch-x86 -target i686-none-linux-androideabi -gcc-toolchain /opt/android-ndk/toolchains/x86-4.9/prebuilt/linux-x86_64 -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wno-unused-value -Wno-empty-body -Qunused-arguments -Wno-parentheses-equality   -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration  -IObjects -IInclude -IPython -I. -I./Include -I/opt/android-ndk/platforms/android-21/arch-x86/usr/include   -DPy_BUILD_CORE  -c ./Modules/posixmodule.c -o Modules/posixmodule.o
./Modules/posixmodule.c:1935:5: error: array size is negative
    Py_BUILD_ASSERT(sizeof(unsigned long) >= sizeof(st->st_ino));
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./Include/pymacro.h:43:15: note: expanded from macro 'Py_BUILD_ASSERT'
        (void)Py_BUILD_ASSERT_EXPR(cond);   \
              ^~~~~~~~~~~~~~~~~~~~~~~~~~
./Include/pymacro.h:40:19: note: expanded from macro 'Py_BUILD_ASSERT_EXPR'
    (sizeof(char [1 - 2*!(cond)]) - 1)
                  ^~~~~~~~~~~~~
1 error generated.
make: *** [Makefile:1720: Modules/posixmodule.o] Error 1

The following code:
  #include <stdio.h>
  #include <sys/stat.h>

  int main()
  {
      struct stat *st;

      printf("sizeof(unsigned long): %d - sizeof(st->st_ino): %d\n",
             sizeof(unsigned long), sizeof(st->st_ino));
      return 0;
  }
prints the following result on this same system:
  sizeof(unsigned long): 4 - sizeof(st->st_ino): 8
msg292142 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-04-22 22:48
Would it work with unsigned long long?

What is the value of HAVE_LARGEFILE_SUPPORT?

Maybe the code should be simplified to always use unsigned long long.
msg292174 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-04-23 17:18
sizeof(unsigned long long) is 8 on Android x86 and HAVE_LARGEFILE_SUPPORT is undefined.
According to msg280053 HAVE_LARGEFILE_SUPPORT is also undefined on Android x86_64 (and also on Linux x86_64).

> Maybe the code should be simplified to always use unsigned long long.
Yes. If I understand correcty, one could only use 'unsigned long long' to assert at build time and use the smallest of the two unsigned long that fits with the size of st->st_ino at runtime.
msg293859 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-17 16:11
In configure.ac, HAVE_LARGEFILE_SUPPORT is defined if (off_t > long && longlong >= off_t) and thus, when it is not defined, it means that off_t <= long (since longlong < off_t is false) so an off_t should fit into a long. But on Android the size of off_t is different from the size of ino_t so one cannot use the value of HAVE_LARGEFILE_SUPPORT to know if an ino_t would fit into an unsigned long.

On Android architecture 'x86' at api level 21:
    sizeof(st->st_ino) = 8
and from the output of configure:
    checking size of off_t... 4
    checking size of long... 4
    checking size of long long... 8
msg293873 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-17 18:48
Before 0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c, the code used PyLong_FromLong((long)st->st_ino), so the change doesn't introduce a regression but detected a real bug, right?

> On Android architecture 'x86' at api level 21:
>    sizeof(st->st_ino) = 8
> and from the output of configure:
>    checking size of off_t... 4
>    checking size of long... 4
>    checking size of long long... 8

What is the type of st_ino? It's not off_t?

A workaround would be to force the usage unsigned long long on Android, right?

Can you please try the following patch on Android, and propose a PR if it works?

diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
index e8c15a9..78b3cb9 100644
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -1927,7 +1927,7 @@ _pystat_fromstructstat(STRUCT_STAT *st)
         return NULL;
 
     PyStructSequence_SET_ITEM(v, 0, PyLong_FromLong((long)st->st_mode));
-#if defined(HAVE_LARGEFILE_SUPPORT) || defined(MS_WINDOWS)
+#if defined(HAVE_LARGEFILE_SUPPORT) || defined(MS_WINDOWS) || defined(defined(__ANDROID__))
     Py_BUILD_ASSERT(sizeof(unsigned long long) >= sizeof(st->st_ino));
     PyStructSequence_SET_ITEM(v, 1,
                               PyLong_FromUnsignedLongLong(st->st_ino));
msg293883 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-17 19:55
> What is the type of st_ino? It's not off_t?

The type of st_ino is unsigned long long as defined in sys/stat.h for x86 and off_t is defined in sys/types.h this way;, quoting:

    /* This historical accident means that we had a 32-bit off_t on 32-bit architectures. */
    typedef __kernel_off_t off_t;

There is also a related section in the bionic github repo [1] quoted here:

=========================
This probably belongs in the NDK documentation rather than here, but these are the known ABI bugs in the 32-bit ABI:

    time_t is 32-bit. http://b/5819737. In the 64-bit ABI, time_t is 64-bit.

    off_t is 32-bit. There is off64_t, and in newer releases there is almost-complete support for _FILE_OFFSET_BITS. Unfortunately our stdio implementation uses 32-bit offsets and -- worse -- function pointers to functions that use 32-bit offsets, so there's no good way to implement the last few pieces http://b/24807045. In the 64-bit ABI, off_t is off64_t.

    sigset_t is too small on ARM and x86 (but correct on MIPS), so support for real-time signals is broken. http://b/5828899 In the 64-bit ABI, sigset_t is the correct size for every architecture.
=========================

I will test your patch on x86 and x86_64 on Android and propose a patch if it works.

[1] https://android.googlesource.com/platform/bionic.git/#32_bit-ABI-bugs
msg293919 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-18 11:39
With the patch proposed by Victor in msg293873 the cross-compilation of posixmodule.c succeeds on android-21-x86 and android-21-x86_64.

Following my suggestion in msg292174, I propose the following patch instead:

diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
index e8c15a9473..d7ff3c78bd 100644
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -1927,14 +1927,12 @@ _pystat_fromstructstat(STRUCT_STAT *st)
         return NULL;
 
     PyStructSequence_SET_ITEM(v, 0, PyLong_FromLong((long)st->st_mode));
-#if defined(HAVE_LARGEFILE_SUPPORT) || defined(MS_WINDOWS)
     Py_BUILD_ASSERT(sizeof(unsigned long long) >= sizeof(st->st_ino));
-    PyStructSequence_SET_ITEM(v, 1,
+    if (sizeof(unsigned long) >= sizeof(st->st_ino))
+        PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLong(st->st_ino));
+    else
+        PyStructSequence_SET_ITEM(v, 1,
                               PyLong_FromUnsignedLongLong(st->st_ino));
-#else
-    Py_BUILD_ASSERT(sizeof(unsigned long) >= sizeof(st->st_ino));
-    PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLong(st->st_ino));
-#endif
 #ifdef MS_WINDOWS
     PyStructSequence_SET_ITEM(v, 2, PyLong_FromUnsignedLong(st->st_dev));
 #else

This patch removes the conditional on HAVE_LARGEFILE_SUPPORT (explicit is better than implicit) and fixes the problem for any operating system where the size of off_t and the size of st_ino are different.
The compiler removes the dead branch of the <if (sizeof(unsigned long)...> conditional, so actually it compiles to the same binary as the other patch (gcc does this without any optimization configured on the command line, I didn't check this for clang).

The same kind of fix could also be applied elsewhere in posixmodule.c where behind the use of HAVE_LARGEFILE_SUPPORT there is an incorrect assumption that the size of off_t is the same as the size of another type:
  * In _pystat_fromstructstat() - a little bit further down this patch - where the other type is st_size.
  * In os_DirEntry_inode_impl() where the other type is ino_t.
Maybe this should be done in a new issue then ?
msg293920 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-18 11:45
> The same kind of fix could also be applied elsewhere in posixmodule.c

By also adding a Py_BUILD_ASSERT() statement which is a welcome improvement on the previous code that allowed to pinpoint the problem on Android.
msg293924 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-18 14:51
Please propose the change as a PR.
msg293959 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-19 15:24
PR 1666 added.

With Unified Headers introduced in android-ndk-r14, the size of off_t for 32 bits Android devices is 8 instead of 4 and the cross-compilation of Python on Android does not fail in this case, see msg293956.
msg293987 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-05-20 00:09
What is the advantage of compiling calls to both Long and LongLong converters? Isn’t it simpler to blindly call PyLong_FromUnsignedLongLong or similar?
msg294030 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-20 14:12
Good point.
PyLong_FromLong() is faster than PyLong_FromLongLong(), IMO the fastest should be used when it is possible.
PyLong_FromUnsignedLong() and PyLong_FromUnsignedLongLong() are almost identical, they implement the same algorithm. The first one uses unsigned long (on the stack and for a register) while the other uses unsigned long long, so the difference in performance may be in the order of nano seconds on current processors. Should only PyLong_FromUnsignedLongLong() be used then or should this be left as is for consistency with the handling of PyLong_FromLong() ?

Another point (not related) is that the d_ino member of the DirEntry structure is of type ino_t which is unsigned and PR 1666 should be corrected to use unsigned instead in the last modifications made by the PR in os_DirEntry_inode_impl(). I will push this change when a decision has been taken on the point raised by Martin.
msg294034 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-05-20 15:23
I'm fine with always using unsigned long long. I don't think that it's even
possible to measure the perf difference and if it's doable, I expect don't
expect it to be significant... especially compared to the cost of the stat
() syscall!

Well, most computers are 64-bit. Even smartphones, no?
msg294135 - (view) Author: Xavier de Gaye (xdegaye) * (Python committer) Date: 2017-05-22 09:15
New changeset 50e86033de85294d87b7e942701d456342abde8e by xdegaye in branch 'master':
bpo-29619: Do not use HAVE_LARGEFILE_SUPPORT for type conversions (GH-1666).
https://github.com/python/cpython/commit/50e86033de85294d87b7e942701d456342abde8e
History
Date User Action Args
2017-05-22 09:17:51xdegayesetstatus: open -> closed
resolution: fixed
2017-05-22 09:15:10xdegayesetmessages: + msg294135
2017-05-20 15:23:49hayposetmessages: + msg294034
2017-05-20 14:12:02xdegayesetmessages: + msg294030
2017-05-20 00:09:46martin.pantersetnosy: + martin.panter
messages: + msg293987
2017-05-19 15:24:14xdegayesetmessages: + msg293959
2017-05-19 15:16:25xdegayesetpull_requests: + pull_request1761
2017-05-18 14:51:24hayposetmessages: + msg293924
2017-05-18 11:45:11xdegayesetmessages: + msg293920
2017-05-18 11:39:51xdegayesetmessages: + msg293919
2017-05-17 19:55:53xdegayesetmessages: + msg293883
2017-05-17 18:48:50hayposetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg293873
2017-05-17 16:11:47xdegayesetmessages: + msg293859
2017-04-23 17:18:28xdegayesetmessages: + msg292174
2017-04-22 22:48:43hayposetmessages: + msg292142
2017-04-22 22:26:39xdegayesetnosy: + xdegaye
messages: + msg292141
2017-03-24 22:58:04xiang.zhangsetstatus: open -> closed
stage: resolved
resolution: fixed
versions: - Python 2.7, Python 3.5
2017-03-24 22:38:32hayposetmessages: + msg290250
2017-03-24 22:37:58hayposetmessages: + msg290247
2017-03-18 06:32:45serhiy.storchakasetpull_requests: - pull_request599
2017-03-17 21:00:33larrysetpull_requests: + pull_request599
2017-03-09 16:41:36hayposetpull_requests: + pull_request481
2017-03-09 12:52:08mbasetmessages: + msg289288
2017-03-08 09:24:49serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg289221
2017-03-08 08:08:00hayposetnosy: + haypo
messages: + msg289216
2017-03-08 08:04:54hayposetpull_requests: + pull_request458
2017-03-08 07:36:06xiang.zhangsetnosy: + eryksun, steve.dower
messages: + msg289213
2017-02-22 12:55:43xiang.zhangsetnosy: + xiang.zhang

versions: - Python 3.3, Python 3.4
2017-02-22 12:39:48mbasetcomponents: - Library (Lib)
2017-02-22 12:24:02mbacreate