Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

st_ino (unsigned long long) is casted to long long in posixmodule.c:_pystat_fromstructstat #73805

Closed
mba mannequin opened this issue Feb 22, 2017 · 22 comments
Closed

st_ino (unsigned long long) is casted to long long in posixmodule.c:_pystat_fromstructstat #73805

mba mannequin opened this issue Feb 22, 2017 · 22 comments
Labels
3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error

Comments

@mba
Copy link
Mannequin

mba mannequin commented Feb 22, 2017

BPO 29619
Nosy @vstinner, @xdegaye, @vadmium, @serhiy-storchaka, @eryksun, @zooba, @zhangyangyu
PRs
  • bpo-29619: Convert st_ino using unsigned integer #557
  • [3.6] bpo-29619: Convert st_ino using unsigned integer (#557) #584
  • bpo-29619: Do not rely on HAVE_LARGEFILE_SUPPORT for the size of type… #1666
  • [3.6] bpo-29619: Do not use HAVE_LARGEFILE_SUPPORT for type conversions (GH-1666) #3102
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-05-22.09:17:51.115>
    created_at = <Date 2017-02-22.12:24:02.302>
    labels = ['type-bug', '3.7']
    title = 'st_ino (unsigned long long) is casted to long long in posixmodule.c:_pystat_fromstructstat'
    updated_at = <Date 2017-08-17.12:33:09.180>
    user = 'https://bugs.python.org/mba'

    bugs.python.org fields:

    activity = <Date 2017-08-17.12:33:09.180>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-05-22.09:17:51.115>
    closer = 'xdegaye'
    components = []
    creation = <Date 2017-02-22.12:24:02.302>
    creator = 'mba'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29619
    keywords = []
    message_count = 22.0
    messages = ['288355', '289213', '289216', '289221', '289288', '290247', '290250', '292141', '292142', '292174', '293859', '293873', '293883', '293919', '293920', '293924', '293959', '293987', '294030', '294034', '294135', '300412']
    nosy_count = 8.0
    nosy_names = ['vstinner', 'xdegaye', 'martin.panter', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'xiang.zhang', 'mba']
    pr_nums = ['557', '584', '1666', '3102']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue29619'
    versions = ['Python 3.6', 'Python 3.7']

    @mba
    Copy link
    Mannequin Author

    mba mannequin commented Feb 22, 2017

    ino_t is of type 'unsigned long' or 'unsigned long long'. Casting it to signed type means that negative values are incorrectly returned for large inodes.

    @mba mba mannequin added stdlib Python modules in the Lib dir 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error and removed stdlib Python modules in the Lib dir labels Feb 22, 2017
    @zhangyangyu
    Copy link
    Member

    Any reason our _Py_stat_struct on Windows uses a signed type to represent st_ino?

    @vstinner
    Copy link
    Member

    vstinner commented Mar 8, 2017

    On Windows, _Py_attribute_data_to_stat() converts BY_HANDLE_FILE_INFORMATION to _Py_stat_struct, and then _pystat_fromstructstat() creates Python objects.

    The file index in BY_HANDLE_FILE_INFORMATION is made of two DWORD, so yes, it's unsigned.

    On Linux, stat.st_ino type is ino_t which is unsigned too.

    So I created a pull request to fix the bug, even if I don't think that a filesystem produce inodes larger than 2^63-1. Not sure if it's worth it to backport the fix to Python 2.7, 3.5 and 3.6?

    @serhiy-storchaka
    Copy link
    Member

    I think it is worth to backport this at least to 3.6.

    @mba
    Copy link
    Mannequin Author

    mba mannequin commented Mar 9, 2017

    I don't think that a filesystem produce inodes larger than 2^63-1

    The problem is real: the large inode number was assigned to a file created by MacOS on a share exported via CIFS, and then stated by Linux using NFS export.

    @vstinner
    Copy link
    Member

    New changeset 68d2980 by Victor Stinner in branch '3.6':
    bpo-29619: Convert st_ino using unsigned integer (#557) (#584)
    68d2980

    @vstinner
    Copy link
    Member

    New changeset 0f6d733 by Victor Stinner in branch 'master':
    bpo-29619: Convert st_ino using unsigned integer (#557)
    0f6d733

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Apr 22, 2017

    The compilation of Modules/posixmodule.c fails now on Android architecture 'x86' at api level 21, after the above change 0f6d733:

    /opt/android-ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --sysroot=/opt/android-ndk/platforms/android-21/arch-x86 -target i686-none-linux-androideabi -gcc-toolchain /opt/android-ndk/toolchains/x86-4.9/prebuilt/linux-x86_64 -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wno-unused-value -Wno-empty-body -Qunused-arguments -Wno-parentheses-equality   -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration  -IObjects -IInclude -IPython -I. -I./Include -I/opt/android-ndk/platforms/android-21/arch-x86/usr/include   -DPy_BUILD_CORE  -c ./Modules/posixmodule.c -o Modules/posixmodule.o
    ./Modules/posixmodule.c:1935:5: error: array size is negative
        Py_BUILD_ASSERT(sizeof(unsigned long) >= sizeof(st->st_ino));
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ./Include/pymacro.h:43:15: note: expanded from macro 'Py_BUILD_ASSERT'
            (void)Py_BUILD_ASSERT_EXPR(cond);   \
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
    ./Include/pymacro.h:40:19: note: expanded from macro 'Py_BUILD_ASSERT_EXPR'
        (sizeof(char [1 - 2*!(cond)]) - 1)
                      ^~~~~~~~~~~~~
    1 error generated.
    make: *** [Makefile:1720: Modules/posixmodule.o] Error 1
    
    The following code:
      #include <stdio.h>
      #include <sys/stat.h>
    
      int main()
      {
          struct stat *st;
    
          printf("sizeof(unsigned long): %d - sizeof(st->st_ino): %d\n",
                 sizeof(unsigned long), sizeof(st->st_ino));
          return 0;
      }
    prints the following result on this same system:
      sizeof(unsigned long): 4 - sizeof(st->st_ino): 8

    @vstinner
    Copy link
    Member

    Would it work with unsigned long long?

    What is the value of HAVE_LARGEFILE_SUPPORT?

    Maybe the code should be simplified to always use unsigned long long.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Apr 23, 2017

    sizeof(unsigned long long) is 8 on Android x86 and HAVE_LARGEFILE_SUPPORT is undefined.
    According to msg280053 HAVE_LARGEFILE_SUPPORT is also undefined on Android x86_64 (and also on Linux x86_64).

    Maybe the code should be simplified to always use unsigned long long.
    Yes. If I understand correcty, one could only use 'unsigned long long' to assert at build time and use the smallest of the two unsigned long that fits with the size of st->st_ino at runtime.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 17, 2017

    In configure.ac, HAVE_LARGEFILE_SUPPORT is defined if (off_t > long && longlong >= off_t) and thus, when it is not defined, it means that off_t <= long (since longlong < off_t is false) so an off_t should fit into a long. But on Android the size of off_t is different from the size of ino_t so one cannot use the value of HAVE_LARGEFILE_SUPPORT to know if an ino_t would fit into an unsigned long.

    On Android architecture 'x86' at api level 21:
    sizeof(st->st_ino) = 8
    and from the output of configure:
    checking size of off_t... 4
    checking size of long... 4
    checking size of long long... 8

    @vstinner
    Copy link
    Member

    Before 0f6d733, the code used PyLong_FromLong((long)st->st_ino), so the change doesn't introduce a regression but detected a real bug, right?

    On Android architecture 'x86' at api level 21:
    sizeof(st->st_ino) = 8
    and from the output of configure:
    checking size of off_t... 4
    checking size of long... 4
    checking size of long long... 8

    What is the type of st_ino? It's not off_t?

    A workaround would be to force the usage unsigned long long on Android, right?

    Can you please try the following patch on Android, and propose a PR if it works?

    diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
    index e8c15a9..78b3cb9 100644
    --- a/Modules/posixmodule.c
    +++ b/Modules/posixmodule.c
    @@ -1927,7 +1927,7 @@ _pystat_fromstructstat(STRUCT_STAT *st)
             return NULL;
     
         PyStructSequence_SET_ITEM(v, 0, PyLong_FromLong((long)st->st_mode));
    -#if defined(HAVE_LARGEFILE_SUPPORT) || defined(MS_WINDOWS)
    +#if defined(HAVE_LARGEFILE_SUPPORT) || defined(MS_WINDOWS) || defined(defined(__ANDROID__))
         Py_BUILD_ASSERT(sizeof(unsigned long long) >= sizeof(st->st_ino));
         PyStructSequence_SET_ITEM(v, 1,
                                   PyLong_FromUnsignedLongLong(st->st_ino));

    @vstinner vstinner reopened this May 17, 2017
    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 17, 2017

    What is the type of st_ino? It's not off_t?

    The type of st_ino is unsigned long long as defined in sys/stat.h for x86 and off_t is defined in sys/types.h this way;, quoting:

    /* This historical accident means that we had a 32-bit off_t on 32-bit architectures. */
    typedef __kernel_off_t off_t;
    

    There is also a related section in the bionic github repo [1] quoted here:

    =========================
    This probably belongs in the NDK documentation rather than here, but these are the known ABI bugs in the 32-bit ABI:

    time_t is 32-bit. http://b/5819737. In the 64-bit ABI, time_t is 64-bit.
    
    off_t is 32-bit. There is off64_t, and in newer releases there is almost-complete support for _FILE_OFFSET_BITS. Unfortunately our stdio implementation uses 32-bit offsets and -- worse -- function pointers to functions that use 32-bit offsets, so there's no good way to implement the last few pieces http://b/24807045. In the 64-bit ABI, off_t is off64_t.
    
    sigset_t is too small on ARM and x86 (but correct on MIPS), so support for real-time signals is broken. http://b/5828899 In the 64-bit ABI, sigset_t is the correct size for every architecture.
    

    =========================

    I will test your patch on x86 and x86_64 on Android and propose a patch if it works.

    [1] https://android.googlesource.com/platform/bionic.git/#32_bit-ABI-bugs

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 18, 2017

    With the patch proposed by Victor in msg293873 the cross-compilation of posixmodule.c succeeds on android-21-x86 and android-21-x86_64.

    Following my suggestion in msg292174, I propose the following patch instead:

    diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
    index e8c15a9473..d7ff3c78bd 100644
    --- a/Modules/posixmodule.c
    +++ b/Modules/posixmodule.c
    @@ -1927,14 +1927,12 @@ _pystat_fromstructstat(STRUCT_STAT *st)
             return NULL;
     
         PyStructSequence_SET_ITEM(v, 0, PyLong_FromLong((long)st->st_mode));
    -#if defined(HAVE_LARGEFILE_SUPPORT) || defined(MS_WINDOWS)
         Py_BUILD_ASSERT(sizeof(unsigned long long) >= sizeof(st->st_ino));
    -    PyStructSequence_SET_ITEM(v, 1,
    +    if (sizeof(unsigned long) >= sizeof(st->st_ino))
    +        PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLong(st->st_ino));
    +    else
    +        PyStructSequence_SET_ITEM(v, 1,
                                   PyLong_FromUnsignedLongLong(st->st_ino));
    -#else
    -    Py_BUILD_ASSERT(sizeof(unsigned long) >= sizeof(st->st_ino));
    -    PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLong(st->st_ino));
    -#endif
     #ifdef MS_WINDOWS
         PyStructSequence_SET_ITEM(v, 2, PyLong_FromUnsignedLong(st->st_dev));
     #else

    This patch removes the conditional on HAVE_LARGEFILE_SUPPORT (explicit is better than implicit) and fixes the problem for any operating system where the size of off_t and the size of st_ino are different.
    The compiler removes the dead branch of the <if (sizeof(unsigned long)...> conditional, so actually it compiles to the same binary as the other patch (gcc does this without any optimization configured on the command line, I didn't check this for clang).

    The same kind of fix could also be applied elsewhere in posixmodule.c where behind the use of HAVE_LARGEFILE_SUPPORT there is an incorrect assumption that the size of off_t is the same as the size of another type:

    • In _pystat_fromstructstat() - a little bit further down this patch - where the other type is st_size.
    • In os_DirEntry_inode_impl() where the other type is ino_t.
      Maybe this should be done in a new issue then ?

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 18, 2017

    The same kind of fix could also be applied elsewhere in posixmodule.c

    By also adding a Py_BUILD_ASSERT() statement which is a welcome improvement on the previous code that allowed to pinpoint the problem on Android.

    @vstinner
    Copy link
    Member

    Please propose the change as a PR.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 19, 2017

    PR 1666 added.

    With Unified Headers introduced in android-ndk-r14, the size of off_t for 32 bits Android devices is 8 instead of 4 and the cross-compilation of Python on Android does not fail in this case, see msg293956.

    @vadmium
    Copy link
    Member

    vadmium commented May 20, 2017

    What is the advantage of compiling calls to both Long and LongLong converters? Isn’t it simpler to blindly call PyLong_FromUnsignedLongLong or similar?

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 20, 2017

    Good point.
    PyLong_FromLong() is faster than PyLong_FromLongLong(), IMO the fastest should be used when it is possible.
    PyLong_FromUnsignedLong() and PyLong_FromUnsignedLongLong() are almost identical, they implement the same algorithm. The first one uses unsigned long (on the stack and for a register) while the other uses unsigned long long, so the difference in performance may be in the order of nano seconds on current processors. Should only PyLong_FromUnsignedLongLong() be used then or should this be left as is for consistency with the handling of PyLong_FromLong() ?

    Another point (not related) is that the d_ino member of the DirEntry structure is of type ino_t which is unsigned and PR 1666 should be corrected to use unsigned instead in the last modifications made by the PR in os_DirEntry_inode_impl(). I will push this change when a decision has been taken on the point raised by Martin.

    @vstinner
    Copy link
    Member

    I'm fine with always using unsigned long long. I don't think that it's even
    possible to measure the perf difference and if it's doable, I expect don't
    expect it to be significant... especially compared to the cost of the stat
    () syscall!

    Well, most computers are 64-bit. Even smartphones, no?

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented May 22, 2017

    New changeset 50e8603 by xdegaye in branch 'master':
    bpo-29619: Do not use HAVE_LARGEFILE_SUPPORT for type conversions (GH-1666).
    50e8603

    @xdegaye xdegaye mannequin closed this as completed May 22, 2017
    @vstinner
    Copy link
    Member

    New changeset ffbb6f7 by Victor Stinner in branch '3.6':
    bpo-29619: Do not use HAVE_LARGEFILE_SUPPORT for type conversions (GH-1666) (bpo-3102)
    ffbb6f7

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants