msg140222 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-13 03:29 |
The lseek function can legitimately return a code less then zero ( except for -1 ) when seeking beyond an offset of 2^63. This behavior should be supported in order to permit the python interpreter to seek in files with valid data at locations greater than or equal to 2^63. This can happen in a sparse file or in the /proc file system address space file.
The fix is simple. In the posix_lseek function check for result != -1 instead of checking for result < 0 in return code checks of the value returned by lseek.
|
msg140223 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-13 03:50 |
In addition I would like the posix_lseek function to accept a value larger than 2^63 as a seek offset. Currently I seem to need to seek multiple times within a file to achieve an offset larger than 2^63 ( assuming the return type check on the return value of lseek is fixed. )
|
msg140250 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-07-13 12:09 |
> In addition I would like the posix_lseek function to accept a value
> larger than 2^63 as a seek offset
How would it work? The C lseek() takes a signed (64-bit) offset as argument, so we would have to call it multiple times anyway.
|
msg140251 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-07-13 13:08 |
"... A negative file offset may be valid for some devices in some implementations.
The POSIX.1-1990 standard did not specifically prohibit lseek() from returning a negative offset. Therefore, an application was required to clear errno prior to the call and check errno upon return to determine whether a return value of ( off_t)-1 is a negative offset or an indication of an error condition. The standard developers did not wish to require this action on the part of a conforming application, and chose to require that errno be set to [EINVAL] when the resulting file offset would be negative for a regular file, block special file, or directory."
Extract of:
http://pubs.opengroup.org/onlinepubs/009695399/functions/lseek.html
|
msg140253 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-07-13 14:21 |
> In the posix_lseek function check for result != -1 instead of checking
> for result < 0 in return code checks of the value returned by lseek.
+1
This can be useful when parsing /dev/mem on x86_64, for example.
> In addition I would like the posix_lseek function to accept a value
> larger than 2^63 as a seek offset
Like Antoine, I'm a little bit puzzled here.
|
msg140261 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-07-13 14:45 |
> How would it work? The C lseek() takes a signed (64-bit) offset
> as argument, so we would have to call it multiple times anyway.
Python does already call multiple times the same function if the input is larger than the type used by the function. Some examples:
- mbcs codec: work on chunk on INT_MAX bytes (input size type: Py_ssize_t)
- zlib: crc32() works on chunk on UINT_MAX bytes (input size type: Py_ssize_t)
We have also functions processing fewer data if the function cannot handle all data: FileIO.write() clamps the buffer size of INT_MAX on Windows for example.
If we call lseek() multiple times, the implementation will depend on the whence value: SEEK_SET will use SEEK_SET and then SEEK_CUR. For SEEK_CUR, all calls can use SEEK_CUR. For SEEK_END... hum? SEEK_END and then SEEK_CUR?
I vote -1 for this feature because I bet that the behaviour of lseek() with an file position > 2^63 or offset > 2^63 highly depend on the platform (kernel, libc) version.
You can implement it in Python for your usecase. I prefer to keep a thin wrapper with an known and reliable behaviour.
|
msg140270 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-13 15:24 |
Im most familiar with solaris. Atleast on solaris lseek interprets its
signed arg as the unsigned value with the same bit pattern. I cannot be
certain that this is common across other operating systems.
On Jul 13, 2011 8:09 AM, "Antoine Pitrou" <report@bugs.python.org> wrote:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
>> In addition I would like the posix_lseek function to accept a value
>> larger than 2^63 as a seek offset
>
> How would it work? The C lseek() takes a signed (64-bit) offset as
argument, so we would have to call it multiple times anyway.
>
> ----------
> nosy: +neologix, pitrou
> versions: +Python 3.2, Python 3.3
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue12545>
> _______________________________________
|
msg140272 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-07-13 15:31 |
> Im most familiar with solaris. Atleast on solaris lseek interprets its
> signed arg as the unsigned value with the same bit pattern.
Only with SEEK_SET, I assume?
|
msg140289 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-13 16:58 |
I did some testing on this. Solaris seems to treat files as a large
circular buffer of size 2^64 with respect to seeks in the /proc file system
address space file. No error results with seeks of greater than 2^63 with
any of SEEK_SET, SEEK_CUR, SEEK_END. This is not true of a regular file
where I get errno=22 when seeking to an offset greater than or equal to
2^63.
So even on solaris the behavior seems to be filesystem dependent.
|
msg140605 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-07-18 17:15 |
> So even on solaris the behavior seems to be filesystem dependent.
>
So I'd suggest forgetting about this part.
Would you like to write a patch for the first point?
|
msg140715 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-20 02:07 |
Hi. I can submit a patch for the first part. Should I submit on this issue tracker item?
|
msg140721 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-07-20 06:46 |
> Hi. I can submit a patch for the first part. Should I submit on this issue tracker item?
>
Sure.
|
msg140723 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-07-20 08:18 |
> So I'd suggest forgetting about this part.
If someone really wants this feature (relative seek of more than 2^63), he/she should open a new issue.
This issue remembers me a mktime() issue: mktime() may return -1 even if it is not an error. time_mktime() uses now a sentinel to detect an error:
buf.tm_wday = -1; /* sentinel; original value ignored */
tt = mktime(&buf);
/* Return value of -1 does not necessarily mean an error, but tm_wday
* cannot remain set to -1 if mktime succeeded. */
if (tt == (time_t)(-1) && buf.tm_wday == -1) /* OverflowError */
For lseek, we can rely on errno. Try something like that:
errno = 0;
offset = lseek(...);
if (offset == (off_t)-1 && errno) /* error */
We can write a test using a sparse file... Or maybe a mmap object?
|
msg140807 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-21 11:07 |
Hi. I'm unable ( or have to jump through lots of hoops ) to submit this patch myself for regulatory reasons. Can someone else submit this please? Thanks.
|
msg140904 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2011-07-22 20:35 |
Patch attached.
> For lseek, we can rely on errno. Try something like that:
>
> errno = 0;
> offset = lseek(...);
> if (offset == (off_t)-1 && errno) /* error */
>
It's a little bit overkill :-) (for mktime, time_t can overflow easily
on 32-bit).
> We can write a test using a sparse file... Or maybe a mmap object?
>
I'm not sure it's easily testable, because it's really a corner case
not addressed by POSIX. On my Linux box, I can't get lseek to return a
negative value, I get EINVAL - which does make sense (the Linux kernel
doesn't accept or return negative file offsets, see
http://lwn.net/Articles/138063/).
Kuberan, on what operating system did you notice this problem? Solaris?
|
msg140923 - (view) |
Author: Kuberan Naganathan (Kuberan.Naganathan) |
Date: 2011-07-23 01:16 |
yes. i noticed problem on solaris on the /proc/<pid>/as file which usually
has mapped regions beyond 2^63 in most process files
On Jul 22, 2011 4:35 PM, "Charles-François Natali" <report@bugs.python.org>
wrote:
>
> Charles-François Natali <neologix@free.fr> added the comment:
>
> Patch attached.
>
>> For lseek, we can rely on errno. Try something like that:
>>
>> errno = 0;
>> offset = lseek(...);
>> if (offset == (off_t)-1 && errno) /* error */
>>
>
> It's a little bit overkill :-) (for mktime, time_t can overflow easily
> on 32-bit).
>
>> We can write a test using a sparse file... Or maybe a mmap object?
>>
>
> I'm not sure it's easily testable, because it's really a corner case
> not addressed by POSIX. On my Linux box, I can't get lseek to return a
> negative value, I get EINVAL - which does make sense (the Linux kernel
> doesn't accept or return negative file offsets, see
> http://lwn.net/Articles/138063/).
>
> Kuberan, on what operating system did you notice this problem? Solaris?
>
> ----------
> keywords: +patch
> Added file: http://bugs.python.org/file22719/lseek_negative.diff
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue12545>
> _______________________________________
|
msg190724 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-06 20:16 |
Oh, I just reproduced the issue on my gdb.py program, which is part of the python-ptrace program. When searching for a pattern in the memory of another process, /proc/pid/maps is used to get the list of all memory mappings of this process. On Linux x64, the last mapping is "ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]", machine code used for system calls. gdb.py opens /proc/pid/mem in binary mode and tries to seek at 0xffffffffff600000 but FileIO.seek() fails because it gets an offset bigger than 2^63, which is seen as a negative number and FileIO.seek() considers a negative number as an error.
Attached patch fixes this issue by checking not only the result of lseek(), but also errno:
errno = 0;
Py_BEGIN_ALLOW_THREADS
#if defined(MS_WIN64) || defined(MS_WINDOWS)
res = _lseeki64(fd, pos, how);
#else
res = lseek(fd, pos, how);
#endif
Py_END_ALLOW_THREADS
if (res == (off_t)-1 && errno)
return posix_error();
PyLong_AsUnsignedLong() is tried if PyLong_AsLong() failed with an OverflowError. I kept PyLong_AsLong() for relative seek with a negative offset, and I prefer to try first PyLong_AsLong() because the overflow case is very unlikely compared to negative offsets for relative seek.
The result is always casted to an unsigned number. I never seen a device or file descriptor supporting negative offset.
I don't know how to test this code path automatically. I only know the "[vsyscall]" mapping use case on Linux, and it requires to parse /proc/self/maps which is not trivial and I would prefer sometimes simpler (more reliable with less maintenance). I propose to not add an unit test, except someone find how to write a simple and reliable test.
> This is not true of a regular file where I get errno=22
> when seeking to an offset greater than or equal to 2^63.
Same behaviour on Linux: OSError(22, "Invalid argument") when I try file.seek(1 << 63) on a regular file.
|
msg190725 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-06 20:17 |
About the version field: If we decide to change os.lseek() and FileIO.seek(), it is safer to only do it in Python 3.4.
|
msg190727 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-06 20:27 |
lseek_negative.diff is not correct: 0x8000000000000000 is a valid offset for a /proc/pid/mem file.
|
msg190728 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-06 20:29 |
lseek_unsigned_test.py: script to test manually the patch on Linux. It may work on Solaris, I didn't try.
|
msg190729 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-06 20:32 |
The patch is not complete: _iomodule.h defines PyLong_AsOff_t() and PyLong_FromOff_t(), but these functions don't support unsigned off_t (larger than 2^63-1).
#if defined(MS_WIN64) || defined(MS_WINDOWS)
/* Windows uses long long for offsets */
typedef PY_LONG_LONG Py_off_t;
# define PyLong_AsOff_t PyLong_AsLongLong
# define PyLong_FromOff_t PyLong_FromLongLong
# define PY_OFF_T_MAX PY_LLONG_MAX
# define PY_OFF_T_MIN PY_LLONG_MIN
# define PY_OFF_T_COMPAT PY_LONG_LONG /* type compatible with off_t */
# define PY_PRIdOFF "lld" /* format to use for that type */
#else
/* Other platforms use off_t */
typedef off_t Py_off_t;
#if (SIZEOF_OFF_T == SIZEOF_SIZE_T)
# define PyLong_AsOff_t PyLong_AsSsize_t
# define PyLong_FromOff_t PyLong_FromSsize_t
# define PY_OFF_T_MAX PY_SSIZE_T_MAX
# define PY_OFF_T_MIN PY_SSIZE_T_MIN
# define PY_OFF_T_COMPAT Py_ssize_t
# define PY_PRIdOFF "zd"
#elif (HAVE_LONG_LONG && SIZEOF_OFF_T == SIZEOF_LONG_LONG)
# define PyLong_AsOff_t PyLong_AsLongLong
# define PyLong_FromOff_t PyLong_FromLongLong
# define PY_OFF_T_MAX PY_LLONG_MAX
# define PY_OFF_T_MIN PY_LLONG_MIN
# define PY_OFF_T_COMPAT PY_LONG_LONG
# define PY_PRIdOFF "lld"
#elif (SIZEOF_OFF_T == SIZEOF_LONG)
# define PyLong_AsOff_t PyLong_AsLong
# define PyLong_FromOff_t PyLong_FromLong
# define PY_OFF_T_MAX LONG_MAX
# define PY_OFF_T_MIN LONG_MIN
# define PY_OFF_T_COMPAT long
# define PY_PRIdOFF "ld"
#else
# error off_t does not match either size_t, long, or long long!
#endif
#endif
My patch fixes FileIO.seek(), but no BufferedReader & friends. So seek() works on open("/proc/self/mem", "rb", 0), but not on open("/proc/self/mem", "rb").
I didn't notice immediatly because I'm using an unbuffered file in my python-pytrace project.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:19 | admin | set | github: 56754 |
2020-11-17 21:36:19 | neologix | set | nosy:
- neologix
|
2020-11-17 19:07:45 | iritkatriel | set | versions:
+ Python 3.9, Python 3.10, - Python 3.4 |
2013-06-06 20:32:25 | vstinner | set | messages:
+ msg190729 |
2013-06-06 20:29:35 | vstinner | set | files:
+ lseek_unsigned_test.py
messages:
+ msg190728 |
2013-06-06 20:27:58 | vstinner | set | messages:
+ msg190727 |
2013-06-06 20:18:23 | vstinner | set | files:
- unnamed |
2013-06-06 20:18:21 | vstinner | set | files:
- unnamed |
2013-06-06 20:18:20 | vstinner | set | files:
- unnamed |
2013-06-06 20:17:55 | vstinner | set | title: Incorrect handling of return codes in the posix_lseek function in posixmodule.c -> os.lseek() and FileIO.seek() does not support offset larger than 2^63-1 |
2013-06-06 20:17:32 | vstinner | set | messages:
+ msg190725 versions:
+ Python 3.4, - Python 2.7, Python 3.2, Python 3.3 |
2013-06-06 20:16:32 | vstinner | set | files:
+ lseek_unsigned.patch
messages:
+ msg190724 |
2011-07-23 01:16:31 | Kuberan.Naganathan | set | files:
+ unnamed
messages:
+ msg140923 |
2011-07-22 20:35:06 | neologix | set | files:
+ lseek_negative.diff keywords:
+ patch messages:
+ msg140904
|
2011-07-21 11:07:23 | Kuberan.Naganathan | set | messages:
+ msg140807 |
2011-07-20 08:18:45 | vstinner | set | messages:
+ msg140723 |
2011-07-20 06:46:46 | neologix | set | messages:
+ msg140721 |
2011-07-20 02:07:30 | Kuberan.Naganathan | set | messages:
+ msg140715 |
2011-07-18 17:15:50 | neologix | set | messages:
+ msg140605 |
2011-07-13 18:05:58 | jcea | set | nosy:
+ jcea
|
2011-07-13 16:58:55 | Kuberan.Naganathan | set | files:
+ unnamed
messages:
+ msg140289 |
2011-07-13 15:31:51 | pitrou | set | messages:
+ msg140272 |
2011-07-13 15:24:29 | Kuberan.Naganathan | set | files:
+ unnamed
messages:
+ msg140270 |
2011-07-13 14:45:13 | vstinner | set | messages:
+ msg140261 |
2011-07-13 14:21:16 | neologix | set | messages:
+ msg140253 |
2011-07-13 13:08:28 | vstinner | set | messages:
+ msg140251 |
2011-07-13 12:09:55 | pitrou | set | nosy:
+ neologix, pitrou
messages:
+ msg140250 versions:
+ Python 3.2, Python 3.3 |
2011-07-13 12:05:56 | pitrou | set | nosy:
+ vstinner
|
2011-07-13 03:50:43 | Kuberan.Naganathan | set | messages:
+ msg140223 |
2011-07-13 03:29:18 | Kuberan.Naganathan | create | |