classification
Title: SIGBUS error on OpenBSD (sparc64)
Type: crash Stage: committed/rejected
Components: Tests Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Federico.Schwindt, haypo, neologix, nicm, python-dev, rpointel, trent
Priority: normal Keywords: needs review, patch

Created on 2011-05-25 20:20 by rpointel, last changed 2013-05-06 19:48 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
python2.7.1_sigbus_sparc64 rpointel, 2011-05-25 20:20 gdb output of sigbus error (backtrace)
patch-Modules_selectmodule_c rpointel, 2011-08-16 17:03
patch-Modules_selectmodule_c Federico.Schwindt, 2013-04-08 21:50
openbsd-kqueue-001.patch Federico.Schwindt, 2013-04-12 09:50
Messages (39)
msg136893 - (view) Author: Remi Pointel (rpointel) Date: 2011-05-25 20:20
Hello,

on OpenBSD (arch: sparc64), when I run the regress tests, I have a SIGBUS error during the test_kqueue.py:

(gdb) run /tmp/test_kqueue.py
Starting program: /usr/local/bin/python2.7 /tmp/test_kqueue.py
testPair (__main__.TestKQueue) ...
Program received signal SIGBUS, Bus error.
PyMember_GetOne (addr=0x21007b72c "", l=0x211d2e9c8) at Python/structmember.c:128
128             v = PyLong_FromLongLong(*(PY_LONG_LONG *)addr);

I compiled Python 2.7.1 with debug symbols to help you, attached file is the gdb output (backtrace).

Don't hesitate to ask me if you need information.

Thanks.
msg136898 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-25 20:54
It's an unaligned access:

addr=0x21007b72c

  case T_LONGLONG:
      v = PyLong_FromLongLong(*(PY_LONG_LONG *)addr);

sizeof(PY_LONG_LONG) == 8, but addr % 8 = 0x21007b72c % 8 == 4
msg136901 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 21:15
> Don't hesitate to ask me if you need information.

It looks like the crash occurs on r[0].data in testPair() of test_kqueue. Can you confirm this? Comment the line in test_kqueue.py to check if it works around the crash.

What is the size of intptr_t and "long long" types on your host? You can find this information in config.log if you compiled Python manually. Otherwise, use a dummy C script like:
--------------
int main() { return sizeof(long long); }
--------------
and
--------------
#include <stdint.h>
int main() { return sizeof(intptr_t); }
--------------
Compile the script, run it, and read the exitcode (e.g.g echo $? using bash).

Can you try to get the definition of the kevent structure? It should be in /usr/include/sys/event.h. And/or the offset of each field using the following C script (not tested):
-------------
#include <stdio.h>
#include <sys/event.h>
#include <stddef.h> /* For offsetof */

#ifndef offsetof
#define offsetof(type, member) ( (int) & ((type*)0) -> member )
#endif

struct kevent ev;

#define DUMP(field) printf("offset of " #field ": %u\n", offsetof(ev, field))

int main()
{
   DUMP(ident);
   DUMP(filter);
   DUMP(flags);
   DUMP(fflags);
   DUMP(data);
   DUMP(udata);
   return 0;
}
-------------
msg136904 - (view) Author: Remi Pointel (rpointel) Date: 2011-05-25 21:46
> It looks like the crash occurs on r[0].data in testPair() of test_kqueue. Can you confirm this? Comment the line in test_kqueue.py to check if it works around the crash.

Yes, it does not crash if I comment this line.

> What is the size of intptr_t and "long long" types on your host?

size of intptr_t and "long long" is : 8.

> Can you try to get the definition of the kevent structure? It should be in /usr/include/sys/event.h. And/or the offset of each field using the following C script (not tested).

I have replaced this line of your program (because it failed to build):
#define DUMP(field) printf("offset of " #field ": %u\n", offsetof(ev, field))
by
#define DUMP(field) printf("offset of " #field ": %u\n", offsetof(struct kevent, field))

and I have:
                                                                                                                                                                                                       
offset of ident: 0
offset of filter: 4
offset of flags: 6
offset of fflags: 8
offset of data: 12
offset of udata: 16

Thanks a lot for your help,

Remi.
msg136969 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-26 14:54
OpenBSD's struct kevent definition looks fishy:
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/sys/event.h?rev=1.15;content-type=text%2Fplain

struct kevent {
	u_int		ident;		/* identifier for this event */
	short		filter;		/* filter for event */
	u_short		flags;
	u_int		fflags;
	int		data;
	void		*udata;		/* opaque user data identifier */
};

ident and data should be uintptr_t/intptr_t, no wonder this breaks on 64-bit. See for example FreeBSD's header: http://gitorious.org/freebsd/freebsd/blobs/4369b8d3fe19ee9540bfda1bb5f3db6208ff4c91/sys/sys/event.h
msg136979 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-26 16:29
Le jeudi 26 mai 2011 à 14:54 +0000, Charles-François Natali a écrit :
> Charles-François Natali <neologix@free.fr> added the comment:
> 
> OpenBSD's struct kevent definition looks fishy:
> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/sys/event.h?rev=1.15;content-type=text%2Fplain
> 
> struct kevent {
> 	u_int		ident;		/* identifier for this event */
> 	short		filter;		/* filter for event */
> 	u_short		flags;
> 	u_int		fflags;
> 	int		data;
> 	void		*udata;		/* opaque user data identifier */
> };
> 
> ident and data should be uintptr_t/intptr_t

ident and data are not pointers, I suppose that T_UINT and T_INT should
be used instead of T_UINTPTR_T and  T_INTPTR_T. intptr_t is bigger than
an int on a 64 bits system (void*: 64 bits, int: 32 bits).
msg136988 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-26 17:47
> ident and data are not pointers,

That's not the point.
struct kevent declaration should be the following:

struct kevent {
        uintptr_t ident;        /* identifier for this event */
        short     filter;       /* filter for event */
        u_short   flags;        /* action flags for kqueue */
        u_int     fflags;       /* filter flag value */
        intptr_t  data;         /* filter data value */
        void      *udata;       /* opaque user data identifier */
};

If this doesn't match, you'll unpack garbage when extracting members,
and since it's an unaligned access, you can even get a SIGBUS (looks
like sparc64 doesn't allow unaligned access).

> I suppose that T_UINT and T_INT should be used instead of T_UINTPTR_T and  T_INTPTR_T.

Yes, we could do that on OpenBSD, but that's definitely an OpenBSD bug.
msg137021 - (view) Author: Nicholas Marriott (nicm) Date: 2011-05-27 01:58
Hi

This is not an OpenBSD bug.

kqueue is not standardized. There is no reason for the ident member to be uintptr_t over u_int - either is valid, and so far I don't see any reason for us to try and change it.
msg137022 - (view) Author: Nicholas Marriott (nicm) Date: 2011-05-27 02:08
To follow up a little - I'm afraid that you should not depend on the internal layout of structures like this. We clearly document the kevent structure in kevent(2), and of course beyond that it is of course dependent on C padding rules.

It isn't a bug that we have different layout from FreeBSD.

If you can make a case for us to change the layout then maybe...
msg137032 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-27 07:44
Hello Nicholas,

> kqueue is not standardized.

You're probably right, but depending on the version of your manpages, the definition changes:

http://www.openbsd.org/cgi-bin/man.cgi?query=kevent&apropos=0&sektion=0&manpath=OpenBSD+3.8&arch=i386&format=html defines

     struct kevent {
             uintptr_t ident;        /* identifier for this event */
             short     filter;       /* filter for event */
             u_short   flags;        /* action flags for kqueue */
             u_int     fflags;       /* filter flag value */
             intptr_t  data;         /* filter data value */
             void      *udata;       /* opaque user data identifier */
     };

Now, http://www.openbsd.org/cgi-bin/man.cgi?query=kevent&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html defines

     struct kevent {
             u_int   ident;          /* identifier for this event */
             short   filter;         /* filter for event */
             u_short flags;          /* action flags for kqueue */
             u_int   fflags;         /* filter flag value */
             int     data;           /* filter data value */
             void    *udata;         /* opaque user data identifier */
     };

The first page hit when searching for "openbsd struct kevent" refers to the first version...

> If you can make a case for us to change the layout then maybe... 

Well, I don't know if you can make such a backward-incompatible change, but for what it's worth, FreeBSD, NetBSD and OS-X all use uintptr_t.

But for now I'll gues we'll just add a specific case for OpenBSD.
msg137033 - (view) Author: Nicholas Marriott (nicm) Date: 2011-05-27 07:58
Hi

The second one is correct - OpenBSD -current has this in event.h:

struct kevent {
        u_int           ident;          /* identifier for this event */
        short           filter;         /* filter for event */
        u_short         flags;
        u_int           fflags;
        int             data;
        void            *udata;         /* opaque user data identifier */
};

It's been like that since r1.1 so probably the 3.8 man page was wrong.

I don't know that backwards compatibility would be the main concern here, more what is the justification for changing. It does make sense to have ident wide enough to store a pointer so it would be better as a long, but making filter and flags into uint32_t seems unnecessary and I think udata is fine as a void*.
msg137034 - (view) Author: Nicholas Marriott (nicm) Date: 2011-05-27 08:05
Not that I'm unsympathetic but this is really only a concern if you depend on the internal structure layout and I think if you do that, you need to take account of differences between platforms. We don't guarantee we aren't going to change the struct between versions and I doubt FreeBSD or NetBSD or OS X would make that guarantee either :-).

I'll see if we can change it but no promises and even if we do it'll be for later versions of OpenBSD anyway...
msg137036 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-27 08:22
Concerning the differences between platforms, as noted, FreeBSD, NetBSD and OS-X are all consistent and I don't think it'll change tomorrow, so for now it's not a problem. Arbitrarily changing such structures definition - event though they're not standardized - is certainly not user-friendly, people will start complaining if their code breaks between two versions.

Nicholas, could you work on a patch so that kevent works on OpenBSD 64-bit? The relevant code is in Modules/selectmodule.c
msg137039 - (view) Author: Nicholas Marriott (nicm) Date: 2011-05-27 08:51
Well "they do it that way" is not a justification that necessarily works for OpenBSD :-).

I'll see if I can come up with a diff to fix this in Python. Not this weekend though, maybe next week. Unless Remi do you want to have a go?
msg137040 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-27 08:52
If there are only two versions of the structure on all operating systems, we 
can add a check in configure (e.g. test the size of the ident attribute, =int 
or >=void*?) to define a macro (e.g. HAVE_OPENBSD_KEVENT_STRUCT). You might be 
able to change the type of the fields in the Python structure at run time (in 
select module init), but I prefer to do it at compile time! Or you can add 
*checks* at runtime (at select init) on the size of the fields.
msg137465 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2011-06-01 23:32
Adding to this, the kqueue code (and test) heavily depends on the size of these members. kqueue_event_richcompare() uses a Py_intptr_t to store the result of substracting T_UINTs which is obviusly wrong on platforms where Py_intptr and T_UINTs are not the same (which is neither related to this bug nor OpenBSD specific), the test uses sys.maxint, etc.

It'd be nice if someone cleans the code to fix all these problems and adds some more tests to cover all the members in tp_richcompare.
msg138243 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-13 13:14
> If there are only two versions of the structure on all operating
> systems, we can add a check in configure

Isn't it a bit overkill?
OpenBSD is the only OS to define struct kevent that way, adding a 
#ifdef __OpenBSD__

check would probably be enough, no?
msg138825 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-22 17:02
Here's a patch.
msg138834 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-06-22 20:37
Why did you remove your patch?
msg140013 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-08 01:17
@neologix: New try. Why did you remove your patch?
msg140015 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-08 07:06
> @neologix: New try. Why did you remove your patch?
>

Sorry, I completely forgot about this issue.
Because it was incorrect (it would have fixed the SIGBUS, but would
have produced incorrect results).
IIRC, the problem is that those members are used in several places, so
it requires a little bit more work to produce a clean patch.
I'll try to provide a new one in a couple days, but since I don't have
any OpenBSD box, I won't be able to test it myself.
msg141192 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-26 23:01
select.kevent(bignum, 1, 2, 3, sys.maxsize, bignum) raises a OverflowError('signed integer is greater than maximum') on a 64 bits system. select.kevent() constructor parses the 4th argument using "i" (an int): sys.maxsize doesn't fit in a C int on a 64 bits system.

kevent() constructor parses the 4th argument using "i", but it pass a pointer to a void* value (e->udata). It would be better to use a temporary C int variable, and then write the int into udata using the right cast.
msg141194 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-26 23:05
OpenBSD has a patch for this issue:
http://www.openbsd.org/cgi-bin/cvsweb/ports/lang/python/2.7/patches/patch-Modules_selectmodule_c
msg141195 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-07-26 23:15
patch-Modules_selectmodule_c is not enough to fix kqueue_event_init(): it doesn't catch overflow error on the ident attribute.
msg141211 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-27 07:24
> patch-Modules_selectmodule_c is not enough to fix kqueue_event_init(): it doesn't catch overflow error on the ident attribute.

This patch is not correct.

Furthermore, it's another occurrence of something I don't understand
with distributions: why apply a patch to their tree and not submit it
upstream?
msg141212 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2011-07-27 08:18
I wrote the patch. While the patch fixes most of the issues I'm not entirely happy with it and that's the reason I have not submitted it.
msg141213 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-27 08:54
> I wrote the patch. While the patch fixes most of the issues I'm not entirely happy with it and that's the reason I have not submitted it.

Could you submit it here so that we can help you get it in shape for inclusion?
That way we'll all benefit from the work you've already done.
msg142206 - (view) Author: Remi Pointel (rpointel) Date: 2011-08-16 17:03
Hi,
this is the patch in OpenBSD.

Source: http://www.openbsd.org/cgi-bin/cvsweb/ports/lang/python/2.7/patches/patch-Modules_selectmodule_c

Could be usefull to work together and advance on this problem.
Thanks a lot.

Remi.
msg186351 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2013-04-08 21:50
New version hopefully good and ready for inclusion.

Please note that the tests are still broken and need to be addressed (bignum and sys.maxsize passed to ident and data respectively).

Thanks.
msg186378 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-09 05:47
> New version hopefully good and ready for inclusion.

Looks good to me.
Since I assume you tested your patch on OpenBSD, to me it's ready for commit.
I won't be able to do it myself before two weeks though, so if someone
beats me to it, go ahead!

> Please note that the tests are still broken and need to be addressed (bignum and sys.maxsize passed to ident and data respectively).

If you could fill a separate issue for that, it would be great.

BTW, there are other OpenBSD-specific issues on the report, and we
don't have that many contributors (AFAICT noone): so if you can have a
look at them, that would be great (especially since our OpenBSD
buildbots have all been down for some time now).
msg186388 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2013-04-09 07:15
>> Please note that the tests are still broken and need to be addressed (bignum and sys.maxsize passed to ident and data respectively).
>
> If you could fill a separate issue for that, it would be great.

I will but once this is committed. It'd make things easier.

> BTW, there are other OpenBSD-specific issues on the report, and we
> don't have that many contributors (AFAICT noone): so if you can have a
> look at them, that would be great (especially since our OpenBSD
> buildbots have all been down for some time now).

On this report or you mean in general? I don't mind looking if time permits.

Why are the OpenBSD buildbots down? Can we do anything about it?
Having a buildbot slave up will definitely help here.
msg186391 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-09 10:04
>> If you could fill a separate issue for that, it would be great.
>
> I will but once this is committed. It'd make things easier.

No problem.

>> BTW, there are other OpenBSD-specific issues on the report, and we
>> don't have that many contributors (AFAICT noone): so if you can have a
>> look at them, that would be great (especially since our OpenBSD
>> buildbots have all been down for some time now).
>
> On this report or you mean in general?

typo: s/on the report/on the tracker.

> I don't mind looking if time permits.

That would be great (I try to fix NetBSD/OpenBSD issues myself, but since I don't have any installed machines with those OSes it's obviously much more complicated).

> Why are the OpenBSD buildbots down? Can we do anything about it?
> Having a buildbot slave up will definitely help here.

All the OpenBSD buildbots we have are provided by SnakeByte, and they're all offline: http://buildbot.python.org/all/builders
Maybe Trent has more info.
msg186393 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2013-04-09 10:25
This patch was made against 2.7.4. I've checked 3.3.1 and it obviously doesn't work as PyInt* are gone.

I'll update the patch later today so it can be used everywhere.
msg186600 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2013-04-11 23:08
On Tue, Apr 09, 2013 at 03:04:42AM -0700, Charles-Fran?ois Natali wrote:
> > Why are the OpenBSD buildbots down? Can we do anything about it?
> > Having a buildbot slave up will definitely help here.
> 
> All the OpenBSD buildbots we have are provided by SnakeByte, and
> they're all offline: http://buildbot.python.org/all/builders Maybe
> Trent has more info.

    Oh dear, hadn't noticed that.  Should be fixed now.

        Trent.
msg186614 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2013-04-12 09:50
Here's a slightly modified version of the patch that should work with python 2 and 3.
msg186725 - (view) Author: Federico Schwindt (Federico.Schwindt) Date: 2013-04-13 14:49
Trent, would be possible to update the OpenBSD slaves to 5.3 (or -current)? 
There has been too many changes since 5.1 that affect python (for example the threads implementation).
msg186864 - (view) Author: Trent Nelson (trent) * (Python committer) Date: 2013-04-13 22:55
Yeah those slaves are definitely due for an update.  It's on the list.
msg188571 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-06 19:26
New changeset c6c2b216bd14 by Charles-Francois Natali in branch '2.7':
Issue #12181: select module: Fix struct kevent definition on OpenBSD 64-bit
http://hg.python.org/cpython/rev/c6c2b216bd14

New changeset f6c50b437de6 by Charles-Francois Natali in branch '3.3':
Issue #12181: select module: Fix struct kevent definition on OpenBSD 64-bit
http://hg.python.org/cpython/rev/f6c50b437de6

New changeset 557599a32821 by Charles-Francois Natali in branch 'default':
Issue #12181: select module: Fix struct kevent definition on OpenBSD 64-bit
http://hg.python.org/cpython/rev/557599a32821
msg188575 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-06 19:48
Sorry for the delay, it should be fixed now.

Federico, thanks for the patch!
History
Date User Action Args
2013-05-06 19:48:06neologixsetstatus: open -> closed
resolution: fixed
messages: + msg188575

stage: patch review -> committed/rejected
2013-05-06 19:26:50python-devsetnosy: + python-dev
messages: + msg188571
2013-04-13 22:55:01trentsetmessages: + msg186864
2013-04-13 14:49:59Federico.Schwindtsetmessages: + msg186725
2013-04-12 09:50:47Federico.Schwindtsetfiles: + openbsd-kqueue-001.patch

messages: + msg186614
2013-04-11 23:08:24trentsetmessages: + msg186600
2013-04-09 10:25:22Federico.Schwindtsetmessages: + msg186393
2013-04-09 10:04:42neologixsetnosy: + trent
messages: + msg186391
2013-04-09 07:15:58Federico.Schwindtsetmessages: + msg186388
2013-04-09 06:07:26neologixsetversions: + Python 3.3, Python 3.4
2013-04-09 05:47:31neologixsetmessages: + msg186378
2013-04-08 21:50:56Federico.Schwindtsetfiles: + patch-Modules_selectmodule_c

messages: + msg186351
2011-08-16 17:03:21rpointelsetfiles: + patch-Modules_selectmodule_c

messages: + msg142206
2011-07-27 08:54:28neologixsetmessages: + msg141213
2011-07-27 08:18:08Federico.Schwindtsetmessages: + msg141212
2011-07-27 07:24:34neologixsetmessages: + msg141211
2011-07-26 23:15:29hayposetmessages: + msg141195
2011-07-26 23:05:44hayposetmessages: + msg141194
2011-07-26 23:01:54hayposetmessages: + msg141192
2011-07-08 07:06:34neologixsetmessages: + msg140015
2011-07-08 01:17:00hayposetmessages: + msg140013
2011-06-22 20:37:31hayposetmessages: + msg138834
2011-06-22 17:32:04neologixsetfiles: - kevent_openbsd.diff
2011-06-22 17:02:01neologixsetkeywords: + patch, needs review
files: + kevent_openbsd.diff
messages: + msg138825

stage: patch review
2011-06-13 13:14:56neologixsetmessages: + msg138243
2011-06-01 23:32:07Federico.Schwindtsetnosy: + Federico.Schwindt
messages: + msg137465
2011-05-27 08:52:15hayposetmessages: + msg137040
2011-05-27 08:51:03nicmsetmessages: + msg137039
2011-05-27 08:22:35neologixsetmessages: + msg137036
2011-05-27 08:05:06nicmsetmessages: + msg137034
2011-05-27 07:58:52nicmsetmessages: + msg137033
2011-05-27 07:44:52neologixsetmessages: + msg137032
2011-05-27 02:08:47nicmsetmessages: + msg137022
2011-05-27 01:58:39nicmsetnosy: + nicm
messages: + msg137021
2011-05-26 17:47:43neologixsetmessages: + msg136988
2011-05-26 16:29:03hayposetmessages: + msg136979
2011-05-26 14:54:28neologixsetnosy: - mark.dickinson
messages: + msg136969
2011-05-25 21:46:34rpointelsetmessages: + msg136904
2011-05-25 21:15:06hayposetnosy: + haypo
messages: + msg136901
2011-05-25 20:55:31neologixsetnosy: + mark.dickinson
2011-05-25 20:54:40neologixsetnosy: + neologix
messages: + msg136898
2011-05-25 20:20:02rpointelcreate