On the Altix with icc, the exit_handler function is called after
Thread_shutdown calls pthread_exit(0). The handler is registered by a call
to "atexit(exit_handler)" that is made when the Thread is initialized.
exit_handler then calls Thread_shutdown a second time, where it fails with
a segv because it tries to reference a data structure that has already
been deleted.
When running the gcc compiled version, the exit_handler function is not
called on thread exit, but only when the entire program exits.
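One possible way to make the second call harmless is to guard the shutdown path so its body runs only once. The following is a minimal sketch of that idea, not the actual Manta code: the namespace, the ThreadState structure, and the lower-case function names are hypothetical stand-ins, since the real Thread_shutdown and exit_handler are not shown in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the thread library's state; assumptions only.
namespace sketch {

struct ThreadState { int active_threads; };

std::atomic<bool> shutdown_started{false};
ThreadState* state = new ThreadState{2};
int shutdown_runs = 0;  // counts how often the teardown body really ran

// Tear down the library state at most once, however many callers arrive.
void thread_shutdown() {
    // exchange() returns the previous value: true means a shutdown has
    // already started, so a later caller returns without touching state.
    if (shutdown_started.exchange(true)) return;
    assert(state != nullptr);  // first caller still sees live state
    ++shutdown_runs;
    delete state;
    state = nullptr;
}

// Handler meant for atexit(); it may run after an explicit shutdown.
void exit_handler() { thread_shutdown(); }

// Registers the handler, mirroring the atexit(exit_handler) call above.
void install_exit_handler() { std::atexit(exit_handler); }

}  // namespace sketch
```

With this guard, calling sketch::thread_shutdown() and then sketch::exit_handler() deletes the state once, and the second call returns early. Whether the real shutdown path can simply return early like this is an assumption that would need checking against Thread_pthread.cc.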
I'm on my way out, but I'll research the "expected" behavior of atexit() in
a pthread program tomorrow morning if nobody tells me what it is supposed to
do before I get to it.
Rocky
-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of Rocky Rhodes
Sent: Wednesday, May 25, 2005 5:09 PM
To: 'James Bigler'
Cc: manta@sci.utah.edu
Subject: RE: [MANTA] bad exit from bin/manta on Altix
Yep. Works fine compiled with gcc/g++ (although a bit slower).
Rocky
-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Wednesday, May 25, 2005 3:30 PM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix
Can you build manta with GCC instead of ICC, since this is an
ICC-specific library?
James
Rocky Rhodes wrote:
Ok. Got me there. .../intel-cc/8.1.030/lib/libipr.so.6 is the culprit.
Rocky
-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Wednesday, May 25, 2005 3:06 PM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix
Googling those file names indicates that libffio.so is a Fortran library
commonly found on SGI machines. libeag_ffio.so also appears to be an
SGI-oriented library.
Try this for me:
ldd bin/manta | awk '{print $3}' | \
xargs --max-args=1 grep -l "Incorrect Phase"
James
Rocky Rhodes wrote:
It still fails the same way on exit.
Another clue (red herring?) is that the "Incorrect Phase" error message is
coming from either libffio.so or libeag_ffio.so on the Altix. These are the
only two .so files in /usr/lib that contain this string.
Rocky
-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Wednesday, May 25, 2005 11:08 AM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix
Rocky,
Can you try something for me? The only thread code that is specific to the
Altix is the barrier code. In Thread_pthread.cc there are a few places that
use __ia64__. Could you replace these with __ia64_noway__ to see if you
still have problems (I want to not compile the ia64-specific code here)?
Thanks
James
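The macro rename suggested above works because a conditional block keyed on a name that no compiler defines is never compiled. A small sketch of the effect follows; the function names and returned strings are hypothetical, and only the __ia64_noway__ name comes from the message.

```cpp
#include <cassert>
#include <cstring>

// Reports which barrier path was compiled in. Hypothetical example; the
// real barrier code in Thread_pthread.cc is not shown in this thread.
const char* barrier_impl() {
#if defined(__ia64_noway__)  // no compiler defines this, so always false
    return "ia64-specific";
#else
    return "generic";
#endif
}

// True when the generic (non-ia64) path was selected at compile time.
bool uses_generic_barrier() {
    return std::strcmp(barrier_impl(), "generic") == 0;
}

// Debug-build check that the rename really disabled the ia64 path.
void check_generic_barrier() { assert(uses_generic_barrier()); }
```

Renaming the macro rather than deleting the guarded code keeps the change easy to revert once the test is done.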
I had tried older versions of the main trunk and didn't have any luck
isolating a change. I tried versions back to 308 that all failed similarly,
and version 300 doesn't build for me. It has probably always been this way
and I just had this environment variable set when I tried it before.
Rocky
-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Tuesday, May 24, 2005 7:59 PM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix
You could try checking out an older version and see if it happens.
The Thread_pthread.cc file is pretty hairy. The Altix is the only machine
we've had complaints about, though. Should this be showing up in other
modern distributions? What do the SGI docs say about that?
James
Rocky Rhodes wrote:
If I run "bin/manta -bench 10 10 -imagedisplay null -np 2" on an Altix,
the program exits in a bad way, complaining of "ERROR: Incorrect Phase"
and then telling me that "Thread 'idle or main'" got a SIGSEGV. If I run
this again with the LD_ASSUME_KERNEL environment variable set to "2.4.19"
it exits cleanly.
SGI's documentation says that this behavior is indicative of an
application "which depends on behaviors in which the LinuxThreads
implementation deviates from the POSIX standard". The LD_ASSUME_KERNEL
environment variable forces the application to use the old LinuxThreads
implementation rather than NPTL (Native POSIX Thread Library). I think
the new thread package was included with ProPack 3.0 on the Altix. You
might not see this problem on your Altix if it is running an earlier
version of the system software.
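To confirm which thread implementation a process actually picked up, with or without LD_ASSUME_KERNEL set, glibc can be asked directly. The sketch below assumes a glibc system that provides the _CS_GNU_LIBPTHREAD_VERSION extension to confstr(); the helper names are hypothetical, and the strings in the comments are only examples of the format.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Returns the glibc thread library identification string, for example
// "NPTL 2.3.4" or "linuxthreads-0.10", or an empty string when the query
// is unavailable. _CS_GNU_LIBPTHREAD_VERSION is a glibc extension.
std::string pthread_library_version() {
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    // With a null buffer, confstr reports the size needed, NUL included.
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, nullptr, 0);
    if (needed == 0) return std::string();
    std::string buf(needed, '\0');
    size_t written = confstr(_CS_GNU_LIBPTHREAD_VERSION, &buf[0], needed);
    assert(written == needed);
    (void)written;           // silence unused warning when NDEBUG is set
    buf.resize(needed - 1);  // drop the trailing NUL that confstr counts
    return buf;
#else
    return std::string();
#endif
}

// True when the identification string names NPTL.
bool is_nptl(const std::string& version) {
    return version.compare(0, 4, "NPTL") == 0;
}
```

Printing pthread_library_version() once at startup, in both the failing and the clean configuration, would show whether the environment variable really switched the implementation.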
I thought I had tried this earlier and didn't have this problem, but as
it is just an environment variable, now I'm wondering if this has always
been broken this way. Is anyone aware of any changes in the pthread code
made over the last week or so that may have changed this behavior?
Does anyone feel more qualified than I do about mucking around in this
code and trying to understand what goes on when the program exits?
Rocky