- From: Rocky Rhodes <rhodes@sgi.com>
- To: "'James Bigler'" <bigler@cs.utah.edu>
- Cc: manta@sci.utah.edu, "'Brian Sumner'" <bls@sgi.com>
- Subject: RE: [MANTA] bad exit from bin/manta on Altix
- Date: Fri, 27 May 2005 08:47:32 -0700
My LD_LIBRARY_PATH was pointing at a different Intel lib directory than the
one with the tools I was using to compile the manta code. When I put them
back in sync, this problem went away. I'm chalking this up to user error.
Rocky
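The fix above came down to which copy of the Intel runtime libraries the dynamic loader picked up. One way to confirm that from inside a running program is to list the shared objects actually mapped into the process. This is a minimal sketch assuming Linux with glibc, where dl_iterate_phdr() from <link.h> is available; the helper names loaded_shared_objects and loaded_objects_matching are hypothetical and not part of manta.

```cpp
#include <link.h>    // dl_iterate_phdr, dl_phdr_info (glibc / Linux)
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the path of every shared object currently
// mapped into this process, in the order the dynamic loader reports them.
std::vector<std::string> loaded_shared_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t /*size*/, void* data) -> int {
            auto* out = static_cast<std::vector<std::string>*>(data);
            // The main executable is reported with an empty name; skip it.
            if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
                out->emplace_back(info->dlpi_name);
            }
            return 0;  // a nonzero return would stop the iteration early
        },
        &names);
    return names;
}

// Hypothetical helper: keeps only the loaded paths that contain `needle`,
// e.g. "libipr" to see which Intel runtime directory was really used.
std::vector<std::string> loaded_objects_matching(const std::string& needle) {
    std::vector<std::string> matches;
    for (const std::string& path : loaded_shared_objects()) {
        if (path.find(needle) != std::string::npos) {
            matches.push_back(path);
        }
    }
    return matches;
}
```

Printing loaded_objects_matching("libipr") early in main() would show the full path the loader resolved, which can then be compared against the lib directory of the compiler that built the binary.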
> -----Original Message-----
> From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf Of James Bigler
> Sent: Wednesday, May 25, 2005 9:26 PM
> Cc: manta@sci.utah.edu
> Subject: Re: [MANTA] bad exit from bin/manta on Altix
>
> Our thread code expects exit_handler to be called only once, when the
> main thread returns or exit() is called. It should only be processed
> when the program terminates. Only heap memory should be accessed by this
> point.
>
> James
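James's point is that exit_handler must run exactly once, at process termination; in Rocky's icc build it ran a second time and re-entered Thread_shutdown on already-deleted data. A common defensive pattern is to make the shutdown path idempotent. The sketch below is hypothetical and is not Manta's actual code: thread_shutdown, exit_handler, register_exit_handler and shutdown_count are illustrative stand-ins.

```cpp
#include <atomic>
#include <cstdlib>

namespace {
// Becomes true the first time thread_shutdown() is entered.
std::atomic<bool> g_shutdown_started{false};
// Counts how often the teardown body actually ran (illustration only).
std::atomic<int> g_teardown_runs{0};
}  // namespace

// Hypothetical stand-in for a Thread_shutdown-style routine.
void thread_shutdown() {
    // exchange() returns the previous value, so exactly one call proceeds,
    // even if several threads race here.
    if (g_shutdown_started.exchange(true)) {
        return;  // already shut down: never touch freed bookkeeping again
    }
    ++g_teardown_runs;
    // ... release thread bookkeeping here ...
}

// Hypothetical handler; the real code registers its own exit_handler.
void exit_handler() {
    thread_shutdown();
}

// Hypothetical initialization step: register the handler with atexit().
bool register_exit_handler() {
    return std::atexit(exit_handler) == 0;
}

int shutdown_count() {
    return g_teardown_runs.load();
}
```

The guard only makes a repeated call harmless; it does not explain why the handler fired early, which in this thread was eventually traced to mismatched Intel runtime libraries.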
>
> Rocky Rhodes wrote:
> > On the Altix with icc, the exit_handler function is called after the
> > pthread_exit(0) function is called in Thread_shutdown. This is caused by a
> > call to "atexit(exit_handler)" that is made when the Thread is initialized.
> > Exit_handler then calls Thread_shutdown again, where it fails with a segv
> > because it tries to reference a data structure that has already been
> > deleted.
> >
> > When running the gcc compiled version, the exit_handler function is not
> > called on thread exit, but only when the entire program exits.
> >
> > I'm on my way out, but I'll research the "expected" behavior of atexit() in
> > a pthread program tomorrow morning if nobody tells me what it is supposed to
> > do before I get to it.
> >
> > Rocky
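On the "expected" behavior Rocky asks about: POSIX runs atexit() handlers on exit() or a return from main(), and it also specifies that when the last thread of a process terminates (including through pthread_exit()), the process exits as if exit(0) had been called, so previously registered handlers run then. A handler firing after the final thread calls pthread_exit() is therefore expected; firing while other threads are still alive is not. The thread does not say whether other threads were alive at that point in Thread_shutdown. The sketch below checks only the last-thread case in a forked child on a POSIX system; the function names are hypothetical.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

namespace {
int g_report_fd = -1;  // write end of the pipe, used only in the child

// atexit() handler: writes one byte so the parent can tell it ran.
void report_atexit() {
    const char marker = 'X';
    if (write(g_report_fd, &marker, 1) != 1) {
        // Nothing useful to do on failure during process teardown.
    }
}
}  // namespace

// Forks a child whose only thread registers report_atexit() and then calls
// pthread_exit(). Returns true if the handler ran and the child exited with 0.
bool atexit_runs_after_last_thread_exits() {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {              // child: exactly one thread
        close(fds[0]);
        g_report_fd = fds[1];
        std::atexit(report_atexit);
        pthread_exit(nullptr);   // last thread exits: behaves like exit(0)
    }

    close(fds[1]);               // parent keeps only the read end
    char marker = 0;
    ssize_t n = read(fds[0], &marker, 1);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return n == 1 && marker == 'X' && WIFEXITED(status) &&
           WEXITSTATUS(status) == 0;
}
```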
>
> >>-----Original Message-----
> >>From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf Of Rocky Rhodes
> >>Sent: Wednesday, May 25, 2005 5:09 PM
> >>To: 'James Bigler'
> >>Cc: manta@sci.utah.edu
> >>Subject: RE: [MANTA] bad exit from bin/manta on Altix
> >>
> >>Yep. Works fine compiled with gcc/g++ (although a bit slower).
> >>
> >> Rocky
> >>
> >>>-----Original Message-----
> >>>From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf Of James Bigler
> >>>Sent: Wednesday, May 25, 2005 3:30 PM
> >>>Cc: manta@sci.utah.edu
> >>>Subject: Re: [MANTA] bad exit from bin/manta on Altix
> >>>
> >>>Can you build manta with GCC instead of ICC, since this is an ICC
> >>>specific library?
> >>>
> >>>James
> >>>
> >>>Rocky Rhodes wrote:
> >>>
> >>>>Ok. Got me there. .../intel-cc/8.1.030/lib/libipr.so.6 is the culprit.
> >>>>
> >>>> Rocky
> >>>>
> >>>>>-----Original Message-----
> >>>>>From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf Of James Bigler
> >>>>>Sent: Wednesday, May 25, 2005 3:06 PM
> >>>>>Cc: manta@sci.utah.edu
> >>>>>Subject: Re: [MANTA] bad exit from bin/manta on Altix
> >>>>>
> >>>>>Doing a google on those files indicates that libffio.so is a Fortran
> >>>>>library commonly found on SGI machines. libeag_ffio.so also appears to
> >>>>>be an SGI oriented library.
> >>>>>
> >>>>>Try this for me:
> >>>>>
> >>>>>ldd bin/manta | awk '{print $3}' | \
> >>>>>xargs --max-args=1 grep -l "Incorrect Phase"
> >>>>>
> >>>>>James
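James's pipeline asks grep which of the libraries listed by ldd contain the literal string "Incorrect Phase". The per-file check that grep -l performs on a binary can be sketched in C++ as a plain byte search. file_contains and files_containing are hypothetical helpers, and reading each file fully into memory is a simplifying assumption that is reasonable for shared libraries of a few megabytes.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: true if the file at `path` contains `needle` as a raw
// byte sequence, roughly what `grep -l "needle" path` reports for a binary.
bool file_contains(const std::string& path, const std::string& needle) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // missing or unreadable file: report no match
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return std::search(data.begin(), data.end(),
                       needle.begin(), needle.end()) != data.end();
}

// Hypothetical helper: keeps the paths whose contents contain `needle`,
// mirroring `xargs --max-args=1 grep -l` over a list of files.
std::vector<std::string> files_containing(const std::vector<std::string>& paths,
                                          const std::string& needle) {
    std::vector<std::string> hits;
    std::copy_if(paths.begin(), paths.end(), std::back_inserter(hits),
                 [&](const std::string& p) { return file_contains(p, needle); });
    return hits;
}
```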
> >>>>>
> >>>>>Rocky Rhodes wrote:
> >>>>>
> >>>>>>It still fails the same way on exit.
> >>>>>>
> >>>>>>Another clue (red herring?) is that the "Incorrect Phase" error message is
> >>>>>>coming from either libffio.so or libeag_ffio.so on the Altix. These are the
> >>>>>>only two .so files in /usr/lib that contain this string.
> >>>>>>
> >>>>>> Rocky
> >>>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf Of James Bigler
> >>>>>>>Sent: Wednesday, May 25, 2005 11:08 AM
> >>>>>>>Cc: manta@sci.utah.edu
> >>>>>>>Subject: Re: [MANTA] bad exit from bin/manta on Altix
> >>>>>>>
> >>>>>>>Rocky,
> >>>>>>>
> >>>>>>>Can you try something for me? The only thread code that is specific to
> >>>>>>>the altix is the barrier code. In Thread_pthread.cc there are a few
> >>>>>>>places that use __ia64__. Could you replace these with __ia64_noway__
> >>>>>>>to see if you still have problems (I want to not compile the ia64
> >>>>>>>specific code here)?
> >>>>>>>
> >>>>>>>Thanks
> >>>>>>>James
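James's experiment switches off the IA-64 barrier code by renaming each __ia64__ test by hand. A common alternative is to funnel every such test through one project macro, so the generic path can be forced from the compiler command line instead. The macro names MANTA_USE_IA64_BARRIER and MANTA_DISABLE_IA64_BARRIER and the function barrier_implementation are hypothetical and do not come from Thread_pthread.cc.

```cpp
// Decide once which barrier implementation to compile. Building with
// -DMANTA_DISABLE_IA64_BARRIER forces the generic path even on IA-64,
// which has the same effect as renaming every __ia64__ test by hand.
#if defined(__ia64__) && !defined(MANTA_DISABLE_IA64_BARRIER)
#define MANTA_USE_IA64_BARRIER 1
#else
#define MANTA_USE_IA64_BARRIER 0
#endif

// Hypothetical: reports which implementation this translation unit selected.
// Code that used to test __ia64__ directly would test MANTA_USE_IA64_BARRIER.
const char* barrier_implementation() {
#if MANTA_USE_IA64_BARRIER
    return "ia64-specific";
#else
    return "generic pthread";
#endif
}
```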
> >>>>>>>
> >>>>>>>>>>I had tried older versions of the main trunk and didn't have any luck
> >>>>>>>>>>isolating a change. I tried versions back to 308 that all failed similarly,
> >>>>>>>>>>and version 300 doesn't build for me. It has probably always been this way
> >>>>>>>>>>and I just had this environment variable set when I tried it before.
> >>>>>>>>>>
> >>>>>>>>>> Rocky
> >>>>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf Of James Bigler
> >>>>>>>>>>>Sent: Tuesday, May 24, 2005 7:59 PM
> >>>>>>>>>>>Cc: manta@sci.utah.edu
> >>>>>>>>>>>Subject: Re: [MANTA] bad exit from bin/manta on Altix
> >>>>>>>>>>>
> >>>>>>>>>>>You could try checking out an older version and see if it happens.
> >>>>>>>>>>>The Thread_pthread.cc file is pretty hairy. The altix is the only
> >>>>>>>>>>>machine we've had complaints about, though. Should this be showing up
> >>>>>>>>>>>in other modern distributions? What do the SGI docs say about that?
> >>>>>>>>>>>
> >>>>>>>>>>>James
> >>>>>>>>>>>
> >>>>>>>>>>>Rocky Rhodes wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>>If I run "bin/manta -bench 10 10 -imagedisplay null -np 2" on an Altix,
> >>>>>>>>>>>>the program exits in a bad way, complaining of "ERROR: Incorrect Phase"
> >>>>>>>>>>>>and then telling me that "Thread 'idle or main'" got a SIGSEGV. If I
> >>>>>>>>>>>>run this again with the LD_ASSUME_KERNEL environment variable set to
> >>>>>>>>>>>>"2.4.19" it exits cleanly.
> >>>>>>>>>>>>
> >>>>>>>>>>>>SGI's documentation says that this behavior is indicative of an
> >>>>>>>>>>>>application "which depends on behaviors in which the LinuxThreads
> >>>>>>>>>>>>implementation deviates from the POSIX standard". The LD_ASSUME_KERNEL
> >>>>>>>>>>>>environment variable forces the application to use the old LinuxThreads
> >>>>>>>>>>>>implementation rather than NPTL (Native POSIX Thread Library). I think
> >>>>>>>>>>>>the new thread package was included with ProPack 3.0 on the Altix. You
> >>>>>>>>>>>>might not see this problem on your Altix if it is running an earlier
> >>>>>>>>>>>>version of the system software.
> >>>>>>>>>>>>
> >>>>>>>>>>>>I thought I had tried this earlier and didn't have this problem, but as
> >>>>>>>>>>>>it is just an environment variable, now I'm wondering if this has always
> >>>>>>>>>>>>been broken this way. Is anyone aware of any changes in the pthread
> >>>>>>>>>>>>code made over the last week or so that may have changed this behavior?
> >>>>>>>>>>>>Does anyone feel more qualified than I do about mucking around in this
> >>>>>>>>>>>>code and trying to understand what goes on when the program exits?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Rocky
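Rocky's first report hinges on which thread library the process ended up with: setting LD_ASSUME_KERNEL to "2.4.19" selected LinuxThreads instead of NPTL. On glibc, a program can ask which implementation it is running on through confstr(_CS_GNU_LIBPTHREAD_VERSION), the same value that getconf GNU_LIBPTHREAD_VERSION prints. This is a sketch assuming glibc; the helper name pthread_library_version is hypothetical, and other C libraries may not define this constant at all.

```cpp
#include <unistd.h>  // confstr, _CS_GNU_LIBPTHREAD_VERSION (glibc)
#include <cstddef>
#include <string>

// Hypothetical helper: returns the glibc thread-library string, such as
// "NPTL 2.3.4" or a LinuxThreads version string, or "" if unavailable.
std::string pthread_library_version() {
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    // The first call reports the buffer size needed, including the final NUL.
    std::size_t len = confstr(_CS_GNU_LIBPTHREAD_VERSION, nullptr, 0);
    if (len == 0) return std::string();
    std::string buf(len, '\0');
    confstr(_CS_GNU_LIBPTHREAD_VERSION, &buf[0], len);
    buf.resize(len - 1);  // drop the terminating NUL written by confstr
    return buf;
#else
    return std::string();  // not glibc: no portable equivalent assumed here
#endif
}
```

On a system of that era, printing this value once with and once without LD_ASSUME_KERNEL set would show whether the variable really switched the thread library for the program.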