Manta Interactive Ray Tracer Development Mailing List

Re: [MANTA] bad exit from bin/manta on Altix


  • From: James Bigler <bigler@cs.utah.edu>
  • Cc: manta@sci.utah.edu
  • Subject: Re: [MANTA] bad exit from bin/manta on Altix
  • Date: Wed, 25 May 2005 22:41:41 -0600

Hmm...looking at this some more there are some interesting things going on.

From what I understand, main() should not exit until all other threads have finished. The threading library tries to facilitate this by having the main thread block on a semaphore.

I ran into this problem when I wanted manta to "exit gracefully", meaning all the threads would in turn exit internally. If I used Thread::exitAll(), it wouldn't have a problem. You can test this by pressing 'q' or ESC twice really fast (called fast quit in the code). This will force a call to Thread::exitAll(). Try this and see if you still have this problem.

What I had to do for rtrt to exit gracefully (without segfaults) was to join all the worker threads to a single thread group. At the end of main(), the main thread is joined to this group, so main() does not exit until all the worker threads have exited. Perhaps a similar model will help manta.
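
A minimal sketch of the join-before-exit model described above, written with std::thread rather than Manta's own Thread library. WorkerGroup and runAndJoin are hypothetical names used only for illustration; they are not classes or functions from rtrt or manta.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a worker thread group (not Manta's class).
class WorkerGroup {
public:
    // Start one worker; the callable runs on its own thread.
    template <typename Fn>
    void spawn(Fn fn) { threads_.emplace_back(std::move(fn)); }

    // Block the caller until every worker in the group has returned.
    void joinAll() {
        for (std::thread& t : threads_) {
            if (t.joinable()) t.join();
        }
        threads_.clear();
    }

private:
    std::vector<std::thread> threads_;
};

// Starts n workers, waits for all of them, and returns how many finished.
// In a real program the joinAll() call would sit at the end of main().
int runAndJoin(int n) {
    assert(n >= 0);
    std::atomic<int> finished{0};
    WorkerGroup group;
    for (int i = 0; i < n; ++i) {
        group.spawn([&finished] { finished.fetch_add(1); });
    }
    group.joinAll();  // the calling ("main") thread waits here
    return finished.load();
}
```

Because the calling thread only proceeds past joinAll() once every worker has returned, no worker can still be touching shared state while the process is tearing down.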

I'll have to dig up the emails to see what some of the other issues were.

James

James Bigler wrote:
Our thread code expects that exit_handler is called only once, when the main thread returns or exit() is called. It should only be processed when the program terminates. Only heap memory should be accessed by this point.

James

Rocky Rhodes wrote:

On the Altix with icc, the exit_handler function is called after the
pthread_exit(0) function is called in Thread_shutdown. This is caused by a
call to "atexit(exit_handler)" that is made when the Thread is initialized.
Exit_handler then calls Thread_shutdown again, where it fails with a segv
because it tries to reference a data structure that has already been
deleted.

When running the gcc compiled version, the exit_handler function is not
called on thread exit, but only when the entire program exits.

I'm on my way out, but I'll research the "expected" behavior of atexit() in
a pthread program tomorrow morning if nobody tells me what it is supposed to
do before I get to it.

    Rocky
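
One way to tolerate the double call described above is to make the shutdown routine idempotent, so that a later call from the atexit handler becomes a no-op. This is only a rough sketch under that assumption: shutdownOnce, exitHandler, installExitHandler, and teardownCount are hypothetical names, not Manta's Thread_shutdown or exit_handler, and the sketch does not address why the handler runs at a different point under icc.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

namespace sketch {

// Set by the first caller that begins shutdown.
std::atomic<bool> shutdownStarted{false};
// Counts how many times the real teardown work ran (illustration only).
std::atomic<int> teardownCount{0};

// Returns true only for the one call that actually performs teardown.
bool shutdownOnce() {
    if (shutdownStarted.exchange(true)) {
        return false;  // already shut down; do not touch freed structures
    }
    teardownCount.fetch_add(1);  // stand-in for freeing thread bookkeeping
    return true;
}

// Handler with the signature std::atexit expects.
void exitHandler() { shutdownOnce(); }

// Registers the handler; returns true if registration succeeded.
bool installExitHandler() { return std::atexit(exitHandler) == 0; }

}  // namespace sketch
```

With a guard like this, it no longer matters whether the handler fires after a thread has already run the shutdown path or only when the whole program exits.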


-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of Rocky Rhodes
Sent: Wednesday, May 25, 2005 5:09 PM
To: 'James Bigler'
Cc: manta@sci.utah.edu
Subject: RE: [MANTA] bad exit from bin/manta on Altix

Yep.  Works fine compiled with gcc/g++ (although a bit slower).

    Rocky


-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Wednesday, May 25, 2005 3:30 PM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix

Can you build manta with GCC instead of ICC, since this is an ICC
specific library?

James

Rocky Rhodes wrote:

Ok.  Got me there.  .../intel-cc/8.1.030/lib/libipr.so.6 is the culprit.

    Rocky



-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Wednesday, May 25, 2005 3:06 PM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix

Doing a google on those files indicates that libffio.so is a Fortran
library commonly found on SGI machines.  libeag_ffio.so also appears to
be an SGI oriented library.

Try this for me:

ldd bin/manta | awk '{print $3}' | \
xargs --max-args=1 grep -l "Incorrect Phase"

James

Rocky Rhodes wrote:


It still fails the same way on exit.

Another clue (red herring?) is that the "Incorrect Phase" error message is
coming from either libffio.so or libeag_ffio.so on the Altix.  These are the
only two .so files in /usr/lib that contain this string.

    Rocky




-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Wednesday, May 25, 2005 11:08 AM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix

Rocky,

Can you try something for me? The only thread code that is specific to
the altix is the barrier code. In Thread_pthread.cc there are a few
places that use __ia64__.  Could you replace these with __ia64_noway__
to see if you still have problems (I want to not compile the ia64
specific code here)?

Thanks
James




I had tried older versions of the main trunk and didn't have any luck
isolating a change. I tried versions back to 308 that all failed similarly,
and version 300 doesn't build for me.  It has probably always been this way
and I just had this environment variable set when I tried it before.

    Rocky






-----Original Message-----
From: owner-manta@sci.utah.edu [mailto:owner-manta@sci.utah.edu] On Behalf
Of James Bigler
Sent: Tuesday, May 24, 2005 7:59 PM
Cc: manta@sci.utah.edu
Subject: Re: [MANTA] bad exit from bin/manta on Altix

You could try checking out an older version and see if it happens.
The Thread_pthread.cc file is pretty hairy.  The altix is the only
machine we've had complaints about, though.  Should this be showing up
in other modern distributions?  What do the SGI docs say about that?

James

Rocky Rhodes wrote:





If I run "bin/manta -bench 10 10 -imagedisplay null -np 2" on an Altix,
the program exits in a bad way, complaining of "ERROR: Incorrect Phase"
and then telling me that "Thread 'idle or main'" got a SIGSEGV.  If I
run this again with the LD_ASSUME_KERNEL environment variable set to
"2.4.19" it exits cleanly.

SGI's documentation says that this behavior is indicative of an
application "which depends on behaviors in which the LinuxThreads
implementation deviates from the POSIX standard".  The LD_ASSUME_KERNEL
environment variable forces the application to use the old LinuxThreads
implementation rather than NPTL (Native POSIX Thread Library).  I think
the new thread package was included with ProPack 3.0 on the Altix.  You
might not see this problem on your Altix if it is running an earlier
version of the system software.

I thought I had tried this earlier and didn't have this problem, but as
it is just an environment variable, now I'm wondering if this has always
been broken this way.  Is anyone aware of any changes in the pthread
code made over the last week or so that may have changed this behavior?

Does anyone feel more qualified than I do about mucking around in this
code and trying to understand what goes on when the program exits?



       Rocky







