Manta Interactive Ray Tracer Development Mailing List



Re: [MANTA] loop unrolled?


  • From: Hansong Zhang <hansong@sgi.com>
  • To: "Steven G. Parker" <sparker@cs.utah.edu>
  • Cc: "'manta@sci.utah.edu'" <manta@sci.utah.edu>
  • Subject: Re: [MANTA] loop unrolled?
  • Date: Wed, 21 Dec 2005 14:45:52 -0800

some more info on this:

Unrolling is part of the story. The other part is inlining, which the compiler doesn't seem to be doing well.
Exactly.  Unrolling is just the beginning.  Itanium has so many registers
there is no good reason for any loads or stores to any intermediate
vectors in the computation!  It is really hard for me to understand
how the compiler is making such serious mistakes.  It almost seems as
if the compiler is making the transformations in the wrong order.
Steven G. Parker wrote:
It would be straightforward (although tedious) to create specializations for 3-dimensional vectors/points of floats, which might help. You will also find the same pattern in the ColorSpace class...
Steve
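
A minimal sketch of the kind of specialization suggested above, assuming a stripped-down VectorT. The constructors, operator[], and member layout here are illustrative assumptions, not Manta's actual class; only operator*= is taken from the thread. The full specialization for <float, 3> replaces the loop with three explicit statements, so the result no longer depends on the compiler's loop unroller (whether the call is inlined is still up to the compiler):

```cpp
#include <cassert>

// Generic version, modeled on the operator*= quoted in this thread.
// Everything except operator*= is a hypothetical stand-in.
template <typename T, int Dim>
class VectorT {
public:
  VectorT() {
    for (int i = 0; i < Dim; i++)
      data[i] = T();
  }
  T& operator[](int i) {
    assert(i >= 0 && i < Dim);
    return data[i];
  }
  VectorT<T, Dim>& operator*=(T s) {
    for (int i = 0; i < Dim; i++)
      data[i] *= s;
    return *this;
  }
private:
  T data[Dim];
};

// Full specialization for 3-dimensional float vectors:
// no loop, so nothing is left for the optimizer to unroll.
template <>
class VectorT<float, 3> {
public:
  VectorT() { data[0] = data[1] = data[2] = 0.0f; }
  VectorT(float x, float y, float z) {
    data[0] = x;
    data[1] = y;
    data[2] = z;
  }
  float& operator[](int i) {
    assert(i >= 0 && i < 3);
    return data[i];
  }
  VectorT<float, 3>& operator*=(float s) {
    data[0] *= s;
    data[1] *= s;
    data[2] *= s;
    return *this;
  }
private:
  float data[3];
};
```

Code using VectorT<float, 3> picks up the specialization automatically; other instantiations such as VectorT<double, 4> still go through the generic loop. The tedium mentioned above comes from having to repeat this for every operator.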

On Dec 21, 2005, at 12:35 PM, Hansong Zhang wrote:

Steven G. Parker wrote:

I have confirmed that gcc unrolls this loop (mac and x86) when the -funroll-loops flag is enabled.

Thanks, Steve. Now I'm really torn between gcc and icc :-)
On the other hand, this gives Itanium some hope because the code it's running now is rather crappy.

Hansong



On Dec 21, 2005, at 12:17 PM, Hansong Zhang wrote:

In Manta, vector operations like the following are implemented for generic dimensionality:

   VectorT<T, Dim>& operator*=(T s) {
     for(int i=0;i<Dim;i++)
       data[i] *= s;
     return *this;
   }

I understand that this is in the hope that the compiler can unroll the loop, so that it's just as efficient as an explicit 3- or 4-vector implementation. However, we have observed that, on Altix/Itanium with the Intel compiler, the above function consumes a lot of time because it's not properly inlined on many occasions. The generated assembly code is, to say the least, not pretty.
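
One way to avoid relying on the optimizer for the unrolling is to let template instantiation do it. The sketch below is not from Manta: ScaleUnroll is a hypothetical helper, and the surrounding VectorT is a stripped-down stand-in. Recursive instantiation expands operator*= into Dim separate statements at compile time. It does not guarantee inlining, which is the other problem described above; that still depends on the compiler.

```cpp
#include <cassert>

// Hypothetical helper: performs data[i] *= s for i in [I, N)
// by recursive template instantiation instead of a runtime loop.
template <int I, int N>
struct ScaleUnroll {
  template <typename T>
  static inline void apply(T* data, T s) {
    data[I] *= s;
    ScaleUnroll<I + 1, N>::apply(data, s);
  }
};

// Terminating case: I == N, nothing left to do.
template <int N>
struct ScaleUnroll<N, N> {
  template <typename T>
  static inline void apply(T*, T) {}
};

// Stripped-down stand-in for the vector class discussed in the thread.
template <typename T, int Dim>
class VectorT {
public:
  VectorT() {
    for (int i = 0; i < Dim; i++)
      data[i] = T();
  }
  T& operator[](int i) {
    assert(i >= 0 && i < Dim);
    return data[i];
  }
  VectorT<T, Dim>& operator*=(T s) {
    // Expands to data[0] *= s; data[1] *= s; ... at compile time.
    ScaleUnroll<0, Dim>::apply(data, s);
    return *this;
  }
private:
  T data[Dim];
};
```

Compared with hand-written specializations, this keeps a single generic definition per operator, at the cost of one small helper struct for each operation pattern.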

So the question is, has anybody verified with certainty that the above loop is unrolled on other platforms (Mac, gcc, ...)? If not, fixing this could be a boost to all platforms. The compiler plays a much bigger role on Itanium than on other processors, i.e. if the unrolling doesn't happen, Itanium suffers much more. I wonder whether it's just that other platforms hide it better or if other compilers are smarter.

Thanks,
Hansong









