- From: Hansong Zhang <hansong@sgi.com>
- To: "Steven G. Parker" <sparker@cs.utah.edu>
- Cc: "'manta@sci.utah.edu'" <manta@sci.utah.edu>
- Subject: Re: [MANTA] loop unrolled?
- Date: Wed, 21 Dec 2005 14:45:52 -0800
some more info on this:
Unrolling is part of the story.
The other part is inlining, which the compiler doesn't seem to be
doing effectively.
Exactly. Unrolling is just the beginning. Itanium has so many
registers that there is no good reason for any loads or stores to any
intermediate vectors in the computation! It is really hard for me to
understand how the compiler is making such serious mistakes. It almost
seems as if the compiler is making the transformations in the wrong
order.
Steven G. Parker wrote:
It would be straightforward (although tedious) to create
specializations for 3 dimensional vectors/points of floats, which
might help. You will also find the same pattern in the ColorSpace
class...
Steve
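A minimal sketch of what such a specialization might look like. The
VectorT below is a hypothetical, stripped-down stand-in for the Manta
class (only the data array and operator*= are shown), so treat it as an
illustration of the idea rather than the actual Manta code:

```cpp
#include <cassert>

// Hypothetical, stripped-down stand-in for Manta's VectorT; only the
// members needed for this illustration are included.
template <typename T, int Dim>
struct VectorT {
    T data[Dim];

    // Generic version: relies on the compiler to unroll the loop.
    VectorT& operator*=(T s) {
        for (int i = 0; i < Dim; i++)
            data[i] *= s;
        return *this;
    }
};

// Full specialization for 3 floats: the loop is written out by hand,
// so no unrolling is required from the compiler.
template <>
struct VectorT<float, 3> {
    float data[3];

    VectorT& operator*=(float s) {
        data[0] *= s;
        data[1] *= s;
        data[2] *= s;
        return *this;
    }
};
```

The tedious part is that every member of the generic class has to be
repeated in the specialization, since a full specialization inherits
nothing from the primary template.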
On Dec 21, 2005, at 12:35 PM, Hansong Zhang wrote:
Steven G. Parker wrote:
I have confirmed that gcc unrolls this loop
(mac and x86) when the -funroll-loops flag is enabled.
Thanks, Steve. Now I'm really torn between gcc and icc :-)
On the other hand, this gives Itanium some hope because the code it's
running now is rather crappy.
Hansong
On Dec 21, 2005, at 12:17 PM, Hansong Zhang wrote:
In Manta, vector operations like the
following are implemented for generic dimensionality:
    VectorT<T, Dim>& operator*=(T s) {
        for (int i = 0; i < Dim; i++)
            data[i] *= s;
        return *this;
    }
I understand that this is in the hope that the compiler will be able
to unroll the loop, so that it's just as efficient as an explicit 3-
or 4-vector implementation. However, we have observed that, on
Altix/Itanium with the Intel compiler, the above function consumes a
lot of time because it's not properly inlined on many occasions. The
generated assembly code is, to say the least, not pretty.
So the question is, has anybody verified with certainty that the
above loop is unrolled on other platforms (Mac, gcc, ...)? If not,
fixing this could be a boost to all platforms. The compiler plays a
much bigger role on Itanium than on other processors, i.e. if the
unrolling doesn't happen, Itanium suffers much more. I wonder whether
it's just that other platforms hide it better or if other compilers
are smarter.
Thanks,
Hansong
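One way to avoid depending on the compiler's loop unroller at all is to
expand the loop through template recursion. This is a different
technique from anything proposed in the thread, and the names
ScaleUnrolled and apply are hypothetical, as is the cut-down VectorT.
It is only a sketch: it still depends on the compiler inlining the
nested calls, which is exactly the weak point described above, so the
generated assembly would need to be checked on each platform.

```cpp
#include <cassert>

// Hypothetical helper: scales data[0..N-1] by s via template
// recursion, so the "loop" is expanded at compile time with one
// instantiation per element.
template <typename T, int N>
struct ScaleUnrolled {
    static inline void apply(T* data, T s) {
        ScaleUnrolled<T, N - 1>::apply(data, s);
        data[N - 1] *= s;
    }
};

// Base case: no elements left to scale.
template <typename T>
struct ScaleUnrolled<T, 0> {
    static inline void apply(T*, T) {}
};

// Hypothetical, stripped-down stand-in for Manta's VectorT.
template <typename T, int Dim>
struct VectorT {
    T data[Dim];

    VectorT& operator*=(T s) {
        ScaleUnrolled<T, Dim>::apply(data, s);
        return *this;
    }
};
```

The interface stays generic in T and Dim, so callers would not change;
only the body of operator*= differs from the loop version.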