Manta Interactive Ray Tracer Development Mailing List

[Manta] Re: Gcc __m128 register to/from stack


  • From: Solomon Boulos < >
  • To: " " < >
  • Cc: " " < >
  • Subject: [Manta] Re: Gcc __m128 register to/from stack
  • Date: Thu, 21 Jan 2010 11:03:36 -0800

This is almost certainly just an issue with filling up the vector from scalars. In other portions of code, you shouldn't see the same behavior. You can usually replace things like this with better _mm_shuffle_ps variants but this varies from system to system.

On more recent processors you are also unlikely to notice much of a hit from these ops, but it is still good practice to avoid _mm_set_ps and its equivalents.
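As a rough illustration of the point above (not code from Manta), the sketch below contrasts filling a vector from scalars with _mm_set_ps against a shuffle-based broadcast and a single aligned load. The function names are hypothetical, and which form is actually faster depends on the compiler version, tuning flags, and processor.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set_ps, _mm_shuffle_ps, ...
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds {x, x, x, x} from scalars.
// Compilers may expand this into several moves and unpacks.
__m128 broadcast_from_scalars(float x) {
    return _mm_set_ps(x, x, x, x);
}

// Hypothetical helper: same result using one scalar load and one shuffle.
__m128 broadcast_with_shuffle(const float* p) {
    __m128 v = _mm_load_ss(p);                             // {p[0], 0, 0, 0}
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));  // copy lane 0 to all lanes
}

// Hypothetical helper: loads four contiguous floats with one aligned load
// instead of assembling them lane by lane. p must be 16-byte aligned.
__m128 load_aligned(const float* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return _mm_load_ps(p);
}

// Returns true when all four lanes of a and b compare equal.
bool all_lanes_equal(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Note that _mm_set_ps takes its arguments from the highest lane to the lowest, so _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f) matches an aligned load of the array {1.0f, 2.0f, 3.0f, 4.0f}.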

On Jan 21, 2010, at 10:09, "Li-Ta Lo" wrote:

Hi,

I recently noticed that there are many instances of "inefficient?"
SSE code generated by GCC, for both Manta and my own SSE vector/matrix
library. For example, in Manta's RayPacket.o, you can find code to
load/store an XMM register from/to the stack that is done with a movlps plus
a movhps instead of a single movaps.

    c35:       45 0f 12 21             movlps (%r9),%xmm12
    c39:       45 0f 12 1a             movlps (%r10),%xmm11
    c3d:       45 0f 16 61 08          movhps 0x8(%r9),%xmm12
    c42:       45 0f 16 5a 08          movhps 0x8(%r10),%xmm11

My small test program

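#include <xmmintrin.h>  // provides the GCC vector type __v4sf

extern void print(const __v4sf &);

int main()
{
    __v4sf a = { 1.0f, 2.0f, 3.0f, 3.0f };
    __v4sf b = { 3.0f, 2.0f, 1.0f, 0.0f };
    __v4sf c = a + b;  // element-wise add via GCC vector extensions

    print(c);  // passed by reference, so c has to be stored to the stack
}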

will generate code like this

       .text
       .p2align 4,,15
.globl main
       .type   main, @function
main:
.LFB1859:
       .loc 1 21 0
       .cfi_startproc
       subq    $24, %rsp       #,
.LCFI1:
       .cfi_def_cfa_offset 32
.LBB17:
       .loc 1 25 0
       movaps  .LC3(%rip), %xmm0       #, tmp61
       .loc 1 27 0
       movq    %rsp, %rdi      #, tmp62
       .loc 1 25 0
       addps   .LC2(%rip), %xmm0       #, tmp61
       movlps  %xmm0, (%rsp)   # tmp61, c
.LVL0:
       movhps  %xmm0, 8(%rsp)  # tmp61, c
.LVL1:
       .loc 1 27 0
       call    print(float __vector const&)    #
.LBE17:
       .loc 1 28 0
       xorl    %eax, %eax      #
       addq    $24, %rsp       #,
       ret
       .cfi_endproc

Does anyone know the reason behind this?

Ollie


