- From: "Li-Ta Lo"
- To:
- Subject: [Manta] Gcc __m128 register to/from stack
- Date: Thu, 21 Jan 2010 11:09:36 -0700 (MST)
- Importance: Normal
Hi,
I recently noticed many instances of seemingly inefficient SSE code
generated by GCC, both in Manta and in my own SSE vector/matrix
library. For example, in Manta's RayPacket.o you can find code that
loads/stores an XMM register from/to the stack with a movlps plus
a movhps instead of a single movaps:
     c35:  45 0f 12 21       movlps (%r9),%xmm12
     c39:  45 0f 12 1a       movlps (%r10),%xmm11
     c3d:  45 0f 16 61 08    movhps 0x8(%r9),%xmm12
     c42:  45 0f 16 5a 08    movhps 0x8(%r10),%xmm11
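To make the two sequences concrete, here is a minimal sketch written with intrinsics, assuming an x86-64 target with SSE. The helper names store_whole, store_halves and stores_agree are hypothetical and do not come from Manta. _mm_store_ps normally compiles to a single movaps and requires a 16-byte-aligned address, while _mm_storel_pi and _mm_storeh_pi normally compile to movlps and movhps and have no such requirement.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_store_ps, ...
#include <cassert>
#include <cstdint>
#include <cstring>

// One 16-byte store (typically movaps); dst must be 16-byte aligned.
void store_whole(float *dst, __m128 v) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    _mm_store_ps(dst, v);
}

// Two 8-byte stores (typically movlps + movhps); no alignment requirement.
void store_halves(float *dst, __m128 v) {
    _mm_storel_pi(reinterpret_cast<__m64 *>(dst), v);      // low two floats
    _mm_storeh_pi(reinterpret_cast<__m64 *>(dst + 2), v);  // high two floats
}

// Hypothetical check: both store sequences leave the same bytes in memory.
bool stores_agree() {
    const __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 3.0f);
    const __m128 b = _mm_setr_ps(3.0f, 2.0f, 1.0f, 0.0f);
    const __m128 c = _mm_add_ps(a, b);

    alignas(16) float whole[4];  // aligned, as the single store requires
    float halves[4];             // default alignment is enough here
    store_whole(whole, c);
    store_halves(halves, c);
    return std::memcmp(whole, halves, sizeof whole) == 0;
}
```

Both helpers write the same 16 bytes, so the split sequence differs only in instruction count and in the alignment it assumes about the destination.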
My small test program
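#include <xmmintrin.h>  // defines __v4sf

// Defined in another translation unit so the result cannot be optimized away.
extern void print(const __v4sf &);

int main()
{
    __v4sf a = { 1.0f, 2.0f, 3.0f, 3.0f};
    __v4sf b = { 3.0f, 2.0f, 1.0f, 0.0f};
    __v4sf c = a + b;  // element-wise add via GCC vector extensions
    print(c);          // passing by reference forces c onto the stack
}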
will generate code like this:
        .text
        .p2align 4,,15
        .globl main
        .type main, @function
main:
.LFB1859:
        .loc 1 21 0
        .cfi_startproc
        subq $24, %rsp #,
.LCFI1:
        .cfi_def_cfa_offset 32
.LBB17:
        .loc 1 25 0
        movaps .LC3(%rip), %xmm0 #, tmp61
        .loc 1 27 0
        movq %rsp, %rdi #, tmp62
        .loc 1 25 0
        addps .LC2(%rip), %xmm0 #, tmp61
        movlps %xmm0, (%rsp) # tmp61, c
.LVL0:
        movhps %xmm0, 8(%rsp) # tmp61, c
.LVL1:
        .loc 1 27 0
        call print(float __vector const&) #
.LBE17:
        .loc 1 28 0
        xorl %eax, %eax #
        addq $24, %rsp #,
        ret
        .cfi_endproc
Does anyone know the reason behind this?
Ollie