
"Peter L. Montgomery" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> In article <[EMAIL PROTECTED]>
> Terje Mathisen <[EMAIL PROTECTED]> writes:
> >xpuente wrote:
> >
> >> Hi all,
> >>
> >> First of all, maybe this is not the correct group for this post, but I
> >> think there are people here who can help with my problems.
> >> Well, I have an SGI Altix system with 8 Madison 1.3 GHz processors. FP
> >> applications run fine with the Intel 7.1 compilers, but integer
> >> applications run badly. We have a big C++ application, and on this system
> >> the performance is even lower than on an Origin 3200 based on 400 MHz
> >> R12000 processors. One might think the Intel compiler sucks, and looking
> >> at the generated assembly, it really does. Not only because of the poor
> >> IPC achieved (below 0.7), but looking at the code you can see sequences
> >> like:
> >>
> >> int Class::method(Boolean a)
> >> { return (a ? 1:0);}
> >>
> >> Intel VTune says that this code has a very low IPC, but that is not all.
> >> The generated code for this method is huge. In fact, if you use an
> >> if(){}else{} statement instead, the length is roughly one half. Obviously
> >> my code has a lot of these statements... and much of the slowdown is
> >> caused by this. I can't understand how the compiler can be so bad. Has
> >> anyone had a similar experience with these compilers, or am I stupid (or
> >> is my code crap)?
> >
> >Ouch!
> >
> >This is a classic example of code where predicates should work
> >beautifully; in fact it is more or less the canonical example.
> >
> >There are about two or three reasonable code generation results here. I'd
> >prefer to see something like this:
> >
> >Cycle 1: Set (pa/pb) to the result of the (a == 0) test.
> > Set return register to first return value
> > Set another register to the alternate result
> >
> >Cycle 2: Do a predicated move of the alternate value to the return
> > register.
> >
> >It is hard to see how you can end up with much more than this?
>
> I assume a is a 64-bit variable. If it is nonzero,
> we want 1; otherwise we want 0.
>
> This illustrates the usefulness of MAX and MIN
> instructions, which too many architectures omit.
> We need MIN(a, 1) (unsigned minimum).
> The constant 1 can be loaded earlier if MIN disallows an immediate
> argument.
>
> Without MIN, we can use
>
> 1 - (1 >> a), (logical shift)
> or
> 1 ^ (1 >> a)
>
> taking only two instructions if the constant 1 is already in a register.
> The C language does not define the result of a shift when the shift count
> is large (at least the operand width), but the IA-64 defines it correctly.
> The unpredicated expansion may be easier to inline (if, for example, the
> function reference occurs in code which is already predicated).
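
[A minimal C++ sketch of the branch-free forms quoted above, checked against the plain conditional. The helper shr_logical and the to01_* names are hypothetical, introduced only for illustration. Because C++ leaves `x >> n` undefined for n >= 64, the helper guards the count explicitly to mimic the shift behaviour the post attributes to IA-64; that guard is an assumption of this sketch, not something a compiler is guaranteed to fold into a single instruction.]

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: logical right shift that yields 0 for counts >= 64,
// the behaviour the post attributes to IA-64. Plain C++ `x >> n` is
// undefined for n >= 64, hence the explicit guard.
inline std::uint64_t shr_logical(std::uint64_t x, std::uint64_t n) {
    return n >= 64 ? 0 : (x >> n);
}

// Reference: the conditional form from the original post.
inline int to01_select(std::uint64_t a) { return a ? 1 : 0; }

// MIN(a, 1), unsigned minimum.
inline int to01_min(std::uint64_t a) {
    return static_cast<int>(std::min<std::uint64_t>(a, 1));
}

// 1 - (1 >> a), logical shift.
inline int to01_sub(std::uint64_t a) {
    return static_cast<int>(1 - shr_logical(1, a));
}

// 1 ^ (1 >> a), logical shift.
inline int to01_xor(std::uint64_t a) {
    return static_cast<int>(1 ^ shr_logical(1, a));
}

// True if all three branch-free forms agree with the conditional form.
inline bool forms_agree(std::uint64_t a) {
    const int want = to01_select(a);
    return to01_min(a) == want && to01_sub(a) == want && to01_xor(a) == want;
}

// Usage sketch: check a few values, including shift counts >= 64.
inline void self_check() {
    for (std::uint64_t a : {0ull, 1ull, 2ull, 63ull, 64ull, ~0ull}) {
        assert(forms_agree(a));
    }
}
```

[Which of these forms is fastest depends on the target; the sketch only shows that they compute the same value.]
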
Unfortunately, on Itanium 1 and Itanium 2 a shift by a variable amount is
relatively expensive (3 clock cycles on Itanium 2, assuming that you want to
use the result in a "normal" ALU/store operation). A predicated move is faster...
Thanks,
Eugene
> --
> After California's recall election, wildfires Schwarz-en-ed the Bush-lands
> on its geographic right (when we wanted the forests to be Green).
> [EMAIL PROTECTED] Home: San Rafael, California
> Microsoft Research and CWI