Discussion:
OpenMP on Hyper-Threading ???
Ronny
2004-04-10 16:51:01 UTC
Hi all, I'm working on a Fortran program, trying to boost its performance.
I modified the source code using the OpenMP Fortran API and compiled it
with Intel Fortran Compiler v8.0 with -openmp enabled.
But it shows no improvement, and actually some slowdown, on my
Intel Pentium 4 2.8GHz E (Prescott) with Hyper-Threading enabled.
This is the significant part of the program:

      COMMON U(N1,N2), V(N1,N2), P(N1,N2)
      PCHECK = 0.0D0
      UCHECK = 0.0D0
      VCHECK = 0.0D0
..........

!$OMP PARALLEL DO REDUCTION(+:PCHECK,UCHECK,VCHECK)
!$OMP+schedule(dynamic,1)
      DO 3500 ICHECK = 1,M
        DO 4500 JCHECK = 1, N
          PCHECK = PCHECK + (P(ICHECK,JCHECK))
          UCHECK = UCHECK + (U(ICHECK,JCHECK))
          VCHECK = VCHECK + (V(ICHECK,JCHECK))
 4500   CONTINUE
        U(ICHECK,ICHECK) = U(ICHECK,ICHECK)
     1    * ( MOD (ICHECK, 100) /100.)
 3500 CONTINUE
!$OMP END PARALLEL DO
...........

I find that if I thread this code, my program does much worse.
Is there a programming error, or do you have any tips on programming with OpenMP?
Thanks
David Butenhof
2004-04-12 11:52:28 UTC
First, you need to remember that a hyperthreaded CPU is *not* the same as a
multiprocessor. It's a single processor, with multiple instruction streams
sharing common (and limited) resources. Two compute-bound threads on a
hyperthreaded CPU will likely, on average, perform a bit better than those
same two threads on the same CPU with hyperthreading disabled.

If the two threads aren't independent, and share the right kinds of data in
the right ways (especially with a lot of contention on close cache lines),
and issue the right instruction mixes to keep the CPU's pipelines busy
without stalling due to scheduling contention between them, those threads
might actually do better on a hyperthreaded CPU than on two CPUs... but the
tuning is really tricky and highly dependent on the exact low-level details
of that particular CPU. (And if those highly tuned threads ran on a real
multiprocessor, they'd perform worse.)

Hyperthreading is a cool hardware trick to make better use of the
scalability already built into modern chips, for a small overhead in
maintaining one or more additional instruction stream contexts and a little
scheduling logic. But those instruction streams are NOT processors; and
scheduling them as processors is extremely and unrealistically optimistic.
It's just a lot easier for the OS that way.
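
To see what the runtime is being told, a minimal sketch along these lines may help. It assumes an OpenMP-capable compiler that provides the standard omp_lib module, and it assumes (as described above) that the OS reports each hyperthreaded instruction stream as a processor. The program name is made up for illustration.

```fortran
program show_procs
  use omp_lib   ! standard OpenMP runtime library module
  implicit none

  ! Number of processors the runtime believes are available.
  ! Assumption: on a Hyper-Threading system this counts logical
  ! processors, not physical CPUs.
  print *, 'processors seen by OpenMP:', omp_get_num_procs()

  ! Upper bound on the team size a parallel region would get by default.
  print *, 'default max threads:      ', omp_get_max_threads()
end program show_procs
```

If this reports two processors on a single hyperthreaded chip, the runtime is most likely sizing its thread team as if it had a dual-processor machine.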

Now, for workloads that are less compute-intensive, you're likely to be less
disappointed. You won't be getting maximum use out of the processor
pipelines, because your instruction stream is less tightly packed, and
you're probably not using floating-point much. But contention between the
instruction streams will be less likely to drive the application
performance than I/O latency -- which will be essentially the same whether
your threads are sharing a single CPU, a hyperthreaded CPU, or have
separate CPUs. You may gain some benefit from the hyperthreaded CPU's
"shared cache".

OpenMP and hyperthreading really don't go together. Your OpenMP compiler and
runtime scheduler are highly unlikely to know enough about the sharing
characteristics to do a good job of managing affinity and sharing. It's
almost certainly treating your hyperthreaded CPU as a dual processor SMP;
and that's a losing proposition because it's simply not.
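
One way to check whether the second thread helps or hurts on a given machine is to time the same reduction with one thread and then with two. The following is only a sketch under stated assumptions: the array size n, the program name, the static schedule, and the loop order are choices made for this example and are not taken from the original program. It uses only standard OpenMP runtime calls (omp_set_num_threads, omp_get_wtime).

```fortran
program time_reduction
  use omp_lib   ! standard OpenMP runtime library module
  implicit none
  integer, parameter :: n = 2000     ! hypothetical array size
  double precision, allocatable :: p(:,:)
  double precision :: pcheck, t0, t1
  integer :: i, j, nthreads

  allocate(p(n,n))
  p = 1.0d0

  ! Run the same loop with 1 thread, then with 2, and report wall time.
  do nthreads = 1, 2
     call omp_set_num_threads(nthreads)
     pcheck = 0.0d0
     t0 = omp_get_wtime()

     ! Outer loop over columns so each thread reads contiguous memory;
     ! this ordering and schedule(static) are choices of this sketch.
     !$omp parallel do reduction(+:pcheck) private(i) schedule(static)
     do j = 1, n
        do i = 1, n
           pcheck = pcheck + p(i,j)
        end do
     end do
     !$omp end parallel do

     t1 = omp_get_wtime()
     print *, 'threads =', nthreads, ' sum =', pcheck, ' seconds =', t1 - t0
  end do

  deallocate(p)
end program time_reduction
```

Build it with OpenMP enabled (for example -openmp with the Intel compiler mentioned above, or -fopenmp with gfortran). If the two-thread time is no better than the one-thread time on a hyperthreaded CPU, that is consistent with the two instruction streams competing for the same execution resources.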
--
/--------------------[ ***@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/