Discussion:
pthread_create is slow
Mona
2008-12-09 23:43:47 UTC
Hi all,
I have a multithreaded application that continuously creates
multiple threads at runtime to process incoming or outgoing packets. We
have instrumented the invocation of pthread_create and have found that
at times it takes as long as 10 msec.
Any help greatly appreciated.

Many thanks in advance,
Bizhan
Ian Collins
2008-12-10 00:18:15 UTC
Post by Mona
Hi all,
I have a multithreaded application that continuously creates
multiple threads at runtime to process incoming or outgoing packets. We
have instrumented the invocation of pthread_create and have found that
at times it takes as long as 10 msec.
Any help greatly appreciated.
Try using a thread pool rather than continuously creating threads.
--
Ian Collins
Mona
2008-12-10 00:31:57 UTC
Post by Ian Collins
Post by Mona
Hi all,
I have a multithreaded application that continuously creates
multiple threads at runtime to process incoming or outgoing packets. We
have instrumented the invocation of pthread_create and have found that
at times it takes as long as 10 msec.
Any help greatly appreciated.
Try using a thread pool rather than continuously creating threads.
--
Ian Collins
Yes, we are aware that creating a pool of threads would help. But we
would like to understand why.
Thanks,
Bizhan
Ian Collins
2008-12-10 00:36:03 UTC
Post by Mona
Post by Ian Collins
Post by Mona
Hi all,
I have a multithreaded application that continuously creates
multiple threads at runtime to process incoming or outgoing packets. We
have instrumented the invocation of pthread_create and have found that
at times it takes as long as 10 msec.
Any help greatly appreciated.
Try using a thread pool rather than continuously creating threads.
*Please* don't quote signatures
Post by Mona
Yes, we are aware that creating a pool of threads would help. But we
would like to understand why.
Creating threads requires kernel resources, some of which can be
expensive in time. If you really want to see what's involved, dig into
the source for your platform.
--
Ian Collins
David Schwartz
2008-12-10 01:09:28 UTC
Post by Mona
Yes, we are aware that creating a pool of threads would help. But we
would like to understand why.
Your question explains why -- creating and destroying threads is
pointless and expensive.

Why don't we sell our cars every day and buy new ones the next
morning? Because that's a ridiculous amount of pointless effort when
we could just re-use the cars we already have.
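
To make the "re-use the cars" idea concrete, here is a minimal sketch of a
fixed-size thread pool. Nothing in it comes from the original poster's
application: the names (pool_start, pool_submit, pool_stop, handle_packet)
and the sizes POOL_WORKERS and POOL_QUEUE are hypothetical, and error
handling is kept to the bare minimum. The workers are created once and
then pull jobs from a queue, so pthread_create is never called per packet.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_WORKERS 4      /* hypothetical sizes; tune for the workload */
#define POOL_QUEUE   64

typedef void (*job_fn)(void *);
struct job { job_fn fn; void *arg; };

struct pool {
    pthread_t       workers[POOL_WORKERS];
    size_t          nworkers;            /* workers actually started */
    struct job      queue[POOL_QUEUE];   /* ring buffer of pending jobs */
    size_t          head, tail, count;
    int             stopping;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty, not_full;
};

/* Each worker is created once, then loops taking jobs off the queue. */
static void *pool_worker(void *p)
{
    struct pool *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->stopping)
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        if (pool->count == 0) {          /* stopping and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        struct job j = pool->queue[pool->head];
        pool->head = (pool->head + 1) % POOL_QUEUE;
        pool->count--;
        pthread_cond_signal(&pool->not_full);
        pthread_mutex_unlock(&pool->lock);
        j.fn(j.arg);                     /* run the job outside the lock */
    }
}

/* Lets queued jobs finish, then joins every worker that was started. */
void pool_stop(struct pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->stopping = 1;
    pthread_cond_broadcast(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
    for (size_t i = 0; i < pool->nworkers; ++i)
        pthread_join(pool->workers[i], NULL);
    pthread_cond_destroy(&pool->not_full);
    pthread_cond_destroy(&pool->not_empty);
    pthread_mutex_destroy(&pool->lock);
}

/* Returns 0 on success, or the error number from pthread_create. */
int pool_start(struct pool *pool)
{
    pool->head = pool->tail = pool->count = 0;
    pool->nworkers = 0;
    pool->stopping = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->not_empty, NULL);
    pthread_cond_init(&pool->not_full, NULL);
    while (pool->nworkers < POOL_WORKERS) {
        int err = pthread_create(&pool->workers[pool->nworkers], NULL,
                                 pool_worker, pool);
        if (err) {
            pool_stop(pool);             /* join the workers that did start */
            return err;
        }
        pool->nworkers++;
    }
    return 0;
}

/* Queues one job; blocks while the queue is full. No thread is created. */
void pool_submit(struct pool *pool, job_fn fn, void *arg)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->count == POOL_QUEUE)
        pthread_cond_wait(&pool->not_full, &pool->lock);
    pool->queue[pool->tail] = (struct job){ fn, arg };
    pool->tail = (pool->tail + 1) % POOL_QUEUE;
    pool->count++;
    pthread_cond_signal(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
}

/* Hypothetical job standing in for real packet processing. */
void handle_packet(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

Typical use would be pool_start() once at startup, pool_submit() for each
packet, and pool_stop() at shutdown. Whether a bounded, blocking queue is
the right choice depends on how the application wants to behave under
overload; that part is a design assumption, not a recommendation.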

DS
Gil Hamilton
2008-12-10 15:43:33 UTC
Post by Mona
I have a multithreaded application that continuously creates
multiple threads at runtime to process incoming or outgoing packets. We
have instrumented the invocation of pthread_create and have found that
at times it takes as long as 10 msec.
Any help greatly appreciated.
There are many factors that can contribute to the time it takes to
create a thread so it's hard to be sure. However, your numbers don't
smell right to me; that is, it seems like that's way too long. Which
causes me to wonder how you're measuring the time and what you're
actually measuring.
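
As one way to pin down what is being measured, the sketch below times only
the pthread_create() call itself and reports minimum, average and maximum
separately, so that an occasional slow call is not confused with the typical
cost. The names time_creates and struct create_stats are hypothetical, and
CLOCK_MONOTONIC is assumed to be available; this is not the original
poster's instrumentation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

struct create_stats { double min_us, avg_us, max_us; };

/* Thread body that does no work, so only creation cost is observed. */
static void *noop(void *arg)
{
    return arg;
}

static double elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

/*
 * Creates and joins n threads one after another, timing only the
 * pthread_create() call. Returns 0 on success, EINVAL for n == 0,
 * or the error number from pthread_create.
 */
int time_creates(unsigned n, struct create_stats *out)
{
    double sum = 0.0;

    if (n == 0)
        return EINVAL;
    out->min_us = 1e300;
    out->max_us = 0.0;

    for (unsigned i = 0; i < n; ++i) {
        struct timespec t0, t1;
        pthread_t t;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        int err = pthread_create(&t, NULL, noop, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (err)
            return err;
        pthread_join(t, NULL);           /* outside the timed region */

        double us = elapsed_us(&t0, &t1);
        sum += us;
        if (us < out->min_us) out->min_us = us;
        if (us > out->max_us) out->max_us = us;
    }
    out->avg_us = sum / n;
    return 0;
}
```

A wall-clock interval like this still includes any time during which the
creating thread was preempted, so a large maximum next to a small average
would point at scheduling or paging rather than at pthread_create itself.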

I have a simple test program (see below) that creates a thread, forces at
least a couple of task switches and then joins it, repeating this a given
number of times. Now obviously it's a toy that isn't doing anything useful.
But its time to create and destroy a thread averages about 8 *microseconds*
-- about three orders of magnitude faster than what you are reporting. I
ran it on a rather old Red Hat Enterprise Linux 3 (2.4-based) system with a
2.8GHz Pentium 4 as well as on an SMP openSUSE 10.2 (2.6-based) system with
a modern dual-core 3.0GHz Core 2 Xeon; interestingly, times for
both systems are about the same. (Yes, I'm assuming x86 Linux is your
platform -- an assumption that may not be warranted.)

Creating threads is supposed to be relatively light-weight. Since the
entire address space is shared with the creating thread as are most
other resources, there are only a few additional kernel resources
required -- task structure for the new thread plus some reference counts
on other resources need to be incremented -- and there are user-space
stack pages to be allocated for the new thread. If you have plenty of
memory, though, none of this should take very long. If your system is
way overloaded, some additional paging I/O might have to occur which
would slow it down.
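
Since the per-thread stack is one of the few things that has to be set up
for each new thread, one experiment is to request an explicit, smaller
stack through a thread attribute. The sketch below uses the standard
pthread_attr_setstacksize call; the wrapper name create_with_stack and the
helper echo_arg are hypothetical, and whether this changes creation time
measurably on a given system is something to measure, not assume.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body used to exercise the wrapper; returns its argument. */
void *echo_arg(void *arg)
{
    return arg;
}

/*
 * Creates a thread whose stack is stack_bytes long.
 * Returns 0 on success or a pthread error number, for example when
 * stack_bytes is below the system minimum (PTHREAD_STACK_MIN).
 */
int create_with_stack(pthread_t *t, size_t stack_bytes,
                      void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);

    if (err)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (!err)
        err = pthread_create(t, &attr, fn, arg);
    pthread_attr_destroy(&attr);         /* attr is not needed after create */
    return err;
}
```

The stack must still be large enough for the deepest call chain the thread
will ever run, so the size passed in has to come from knowledge of the
actual workload.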

If you have lots of other threads running and doing real work then
obviously those other threads may be sucking available time away from
the thread-creation code. (In which case, your measurement of the "time
required to create threads" is actually including lots of other stuff
too.)

On a multiprocessor system, it might actually take longer to create a
thread than on a single processor system (somewhat non-intuitively).
Code and data pages which are already in the cache on the processor
running the thread-creating code might NOT be in the cache on other
processors where the new thread ends up running. Also, some additional
inter-processor interrupts will need to be handled and new page tables
installed on the other processors causing additional TLB flushes.
Again, however, that probably wouldn't explain the orders of magnitude
differences you're reporting. (Creating a test program that properly
takes multiple processor issues into account would be a bit more
complicated -- my test program simply locks the program onto a single
processor to avoid those issues.)

GH

Test program follows.
Compile with:
gcc -O3 -D_GNU_SOURCE -pthread -o tprog tprog.c
Run with:
time ./tprog
--------------------------------------------------------

#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>
#include <string.h>
#include <stdlib.h>

// Handshake flag shared by both threads. It is atomic so that the spin
// loops are well defined: 0 = new thread has not run yet,
// 1 = new thread is running, 2 = new thread may exit.
atomic_int go_flag;

void * thread_proc(void * arg)
{
    (void)arg;
    go_flag = 1;
    while (go_flag == 1) {
        // Make sure main thread runs after we've run.
        sched_yield();
    }
    return (void *)0;
}

int main(int ac, char **av)
{
    int err;
    unsigned int i, cnt = 1000000;
    cpu_set_t cpumask;

    if (ac > 1) {
        // Parse as signed so that zero and negative counts are rejected.
        int n = atoi(av[1]);
        if (n <= 0) {
            fprintf(stderr, "Usage: %s [num_threads]\n", av[0]);
            return 2;
        }
        cnt = (unsigned int)n;
    }

    // Pin to CPU 0; threads created later inherit this affinity mask.
    CPU_ZERO(&cpumask);
    CPU_SET(0, &cpumask);
    err = sched_setaffinity(0, sizeof cpumask, &cpumask);
    if (err) {
        fprintf(stderr, "%s: sched_setaffinity(0x0001) failed - %s\n",
                av[0], strerror(errno));
        return 1;
    }

    for (i = 0; i < cnt; ++i) {
        pthread_t t1;
        go_flag = 0;
        // pthread functions return the error number; they do not set errno.
        err = pthread_create(&t1, NULL, thread_proc, NULL);
        if (err) {
            fprintf(stderr, "%s: pthread_create (%u) failed - %s\n",
                    av[0], i, strerror(err));
            return 1;
        }
        while (go_flag == 0) {
            // Make sure new thread runs before we run again.
            sched_yield();
        }
        go_flag = 2;
        err = pthread_join(t1, NULL);
        if (err) {
            fprintf(stderr, "%s: pthread_join (%u) failed - %s\n",
                    av[0], i, strerror(err));
            return 1;
        }
    }

    return 0;
}
