Post by Khookie
I'm very new to multithreaded programming, mainly due to all this
interest in multi-core processors.
http://softwareblogs.intel.com/2007/09/19/threading-building-blocks-p...
Is it true that pthreads is slower than Intel TBB?
I thought pthreads was lower-level than TBB, hence if someone was good
enough with pthreads, they could produce something faster than TBB.
Or am I wrong here?
Personally, I haven't yet found any use for TBB and continue to
prefer to do my threading "by hand". I must admit that I haven't
benchmarked it yet, nor do I have any project in which I could use it
at the moment.
The whole idea of having a user-space scheduler is interesting in some
cases: the ones I see most are data-crunching applications. However, I
generally prefer abstracting parallelism a bit differently than TBB
does: I have a user-space scheduler that basically takes a task you
give it and runs it on the next available core. The tasks I use it
for are usually very data-intensive but not internally
parallelizable, which means a parallel_while or a parallel_for
wouldn't really help in most of my cases.
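For readers who haven't seen this kind of scheduler, here is a minimal sketch of the general idea: a fixed set of worker threads pulling whole tasks off a shared queue. It is an illustration only, not the scheduler described above; the names TaskScheduler, submit and sum_of_squares are hypothetical, and it assumes a C++11 compiler with std::thread rather than raw pthreads.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical name: a minimal task scheduler, not the poster's actual code.
class TaskScheduler {
public:
    // One worker per hardware thread by default (at least one).
    explicit TaskScheduler(unsigned workers = std::thread::hardware_concurrency()) {
        if (workers == 0) workers = 1;
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the remaining tasks, then joins them.
    ~TaskScheduler() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Queue a task; whichever worker becomes free first picks it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_);  // submitting during shutdown is a usage error
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so workers do not serialize
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

// Usage sketch: one independent task per value, results combined atomically.
long sum_of_squares(int n) {
    std::atomic<long> total{0};
    {
        TaskScheduler scheduler;
        for (int i = 0; i < n; ++i)
            scheduler.submit([&total, i] { total += static_cast<long>(i) * i; });
    }  // the destructor waits for every queued task to finish
    return total.load();
}
```

Each task here is a black box to the scheduler, which is the point: nothing is split up inside a task the way a parallel_for would split a loop.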
I guess what I'm trying to say is that any approach to "abstracted-
away" parallelism is only going to work as well as the context it is
used in allows for it. TBB will probably work very well in the
contexts it was designed for and, as I gather from following their
blog, that doesn't include the kind of applications I work on.
For the record: I principally work on a distributed architecture for a
real-time embedded system with both soft real-time and hard real-time
components. Most of my parallelism comes from having to handle
asynchronous input from devices (cameras, sensors, laser scanners and
the like) simultaneously. Therefore, I can abstract away a lot of my
parallelism into a message-passing approach rather than a shared-data
approach, which means I only have to worry about threads where two of
them meet to exchange a message. As my applications are driven by the
input from external devices, there is no way I can "turn off" the
parallelism, nor can I code as if it weren't there and have some external
library handle it for me. The only place I could do that is within the
data-crunching tasks that come with having to analyse the input of
those devices, but I already have a user-space scheduler for that.
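To make the message-passing idea concrete, here is a rough sketch in which device threads never touch shared data directly; they only post messages to a mailbox, and the mailbox is the single place where threads meet. Everything in it is assumed for illustration: Mailbox, SensorMessage and run_pipeline are hypothetical names, the "devices" are plain loops, and nothing here addresses real-time constraints.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical message type; the real system's messages are not described.
struct SensorMessage {
    std::string source;
    int reading;
};

// A blocking queue: the only synchronization point between threads.
template <typename T>
class Mailbox {
public:
    void send(T msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    // Signal that no further messages will be sent.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a message arrives; returns false once closed and drained.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Two simulated devices post readings; one consumer adds them up.
int run_pipeline() {
    Mailbox<SensorMessage> inbox;
    int total = 0;

    std::thread consumer([&] {
        SensorMessage msg;
        while (inbox.receive(msg)) total += msg.reading;
    });
    std::thread camera([&] {
        for (int i = 1; i <= 3; ++i) inbox.send(SensorMessage{"camera", i});
    });
    std::thread scanner([&] {
        for (int i = 1; i <= 3; ++i) inbox.send(SensorMessage{"scanner", 10 * i});
    });

    camera.join();
    scanner.join();
    inbox.close();    // producers are done, let the consumer finish
    consumer.join();  // after this, reading total is safe
    return total;
}
```

Only the consumer thread writes total, and the caller reads it after the join, so the mailbox's mutex is the only lock in the program.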
Just my C$0.02.
rlc