Qprof

This is a set of profiling utilities, currently targeting only Linux. It includes a simple command-line profiling tool, with the following characteristics:

Instructions for installing and using the profiler are given here. A sample session is presented here. The package can be downloaded from here.

The profiler is licensed under several different GPL-compatible licenses. In many cases, reuse of the library components in proprietary applications is allowed. See the LICENSING.txt file in the distribution for more details.

Other included packages

The distribution includes three other facilities which may be useful outside of a profiling context:
Atomic_ops
Provides implementations of atomic memory update operations on a number of architectures, so that they can be used directly in reasonably portable code. Unlike earlier similar packages, this one explicitly considers memory barrier semantics, and allows the construction of code that incurs minimal overhead across a variety of architectures. The plan is to generalize this to non-Linux platforms soon. It is also available as a separate distribution from here.

It should be useful both for high-performance multi-threaded code that can't afford to use the standard locking primitives and for code that has to access shared data structures from signal handlers. For details, see README_atomic_ops.txt in the distribution.
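
To illustrate what "explicitly considering memory barrier semantics" means in practice, here is a minimal sketch. It deliberately uses the standard C11 <stdatomic.h> interface as a stand-in, not the atomic_ops API itself (see README_atomic_ops.txt for that). The mailbox type and the function names are hypothetical and chosen only for this example.

```c
#include <stdatomic.h>

/* Hypothetical one-slot mailbox; not part of atomic_ops. */
typedef struct {
    int payload;        /* ordinary, non-atomic data */
    atomic_int ready;   /* 0 = empty, 1 = payload is valid */
} mailbox;

void mailbox_init(mailbox *m) {
    m->payload = 0;
    atomic_init(&m->ready, 0);
}

/* Writer: the release store orders the payload write before the flag,
 * so a reader that sees ready == 1 also sees the payload. */
void mailbox_publish(mailbox *m, int value) {
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above.
 * Returns 1 and fills *out if a value was published, else 0. */
int mailbox_try_read(mailbox *m, int *out) {
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        return 0;
    *out = m->payload;
    return 1;
}

/* A plain event counter needs atomicity but no ordering, so the
 * cheapest (relaxed) ordering is enough. Returns the new count. */
static atomic_ulong event_count;

unsigned long count_event(void) {
    return atomic_fetch_add_explicit(&event_count, 1UL,
                                     memory_order_relaxed) + 1UL;
}
```

The point of the sketch is that each operation asks only for the ordering it needs: release/acquire for publication, relaxed for a counter. Requesting a full barrier everywhere would also be correct, but costs more on architectures with weak memory ordering.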

Some lock-free data structures
Handler_safe_data.h describes some interfaces that, for example, support simple memory allocation from signal handlers. These are based on the atomic_ops package.
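
As a rough idea of how memory allocation can be made usable from a signal handler, the following sketch implements a lock-free bump allocator over a fixed arena. It is not the interface declared in Handler_safe_data.h. It uses C11 <stdatomic.h> instead of atomic_ops, and ARENA_BYTES, arena_alloc and the "never free" policy are assumptions made for illustration. It also assumes that atomic_size_t is lock-free on the target platform.

```c
#include <stdatomic.h>
#include <stddef.h>

#define ARENA_BYTES 4096   /* hypothetical fixed arena size */

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static atomic_size_t arena_used;   /* static storage, so starts at 0 */

/* Returns a suitably aligned block of at least nbytes, or NULL if the
 * request is empty or the arena is exhausted. Memory is never freed.
 * No lock is taken: if a signal handler interrupts this function and
 * allocates, the interrupted compare-and-swap simply fails and retries. */
void *arena_alloc(size_t nbytes) {
    const size_t align = _Alignof(max_align_t);
    if (nbytes == 0 || nbytes > ARENA_BYTES)
        return NULL;
    /* Round up so that every returned block stays aligned. */
    size_t rounded = (nbytes + align - 1) & ~(align - 1);
    size_t old = atomic_load_explicit(&arena_used, memory_order_relaxed);
    for (;;) {
        if (rounded > ARENA_BYTES - old)
            return NULL;
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(
                &arena_used, &old, old + rounded,
                memory_order_relaxed, memory_order_relaxed))
            return arena + old;
    }
}
```

A lock-based allocator cannot be used this way: if the handler runs while the interrupted code holds the allocator lock, the handler would wait forever for a lock that its own thread owns.
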
Wrap.h
Provides a reasonably general-purpose facility for wrapping library functions (i.e., forcing user-specified code to be executed before and after a call to a standard library function) by redefining them and then using dlopen and dlsym to reach the original definitions. This is probably viable only on Linux/Unix platforms. The profiler uses it to intercept thread creation. See README_wrap.txt for details.
Some more details can be found in the README.txt file.
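
The wrapping technique described under Wrap.h can be sketched as follows. This is not the Wrap.h interface (see README_wrap.txt for that). It is a hand-written example of the underlying idea, assuming a dynamically linked glibc-based Linux system where the C library is named libc.so.6. The choice of rand as the wrapped function, the call counter, and wrapped_rand_calls are hypothetical. On older glibc versions the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

static unsigned long rand_calls;     /* hypothetical bookkeeping */
static int (*real_rand)(void);       /* the C library's own rand */

/* Redefinition with the same name and signature as the library
 * function: calls to rand() from this program bind to this version.
 * Not thread-safe; a real wrapper would need to guard the lookup. */
int rand(void) {
    if (real_rand == NULL) {
        /* Assumption: the C library is available as "libc.so.6". */
        void *libc = dlopen("libc.so.6", RTLD_LAZY);
        if (libc != NULL)
            real_rand = (int (*)(void))dlsym(libc, "rand");
        if (real_rand == NULL)
            abort();                 /* cannot find the original */
    }
    rand_calls++;                    /* user code run before the call */
    int result = real_rand();        /* the original library function */
    /* user code run after the call would go here */
    return result;
}

/* Number of calls that went through the wrapper so far. */
unsigned long wrapped_rand_calls(void) {
    return rand_calls;
}
```

The same pattern applies to thread creation: a wrapper with the signature of pthread_create can record the new thread and then forward the call to the original function.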

Related packages

We are aware of the following open source packages that are either related to this one or perform sampled profiling on Linux.
Gprof
This is the standard Linux profiler. It can generate approximate call-graph profiles. It doesn't appear to interact well with threads or dynamic libraries. It requires relinking for a flat profile and recompilation for a call-graph profile.
Sprof
An analogous but separate facility for displaying shared library profiles.
Cprof
A thread-aware profiler for Linux that relies on gcc-based code instrumentation. A while ago we found it nontrivial to get running on many Linux platforms, but its maintenance status has recently improved.
Oprofile
A system wide profiling tool. Requires a kernel module.
Prospect
Another system-wide profiler. Based on the Oprofile kernel module.
Perfmon and pfmon tool
A library and command to access hardware profile counters on Itanium. We rely on this for hardware event support. By itself, it can be used to count hardware events in a program region, etc.