Overview

Programmers can prepare a program for deep profiling by compiling it in a deep profiling grade such as asm_fast.gc.profdeep. When a program compiled in a deep profiling grade is executed, it builds a data structure containing profiling information, and writes this out to a file called Deep.data at the end of execution. Programmers can then browse the contents of these profiling data files using the Mercury deep profiling tool, mdprof.


The structure of the profiling data file

The data structure written out to the Deep.data file, the profiling tree, resembles a call graph. It has four kinds of nodes: CallSiteStatic, ProcStatic, CallSiteDynamic and ProcDynamic structures. These four node types are consistently abbreviated as css, ps, csd and pd throughout the deep profiling system, including the names of structure fields.

The Deep.data file consists of a header and a sequence of nodes. The header contains

The following sequence contains nodes of all four types. In the running program, these nodes refer to each other by pointers, but in the process of writing them out we convert these pointers to node ids, which are dense small integers starting at one.
CallSiteStatic
CallSiteStatic structures are created by the compiler. There is one CallSiteStatic structure for each call site in the source code. CallSiteStatic structures contain the following fields:
ProcStatic
ProcStatic structures are created by the compiler. There is one ProcStatic structure for each procedure in the source code. ProcStatic structures contain the following fields:
CallSiteDynamic
CallSiteDynamic structures are created by the instrumented program during a profiling run. There will be one or more CallSiteDynamic structures for each call site through which the program actually performs a call during the profiling run. For a given call site, there will be distinct CallSiteDynamic structures for each distinct context in which those invocations take place.
ProcDynamic
ProcDynamic structures are created by the instrumented program during a profiling run. There will be one or more ProcDynamic structures for each procedure which is called during the profiling run. For a given procedure, there will be distinct ProcDynamic structures for each distinct context in which those calls take place.
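
The conversion from pointers to node ids described above can be illustrated with a minimal sketch. It is not the profiler's actual code: the Node class, its refs field and the function names are hypothetical, and the traversal order is an assumption. It only shows how in-memory references can be replaced by dense small integers starting at one when the graph is written out.

```python
class Node:
    """Hypothetical stand-in for any of the four node kinds."""

    def __init__(self, kind, refs=None):
        self.kind = kind                              # "css", "ps", "csd" or "pd"
        self.refs = refs if refs is not None else []  # in-memory references


def assign_node_ids(root):
    """Give every node reachable from root a dense id, starting at one."""
    ids = {}      # maps id(node) -> node id
    order = []    # nodes in the order in which ids were handed out
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in ids:
            continue                  # already numbered (shared node or cycle)
        ids[id(node)] = len(ids) + 1
        order.append(node)
        stack.extend(node.refs)
    return ids, order


def serialize(root):
    """Return (node id, kind, [ids of referenced nodes]) records."""
    ids, order = assign_node_ids(root)
    return [(ids[id(n)], n.kind, [ids[id(r)] for r in n.refs]) for n in order]
```

Because ids are handed out consecutively, a reader of such a file could store the nodes in an array indexed by node id instead of using a hash table.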

The Mercury deep profiling tool mdprof

The Mercury deep profiler consists of four programs. One is the web browser of the user's choice: this implements the user interface. The other three are mdprof, mdprof_cgi and mdprof_server.
mdprof
This is a simple shell script. It is invoked by the web server in response to queries of the right form. It does nothing more than set up the PATH environment variable to contain the directory in which mdprof_cgi and mdprof_server were installed, and then invoke mdprof_cgi.
mdprof_cgi
This is a Mercury program. It is invoked once for every page displayed by the deep profiling system. The web server passes it, in the environment variable QUERY_STRING, a URL component containing the name of a profiling data file and a query specifying which part of that data file is to be displayed. Mdprof_cgi checks whether a server process already exists for the given profiling data file, and if not, it creates that process. It then sends the query to the server process, passes the results back to the web server, and exits.
mdprof_server
This is a Mercury program. It reads in the profiling data file whose name is passed to it on the command line by mdprof_cgi, processes it to materialize information that is required by queries but is stored in the profiling data file only implicitly, and then goes into a loop awaiting queries. When it gets a query from mdprof_cgi, it answers the query and goes back to sleep. It exits when it has not received a query for a set timeout period, which by default is thirty minutes, or when it receives a "query" telling it to shut down. (Due to the timeout mechanism, shutting down the server explicitly is not useful unless the profiling data file has changed, the server has been recompiled, or one wants to recover the space occupied by its virtual memory.)
The reason for the split between mdprof/mdprof_cgi and mdprof_server is simply that the web server requires the program it invokes to exit before it displays the page the program generates, and we don't want to have to read and process the deep profiling data file for every page to be displayed, since that takes a significant fraction of a minute. The reason for the split between mdprof and mdprof_cgi is to make it easy to specify and to modify the name of the directory containing the server program.
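
The query loop of mdprof_server, with its timeout and its shutdown "query", can be sketched roughly as follows. This is an illustration in Python rather than the Mercury implementation, and it assumes a POSIX system. The function name serve, the newline-terminated query format and the literal "quit" are all assumptions made for the example; only the thirty minute default comes from the description above.

```python
import os
import select


def serve(read_fd, handle_query, timeout_secs=30 * 60):
    """Answer queries arriving on read_fd until a timeout or a shutdown.

    Returns "timeout", "shutdown" or "closed" to say why the loop ended.
    """
    pending = b""
    while True:
        # Answer every complete (newline-terminated) query already buffered.
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            query = line.decode()
            if query == "quit":       # hypothetical shutdown "query"
                return "shutdown"
            handle_query(query)
        # Sleep until more input arrives or the timeout period expires.
        ready, _, _ = select.select([read_fd], [], [], timeout_secs)
        if not ready:
            return "timeout"
        chunk = os.read(read_fd, 4096)
        if not chunk:                 # simplification: stop when the writer closes
            return "closed"
        pending += chunk
```

The point of this shape is that the expensive step, reading and processing the data file, happens once before the loop starts, so each later query only pays for building one page.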

Mdprof_cgi and mdprof_server communicate via a pair of named pipes, whose names have the form /var/tmp/mdprof_server_{from,to}_mangled_data_file_name. (The mangling is required to replace any slashes in the name of the data file.) The existence of these files serves as an approximation of a lock; the idea is that they exist if and only if a server process for that data file is alive and serving queries via those pipes. Mdprof_cgi creates a server process for the data file if and only if these named pipes do not exist. They are created and destroyed only by mdprof_server, and they are always created and destroyed together. Mdprof_server creates them as soon as it starts and deletes them just before it exits.

There is no race condition when the pipes are deleted: while a server process may exist for a very short while after the pipes are deleted, it will not do anything after that deletion except shut down.

There is a race condition when the pipes are created. It is possible for the web server to receive two requests for a given data file in quick succession. When the second invocation of mdprof_cgi checks whether the pipes exist, the first invocation may not yet have created the server process, or the server process may not yet have created the pipes. However, the window of vulnerability is small, the likelihood that two different users will start mdprof browsing sessions during that window is very small, and the likelihood that one user will start two different sessions during that window is even smaller. The mechanisms required for locking would in any case be operating system dependent. Since the programming cost is significant and the benefits are not, we simply live with the window of vulnerability.
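
As an illustration of the naming scheme and the approximate lock test, here is a small Python sketch. The pipe directory and the mdprof_server_from_ and mdprof_server_to_ prefixes follow the description above; the function names are hypothetical, and the choice of ":" as the replacement for slashes is purely an assumption, since the actual mangling scheme is not specified here.

```python
import os

PIPE_DIR = "/var/tmp"


def mangle(data_file_name):
    """Replace slashes so the name can be embedded in a file name.

    The replacement character is an assumption; the real scheme may differ.
    """
    return data_file_name.replace("/", ":")


def pipe_names(data_file_name, pipe_dir=PIPE_DIR):
    """Return the names of the two pipes for this data file."""
    mangled = mangle(data_file_name)
    return (os.path.join(pipe_dir, "mdprof_server_from_" + mangled),
            os.path.join(pipe_dir, "mdprof_server_to_" + mangled))


def server_seems_alive(data_file_name, pipe_dir=PIPE_DIR):
    """Approximate lock test: do both named pipes exist?

    A caller that starts a server when this returns False is exposed to
    the creation-time race: another caller may run the same test before
    the new server has created its pipes.
    """
    return all(os.path.exists(p) for p in pipe_names(data_file_name, pipe_dir))
```

Nothing in this test is atomic, which is exactly why the existence of the pipes is only an approximation of a lock.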


The modules of the deep profiler

array_util.m
This module contains utility predicates for handling arrays.
callgraph.m
This module constructs an explicit representation of the call graph, so we can find its cliques.
canonical.m
This module has code to canonicalize call graphs (i.e. ensure that no clique contains more than one ProcDynamic from a given procedure). It also has code that uses canonicalization to merge two call graphs. This module is not complete yet.
cliques.m
This module allows you to build a description of a directed graph (represented as a set of arcs between nodes identified by dense small integers) and then find the strongly connected components of that graph.
conf.m
This module contains primitives whose parameters are decided by the configure script. This module picks them up from the #defines put into runtime/mercury_conf.h by the configure script.
dense_bitset.m
This module provides an ADT for storing dense sets of small integers. The sets are represented as bit vectors, which are implemented as arrays of integers. This is used by cliques.m.
interface.m
This module defines the type of the commands that mdprof_cgi passes to mdprof_server, as well as utility predicates for manipulating commands and responses.
io_combinator.m
This module provides a set of I/O combinators for use by read_profile.m.
mdprof_cgi.m
This file contains the program that is executed by the web server to handle each web page request.
mdprof_server.m
This file defines the top level predicates of mdprof_server. It is mostly concerned with option handling.
measurements.m
This module defines the data structures that store deep profiling measurements and the operations on them.
merge.m
This module contains code for recursively merging sets of ProcDynamic and CallSiteDynamic nodes. mdprof uses this code to make sure that each clique contains at most one ProcDynamic structure for any given procedure.
profile.m
This file defines the main data structures of mdprof_server, and predicates for accessing them.
read_profile.m
This module contains code for reading in the deep profiling data files created by deep profiled executables.
server.m
This module contains the main server loop of mdprof_server; each iteration of the server loop serves up one web page. The module also contains test code for checking that all the web pages can be created without runtime aborts.
startup.m
This module contains the code for turning the raw list of nodes read in by read_profile.m into the data structure that mdprof_server needs to service requests for web pages.
timeout.m
This module implements the timeouts that mdprof_server uses to shut down after it hasn't received any queries for a while.
util.m
This module defines utility predicates for both mdprof_cgi and mdprof_server.
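
To make the roles of dense_bitset.m and cliques.m more concrete, the following Python sketch shows one way to combine the two ideas: a set of small integers stored as an array of machine-word-sized integers, used as the visited set while finding strongly connected components. It is not a translation of the Mercury modules. The class and function names, the 32 bit word size and the choice of Kosaraju's two-pass algorithm are all assumptions made for the example; the real cliques.m may use a different algorithm.

```python
class DenseBitset:
    """Set of small non-negative integers stored as a list of 32 bit words."""

    BITS = 32  # assumed word size

    def __init__(self):
        self.words = []

    def insert(self, n):
        word, bit = divmod(n, self.BITS)
        while len(self.words) <= word:
            self.words.append(0)
        self.words[word] |= 1 << bit

    def member(self, n):
        word, bit = divmod(n, self.BITS)
        return word < len(self.words) and bool((self.words[word] >> bit) & 1)


def _dfs(start, edges, visited, out):
    """Iterative depth-first search; appends nodes to out in postorder."""
    visited.insert(start)
    stack = [(start, iter(edges[start]))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if not visited.member(nxt):
                visited.insert(nxt)
                stack.append((nxt, iter(edges[nxt])))
                break
        else:
            stack.pop()          # all successors done: node is finished
            out.append(node)


def strongly_connected_components(num_nodes, arcs):
    """Nodes are 1..num_nodes; arcs is a list of (from, to) pairs."""
    succs = {n: [] for n in range(1, num_nodes + 1)}
    preds = {n: [] for n in range(1, num_nodes + 1)}
    for a, b in arcs:
        succs[a].append(b)
        preds[b].append(a)

    # Pass 1: record the order in which nodes finish on the forward graph.
    visited = DenseBitset()
    finish_order = []
    for n in range(1, num_nodes + 1):
        if not visited.member(n):
            _dfs(n, succs, visited, finish_order)

    # Pass 2: search the reversed graph in decreasing finish order;
    # each search collects exactly one strongly connected component.
    visited = DenseBitset()
    components = []
    for n in reversed(finish_order):
        if not visited.member(n):
            component = []
            _dfs(n, preds, visited, component)
            components.append(sorted(component))
    return components
```

For example, with four nodes and the arcs (1,2), (2,3), (3,2) and (3,4), nodes 2 and 3 call each other and so form one component, while nodes 1 and 4 each form a component of their own.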