Open Questions

Here are some problems I have encountered but not yet solved.

Parallel Dynamic Programming under NUMA
Average Height of Regular Trees
Analysis of Top-Level Parallel Recursive Algorithms
Average-Case Analysis of Parallel Algorithms with Petri Nets

I am happy to receive any comments, hints or solutions via email!

Parallel Dynamic Programming under NUMA

When wrapping up my master’s thesis, I discovered that my results only held for the shared memory model. Unfortunately, true shared memory platforms are rare in practice; most machines have non-uniform memory access (NUMA), i. e. processors have different access cost for different parts of the memory.

Is it still possible to classify the set of parallelizable dynamic programming recurrences?

Idea: If the dependency radius around each cell is small in a suitable sense, then we have $o(N)$ expensive accesses, $N$ the number of entries.

Average Height of Regular Trees

tree languages are almost the same as parsing trees of context-free grammars; for the details, see Comon et al. [1]. In particular, all simple varieties of trees [2] with finite set of node degrees $\Omega$ are regular. Conversely, all regular tree languages describable by regular tree grammars with only one non-terminal are simple varieties. It is known that all simple varieties with $\Omega \cap \mathbb{N}_{\geq 2} \neq \emptyset$ have average height in $\Theta(\sqrt{n})$ .

There are clearly tree languages with linear average height, the one defined by $S \to a(S) \mid a$ .

The question is now: ⊕ Are there (infinite) regular tree languages with average height not in $\Theta(n) \cup \Theta(\sqrt{n})$ ?

I gave a talk on this at FORMAT 2014 and wrote up some related results that may be of independent interest.

It may be possible to adapt the height analysis for simple varieties [3] to more than one non-terminal; I was not able to figure out if and how.

Analysis of Top-Level Parallel Recursive Algorithms

During average-case running-time analysis of recursive algorithms, we routinely solve recurrences of the form

$\mathbb{E}T_n = \mathbb{E}\bigl[ T_k + T_{n-k} + f(n) \bigr] = \mathbb{E}T_k + \mathbb{E}T_{n-k} + \mathbb{E}f(n);$

note how we use that expectation is linear in addition.

Now consider the most simple way to exploit parallelism: fork for each recursive call on the top level only. Then, the recurrence becomes

$\mathbb{E}T'_n = \mathbb{E}\bigl[ \max\{ T_k, T_{n-k}\} \bigr] + \mathbb{E}f(n).$

Unfortunately, expectation is not linear in taking the maximum. So what do we do?

Ideas:

If we know the distribution of $T_n$ really, really well, we may be able to derive decent tail bounds, Chernoff bounds. Since we are interested in bounding speedup from below, upper bounds on $T'_n$ may already help us.
Can we use that
$\max \{X_1, \dots, X_p\} = \lim_{p \to \infty} \Bigl( \sum_{i=1}^p |X_i|^p \Bigr)^{1/p}?$
The powers of $X_i$ can be obtained from the moment generating function of the $X_i$ (which are up to maybe a size parameter). Note that we still have to deal with $\mathbb{E} (\_)^{1/p}$ .

Average-Case Analysis of Parallel Algorithms with Petri Nets

Sequential programs can be modeled as something akin to (finite, discrete) Markov chains [4]: one state per statement, edges from statements to those that can be executed next, and probabilities to indicate which edge is taken with which likelihood. In this model, we can determine the expected runtime of an algorithm.

Can we transfer this to parallel algorithms? Even if the number processors $p$ is small, the size of the automaton explodes with $n^p$ which soon becomes intractable. There is no natural extension that treats $p$ as a parameter and maintains the method.

A natural candidate model for parallel algorithms are Petri nets. Statements become transitions and places contain a (labeled) token for each processor that currently waits to execute the next transition. More places and other tokens can be used to model, say, mutexes; we would require transitions to be able to distinguish between processor and helper tokens.

The first question is, what kind of stochastic versions have been studied at all? In order to remain somewhat consistent with Laube and Nebel [4], we would want to have transitions move the input marker to different places with certain probabilities. We would also require all enabled transitions to fire in each time step; otherwise, we would study concurrency but not parallelism. ⊕Some transitions may want to let only one processor pass at a time. Such ties should be resolved non-deterministically. If this works out, the expected runtime of the parallel algorithm would be given by the expected number of steps until all processor markers reach a sink place.

There may be a natural barrier for this approach. Abovementioned method yields only the expected value; higher moments are inaccessible. A simple example is a plain for-loop:

  for ( int i = 1; i < n; i++ ) {
    foo();
  }
  bar();

We know that this loop runs for exactly $n$ iterations, every time. However, after translation it becomes:

A graphic created with TikZ
^[source]

This can no longer be distinguished from other loops with the same probabilities, e. g.

  while ( true ) {
    if ( uniformInt(1,n) == 1 ) {
      break;
    }
  }
  bar();

Why is this a larger problem for the model proposed above? Well, if we have conflicts aka communication, e. g. in case of a mutex, the probability of processors interfering is crucial for the expected runtime – but depends on higher moments! Consider, for instance, this net: ⊕Inventing notation here. Say thick edges transport processor tokens ( $1,2, \dots$ ) and thin edges helper tokens (●). Small numbers next to places indicate a capacity limit.

A graphic created with TikZ
^[source]

Which kinds of loop are behind this? If both are of the first type I gave above, we never have a conflict; note that the two processors enter their respective loops at the same time here, and $p_1$ leaves the mutex before $p_2$ reaches it. If they are of the second kind, however, we have a non-zero probability for conflict.

So, the question remains: is there any hope here? If yes, what can we get out of a stochastic Petri net model?

References

[1]
H. Comon et al.: Tree Automata Techniques and Applications (2007). tata.gforge.inria.fr.
[2]
Philippe Flajolet and Robert Sedgewick: Analytic Combinatorics (Cambridge University Press, 2009). ISBN: 978-0-511-80165-5, DOI: 10.1017/CBO9780511801655.
[3]
P. Flajolet and A. Odlyzko: The Average Height of Binary Trees and Other Simple Trees, Journal of Computer and System Sciences 25(2) (1982): 171–213, DOI: 10.1016/0022-0000(82)90004-6.
[4]
U. Laube and M. E. Nebel: Maximum Likelihood Analysis of Algorithms and Data Structures, Theoretical Computer Science 411(1) (2010): 188–212, DOI: 10.1016/j.tcs.2009.09.025.