![]() |
|
| | |
| Craig Holman Research Jabberwocky Quotations Contact | |
| Home >
Jabberwocky      |
|
Jabberwocky for June 1, 2008 |
|
Some research themesI wrote the following on February 5, 2007. Introduction In the past twenty years of designing algorithms for challenging problems, I have noticed several themes, patterns of thought, approaches, and tools that I have come to rely upon. Of these, the most important are the study of constraint and consequence, playing rather than working, searching rather than researching, and giving yourself the gift of time. While heuristics have an important place in algorithm design, I have chosen to focus primarily on algorithmic frameworks rather than the heuristics that can operate within those frameworks. All of my algorithms could be improved by better heuristics. If the algorithms have value, someone will invest time on theory and experiment to discover increasingly effective heuristics. I appear to be most attracted by the non-heuristic aspects of algorithm design. I first developed the reduction approach while working on an algorithm for determining whether a k-clique exists in a graph. It has remained one of my most important tools. I believe that I use reduction in each of my algorithms.
Reduction The word 'reduction' is resonant with 'simplification' and 'purification' for me. It can be applied to problem instances and search spaces. In both, it involves a paring away, an elimination of the irrelevant, redundant, tautological, and contradictory. The reduction of a problem instance should result in a simpler yet equivalent instance. The reduction of a search space should result in a smaller search space. The means of reducing a search space is often the reduction of the underlying problem instance.
Complete sets of reductions We can think of a reduction as an operation on a structure that reduces some lower-bounded metric of that structure. A set of reductions that each result in the reduction of the same lower-bounded metric has the property that only a finite number of applications of some reduction in that set to a structure is possible, regardless of the order in which the applications are performed. We'll refer to a structure that is the result of an exhaustive application sequence of reductions in the set as a 'reduced structure'. For many sets of reductions, there are structures for which there are more than one reduced structures. That is, for many sets of reductions, there exist structures for which the order in which reductions are applied to sites within the changing structure determine which of several reduced structures will be the resulting reduced structure. For these sets of reductions, different application sequences result in different reduced structures. When designing a reduction algorithm that uses such a set of a reductions, we must be concerned both with attempting to minimizing the metric as well as maximizing the efficiency of the reduction. There are also sets of reductions, known as 'complete sets of reductions', for which there is only one reduced structure. For a complete set of reductions, every exhaustive application sequence results in a single reduced structure that is characteristic of the original structure. The order in which we apply reductions from the set may affect the efficiency of the reduction but has no effect upon the outcome of the reduction. Working with a complete set of reductions has great advantages. We may complete a set of reductions that is not complete by adding reductions to it. It is not enough to hope or believe that a set of reductions is complete - we need to prove that it is before we can justify focusing only on the efficiency of reduction using the set. I was first exposed to the idea of a complete set of reductions in a graduate class on advanced automated theorem proving. Proving that a set of reductions was complete was at the heart of my doctoral work on the reduction of general boolean expressions to a new normal form. I am finding the use of complete sets of reductions important in my work on factoring composite integers into two factors.
Constraint and consequence I have been fond of the phrase 'constraint and consequence' for many years, and I'll try to convey some of what it has come to mean to me. The best way I have of defining mathematics succinctly is 'the study of constraint and consequence'. I don't know if many mathematicians would agree, but it seems to capture the essence of mathematics as I understand it. In short, it says that mathematics is the study of systems of constraints and the consequences that flow from those constraints. In another vein, each problem instance is a system of constraints that admits of zero, one, or several solutions. Solving a problem instance involves the discovery and application of those constraints to reduce the instance and/or search space until either a solution becomes apparent or we have proven that no solution exists. Reduction is possible only because the constraints inherent in a problem instance provide sites for the application of reductions. The study of the possible constraints inherent in a problem and the effective use of actual constraints inherent in a problem instance is key to algorithm design. We often must squeeze blood from a stone, and that stone is a system of constraints. When I was young, I had little understanding of constraint, thought it had little value, and certainly didn't want to be subject to it. I understand constraint better and think it it can have great value, but I still don't want to be subject to it. I first learned of the nature and value of constraint in a course on poetry in 1980 for which we read Poetic Meter and Poetic Form by Paul Fussell. It is a masterful book. I learned how convention can provide the structure, context, and system of expectations within which, upon which, and against which a poem of great beauty and power can be woven. Convention can act as a skeleton, without which a poem can do little but flop around. I learned that weaving a poem upon a conventional form, such as a haiku, Elizabethan sonnet, limerick, or sestina, requires working with a system of constraints and expectations, often conforming to them but often rebelling against them. The acts of rebellion have power only because of the constraints and the expections that have become patterned over time. I have remained fascinated with constraint and the way that convention can provide structure and power. It is no accident that I see the study of constraints throughout mathematics and at the heart of algorithm design.
Incremental solution There are several flavors of reduction. Two opposing flavors I'll refer to as 'symmetric' and 'asymmetric', referring to the relative size of the substructures that result from the application of a reduction. The approach of incremental solution involves asymmetric reduction, while that of splitting, the next topic, involves symmetric reduction. In my graph algorithms, I'll often solve the problem 'one node at a time'. For example, consider the problem of constructing all maximal cliques of a graph. A clique of a graph is a complete subgraph - one in which every node is fully connected to every other node. A maximal clique is a clique that is not a proper subgraph of any other clique of the graph. At a high level of my algorithm for this problem, I solve it a node at a time as follows: I exhaustively pick a node, construct all maximal cliques that contain that node, and then remove that node from future consideration. I break the problem into two parts - maximal cliques that include a node and those that don't, solve the first part, and then solve the second by repeating the process with a different node. Solving a problem one node at a time is one example of incremental solution. Sometimes I am able to completely solve each subproblem (e.g. constructing all maximal cliques that include a specified node) in one pass, but often multiple passes are required, and incremental solution is performed within an iterative or conditional process.
Splitting Symmetric reduction can be seen in an approach that I call 'splitting' that is also used in the problem of constructing all maximal cliques of a graph. Splitting is used at a lower level in my algorithm for that problem. The results of splitting a set of size m in that algorithm is always two sets of size (m-1). For example, suppose that we have a clique candidate (represented as a set of node labels) { 3, 7, 18, 23, 72 } and a constraint that there is no edge between nodes 7 and 18. The clique candidate can not be a clique because it contains both 7 and 18, so it should be removed from a list of clique candidates, but two subsets of the former candidate should be added to the list of clique candidate: { 3, 7, 23, 72 } and { 3, 18, 23, 72 }. I refer to the splitting reduction as symmetric because the two resulting subproblems are of similar size. I use splitting as the basis of three algorithms: constructing all maximum cliques of a graph, determining whether a graph contains a clique of size k, and constructing all maximal cliques of a graph. A clique is maximum if no other clique of the graph contains more nodes. I have several other algorithms for the first two problems, but only one for constructing all maximal cliques. I suspect that this approach will be of use more generally, but so far I'm aware of using it only for these graph problems.
Reverse engineering One aspect of reverse engineering is reverse evaluation. I've spent years studying boolean expressions, which are named after the logician George Boole. One form of a boolean expression can consist of boolean variables (that can only take on the values true and false), nots, ands, ors, and parentheses. For example,
A and not ( ( B or not A ) and ( not C or not B ) )
is an example of a boolean expression that contains boolean variables A, B, and C. We can assign values to each of the variables in a boolean expression and evaluate the expression, which will evaluate to either true or false. For example, if we assign true to A, false to B, and true to C, the expression evaluates to true as follows :
A and not ( ( B or not A ) and ( not C or not B ) ) true and not ( ( false or not true ) and ( not true or not false ) ) true and not ( ( false or false ) and ( false or not false ) ) true and not ( ( false ) and ( false or true ) ) true and not ( false ) true
A set of assignments of true or false to each of the variables of a boolean expression is called an 'interpretation'. If there exists an interpretation that causes the expression to evaluate to true, that interpretation is called a 'model' of the expression, and the expression is said to be 'satisfiable'. A boolean expression is satisfiable (able to evaluate to true under some interpretation) if and only if a model (an interpretation under which the expression evaluates to true) for the expression exists. I'll quote the first paragraph of my dissertation:
In other words, I start out by assuming that the expression evaluates to true, reverse evaluate the expression, propagating that constraint so that a system of constraints on variables is constructed, and work with that system of constraints. A similar approach can be taken to start the process of factoring a composite integer into two integer factors.
Patchwork coverage I have often fallen into the trap of searching the perfect, golden algorithm for a challenging problem. To be sure, that remains my ultimate goal, but I've begun to figure out that judging an algorithm on that basis is misguided and disheartening. An algorithm is designed to solve a problem. The algorithm will usually work better for some instances of that problem than others. Some instances may be solvable more efficiently than others, while other instances may not be solvable at all by the algorithm. Each problem instance has a size (e.g. the number of nodes in a graph) that is often referred to as N. If the algorithm can solve every instance of its problem within a polynomial function of N (e.g. N4 + 2N3 - 7N) units of time, it is said to have polynomial complexity in time. On the other hand, if even one of its problem instances requires an exponential amount of time (e.g. 2N), which grows much more rapidly than any polynomial function, the algorithm is said to have exponential complexity in time. Many problems that interest me are considered very challenging because they belong to a complexity class called 'NP-complete'. No one knows whether the NP-complete problems have polynomial or exponential time complexity, but we do know that if any of these problems is polynomial, then all of them are polynomial, and that if any of these problems is exponential, then all of them are exponential. Determining and proving the complexity of this class of problems is one of the great problems in mathematics and computer science. We can construct a problem-instance space (of one or more dimensions) that contains all instances of the problem as points. For simplicity, we'll assume that this space consists of two dimensions. Each point can be colored according to time required to solve the instance. Let tough problem instances (can't be solved within a reasonable time) be colored red and the other instances be colored blue. Our goal is to come up with an algorithm for which the instance space is solid blue. The problem instance space for Algorithm A and Algorithm B might look like this:
while the problem instance space to the right shows what happens when we use whichever algorithm is most efficient for each instance. If we follow the simple rule that we should use Algorithm A for any instance in the top half of the space and Algorithm B for any instance in the bottom half of the space, we are guaranteed to use the most efficient algorithm on each instance. The world is seldom so kind or simple, but many algorithms can make performance guarantees for regions of their problem instance spaces. By developing several very different algorithms for solving the same problem and determining the regions of the space for which they are effective, we may be able to greatly reduce, if not eliminate, the region in which red points can appear for an algorithm that makes uses the best algorithm for each instance. In the quest for finding polynomial-time worst-case algorithms for some challenging problems, I have taken to referring to the goal of finding such a patchwork algorithm as the goal of finding 'polynomial coverage'. In this way, algorithms that fail to provide the desired coverage throughout their problem instance space, but which are able to guarantee the desired coverage for some region of that space, may prove to be quite valuable as part of a patchwork algorithm. I have developed, so far, six or seven algorithms for determining whether a graph contains a clique of size k. I hope to take the time to determine regions for which each provides polynomial coverage and develop a patchwork algorithm that provides for polynomial coverage over as much of the problem instance space as I can achieve.
The Adaptive Market pattern
When trying to build something, it helps to have a variety of tools that are well-suited to the different activities that will be performed, as well as knowledge about how well each tool is suited to each activity. When trying to accomplish something by delegating components to service providers, it helps to have a variety of service providers that are well-suited to the different activities that they will be asked to perform, as well as knowledge about how well each is suited to the activity, including their reputations for quality, ability to estimate, ability to complete, ability to work on schedule, etc. The same applies to algorithm design. An algorithm may make use of other algorithms to accomplish its goal, and it may need to make the decision of which algorithms to use, as well as their parameters and time limits, based upon characteristics of the problem instance as well as a knowledgebase on the algorithms it can draw on. The Adaptive Market pattern is a multi-level free-market framework for selecting and applying the best tools for tasks that learns from experience and incorporates the elements of prediction, estimation, bidding, reputation, and adaptation. The Adaptive Market pattern began to take shape as I was working on my dissertation, which was entitled Elements of an Expert System for Determining the Satisfiability of General Boolean Expressions.
Clustering Many problems seem to boil down to clustering problems. The problem recently posed for the Netflix Prize, that of predicting how customers will rate movies, is one such problem as far as I am concerned. People think of movies as being more or less similar to each other in many ways. Some are simple: do they have directors, actors, actresses, or cinematographers in common? Others are more subjective, such as genre. Others may be very ideosyncratic and may be quite unconsciously held and applied. We can try to guess about which grouping criteria people use and the relative strength or importance of each, or we can look at what people actually do. I think the latter is a much more powerful, though more difficult, technique. Let's assume that a person is more likely to watch movies that he believes are similar to movies that he's seen before and liked, and to rate movies that seem similar to him accordingly. Most people will rate movies, in part, based upon how they regard the groups, or movie clusters, that the movies belong to. Each person may regard a movie as belonging to several movie clusters. People may differ significantly as to what constitutes movie clusters - that is, which movies belong to which clusters, and which movies are at the center of a cluster instead of at its periphery. How do we determine which handful of movies are the center of a cluster? How do we determine which clusters are worth identifying and using for classification? How can we distinguish the natural clusters that are significant for our purposes from the background noise? These questions are central. I have some ideas on these questions that I'll discuss in the future. Another example of where clustering comes into play involves determining whether a graph contains a clique of size k. Let's assume for this example that we are considering a graph of 100 nodes, that the probability of an edge between pairs of nodes is 80%, and that we are looking for cliques of size 20. We can start out by getting rid of any nodes that aren't connected to at least 19 other nodes, and performing some other reduction in addition to that. Once we've done that, we might be tempted to start clustering nodes together according to some criteria in hope of efficiently discovering a 20-clique in this way. This approach to clustering is really a search approach. I believe there are other methods of clustering that show some promise, but I haven't focused on them for several years.
Resonance This approach is both fun and frustrating. Fun because it sounds so simple, and frustrating because I haven't been able to use it effectively. If you want to determine whether a rock is solid but light or hollow, you can tap the rock with a hammer and listen to the sound that results. With practice, you'll be able to discriminate between hollow and solid rock reliably. If you want to determing whether an object contains a cavity of a specified dimension, you may be able to determine this by sending a pulse of the right frequency of energy into the object and trying to detect whether resonance has occurred. I believe that this would be the result of the frequency being the one that would result in a standing wave occurring in the cavity. I've been fascinated by the possibility of doing something similar with graphs. If I send a pulse into the nodes of a graph and propagate pulses very carefully at each clock tick, after the k-th clock tick a pulse would remain only if a k-clique exists in the graph. If only I knew how to do the 'propagate pulses very carefully' efficiently...
Search rather than research
In my note on Research Philosophy, I wrote:
People doing original work in many areas might find this approach unthinkable or counterproductive. They are probably right. I think it is more common among artists. I was happy to learn that several mathematicians of note have pursued their work in this way. While I think that people in many areas can benefit from this approach, it think it is unusually useful for algorithm designers.
Play
I do algorithm design for many reasons, including reputation, glory, financial recognition, and the betterment of the world. Since I've been keeping things to myself for twenty years, however, something else must be driving me - joy, pleasure, and exuberance. I love doing this. This is not work; this is my mind at play.
Time
I'm glad that I don't have to conduct my own research on a schedule, for not only would that take a lot of fun out of the process, but I think I'd accomplish much less. Much of this takes time - not just a certain number of hours, but those hours spread over months or years. Some things just have to simmer on the back burner for a long time, almost forgotten. Sometimes I've started things in a different order than I seemed to need to address them to make any progress. I rarely make conscious decisions about what to work on next. I'll work on one problem until another calls to me strongly enough that I think I'll have more fun working on that. After working on the Graph Isomorphism problem off and on over several years, I spent a long stretch working on one approach to the k-Clique Existence problem. Frustrated by an implementation bug, I decided to take a break and pulled out my Graph Isomorphism folder. It took a while to get the problem back into my head and then I remembered where I had gotten stuck. After an hour or so of working some examples on paper, I had one of those 'Aha!' moments in which the way became clear for me, and I had a sketch of an algorithm by the end of the evening. Of course, it took me a few months of uneasy happiness to discover that, for certain graphs, my approach wouldn't resolve the issue, and at least two years of wrestling with that final step. I believe that I know how to address the issue, but I haven't finished the implemention yet or proven it to my satisfaction. This remains my most urgent task and I hope to complete it within the next few months, when I'm ready. I seem to work on things when I am ready to work on them, and trying to do things out of their time usually isn't productive.
Highlights of my doctoral dissertation     I was awarded my doctorate in Computer Science from Northwestern University at Evanston, Illinois in 1990. My dissertation was entitled Elements of an Expert System for Determining the Satisfiability of General Boolean Expressions and was supervised by Dr. Lawrence Henschen.
The Constraint Calculus, an algorithm for determining all of the models of a general Boolean expression, is presented and proven to be both sound and complete. Algorithms for determining the satisfiability and falsifiability of general Boolean expressions are derived from the Constraint Calculus. A method for randomly generating general Boolean expressions is introduced. This method depends upon the ability to randomly generate each n-node unlabelled tree with the same probability. Elegant Normal Form (ENF), a canonical normal form for general Boolean expressions, is defined. A set of eight graph transformations for reducing a general Boolean expression to ENF is presented and proven to be a complete set of reductions. Since a complete set of reductions is guaranteed to reduce a structure to a single form regardless of the order in which the reductions are applied, as long as they are exhaustively applied, algorithms for reducing expressions to ENF can be designed for efficiency without regard for optimizing the resulting structure. An algorithm for reducing expressions to ENF is presented. Several metrics of general Boolean expressions and their various representations under the Constraint Calculus are defined. The results of applying the satisfiability-determination algorithm, as defined and as enhanced by the reduction-to-elegance algorithm, to 300,000 randomly generated general Boolean expressions having from 100 to 500 literals each are presented. Preliminary identification of metrics which are good predictors of an expression's logical class (contradiction, satisfiable and falsifiable, tautology), cost of satisfiability determination, cost of reduction to elegance, and the number of literals in the resulting expression are presented. 99% of the expressions required fewer than 10,000 set operations for reduction to elegance and retained fewer than 2% of their original literals. None of the elegant equivalents of these expressions retained more than 9% of the literals in the original expression. 100% of the contradictions were discovered to be contradictions during their reduction to elegance. 99.99% of the tautologies were discovered to be tautologies during their reduction to elegance. 84% of the expressions did not require backtracking during the final stage of the satisfiability-determination algorithm 99% of the elegant expressions did not require backtracking during the final stage of the satisfiability-determination algorithm.
I will eventually make my dissertation available on this site. In the meantime, I'll post several short pieces that I've derived from it.
     |
| Home >
Jabberwocky      |
|
| Craig Holman Research Jabberwocky Quotations Contact | |
| Patterncraft and Constraint and Consequence are trademarks of Craig Stewart Holman Copyright © 2008 Craig Stewart Holman All Rights Reserved. |
|
| | |
![]() |
|