Constraint and Consequence
 
   
  Craig Holman   Research   Jabberwocky   Writing   Quotations   Contact
 
  Home > Jabberwocky           Previous
 
 
 

Jabberwocky for November 30, 2010

 
 

Alan - Algorithmic Definition Language - Parallelism

          First in thread      Previous in thread

One of the things that I'd like Alan to support well is parallelism, which includes:

  • Indicating regions of the algorithm that can be performed in parallel as well as regions that must be performed sequentially. These regions may be nested.
  • Parallel blocks controlled by a range or other collection
  • Capturing the results of algorithm execution in a parallel block
  • Defining pipelines
  • Specifying access to shared collections and other objects in parallel blocks
  • Discovering opportunities for parallelism
  • Validating explicit parallelism claims

The following examples are brief snippets of text that could be in the definition section of an Alan definition, or sketches of such snippets. Remember that these represent ideas that are still being played with, and will likely go through several changes.

Regions of an algorith may be marked as being necessarily sequential or able to be performed in parallel. Algorithm text in the definition section of an Alan document is sequential. Subordinate blocks may be declared to be sequential or parallel. Sequential and parallel blocks may be nested arbitrarily.

    definition
    {
        parameters(...)

        statement
        parallel
        {
            statement
            statement
            sequential
            {
                statement
                parallel
                {
                    statement
                    statement
                }
            }
            statement
        }
        statement
    }

A parallel block can consist of two or more statements that may be executed in parallel:

    parallel
    {
        statement
        statement
    }

A parallel block can controlled by the enumeration of a range or other collection:

    parallel for b in 1 .. k
    {
        DoSomething(b)
    }

A parallel clause may contain an until or while clause, such as:

    parallel until 1 finishes
    parallel until all finish
    parallel until majority true
    parallel while not solved

An algorithm may return a value - this is accomplished by marking one of its parameters with return.

Suppose that Calculate is an algorithm that has two parameters:

    algorithm Calculate
    {
        definition
        {
            parameters ( in const Natural x,
                         return Real result )
            ...
        }
    }

In the following parallel loop, each call to Calculate returns a Real result which is discarded because it isn't captured:

    parallel for b in 1 .. k
    {
        Calculate(b)
    }

By specifying a capture clause immediately after the parallel block, we can capture the results of the calls as a map from parameter tuples to results:

    parallel for b in 1 .. k
    {
        Calculate(b)
    }
    capture results

The object that is captured into can be queried in a variety of ways that will be covered in a later post. Operations appropriate to the algorithm return type, including aggregation operations, may be performed on the capture object. For example,

    Real sum = results.Sum

Rather than capture return values into a map through the use of a capture clause, the return values can be piped into a new block:

    parallel for b in range
    {
        Phase1(b)
    }
    pipe as individualResult
    {
        Phase2(individualResult)
    }

This structure constitutes a pipeline. A pipeline can be extended with as many blocks joined together with pipe clauses as desired. In the previous example, the result of each Phase1 call is passed through to the second stage of the pipeline as 'individualResult' as soon as it is returned. You can think of each individual result as activating a new instance of the second stage of the pipeline.

A pipe region is sequential by default. If you want it to be parallel, mark it as follows:

    parallel for b in range
    {
        Phase1(b)
    }
    pipe as individualResult parallel
    {
        Phase2(individualResult)
    }

It may not be necessary to specify a loop-control variable - for example:

    for b in range
    {
        Calculate(b)
    }

can be rewritten as:

    for range
    {
        Calculate(*)
    }

Similarly, it isn't necessary to specify a loop-control variable in a parallel statement - for example:

    parallel for range
    {
        Calculate(*)
    }

I would recommend the use of implicit loop-control variables and the associated * reference mechanism when possible, though nested loops will require explicit loop-control variables.

    parallel for range
    {
        Phase1(*)
    }
    pipe as individualResult
    {
        Phase2(individualResult)
    }

    parallel for range
    {
        Phase1(*)
    }
    pipe
    {
        Phase2(*)
    }

The output of one stage can be piped into two or more pipeline branches through the use of a pipe parallel clause:

    parallel for range
    {
        Phase1(*)
    }
    pipe parallel
    {
        Phase2a(*)
        Phase2b(*)
    }
    pipe
    {
        Phase3(3)
    }

In the previous example, the second stage consists of two independent branches of the pipeline that reunite for the third stage.

The results of the independent branches of the second stage can be captured with capture clases after any of the statements or blocks in the second stage - for example:

    parallel for range
    {
        Phase1(*)
    }
    pipe parallel
    {
        Phase2a(*) capture resultsA
        Phase2b(*) capture resultsB
    }

The pipe clause may be decorated with a variety of subclauses, such as:

    pipe until 3 finish

When three values pass through this this pipe, the pipeline is shut down.

pipe clauses that contain a while or an until subclause will kill the pipeline that they participate in when while condition becomes true or the until condition becomes false.

pipe clauses that do not contain a while or an until subclause will not kill the pipeline that they participate in - any subclauses determine what and how much gets passed through the pipe.

The next pipe clause only sends the first 100 results through the pipe:

    pipe first 100

The next pipe clause sends through the pipe only the true Boolean results:

    pipe where true

The next pipe clause sends through the pipe at most 100 true Boolean results:

    pipe first 100 where true

The next pipe clause send through the pipe all results until a false Boolean result is encounted:

    pipe while true

The next pipe clause send through the pipe all results that are in successfulOutcomeSet:

    pipe while * in successfulOutcomeSet

Some subclauses may be combined: the next pipe clause sends at most 100 true Boolean results, shutting down the pipeline on encountering the first false result:

    pipe first 100 while true

I will explore parallelism in Alan further in future posts. Some parallelism issues that are far from settled in my mind are:

  • How can the partitioning of a parallel-controlling collection or range be specified or hinted at?
  • How can use of shared objects in a parallel block be specified or hinted at?
  • How can opportunities for parallelism be automatically discovered in regions that are not marked as parallel?
  • How can the explicit parallel claims in an Alan document be validated automatically?
 
 
 
  Home > Jabberwocky           Previous
 
  Craig Holman   Research   Jabberwocky   Writing   Quotations   Contact
 
 
 
  Patterncraft™ and Constraint and Consequence™
are trademarks of Craig Stewart Holman

Copyright © 2010 - 2011 Craig Stewart Holman

All Rights Reserved.

 
   
 
  Patterncraft logo