Rapid Prototyping of Feedback Control Systems

Companion web page for the paper submitted to ICAC'2012

Overview

To illustrate our approach, we take an example scenario from a high-throughput computing (HTC) environment. We consider a Condor infrastructure for executing large scientific workflows. When too many concurrently running workflow enactment engines (DAGMan) submit a lot of jobs, the Condor scheduler can become a bottleneck and eventually overload, which degrades the overall system throughput.

Our objective is therefore to quickly graft a control system onto this infrastructure that ensures high throughput while preventing infrastructure overload.

The following figure (ArchitectureModel.png) shows the architecture model developed for this scenario:

The trigger periodically (every t seconds) observes the state of the system using three sensors:
  • queueStat reports the number of jobs in the scheduler queue (N), taken from the output of the condor_q command.
  • serviceRate computes the current service rate from the Condor history file.
  • ProcessCounter obtains the number of running DAGMan processes (m) by executing the system ps command.
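
As an illustration of the process-counting sensor, the following is a minimal Python sketch, not the implementation used in the paper. It assumes a POSIX ps that accepts `-e -o comm=` and prints one command name per line; the function names count_processes and running_dagmans are hypothetical. On some platforms the command column contains a full path or a truncated name, so the exact-match comparison may need adjusting.

```python
import subprocess


def count_processes(ps_output: str, name: str) -> int:
    """Count the lines of ps output whose command name equals `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def running_dagmans(name: str = "condor_dagman") -> int:
    """Return the number of running processes called `name` (the value m).

    Assumption: `ps -e -o comm=` lists every process, one command name per line.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_processes(result.stdout, name)
```

Keeping the parsing in count_processes separate from the call to ps makes the counting logic testable without a running Condor installation.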

The N and service-rate outputs are smoothed by the moving averages queueStatAvg and serviceRateAvg before being passed to the submissionRate controller. This controller is responsible for computing the delay that is imposed system-wide on all running DAGMans.
In our example, the delayer simply writes the delay into a file, from which DAGMan reads it the next time it tries to submit a job.
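
The smoothing and delaying steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code behind the model: it assumes a simple moving average over a fixed-size window and a delay file holding a single integer on one line. The class MovingAverage, the function write_delay, the window size, and the file format are all hypothetical.

```python
import os
import tempfile
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` samples (assumed filter type)."""

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest sample automatically
        self._values = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add a sample and return the average of the samples currently in the window."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


def write_delay(path: str, delay_seconds: int) -> None:
    """Write the delay to `path` (hypothetical format: one integer per file).

    The value is written to a temporary file in the same directory and then
    renamed over the target, so a reader never sees a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{delay_seconds}\n")
    os.replace(tmp_path, path)
```

Writing through a temporary file is a design choice for this sketch: several DAGMan processes may read the file at any time, so the update should appear atomic to them.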

Next, we extend the PeriodicTrigger type with an effector, setPeriod, that allows us to change the initial trigger period (t, the initialPeriod property), and the CondorQueueStat sensor type with a sensor, execTime, that reports how long the last execution of the condor_q command took. The last step is to connect them through a new controller, triggerRate : TriggerRateController, which is responsible for adjusting the trigger period (t) based on the execution time of condor_q.
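
The page does not state the control law used by TriggerRateController. Purely as an illustration of the idea, the Python sketch below keeps the trigger period a fixed multiple of the measured condor_q execution time, clamped to a range. The function name next_period and the parameters headroom, min_period, and max_period are hypothetical, and the default values are arbitrary.

```python
def next_period(
    exec_time: float,
    *,
    headroom: float = 5.0,
    min_period: float = 1.0,
    max_period: float = 300.0,
) -> float:
    """Return a new trigger period (seconds) from the smoothed condor_q execution time.

    Assumed rule: the period should be `headroom` times the execution time,
    so that querying the queue takes only a small share of each period,
    and it is clamped to [min_period, max_period].
    """
    target = exec_time * headroom
    return min(max_period, max(min_period, target))
```

With these defaults, a condor_q call that takes 4 seconds leads to a 20 second period, while very fast or very slow calls hit the lower and upper bounds.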

Note: This example is not meant to reflect a real-world Condor deployment. As with the presented adaptation model, its main purpose is to illustrate the capabilities of the proposed architecture modeling and supporting tools as an overall approach to the problem of engineering such systems.

Architecture Model

system {

    // data types in the system
    typedef float
    typedef int32
    typedef string

    typedef struct SubmissionRateInput {
        serviceRateAvg : float
        processCounter : int32
        queueStatsAvg : int32
    }

    // default data and control links type
    // in this scenario we do not have any requirements on a specific link behavior
    dlink DL
    clink CL

    <typename T> filter<<T>> MovingAverage {
        // note: at the type level we can omit the link mode if the component
        // is agnostic to whether the data are pushed to it or it has to pull them.
        // The extra pull code will be added based on the actual mode
        // setting at the instance level
        required dlink<<T>> input : DL
    } 

    <typename T> active sensor<<T>> PropertyGetter

    <typename T> effector<(<T>)> PropertySetter

    <typename T> active filter<<T>> PeriodicTrigger {
        property<float> initialPeriod

        required observing dlink<<T>> input : DL

        provided sensor period : PropertyGetter<T=float>
        provided effector setPeriod : PropertySetter<T=float>    
    }

    sensor<int32> ProcessCounter {
        required property<string> processName
    }

    filter<SubmissionRateInput> Synchronizer {
        required observing dlink<float> serviceRate : DL
        required observing dlink<int32> processCounter : DL
        required observing dlink<int32> queueStat : DL
    }

    sensor<float> CondorServiceRate {
        required property<string> condorConfigPath
    }

    sensor<int32> CondorQueueStat {
        required property<string> condorConfigPath

        provided sensor execTime : PropertyGetter<T=float>    
    }

    effector<(int32)> CondorDAGManDelay

    controller SubmissionRateController {
        required notifying dlink<SubmissionRateInput> input : DL
        required clink<(int32)> delay : CL
    }

    controller TriggerRateController {
        required notifying dlink<float> input : DL
        required clink<(float)> period : CL
    }

    main composite Main {
        required property<string> condorConfigPath = string("condor_config")

        // the main sensors
        // at this point we need to specify all required properties
        feature serviceRate : CondorServiceRate (condorConfigPath(:condorConfigPath))
        feature processCounter : ProcessCounter (processName(string("condor_dagman")))
        feature queueStat : CondorQueueStat (condorConfigPath(:condorConfigPath))

        // averages
        // at this point we need to specify the mode of the link that has been left
        // unspecified in the type declaration
        // also the appropriate type parameters need to be specified
        feature serviceRateAvg : MovingAverage<T=float> (input(observing))
        feature queueStatAvg : MovingAverage<T=int32> (input(observing))
        feature execTimeAvg : MovingAverage<T=float> (input(notifying))

        // a latch for synchronizing the inputs
        feature sync : Synchronizer

        // the periodic trigger
        feature trigger : PeriodicTrigger<T=SubmissionRateInput>

        // controllers
        feature submissionRateController : SubmissionRateController
        feature triggerRateController : TriggerRateController

        // delayer
        feature delayer : CondorDAGManDelay 

        // bindings - first loop
        dbind serviceRate to serviceRateAvg.input as b1
        dbind queueStat to queueStatAvg.input as b3
        dbind processCounter to sync.processCounter as b2
        dbind serviceRateAvg to sync.serviceRate as b4
        dbind queueStatAvg to sync.queueStat as b5
        dbind sync to trigger.input as b6
        dbind trigger to submissionRateController.input as b7
        cbind delayer to submissionRateController.delay as b8

        // bindings - second loop
        dbind queueStat.execTime to execTimeAvg.input as b9
        dbind execTimeAvg to triggerRateController.input as b10
        cbind trigger.setPeriod to triggerRateController.period as b11        
    }
}

SPIN Verification Support

The architecture model can be translated into a Promela model:

  • Main.pml (main composite from the model above)

The SPIN verifier can then be used to check LTL formulas:

  • !([] (queueStat__act -> (<> dagmanDelay__act))) (the LTL formula has to be negated because it becomes part of the never claim)
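
To make the meaning of the un-negated property concrete, the following Python sketch checks the same response pattern, "every queueStat activation is eventually followed by a dagmanDelay activation", on a finite recorded trace. It is only an approximation for intuition: SPIN evaluates the formula over all executions of the Promela model, whereas this check looks at one finite trace. The function name holds_response and the representation of a trace as a list of sets of active propositions are assumptions.

```python
from typing import Iterable, Set


def holds_response(trace: Iterable[Set[str]], request: str, response: str) -> bool:
    """Finite-trace check of [] (request -> <> response).

    Each element of `trace` is the set of propositions that hold in that step.
    The property holds if no request is left without a response at the same
    or a later step.
    """
    pending = False
    for state in trace:
        if request in state:
            pending = True
        if response in state:
            # a response in this step answers all requests seen so far
            pending = False
    return not pending
```

For example, a trace in which queueStat__act is followed two steps later by dagmanDelay__act satisfies the check, while a trace that ends right after a queueStat__act does not.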

Main.pml (11 KB) Filip Krikava, 19/03/2012 01:40

ArchitectureModel.png (84.3 KB) Filip Krikava, 19/03/2012 02:01
