JustKernel

Ray Of Hope

Dynamic scheduler that can self-learn

I want to make the scheduler dynamic, so that it can deliver maximum performance for different workloads by adjusting its own configuration. To start with a simple case, I have chosen the runqueue configuration.

Problem Statement:

Consider a simple use case: a person who doesn’t have much knowledge of the scheduler will have a hard time configuring the runqueue type. Even someone who does know the scheduler faces a second problem: workloads are dynamic and change constantly, and different workloads need different runqueue configurations to get maximum performance.

Solution:

What I propose is to dynamically adjust the runqueue type per cpupool, based on the workload, to get maximum performance.

High Level Design:

Fetch performance-related data from the scheduler -> feed it into a regression or classification algorithm (AI), which trains itself on a constant stream of performance numbers and outputs the best-fitting runqueue type -> dynamically change the runqueue type in the scheduler.
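The three-stage pipeline above can be sketched as a small control loop in C. This is only an illustration under stated assumptions: `enum runq_type`, `struct tuner`, `tuner_step` and the `demo_*` stand-ins are hypothetical names, not Xen APIs, and the demo prediction rule (load above 4.0 means "use per-cpu runqueues") is arbitrary and exists only so the sketch runs on its own.

```c
#include <stddef.h>

/* Hypothetical runqueue arrangements (the three options discussed in this post). */
enum runq_type { RUNQ_PER_CPU, RUNQ_PER_SOCKET, RUNQ_PER_CPUPOOL };

/* The pipeline stages as function pointers, so the collector, the model and
 * the actuator can be swapped independently. None of these exist in Xen. */
struct tuner {
    double (*collect)(void *ctx);                        /* fetch a performance metric */
    enum runq_type (*predict)(void *ctx, double metric); /* model: metric -> runq type */
    void (*apply)(void *ctx, enum runq_type type);       /* reconfigure the scheduler */
    void *ctx;
};

/* One pass of fetch -> predict -> apply. Returns the runq type now in use. */
enum runq_type tuner_step(const struct tuner *t, enum runq_type current)
{
    double metric = t->collect(t->ctx);
    enum runq_type best = t->predict(t->ctx, metric);
    if (best != current)
        t->apply(t->ctx, best);   /* only touch the scheduler on a change */
    return best;
}

/* Demo stand-ins (hypothetical) so the sketch compiles and runs by itself. */
struct demo { double load; enum runq_type applied; int applies; };

static double demo_collect(void *ctx)
{
    return ((struct demo *)ctx)->load;
}

static enum runq_type demo_predict(void *ctx, double load)
{
    (void)ctx;
    return load > 4.0 ? RUNQ_PER_CPU : RUNQ_PER_CPUPOOL;  /* arbitrary rule */
}

static void demo_apply(void *ctx, enum runq_type type)
{
    struct demo *d = ctx;
    d->applied = type;
    d->applies++;
}
```

Keeping the stages behind function pointers means the model can start as a fixed rule and later be replaced by a trained one without touching the data collection or the reconfiguration code.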

The problems that I can foresee:

1) What trace data from the scheduler can we use to deduce its performance at a given instant?

2) How do we pass data from the scheduler to userspace and share it with a third-party library?

3) Which third-party library should we use to analyze the data, and which algorithm?

4) Licensing issues.

5) Currently the runqueue configuration is static and can’t be changed dynamically.

The main challenge here will be collecting scheduler-specific data that we can treat as training data. I am looking at some chess-playing learning algorithms to understand how they make a computer learn the best possible moves, i.e. building the training set on the fly and improving after each run.

Some references that I found:
https://erikbern.com/2014/11/29/deep-learning-for-chess.html

https://github.com/erikbern/deep-pink

DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess (PDF)

GitHub repo for the project: https://github.com/Justkernel/intellisched

So, to start with, I need a training set to train the scheduler to adjust the runqueue configuration for maximum performance.
I was going through the various trace events (logs) that Credit2 produces which can show the performance of the scheduler at a given instant:
TRC_SCHED_BLOCK
TRC_SCHED_SWITCH
TRC_CSCHED2_CREDIT_BURN
TRC_CSCHED2_CREDIT_RESET
TRC_CSCHED2_UPDATE_LOAD
Per runq load.
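Whatever the exact trace format, a first processing step is to reduce a stream of per-runqueue load records to an average per runqueue. The sketch below assumes the records have already been decoded (for example from xentrace output) into a hypothetical `struct load_sample`; it does not reproduce the real Credit2 trace record layout, and `MAX_RUNQS`, `stats_add` and `stats_avg` are illustrative names.

```c
#include <stddef.h>

#define MAX_RUNQS 8   /* arbitrary limit chosen for this sketch */

/* Hypothetical, simplified view of one decoded load-update record.
 * This is NOT the real Credit2 trace record layout. */
struct load_sample {
    unsigned runq;    /* runqueue id */
    double load;      /* load value reported for that runqueue */
};

/* Running sums so averages can be updated as new trace data arrives. */
struct runq_stats {
    double sum[MAX_RUNQS];
    unsigned long count[MAX_RUNQS];
};

/* Fold a batch of samples into the per-runqueue sums; out-of-range ids are skipped. */
void stats_add(struct runq_stats *s, const struct load_sample *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i].runq >= MAX_RUNQS)
            continue;
        s->sum[v[i].runq] += v[i].load;
        s->count[v[i].runq]++;
    }
}

/* Average load of one runqueue, or 0.0 if it never reported. */
double stats_avg(const struct runq_stats *s, unsigned runq)
{
    if (runq >= MAX_RUNQS || s->count[runq] == 0)
        return 0.0;
    return s->sum[runq] / (double)s->count[runq];
}
```

Keeping sums and counts rather than storing every sample keeps memory constant no matter how long the trace runs.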

I got lost in thoughts about collecting huge logs of data and then using various workloads to identify which parameters impact performance. The whole picture looked quite complicated and I was not getting answers.

Then I decided to simplify the problem further. Let’s take the average runq load as the parameter that impacts performance. Run the same workload under each runq arrangement (runq per cpupool, runq per socket, runq per cpu) and collect the trace data, i.e. average runq load data points over a period of time. The set of best average runq load values will give me a simple, short training set.
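Building one training entry from those runs could look like the following minimal sketch. It assumes that "best" means "lowest average", and that the metric has already been made comparable across arrangements (for example average load per CPU rather than raw per-runqueue load, since raw load naturally shrinks as runqueues get smaller). `struct training_entry` and `build_entry` are hypothetical names.

```c
#include <stddef.h>

enum runq_type { RUNQ_PER_CPU, RUNQ_PER_SOCKET, RUNQ_PER_CPUPOOL, RUNQ_NTYPES };

/* One training entry for one workload: the arrangement that gave the best
 * average load, and that average (used later as the target value). */
struct training_entry {
    enum runq_type best;
    double target_load;
};

static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return n ? sum / (double)n : 0.0;
}

/* samples[t] holds n load measurements taken while the same workload ran
 * under arrangement t. Assumes a lower average is better. */
struct training_entry build_entry(const double *const samples[RUNQ_NTYPES], size_t n)
{
    struct training_entry e = { RUNQ_PER_CPU, mean(samples[RUNQ_PER_CPU], n) };
    for (int t = RUNQ_PER_CPU + 1; t < RUNQ_NTYPES; t++) {
        double m = mean(samples[t], n);
        if (m < e.target_load) {
            e.best = (enum runq_type)t;
            e.target_load = m;
        }
    }
    return e;
}
```

Running this once per workload gives the "short training set": one (arrangement, target load) pair per workload.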

The next step is to make the scheduler change its runq configuration to achieve the best possible average load per runq. For example, if we start with runq = cpupool and the delta between the current average load and the training set value is bigger than a threshold, we switch to another runq configuration and measure the delta again.
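The threshold rule can be written as a single decision function. In this sketch the "other runq configuration" is simply the next one in a fixed cycle (cpu -> socket -> cpupool -> cpu); that ordering, and the name `next_runq`, are assumptions for illustration.

```c
enum runq_type { RUNQ_PER_CPU, RUNQ_PER_SOCKET, RUNQ_PER_CPUPOOL, RUNQ_NTYPES };

/* Decide the arrangement for the next measurement interval. If the measured
 * average runq load is within `threshold` of the training target, keep the
 * current arrangement; otherwise try the next one in a fixed cycle. */
enum runq_type next_runq(enum runq_type cur, double measured,
                         double target, double threshold)
{
    double delta = measured - target;
    if (delta < 0)
        delta = -delta;           /* absolute difference */
    if (delta <= threshold)
        return cur;               /* close enough: stay put */
    return (enum runq_type)((cur + 1) % RUNQ_NTYPES);
}
```

Cycling is the simplest possible search and can oscillate if no arrangement gets within the threshold; remembering which arrangement came closest would be an obvious refinement.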

Though this is a very simple mechanism for making the scheduler intelligent enough to adjust its own configuration, it will be a nice step towards the future.

Then I can increase the number of workloads to two or three and choose the runq configuration based on the workload and the average runq load.

Then I can bring in other scheduler parameters, such as CPU idle time and CPU steal time, and choose the configuration based on the workload and on each parameter’s weight (its impact on performance).
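Weighting several parameters can be expressed as a weighted sum per arrangement, picking the arrangement with the lowest score. This is a sketch under assumptions: the three metrics (average runq load, CPU idle time, CPU steal time) are assumed to be normalised so that lower is better (a negative weight can flip a "higher is better" metric), and the weights themselves would have to come from measurement. `weighted_score` and `pick_best` are hypothetical names.

```c
#include <stddef.h>

#define N_PARAMS 3  /* avg runq load, CPU idle time, CPU steal time (assumed) */

/* Weighted sum of one arrangement's metrics; lower is assumed better. */
double weighted_score(const double x[N_PARAMS], const double w[N_PARAMS])
{
    double s = 0.0;
    for (size_t i = 0; i < N_PARAMS; i++)
        s += w[i] * x[i];
    return s;
}

/* Index of the arrangement whose metrics score lowest.
 * m[c] holds the metrics measured under arrangement c; requires n_configs >= 1. */
size_t pick_best(const double m[][N_PARAMS], size_t n_configs,
                 const double w[N_PARAMS])
{
    size_t best = 0;
    double best_score = weighted_score(m[0], w);
    for (size_t c = 1; c < n_configs; c++) {
        double s = weighted_score(m[c], w);
        if (s < best_score) {
            best = c;
            best_score = s;
        }
    }
    return best;
}
```

With one row of measured metrics per runq arrangement, the returned index is the arrangement to switch to.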

Then we can look into adjusting other parameters and configurations dynamically to get the maximum performance.

(I am not keen on using AI jargon here; I just want to keep it simple so that anyone reading this blog can understand the approach. Along the same lines, I will not use any third-party libraries (Octave, Matlab or a Python AI library) that provide built-in functions for making predictions. I will use C and do all the mathematical calculations required to make predictions myself.)
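As an example of doing the math by hand in C, here is ordinary least-squares linear regression on one variable, using the standard closed form. Which quantities play the roles of `x` and `y` (for instance a workload intensity measure against average runq load) is an assumption for illustration; `struct line`, `fit_line` and `line_predict` are hypothetical names.

```c
#include <stddef.h>

/* Result of fitting y = a + b * x. */
struct line { double a, b; };

/* Ordinary least squares for one input variable, closed form:
 *   b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx),   a = (Sy - b*Sx) / n
 * Returns 0 on success, -1 if there are fewer than 2 points or all x are equal. */
int fit_line(const double *x, const double *y, size_t n, struct line *out)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    if (n < 2)
        return -1;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;                 /* vertical data: slope undefined */
    out->b = ((double)n * sxy - sx * sy) / denom;
    out->a = (sy - out->b * sx) / (double)n;
    return 0;
}

/* Predicted y for a new x using the fitted line. */
double line_predict(const struct line *l, double x)
{
    return l->a + l->b * x;
}
```

For the points (1, 3), (2, 5), (3, 7), (4, 9) this recovers a = 1 and b = 2, so `line_predict` gives 11 at x = 5.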

Thanks
Anshul
