We have seen that a major task of an operating system is to manage a collection of processes, and that (in some cases) a single process may be structured as a set of individual threads.
Both of these situations raise the following issue: on a system with a single CPU (or on a multi-processor system with fewer CPU's than processes), how is CPU time divided among the different processes/threads that are competing to use it?
The component of the operating system that addresses these issues is called the scheduler. As we shall see, scheduling is often handled on several levels, with CPU scheduling being the lowest level. So we will want to discuss scheduling in general and CPU scheduling in particular.
The scheduler works in cooperation with the interrupt system we discussed in the last class.
The scheduler assigns the CPU to perform computation on behalf of a particular process (or thread within a process).
The CPU can be "borrowed" from its current process by an interrupt. This is under the control of the external devices, not the scheduler - though interrupts can be disabled for a short time if need be.
When a process (or thread) requests an IO transfer, it normally becomes ineligible to use the CPU until the transfer is complete.
This means that the scheduler will have to choose a new process (or a new thread within the same process) to use the CPU.
The process (or thread) that requested the IO becomes eligible to use the CPU again when the device in question interrupts to indicate that the transfer is complete. (In the case of character devices, this may be the last in a series of interrupts.) Following the interrupt, the scheduler may be invoked to decide whether the CPU should go back to:
The process (or thread) that was running when the interrupt occurred.
The process (or thread) that requested the IO operation that caused the interrupt (if it is higher priority.)
Thus, the interrupt handlers have to interact in some way with the data structures used by the scheduler.
A timer is also generally used to prevent a compute-bound process (or thread) from "hogging" the CPU. A timer interrupt may result in invoking the scheduler to give another process (or thread) a turn.
In a multiprogrammed operating system, the device drivers together with the scheduler (or at least a portion of it) constitute the "kernel" of the operating system. Because the kernel routines must modify the scheduler data structures, it is common for them to run at least part of the time with interrupts disabled, so that an update is not interrupted in mid-stream.
We will now proceed as follows:
Basic scheduling concepts
Our main focus: Approaches to CPU scheduling in a multiprogrammed environment: types of scheduling algorithms and tuning and analysis of scheduling algorithms.
For simplicity, we will talk in terms of scheduling processes, but the same principles apply within a threaded system.
Terminology
Recall that a process is a program in execution. From a scheduling point of view, a process constitutes a claim on the system resources. A process is defined by:
A portion of memory that contains the program that is being executed by the process, together with its data.
The contents of the CPU registers used by the process (including the PC which determines which instruction in the program is to be executed next.)
Many other terms have been used historically for what we now call a process. Probably the most common is job - so, to some extent, we will use the terms process and job interchangeably. Another term that occurs frequently is task. (Process is now the preferred term, but much of the nomenclature uses the older term job.)
The execution of a process consists of an alternation of CPU bursts and IO bursts. A process begins and ends with a CPU burst. In between, CPU activity is suspended whenever an IO operation is needed.
If the CPU bursts are relatively short compared to the IO bursts, then the process is said to be IO bound. (Example: a typical data processing task involves reading a record, some minimal computation, and writing a record.)
If the CPU bursts are relatively long compared to the IO bursts, then the process is said to be compute bound. (Example: a typical number crunching task involves an IO burst to read parameters; a very long CPU burst - perhaps hours or more - and another IO burst to write results.)
The state of a process: at any given time, a process is in one of several states. While the set of possible states varies from system to system, the following three comprise a minimal set:
Running: the CPU is currently executing the code belonging to the process. This means that the hardware CPU registers contain the values associated with the particular process; in particular, the hardware program counter is pointing to code belonging to the process.
Ready: the process could be running, but another process has the CPU.
Waiting (blocked): before the process can run, some external event (normally the completion of an IO transfer) must occur.
As a process runs, it goes through a series of state transitions, of which the following graph is a simple rendition:
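                 dispatch                 IO request
    READY ----------------> RUNNING ----------------> WAITING
      ^                                                   |
      |              IO completion (interrupt)            |
      +---------------------------------------------------+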
Note that, in this model, parallelism between computation and IO is on an inter-process basis: that is, computation for one process overlaps IO for other processes. It is also possible in some systems to have computation and IO for a single process overlap to some extent: a process may start an IO burst and continue computing until it reaches a point where further computation cannot proceed. (e.g. if the operation is a read, computation can proceed until the data read is actually used; if it is a write, computation can proceed until the buffer where the data is stored must be re-used.) If this is the case, then the model must be modified to show the RUNNING -> WAITING transition occurring as the result of an explicit WAIT request by the process, rather than as the automatic result of any IO request.
Ex: VMS SYS$QIOW vs SYS$QIO and SYS$WAIT.
Another possibility not shown in the diagram is pre-emption: the scheduler may take the CPU away from a process involuntarily either because it has used up its time quantum or because another, higher priority process needs the CPU. This could be shown in the diagram by a line from RUNNING to READY labelled "pre-emption".
To manage the various processes, a multiprogrammed system typically uses process control blocks (PCB's). A PCB is a data structure (normally stored in system memory) that records the process's state, registers (unless it is running) and other information such as priority, accounting data etc.
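As a rough illustration (not the layout of any particular system), a PCB might be declared in C along the following lines; the field names and sizes are assumptions:

    /* A minimal sketch of a process control block.  Field names and
       sizes are illustrative only. */

    typedef enum { READY, RUNNING, WAITING } proc_state;

    typedef struct pcb {
        int         pid;             /* process identifier              */
        proc_state  state;           /* READY, RUNNING, or WAITING      */
        unsigned    registers[16];   /* saved CPU registers, incl. PC   */
        int         priority;        /* scheduling priority             */
        long        cpu_time_used;   /* accounting information          */
        struct pcb *next;            /* link field used by the queues
                                        discussed below                 */
    } pcb;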
Queues: In a multiprogrammed system, there are typically many shared resources: the CPU, disks etc. Since it is possible to have several different processes desiring to use a given shared resource at any time, the scheduler must maintain a queue for each resource.
In the context of schedulers, we use the term queue in a broader sense than the way we defined it in Data Structures. An operating system scheduler queue may be managed using a FIFO discipline, but often uses some other ordering mechanism such as some sort of priority scheme.
In the case of the CPU, the queue is called the ready list and contains all the ready processes that are not currently running.
In addition, queues may be associated with various IO devices (such as disks) that may be accessed by more than one process at a time.
We will defer discussion of scheduling algorithms for these until later.
We will also sometimes speak of a queue as being associated with a non-shareable device such as a terminal. Of course, such a queue will contain at most one process.
Often the PCB's include one or more fields used to maintain the queues. For example, if a FIFO discipline is used, then each PCB might contain a field that can point to the PCB of the process that is next in line, or null if there is none. Note that, at any time, a PCB will be in exactly one queue (possibly at its head, using the resource, or possibly further back waiting on the resource) unless intraprocess overlap of IO and computation is allowed, of course.
When the process currently using a shared resource finishes using the resource, the next process in the queue is allowed its turn. In the case of IO devices, this is normally handled by the interrupt handlers - when an interrupt occurs, the current process is returned to the ready list (its IO request is complete) and the IO operation requested by the next process in line is started. In the case of the CPU, a portion of the scheduler called the dispatcher selects a new process for execution whenever the current process yields the CPU.
Scheduler goals: schedulers typically attempt to achieve some combination of the following goals. Note that, to some extent, these goals are contradictory - hence what is achieved must be a compromise:
Maximize CPU utilization (due to its relatively high cost)
Maximize utilization of other resources (disks, printers etc.)
Maximize throughput = number of jobs completed per unit time.
Minimize waiting time = total time a job spends waiting in the various queues for a resource to become available.
Minimize turnaround time = waiting time + computation time + IO time
Minimize response time (timesharing) = time from entry of a command until first output starts to appear. (Note that this is the only delay the user perceives, since the slowness of a typical terminal masks most subsequent waiting time unless the system load is very heavy.)
Fairness - all comparable jobs should be treated equitably.
Avoid indefinite postponement.
Uniformity - the behavior of the system should be predictable. (e.g. interactive users tend to prefer a consistent response time of several seconds to response times that are usually very low but include occasional long delays.)
Graceful degradation - in the face of excessive loads, the system response deteriorates gradually, rather than coming to a sudden virtual standstill.
Types of schedulers: A multiprogrammed system may include as many as three types of scheduler:
The long-term (high-level) scheduler admits new processes to the system.
Long-term scheduling is necessary because each process requires a portion of the available memory to contain its code and data. Thus, the total size of memory imposes an upper-limit on the multiprogramming level or number of active processes.
In a time-shared system, a common approach is to have the management establish an upper limit on the number of users allowed on line. Often this is done by allocating a fixed number of PCB's: when all are in use, no new process can be created. The long term scheduling algorithm is as follows:
    whenever a new user attempts to log in:
        if number of processes < maximum
            then create a new process for the user
            else tell the user to come back later
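A minimal sketch of this test in C (MAX_PROCESSES and the helper routine are illustrative assumptions, not taken from any particular system):

    #define MAX_PROCESSES 64

    static int active_processes;        /* PCBs currently in use */

    /* Returns 1 if a new process was created for the user, 0 otherwise. */
    static int try_login(void)
    {
        if (active_processes < MAX_PROCESSES) {
            active_processes++;         /* allocate a PCB, create the process */
            return 1;
        }
        return 0;                       /* full: ask the user to come back later */
    }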
(A certain amount of long-term scheduling also occurs as users voluntarily leave a heavily loaded system to return later when the usage is less.)
In a batch system, a long term scheduler may attempt to achieve a good balance (job mix) between compute-bound and IO bound jobs. Ideally, the job mix would include 1-2 compute bound jobs that can help maximize CPU usage, plus as many IO bound jobs as possible - preferably representing a good mixture of demands on the various shared IO devices.
Note that CPU time is wasted whenever the ready list is empty - i.e. all jobs are in the wait state. Compute bound jobs are almost always ready and thus help ensure that little CPU time goes unused.
If there are several shared IO devices (e.g. disks), it would be desirable to ensure that the needs of the various IO bound processes are distributed as evenly as possible among them, rather than having all processes competing for a single disk.
This kind of scheduling is hard to automate, since it is hard for the operating system to predict what a job will need. System management can also do long-term scheduling by scheduling batch runs throughout the week or month in a way that helps to improve the job mix.
Medium-term (intermediate) scheduling is not found in all systems. The medium-term scheduler, if present, controls the temporary removal from memory of a process when this is expedient. (This is called swapping). The need for swapping may occur in two ways:
In a timeshared system, the long-term scheduler may admit more users than can all fit in memory. However, since time-shared jobs are characterized by bursts of activity interspersed with periods of idleness while the user thinks, the medium-term scheduler can swap out temporarily inactive jobs. As soon as new input arrives from the user, the job can be swapped back in and another job swapped out.
In any kind of system, a sudden increase in the memory requirements for one job can make it necessary to either swap out the job that wants to grow until space is available for it or swap out another job to make room for it.
Note that the medium-term scheduler only swaps out jobs when it has to. Ideally, swapping should only occur when the usage level on the system is very high. Often, the onset of swapping brings a noticeable decrease in system performance - which may help to reduce the multiprogramming load on a timeshared system as users give up until later.
The short term scheduler determines the assignment of the CPU to ready processes. Most of our discussion will focus on this kind of scheduling.
Summary:
                      -------------------------
                      | Swapped-out processes |
                      -------------------------
                            ^           |
                            |    MT     |
                            |           v
    Incoming        -----------------------     ST     ----------------------
    jobs ---- LT -->| Memory-resident     |<---------->| Running process (1)|
                    | processes           |            ----------------------
                    | (waiting or ready)  |
                    -----------------------
Note on frequency of execution:
Long-term scheduler is executed infrequently - only when a new job arrives. Thus, it can be fairly sophisticated.
Medium-term scheduler (if it exists) is executed more frequently. To minimize overhead, it cannot be too complex.
Short-term scheduler is executed very frequently - whenever any IO request occurs, and often when one completes. Thus, it must be very fast. On newer machines, it is common to find hardware support for certain scheduler functions to help out.
As noted above, our primary focus is on algorithms for short term scheduling, though most will be applicable to long-term as well.
Scheduling algorithms can be classified in two different ways:
First-come-first-served (FCFS) algorithms vs priority algorithms.
The simplest scheduling algorithm is FCFS. Under this scheme, a queue (in the operating systems sense) is indeed a queue in the data structures sense.
As we shall see, it is often desirable to use some sort of scheme in which each process has a priority assigned to it. In a priority scheme, the highest priority ready process is selected; in the case of equal priorities, FCFS is used to resolve ties. (Note: we can think of FCFS as the degenerate case of a priority scheme in which all processes have the same priority.) Priorities can be:
Externally assigned - eg. by system management.
Internally computed by some algorithm that seeks to maximize one or more performance characteristics.
Some combination of external and internal.
Priority schemes require more sophisticated data structures for the various queues, to be discussed later.
Among priority systems: Non-preemptive vs preemptive schemes.
Once a process has been granted the CPU, the simplest approach is to allow the process to continue using the CPU until it voluntarily yields it - eg by requesting an IO transfer. Of course, IO interrupts may steal the CPU from time to time; but after each interrupt, control passes back to the process that was running when it occurred. This is called a non-preemptive approach.
In a preemptive scheme, a running process may be forced to yield the CPU (thus returning to the ready list) by an external event rather than by its own action. Such external events can be either or both of the following kinds:
A higher priority process enters the system from outside.
A higher-priority process that was in the wait state becomes ready. This could occur as the result of an IO interrupt that moves a formerly waiting process to the ready list.
A technique closely related to preemption is timer interruption. When a process is given the CPU, a timer may be set to a specified interval. If the process still hasn't yielded the CPU of its own accord at the end of the interval, then it is pre-empted.
We may think of this as a form of pre-emption, since both timer interruption and preemption force a process to yield the CPU before its CPU burst is complete.
However, it is helpful to distinguish timer interruption from pre-emption caused by higher priority processes becoming ready for two reasons:
Timer interruption is a function of the particular process's own behavior, independent of the rest of the system.
Almost all multiprogrammed operating systems use some form of timer - if for no other reason than to prevent a process that is in an infinite loop from tying up the system forever. But preemption for a higher priority process is a feature that may or may not be included in a given operating system.
Nonetheless, we will sometimes use the term "preemption" as a blanket term for either form of forced CPU yielding.
It should be recognized that pre-emption (of either sort) has a price connected with it: the overhead of switching from one process to another (context switching). This entails saving the various CPU registers in the PCB of the process being pre-empted, and loading the registers from the PCB of the new process. Of course, the same cost is incurred when a process yields the CPU due to IO; but that cost is unavoidable, whereas the cost of preemption is not. Thus, a pre-emptive scheme may try to minimize the need for preemption (of either sort) in various ways, as we shall see below.
Note: some CPU's attempt to reduce the cost of context switching by providing special instructions in the instruction set that cause a context change to occur. E.g. VAX LDPCTX, SVPCTX instructions together do what would otherwise require >20 instructions to save each register individually and another >20 to load each.
The various options yield a number of possible combinations. One can have:
An FCFS scheme that is non-preemptive. (The simplest of all cases.) This approach is rarely used for short-term scheduling, but is often used at least implicitly for long-term scheduling - e.g. on a timeshared system, if all job slots are full a user may not be allowed to log in, but his effort to log in may be recorded and he may be given an opportunity to log in when his turn comes up.
Example: Large universities that face heavy user demand sometimes have a system that will allow a user to go through the login sequence and then will notify him that the system has too many users, will put him in a queue, and will inform him of his place in line. When room becomes available, a message will be sent to the terminal he is at, notifying him that he can now log in for real. (If a user fails to respond to such a message within five minutes, another user will be allowed on instead.)
An FCFS scheme that is non-preemptive (in the strict sense of the term) but which uses timer interrupts. Many time-shared systems use this scheme at least in part - i.e. ordinary users are allowed slices of the CPU on a "round robin" basis. (Such schemes are more generally priority schemes in which the majority of users have the same priority.) (The round robin scheme is sometimes abbreviated RR).
A priority scheme that is non-preemptive. Whenever a running process voluntarily yields the CPU, the next highest priority process is selected. Once it gets the CPU, a process keeps it until it yields. This approach is rarely useful, however.
A priority scheme that uses timer interruption. Same as the above, except a process may be forced to yield by time quantum interruption. Many operating systems use this approach for short term scheduling. One problem to be faced in this scheme is this: if a high priority process is pre-empted due to a time quantum expiration, then how is the next process to run selected? If we simply take the highest priority ready process, then we would simply select the process that was just pre-empted since, by assumption, if it was running then it was the highest priority ready process.
Often we find that the majority of the processes on a system may have the same priority, especially if priorities are externally assigned. In this case, we use FCFS within a priority; so the pre-empted process is placed behind all other processes of the same priority in the queue.
The priority of a pre-empted process may be reduced in some way to enable another process to get a crack at the CPU. (More on both of these when we come to data structures for the queues.)
A priority scheme that is preemptive. This is the most sophisticated approach, and along with the previous scheme is one of the most common short-term approaches.
We have classified scheduling algorithms broadly as FCFS vs priority and as non-preemptive vs preemptive. Priority algorithms can be further classified on the basis of how priorities are assigned. We have already noted the distinction between externally assigned and internally assigned (computed) priorities. Another way to look at this distinction is that external priorities are generally static (i.e. they stay the same throughout the life of a job) while internal priorities are often dynamic (they are recomputed on the basis of the process's behavior.)
Ways of assigning priorities externally:
On the basis of the type of process:
In a foreground/background system, the foreground process (which is typically interactive) always has priority over the background process (which by definition is a process that makes use of CPU cycles not needed by the foreground process.)
Timeshared systems often allow batch processes to run as a sort of background set of processes. Again, the foreground interactive processes have priority.
Some systems (eg. VMS) allow for a third type of process: real time processes that are controlling some external device needing fast response. These processes are often idle for much of the time, but have priority over all other types when they need the CPU. (Example: DPX_Link at Gordon.)
On the basis of organizational priorities: certain departments or functions have a greater need for fast service, and so processes servicing them are given greater priority. This may be handled on a blanket or case-by-case basis. For example, if the payroll run is delayed and there is a threat of checks not going out on time, then the system manager may give that process priority for a particular run.
On the basis of payment for service. If computer time is being sold to outside users, multiple classes of service may be provided, with higher priority for higher billing.
Ways of assigning priorities dynamically (internally) - a more interesting question, since a good priority algorithm can enhance system performance.
Shortest job first (SJF) - priorities are assigned in inverse order of time needed for completion of an entire job (for long term scheduling) or for the next CPU burst (for short term scheduling.)
As noted in the book, it can be shown that this scheme minimizes AVERAGE turnaround time, since moving a shorter job ahead of a longer job decreases the turnaround time for the shorter job more than it increases the turnaround time for the longer job.
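For instance, suppose two jobs need 2 and 10 time units of CPU. Running the short job first gives turnaround times of 2 and 12 (average 7); running the long job first gives 10 and 12 (average 11).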
This scheme can be used for long-term scheduling. Often, on a batch system, users are required to submit an estimate of CPU time (and perhaps other resources) needed for their job. If these are used for SJF scheduling, then the user needs an incentive to use resources efficiently and to estimate carefully:
Generally, a job that exceeds its estimated resource consumption is aborted. Thus, an unreasonably low estimate may get quick service - but if the job is aborted, nothing is gained.
Alternately, a job that exceeds its estimated time may be shelved and restarted later, or may be charged a premium charge for the excess time.
Drastically overestimating the resources needed will prevent aborts - but will increase turnaround time. Likewise, jobs that hog resources will suffer delays.
At first, this scheme appears hard to use for short-term scheduling. How does the scheduler know how long the next CPU burst will take?
However, if CPU burst lengths do not vary too widely, some sort of algorithm may be used for estimation. The text discusses the idea of exponential averaging: a process's estimated burst duration is a weighted average of:
the predicted duration of its last burst (which incorporates information about its long term history)
the actual duration of its last burst.
The formula that we use is
    t[n+1, predicted] = alpha * t[n, actual] + (1 - alpha) * t[n, predicted]

(where alpha is a parameter between 0 and 1 that determines the relative weight given to the two terms.) For example, using the above with alpha = .5, we get

    t[n+1, predicted] = .5 * t[n, actual] + .5 * t[n, predicted]
                      = .5 * t[n, actual] + .25 * t[n-1, actual] + .125 * t[n-2, actual] + ...

so the burst that occurred i bursts ago contributes with weight (.5)^i, and the influence of older bursts dies off geometrically.
(Note that this can be implemented efficiently in software. In each PCB we store the predicted value for the current burst. When the burst terminates, we add the actual value to the prediction and shift the result right one place, i.e. divide by 2.)
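A sketch of this update in C, showing the general formula and the alpha = .5 add-and-shift special case just described (the representation of burst times as plain integers or doubles is an assumption):

    /* General form: tau(n+1) = alpha * t(n) + (1 - alpha) * tau(n),
       where tau is the prediction and t the measured burst length. */
    static double predict_next(double predicted, double actual, double alpha)
    {
        return alpha * actual + (1.0 - alpha) * predicted;
    }

    /* Special case alpha = .5: add the actual burst length to the stored
       prediction and shift right one bit (divide by 2). */
    static unsigned predict_next_half(unsigned predicted, unsigned actual)
    {
        return (predicted + actual) >> 1;
    }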
If used with a pre-emptive system (of either sort), SJF generally compares the remaining time needed for each job, rather than the original time. For a long-term scheduler, this is the original estimate minus the total service already received; for short-term scheduling, it is the estimated length of the current burst minus the portion already received. In either case, the scheme is called Shortest Remaining Time First (SRT).
Highest Response Ratio Next (HRN) (Brinch Hansen - 1971). One problem with SJF/SRT is that it is strongly biased toward shorter jobs and can, indeed, lead to a long job being indefinitely postponed. The book noted that this problem can be dealt with by some sort of aging strategy: as a job waits, its priority is boosted until it eventually gets run. One method for doing this is as follows: For each job, calculate its priority as
                 time waiting + service time
    priority  =  ---------------------------
                         service time

(where a higher numerical value means a higher priority). The name comes from the fact that the numerator represents the total turnaround time/response time for the job if it is given the processor now, while the denominator is the actual time needed; the response ratio is the ratio of this turnaround time to the time needed.
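A sketch of HRN selection in C; the job fields and the array-based interface are assumptions made for illustration:

    #include <stddef.h>

    struct job {
        double waiting_time;   /* time spent waiting so far        */
        double service_time;   /* estimated total service required */
    };

    /* Return the index of the job with the highest response ratio,
       or -1 if there are no jobs. */
    static int hrn_select(const struct job *jobs, size_t n)
    {
        int best = -1;
        double best_ratio = -1.0;

        for (size_t i = 0; i < n; i++) {
            double ratio = (jobs[i].waiting_time + jobs[i].service_time)
                           / jobs[i].service_time;
            if (ratio > best_ratio) {
                best_ratio = ratio;
                best = (int)i;
            }
        }
        return best;
    }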
Other considerations: in computing a process's priority, one or more of the following additional considerations may enter in - perhaps in the form of an adjustment to the basic value computed using SJF, SRT, or HRN:
If SJF or SRT is used, then some allowance for aging, like a periodic increment of the priority value.
If a low priority process holds a non-shareable resource that is desired by a high priority process, then the low priority process's priority may be boosted so that it can complete and make the resource available.
A process that has just been swapped into memory by the medium term scheduler may be given increased priority for access to the CPU and/or for protection against being swapped out, so as not to waste the investment involved in swapping it in.
A process that is using an otherwise under-utilized resource may be given increased priority.
etc.
Hybrid methods: In a scheme where priority is basically static (external), minor dynamic adjustments may be made to the base priority for fine tuning.
Example: In VMS, each (non real-time) process is assigned a base priority when it is created, on the basis of a value assigned to the user by the system manager. A process's priority is increased when certain events occur - e.g. if it has been waiting for terminal input and the terminal input is received. Once boosted, the priority can be dropped down if the process's CPU burst exceeds its time quantum - but never below its base priority.
We have noted that most multiprogramming operating systems use some form of timer interruption to keep a single process from hogging the CPU. A key design question is the length of the time quantum to be used.
If the time quantum is of magnitude comparable to the time needed for context switching, then the overhead of context switching will require an unduly high share of the processor cycles.
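For instance (using purely illustrative numbers), if a context switch takes 1 ms and the quantum is 4 ms, then 1 ms out of every 5 is spent switching - 20% of the CPU.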
If the time quantum is too long, then response time on a time shared system will begin to degenerate, and IO device usage will drop on any kind of system, because IO bound processes cannot get the CPU back from a compute bound job in order to start a new IO operation when the current one completes.
One textbook author suggests that one picture an operating system as having a dial on the front labeled "q" (for quantum). Suppose the dial on a timeshared system is initially set at 0.
No one would get any work done - as soon as a process gets the CPU, it has to yield it.
As the dial is slowly turned up, system response would begin to improve. At low settings, context changing overhead might still occupy a high percentage of the overall CPU time, so user processes would be slowed down. But this overhead would decrease as the quantum is increased.
As "q" is increased, more and more of the processes on the system would be able to complete a CPU burst without being interrupted by the timer.
However, at some point the response time would begin to deteriorate again. This would be due to an occasional compute-bound process taking a large slice of time while all the other processes wait.
Assuming there is a heavily compute bound process on the system, further increases in "q" beyond the optimal point would continue to degrade response time.
Note, however, that if we measure throughput in terms of CPU usage, it is possible that throughput might continue to increase even after response time has begun to decrease, since less and less of the CPU time goes to overhead. However, eventually a point would be reached where even throughput would decay; when a compute bound process does yield, other IO bound processes might not be able to take up the slack before themselves having to yield the CPU for IO. This might lead to the CPU sitting idle while all processes wait on IO.
A good rule of thumb would seem to be that the majority of processes should be able to complete a CPU burst without timer interruption. (The text suggests around 80% as a rule of thumb. This might be higher in a highly-interactive situation, or lower in a situation where there is much heavy computation going on.) Thus the time quantum will generally be a fraction of a second - perhaps between .1 and .5 seconds or so.
It is also possible to relate time quantum to priority. A compute bound process that could benefit from a long time quantum might be given a much longer than usual quantum, but also a lower priority. This would mean that it gets CPU bursts less often, but gets longer bursts each time, resulting in the same overall average CPU use with less overhead. (See discussion of multi-level feedback queues below.)
We noted above that, in the case of a FCFS algorithm, the "queue" of processes waiting for the CPU is indeed a queue in the data structures sense. Such queues can be implemented by linking PCB's together using a special field reserved for that purpose, with each PCB pointing to its successor. Two external pointers - one to the front of the queue and one to the rear - complete the implementation. Adding a new process to the end of the queue, and dispatching a process from the front are both easy.
    -------          -------        -------        -------
    | Front | -----> |  PCB  | ---> |  PCB  | ---> |  PCB  | ---> null
    -------          -------        -------        -------
    -------                                           ^
    | Rear  | -----------------------------------------
    -------
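A sketch of this FIFO ready list in C, assuming the PCB structure outlined earlier (the function names are illustrative):

    #include <stddef.h>

    typedef struct {
        pcb *front;   /* next PCB to dispatch */
        pcb *rear;    /* last PCB in line     */
    } ready_list;

    /* Add a PCB to the end of the queue: O(1). */
    static void enqueue(ready_list *q, pcb *p)
    {
        p->next = NULL;
        if (q->rear)
            q->rear->next = p;
        else
            q->front = p;        /* queue was empty */
        q->rear = p;
    }

    /* Remove and return the PCB at the front of the queue: O(1). */
    static pcb *dispatch(ready_list *q)
    {
        pcb *p = q->front;
        if (p) {
            q->front = p->next;
            if (!q->front)
                q->rear = NULL;  /* queue is now empty */
        }
        return p;
    }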
The injection of priority necessitates a more complicated - and potentially more costly - data structure. Remember that the short term scheduler, in particular, is executed very frequently. Thus, overhead time must be kept to a minimum.
We could maintain a linked list in descending priority order. Finding the next process to run can be done in O(1) time; but inserting a new process involves O(n) cost - where n is the length of the queue. This may be acceptable if it is known that the queue will always be short.
We could use some sort of tree structure - eg. a binary search tree based on priority order, or a heap. Such a structure might have both dispatch and insert times of O(log n). (Though a tree could do much worse if most priorities are the same; it would degenerate, in this case, to a linked list. A heap-based priority queue would not suffer this problem, though.)
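For illustration, a binary-heap ready queue in C keyed on priority (larger value = higher priority), giving O(log n) insert and dispatch; the fixed array size and the reuse of the PCB sketch above are assumptions:

    #define HEAP_MAX 256

    static pcb *heap[HEAP_MAX];
    static int  heap_size;

    static void heap_swap(int i, int j)
    {
        pcb *t = heap[i]; heap[i] = heap[j]; heap[j] = t;
    }

    /* Insert: place the new PCB at the bottom and sift it up.  O(log n). */
    static void heap_insert(pcb *p)
    {
        if (heap_size >= HEAP_MAX)
            return;                      /* full; real code would handle this */
        int i = heap_size++;
        heap[i] = p;
        while (i > 0 && heap[(i - 1) / 2]->priority < heap[i]->priority) {
            heap_swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    /* Dispatch: remove and return the highest-priority PCB.  O(log n). */
    static pcb *heap_dispatch(void)
    {
        if (heap_size == 0)
            return NULL;
        pcb *top = heap[0];
        heap[0] = heap[--heap_size];
        int i = 0;
        for (;;) {
            int l = 2 * i + 1, r = 2 * i + 2, m = i;
            if (l < heap_size && heap[l]->priority > heap[m]->priority) m = l;
            if (r < heap_size && heap[r]->priority > heap[m]->priority) m = r;
            if (m == i) break;
            heap_swap(i, m);
            i = m;
        }
        return top;
    }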
When the set of possible priorities is limited, another approach may be used: multi-level queues. A single queue is maintained for each priority level. When a job is inserted, it is added at the end of the queue for its priority level, since we assume that FCFS is used between jobs of equal priority. The dispatcher always selects the front job from the queue having the highest priority.
Example:

    Priority 3:   Job X   Job D   Job M
    Priority 2:   Job Y   Job W
    Priority 1:   Job L   Job Z
Assuming that priority 3 is the highest, the dispatcher will select Job X next. If a job A with priority 2 becomes ready, it is inserted after Job W in the level 2 queue.
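A sketch of multi-level queue dispatching in C, reusing the FIFO ready-list routines sketched above; the number of levels is an assumption, and priorities are assumed to lie in the range 0 to NUM_LEVELS - 1:

    #define NUM_LEVELS 4                  /* assumed number of priority levels */

    static ready_list level[NUM_LEVELS];  /* level[NUM_LEVELS - 1] is highest  */

    /* Insert a ready PCB at the rear of the queue for its priority
       (FCFS within each priority level). */
    static void insert_by_priority(pcb *p)
    {
        enqueue(&level[p->priority], p);
    }

    /* Dispatch the front PCB from the highest non-empty queue. */
    static pcb *dispatch_highest(void)
    {
        for (int i = NUM_LEVELS - 1; i >= 0; i--) {
            pcb *p = dispatch(&level[i]);
            if (p)
                return p;
        }
        return NULL;                      /* all queues empty */
    }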
Multi-level queues are the basis of an interesting scheme to handle the time quantum problem discussed above. The scheme is called multi-level feedback queues.
The operating system maintains a hierarchy of queues of decreasing priority.
A newly arrived process is placed at the rear of the highest queue. When it reaches the front, it is given a single time quantum of CPU time.
If it yields the CPU before the quantum expires, it is placed at the rear of the original queue.
If not, it is placed at the rear of the next queue down.
If the dispatcher finds the highest level queue to be empty, it selects a job from the next level down, but with a time quantum twice the basic value. If that queue is empty, it goes down another level, this time offering (say) 4 times the basic quantum.
A job that completes its CPU activity before the quantum expires may move up a queue next time (unless it is already in the highest queue).
A job that fails to complete its CPU burst within the time quantum is moved down to the next queue, until it reaches the bottom, where it remains.
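A sketch of the bookkeeping just described, in C; the number of levels, the base quantum, and the two helper routines are illustrative assumptions:

    #define MLFQ_LEVELS     4             /* level 0 is the highest queue */
    #define BASE_QUANTUM_MS 100

    /* Quantum granted to a process dispatched from a given level:
       the basic value at the top, doubling at each level down. */
    static int quantum_for_level(int lvl)
    {
        return BASE_QUANTUM_MS << lvl;    /* 100, 200, 400, 800 ms */
    }

    /* Adjust a process's level after it gives up the CPU.  expired != 0
       means the timer interrupted it; otherwise it yielded (e.g. for IO)
       before the quantum ran out. */
    static int next_level(int current, int expired)
    {
        if (expired)
            return current < MLFQ_LEVELS - 1 ? current + 1 : current; /* down */
        else
            return current > 0 ? current - 1 : current;               /* up   */
    }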
Of course, many variants on this scheme are possible. If it is used on a particular system, then management may be given a means to tune the system by adjusting various parameters, such as:
The quantum for each level.
The rule that determines when a job moves down (perhaps it should be given a second chance the first time it exceeds the time quantum for its level.)
The rule that determines when a job moves up.
etc.
With so many scheduling algorithms to choose from, it is desirable to have some basis for choice. In general, two approaches can be taken: various schemes can be analyzed, or a general scheme can be built with parameters (such as time quanta values, priority increments for certain events, etc.) that can be tuned in the field.
Discussion of analytical schemes is beyond our purpose here. We note that they include methods like:
Direct analysis: plugging specific data in and seeing what results.
Applying queueing theory to obtain descriptive equations, such as Little's rule described in the text.
Simulations.
System tuning is still an important component of achieving efficiency - in other realms as well as scheduling. Unfortunately, much of this is still trial and error; but an experienced system manager will tend to discover those techniques that work for him.
These notes were written by Prof. R. Bjork of Gordon College. In January 1998 they were converted to HTML and lightly edited by J. Senning of Gordon College.