CS222 Lecture: CPU Implementation revised 10/20/2000
Materials: Transparency of Patterson-Hennessy figures 5.35, 5.32
I. Introduction
- ------------
A. We now shift the focus of the course from computer architecture to
computer organization - going from describing how a computer behaves at
the machine/assembly language level to how it is implemented.
B. At the start of the course, we noted that a computer system may be viewed
at various levels of abstraction: (ASK)
1. The user level
2. The higher-level language programming level
3. The machine/assembly language programming level
4. The hardware design level
5. The solid-state physics level
We have spent the first half of this course at the machine/assembly
language programming level. We are now going to drop down to the
hardware design level, to see how the functionality we have been
studying can actually be realized.
C. It turns out that each of these levels can be divided into sublevels
for detailed study.
1. For example, we studied machine language as one level for a couple of
weeks, and then built assembly language as a higher level on top of
that.
2. At the hardware design level, we can talk about the following
sublevels:
a. The system level (coming in a week or so)
b. The CPU implementation level (we start this today)
c. The logic design level (we did this in CS221)
3. Our particular concern at this point is with the implementation of
the CPU, which is the component that implements the architectural
capabilities we have been studying for the first half of the
course. From there, we will move on to consider how the CPU is
combined with other subsystems (IO, memory, etc.) to build a
complete computer.
D. Rather than attempting to describe the implementation of a specific CPU,
we will talk in more general terms about what might be needed to
implement an Instruction Set Architecture (ISA). Our examples will
mostly be based on a one-address architecture machine.
E. Consider a typical one-address machine instruction - say ADD.
ADD X ; Meaning AC <- AC + contents of memory cell "X"
What must take place to perform this instruction (beginning with fetching
the instruction itself)? (ASK)
1. Instruction fetch (IF) - go to the memory location pointed to by the
PC, and fetch the instruction stored there; then update the PC.
2. Instruction decode (IOD) - determine what the instruction is.
(Exactly how this is done depends on how the control unit of the CPU is
implemented, something we will discuss in the next series of lectures.)
NOTE: The above two steps are always the same, regardless of what
instruction is being executed.
3. Operand address calculation (OAC) - this depends on what addressing
mode is being used (e.g. it might involve simply extracting an absolute
address from the instruction; or it might involve adding a displacement
to the PC or some other register; or it might even involve a trip
to memory if some sort of deferred mode is being used).
4. Operand fetch (OF) - get the value of the operand from the memory
address just calculated.
5. Execution (EXEC) - add the value just fetched to the AC, and store the
result back into the AC.
NOTE: The actual execution of the computation is just a small part of
the effort involved in executing the instruction.
F. Of course, the exact series of steps (after instruction fetch and
decode) will vary with the instruction being executed.
1. An instruction like STORE X will not need an operand fetch step, but
will need an operand store (OS) step as its last step - store the AC
in the location whose address was calculated in the OAC step.
2. An instruction like branch will require an OAC step, but not an OF
or OS step.
3. An instruction that doesn't involve any memory location (e.g.
shift the AC left one place) won't need OAC, OF, or OS - but will
still need IF, IOD, and EXEC.
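The variation just described can be summarized as a small lookup table. The following Python sketch is illustrative only: the table name PHASES, the mnemonics BRANCH and SHL, and the choice of which instructions include an EXEC step are assumptions made for this example, not part of any particular machine.

```python
# Phases needed by a few one-address instructions, per the discussion above.
# IF and IOD are common to every instruction; the remaining phases vary.
COMMON = ["IF", "IOD"]

PHASES = {
    "ADD":    COMMON + ["OAC", "OF", "EXEC"],  # memory operand is read
    "STORE":  COMMON + ["OAC", "OS"],          # no operand fetch; AC is written out
    "BRANCH": COMMON + ["OAC", "EXEC"],        # address computed, no data read or written
    "SHL":    COMMON + ["EXEC"],               # no memory operand at all
}


def needs_memory_operand(mnemonic):
    """True if the instruction reads or writes a data operand in memory."""
    return any(phase in PHASES[mnemonic] for phase in ("OF", "OS"))
```

For instance, needs_memory_operand("STORE") is true while needs_memory_operand("SHL") is false.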
G. The steps will also vary with the architecture of the machine:
1. A memory-memory architecture machine may require either 2 (two-address
architecture) or 3 (three-address architecture) OAC steps; plus two OF
steps and one OS step, all for the same instruction.
2. An instruction on a load-store machine will have either an OAC step
followed by an OF or OS step (a load or store), or an EXEC step (a
computation) - but not both.
3. A machine with variable length instructions (like the VAX) may require
additional portions of the IF step after initial IOD.
etc.
H. We now look at the hardware required to carry out these steps.
II. Overview of CPU Components
-- -------- -- --- ----------
A. Though CPU's vary widely in design, many can be modelled by a structure
like the following:
|------------------------------------------------------------|
| |===============| v
| || || ---------------
| || ------------ ------- | |
| || / ALU \ -->|Flags|---------->| Control |
| || / \ ------- | --------|
| || ---------------- | | State |
| || /||\ /||\ ---------------
| || || || ^
| || |-----| |-----| |
| || |_____| |_____| |
--------- || || || |
| Clock | || |========| |
| | || || |
--------- || ----------------- |
| || | CPU Registers | |
| || | (Some Visible | |
|------------------>| to the user, | |
| || | others not) | |
| || | | |
| || ----------------- |
| || | MAR | |
| || |---------------| -------
| || | MBR | ===================>| IR |
| || ----------------- -------
| || /||\ /||\
| || || ||
| |=============| ||
| \||/
|
|--------------------> Memory and/or IO Buses
B. The most visible component of the CPU is the register set. As we have
seen, the particular assortment of registers will vary from machine to
machine, but will typically include one or more registers that can
function as accumulators, index registers, a program counter etc.
1. Often, the register set will include some registers used for internal
purposes that are not directly accessible to the assembly language
programmer; these can be used as scratch pads for more complex
computations such as multiplication, division, or floating point
operations, and to provide support for operating system functions
such as memory management that are not directly visible to the
ordinary programmer.
2. Also included among the registers are an MBR and MAR which serve as
the interface between the CPU and the memory system.
a. To perform a read access to memory, the CPU places the address
desired in the MAR and then issues a "read" command to the memory
system. When the memory completes the operation, the data requested
will be in the MBR, from which the CPU can transfer it to wherever
it is needed.
b. A write is similar, but the CPU places the data to be written into
the MBR before issuing the command to the memory.
c. Strictly speaking, these registers may not be necessary on a given
CPU; the memory bus may connect directly to the input and output
of the ALU. However, it is conceptually simpler to think of them
as being present.
3. Coming out of the register system are two buses serving as inputs to
the ALU.
a. The busses may be formed by using tri-state outputs on each
individual register.
b. Or, there may be a set of multiplexers - one for each bit of the bus -
with one bit of each register connected to each MUX.
4. There is also a bus which carries the output of the ALU back to the
register set, where it can be loaded into a specific register at
the end of the cycle. This can be accomplished by giving each register
a parallel load capability.
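The MAR/MBR read and write protocol described in item 2 above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name MemoryInterface and its methods are hypothetical, and a real memory system would take one or more clock cycles to complete each command.

```python
class MemoryInterface:
    """Toy model of the MAR/MBR handshake between the CPU and memory."""

    def __init__(self, size):
        self.cells = [0] * size
        self.MAR = 0  # memory address register
        self.MBR = 0  # memory buffer register

    def read(self):
        # "read" command: memory delivers M[MAR] into the MBR
        self.MBR = self.cells[self.MAR]

    def write(self):
        # "write" command: memory stores the MBR into M[MAR]
        self.cells[self.MAR] = self.MBR


mem = MemoryInterface(16)
mem.MAR, mem.MBR = 5, 42   # CPU sets up the address and the data
mem.write()
mem.MBR = 0                # clobber the MBR to show that the read refills it
mem.MAR = 5
mem.read()
result = mem.MBR
```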
C. The ALU would generally include several subunits for functions like:
1. Addition/subtraction.
2. Logical operations: AND, OR etc.
3. Shifts
4. On higher power machines, the ALU may include special hardware
for fast multiplication/division and/or floating point operations, as
we discussed earlier under arithmetic algorithms.
D. The ALU is also connected to a set of flags that can be set according
to the outcome of an operation - e.g. the following are frequently found:
1. A Z flag set if the result is zero
2. An N flag set if the result is negative
3. A C flag set if there is carry out from addition/subtraction or
containing the bit shifted out by a shift.
4. A V flag set if there is overflow on addition/subtraction or a shift.
5. Other flags may be included: parity odd or even, half carry for
decimal operations etc.
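As a rough illustration of how the four common flags follow from an addition, here is a short Python sketch. The function name add_with_flags and the default 4-bit width are assumptions made for this example; the flag definitions are the usual 2's complement ones.

```python
def add_with_flags(a, b, width=4):
    """Add two width-bit values; return (result, flags)."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "Z": result == 0,                 # result is zero
        "N": bool(result & sign),         # sign bit of the result is set
        "C": total > mask,                # carry out of the high-order bit
        # Overflow: operands share a sign that differs from the result's sign
        "V": bool(~(a ^ b) & (a ^ result) & sign),
    }
    return result, flags
```

With 4-bit operands, adding 0100 and 0100 gives 1000 with N and V set (4 + 4 does not fit in a signed 4-bit value), while adding 1111 and 0001 gives 0000 with Z and C set.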
E. The control unit controls the operation of the remaining units by
enabling select inputs to the various MUXes etc.
1. This would be determined by a system clock, internal state information
in the control, and the contents of:
a. An IR holding the current machine language instruction being
executed.
b. The ALU flags
2. The output of control would include lines to select:
a. Which registers serve as input to the ALU (selection lines to MUXes
or address to RAM or the like.)
b. What function(s) the ALU performs.
c. Which register receives the ALU output.
d. Other functions such as control of the memory system etc.
These lines are activated at the start of a clock cycle, the
computation they carry out is carried out during the clock cycle,
and the final result is stored in a register and/or the flags at
the end of the clock cycle.
3. The control is by far the most complex part of the CPU. For this
reason, we will devote a separate lecture to it. For now, we go on
to consider the various kinds of operations that occur within the CPU.
F. As we have drawn it, many of the components are used by two or more
steps in performing an instruction.
1. E.g. the memory bus/MAR/MBR are used by IF, OAC (possibly), OF, OS
2. The ALU is used by IF (update PC), OAC, and EXEC
etc.
G. It is also possible to build a CPU with functional units dedicated to the
various steps - e.g. an instruction fetch unit, an address
calculation unit, a data memory interface unit, an instruction execution
unit etc. This entails replication of certain hardware components, but
allows enhanced speed through parallelism.
III. The Register-Transfer Level of System Description
--- --- ----------------- ----- -- ------ -----------
A. We have just noted that computer systems can be described at various
levels. Associated with each of these levels is one or more "languages"
or systems of notation.
1. At the user level, computer systems have command "languages", which
may be textual (e.g. DCL, DOS) or graphical.
2. At the HLL level, we have languages like Pascal, C, C++ ...
3. At the machine language programming level, we have used two languages:
machine language and assembly language.
4. At the hardware design level, we have seen that we can divide it into
sublevels, each of which will have a language of its own:
a. The logic design level is described using the language of gates,
flip-flops, and finite state machines, as we have already seen.
b. The CPU implementation level uses a notation called REGISTER
TRANSFER LANGUAGE, which we are about to learn.
c. We will also utilize a system of notation for describing the
overall system level.
B. In describing the organization of the CPU, we need a system of
notation to describe the basic operations that are allowed to take place,
and the circumstances under which they occur. Such a system of notation
is called a register-transfer language, and resembles a programming
language in the sense that it is used to describe the series of steps
needed to accomplish some task - only in this case the steps are primitive
hardware operations such as transferring a word from one register to
another, placing data on a bus, or computing a sum in an adder.
C. Each primitive operation described by RTL is called a micro-operation.
1. A micro-operation is a primitive data transfer or transformation
operation accomplished by the hardware in A SINGLE CLOCK CYCLE.
2. In contrast, a macro-operation is a single machine instruction as
seen by an assembly-language programmer.
a. When you studied assembly language, you learned that a single
higher-level language statement might require several machine
language instructions. Consider the implementation of the
following Pascal statement on a one-accumulator machine using
one-address instructions:
Pascal: X := Y + Z
machine: LOAD Y
ADD Z
STORE X
b. Likewise, each machine language instruction (macro-operation) will
be implemented as a series of micro-operations. For example, take
the ADD instruction above, and assume the use of absolute addressing.
The following series of micro-operations may be used to actually
perform it.
machine: ADD Z
becomes: MAR <- PC (IF)
MBR <- M[MAR] "
PC <- PC + size of instruction "
IR <- op code portion of MBR (IOD)
MAR <- address portion of MBR (OAC)
MBR <- M[MAR] (OF)
AC <- AC + MBR (EXEC)
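To make the sequence concrete, the micro-operations for ADD can be traced in Python, one statement per micro-operation. This is only a sketch under assumptions not stated in the notes: a word-addressed memory, an instruction size of one word, and a word layout with the op code above a 12-bit address field. The helper names encode and run_add are hypothetical.

```python
ADD_OPCODE = 0b1001          # op code used for ADD in these notes
ADDR_BITS = 12               # assumed width of the address field


def encode(opcode, address):
    """Pack an op code and an address into one instruction word."""
    return (opcode << ADDR_BITS) | address


def run_add(memory, pc, ac):
    """Carry out the micro-operations for one ADD instruction, in order."""
    mar = pc                                   # MAR <- PC                (IF)
    mbr = memory[mar]                          # MBR <- M[MAR]            (IF)
    pc = pc + 1                                # PC <- PC + size          (IF)
    ir = mbr >> ADDR_BITS                      # IR <- op code of MBR     (IOD)
    assert ir == ADD_OPCODE, "this sketch only handles ADD"
    mar = mbr & ((1 << ADDR_BITS) - 1)         # MAR <- address of MBR    (OAC)
    mbr = memory[mar]                          # MBR <- M[MAR]            (OF)
    ac = ac + mbr                              # AC <- AC + MBR           (EXEC)
    return pc, ac


memory = [0] * 32
memory[0] = encode(ADD_OPCODE, 20)   # ADD Z, with Z at address 20
memory[20] = 7                       # contents of Z
pc, ac = run_add(memory, pc=0, ac=5)
```

Starting with AC = 5 and Z = 7, the run leaves AC = 12 and the PC pointing at the next instruction.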
D. RTL not only allows us to describe primitive operations, but also the
conditions under which those operations occur. This is accomplished
by preceding the micro-operation with a logical expression followed
by a colon. For example, in the above, suppose that the OP code is
stored in the instruction register (IR), and that the op code for ADD
is 1001. Suppose further that the addition step takes place when an
internal timing signal T7 is true. Then the last micro-operation could
be written:
IR = 1001 and T7: AC <- AC + MBR
1. Note that the portion before the colon becomes the description for a
combinatorial network that must be implemented to generate the control
signals necessary to effect the specified transfer - e.g. the parallel
load enable input to the AC plus possible inputs to several MUX's to
select AC and MBR as inputs to the adder (assuming other inputs are
possible) and the adder output as input to the AC (assuming other
inputs to the AC are possible).
2. At the RTL level, a system can be viewed as consisting of two parts:
a. A data part, consisting of registers, data paths (busses) and the
data transformation elements that comprise the ALU.
b. A control part that generates the necessary enable and selection
inputs to the devices in the data part at the correct time.
c. A micro-operation specification in RTL can be read as:
if the following conditions are true, then
the control unit must generate the control signals needed
to cause ___ to occur.
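The split between the control part and the data part can be mimicked in Python for the one statement IR = 1001 and T7: AC <- AC + MBR. This is a sketch, not a hardware description: the signal names ac_load, alu_a_select_ac and alu_b_select_mbr are hypothetical, chosen to match the enable and selection inputs mentioned above.

```python
ADD_OPCODE = 0b1001


def control_signals(ir_opcode, t7):
    """Combinational logic for the condition 'IR = 1001 and T7'."""
    fire = (ir_opcode == ADD_OPCODE) and t7
    return {
        "ac_load": fire,           # parallel load enable of the AC
        "alu_a_select_ac": fire,   # route the AC to one adder input
        "alu_b_select_mbr": fire,  # route the MBR to the other adder input
    }


def clock_tick(regs, ir_opcode, t7):
    """Data part: apply AC <- AC + MBR only when the control part says so."""
    signals = control_signals(ir_opcode, t7)
    if signals["ac_load"]:
        regs = dict(regs, AC=regs["AC"] + regs["MBR"])
    return regs
```

If either half of the condition is false, the registers pass through the clock tick unchanged.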
E. RTL allows us to specify that a number of micro-operations occur in
parallel, by separating them by commas.
1. For example, most CPU's are constructed in such a way that the
operation of incrementing the program counter uses different hardware
than that used to actually read a word from memory.
2. Thus, in the ADD instruction example we considered above, these two
steps could be done in parallel:
MBR <- M[MAR], PC <- PC + instruction size
3. In the ADD instruction, these are the only steps that can be done
in parallel, because each of the other steps depends on the result
of the previous step.
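One way to see what "in parallel" means is to model a clock cycle in Python so that every source is read before any destination is loaded. The helper step and the state layout below are assumptions made for this sketch, and the instruction size is taken to be 1.

```python
def step(state, *micro_ops):
    """Perform micro-operations in parallel: all sources are read from the
    state before the clock edge, then all destinations are loaded at once."""
    new_values = {dest: source(state) for dest, source in micro_ops}
    return {**state, **new_values}


state = {"PC": 100, "MAR": 100, "MBR": 0, "M": {100: 0x9014}}

# MBR <- M[MAR], PC <- PC + instruction size
state = step(
    state,
    ("MBR", lambda s: s["M"][s["MAR"]]),
    ("PC", lambda s: s["PC"] + 1),
)

# A <- B, B <- A swaps the registers, because both reads see the old values.
swapped = step({"A": 1, "B": 2}, ("A", lambda s: s["B"]), ("B", lambda s: s["A"]))
```

The swap shows why the comma matters: written as two sequential micro-operations, A <- B followed by B <- A would leave both registers holding the old value of B.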
F. Basic RTL nomenclature:
1. Registers are referred to by all capital letter names - e.g. AC, R3
etc.
2. Busses are referred to similarly.
3. A single bit of a register or bus is referred to by using a subscript -
e.g. R3 with subscript 2 (bit 2 of R3).
4. A group of bits of a register or bus are referred to by enclosing the
bit numbers or a mnemonic name in parentheses - e.g. IR(15-0),
MBR(AD).
5. An arrow is used to denote the loading of a value into a register or
its gating onto a bus - e.g. AC <- AC + 1 or ABUS <- AC. Cf the
assignment operation of Pascal: AC := AC + 1.
6. A colon separates the conditions under which a micro-operation is to
be done (boolean expression) from the operation itself. Cf the
if..then of Pascal:
IR=1001 and T7: AC <- AC + MBR
if (IR=1001) and T7 then
AC := AC + MBR
7. Commas are used to separate micro-operations done in parallel (at the
same time.)
IV. Survey of typical micro-operations:
-- ------ -- ------- ----------------
A. Parallel transfer: condition: dest <- source
1. Meaning: If condition is true, then all bits of destination register
are loaded with corresponding bit of source on the next clock pulse.
2. Implementation: destination register is a register with parallel
load. Its inputs are tied to corresponding outputs of source, which
may be another register, an array of arithmetic elements (eg a
16 bit adder) or a bus. Parallel load enable is activated by a
network realizing the specified boolean condition.
Example: for xy: A(3-0) <- B(3-0)
____ ____________
x ---| \ Load | |
| )---------| Register A |----------- Clock
y ---|____/ Enable |____________|
| | | | Inputs
| | | |
| | | | Outputs
__|_|_|_|___
| |
| Register B |
|____________|
B. Bus transfer: condition: BUS <- source
or condition: dest <- BUS
or condition: BUS <- source, dest <- BUS
(In the first case, the bus is presumably used as input to some
functional unit such as the adder; in the second case, the bus is
presumably the output from some functional unit such as the adder;
in the third case the bus is used to route data directly from one
register to another without any intervening computation.)
1. Meaning: If condition is true, then one of several possible sources
is selected and placed onto a common bus and/or at the next clock
pulse all bits appearing on the bus are copied into destination.
2. Implementation: each of the sources connects to the bus in one of
two ways:
a. One input to a MUX.
b. A tri-state gate.
The bus connects to the parallel input of the destination. A logical
network realizing condition is used to enable either the correct MUX
channel or the tri-state gates, as well as parallel load.
Example: Consider a CPU having four 4-bit registers, any one of which
can be transferred to the input of a fifth such register.
We consider how to do this two ways: with MUXes, and with
tristate outputs on the registers.
- with MUXes (note: selection lines of all four MUXes would be
tied together)
________
| 4 bits |
|________|
| | | |
_____________________| | | |_____________________
| _______| |_______ |
____|____ ____|____ ____|____ ____|____
| 4 x 1 | | 4 x 1 | | 4 x 1 | | 4 x 1 |
| MUX | | MUX | | MUX | | MUX |
|_______| |_______| |_______| |_______|
| | | | | | | | | | | | | | | |________
| | | | | | | | | | | |_________|_|_|________ |
| | | | | | | | | | |____ | | | | |
| | | | | | | |_________|_|_____|_______|_|_|______ | |
| | | | | | |___________|_|____ | | | | | | |
________| | | |_________|_|_____________|_|___|_|_______|_|_|____ | | |
| | |___________|_|_____________|_|__ | | | | | | | | |
| |______ | | | | | | | | | | | | | |
| ______________|_______| | | | | | | | | | | | | |
| | | ________| | | | | | | | | | | | |
| | ____________|_|_____________________| | | | | | | | | | | |
| | | __________|_|_______________________|_|_|_|_______| | | | | | |
| | | | | | ______________________| | | | | | | | | |
| | | | | | | ______________________|_|_|_________| | | | | |
| | | | | | | | | | | __________| | | | |
_|_|_|_|_ _|_|_|_|_ _|_|_|_|_ _|_|_|_|_
| 4 bits | | 4 bits | | 4 bits | | 4 bits |
|________| |________| |________| |________|
- with tri-states: (note: only one of the four registers would
have its tri-state enable input active.)
________
| 4 bits |
|________|
| | | |
+--------------------+-|-|-|-------------+-------------------+
| +------------------|-+-|-|-------------|-+-----------------|-+
| | +----------------|-|-+-|-------------|-|-+---------------|-|-+
| | | +--------------|-|-|-+-------------|-|-|-+-------------|-|-|-+
_|_|_|_|_ _|_|_|_|_ _|_|_|_|_ _|_|_|_|_
---| 4 bits | ---| 4 bits | ---| 4 bits | ---| 4 bits |
|________| |________| |________| |________|
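The difference between the two bus implementations can be suggested in Python. This is a behavioral sketch only, with hypothetical function names: a MUX always selects exactly one source, whereas a tri-state bus relies on the control logic to enable exactly one driver.

```python
def mux_bus(registers, select):
    """MUX-based bus: the select lines pick which register drives the bus."""
    return registers[select]


def tristate_bus(registers, enables):
    """Tri-state bus: exactly one output enable may be active at a time."""
    drivers = [value for value, enabled in zip(registers, enables) if enabled]
    if len(drivers) != 1:
        raise ValueError("bus needs exactly one active driver")
    return drivers[0]


sources = [0b0001, 0b0010, 0b0100, 0b1000]   # four 4-bit source registers
via_mux = mux_bus(sources, 2)
via_tristate = tristate_bus(sources, [False, False, True, False])
```

Both calls place the third register (0100) on the bus; the tri-state version raises an error if the enables are not one-hot, which corresponds to a floating or contended bus in hardware.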
C. Memory transfer:
   Read:   condition: MAR <- address
                      MBR <- M[MAR]
                      dest <- MBR
   Write:  condition: MAR <- address
                      MBR <- source
                      M[MAR] <- MBR
1. Meaning: if condition is true, a word of data is transferred to/from
a specified memory address.
2. Implementation: The memory system interfaces to the rest of the
system via two registers, MAR and MBR, which have parallel load
capabilities. Ordinary parallel transfer techniques are used to
move data between them and CPU registers; the memory itself is
controlled by lines such as chip select and write enable. (We will
discuss all this later.)
Note: sometimes these are abbreviated to
dest <- M[address] or M[address] <- source
But when we do so we are normally describing a SERIES of micro-operations
that take place over a series of cycles.
D. Arithmetic Operations
1. The most basic arithmetic operation is ADD, which takes two full-size
operands plus a one-bit carry in and produces a full-size result plus
a one-bit carry out.
a. RTL: Dest <- Source1 + Source2
b. This can be implemented by using an array of full adders with ripple
carry or carry-lookahead, as discussed previously.
2. By allowing the inputs to the adders to come from multiplexers, we can
realize several different kinds of addition operations, as follows:
Carry-out Sum
| ||
____|____||____
/ n Full adders \ <------ carry - in
/ _____________ \
/__/ \__\
|| ||
______||_____ ______||_____
Select -----| n 2x1 MUXes | | n 4x1 MUXes |---- Select
|_____________| |_____________|----
|| || || || || ||
A A' B B' 0 -1 (all 1's)
Possible functions:
Function     A MUX select   B MUX select   Carry-in
A + B        A  (0)         B  (00)        0
A + B + 1    A  (0)         B  (00)        1
A + B'       A  (0)         B' (01)        0
A - B        A  (0)         B' (01)        1
A            A  (0)         0  (10)        0
A + 1        A  (0)         0  (10)        1
A - 1        A  (0)         -1 (11)        0
A'           A' (1)         0  (10)        0
-A           A' (1)         0  (10)        1
B - A        A' (1)         B  (00)        1
(Others are possible but would probably not be useful.)
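The function table can be checked with a short Python model of the adder and its two MUXes. The function name adder_unit and the 4-bit default width are assumptions for this sketch; the select encodings follow the table.

```python
def adder_unit(a, b, a_sel, b_sel, carry_in, width=4):
    """n full adders fed by a 2x1 MUX (A side) and a 4x1 MUX (B side)."""
    mask = (1 << width) - 1
    a_input = (a, ~a)[a_sel] & mask             # 0: A      1: A'
    b_input = (b, ~b, 0, -1)[b_sel] & mask      # 00: B  01: B'  10: 0  11: all 1's
    total = a_input + b_input + carry_in
    return total & mask, total >> width         # (sum, carry out)


# A - B is A + B' + 1: select A, select B', carry-in 1.
difference, carry = adder_unit(5, 3, a_sel=0, b_sel=1, carry_in=1)
```

Here difference is 2. Negation works the same way: selecting A', 0 and a carry-in of 1 yields the 2's complement of A.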
E. Logic operations.
1. A typical instruction set requires us to perform certain kinds of
bit-wise logical operations on pairs of operands - e.g. some subset
of bitwise or, and, xor, bit-clear.
2. It is possible to realize a general logic network in which a
MUX is used to select one of the possible logic operations. For
example, the following network realizes one of the four functions
A^B, AvB, A xor B, A^B' as selected by a two-bit selection line.
____|____
| 4x1 MUX |-- Select
|_________|--
____ | | | |
A ---| \ | | | |
| )-+ | | |
B ---|____/ | | |
____ | | |
A ----\ \ | | |
) >---+ | |
B ----/___/ | |
____ | |
A --\-\ \ | |
) ) >-----+ |
B --/-/___/ |
____ |
A ---| \ |
| )-------+
B'---|____/
3. Logical operations can be used to manipulate individual bits or
groups of bits, as in converting ASCII to decimal, unpacking
packed data etc. The following operations are useful in this
regard:
a. Selective clear: A <- A ^ B' - a 1 in B causes the
corresponding bit in A to be
cleared.
b. Selective set: A <- A v B - a 1 in B causes the
corresponding bit in A to be
set.
c. Mask: A <- A ^ B - only the bits in A corresponding
to 1's in B remain set. Note
that this is an alternative to
selective clear.
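These three operations map directly onto Python's bitwise operators. The sketch below assumes 8-bit operands; the function names are chosen for this example.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def selective_clear(a, b):
    """A <- A ^ B' : a 1 in B clears the corresponding bit of A."""
    return a & ~b & MASK


def selective_set(a, b):
    """A <- A v B : a 1 in B sets the corresponding bit of A."""
    return (a | b) & MASK


def mask_bits(a, b):
    """A <- A ^ B : only bits of A under a 1 in B survive."""
    return a & b & MASK


# ASCII digit to its numeric value: keep only the low four bits.
digit = mask_bits(ord("7"), 0b00001111)
```

The character "7" is 0011 0111 in ASCII, so masking with 0000 1111 leaves the value 7.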
F. Shift operations:
1. Shift operations can be classified as to direction and type.
a. Direction: left or right.
b. Type: logical, rotate, arithmetic. The distinction is what gets
shifted into the sign position and into the bit position vacated.
i. Logical shifts shift a zero into the vacated bit. Thus
shl(1111) --> 1110; shr(1111) --> 0111
ii. Rotate shifts the bit shifted out into the vacated bit. Thus
cil(1000) --> 0001; cir(0001) --> 1000
iii. Arithmetic shifts implement the operations *2 (ashl) and
div 2 (ashr). The rules depend on the sign-convention in use:
a. Unsigned: same as logical shift. If a one is shifted out
on an ashl, then overflow has occurred.
b. Sign-magnitude: use rule for unsigned on all bits except
the sign, which is left unchanged.
c. 2's complement:
i. Left shift: the sign is left unchanged, and 0 is shifted
into the low order bit. If the bit shifted out is not
the same as the sign, then overflow has occurred.
example: ashl(0001) -> 0010 1 -> 2
ashl(1100) -> 1000 -4 -> -8
ashl(0100) -> 0000 ovf 4 -> 0 should be 8
ii. Right shift: the sign is propagated.
example: ashr(0011) -> 0001 3 -> 1
ashr(1001) -> 1100 -7 -> -4
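The six single-place shifts can be written out in Python for 4-bit values and checked against the examples above. This is a sketch: the function names follow the abbreviations used in these notes, and ashl follows the 2's complement rule as stated here (sign left unchanged, overflow when the bit shifted out differs from the sign).

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def shl(x):
    return (x << 1) & MASK                          # 0 enters at the right


def shr(x):
    return (x & MASK) >> 1                          # 0 enters at the left


def cil(x):
    return ((x << 1) | (x >> (WIDTH - 1))) & MASK   # rotate left


def cir(x):
    return (x >> 1) | ((x & 1) << (WIDTH - 1))      # rotate right


def ashl(x):
    """2's complement left shift; returns (result, overflow)."""
    sign = x & SIGN
    shifted_out = (x << 1) & SIGN                   # bit just below the sign
    result = sign | ((x << 1) & (MASK >> 1))        # sign kept, 0 enters at right
    return result, shifted_out != sign


def ashr(x):
    """2's complement right shift: the sign bit is propagated."""
    return (x & SIGN) | (x >> 1)
```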
2. Implementation - two choices:
a. Use of a shift register, with appropriate shift in/out connections.
b. Use of a gating network. Example: a network that realizes all 6
operations on 4 bit 2's comp numbers, controlled by 3 select lines
with two combinations unused. (shl A, shr A, cil A, cir A,
ashl A, ashr A in that order. Bits numbered with 0 = lsb)
Result Result Result Result
| 3 | 2 | 1 | 0
____|____ ____|____ ____|____ ____|____
| 8x1 MUX | | 8x1 MUX | | 8x1 MUX | | 8x1 MUX |
|_________| |_________| |_________| |_________|
| | | | | | | | | | | | | | | | | | | | | | | |
A2 0 A2 A0 A3 A3   A1 A3 A1 A3 A1 A3   A0 A2 A0 A2 A0 A2   0 A1 A3 A1 0 A1
(An denotes bit n of A)
c. Sometimes it is desirable to shift more than one place in a single
cycle. A network like that described above can, of course, be
custom-designed to shift any number of places (up to the word
length). To get a general shift capability, one can use a
LOGARITHMIC SHIFTER. For example, the following could shift its
input any number of places from 0 to 15:
||||||||||||||||
-----------------
| 0 or 8 place |
| shift |--- control
-----------------
||||||||||||||||
-----------------
| 0 or 4 place |
| shift |--- control
-----------------
||||||||||||||||
-----------------
| 0 or 2 place |
| shift |--- control
-----------------
||||||||||||||||
-----------------
| 0 or 1 place |
| shift |--- control
-----------------
||||||||||||||||
Each stage shifts its input either the specified number of places,
or no places at all, based on its control input.
To do, say, a 13 place shift, one would enable the 8, 4, and 1
place shifters and disable the 2 place shifter. (Note that the
binary representation of the number of places to shift becomes
the set of control signals to the various stages!)
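The stage-by-stage behavior is easy to model in Python. The sketch below assumes a logical left shift on 16-bit values (the notes do not fix a direction or type), and the function names are hypothetical.

```python
def shift_stage(value, places, enable, width=16):
    """One stage: shift left by a fixed number of places, or pass through."""
    mask = (1 << width) - 1
    return (value << places) & mask if enable else value


def logarithmic_shift_left(value, count, width=16):
    """Logical left shift by 0..15 places using four fixed stages."""
    for bit, places in ((3, 8), (2, 4), (1, 2), (0, 1)):
        control = (count >> bit) & 1   # one bit of the shift count drives each stage
        value = shift_stage(value, places, control, width)
    return value


shifted = logarithmic_shift_left(1, 13)   # 13 = 1101: stages 8, 4 and 1 enabled
```

Four stages suffice for any shift from 0 to 15 places, where a chain of single-place shifters would need fifteen.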
V. Execution of Machine-Language Instructions - One Address Machine
- --------- -- ---------------- ------------ - --- ------- -------
A. We have noted that each macro-instruction (instruction in the set that
is visible to the assembly-language programmer) is implemented by an
appropriate series of micro-operations like the above, one per clock
cycle.
B. As we shall see in our next lecture, it is the task of the control part
of the CPU to arrange for these micro-operations to occur in the correct
order.
C. Typically, the execution of each instruction is done in a series of phases - each
consisting of one or more micro-operations. (But some phases may not
be needed for some instructions.)
For illustration purposes, we will give below a series of micro-operations
for each phase of an instruction for a one-address machine. These are
meant to give an idea of what MIGHT occur - actual implementations will
vary. We assume an instruction format in which the op-code and address are
fetched from memory together as a single unit - e.g.
__________________
| op | address |
------------------
1. Instruction fetch:
MAR <- PC
MBR <- M[MAR]
PC <- PC + size of instruction
(Note: on machines with variable-format instructions, several memory
references may be needed.)
2. Instruction decoding
IR <- MBR
3. Operand address calculation (perhaps for several operands).
a. Example: direct addressing:
MAR <- address part of IR
b. Example: displacement addressing:
MAR <- address part of IR + designated register
c. Example: indirect addressing:
MAR <- address part of IR
MBR <- M[MAR]
MAR <- MBR
4. Operand fetch (if needed).
MBR <- M[MAR] -- once for each operand
5. Instruction execution
a. Example: ADD memory location to R0
R0 <- R0 + MBR
b. Example: Branch to some address if N condition code is set
N: PC <- branch address
6. Storing the result and/or setting condition codes (if needed)
Example:
MBR <- result of operation
M[MAR] <- MBR
Note that (1) involves a read from memory, and (3) and (4) may involve
one or more reads from memory, while (6) may involve a write to memory.
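The three operand address calculation examples in phase 3 above differ only in how the final MAR value is produced. A minimal Python sketch, assuming a dictionary stands in for memory and using the hypothetical name operand_address:

```python
def operand_address(mode, address_field, memory, register=0):
    """Return the effective address (the final MAR value) for one operand."""
    if mode == "direct":
        mar = address_field                 # MAR <- address part of IR
    elif mode == "displacement":
        mar = address_field + register      # MAR <- address part of IR + register
    elif mode == "indirect":
        mar = address_field                 # MAR <- address part of IR
        mbr = memory[mar]                   # MBR <- M[MAR]
        mar = mbr                           # MAR <- MBR
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return mar


memory = {10: 30, 30: 99}
effective = operand_address("indirect", 10, memory)
```

With location 10 holding 30, indirect addressing through 10 yields the effective address 30, at the cost of one extra memory read during OAC.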
D. In the simplest case, the various phases of an instruction are done in
sequence - one after another. Thus, at any given moment of time, the CPU
is carrying out one particular phase of one particular instruction -
fetching it, or decoding it, or calculating the address of its operand(s),
or ...
E. However, one very important way of improving system performance is by
the use of various forms of parallelism in the instruction
fetch/decode/execute cycle. We will discuss this under the topic of
"pipelining" later in the course.
F. In our discussion of the control unit which comes next, we will stick
with a simplified model of the CPU in which all phases of execution of
an instruction are done sequentially. We will come back to parallelism
later.
VI. Execution of Machine-Language Instructions - Load-Store Machine
-- --------- -- ---------------- ------------ - ---------- -------
A. Thus far, our discussions of CPU implementation have been based on a
one-address architecture machine. We now look at how this would change
for a load-store machine such as MIPS.
B. For a one-address machine, each instruction is performed by a subset
of the following steps - always done in the relative order shown:
IF - Instruction fetch
IOD - Instruction decode
OAC - Operand Address calculation
OF - Operand fetch
EXEC - Instruction execution
OS - Operand store
Each of these stages makes use of some subset of the basic functional units
of the CPU - e.g.
IF uses PC, MAR, MBR, Memory system, and ALU
IOD uses MBR, IR
OAC uses MAR, MBR, ALU, and (maybe) Memory system
OF uses MAR, MBR, and Memory system
EXEC uses some subset of PC, MBR, ALU, AC, and flags
OS uses MAR, MBR, Memory system
C. For a load-store architecture machine, as presented in our text, the
basic steps and usage of functional units is similar, but not identical.
1. Instead of a single memory system, there may be two:
a. Instruction memory
b. Data memory
(These are not ultimately two totally separate systems, but rather
separate subsystems of a single memory system. We will say more about
how this is accomplished later, when we talk about memory systems.)
c. Some ramifications:
i. No need for an MAR, as such:
- For accesses to instruction memory, the PC always furnishes
the address.
- For accesses to data memory, the address is always calculated
by the ALU - hence the ALU output furnishes the address.
ii. No need for an MBR, as such:
- For accesses to instruction memory, the data read always goes
into the IR.
- For accesses to data memory, the data either comes from or
goes to one of the general registers.
2. In the one address machine, all computational instructions use the AC
as one source, and as the destination, and use the MBR as the other
source. In the load-store architecture, each of the sources and the
destination can be any of the programmer-visible registers.
a. For the one address machine, it is reasonable to do the complete
execution of a computational instruction as a single step, in one
clock.
b. For the load-store machine, execution of a computational instruction
needs to be broken into several steps if the clock cycle time is
not to be made unreasonably long:
i. Reading the two source registers into holding registers at the
ALU inputs.
ii. Doing the actual computation and writing the result into a
holding register at the ALU output.
iii. Writing the result back into the appropriate register.
3. The separation of memory reference instructions (load/store) from
computational instructions means that an instruction will either do
an operand address calculation or an arithmetic/logic computation,
but not both. Further, the same field(s) in the instruction specify
the source register(s) and immediate value (if any) for whichever
type of computation is being done. Thus, these two steps can be
folded into one.
4. The fact that all instructions have the same length (1 word) and
basic format means that certain computations can be done
speculatively during the IOD step - i.e. before it is known for
certain that their result will be needed, since they don't take any
extra time and don't change any programmer-visible part of the machine
state:
a. Source registers can be read out of the register file into the
ALU input holding registers, even if the instruction turns out to
be one that does not use them.
b. The target address for a branch instruction can be computed and
stored in the ALU output holding register, even if the instruction
is not actually a branch or is a conditional branch that is not
taken.
D. This leads to the following series of steps for various types of
instructions:
TRANSPARENCY: Patterson-Hennessy figure 5.35
1. R-Type instructions require a total of 4 steps
2. Memory reference instructions require 4 (store) or 5 (load)
3. Branch instructions require 3 steps.
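The step counts can be tabulated in Python to estimate how many clock cycles a short instruction sequence takes. The step descriptions below are a paraphrase based on the discussion in C above, not the wording of the figure, and the sketch assumes one clock cycle per step.

```python
# Steps per instruction class (paraphrased; one clock cycle per step assumed).
STEPS = {
    "R-type": ["fetch", "decode/register read", "ALU operation", "register write"],
    "load":   ["fetch", "decode/register read", "address calculation",
               "memory read", "register write"],
    "store":  ["fetch", "decode/register read", "address calculation",
               "memory write"],
    "branch": ["fetch", "decode/register read", "compare and update PC"],
}


def cycles(program):
    """Total clock cycles for a sequence of instruction classes."""
    return sum(len(STEPS[kind]) for kind in program)


total = cycles(["load", "R-type", "store", "branch"])
```

For that four-instruction sequence the total is 5 + 4 + 4 + 3 = 16 cycles.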
E. Structure of CPU Data Paths for a MIPS-like machine
TRANSPARENCY: Patterson-Hennessy figure 5.32
Go over
(Note: In this example, there is one memory system which can accept
an address from one of two sources (PC or ALU) and can send data
read to one of two destinations (IR or Memory data register).)
(Note: figure 5.33 adds some additional logic to support jmp, which we
won't discuss now.)
Copyright ©2000 - Russell C. Bjork