CS222 Lecture: Course Introduction; Architecture and Organization;
Performance 1/11/99
Objectives:
1. Introduce course, requirements
2. Tie this course into CS221
3. Review levels of structure and major components of a computer
4. Introduce concepts of architecture and organization and relate to structure
of this course.
5. Introduce the notion of performance as a driving force in this field.
6. Review basic Von Neumann machine architecture
7. Introduce concepts of machine and assembly language programming
I. Preliminaries: Roll, Syllabus
- ------------- ---- --------
II. Course Introduction
-- ------ ------------
A. Last semester, we started off CS221 by observing that the complexity of
computer systems requires us to study them at various levels of
abstraction. Can anyone recall what those levels are?
ASK
1. The user level: the computer system performs certain tasks in
response to certain commands (e.g. EDIT). To the user, it appears
as if the system "understands" a command language such as the DCL
command language of the VAX, or the keypad commands of EDT, or the
mouse clicks of a graphical interface.
2. The higher-level language programming level: each application is
programmed using the statements of a higher-level language such as
Pascal or C. A single user-level command is thus implemented by
100's or 1000's of statements in a programming language. To
the programmer, it appears as if the system "understands" the
particular higher-level language he or she is programming in.
3. The machine language programming level: as delivered by the
manufacturer, a given computer system has certain primitive
components and capabilities:
a. A memory system, capable of storing and retrieving information
in fixed-size units known as "bytes" or "words".
b. An input-output system, capable of transferring information
between memory and some number of devices such as keyboards,
screens, disks etc.
c. A CPU, capable of performing primitive operations such as
addition, subtraction, comparison, etc., and also capable of
controlling the other two systems.
i. The CPU is designed to respond to a set of basic machine
language instructions, which is specific to a given type of
CPU. (E.g. the machine language for the VAX is vastly
different from that of the Intel CPU's used in PC's or the
Motorola 68K and PowerPC CPU's used in Macintoshes.)
ii. The compiler for a higher level language translates that
language into the native machine language of the underlying
machine.
- The same program must be translated into different machine
languages to run on different machines; thus, each type of
machine must have its own set of compilers.
- Regardless of the HLL used, the machine code generated by the
compiler for a given machine will be in the same native
machine language of that machine.
- On the VAX, the .OBJ and .EXE files produced by the compiler
and linker contain machine language binary code.
At this level, it appears that the system "understands" its
machine language.
4. The hardware design level: Ultimately, computer systems are built as
interconnections of hardware devices known as gates, flip-flops, etc.,
combined to form registers and busses. These, in turn, are realized
from primitive electronic building blocks known as transistors,
resistors, capacitors etc. The resultant system is capable of
directly executing the instructions comprising the machine language
of the system.
5. The solid-state physics level: current computers are fabricated from
materials such as silicon that have been chemically "doped" to alter
their electronic properties. Transistors, resistors, and capacitors
are realized by utilizing the properties of these semiconductor
materials. (Of course, future computers may use some other technology
such as optics.)
Summary:
User Level               User commands, Application software
-----------------------------------------------------------
HLL Programming level    Statements in Pascal, C, etc.
-----------------------------------------------------------
Machine language level   Machine language instructions
-----------------------------------------------------------
Hardware design level    Gates, flip-flops etc.
-----------------------------------------------------------
Solid-state physics      Physical properties of semiconductors
In CS221 we spent most of our time at the 4th level - hardware design.
In this course, we will consider this level further, but will also study
the third level in detail. (In fact, we will study the 3rd level
first, and then go back to the 4th level to see how the capabilities we
have studied are implemented there.)
B. Note the course schedule in the syllabus: the first half will focus on
machine and assembly language programming (the third level in our
hierarchy), and the second half on how this level of abstraction is
implemented at the hardware level (the fourth level).
C. One way to view how these two emphases of the course relate is in
terms of two words that are often used interchangeably, but which
really have distinct technical meanings: COMPUTER ARCHITECTURE and
COMPUTER ORGANIZATION. (Note title of course).
1. Computer architecture is concerned with the FUNCTIONAL CHARACTERISTICS
of a computer system - as seen by the assembly language programmer.
One of the major topics of the first half of the course will be
the architecture of the VAX, and we will also devote some time to
the architecture of the MIPS CPU's used in our workstations and
to several other architectures.
2. Computer organization is concerned with how an architecture can be
REALIZED: the logical arrangement of various component parts to
produce an overall system to accomplish certain design goals.
a. The technology used to build the system components.
b. The component parts themselves
c. Their interconnection
d. Strategies for improving performance.
3. Note that a given architecture may be realized by many different
organizations. The VAX is a good example.
a. At one time, our main academic system was a VAX-11/780 - which
was the first VAX model developed. It occupied four good-sized
cabinets - each big enough to hold a person. In particular, the
VAX instruction set was realized in this machine by a CPU
implemented using 20 circuit boards, each about 8" x 15", using
third-generation integrated-circuit technology.
b. Each of our current VAX systems sits in a small box not much bigger
than a PC. In it, the VAX instruction set is realized by a single
1" square chip, using fourth-generation CMOS VLSI technology.
Yet this CPU is many times faster than the 11/780!
c. Further, the two systems have different kinds of internal busses.
As a result, the two systems use very different kinds of memory
expansion boards. Though they can use the same IO devices, a
different kind of controller board is needed for each machine.
d. Nonetheless, the two machines have the same architecture. An
assembly language programmer could not tell the difference
between them.
4. Computer architectures tend to be rather stable.
a. E.g. the VAX architecture has been in use essentially unchanged
since 1981, and IBM's basic mainframe architecture has lasted even
longer. The 80x86 architecture used in Wintel PC's has its roots
in an architecture developed in the late 1970's, with a major
revision in the mid 1980's and minor revisions since then.
b. A major factor in the stability of architecture is the need to be
able to continue to use existing software. Potential changes
to an architecture have to be weighed carefully in terms of their
impact on existing software, and adoption of an altogether new
architecture comes at a huge software development cost - which
is why we are still using architectures developed in the 1970's.
5. On the other hand, computer organization tends to evolve quickly with
changes in technology - each new model of a given system will
typically have different organizational features from its predecessors
(though some aspects will be common, too.) The driving factor here
is performance; and it is common for one or more new implementations
of a popular architecture to be developed each year.
D. A fair question to ask at this point is "why should I need to learn
about computer architecture and organization, given that I'm not planning
to be a computer hardware designer, and that higher level language
compilers insulate the software I write from the details of the
hardware on which it is running?"
1. An understanding of computer architecture is important for a number
of reasons:
a. Although modern compilers hide the underlying hardware architecture
from the higher-level-language programmer, it is still useful to
have some sense of what is going on "under the hood"
i. Cf the benefit of learning Greek for NT studies.
ii. There will be times when one has to look at what is happening at
the machine language level to find an obscure error in a program.
b. Further, familiarity with the underlying architecture is necessary
for developing and maintaining some kinds of software:
i. compilers
ii. operating systems and operating system components (such as
device drivers)
iii. embedded systems.
c. In order to understand various performance-improvement techniques,
one must have some understanding of the functionality whose
performance they are improving.
2. Likewise, an understanding of computer organization is important for
a number of reasons:
a. Intelligent purchase decisions - seeing beyond the "hype" to
understand what the real impact of various features on performance
is.
b. Making effective use of high performance systems - sometimes the
way data and code is structured can prevent efficient use of
mechanisms designed to improve performance.
c. Increasingly, compilers that produce code for high performance
systems have to incorporate knowledge as to how the code is
actually going to be executed by the underlying hardware -
especially when the CPU uses techniques like pipelining and
out-of-order execution to maximize performance.
III. Performance
--- -----------
A. We have noted that computer organization is largely driven by performance
issues. A CPU manufacturer cannot be content to keep selling the same
basic design, but must continually be developing better designs in order
to remain competitive.
B. This raises an important issue: how do we measure the performance of a
computer system, and how do we compare the performance of different
systems?
1. Based on general reading you do in trade publications, what performance
metrics do manufacturers tend to advertise?
ASK
2. The authors of our textbook point out that the only really legitimate
way to measure performance is TIME.
a. Two different time-related metrics are important
i. Response time - how long does it take a given system to complete
a given task - start to finish?
ii. Throughput - how many tasks of a given kind can a given system
complete in a unit of time?
iii. The former metric is of most importance in a single-user system
(e.g. a personal computer or workstation). The latter metric
may be more important for multi-user systems (e.g. time-shared
systems, servers).
b. The time needed to complete a given task consists of several
components:
i. CPU time (computation)
ii. I/O time (e.g. time spent accessing a disk or transmitting
information over a network)
iii. (On a multi-user system) Time spent waiting for a resource that
is in use by another user
c. Response time can be improved by speeding up the CPU, speeding up
I/O, or both. These measures will also improve throughput; in
addition, throughput can be improved by more effective overlapping
of the use of various resources (e.g. doing computation on the
CPU for one user while simultaneously performing disk accesses
for other users on the various disks.)
C. Most of the performance improving techniques we will consider focus on
speeding up computation - i.e. reducing the amount of CPU time needed
to perform a given task. This reflects the fact that this component of
overall time is the more easily improved - I/O operations tend to be
mechanical in nature (e.g. moving disk heads) and are therefore less
easily speeded up.
D. The CPU time needed to perform a given task is given by the following
equation:
     number of instructions      average number of          time for one
     that must be executed   X   clock cycles needed    X   clock cycle
     to perform the task         per instruction (CPI)
Since the latter term is the reciprocal of the clock rate, this can
also be written:
     number of instructions      average number of
     that must be executed   X   clock cycles needed
     to perform the task         per instruction (CPI)
     ------------------------------------------------
                        clock rate
Example: A certain task requires the execution of 1 million instructions,
each requiring an average of three clock cycles. On a 300 MHz
clock system, this task will take:
1 million instructions x 3 clocks/instruction
--------------------------------------------- = .01 second = 10 ms
300 million clocks/second
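The same calculation can be written out directly in C. This is just a
sketch of the equation above using the example's numbers; the variable
names are, of course, invented.
    /* CPU time = instruction count x CPI / clock rate */
    #include <stdio.h>

    int main(void)
    {
        double instructions = 1e6;     /* 1 million instructions     */
        double cpi          = 3.0;     /* average clocks/instruction */
        double clock_rate   = 300e6;   /* 300 MHz                    */

        double cpu_time = instructions * cpi / clock_rate;
        printf("CPU time = %g seconds\n", cpu_time);   /* prints 0.01 */
        return 0;
    }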
1. This equation suggests three basic ways that performance on a given
task might be improved:
a. Reduce the total number of instructions that must be executed
i. Use a better algorithm (a software issue, not a hardware one)
ii. Use a CPU with a more powerful instruction set (e.g. a VAX
instruction might perform a task that would take several
instructions on a MIPS CPU).
b. Reduce the average number of clocks needed to execute an instruction
i. Better implementation of the instruction in hardware
ii. Use of various forms of parallelism to allow the CPU to be working
on different portions of several different instructions at the
same time. This doesn't reduce the total number of clocks needed
to execute one instruction, but it does reduce the total number
of clocks needed to execute a series of instructions and hence
the effective average number of clocks needed per instruction.
c. Increase the clock rate
i. Use of improved technology - e.g. smaller basic feature sizes on
a chip result in lower capacitances and inductances, allowing
faster clock rates.
ii. The time needed for a clock cycle is determined by the amount of
time needed for a signal to propagate down the longest internal
data path in the CPU. Using internal data paths with fewer gates
allows a shorter clock cycle and a higher clock rate. (E.g.
using carry lookahead instead of ripple carry in an adder uses
more gates overall, but results in shorter individual data
paths).
2. Unfortunately, these three components interact with each other, so that
improving one dimension of performance may come at the cost of reducing
performance in another direction.
Example:
a. Until the 1980's, a basic trend in CPU design was toward
increasingly powerful instruction sets - i.e. increasing the amount
of computation that a single instruction could perform. (In many
ways, the VAX architecture represents the high water mark of this
trend.)
b. In the 1980's, an alternate approach emerged that focussed on using
simpler instructions that lend themselves to faster clock rates and
a much higher level of intra-instruction parallelism. (The MIPS
architecture is representative of this trend.)
c. Proponents of the latter trend coined the name Reduced Instruction
Set Computer (RISC) to describe this approach. The earlier approach
then was given the name Complex Instruction Set Computer (CISC).
3. Further, measures to improve CPU performance also impact other system
components.
a. Since each instruction executed by the CPU involves at least one
reference to memory (to fetch the instruction), improving CPU speed
necessitates improving memory system speed as well. However, basic
DRAM memory chip technology has not improved significantly (access
times remain around 60 ns), so memory systems have had to
incorporate sophisticated cache memory techniques to keep up with
CPU speeds.
b. Increased CPU speed typically results in increased power
consumption, which impacts power supplies, CPU cooling, and the
ability to run a system off rechargeable batteries.
E. Further, the equation we have been discussing does not lend itself
to direct calculation of the time needed for a given task, so other
techniques must be used to actually measure performance.
1. The clock rate is the one number that is easily obtained. The
total number of instructions needed to perform a given task
could be calculated from the program code, but the computation would
be laborious. And determination of CPI would be very difficult, since
it may depend on:
a. The exact nature of the instructions executed (on CISC's, some
instructions require more clocks than others; on RISC's, average
CPI is affected by program flow).
b. The interaction between the CPU and the memory system.
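To make the CPI point concrete, the following C sketch computes an
average CPI as a weighted average over a purely hypothetical instruction
mix; the class fractions and cycle counts are invented for illustration,
not measurements of any real machine.
    #include <stdio.h>

    int main(void)
    {
        /* hypothetical mix: fraction of instructions and clocks per class */
        double fraction[] = { 0.50, 0.30, 0.20 };   /* ALU, load/store, branch */
        double cycles[]   = { 1.0,  2.0,  3.0  };

        double cpi = 0.0;
        for (int i = 0; i < 3; i++)
            cpi += fraction[i] * cycles[i];         /* weighted average */
        printf("average CPI = %.2f\n", cpi);        /* 0.5 + 0.6 + 0.6 = 1.70 */
        return 0;
    }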
2. For similar reasons, one cannot simply say that if a given program
takes time t on a system with a given clock rate, it will take, say,
time t/2 on a system whose clock is twice as fast.
a. The improvement could be much less than the clock rate ratio, if
the rest of the system (e.g. memory, I/O) is not speeded up
proportionally.
b. Sometimes, the performance improvement turns out to be greater than
that implied by the clock rate ratio, because other components
of the equation (e.g. CPI) have been improved as well.
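A small hypothetical calculation in C illustrates point (a): if a task
spends 8 ms in the CPU and 2 ms doing I/O, doubling the clock rate halves
only the CPU component, so the overall speedup falls short of 2. (The
numbers are invented for illustration.)
    #include <stdio.h>

    int main(void)
    {
        double cpu_ms = 8.0, io_ms = 2.0;         /* hypothetical task breakdown  */
        double old_time = cpu_ms + io_ms;         /* 10 ms on the original system */
        double new_time = cpu_ms / 2.0 + io_ms;   /*  6 ms with the clock doubled */
        printf("speedup = %.2f\n", old_time / new_time);   /* 1.67, not 2.00 */
        return 0;
    }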
3. In practice, speeds of various systems are typically compared by
the use of BENCHMARKS (individual programs or sets of programs.)
The book discusses a number of reasons why this approach is fraught
with pitfalls, ranging from statistical issues to the possibility
of manufacturers "rigging" their product to do well on known
benchmarks.
IV. Review Of Basic Von Neumann Machine Architecture
-- ------ -- ----- --- ------- ------- ------------
A. Last semester we saw that modern computer systems are based on a
basic architecture frequently known as "the Von Neumann machine".
1. In this architecture there are five fundamental kinds of building
blocks. Can anyone recall what they are?
ASK
a. Memory
b. Arithmetic-logic Unit (ALU)
c. Control
d. Input
e. Output
2. In CS221 we looked in detail at memory elements (registers, memory
chips, and various magnetic media) and the arithmetic-logic unit (e.g.
shifts, hardware realizations of arithmetic operations.) In this
course, we will look at the other building blocks - especially the
control element, which is responsible for interpreting machine language
programs. We will also look at how these building blocks are
interconnected to one another. (Overall system structure, plus the IO
and memory subsystems.)
Note: In most modern computers the ALU and Control elements are
combined into a single building block known as the CPU. However,
for the purpose of understanding how the CPU works, it is
helpful to consider each part separately.
3. The basic Von Neumann machine architecture can be pictured as follows:
        - - - - - - -  CONTROL - - - - - - - - -         Solid lines:  flow of data
        |    |           |  ^   < - - -|       |         Dashed lines: flow of control
        |    |           |  |          |       |
        v    |           v  |          |       v
      INPUT  |         MEMORY          |    OUTPUT
        |    |          ^  |           |       ^
        |    |          |  |           |       |
        |    |- - >     |  v      - - -|       |
        |------------> A.L.U. -----------------|
4. The execution cycle of this machine could be described as follows:
while not halted do
begin
fetch an instruction from memory (Symbolically: IR <- M[PC])
update program counter (Symbolically: PC <- PC + 1)
decode instruction
execute instruction
end
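Example (sketch in C): the following toy program mimics this cycle for an
invented one-word instruction format (high byte = op code, low byte =
operand address). It is only an illustration of the loop above, not the
instruction format of any real machine.
    #include <stdio.h>

    #define MEMSIZE 256
    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

    int main(void)
    {
        unsigned memory[MEMSIZE] = {0};
        unsigned pc = 0, ir, acc = 0;
        int halted = 0;

        /* a tiny program: load M[100], add M[101], store into M[102], halt */
        memory[0] = (OP_LOAD  << 8) | 100;
        memory[1] = (OP_ADD   << 8) | 101;
        memory[2] = (OP_STORE << 8) | 102;
        memory[3] = (OP_HALT  << 8);
        memory[100] = 40; memory[101] = 2;

        while (!halted) {
            ir = memory[pc];              /* fetch:  IR <- M[PC]  */
            pc = pc + 1;                  /* update: PC <- PC + 1 */
            unsigned op   = ir >> 8;      /* decode               */
            unsigned addr = ir & 0xFF;
            switch (op) {                 /* execute              */
                case OP_LOAD:  acc = memory[addr];        break;
                case OP_ADD:   acc = acc + memory[addr];  break;
                case OP_STORE: memory[addr] = acc;        break;
                case OP_HALT:  halted = 1;                break;
            }
        }
        printf("M[102] = %u\n", memory[102]);   /* prints 42 */
        return 0;
    }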
5. We will now briefly review the various individual building blocks.
B. We saw in CS221 that a conventional memory system can be viewed as an
array of addressable units or cells, each of which has an unsigned
integer address in the range 0..# of cells - 1. This system interfaces
to the rest of the computer through two special registers called the
memory address register (MAR) and the memory buffer register (MBR).
1. The memory system is capable of performing two primitive operations:
a. Reading the contents of a cell (or sometimes two or more adjacent
cells), delivering the data stored there to the rest of the
system (while leaving the copy in memory unchanged.)
b. Writing a new value into a cell (or sometimes two or more
adjacent ones.)
2. To access the memory, the control unit arranges for an address to be
placed in the MAR. If the operation is to be a write into the memory,
it also arranges for data to be placed in the MBR. Then it issues
a command to the memory to do the required operation. (In the case
of a read, the data read will be placed in the MBR upon completion.)
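Example (sketch in C): the read and write operations just described,
modeled with MAR and MBR as variables. The interface is invented for
illustration; it is not any particular machine's memory system.
    #include <stdio.h>

    #define MEMSIZE 1024
    static unsigned char memory[MEMSIZE];
    static unsigned      MAR;      /* memory address register */
    static unsigned char MBR;      /* memory buffer register  */

    /* read:  the cell addressed by MAR is copied into MBR (memory unchanged) */
    static void mem_read(void)  { MBR = memory[MAR]; }

    /* write: the contents of MBR are stored into the cell addressed by MAR */
    static void mem_write(void) { memory[MAR] = MBR; }

    int main(void)
    {
        MAR = 42; MBR = 7; mem_write();    /* store 7 in cell 42 */
        MAR = 42; mem_read();              /* fetch it back      */
        printf("cell 42 contains %d\n", MBR);
        return 0;
    }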
3. A fundamental concept is the notion of a "memory cell".
a. The basic unit of information storage in a computer is, of course,
the bit. But since a single bit is too small a unit of information
for most purposes, memories are organized on the basis of larger
units each consisting of some fixed number of bits.
b. In the early days of computing, computers were usually
specialized as either "business" or "scientific" machines.
i. On a typical "business" machine the unit of storage in memory
was the character, represented by a code requiring 6-8 bits.
ii. On a typical "scientific" machine the unit of storage was the
word, involving typically around 24-60 bits. Later, when
minicomputers were introduced, a word size of 16 bits became common,
and one minicomputer even had a 12-bit word.
c. The IBM 360 (early 60's) introduced a new concept: multiple
memory organizations in a single machine:
i. The primary organization of memory was by bytes of 8 bits.
ii. Two adjacent bytes formed a halfword (16 bits), four adjacent
bytes formed a word (32 bits), and 8 adjacent bytes formed a
doubleword.
iii. Memory was byte-addressable. The address of larger units was
specified by giving the address of its lowest byte. Thus
halfwords always had even addresses; words had addresses that
were multiples of 4 etc.
d. This organization has been adopted by many modern machines,
including the VAX and MIPS. However, the terminology varies -
e.g. on the VAX a "word" is 16 bits and a "longword" is 32 bits;
on MIPS a "word" is 32 bits and 16 bits is called a "halfword",
as on the 360.
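Example (sketch in C): the following loop prints, for the first 16 byte
addresses, whether a halfword or a (32-bit) word could be aligned there,
illustrating the "even address" and "multiple of 4" rules above.
    #include <stdio.h>

    int main(void)
    {
        for (unsigned addr = 0; addr < 16; addr++) {
            printf("byte %2u  halfword-aligned: %s  word-aligned: %s\n",
                   addr,
                   addr % 2 == 0 ? "yes" : "no ",   /* halfword: even address   */
                   addr % 4 == 0 ? "yes" : "no ");  /* word: multiple of 4      */
        }
        return 0;
    }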
4. In addition to specifying the basic unit of memory, we also talk about
the address space of a machine as representing the range of possible
memory addresses. This is basically a function of how many bits are
used in the formation of a memory address (and thus the size of the
MAR). Examples:
a. IBM 360/370 - 24 bit address -> 16 Megabytes.
b. PDP-11 - 16 bit address -> 64 K bytes
c. VAX - 32 bit address -> 4 gigabytes potential - but actual
implementations of the VAX architecture use a somewhat smaller
address size to keep costs down.
d. DEC Alpha (successor to VAX) and later members of the MIPS family
- 64 bit address -> about 1.8 x 10^19 bytes potential (only partially
supported in terms of actual memory, of course).
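Example (sketch in C): the size of the address space is simply 2 raised to
the number of address bits; the loop below prints this for the address
sizes mentioned above.
    #include <stdio.h>

    int main(void)
    {
        int bits[] = { 16, 24, 32, 64 };
        for (int i = 0; i < 4; i++) {
            double size = 1.0;
            for (int b = 0; b < bits[i]; b++)
                size *= 2.0;                  /* 2^bits addressable bytes */
            printf("%2d-bit addresses -> %.3g bytes\n", bits[i], size);
        }
        return 0;
    }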
C. The ALU is a portion of the system where there is considerable
architectural diversity.
1. All ALU's consist of three basic types of building block:
a. Registers - special high-speed memory cells for storing items that
are being worked on.
b. Functional elements - e.g. adders, shifters, comparators etc.
c. Data paths connecting (a) and (b), as well as external connections
to the memory and I/O subsystems.
d. Not all of these are directly visible to the assembly language
programmer. Typically, the registers and the functional elements
are the most noticeable.
2. The following simplified block diagram shows how a typical ALU might
be organized:
                  To memory/IO bus(ses)
                        ^    |
                        |    v
           _____________|____|______________
          |          Register Set           |
          |      (including MAR, MBR)      |<------------
          |_________________________________|            |
              |                 |                        |
              | Operand 1 Bus   | Operand 2 Bus          |
              v                 v                        |
           _________________________________             |
          |   Adder    Shifter    Bitwise   |            |
          |                       Logic     | Result bus |
          |                       Functions |------------|
          |_________________________________|
a. An instruction to add the contents of a memory location to a
register could be executed as follows:
i. The address of the appropriate cell in memory would be placed
in the MAR.
ii. The memory would be instructed to read that cell and place its
contents in the MBR.
iii. The contents of the appropriate register would be placed on
the Operand 1 Bus, and the contents of the MBR would be placed
on the Operand 2 Bus (or vice versa). The adder would sum the
two numbers and place the result on the Result Bus, and
the value on the Result Bus would be stored back into the
appropriate register.
b. Note that some instructions would not use all the busses. For
example, an instruction to copy a memory location into a
register would not use the Operand 1 bus; the adder would be
told to route the Operand 2 bus value straight through (in
effect by adding zero to it.)
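Example (sketch in C): the add-memory-to-register sequence above, modeled
as assignments to variables standing for the registers and busses in the
diagram. The names and the register/memory contents are invented for
illustration.
    #include <stdio.h>

    #define MEMSIZE 256
    static int memory[MEMSIZE];
    static int reg[8];                    /* the register set */
    static int MAR, MBR;
    static int operand1_bus, operand2_bus, result_bus;

    static void add_mem_to_reg(int r, int address)
    {
        MAR = address;                    /* i.   address placed in MAR           */
        MBR = memory[MAR];                /* ii.  memory read; data lands in MBR  */
        operand1_bus = reg[r];            /* iii. register -> Operand 1 Bus       */
        operand2_bus = MBR;               /*      MBR      -> Operand 2 Bus       */
        result_bus   = operand1_bus + operand2_bus;   /* adder drives Result Bus  */
        reg[r]       = result_bus;        /*      result stored back in register  */
    }

    int main(void)
    {
        memory[10] = 5; reg[3] = 37;
        add_mem_to_reg(3, 10);
        printf("R3 = %d\n", reg[3]);      /* prints 42 */
        return 0;
    }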
D. The I/O subsystem is the area where there is the greatest diversity
between systems. Almost anything can be an IO device for a computer -
from a terminal to an automobile or a power plant!
1. Broadly speaking, IO devices fall into two categories:
a. Sequential devices can process data only in a certain fixed order.
This category includes devices like terminals and printers.
b. Direct access devices allow the user to read/write a specific
location on the device. Disks are the major example of such a
device.
c. Some devices - such as magnetic tape - are hybrids: they are
basically sequential, but have some direct access capability.
2. Each IO device connects to the system through a CONTROLLER or
INTERFACE, that in turn connects to a system bus. (Often, one
controller may service several devices of the same type to keep
costs down.) For example, the following is a simplified version of
the basic configuration of our old Micro-VAX (CHARITY):
    Terminal---
    Terminal---
    Terminal---   Terminal
    Terminal---   controller ------|
    Terminal---                    |           Disk       ----- Disk
    Terminal---                    |---------- Controller ----- Disk
    Terminal---                    |                      ----- Disk
                                   |
    Ethernet---   Network          |           Tape
                  controller ------|---------- Controller ----- Tape
                                   |
                              System IO Bus
3. Programming routines that access IO devices is quite complicated,
because the code is very device specific, and because it is often
necessary to deal with various kinds of error conditions that can
arise. For this reason, most computer systems are used with an
OPERATING SYSTEM (such as VMS on the VAX or MS-DOS on PC's) that
contains routines (known as device drivers) for each kind of device
on the system. Consequently, we will say only a little about
accessing IO devices in this course.
E. The control unit is the part of the system that is responsible for
carrying out the basic Von Neumann machine "fetch-execute" cycle.
1. To facilitate this, the control unit contains two special registers:
a. An instruction register (IR) to hold the instruction that is
currently being worked on.
b. A program counter (PC) to hold the address of the next instruction
to execute. This must be updated each time through the fetch
execute cycle.
i. Typically, this is done by adding the length of each
instruction to the PC when it is fetched - i.e. instructions
occupy successive locations in memory. (This is analogous
to the way statements in a Pascal program are executed
successively, one after another.)
ii. Some instructions serve to alter the PC to change this flow
of control. They are analogous in function to the goto and
procedure call instructions of Pascal (which are, of course,
derived from them.)
2. What does an instruction look like?
a. Normally, it consists of an OPERATION CODE (op code) that tells what
is to be done, plus some number of OPERAND SPECIFIERS that
specify the operands to be used.
b. The set of op-codes that can be used comprises the INSTRUCTION SET
of a particular machine. A typical instruction set might include
operations like the following:
i. Data movement operations
ii. Arithmetic operations: add, sub, mul, div - often with different
versions for different data types (e.g. integer, floating point,
possibly with different operand sizes.)
iii. Bitwise logical operations - bitwise and, or, xor.
iv. Arithmetic comparison operations
v. Conditional and unconditional branch instructions (gotos) and
procedure call instructions
vi. Etc.
c. The operand specifiers often allow a variety of ADDRESSING MODES -
e.g. an operand specifier may specify that a certain constant value
is to be used, or that the contents of a certain register are to
be used, or that the operand is to be found in a certain memory cell.
Example: Consider the following Pascal program fragment
var
i: integer;
p: ^integer;
...
... i + p^ + 3
At the machine language level, three different addressing modes
could be used for the three operands (if available on the machine
in use.)
i. For i - direct addressing. The instruction would contain the
address of i, which contains the value to use.
ii. For p^ - indirect addressing. The instruction would contain the
address of p, which in turn contains the ADDRESS of the value to
use. To access the data, the CPU would make two trips to memory
- one to get the address of the data item (contents of p) and one
to get the data value (contents of p^).
iii. For 3 - immediate addressing. The instruction would contain the
actual value 3.
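These three kinds of operand access have rough counterparts in C (values
invented for illustration): a plain variable access corresponds to direct
addressing, a pointer dereference to indirect addressing, and a literal
constant to immediate addressing.
    #include <stdio.h>

    int main(void)
    {
        int  i = 10;
        int *p = &i;            /* p holds the ADDRESS of a value */
        int  sum = i + *p + 3;  /* direct + indirect + immediate  */
        printf("%d\n", sum);    /* prints 23 */
        return 0;
    }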
3. The instruction set of a given machine constitutes a language - the
machine language of that machine - and the control unit is an
interpreter for that language. Each different CPU architecture is
characterized by a distinctive machine language.
Example: Consider the task of adding one to the contents of an
integer variable X that happens to be stored in memory
cell 42. The following is the machine language code for
this on various machines (in hexadecimal)
VAX 0000002A 9F D6
80x86 FF 35 0000002A
MIPS (minimum of three instructions - recall that MIPS is a RISC!)
9402002A
20420001
AC02002A
(Actually, some NOP instructions would likely be needed to account
for delays needed by pipelining)
4. Of course, if one needs to write programs at this level, one
seldom programs directly in machine language. Instead, one typically
uses a symbolic language known as ASSEMBLY LANGUAGE.
Example: Assembly language equivalents of the above:
VAX INCL X
80x86 INC X
MIPS (three instructions - recall that MIPS is a RISC!)
lw $2, X
addi $2, $2, 1
sw $2, X
Note:
a. Mnemonics are used in place of numeric op-codes
b. The ability to use symbolic names for storage locations.
c. Generally: one line of assembly language per machine language
instruction.
d. Translation into machine language is a straightforward mechanical
process done by a program called an ASSEMBLER.
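As a purely illustrative sketch in C, the mechanical core of that process
is little more than table lookup of mnemonics. The mnemonics and op-code
values below are invented; they are not the VAX or MIPS encodings.
    #include <stdio.h>
    #include <string.h>

    struct opentry { const char *mnemonic; unsigned opcode; };

    static const struct opentry table[] = {
        { "LOAD", 0x01 }, { "ADD", 0x02 }, { "STORE", 0x03 }, { "HALT", 0x00 }
    };

    /* return the op code for a mnemonic, or -1 if it is not in the table */
    static int lookup(const char *mnemonic)
    {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(table[i].mnemonic, mnemonic) == 0)
                return (int) table[i].opcode;
        return -1;
    }

    int main(void)
    {
        printf("ADD assembles to op code %02X\n", (unsigned) lookup("ADD"));
        return 0;
    }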
5. Even though we will focus on a particular assembly language (that of
the VAX), it is a goal of this course that you should be able to
transfer what you have learned to another machine. All assembly
languages are similar in principle, though different in form.
F. Conclusion
1. This ends our discussion of the general structure of Von Neumann
computers.
2. Beginning with the next lecture, we will begin considering in detail
the architecture of the VAX, with some comparative looks at MIPS.
Copyright ©1999 - Russell C. Bjork