Project 1 2021
The goal of this project is to emulate the execution of programs
from a small, C-like programming language,
on a stack-based machine with a write-back cache,
and to report some metrics of the program's execution.
Successful completion of the project will enhance your understanding of
some features of the C11 programming language,
and develop your understanding of how simple programs can be executed
on a general-purpose computer architecture.
The cool programming language
The cool programming language (not a real programming language that you'll find on the web!)
is a very small programming language with a C-like syntax.
It is designed to be easily understood by C programmers
(who will have no difficulty relating it to C's appearance and execution)
so that attention may focus on the execution of the programs.
Its
syntax is presented here.
Here is a quick summary of cool's (lack of) features:
- programs are written in text files,
whose filename ends in the .cool extension.
- each self-contained program is written in a single source-file,
and cannot access any external libraries.
- all programs must have a single 'int main(void)' function,
which receives no parameters,
and returns an integer result providing the program's exit status.
- program termination must be through main(),
as there is no exit() function.
- all variables (globals, parameters, locals) and expressions are
of type integer.
- all variables are scalar variables; there are no array variables.
- by default, all variables are initialised to zero,
or may be initialised to an integer constant.
- there is no bool datatype,
but non-zero expressions are considered 'true'
in evaluating if and while statements.
- all functions are of type void, or return an integer.
- all identifiers must be defined before their use
(no forward declarations or mutually-recursive functions).
- to aid debugging,
a single print statement prints either
an integer expression or a character string
(the only place that strings exist).
Compiling and running cool programs
Programs are compiled by the coolc compiler,
which translates a .cool program to an 'executable' format,
whose filename ends in the .coolexe extension.
The compiler will only be available via a webpage
while the project is running (details follow),
but its source code (2410 lines, so far)
will be made available after the project.
This approach allows bugs to be corrected for all users,
avoids the problems of providing multi-platform support,
and avoids compiler versioning and distribution problems.
The coolc compiler generates code for a (fictitious) stack-based machine.
Code generation is performed in a single-pass without the need
for an intermediate assembly-language file
(although coolc displays a 'fake' assembly-language file on its webpage).
cool programs may be compiled in two (related) ways:
Once you have a coolexe file on your computer,
obtained using either of the two mechanisms described above,
you can emulate the execution of the program using your project,
a program named runcool:
prompt> ./runcool myprogram.coolexe
The stack-based computer architecture
The target computer is a stack-based machine,
with no general-purpose registers.
The stack is used to hold temporary results during arithmetic evaluation,
and memory addresses to manage the flow of each program's execution (described later).
Control registers
Three specific on-CPU control registers control the execution of programs:
- PC - the program counter, holds the memory address of the next
instruction to be fetched from the computer's RAM.
- SP - the stack pointer points to the word on the top of
the computer's stack.
- FP - the frame pointer points to the current function's
activation frame of the computer's stack (described later).
The on-CPU control registers are modified in response to each program's instructions.
For example, an instruction requiring an integer to be pushed onto the stack,
results in SP being modified.
Similarly, a function call results in the PC's value being pushed onto the stack,
and the PC modified to point to the first instruction of the called function.
16-bit words
The computer on which cool programs execute can be described as an (aging)
16-bit computer
[historic reference
for those interested].
- a pair of successive bytes is termed a (16-bit) word
- each memory address is 16-bits wide,
meaning that there are 2^16 distinct addresses.
- memory addresses may be considered as unsigned integers,
ranging from 0 to 65535.
- memory is word addressable,
meaning that each address refers to one of the 2^16 possible
words of memory (or 128KB in total).
The on-CPU registers - PC, SP, and FP - each hold one address,
and are thus 16-bits wide.
- the only supported datatype is the signed integer,
ranging from -32,768 to 32,767.
- conveniently, and by design, each address and each integer occupies
the same sized unit of memory - one word.
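As a concrete illustration (a minimal sketch only - none of these names are required by the project), these sizes map naturally onto C11's fixed-width integer types:

#include <stdint.h>
#include <stdbool.h>

#define N_MAIN_MEMORY_WORDS (1 << 16)    //  2^16 = 65,536 words of main-memory
#define N_CACHE_WORDS       (1 << 5)     //  2^5  = 32 words of cache-memory

typedef int16_t  IWORD;                  //  a signed 16-bit integer (a data value)
typedef uint16_t AWORD;                  //  an unsigned 16-bit memory address

IWORD main_memory[N_MAIN_MEMORY_WORDS];  //  instructions, data, and the runtime stack

//  the three on-CPU control registers, each holding one 16-bit address
AWORD PC;                                //  address of the next instruction to be fetched
AWORD SP;                                //  address of the word on the top of the stack
AWORD FP;                                //  address of the current function's stack-frame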
Main memory and cache memory
Our stack-based machine has a 3-stage memory hierarchy.
During a program's execution,
requests to read and write memory are made by the CPU,
and the requests and data 'flow' between CPU, cache-memory, and main-memory.
- The CPU contains just three very fast on-CPU control registers,
the PC, SP, and FP,
but these cannot be used as general-purpose memory locations to store arbitrary values.
These registers are only changed through the interpretation of a program's instructions.
- The cache-memory is a relatively fast bank of memory
(typically 8-20x faster than main-memory, but not as fast as the CPU's registers),
and stores a total of just 32 (2^5) words.
- The main-memory is a relatively slow 'bank' of memory;
it is used to store the program's instructions, data, and runtime stack,
and stores a total of 64K (2^16) words.
Different computer architectures have varying sized memory hierarchies -
a typical laptop computer may have 8GB of main-memory,
16MB of cache-memory,
and 12+ on-CPU general purpose registers.
If the instructions and data we want are in the fastest form of available memory,
our programs will run faster.
However, faster memory (built using different memory technologies) costs more money,
so we typically cannot store all of our instructions and data in the fastest memory.
As shown in the following diagram,
cache-memory lies (logically) between the CPU and main-memory,
so all requests to read instructions and to read or write data words are seen by the cache.
[Diagram: CPU ⟷ cache memory (32 words total) ⟷ main memory (64K words total);
main memory holds the stack (growing downward), the data segment, the text segment, and the boot segment.]
Because cache is much smaller than main-memory,
each cache location must remember both its data and the main-memory location
to which that data 'belongs'.
When a word from main-memory is copied through the cache,
that word's address is always 'mirrored' in the same cache location.
There is only a single cache location for each main-memory location,
but many potential main-memory locations for each cache location.
Thus, each cache location must remember which main-memory location it is caching.
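One possible way to represent this in C (a sketch only, building on the illustrative names above - the field and function names are not prescribed) is to store, for each cache word, both the cached data and the main-memory address to which it belongs. With 32 cache words, one simple mapping uses the low 5 bits of each address to choose the cache location:

typedef struct {
    IWORD data;         //  the cached copy of one main-memory word
    AWORD address;      //  the main-memory address this data 'belongs' to
    bool  dirty;        //  does the cached copy differ from main-memory? (see write-back, below)
} CACHE_WORD;

CACHE_WORD cache_memory[N_CACHE_WORDS];

//  every main-memory address maps to exactly one of the 32 cache locations
int cache_index(AWORD address)
{
    return address % N_CACHE_WORDS;
}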
- Each word read from main-memory is retained by the cache and,
if required again,
is quickly taken from cache without 'asking' the main-memory for another (identical) copy.
When the required data is available in the cache,
we have a cache-hit.
However, if the required data is not available in the cache,
it must first be fetched from main-memory.
This is termed a cache-miss.
- When the CPU writes data words,
the cache has two potential courses of action:
- it could immediately write the word to the main-memory.
As no other components can modify main-memory,
the cache and main-memory are always 'in sync'.
This is termed a write-through policy.
Simple.
- alternatively, the cache could remember the new word,
but not immediately copy it to main-memory.
The cache and main-memory are now 'out of sync'.
Any future read request for the same word can be satisfied by the cache.
However, if the requested word is not in the cache,
it must be copied from the main-memory on its way to the CPU.
If the new word requires the same cache location
as data that has not yet been written to main-memory,
that delayed write to main-memory must now be performed first.
For each word of cache,
we need to remember whether it correctly reflects what is in main-memory -
whether the cache's copy is clean or dirty.
Dirty data needs writing to main-memory before its cache location
can be re-used to store new data from main memory.
This is termed a write-back policy.
Phew.
For this project, you must implement a write-back cache.
You are also required to calculate (count) and report
four basic statistics to measure the effectiveness of the cache:
- the number of main-memory reads,
- the number of main-memory writes,
- the number of cache-memory hits, and
- the number of cache-memory misses.
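A hedged sketch of how such a write-back cache might be emulated in C, building on the structures above, appears below. Exactly how hits and misses are counted on writes, and how the cache is initialised, are details to check against the project specification - this is just one plausible arrangement:

int n_main_memory_reads   = 0;
int n_main_memory_writes  = 0;
int n_cache_memory_hits   = 0;
int n_cache_memory_misses = 0;

//  read one word through the write-back cache
IWORD read_memory(AWORD address)
{
    CACHE_WORD *cw = &cache_memory[ cache_index(address) ];

    if (cw->address == address) {               //  cache-hit: the word is already cached
        ++n_cache_memory_hits;
    }
    else {                                      //  cache-miss: the word must come from main-memory
        ++n_cache_memory_misses;
        if (cw->dirty) {                        //  first, perform the delayed (dirty) write
            main_memory[ cw->address ] = cw->data;
            ++n_main_memory_writes;
        }
        cw->data    = main_memory[ address ];   //  now fetch the requested word
        cw->address = address;
        cw->dirty   = false;
        ++n_main_memory_reads;
    }
    return cw->data;
}

//  write one word through the write-back cache; the word only reaches main-memory
//  when its cache location is later needed for a different address
void write_memory(AWORD address, IWORD value)
{
    CACHE_WORD *cw = &cache_memory[ cache_index(address) ];

    if (cw->address == address) {
        ++n_cache_memory_hits;
    }
    else {
        ++n_cache_memory_misses;
        if (cw->dirty) {                        //  evict the old dirty word first
            main_memory[ cw->address ] = cw->data;
            ++n_main_memory_writes;
        }
        cw->address = address;
    }
    cw->data  = value;
    cw->dirty = true;                           //  cache and main-memory are now 'out of sync'
}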
The instruction set
cool programs consist of a mixture of instructions, addresses, and data (integers).
The computer's instruction set is presented here.
(At present) there are only 17 instructions.
Each instruction has a unique (integer) value (0..16)
and each instruction requires one or two words of the computer's memory:
- Some instructions, such as 'add', require no additional information
(here, because the values to be added reside on the stack).
The very next word after an 'add' instruction
is the next instruction to be executed.
'add' is an example of a 0-operand instruction,
and requires one word in total.
- Some instructions, such as 'call', require additional information.
The very next word after a 'call' instruction
is the memory address of the first instruction of the function to be called.
'call' is an example of a 1-operand instruction,
and requires two words in total.
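To make the 0-operand versus 1-operand distinction concrete, here is a minimal sketch of an emulator's fetch-decode-execute loop, again building on the earlier sketches. The opcode names and values below are placeholders only - the real encodings are those of the published instruction set:

//  hypothetical opcode values, for illustration only
enum { I_HALT = 0, I_ADD, I_CALL /* , ... the remaining instructions ... */ };

int execute_program(void)
{
    while (true) {
        IWORD instruction = read_memory(PC);        //  fetch the next instruction
        ++PC;                                       //  PC now addresses the following word

        switch (instruction) {
        case I_ADD : {                              //  0-operand: both values are on the stack
            IWORD value1 = read_memory(SP);  ++SP;  //  pop
            IWORD value2 = read_memory(SP);  ++SP;  //  pop
            --SP;  write_memory(SP, value1 + value2);   //  push their sum
            break;
        }
        case I_CALL : {                             //  1-operand: the operand follows the opcode
            AWORD function_address = (AWORD) read_memory(PC);
            ++PC;                                   //  skip over the operand word
            //  ...push the return address and the old FP, as described in the next section...
            PC = function_address;
            break;
        }
        case I_HALT :
            return read_memory(SP);                 //  one possible convention: main()'s
                                                    //  exit status is on the top-of-stack
        //  ...cases for the other instructions...
        }
    }
}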
Calling and returning from functions
The implementation of functions is one of the more difficult concepts to understand, because:
- functions need to access their parameters and local variables,
- functions need to return to wherever they were called from,
- functions can call other functions (even recursively), and
- functions can return values to their caller.
Each function employs memory on the stack to hold incoming parameters
and local variables.
This region of memory is called a stack-frame and resides on the stack.
An on-CPU control register, FP (frame-pointer),
allows us to locate the current function's frame.
The code to access parameters and local variables
while a function is executing is generated in terms of offsets to FP.
Parameters appear 'above' FP (at higher addresses, positive offsets),
and
local variables appear 'below' FP (at lower addresses, negative offsets).
The stack pointer, SP, may change during the execution of a function as
values are pushed or popped off the stack
(such as pushing parameters in preparation for calling another function).
However, the FP doesn't change throughout the execution of a function.
Consider the following cool function,
and the modifications to the on-CPU control registers and the stack
when myfunc() is called:
- the function wishing to call myfunc()
passes 2 actual parameters by pushing them onto the stack (light green)
- the PC now points to the call instruction,
which is followed by the address of the first instruction of myfunc().
- so that we know where to return to,
the address immediately following these two words
(the address of the next instruction)
is saved onto the stack - the 'return address' (pink).
- the current value of FP (soon to be the previous value of FP)
is saved onto the stack (pink),
and the current value of SP is copied into FP
(to become myfunc()'s frame-pointer).
- space for myfunc()'s 3 local variables (purple) is allocated
by pushing their initialised values onto the stack.
- execution may now continue from the first 'real' instruction of myfunc(),
knowing that we can locate each of the parameters and local variables
by their offsets from the current FP.
- we can return the function's value
by overwriting the first element of myfunc()'s stack frame
(parameter a, at FP+3 in this example),
which effectively pushes the returned value onto the TOS of the caller's stack frame,
and we can restore FP to its correct value in the caller's stack frame.
int myfunc(int a, int b)
{
    int c = 4;
    int d;
    int e = 9;

    // first 'real' instruction of myfunc()
    // ...
    return c + d + e;
}
high address ⟶  (stack grows downward)

            caller's stack frame     ⟵ previous FP
    FP+3    parameter a              ⟵ returned value will go here
    FP+2    parameter b
    FP+1    return address
    FP      saved previous FP        ⟵ current FP
    FP-1    local c
    FP-2    local d
    FP-3    local e                  ⟵ current SP (TOS)

low address ⟶   future stack frames ⬇ ⬇
When returning from myfunc(),
the on-CPU control registers need restoring to their correct values
so that execution may continue within the function that called myfunc().
In addition,
the function's returned value (which will be on its TOS)
needs to be left on the TOS of the calling function -
that is, copied to the first location of the current frame.
In the diagram above,
the returned value will be copied to the same stack location
as parameter a.
Careful - if the function has no parameters,
the returned value will be copied to the same stack location as the return address.
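Finally, here is a hedged sketch (again using only illustrative names, and assuming the frame layout in the diagram above) of how an emulator might implement the call and return sequences just described; whether, for example, the number of parameters is an operand of the return instruction is a detail defined by the real instruction set:

//  enter a function: the caller has already pushed any actual parameters
void perform_call(AWORD function_address)
{
    --SP;  write_memory(SP, (IWORD) PC);    //  push the return address
                                            //  (PC already points past the call's operand)
    --SP;  write_memory(SP, (IWORD) FP);    //  push (save) the caller's frame-pointer
    FP = SP;                                //  the called function's frame starts here
    PC = function_address;                  //  the callee's code will then push its locals
}

//  leave a function, leaving its returned value on the caller's TOS
void perform_return(IWORD returned_value, int n_params)
{
    AWORD return_address = (AWORD) read_memory(FP + 1);
    AWORD callers_FP     = (AWORD) read_memory(FP);

    //  the highest-addressed word of this frame is the first parameter,
    //  or the return address if the function has no parameters
    AWORD top = FP + 1 + n_params;
    write_memory(top, returned_value);      //  this word becomes the caller's new TOS
    SP = top;
    FP = callers_FP;                        //  restore the caller's frame-pointer
    PC = return_address;                    //  continue execution after the call
}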
Project requirements