CITS2002 Systems Programming, Lecture 19

Dynamic data structures

Initially, we focused on scalar and array variables, whose size is known at compile-time.

More recently, we've focused on arrays of values, whose required size was only known at run-time.

In the case of dynamic arrays we've used C11 functions such as:

malloc(), calloc(), realloc(), and free()

to manage the required storage for us.

An extension to this idea is the use of dynamic data structures - collections of data whose required size is not known until run-time. Again, we'll use C11's standard memory allocation functions whenever we require more memory.

However, unlike our use of realloc() to grow (or shrink) a single data structure (there, an array), we'll see two significant differences:

we'll manage a complete data structure by allocating and deallocating its "pieces", and
we'll keep all of the "pieces" linked together by including, in each piece, a "link" to other pieces.

To implement these ideas in C11, we'll develop data structures that contain pointers to other data structures.

All code examples in this lecture are available from here: examples.zip

CITS2002 Systems Programming, Lecture 19, p1, 3rd October 2023.

A simple dynamic data structure - a stack

We'll commence with a simple stack - a data structure that maintains a simple list of items by adding new items, and removing existing items, from the head of the list.

Such a data structure is also termed a first-in-last-out data structure, a FILO, because the first item added to the stack is the last item removed from it (not the sort of sequence you want while queueing for a bank's ATM!).

Let's consider the appropriate type definition in C11:


typedef struct _s {
    int         value;
    struct _s   *next;
}  STACKITEM;

STACKITEM    *stack = NULL;

Of note:

we haven't really defined a stack datatype, but a single item that will "go into" the stack.
the datatype STACKITEM contains a pointer field, named next, that will point to another item in the stack.
we've defined a new type, a structure named _s, so that the pointer field next can be of a type that already exists.
we've defined a single pointer variable, named stack, that will point to a stack of items.

Adding items to our stack data structure

As a program's execution progresses, we'll need to add and remove data items from the data structure.

The need to do this is not known until run-time, and data (perhaps read from file) will determine how large our stack eventually grows.

As its name suggests, when we add items to our stack, we'll speak of pushing new items on the stack, and popping existing items from the stack, when removing them.


typedef struct _s {     // same definition as before   
    int         value;
    struct _s   *next;
}  STACKITEM;

STACKITEM    *stack = NULL;

 ....

void push_item(int newvalue)
{
    STACKITEM  *new = malloc( sizeof(STACKITEM) );  

    if(new == NULL) {     // check for insufficient memory   
        perror( __func__ );
        exit(EXIT_FAILURE);
    }

    new->value   = newvalue;
    new->next    = stack;
    stack        = new;
}

The functions push_item and pop_item are quite simple, but in each case we must worry about the case when the stack is empty.
We use a NULL pointer to represent the condition of the stack being empty.

Removing items from our stack data structure

The function pop_item now removes an item from the stack, and returns the actual data's value.

In this example, the data held in each STACKITEM is just a single integer, but it could involve several fields of data. In that case, we may need more complex functions to return all of the data (perhaps using a structure or pass-by-reference parameters to the pop_item function).

Again, we must ensure that we don't attempt to remove (pop) an item from an empty stack:


int pop_item(void)
{
    STACKITEM  *old;
    int        oldvalue;

    if(stack == NULL) {
        fprintf(stderr, "attempt to pop from an empty stack\n");  
        exit(EXIT_FAILURE);
    }

    oldvalue     = stack->value;
    old          = stack;
    stack        = stack->next;
    free(old);

    return oldvalue;
}
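
The pass-by-reference alternative mentioned above might look like the following minimal sketch. The two-field item (id and weight), the type name PAIRITEM, and the functions push_pair and pop_pair are hypothetical, invented purely for illustration. Here an empty stack is reported through the return value rather than by exiting.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// HYPOTHETICAL ITEM HOLDING TWO FIELDS OF DATA (NOT FROM THE LECTURE)
typedef struct _p {
    int         id;
    double      weight;
    struct _p   *next;
}  PAIRITEM;

PAIRITEM    *pairs = NULL;

void push_pair(int id, double weight)
{
    PAIRITEM  *new = malloc( sizeof(PAIRITEM) );

    if(new == NULL) {     // check for insufficient memory
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    new->id      = id;
    new->weight  = weight;
    new->next    = pairs;
    pairs        = new;
}

// RETURNS false IF THE STACK IS EMPTY, OTHERWISE COPIES BOTH FIELDS
// THROUGH THE CALLER'S POINTERS AND RETURNS true
bool pop_pair(int *id, double *weight)
{
    PAIRITEM  *old = pairs;

    if(old == NULL) {
        return false;
    }
    *id      = old->id;
    *weight  = old->weight;
    pairs    = old->next;
    free(old);

    return true;
}
```

A caller passes the addresses of its own variables, for example pop_pair(&id, &weight), and tests the result before using them.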

Printing our stack data structure

To print out our whole data structure, we can't just use a standard C11 function as C11 doesn't know/understand our data structure.

Thus we'll write our own function, print_stack, to traverse the stack and successively print each item, using printf.

Again, we must check for the case of the empty stack:


void print_stack(void)
{
    STACKITEM  *thisitem = stack;

    while(thisitem != NULL) {
        printf("%i", thisitem->value);

        thisitem = thisitem->next;

        if(thisitem != NULL)
            printf(" -> ");
    }
    if(stack != NULL)
        printf("\n");
}

Again, our stack is simple because each node only contains a single integer. If more complex, we may call a different function from within print_stack to perform the actual printing:


    ....
    print_stack_item( thisitem );

Using our stack in a Reverse Polish Calculator

Let's employ our stack data structure to evaluate basic integer arithmetic, as if using a Reverse Polish Calculator.

Each integer read from a line of a file is pushed onto the stack; arithmetic operators pop 2 integers from the stack, perform some arithmetic, and push the result back onto the stack.


int evaluate_RPN(FILE *fp)
{
    char  line[BUFSIZ];
    int   val1, val2;

    while( fgets(line, sizeof(line), fp) != NULL ) {  
        if(line[0] == '#')
            continue;
        if(isdigit((unsigned char)line[0]) || line[0] == '-')
            push_item( atoi(line) );

        else if(line[0] == 'a') {
            val1 = pop_item();
            val2 = pop_item();
            push_item( val1 + val2 );
        }
        ....
        else if(line[0] == 'd') {
            val1 = pop_item();
            val2 = pop_item();
            push_item( val2 / val1 );
        }
        else
            break;
    }
    return pop_item();
}


# Our input data:
12
3
add
5
div

Using our stack in a Reverse Polish Calculator, continued

Careful readers may have noticed that in some cases we don't actually need the integer variables val1 and val2.

We can use the 2 results returned from pop_item as arguments to push_item:


int evaluate_RPN(FILE *fp)
{
    char  line[BUFSIZ];

    while( fgets(line, sizeof line, fp) != NULL ) {
        if(line[0] == '#')
            continue;
        if(isdigit((unsigned char)line[0]) || line[0] == '-')
            push_item( atoi(line) );

        else if(line[0] == 'a') {
            push_item( pop_item() + pop_item() );
        }
        ....
         
    }
    return pop_item();
}

int main(int argc, char *argv[])
{
    printf("%i\n", evaluate_RPN( stdin ) );
    return 0;
}
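
The shortcut relies on addition being commutative: C does not specify which of the two pop_item calls in pop_item() + pop_item() is evaluated first. For subtraction or division the named variables are still needed, as in this sketch. The helper name subtract_top_two is hypothetical; push_item and pop_item repeat the earlier definitions so that the example stands alone.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct _s {
    int         value;
    struct _s   *next;
}  STACKITEM;

STACKITEM    *stack = NULL;

void push_item(int newvalue)
{
    STACKITEM  *new = malloc( sizeof(STACKITEM) );

    if(new == NULL) {     // check for insufficient memory
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    new->value   = newvalue;
    new->next    = stack;
    stack        = new;
}

int pop_item(void)
{
    if(stack == NULL) {
        fprintf(stderr, "attempt to pop from an empty stack\n");
        exit(EXIT_FAILURE);
    }
    STACKITEM  *old      = stack;
    int        oldvalue  = old->value;

    stack        = old->next;
    free(old);

    return oldvalue;
}

// HYPOTHETICAL HELPER: SUBTRACTION IS NOT COMMUTATIVE, SO WE POP INTO
// NAMED VARIABLES TO FIX THE ORDER IN WHICH THE OPERANDS ARE TAKEN
void subtract_top_two(void)
{
    int val1 = pop_item();      // the most recently pushed operand
    int val2 = pop_item();      // the operand pushed before it

    push_item( val2 - val1 );
}
```

Writing push_item( pop_item() - pop_item() ) instead could yield either val2 - val1 or val1 - val2, depending on the compiler.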

Problems with our stack data structure

As written, our stack data structure works, but may be difficult to deploy in a large program.

In particular, the whole stack was represented by a single global pointer variable, and all functions accessed or modified that global variable.

What if our program required 2, or more, stacks?
What if the required number of stacks was determined at run-time?
Could the stacks be manipulated by functions that didn't actually "understand" the data they were manipulating?

Ideally we'd re-write all of our functions, push_item, pop_item, and print_stack, so that they received the required stack as a parameter, and used or manipulated that stack.
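
One way such a re-write could look is sketched below. The two-argument signatures are an assumption, not the lecture's code: each function receives the address of the caller's stack pointer, so that it can update whichever stack it is given.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct _s {
    int         value;
    struct _s   *next;
}  STACKITEM;

// ASSUMED VARIANT: THE CALLER PASSES THE ADDRESS OF ITS OWN STACK POINTER
void push_item(STACKITEM **stack, int newvalue)
{
    STACKITEM  *new = malloc( sizeof(STACKITEM) );

    if(new == NULL) {     // check for insufficient memory
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    new->value   = newvalue;
    new->next    = *stack;
    *stack       = new;
}

int pop_item(STACKITEM **stack)
{
    if(*stack == NULL) {
        fprintf(stderr, "attempt to pop from an empty stack\n");
        exit(EXIT_FAILURE);
    }
    STACKITEM  *old      = *stack;
    int        oldvalue  = old->value;

    *stack       = old->next;
    free(old);

    return oldvalue;
}
```

Two independent stacks are then just two pointer variables, each initialised to NULL and passed as &first or &second; no global variable is required.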

Techniques on how, and why, to design and implement robust data structures are a focus of the unit CITS2200 Data Structures & Algorithms.

Declaring a list of items

Let's develop a similar data structure that, unlike the first-in-last-out (FILO) approach of the stack, provides first-in-first-out (FIFO) storage - much fairer for queueing at the ATM!

We term such a data structure a list, and its datatype declaration is very similar to our stack:


 typedef struct _l {
     char        *string;
     struct _l   *next;
 } LISTITEM;

 LISTITEM   *list  =  NULL;

As with the stack, we'll need to support empty lists, and will again employ a NULL pointer to represent it.

This time, each data item to be stored in the list is a string, and we'll often term such a structure "a list of strings".

Adding (appending) a new item to our list

When adding (appending) new items to our list, we need to be careful about the special (edge) cases:

the empty list, and
when adding items to the end:


void append_item(char *newstring)
{
    if(list == NULL) {           // append to an empty list   
        list = malloc( sizeof(LISTITEM) );
        if(list == NULL) {
            perror( __func__ );
            exit(EXIT_FAILURE);
        }
        list->string  =  strdup(newstring);
        list->next    =  NULL;
    }
    else {                       // append to an existing list
        LISTITEM *p = list;

        while(p->next != NULL) { // walk to the end of the list  
            p  =  p->next;
        }
        p->next = malloc( sizeof(LISTITEM) );
        if(p->next == NULL) {
            perror( __func__ );
            exit(EXIT_FAILURE);
        }
        p          =  p->next;   // append after the last item
        p->string  =  strdup(newstring);
        p->next    =  NULL;
    }
}

Notice how we needed to traverse the whole list to locate its end.
Such traversal can become expensive (in time) for very long lists.

Removing an item from the head of our list

Removing items from the head of our list is much easier.

Of course, we again need to be careful about the case of the empty list:


char *remove_item(void)
{
    LISTITEM *old = list;
    char     *string;

    if(old == NULL) {
        fprintf(stderr, "cannot remove item from an empty list\n");  
        exit(EXIT_FAILURE);
    }

    list   = list->next;
    string = old->string;
    free(old);

    return string;
}

Notice that we return the string (data value) to the caller, and deallocate the old node that was at the head of the list.

We say that the caller now owns the storage required to hold the string - even though the caller did not initially allocate that storage.

It will be up to the caller to deallocate that memory when no longer required.
Failure to deallocate such memory can lead to memory leaks, which may eventually crash long-running programs.
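
The ownership rule can be seen in a caller such as the hypothetical drain_list below, which frees every string it receives. This is a sketch under stated assumptions: append_item is a compacted version of the lecture's function, and it copies the string with malloc and strcpy (a stand-in for strdup) so that only standard C functions are used.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _l {
    char        *string;
    struct _l   *next;
} LISTITEM;

LISTITEM   *list  =  NULL;

void append_item(char *newstring)
{
    LISTITEM *new = malloc( sizeof(LISTITEM) );

    if(new == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    new->string = malloc( strlen(newstring) + 1 );  // room for the '\0'
    if(new->string == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    strcpy(new->string, newstring);
    new->next   = NULL;

    if(list == NULL) {           // append to an empty list
        list = new;
    }
    else {                       // append to an existing list
        LISTITEM *p = list;

        while(p->next != NULL) { // walk to the end of the list
            p  =  p->next;
        }
        p->next = new;
    }
}

char *remove_item(void)
{
    LISTITEM *old = list;
    char     *string;

    if(old == NULL) {
        fprintf(stderr, "cannot remove item from an empty list\n");
        exit(EXIT_FAILURE);
    }
    list   = list->next;
    string = old->string;
    free(old);

    return string;
}

// HYPOTHETICAL CALLER: EMPTIES THE LIST, RETURNING THE TOTAL LENGTH
// OF ALL STRINGS THAT WERE STORED IN IT
size_t drain_list(void)
{
    size_t total = 0;

    while(list != NULL) {
        char *s = remove_item(); // we now own the storage for s ...
        total  += strlen(s);
        free(s);                 // ... so we must deallocate it
    }
    return total;
}
```

Omitting the call to free(s) would leak one string on every iteration of the loop.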

Problems with our list data structure

As written, our list data structure works, but also has a few problems:

Again, our list accessing functions use a single global variable.
What if our program required 2, or more, lists?
Continually searching for the end-of-list can become expensive.
Could the lists be manipulated by functions that didn't actually "understand" the data they were manipulating?

We'll address all of these by developing a similar first-in-first-out (FIFO) data structure, which we'll name a queue.

A general-purpose queue data structure

Let's develop a first-in-first-out (FIFO) data structure that queues (almost) arbitrary data.

We're hoping to address the main problems that were exhibited by the stack and list data structures:

We should be able to manage the data without knowing what it is.
We'd like operations, such as appending, to be independent of the number of items already stored.
Such (highly desirable) operations are performed in constant time.


typedef struct _e {
    void        *data;
    size_t      datalen;
    struct _e   *next;
} ELEMENT;

typedef struct {
    ELEMENT     *head;
    ELEMENT     *tail;
} QUEUE;

Of note:

We've introduced a new datatype, ELEMENT, to hold each individual item of data.
Because we don't require our functions to "understand" the data they're queueing, each element will just hold a void pointer to the data it's holding, and remember its length.
Our "traditional" datatype QUEUE now holds 2 pointers - one to the head of the list of items, one to the tail.

Creating a new queue

We'd like our large programs to have more than a single queue - thus we don't want a single, global, variable, and we don't know until run-time how many queues we'll require.

We thus need a function to allocate space for, and to initialize, a new queue:


QUEUE *queue_new(void)
{
    QUEUE *q = malloc( sizeof(QUEUE) );

    if(q == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    q->head    = NULL;
    q->tail    = NULL;

    return q;
}

    ....
    QUEUE  *people_queue  =  queue_new();
    QUEUE  *truck_queue   =  queue_new();


QUEUE *queue_new(void)  //  same outcome, often seen
{
    QUEUE *q = calloc( 1, sizeof(QUEUE) );

    if(q == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    return q;
}

If we remember that:

the calloc function both allocates memory and sets all of its bytes to the zero-bit-pattern, and
that (most) C11 implementations represent the NULL pointer as the zero-bit-pattern,

then we appreciate the simplicity of allocating new items with calloc.

Deallocating space used by our queue

It's considered a good practice to always write a function that deallocates all space used in our own user-defined dynamic data structures.

In the case of our queue, we need to deallocate 3 things:

the memory required for the data in every element,
the memory required for every element,
the queue itself.


void queue_free(QUEUE *q)
{
    ELEMENT     *this, *save;

    this  = q->head;
    while( this != NULL ) {
        save      = this;
        this      = this->next;
        free(save->data);
        free(save);
    }
    free(q);
}

    QUEUE  *my_queue  =  queue_new();
    ....
    //  use my local queue
    ....
    queue_free( my_queue );

Adding (appending) new items to our queue

Finally, we'll consider adding new items to our queue.

Remember two of our objectives:

To quickly add items - we don't wish appending to a very long queue to be slow.
We achieve this by remembering where the tail of the queue is, and quickly adding to it without searching.
To be able to queue data that we don't "understand".
We achieve this by treating all data as "a block of bytes", allocating memory for it, copying it (as we're told its length), all without ever interpreting its contents.

Adding (appending) new items to our queue, continued


void queue_add(QUEUE *q, void *data, size_t datalen)
{
    ELEMENT     *newelement;

//  ALLOCATE MEMORY FOR A NEW ELEMENT
    newelement          = calloc(1, sizeof(ELEMENT));  
    if(newelement == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }

//  ALLOCATE MEMORY FOR THE DATA IN THE NEW ELEMENT
    newelement->data    = malloc(datalen);
    if(newelement->data == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }

//  SAVE (COPY) THE UNKNOWN DATA INTO OUR NEW MEMORY
    memcpy(newelement->data, data, datalen);
    newelement->datalen = datalen;
    newelement->next    = NULL;

//  APPEND THE NEW ELEMENT TO AN EMPTY LIST
    if(q->head == NULL) {
        q->head         = newelement;
        q->tail         = newelement;
    }
//  OR APPEND THE NEW ELEMENT TO THE TAIL OF THE LIST
    else {
        q->tail->next   = newelement;
        q->tail         = newelement;
    }
}

Writing a function to remove items from our queue is left as a simple exercise.
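
One possible shape for that exercise is sketched below; it is an assumption, not the unit's official solution. The function queue_remove and its signature are hypothetical: it returns the data pointer, which the caller then owns and must free, or NULL if the queue is empty. queue_new and queue_add are repeated in compact form so that the example stands alone.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _e {
    void        *data;
    size_t      datalen;
    struct _e   *next;
} ELEMENT;

typedef struct {
    ELEMENT     *head;
    ELEMENT     *tail;
} QUEUE;

QUEUE *queue_new(void)
{
    QUEUE *q = calloc( 1, sizeof(QUEUE) );

    if(q == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    return q;
}

void queue_add(QUEUE *q, void *data, size_t datalen)
{
    ELEMENT     *newelement = calloc(1, sizeof(ELEMENT));

    if(newelement == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    newelement->data    = malloc(datalen);
    if(newelement->data == NULL) {
        perror( __func__ );
        exit(EXIT_FAILURE);
    }
    memcpy(newelement->data, data, datalen);
    newelement->datalen = datalen;

    if(q->head == NULL) {               // append to an empty queue
        q->head         = newelement;
    }
    else {                              // or append after the tail
        q->tail->next   = newelement;
    }
    q->tail             = newelement;
}

// HYPOTHETICAL: REMOVES THE ELEMENT AT THE HEAD OF THE QUEUE.
// RETURNS ITS DATA (NOW OWNED BY THE CALLER), OR NULL IF EMPTY
void *queue_remove(QUEUE *q, size_t *datalen)
{
    ELEMENT     *old = q->head;

    if(old == NULL) {
        return NULL;
    }
    void *data  = old->data;

    if(datalen != NULL) {
        *datalen = old->datalen;        // report the length, if wanted
    }
    q->head     = old->next;
    if(q->head == NULL) {               // removed the only element
        q->tail = NULL;
    }
    free(old);                          // free the element, not its data

    return data;
}
```

Like queue_add, removal does not depend on the number of items stored. The one subtle step is resetting tail when the last element leaves, so that it never points to freed memory.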

Storing and searching ordered data - a binary tree

Each of the previous self-referential data-structures stored their values in their order of arrival, and accessed or removed them in the same order or the reverse. The actual time of insertion is immaterial, with the relative times 'embedded' in the order of the elements.

More common is to store data in a structure that embeds the relative magnitude or priority of the data. Doing so requires insertions to keep the data-structure ordered, but this makes searching much quicker as well.

Let's consider the type definition and insertion of data into a binary tree in C11:


typedef struct _bt {
    int            value;
    struct _bt     *left;
    struct _bt     *right;
} BINTREE;

BINTREE *tree_root    = NULL;


BINTREE *tree_insert(BINTREE *t, int value)
{
    if(t == NULL) {
        BINTREE *new  = calloc(1, sizeof(BINTREE));

        if(new == NULL) {
            perror( __func__ );
            exit(EXIT_FAILURE);
        }
        new->value    = value;
// the calloc() call has set both left and right to NULL
        return new;
    }

    int order = (t->value - value);

    if(order > 0) {
        t->left       = tree_insert(t->left,  value);
    }
    else if(order < 0) {
        t->right      = tree_insert(t->right, value);
    }
    return t;
}

Of note:

we've defined a data-structure containing two pointers to other instances of the data-structure.
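the structure tag struct _bt is needed only so that the left and right fields can refer to the type while it is still being defined; thereafter we always use the name BINTREE.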
here, each element of the data-structure, each node of the tree, holds a unique instance of a data value - here, a single integer - though it's very common to hold multiple data values.

we insert into the tree with:

tree_root = tree_insert(tree_root, new_value);

the (magnitude of the) integer data value embeds the order of the structure - elements with lesser integer values are stored 'below' and to the left of the current node, higher values to the right.
unlike some (more complicated) variants of the binary-tree, we've made no effort to keep the tree balanced. If we insert already sorted elements into the tree, the tree will degenerate into a list, with every node having either a NULL left or a NULL right pointer.
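
The degenerate case is easy to observe by measuring the height of the tree. In this sketch tree_insert is the function shown above, while tree_height is a hypothetical helper added only for the demonstration.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct _bt {
    int            value;
    struct _bt     *left;
    struct _bt     *right;
} BINTREE;

BINTREE *tree_insert(BINTREE *t, int value)
{
    if(t == NULL) {
        BINTREE *new  = calloc(1, sizeof(BINTREE));

        if(new == NULL) {
            perror( __func__ );
            exit(EXIT_FAILURE);
        }
        new->value    = value;
// the calloc() call has set both left and right to NULL
        return new;
    }

    int order = (t->value - value);

    if(order > 0) {
        t->left       = tree_insert(t->left,  value);
    }
    else if(order < 0) {
        t->right      = tree_insert(t->right, value);
    }
    return t;
}

// HYPOTHETICAL HELPER: THE NUMBER OF NODES ON THE LONGEST PATH
// FROM THIS NODE DOWN TO A LEAF (0 FOR AN EMPTY TREE)
int tree_height(BINTREE *t)
{
    if(t == NULL) {
        return 0;
    }
    int l = tree_height(t->left);
    int r = tree_height(t->right);

    return 1 + (l > r ? l : r);
}
```

Inserting 1, 2, ..., 7 in that order gives a height of 7 (a list), whereas inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 gives a height of 3.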

Storing and searching ordered data - a binary tree, continued

Knowing that we've built the binary tree to maintain an order of its elements, we exploit this property to find elements:


bool find_recursively(BINTREE *t, int wanted)
{
    if(t != NULL) {
        int order = (t->value - wanted);

        if(order == 0) {
            return true;
        }
        else if(order > 0) {
            return find_recursively(t->left, wanted);
        }
        else {
            return find_recursively(t->right, wanted);
        }
    }
    return false;
}


bool find_iteratively(BINTREE *t, int wanted)
{
    while(t != NULL) {
        int order = (t->value - wanted);

        if(order == 0) {
            return true;
        }
        else if(order > 0) {
            t = t->left;
        }
        else {
            t = t->right;
        }
    }
    return false;
}

Of note:

we search for a value in the tree with:

bool found = find_recursively(tree_root, wanted_value);

we do not modify the tree when searching, we simply 'walk' over its elements, determining whether to go-left or go-right depending on the relative value of each element's data to the wanted value.
some (more complicated) variants of the binary-tree re-balance the tree by moving recently found values (their nodes) closer to the root of the tree in the hope that they'll be required again, soon.
if the required value is found, the searching functions return true; otherwise we keep walking the tree until we either find the value or can no longer walk in the required direction (because the left or the right pointer is NULL), in which case they return false.
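
One caution on the order computation used in these functions: t->value - wanted can overflow when the two values are far apart (for example INT_MAX and -1), and signed integer overflow is undefined behaviour in C. A possible alternative, sketched below, compares without subtracting. The helper compare_values is hypothetical and not part of the lecture's code; tree_insert and find_iteratively are adapted here to call it.

```c
#include <limits.h>     // INT_MAX and INT_MIN, for callers testing extremes
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct _bt {
    int            value;
    struct _bt     *left;
    struct _bt     *right;
} BINTREE;

// HYPOTHETICAL HELPER: RETURNS 1 IF a > b, 0 IF EQUAL, -1 IF a < b,
// WITHOUT PERFORMING ANY ARITHMETIC THAT COULD OVERFLOW
int compare_values(int a, int b)
{
    return (a > b) - (a < b);
}

BINTREE *tree_insert(BINTREE *t, int value)
{
    if(t == NULL) {
        BINTREE *new  = calloc(1, sizeof(BINTREE));

        if(new == NULL) {
            perror( __func__ );
            exit(EXIT_FAILURE);
        }
        new->value    = value;
        return new;
    }

    int order = compare_values(t->value, value);

    if(order > 0) {
        t->left       = tree_insert(t->left,  value);
    }
    else if(order < 0) {
        t->right      = tree_insert(t->right, value);
    }
    return t;
}

bool find_iteratively(BINTREE *t, int wanted)
{
    while(t != NULL) {
        int order = compare_values(t->value, wanted);

        if(order == 0) {
            return true;
        }
        else if(order > 0) {
            t = t->left;
        }
        else {
            t = t->right;
        }
    }
    return false;
}
```

For the small values typically used in examples the subtraction behaves identically; the difference only matters near the limits of int.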
