CITS2002 Systems Programming  

The structure of C programs

Let's look at the high-level structure of a short C program, rotate.c (using ellipses to omit some statements for now).
At this stage it's not important what the program is supposed to do.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Compile this program with:
   cc -std=c99 -Wall -Werror -pedantic -o rotate rotate.c
 */

#define ROT 13

static char rotate(char c)
{
    .....
    return c;
}

int main(int argcount, char *argvalue[])
{
    // check the number of arguments
    if(argcount != 2) {
        ....
        exit(EXIT_FAILURE);
    }
    else {
        ....
        exit(EXIT_SUCCESS);
    }
    return 0;
}

Of note in this example:

  • Characters such as a space, tab, or newline, may appear almost anywhere - they are stripped out and ignored by the C compiler.

    We use such whitespace characters to provide a layout to our programs. While the exact layout is not important, using a consistent layout is very good practice.

  • Keywords, such as static, char, int, if, else, and return, mean very specific things to the C compiler.

  • Lines commencing with a '#', such as #include and #define, are processed by a separate program, named the C preprocessor.

    In practice, our program is provided as input to the preprocessor, and the preprocessor's output is given to the C compiler.

  • Comments are ignored by the C compiler, and may contain (almost) any characters.

    C99 provides two types of comments -

    1. /* block comments */  and
    2. // comments to the end of a line

CITS2002 Systems Programming, Lecture 2, p1, 2nd August 2019.

 

The structure of C programs, continued


Same program, but more to note:

  • A variety of brackets are employed, in pairs, to group together items to be considered in the same way. Here:

    • angle brackets enclose a filename in a #include directive,
    • round brackets group items in arithmetic expressions and function calls,
    • square brackets enclose the index when accessing arrays (vectors and matrices...) of data, and
    • curly brackets group together sequences of one or more statements in C. We term a group of statements a block of statements.

  • Functions in C may be thought of as a block of statements to which we give a name. In our example, we have two functions - rotate() and main().

  • When our programs are run by the operating system, the operating system always starts our program from main(). Thus, every complete C program requires a main() function.

    The operating system passes some special information to our main() function, command-line arguments, and main() needs a special syntax to receive these.

  • Most C programs you read will name main()'s parameters argc and argv, whereas our example names them argcount and argvalue.

  • When our program finishes its execution, it returns some information to the operating system. Our example here exits by announcing either its failure or success.


Compiling and linking our C programs

C programs are human-readable text files that we term source-code files.

This makes them very easy to copy, read, and edit on different computers and different operating systems.
C is often described as being portable at the source-code level.

Before we can run (execute) our C programs, we must translate, or compile, their source-code files to files that the operating system can better manage.
A program known as a compiler translates (compiles) source-code files into object-code files.

Finally, we translate or link one or more object-code files to produce an executable program, often termed a 'binary', an 'executable', or an 'exe' file.
A program known as a linker performs this translation, also linking our object-code file(s) with standard libraries and (optionally) 3rd-party libraries.


Depending on how we invoke the compiler, sometimes we can 'move' straight from the source-code files to the executable program, all in one step.
In reality the compiler is 'silently' executing the linker program for us, and then removing any unwanted object-files.


Variables

Variables are locations in a computer's memory. A typical desktop or laptop computer will have 4-16GB of memory, or four to sixteen billion addressable memory locations (bytes).

A typical C program will use 4 bytes to hold a single integer value, or 8 bytes to hold a single double-precision floating-point value.

A variable can hold only a single value at any time; it does not maintain a history of the values it once held.

Naming our variables

To make programs more readable, we provide variables with simple names. We should carefully choose names to reflect the role of the variable in our programs.

  • While variable names can be almost anything (but not the same as the keywords in C) there's a simple restriction on the permitted characters in a name -

    • they must commence with an alphabetic or the underscore character (_ A-Z a-z), and
    • be followed by zero or more alphabetic, underscore or digit characters (_ A-Z a-z 0-9).

  • C variable names are case sensitive, thus:

    MYLIMIT,  mylimit,  Mylimit  and  MyLimit

    are four different variable names.

  • While not required, it's preferred that variable names do not consist entirely of uppercase characters.
    We'll consistently use uppercase-only names for constants provided by the C preprocessor, or user-defined type names:

    MAXLENGTH,  AVATAR,  BUFSIZ,  and  ROT

  • Older C compilers treated only the first, say, 8 characters of a variable name as significant. Thus, for them,

    turn_nuclear_reactor_coolant_on    and    turn_nuclear_reactor_coolant_off

    are the same variable! Keep this in mind if ever developing portable code for old environments.


Basic datatypes

Variables are declared to be of a certain datatype, or just type.

We use different types to represent the permissible values that a program's variable has.

For example, if we're using a variable to just count things, we'll use an integer variable to hold the count; if performing trigonometry on angles expressed in radians, we'll use floating-point variables to hold values with both an integral and a fractional part.

C provides a number of standard, or base types to hold commonly required values, and later we'll see how we can also define our own user-defined types to meet our needs.

Let's look quickly at some of C's base datatypes:

typename   description, and an example of variable initialization

bool       Boolean (truth values), which may only hold the values of either true or false
           (in C99, bool, true, and false are provided by the header <stdbool.h>)
           e.g.  bool finished = false;

char       character values, each holding a single value such as an alphabetic character, a digit character, a space, a tab...
           e.g.  char initial = 'C';

int        integer values, negative, positive, and zero
           e.g.  int year = 2006;

float      floating point values, with a typical precision of about 7 decimal digits (on our lab machines)
           e.g.  float inflation = 4.1;

double     "bigger" floating point values, with a typical precision of 15 to 17 decimal digits (on our lab machines)
           e.g.  double pi = 3.1415926535897932;

Some textbooks will (too quickly) focus on the actual storage size of these basic types, and emphasise the ranges of permissible values. When writing truly portable programs - that can execute consistently across different hardware architectures and operating systems - it's important to be aware of, and avoid, their differences. We'll examine this issue later, but for now we'll focus on using these basic types in their most obvious ways.

From where does the bool datatype get its name? - the 19th century mathematician and philosopher, George Boole.


The Significance of Integers in C

Throughout the 1950s, 60s, and 70s, there were many more computer hardware manufacturers than there are today. Each company needed to promote its own products by distinguishing them from their competitors.

At a low level, different manufacturers employed different memory sizes for a basic character - some just 6 bits, some 8 bits, 9, and 10. The unfortunate outcome was the incompatibility of computer programs and data storage formats.

The C programming language, developed in the early 1970s, addressed this issue by not defining the required size of its datatypes. Thus, C programs are portable at the level of their source code - porting a program to a different computer architecture is possible, provided that the programs are compiled on (or for) each architecture. The only requirement was that:

sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long)

Since the 1980s, fortunately, the industry has agreed on 8-bit characters or bytes. But (compiling and) running the C program on different architectures:


#include <stdio.h>

int main(void)
{
    // sizeof yields a value of type size_t, which C99 prints with %zu
    printf("char  %zu\n", sizeof(char));
    printf("short %zu\n", sizeof(short));
    printf("int   %zu\n", sizeof(int));
    printf("long  %zu\n", sizeof(long));
    return 0;
}

may produce different (though still correct) results, for example:

char  1
short 2
int   4
long  8

It's permissible for different C compilers on different architectures to employ different sized integers.

Why does this matter? Different sized integers can store different maximum values - the above datatypes are signed (supporting positive and negative values) so a 4-byte integer can only represent the values -2,147,483,648 to 2,147,483,647.

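If employing integers for 'simple' counting, or looping over a known range of values, there's rarely a problem. But if using integers to count many (small) values, such as milli- or micro-seconds, it matters: a 4-byte signed integer counting milliseconds overflows after about 24.9 days, and one counting microseconds overflows after about 35.8 minutes.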


The scope of variables

The scope of a variable describes the range of lines in which the variable may be used. Some textbooks may also term this the visibility or lexical range of a variable.

C has only 2 primary types of scope:

  • global scope (sometimes termed file scope) in which variables are declared outside of all functions and statement blocks, and

  • block scope in which variables are declared within a function or statement block.


01  #include <stdio.h>
02  #include <stdlib.h>
03  #include <string.h>
04  #include <ctype.h>
05
06  static int count = 0;
07
08  int main(int argcount, char *argvalue[])
09  {
10      int nfound = 0;
11
12      // check the number of arguments
13      if(argcount != 2) {
14          int nerrors = 1;
15
16          ....
17          exit(EXIT_FAILURE);
18      }
19      else {
20          int ntimes = 100;
21
22          ....
23          exit(EXIT_SUCCESS);
24      }
25      return 0;
26  }

  • The variable count has global scope.

    It is defined on line 06, and may be used anywhere from line 06 until the end of the file (line 26).

    The variable count is also preceded by the keyword static, which prevents it from being 'seen' (read or written) from outside of this file, rotate.c.

  • The variable nfound has block scope.

    It is defined on line 10, and may be used anywhere from line 10 until the end of the block in which it was defined (until line 26).

  • The variable nerrors has block scope.

    It is defined on line 14, and may be used anywhere from line 14 until line 18.

  • The variable ntimes has block scope.

    It is defined on line 20, and may be used anywhere from line 20 until line 24.


  • We could define a different variable named nerrors in the block of lines 20-24 - without problems.

  • We could define a different variable named nfound in the block of lines 20-24 - but this would be a very bad practice!
