CITS2002 Systems Programming  

The two standard output streams - stdout and stderr

To date, most of our output has been produced using the function printf() which, by default, sends its output to the terminal screen.

Pedantically, printf() actually sends its output to the stdout (pronounced standard-output) stream which, by default, is connected to the screen, but may be redirected to a file or even to another program (using operating system features, not C features).

For example, on macOS and Linux:

prompt> mycc -o myprogram myprogram.c                                   
prompt> myprogram     # appears on the screen
prompt> myprogram  >  output.dat
prompt> myprogram  |  wc -l

Here, our program "doesn't care" where its output is going - to the default location (the screen), to a file, or through a pipe.
In our programs, we could choose to send our output explicitly to the stdout stream, calling fprintf() instead of printf():

fprintf(stdout, "a longhand mechanism, equivalent to calling printf\n"); 

stdout and stderr are two of the three standard I/O streams (the third is stdin) that the C runtime system creates and initializes whenever a new C program begins execution.

In general, we prefer to write "normal" output to stdout, and errors to stderr, and we'll adopt this practice in the rest of the unit.

FILE *fp = fopen("results.data", "w");

if(fp == NULL) {
    fprintf(stderr, "Cannot open results.data\n");                       
    exit(EXIT_FAILURE);
}
else {
    fprintf(fp, .....);                       
    .....
}

The standard perror() function (from Lecture-11) also sends its output to stderr.
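As a minimal sketch of this convention, the following helper reports a failed fopen() on stderr, including the reason held in errno. The name open_or_report() is hypothetical, not part of the unit's code, and the sketch assumes a POSIX system, where fopen() sets errno on failure (the C99 standard itself does not promise this).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

//  TRY TO OPEN filename FOR READING; ON FAILURE, REPORT THE REASON ON stderr
//  open_or_report() IS A HYPOTHETICAL HELPER, NOT PART OF THE LECTURE
FILE *open_or_report(const char *progname, const char *filename)
{
    FILE *fp = fopen(filename, "r");

    if(fp == NULL) {
        int saved = errno;      // SAVE errno BEFORE ANY OTHER LIBRARY CALL
        //  SIMILAR TO perror(filename), BUT WITH OUR OWN CHOICE OF PREFIX
        fprintf(stderr, "%s: cannot open %s: %s\n",
                        progname, filename, strerror(saved));
    }
    return fp;                  // NULL TELLS THE CALLER THAT THE OPEN FAILED
}
```

Because the message goes to stderr, it still reaches the screen when the program's stdout has been redirected to a file or a pipe.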

CITS2002 Systems Programming, Lecture 19, p1, 7th October 2019.

 

Reading input via stdin

We'll commence using an improved, more general, main() function, accepting zero or more command-line arguments representing filenames. If no file names are provided, our programs will process input from the stdin (pronounced standard-input) stream.

This is the standard mechanism by which a single program receives its input via a named file, the contents of a redirected file, or through a pipe. We describe such a program as a filter [see The Art of Unix Programming, ch07]:

#include  <stdio.h>
#include  <stdlib.h>

//  process() READS FROM A FILE-POINTER, REGARDLESS OF ITS SOURCE
int process(FILE *infp)
{
    int result = 0;
    ....
    return result;
}

int main(int argc, char *argv[])
{
    int result = 0;

    printf("program's name is %s\n", argv[0]);
//  NO ARGUMENTS PROVIDED, ACCEPT DEFAULT (STANDARD) INPUT
    if(argc == 1) {
        result = process(stdin);
    }
    else {
//  FILE NAME ARGUMENTS PROVIDED, READ FROM EACH OF THEM
        for(int a=1 ; a<argc ; ++a) {
            FILE *infp = fopen(argv[a], "r");

            if(infp == NULL) {
                fprintf(stderr, "%s: cannot open %s\n", argv[0], argv[a]);  
                exit(EXIT_FAILURE);
            }
            result = process(infp);
            fclose(infp);   // WE OPENED, SO WE CLOSE IT
        }
    }
    return result;
}
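The body of process() is left open above. One possible body, sketched here under the assumption that the filter simply counts lines, is shown as a function named count_lines() (a hypothetical name). Note that it neither knows nor cares whether its file-pointer refers to stdin, a named file, or a pipe.

```c
#include <stdio.h>

//  ONE POSSIBLE BODY FOR process(): COUNT THE LINES AVAILABLE FROM infp
//  count_lines() IS A HYPOTHETICAL NAME, USED ONLY FOR THIS SKETCH
int count_lines(FILE *infp)
{
    int nlines = 0;
    int ch;                     // int, NOT char, SO THAT EOF CAN BE DETECTED

    while((ch = fgetc(infp)) != EOF) {
        if(ch == '\n') {
            ++nlines;
        }
    }
    return nlines;
}
```

Calling count_lines(stdin) and count_lines(infp) requires no change to the function, which is the property that makes a program a filter.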


File streams and file descriptors

File streams such as stdin, stdout, and stderr are features of the C99 standard library. They hide the underlying operating system details from a (portable) C99 program.

Unix-based operating systems provide file descriptors, simple integer values, to identify 'communication channels' - such as files, interprocess-communication pipes, (some) devices, and network connections (sockets).

C99 defines the FILE * datatype (the file-pointer) as an abstraction over file descriptors to manage (simplify?) access to descriptors.

(As we know) when calling C99 standard functions we provide, or receive, a file-pointer. When calling OS system-calls we provide, or receive, a file-descriptor.

operation             OS file-descriptors                                 C99 file-pointers
opening and creating  int open(char *pathname, int flags)                 FILE *fopen(char *pathname, char *mode)
closing               int close(int fd)                                   int fclose(FILE *fp)
reading               ssize_t read(int fd, void *buffer, size_t nbytes)   size_t fread(void *ptr, size_t size, size_t n, FILE *fp)
                                                                          int fscanf(FILE *fp, char *format, ...)
                                                                          char *fgets(char *array, int size, FILE *fp)
                                                                          int fgetc(FILE *fp)
writing               ssize_t write(int fd, void *buffer, size_t nbytes)  size_t fwrite(void *ptr, size_t size, size_t n, FILE *fp)
                                                                          int fprintf(FILE *fp, char *format, ...)
                                                                          int fputs(char *string, FILE *fp)
                                                                          int fputc(int ch, FILE *fp)
positioning           off_t lseek(int fd, off_t offset, int wrt)          int fseek(FILE *fp, long offset, int wrt)
                                                                          long ftell(FILE *fp)
                                                                          void rewind(FILE *fp)
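The two worlds can be bridged with the POSIX functions fileno(), which reveals the descriptor underneath a file-pointer, and fdopen(), which wraps a file-pointer around an existing descriptor. The following is a minimal sketch, assuming a POSIX system; open_via_descriptor() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L     // ASKS <stdio.h> TO DECLARE fileno() AND fdopen()

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

//  OPEN filename WITH THE open() SYSTEM-CALL, THEN WRAP THE DESCRIPTOR
//  IN A FILE-POINTER SO THAT THE STANDARD I/O FUNCTIONS CAN BE USED ON IT
FILE *open_via_descriptor(const char *filename)
{
    int fd = open(filename, O_RDONLY);

    if(fd == -1) {
        return NULL;
    }

    FILE *fp = fdopen(fd, "r");
    if(fp == NULL) {
        close(fd);              // fdopen() FAILED, SO THE DESCRIPTOR IS STILL OURS
    }
    return fp;                  // A LATER fclose(fp) ALSO CLOSES fd
}
```

In the other direction, fileno(stdin), fileno(stdout), and fileno(stderr) report the descriptors 0, 1, and 2, which <unistd.h> names STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO.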


Buffered Input

When reading from a file, the code for using file pointers and file descriptors can be very similar.

However, the standard I/O streams perform input buffering - the data read from a file is actually read into a large memory buffer (owned by the user's process and 'remembered' by the file-pointer) and then 'dished out' to the requesting application.

Remember that the read() function is a system-call, and that system-calls can be expensive: each one requires a switch into the operating system, which may then move the requesting process from the RUNNING state to the BLOCKED state if the data is not ready.

#include  <stdio.h>
#include  <stdlib.h>
#include  <fcntl.h>
#include  <unistd.h>

#define  MYSIZE      10000

//  READING VIA A FILE-POINTER - THE C LIBRARY BUFFERS THE INPUT FOR US
void file_pointer_read(char *filename)
{
    size_t got;
    char   buffer[MYSIZE];

    FILE *fp    = fopen(filename, "r");
    if(fp == NULL) {
        perror( filename );
        exit(EXIT_FAILURE);
    }

    while((got = fread(buffer,1,sizeof buffer,fp)) > 0) {
        ......
    }
    fclose(fp);
}

//  READING VIA A FILE-DESCRIPTOR - EVERY read() IS A SYSTEM-CALL
void file_descriptor_read(char *filename)
{
    ssize_t got;                //  SIGNED, BECAUSE read() RETURNS -1 ON ERROR
    char    buffer[MYSIZE];

    int fd    = open(filename, O_RDONLY);
    if(fd == -1) {
        perror( filename );
        exit(EXIT_FAILURE);
    }

    while((got = read(fd, buffer, sizeof buffer)) > 0) {
        .....
    }
    close(fd);
}

Consider what may happen in the above code if the value of MYSIZE is not always 10000, but is 1, 10, 1000, or 100000.
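To make the question concrete, the sketch below counts how many read() system-calls are needed to consume a file for a given buffer size. The function count_reads() is a hypothetical helper, and the sketch assumes the descriptor refers to a regular file, for which each read() normally returns as many bytes as requested until end-of-file.

```c
#include <stdlib.h>
#include <unistd.h>

//  COUNT HOW MANY read() SYSTEM-CALLS ARE NEEDED TO CONSUME ALL REMAINING
//  DATA FROM fd WHEN bufsize (> 0) BYTES ARE REQUESTED EACH TIME
//  RETURNS -1 ON ERROR; count_reads() IS A HYPOTHETICAL HELPER
long count_reads(int fd, size_t bufsize)
{
    char    *buffer = malloc(bufsize);
    long    ncalls  = 0;
    ssize_t got;

    if(buffer == NULL) {
        return -1;
    }
    do {
        got = read(fd, buffer, bufsize);
        ++ncalls;               // THE FINAL CALL, REPORTING END-OF-FILE, COUNTS TOO
    } while(got > 0);

    free(buffer);
    return (got == -1) ? -1 : ncalls;
}
```

For a 1000-byte regular file, a 1000-byte buffer typically needs 2 calls, a 10-byte buffer 101 calls, and a 1-byte buffer 1001 calls. The file-pointer version is far less sensitive to MYSIZE, because fread() is served from the stream's own buffer and only occasionally needs a system-call to refill it.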

Buffered Output

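Similarly, a big distinction between stdout and stderr is that the former is buffered (for efficiency), while the latter is unbuffered (to ensure output appears immediately). Typically, stdout is line-buffered when connected to a terminal, and fully buffered when redirected to a file or a pipe.

A consequence of this is that, if all of your output is sent to stdout and your program crashes, you may not see all of your output. Calling fflush(stdout) forces any buffered output to be written immediately.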
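Output buffering can be observed directly. The sketch below writes 5 bytes to a fully-buffered temporary file and asks the operating system, via fstat(), how large the file is before and after fflush(). It assumes a POSIX system; bytes_delivered() and show_buffering() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L     // ASKS <stdio.h> TO DECLARE fileno()

#include <stdio.h>
#include <sys/stat.h>

//  REPORT HOW MANY BYTES OF fp'S FILE THE OPERATING SYSTEM HAS RECEIVED SO FAR,
//  IGNORING ANYTHING STILL HELD IN THE STREAM'S BUFFER; RETURNS -1 ON ERROR
long bytes_delivered(FILE *fp)
{
    struct stat info;

    if(fstat(fileno(fp), &info) != 0) {
        return -1;
    }
    return (long)info.st_size;
}

//  WRITE 5 BYTES TO A FULLY-BUFFERED TEMPORARY FILE, REPORTING WHAT THE
//  OPERATING SYSTEM HAS SEEN BEFORE AND AFTER THE BUFFER IS FLUSHED
void show_buffering(void)
{
    FILE *fp = tmpfile();

    if(fp == NULL) {
        perror("tmpfile");
        return;
    }
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);      // REQUEST FULL BUFFERING
    fputs("hello", fp);
    printf("before fflush: %ld bytes\n", bytes_delivered(fp));
    fflush(fp);
    printf("after  fflush: %ld bytes\n", bytes_delivered(fp));
    fclose(fp);
}
```

On common C libraries the first report is 0 bytes and the second is 5. Had the program crashed between the two reports, the word hello would have been lost.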


The standard command-line arguments

In all of our C programs to date we've seen the use of, but not fully explained, command-line arguments.

We've noticed that the main() function receives command-line arguments from its calling environment (usually the operating system):

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("program's name: %s\n", argv[0]); 

    for(int a=0 ; a < argc ; ++a) {
        printf("%i: %s\n", a, argv[a] );
    }
    return 0;
}

We know:

  • that argc provides the count of the number of arguments, and
  • that all programs receive at least one argument (the program's name).


But what is argv?

If we read argv's definition, from right to left, it's

"an array of pointers to characters"

Or, try cdecl.org

While we typically associate argv with strings, we remember that C doesn't innately support strings. It's only by convention or assumption that we may assume that each value of argv[i] is a pointer to something that we'll treat as a string.

In the previous example, we print "from" the pointer. Alternatively, we can print every character in the arguments:

#include <stdio.h>

int main(int argc, char *argv[])
{
    for(int a=0 ; a < argc ; ++a) {
        printf("%i: ", a);

        for(int c=0 ; argv[a][c] != '\0' ; ++c)  {
            printf("%c", argv[a][c] );
        }
        printf("\n");
    }
    return 0;
}

The operating system actually makes argv much more usable, too:

  • each argument is guaranteed to be terminated by a null-byte (because they are strings), and
  • the argv array is guaranteed to be terminated by a NULL pointer.
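The second guarantee means that argc is, strictly, redundant. As a small sketch, the hypothetical function count_arguments() below walks the argv array until it meets the terminating NULL pointer.

```c
#include <stddef.h>

//  COUNT THE ARGUMENTS WITHOUT CONSULTING argc, RELYING ONLY ON THE
//  NULL POINTER THAT TERMINATES THE argv ARRAY
int count_arguments(char *argv[])
{
    int n = 0;

    for(char **p = argv ; *p != NULL ; ++p) {
        ++n;
    }
    return n;
}
```

Called from main() as count_arguments(argv), the result should equal argc.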


Parsing command-line arguments

By convention, most applications support command-line arguments (commencing with a significant character) that appear between a program's name and the "true" arguments.

For programs on Unix-derived systems (such as Apple's macOS and Linux), these are termed command switches, and their introductory character is a hyphen, or minus.

Keep in mind, too, that many utilities appear to accept their command switches in (almost) any order. For the common ls program to list files, each of these is equivalent:

  • ls   -l -t -r  files
  • ls   -lt -r  files
  • ls   -ltr  files
  • ls   -rtl  files

Of note, neither the operating system nor the shell knows the switches of each program, so it's up to every program to detect them, and report any problems.
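To see why the four ls commands above are equivalent, consider a sketch in which a program examines every letter after the hyphen and sets one flag per letter. The structure ls_switches and the function parse_switches() are hypothetical, and cover only three of ls's many switches.

```c
#include <stdbool.h>

struct ls_switches {            // A HYPOTHETICAL SUBSET OF ls's SWITCHES
    bool lflag, tflag, rflag;
};

//  EXAMINE ONE ARGUMENT SUCH AS "-l" OR "-ltr", SETTING A FLAG FOR EACH LETTER
//  RETURNS false IF arg IS NOT A SWITCH, OR CONTAINS AN UNKNOWN LETTER
bool parse_switches(const char *arg, struct ls_switches *sw)
{
    if(arg[0] != '-' || arg[1] == '\0') {
        return false;
    }
    for(int c=1 ; arg[c] != '\0' ; ++c) {
        switch (arg[c]) {
        case 'l' :  sw->lflag = true;   break;
        case 't' :  sw->tflag = true;   break;
        case 'r' :  sw->rflag = true;   break;
        default  :  return false;
        }
    }
    return true;
}
```

Because each letter only sets a flag, the order of the letters, and how they are grouped behind hyphens, cannot affect the final result.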


Parsing command-line arguments, continued

Consider the following program, that accepts an optional command switch, -d, and then zero or more filenames:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

char *progname;
bool dflag        = false;

//  DEFINED ELSEWHERE
void process_file(char *filename);

int main(int argc, char *argv[])
{
    progname = argv[0];

    --argc; ++argv;

    while(argc > 0 && (*argv)[0] == '-') {  // or  argv[0][0]
        if((*argv)[1] == 'd')               // or  argv[0][1]
            dflag = !dflag;
        else
            argc = 0;
        --argc; ++argv;
    }
    if(argc < 0) {
        fprintf(stderr, "Usage : %s [-d] [filename]\n", progname);  
        exit(EXIT_FAILURE);
    }
    if(argc > 0) {
        while(argc > 0) {
            process_file(*argv);    // provide filename to function
            --argc; ++argv;
        }
    }
    else {
        process_file(NULL);         // no filename, use stdin or stdout?
    }
    return 0;
}
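The program above passes NULL when no filename is given. One way a function such as process_file() might interpret that, sketched here under the assumption that the program reads its input, is to fall back to stdin. The helpers open_input() and close_input() are hypothetical names.

```c
#include <stdio.h>

//  CHOOSE THE INPUT STREAM: THE NAMED FILE, OR stdin IF NO NAME WAS GIVEN
//  RETURNS NULL IF A NAMED FILE CANNOT BE OPENED
FILE *open_input(const char *filename)
{
    if(filename == NULL) {
        return stdin;
    }
    return fopen(filename, "r");
}

//  CLOSE ONLY WHAT open_input() OPENED; stdin IS LEFT OPEN
void close_input(FILE *fp)
{
    if(fp != NULL && fp != stdin) {
        fclose(fp);
    }
}
```

A program that writes, rather than reads, could make the same choice between a named file and stdout.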


Parsing command-line arguments with getopt

As programs become more complicated, they often accept many command switches to define and constrain their execution (something casually termed creeping featurism [1]).

In addition, command switches do not just indicate, or toggle, a Boolean attribute to the program, but often provide additional string values and numbers to further control the program.

The correct parsing of command switches can quickly become complicated!

To simplify the task of processing command switches in our programs, we'll use the function getopt().

getopt() is not a function in the Standard C library but, like the function strdup(), it is widely available and used. In fact, getopt() conforms to a different standard - a POSIX standard [2], which provides functions enabling operating system portability.

 

[1] UNIX Style, or cat -v Considered Harmful
[2] The IEEE Std 1003.1 (POSIX.1-2008)


Parsing command-line arguments with getopt, continued

Let's repeat the previous example using getopt:

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

#include <getopt.h>

#define	OPTLIST		"d"

int main(int argc, char *argv[])
{
    int		opt;

    .....
    opterr	= 0;
    while((opt = getopt(argc, argv, OPTLIST)) != -1) {
        if(opt == 'd')
            dflag = !dflag;
        else
            argc = -1;
    }
    if(argc < 0) {
        fprintf(stderr, "Usage: %s [-d] [filename]\n", progname);  
        exit(EXIT_FAILURE);
    }
    while(optind < argc) {
        process( argv[optind] );
        ++optind;
    }
    .....
    return 0;
}


Parsing command-line arguments with getopt, continued

Let's repeat the previous example, but now support two additional command switches: -f, which provides a string, and -n, which provides a number. The getopt function is informed, through the colon following each letter in the OPTLIST character string, that an argument is expected after the new -f and -n switches.

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <getopt.h>

#define	OPTLIST		"df:n:"

int main(int argc, char *argv[])
{
    int  opt;
    bool dflag   = false;
    char *filenm = NULL;
    int  value   = DEFAULT_VALUE;

    opterr	= 0;
    while((opt = getopt(argc, argv, OPTLIST)) != -1)   {  
//  ACCEPT A BOOLEAN ARGUMENT
        if(opt == 'd') {
            dflag  =  !dflag;
        }
//  ACCEPT A STRING ARGUMENT
        else if(opt == 'f') {
            filenm  =  strdup(optarg);
        }
//  ACCEPT AN INTEGER ARGUMENT
        else if(opt == 'n') {
            value  =  atoi(optarg);
        }
//  OOPS - AN UNKNOWN ARGUMENT
        else {
            argc = -1;
        }
    }

    if(argc <= 0) {    //  display program's usage/help   
        usage(1);
    }
    argc  -= optind;
    argv  += optind;
    .....
    return 0;
}

getopt sets the global pointer variable optarg to point to the actual value provided after -f or -n - regardless of whether any spaces appear between the switch and the value.
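The same parsing can be packaged as a function that may be compiled and tested on its own. The following is a sketch only: the structure options and the function parse_options() are hypothetical names, and strtol() has been swapped in for atoi() so that a value such as -n 12abc is rejected rather than silently accepted as 12.

```c
#define _POSIX_C_SOURCE 200809L     // ASKS <unistd.h> TO DECLARE getopt()

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

#define OPTLIST     "df:n:"

struct options {                // HYPOTHETICAL CONTAINER FOR THE PARSED SWITCHES
    bool  dflag;
    char *filenm;
    long  value;
};

//  PARSE THE SWITCHES IN argv INTO opts; RETURNS false IF THE USAGE MESSAGE
//  SHOULD BE SHOWN. ON SUCCESS, argv[optind] IS THE FIRST NON-SWITCH ARGUMENT
bool parse_options(int argc, char *argv[], struct options *opts)
{
    int opt;

    opterr = 0;                 // WE REPORT ERRORS OURSELVES
    while((opt = getopt(argc, argv, OPTLIST)) != -1) {
        if(opt == 'd') {
            opts->dflag = !opts->dflag;
        }
        else if(opt == 'f') {
            opts->filenm = optarg;      // POINTS INTO argv, SO NO COPY IS NEEDED
        }
        else if(opt == 'n') {
            char *end;

            errno = 0;
            opts->value = strtol(optarg, &end, 10);
            //  UNLIKE atoi(), strtol() LETS US REJECT "12abc", "", AND OVERFLOW
            if(end == optarg || *end != '\0' || errno == ERANGE) {
                return false;
            }
        }
        else {                  // getopt() RETURNED '?' - UNKNOWN SWITCH OR MISSING VALUE
            return false;
        }
    }
    return true;
}
```

Given the command line prog -d -n42 -f out.txt file1, this sets dflag, stores 42 in value, points filenm at out.txt, and leaves argv[optind] referring to file1. Writing -n 42 instead of -n42 gives the same value.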
