C Programming/Low-level IO

File descriptors

Although not specified by the C standard, many operating systems provide the concept of a file descriptor (often abbreviated as fd). Whereas the FILE type from stdio.h and its associated functions encapsulate the low-level details of a stream, a file descriptor is a plain integer that identifies a stream the operating system is keeping track of on the process's behalf.

This section will explore file descriptors as they are implemented in POSIX systems, such as Linux.

Standard streams as file descriptors

When a process is created, the operating system allocates, among other resources, three streams for it: the standard streams stdin, stdout, and stderr. Typically, programs interact with the standard streams through their FILE-based definitions in stdio.h, as covered in an earlier section. These streams can also be accessed through their raw file descriptors, which have the same values in every process:

unistd.h symbol   Stream   File descriptor
STDIN_FILENO      stdin    0
STDOUT_FILENO     stdout   1
STDERR_FILENO     stderr   2

Notice that these file descriptor numbers are the same in every process, even though the standard streams carry different data in each one. File descriptors are therefore not unique system-wide: each process has its own mapping from descriptors to streams, just as each process has its own view of virtual memory.
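
As a small illustration, the sketch below sends a message to standard error through its raw descriptor rather than through the FILE-based stderr object. It relies on the write function described in the next section; the helper name log_to_stderr is hypothetical and is not part of any library.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes a string to standard error through its
 * raw file descriptor (STDERR_FILENO, which is 2 on POSIX systems).
 * Returns the number of bytes written, or -1 on error. */
ssize_t log_to_stderr(const char *msg)
{
    return write(STDERR_FILENO, msg, strlen(msg));
}
```

Calling log_to_stderr("diagnostic\n") from any process targets descriptor 2, but the message goes wherever that particular process's standard error happens to be connected.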

Basic reading and writing

Reading from and writing to a file descriptor can be performed using the following functions:[1]

#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);

Compare these declarations with those of the FILE-based functions:[2]

#include <stdio.h>
char *fgets(char *s, int size, FILE *stream);
int fputs(const char *s, FILE *stream);

Three differences are apparent:

  1. The data being read from and written to the stream are not assumed to be strings.
  2. File descriptors are taken as parameters instead of FILE pointers.
  3. Both functions return the same type, ssize_t, whereas fgets and fputs return different types.

read and fgets take similar sets of parameters: something representing the stream, a buffer, and a size. They differ in how much they read and in when they stop. fgets, being intended for use with strings, stores at most size - 1 characters followed by a terminating null byte, stops reading early if a newline is encountered, and may block while it waits for the rest of the line to appear in the stream. read, on the other hand, does not treat any byte value specially and does not add a terminator, but it may return early if less data than requested is currently available, for example when reading from a pipe or a terminal.

Since read can't guarantee that the whole request has been satisfied, its return value is the number of bytes placed in the buffer; a return value of 0 indicates end of file, and -1 indicates an error. This makes read more appropriate for situations where the programmer needs more control over the type of the data being read or is willing to trade receiving partially-read data for reducing the number of blocking I/O operations.
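
Because read may return fewer bytes than requested, callers that need a complete buffer usually call it in a loop. The following is a minimal sketch of that pattern, not a definitive implementation; the name read_full is hypothetical, and retrying on EINTR is an assumption about how the caller wants interrupted calls handled.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read until count bytes have been
 * stored in buf, end of file is reached, or an error occurs.
 * Returns the number of bytes stored (less than count only at end of
 * file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* genuine error; errno is set */
        }
        total += (size_t)n;         /* short read: ask for the rest */
    }
    return (ssize_t)total;
}
```

Note that the helper does not append a null byte; if the data is to be treated as a string, the caller must reserve space for the terminator and add it.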

Similarly, write needs an explicit size parameter since it can't assume that a null-terminated string is being written. It returns the number of bytes written, or -1 on error, so the program can determine whether the passed data was fully written to the stream.
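
A common way to use that return value is to keep calling write until the whole buffer has been accepted. The sketch below shows one way to do so under the same assumptions as before: write_all is a hypothetical name, and interrupted calls are simply retried.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: writes exactly count bytes from buf to fd,
 * resuming after partial writes. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* genuine error; errno is set */
        }
        p += n;                     /* skip the bytes already written */
        count -= (size_t)n;
    }
    return 0;
}
```

Since the length is passed explicitly, the buffer may contain null bytes: write_all(fd, "a\0b", 3) sends all three bytes, whereas fputs would stop at the first null byte.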

Obtaining and discarding file descriptors

FILE-file descriptor conversions

Security through openat

References

  1. read(2) and write(2), Linux Programmer's Manual, 2019-10-10.
  2. fgets(3) and fputs(3), Linux Programmer's Manual, 2020-08-13.