Portability and the C Language


The Linux trademark is owned by Linus Torvalds.

UNIX® is a registered trademark of The Open Group

Prefaces to Editions

Edition 1, 1989 (with light editing)

[Howard W. Sams, Hayden Books, ISBN 0-672-48428-5, 1989.]

Early in 1986, I was invited to teach a 3-day seminar on portability as it pertained to the C language. The seminar was to be offered in several major cities around the United States. As it happened, the series was cancelled, but by then I had put together a 70-page manuscript intended for use as handouts.

Ever since I came to the C fold, I have been fascinated by the apparent contradiction of C being both a low-level systems-implementation language yet, somehow, also being a portable one. And every time I heard someone speak or write enthusiastically about C's “inherent” portability, I became more uneasy with the observation that either I or a significant part of the C community was missing some major part of the “C picture.” As it happens, I don't think it's me although it does seem that a surprising amount of well-written C code can be ported relatively easily.

Given that I had a base portability document and an acute interest in the C phenomenon generally, and in the C Standard and portability in particular, I embarked on a formal and detailed look at C and portability. That I also make a substantial living from consulting in C and teaching introductory and advanced seminars about it added more weight to my decision to develop a serious manuscript for a 3-day portability seminar. Along the way, I decided the end result was worthy of becoming a book.

At first, I expected to produce about 200 book pages. Then it became 300 and 400, and I finally settled on 425, but only after I decided to cut quite a few appendices, purely for space reasons. As the amount and utility of the material left “on the editing room floor” is substantial, I am looking at ways to distribute that as well, perhaps through a future revision, or a companion volume. In any case, this book does not contain all my findings.

This book attempts to document C-specific issues you may encounter when porting existing code, or when writing code that is to be ported to numerous target environments. I use the term “attempts” because I don't believe this book provides all the answers, and in many cases, it does not even pretend to do so. For example, if you are porting from one flavor of UNIX to another, this book does not discuss any of the dark corners of that operating system. Nonetheless, I do believe it to be a credible beginning on which future works can be based. It is, as far as I can tell, the first widely published work of more than 20–30 pages that specifically addresses portability as it pertains to C. Because I do not claim to be well versed in more than 3–4 operating system and hardware environments, no doubt I have overlooked some relevant issues. Alternatively, I may have overindulged in various esoteric aspects that may occur only in theory.

Whatever your interest in portability is, I hope this book provides some food for thought, even if only to help convince you that portability is not for you. If that is all this book achieves, it will have been wildly successful. If, on the other hand, it helps you define a porting strategy, or saves you going down a few wrong roads, then, too, I am happy. Whatever your opinion of this text, let me know since only by getting constructive criticism, outside input, and more personal experience can I improve it in future revisions or in a companion volume.

Anyone who has ever written a lengthy document that is to be read by more than a few people knows that after the first two or three reads, you no longer actually read what is written. You simply read what should be there. Therefore, you need technically competent reviewers who can provide constructive criticism. In this regard, the following people made significant contributions by proofing all or major parts of the manuscript: Steve Bartels, Don Bixler, Don Courtney, Dennis Deloria, John Hausman, Bryan Higgs, Gary Jeter, Tom MacDonald, and Sue Meloy. While I implemented many of their suggestions, space and time constraints prohibited me from capitalizing fully on their organizational and other suggestions. But as software vendors say, “We have to leave something to put in the next release.”

Others who have had more than a passing influence on my relatively short, but intense, C career are: P.J. Plauger, Standard C Secretary, ISO C Convener, and President of Whitesmiths Ltd, an international vendor of C and Pascal development tools; Tom Plum, Standard C Vice-Chair, Chairman of Plum Hall, and leading C author; Larry Rasler, formerly the Editor of the Draft Standard C Document, and AT&T's principal member on the Standard C Committee (now of Hewlett-Packard); and Jim Brodie, an independent consultant (formerly of Motorola) who convened the C Standards Committee in mid-1983, and has so ably chaired it to its (hopefully) successful completion in late 1988 or thereabouts. Also, to my colleagues on the Standard C X3J11 Standards Committee, I say thanks for the opportunity to work with you all. Without your papers, presentations, and sometimes volatile (pun intended) discussions both in and out of committee, the quality and quantity of material in this book would have been significantly reduced, perhaps to the point of not being sufficient for publication.

Rex Jaeschke

Edition 2, 2022

Fast-forward 32 years, and a lot has happened in the world of C. In particular,

  • C95, C99, C11, and C17 have been produced.

  • C++ has been standardized, and that standard has been revised several times.

  • 16-bit systems are rare, and even 32-bit systems are less common. The mainstream world has moved to 64 bits.

  • C compilers that only support C prior to C89 are unlikely to be common, although code that initially went through them might still be in use.

This revision was the result of my estate planning during which I asked myself the question, “If I take no action, what will happen to my intellectual property when I die?” Presumably, it would be lost! As such, I looked around for a public venue in which to place it, where it could be read, and (hopefully responsibly) kept current.

Once I decided a revision was in order, I got quite ruthless. (I’m a great believer in Strunk and White’s advice, “Less is more!”) I removed all the material that was not related directly to portability. As a result, a great deal of the library-chapter content was cut. Back in 1988, the first C Standard was just about to debut, and there was little definitive text available about the library. As such, I included that in the first edition. However, that is no longer necessary. Also, one can purchase searchable electronic copies of the C (and C++) and related standards.

I made two important decisions regarding potential port targets:

  • To acknowledge that it is okay to want to port code, even if it is not, and never will be, Standard C-compliant!
  • To mention C++: C++ is widely used, and many programmers call C functions from C++, or put C code through a C++ compiler.

Of course, this edition will become outdated; as I write this, the C Standard committee is finalizing C23!

The first edition contained an annex that consisted primarily of lists of reserved identifiers in various orders. I chose to not include this annex for several reasons: A very large number of names has been added by the various standard revisions since C89, so it would have required a lot of effort to update the lists, and with C23 on the horizon, more work would be needed to revise that list yet again. In any event, the reviewers couldn’t agree on what form those lists should have to be easy to read while remaining useful.

Finally, thanks much to the reviewers of this edition: Rajan Bhakta, Jim Brodie, Doug Gwyn, David Keaton, Tom MacDonald, Robert Seacord, Fred Tydeman, and Willem Wakker.

Rex Jaeschke

Future Revision of This Document

There will be reasons to want to update this document, for example, to do the following:

  • Fix typographic or factual errors

  • Expand on a topic

  • Add details of specific porting scenarios and target hardware and operating systems

  • Add incompatibilities between Standard C and Standard C++

  • Cover future editions of the C and C++ standards

  • Expand on issues relating to the headers added by C99 and later editions, especially those relating to floating-point

  • Add issues regarding the optional IEC 60559 (IEEE 754) floating-point and complex support

  • Add issues regarding the optional extended library

  • Add instances of unspecified, undefined, implementation-defined, and locale-specific behaviors not already mentioned

  • Flesh out the “Intended Audience” section

  • Consider making available downloadable lists of reserved identifiers, possibly organized by header and Standard edition

Regarding specific library functions, entries exist only for those having commentary relating to portability. If such commentary is to be added for a function that is not listed, an entry for it will have to be created first.

If you are adding to this book, please be precise and use the correct terminology, as defined by the C Standard. Only say things once, in the appropriate place, and then point to the definitive statement from other places, as necessary.

Intended Audience

Reviewer Willem Wakker wrote: I have flipped through the document and I think it is (or might be) a very useful document, although I am not too sure of the intended audience. Your introduction does not mention an intended audience, and an experienced C programmer probably thinks that she/he does not need this information ('I already know the nitty-gritty details because I am an experienced programmer').

Portability, like security, needs to be taken into account right from the start of a project, and at that early stage there is a need for more overall considerations regarding portability than all the (though useful!) details and pitfalls described in your book. This probably means that the book needs to be on the radar of the more managerial type of people in a project who then can 'force' the programmers to take the good advice into account. And for those managers the book looks far too much like a technical guide, not something that they have to be concerned with. So, maybe, some introductory paragraphs about the concept of, and the need for portability written for the non-technical manager right at the start of the book might be a useful addition.

My response: For the time being, I’m adding this section as a placeholder. However, rather than try to write the content myself, I’ve decided to leave it to readers to flesh it out as they see fit once it gets published.

Reader Assumptions and Advice

This book does not attempt to teach introductory, or even advanced, C constructs. Nor is it a tutorial on Standard C. At times, some paragraphs might seem as terse as C itself. While I have attempted to soften such passages, I make no apologies for those that remain. Portability is not something a first time or trainee C programmer embarks on—quite the opposite.

The text is aimed specifically at the language-related aspects of porting C source code. However, it does not provide a recipe for successfully porting a system in any given set of target environments—it merely details many of the problems and situations you may encounter or may need to investigate. This book presumes that you are familiar with the basic constructs of the C language, such as all the operators, statements, and preprocessor directives, and that you are fluent with data and function pointers, and interfacing with the standard run-time library.

Because the C Standard, its accompanying Rationale document, and this text have the same basic organization, having a copy of each is advantageous, although not completely necessary, because the Standard can sometimes be challenging to read. However, the Rationale is much more leisurely paced and readable by the non-linguist. Note though that, having participated in the deliberations of the Standard Committee for 15 years (1984–1999), my vocabulary reflects that of the C Standard. Therefore, a copy of that document will prove especially useful.

Throughout the book, uses of “K&R” refer to the first edition (1978) of Kernighan and Ritchie's book, The C Programming Language.

References to Standard C include all editions, and are used for core facilities that have been present since the first standard, C89. For a facility added in a specific edition, that edition number is used. C90 is not so used, as that is just an ISO repackaging of the ANSI Standard C89.

The history of the standardization of C is as follows:

  • C89 – The first C standard, ANSI X3.159-1989, was produced in 1989 by the U.S. committee X3J11.
  • C90 – The first ISO C standard, ISO/IEC 9899:1990, was produced in 1990 by committee ISO/IEC JTC 1/SC 22/WG 14 in conjunction with the US committee X3J11. C90 was technically equivalent to C89.
  • C95 – An amendment to C90 was produced in 1995 by committee WG 14 in conjunction with the U.S. committee X3J11. The term C95 means “C90 plus that amendment.”
  • C99 – The second edition of the ISO C standard, ISO/IEC 9899:1999, was produced by committee WG14 in conjunction with the U.S. committee INCITS/J11 (formerly X3J11).
  • C11 – The third edition of the ISO C standard, ISO/IEC 9899:2011, was produced by committee WG14 in conjunction with the U.S. committee INCITS/PL22.11 (formerly INCITS/J11).
  • C17 – The fourth edition of the ISO C standard, published the following year as ISO/IEC 9899:2018, was produced by committee WG14 in conjunction with the U.S. committee INCITS/PL22.11. This was a maintenance release that included corrections to the standard based on Defect Reports. No new functionality was added.
  • C23 – Planned release of the fifth edition of the ISO C standard.

Some paragraphs are tagged “C++ Consideration.” C++ is widely used, and many programmers call C functions from C++, or put C code through a C++ compiler. However, C++ is not a superset of C, so it is worth understanding the incompatibilities. A common saying in the C++ standards community is “As close as possible to Standard C, but no closer!”

Numerous references to acronyms, abbreviations, and specialized terms are made throughout the book. Most are in common use in the C community today; however, a few that relate directly to portability are shown here (with their definitions taken verbatim from C17):

  • Unspecified behavior – Behavior that results from the use of an unspecified value, or other behavior upon which this document provides two or more possibilities and imposes no further requirements on which is chosen in any instance.

  • Undefined behavior – Behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements.

  • Implementation-defined behavior – Unspecified behavior where each implementation documents how the choice is made.

  • Locale-specific behavior – Behavior that depends on local conventions of nationality, culture, and language that each implementation documents.

The C Standard contains a more complete list of definitions and, in particular, discusses the criteria for conformance of programs and implementations.

Although this book contains many instances of the four behaviors defined above, it does not contain all of them. The complete list is contained in the “Portability issues” annex of the C Standard.

While a conforming implementation is required to document implementation-defined behavior, the term “implementation-dependent” is used in this book to refer to some characteristic of an implementation that is not required by Standard C to be documented.

C89 stated, “Certain features are obsolescent, which means that they may be considered for withdrawal in future revisions of the Standard. They are retained in the Standard because of their widespread use, but their use in new implementations (for implementation features) or new programs (for language or library features) is discouraged.” Some editions of Standard C declare certain features obsolescent by deprecating them. According to Wiktionary, to deprecate means “To declare something obsolescent; to recommend against a function, technique, command, etc. that still works but has been replaced.”

From very early in its life, the committee that standardizes C has had a charter, which it has followed (and revised over time). Several items from the original charter are worth mentioning here:

Item 2. C code can be portable. Although the C language was originally born with the UNIX operating system on the DEC PDP-11, it has since been implemented on a wide variety of computers and operating systems. It has also seen considerable use in cross-compilation of code for embedded systems to be executed in a free-standing environment. The Committee has attempted to specify the language and the library to be as widely implementable as possible, while recognizing that a system must meet certain minimum criteria to be considered a viable host or target for the language.

Item 3. C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler;” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program.

Introduction

Defining Portability

According to The Prentice-Hall Standard Glossary of Computer Terminology by Robert A. Edmunds, portability is defined as follows: “Portability: A term related to compatibility. Portability determines the degree to which a program or other software can be moved from one computer system to another.” The key phrase here is “the degree to which a program can be moved.”

From Wikipedia, “In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally designed for (e.g., different CPU, operating system, or third party library). The term is also used when software/hardware is changed to make them usable in different environments. Software is portable when the cost of porting it to a new platform is significantly less than the cost of writing it from scratch. The lower the cost of porting software relative to its implementation cost, the more portable it is said to be.”

We can talk about portability from two points of view: generic and specific. Generically, portability simply means running a program in one or more environments that are somehow different from the one(s) for which it was designed. Because the cost of producing and maintaining software far outweighs that for producing hardware, we have a huge incentive to increase the shelf life of our software beyond the current incarnations of hardware. It simply makes economic sense to do so.

Specific portability involves identifying the individual target environments in which a given program must execute, and clearly stating how those environments differ. Examples of porting scenarios include:

  • Moving from one operating system to another on the same machine.

  • Moving from a version of an operating system on one machine to the same operating system version on another machine with a different architecture.

  • Moving between variants of the same operating system (such as various flavors of UNIX and Linux) on different machines.

  • Moving between two entirely different hardware and operating system environments.

  • Moving between systems using different floating-point hardware or emulation software.

  • Moving between different compilers on the same system.

  • Moving from a Standard C-compliant implementation to one that is not, and vice versa.

  • Recompiling code on the same compiler, but using different compiler options.

  • Moving from one version of a compiler to another version of the same compiler on the same system.

The last two scenarios might not be obvious. However, it is possible to encounter problems when taking existing code that compiles without error, runs, and does the job, and running it through a new version of the same compiler or simply with different compile-time options. One reason for potentially unexpected behavior is a change in implementation-defined behavior (such as the signedness of a plain char). Another might be previous reliance on unspecified or undefined behavior that just happened to do what the programmer expected (such as the order of evaluation of some expressions).
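The plain-char case can be made concrete with a short sketch. The two function names below are hypothetical and do not come from any library; the only standard facilities used are CHAR_MIN from <limits.h> and a conversion to unsigned char. The idea is that code which routes character values through unsigned char should give the same result whichever signedness a compiler, or a compiler option, selects.

```c
#include <limits.h>

/* Hypothetical helper: reports whether plain char is signed on this
   implementation. CHAR_MIN is 0 when plain char is unsigned and
   negative when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: yields the value of a character as an int in
   the range 0..UCHAR_MAX, regardless of the signedness of plain char. */
int byte_value(char c)
{
    return (int)(unsigned char)c;
}
```

A value obtained this way can safely be used as an array index or passed to the <ctype.h> functions, which require an argument representable as unsigned char (or EOF).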

Note that it is okay to port code between systems that are not Standard C-compliant! For example, early Digital Signal Processing (DSP) chips supported only 32-bit floating-point data and operations, in which case, the types float, double, and long double (if the latter two are even supported by the compiler), are mapped to 32 bits. In such cases, meaningful applications can still be ported among members of a DSP-chip family.

Porting is not simply getting a piece of software to work on multiple targets. It also involves doing so with a reasonable (and affordable) amount of resources, in a timely manner, and in such a way that the resulting code will perform adequately. There is little point in porting a system to a target such that when the port is complete, it runs so slowly or uses so many system resources that it is rendered unusable.

Important questions to ask yourself are:

  • Am I porting to or from a Standard C implementation? If so, which standard versions are supported?

  • Am I porting code that was designed and written with portability in mind?

  • Do I know what all the environments will be up front and how many of them I will actually have available for testing?

  • What are my performance requirements regarding speed, memory, and disk efficiency?

Another important porting scenario is compiling C code with a C++ compiler. Even if such ported code does not take advantage of C++’s features, certain extra checking will be done. For example, C++ requires C’s prototype style of function declaration and definition. And, over time, new code that does use C++’s features can be added, or the C code could be called by existing C++ functions. Note that there is not just one C++ standard; so far, we’ve had C++98, C++03, C++11, C++14, C++17, and C++20.
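As a minimal sketch of C source arranged so that it can also pass through a C++ compiler, or be called from C++, consider the following. The function name clamp_to_range is hypothetical. The __cplusplus macro and the extern "C" linkage specification are standard C++ facilities; a C compiler skips the extern "C" lines because __cplusplus is not defined in C.

```c
/* Declarations that would normally live in a shared header. */
#ifdef __cplusplus
extern "C" {   /* seen only by a C++ compiler: request C linkage */
#endif

/* Prototype-style declaration, acceptable to both C and C++. */
int clamp_to_range(int value, int low, int high);

#ifdef __cplusplus
}
#endif

/* Prototype-style definition. An old-style (K&R) definition, with the
   parameter types declared after the parameter list, would be rejected
   by a C++ compiler. */
int clamp_to_range(int value, int low, int high)
{
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}
```

Compiled as C, this is an ordinary function. Included from a C++ translation unit, the same declaration lets C++ code call the C-compiled function by its unmangled name.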

Portability is Not New

Along with the wide availability of good and cheap C compilers and development tools in the early 1980s, the idea of software portability became popular. So much so, that, to hear some people talk, portability became possible because of C.

However, the notion of portability is much older than C, and software was being successfully ported long before C became an idea in Dennis Ritchie's head. In 1959, a small group defined a standard business language called COBOL, and in 1960, two vendors (Remington Rand and RCA) implemented compilers for that language. In December of that year, they conducted an experiment where they exchanged COBOL programs, and according to Jean E. Sammet, a member of the COBOL design team, “… with only a minimum number of modifications primarily due to differences in implementation, the programs were run on both machines.” Regarding COBOL's development of a description for data that is logically machine independent, Sammet wrote in 1969, “[COBOL] does not simultaneously preserve efficiency and compatibility across machines.”

Fortran was also an early player in the portability arena. According to Wikipedia, “… the increasing popularity of FORTRAN spurred competing computer manufacturers to provide FORTRAN compilers for their machines, so that by 1963 over 40 FORTRAN compilers existed. For these reasons, FORTRAN is considered to be the first widely used cross-platform programming language.”

The fact that a program was written in C provides no indication whatsoever as to the effort required to port it. The task may be trivial, difficult, impossible, or uneconomical. If a program was written without regard to the possibility of its being ported to some different environment, the ease with which it may actually be ported to that environment probably depends as much on the discipline and idiosyncrasies of its author as on the language itself.

Designing a program to be portable over a range of environments, some of which may not yet be defined, may be difficult, but it is not impossible. It just requires considerable discipline and planning. It requires understanding and controlling (and eliminating where reasonable) the use of features which may give unacceptably different results in your expected different environments. This understanding helps you avoid knowingly (or more likely, unknowingly) counting on non-portable features or characteristics of the program you are writing. In addition, frequently one of the key aims in such a project is not to write a program that will run on any system without modification, but to isolate environment-specific functions, so they may be rewritten for new systems. The major portability considerations are much the same for any language. Only the specific implementation details are determined by the language used.

The Economics of Portability

Two main requirements for being successful at porting are: having the necessary technical expertise and tools for the job, and having management support as well as approval. That said, it must be acknowledged that many projects are implemented by individuals or a small group with no management, yet portability is still desired.

Clearly, one needs to have, or be able to get and keep, good C programmers. The term “good” does not imply guru status alone, or at all, because such staff can often have egos that are difficult to manage. And perhaps the most important attribute required in a successful porting project is discipline, both at the individual and at the group levels.

The issue of management support is often more important, yet it is largely ignored both by the developers and by management itself. Consider the following scenario. Adequate hardware and software are provided for all (or a representative subset) of the specified target environments and the development group religiously runs all its code through all targets at least weekly. Often, it submits a test stream in a batch job every evening.

Six months into the project, management reviews progress and finds that the project is taking more resources than anticipated (doesn't it always?) and decides to narrow the set of targets, at least on a temporary basis. That is, “We have to have something tangible to demonstrate at tradeshows because we have already announced the product” or “The venture capitalists are expecting to see a prototype at the next board meeting.” Whatever the reasons, testing on, and development specifically for, certain targets is suspended, often permanently.

From that point on, the development group must ignore the idiosyncrasies of the dropped machines because they are no longer part of the project, and the company cannot afford the extra resources to consider them seriously. Of course, management's suggestion is typically, “While we don't want you to go out of your way to support the dropped environments, it would be nice if you don't do anything to make it impossible or inefficient for us to pick them up again at some later date.”

Of course, as the project slips even further, competitors announce and/or ship alternative products, or the company falls on hard economic times, other targets may also be dropped, possibly leaving only one because that is all that development and marketing can support. And each time it drops a target, the development group starts to cut corners because it no longer has to worry about those other hardware and/or operating system targets. Ultimately, this decreases the chances of ever starting up on dropped targets at some later date, as all code designed and written since support for those targets was dropped needs to be inspected (assuming, of course, that this code can even be identified) to ascertain the effort required for, and the impact of, resuming support for that target. You may well find that certain design decisions that were made either prohibit or negatively impact reactivating the abandoned project(s).

The end result often is that the product is initially delivered for one target only and is never made available in any other environment. Another situation is to deliver for one target, and then go back and salvage “as much as possible” for one or more other targets. In such cases, the task may be no different from one in which you are porting code that was never designed with portability in mind.

Measuring Portability

How do you know when or if a system has been successfully ported? Is it when the code compiles and links without error? Do results have to be identical? If not, what is close enough? What test cases are sufficient to demonstrate success? In all but the most trivial of cases, you will not be able to test every possible input and situation exhaustively.

Certainly, the code must compile and link without error, but because of implementation-defined behavior, it may be quite possible to get different results from different targets. The legitimate results may even be sufficiently different as to render them useless. For example, floating-point range and precision may vary considerably from one target to the next such that results produced by the most limited floating-point environment are not precise enough. Of course, this is a design question and should be considered well before the system is ported.
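One way to surface such a limitation early is to have the program compare the floating-point characteristics of each target against a stated requirement. The sketch below assumes a hypothetical requirement of 10 significant decimal digits; the macro REQUIRED_DECIMAL_DIGITS and both function names are invented for illustration. DBL_DIG comes from the standard header <float.h>.

```c
#include <float.h>

/* Hypothetical requirement for this sketch: the application needs at
   least this many significant decimal digits from type double. */
#define REQUIRED_DECIMAL_DIGITS 10

/* Returns nonzero if the available precision meets the requirement. */
int has_enough_digits(int available_digits, int required_digits)
{
    return available_digits >= required_digits;
}

/* Checks type double on the current target against the requirement. */
int double_meets_requirement(void)
{
    return has_enough_digits(DBL_DIG, REQUIRED_DECIMAL_DIGITS);
}
```

Standard C requires DBL_DIG to be at least 10, so the check always succeeds on a conforming implementation. On a target that maps double to a 32-bit format, DBL_DIG would be considerably smaller (about 6 for an IEEE-style single format), and the check would flag the problem before any results are trusted.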

A general misconception is that exactly the same source code files must be used on all targets such that the files are full of conditionally compiled lines. This need not be the case at all. Certainly, you might require custom headers for some targets. You might also require system-specific code written in C, and possibly in assembler or other languages. Provided such code is isolated in separate modules and the contents of and interfaces to such modules are well documented, this approach need not be a problem.
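The isolation approach described above might be sketched as follows. All names here (sysdep_path_separator, build_path) are hypothetical, and the three parts are shown in one listing only for brevity; in practice the interface would sit in a shared header and each target would supply its own implementation module. The sketch uses snprintf, which requires C99 or later.

```c
#include <stdio.h>

/* Interface: declared once, in a header shared by all targets. */
char sysdep_path_separator(void);

/* Portable code: depends only on the interface, so it needs no
   conditional compilation. Returns nonzero if the whole path fit. */
int build_path(char *out, size_t out_size, const char *dir, const char *file)
{
    int n = snprintf(out, out_size, "%s%c%s",
                     dir, sysdep_path_separator(), file);
    return n >= 0 && (size_t)n < out_size;
}

/* Target-specific module: this version assumes a target whose
   directory separator is '/'. Another target would provide its own
   definition in a separate source file. */
char sysdep_path_separator(void)
{
    return '/';
}
```

Porting to a new target then means writing and documenting one new module, while the portable code stays untouched.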

If you are using the same data files across multiple targets, you will need to ensure that the data is ported correctly, particularly if it is stored in binary rather than text format, and endian differences are involved. If you do not, you may waste considerable resources looking for non-existent code bugs.
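A common way to sidestep endian differences in binary files is to define the byte order of the file format and to assemble values one byte at a time, rather than writing the in-memory representation directly. The following is a minimal sketch, assuming a file format that stores 32-bit unsigned values most significant byte first; the function names are hypothetical. It uses unsigned long, which Standard C guarantees to be at least 32 bits wide.

```c
/* Stores the low-order 32 bits of value into buf[0..3],
   most significant byte first. */
void store_u32_be(unsigned char *buf, unsigned long value)
{
    buf[0] = (unsigned char)((value >> 24) & 0xFFUL);
    buf[1] = (unsigned char)((value >> 16) & 0xFFUL);
    buf[2] = (unsigned char)((value >> 8) & 0xFFUL);
    buf[3] = (unsigned char)(value & 0xFFUL);
}

/* Rebuilds a 32-bit value from buf[0..3], most significant byte first. */
unsigned long load_u32_be(const unsigned char *buf)
{
    return ((unsigned long)(buf[0] & 0xFFu) << 24)
         | ((unsigned long)(buf[1] & 0xFFu) << 16)
         | ((unsigned long)(buf[2] & 0xFFu) << 8)
         |  (unsigned long)(buf[3] & 0xFFu);
}
```

Because the shifts operate on values rather than on memory layout, the four bytes written to the file are the same on big-endian and little-endian hosts.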

Unless you have adequately defined what your specific portability scenario and requirements are, you cannot tell when you have met them. And by definition, if you meet them, you must be satisfied. If you are not, either your requirements have changed, or your design was flawed. And most importantly, successfully porting a program to some environments is not a reliable indication of the work involved in porting it to yet another target.

Environmental Issues

As pointed out in other sections, some portability issues have little or nothing to do with the implementation language. Rather, such issues are relevant to the hardware and operating system environments on which the program must execute. Some of these issues are hinted at in the main body of this book; they are summarized here, as follows:

  • Mixed-language environments. Certain requirements may be placed on C code that is to call, or be called by, some other language processor.

  • Command-line processing. Not only do different command-line processors vary widely in their behavior, but the equivalent of a command-line processor may not even exist for some of your targets.

  • Data representation. This is, of course, completely implementation-defined and may vary widely. Not only can the size of an int differ across your targets, but you are not even guaranteed that all bits allocated to an object are used to represent the value of that object. Another significant problem is the ordering of bytes within words and words within long words. Such encoding schemes are referred to as big-endian or little-endian.

  • CPU speed. It is a common practice to assume that executing an empty loop n times in a given environment causes a pause of 5 seconds, for example. However, running the same program on a faster or slower machine will invalidate this approach. (The same is true when running it on versions of the same processor having different clock speeds.) Or perhaps the timing is slightly different when more (or fewer) programs are running on the same system. Related issues include the frequency and efficiency of handling timer interrupts, both hardware and software.

  • Operating system. If one is even present (free-standing C does not require an operating system), the principal issues are single- versus multi-tasking and fixed- versus virtual-memory organization. Other issues involve the ability to field synchronous and asynchronous interrupts, the existence of reentrant code, and shared memory. Seemingly simple tasks such as getting the system date and time may be impossible on some systems. Certainly, the granularity of system time measurement varies widely.

  • File systems. Whether multiple versions of the same file can coexist, or whether the date and time of creation or last modification are stored, is implementation-dependent. Likewise for the character set permitted in file names, length of the names, and whether or not such names are case-sensitive. And as for device and directory-naming conventions, the variations are as broad as their inventors' imaginations. Consequently, the C Standard says nothing about file systems except for sequential files being accessed by a single user.

  • Development support tools. These tools may have a significant effect on the way you write, or are required to write, code for a given system. They include the C translator, linker, object and source librarian, assembler, source-code management system, macro preprocessors, and utility libraries. Examples of restrictions include the casing, significance, and number of external identifiers, perhaps even the size of each object module or the number and size of source modules. Perhaps the overlay linker has significant restrictions on the complexity of the overlay scheme.

  • Cross-compilation. In environments where the target is not the system on which the software is being developed, differences in character sets, arithmetic representations, and endianness become important.

  • Screen and keyboard devices. The protocols used by these vary widely. While many implement some or all of various ANSI standards, just as many do not, or contain incompatible extensions. Getting a character from the standard input without echoing it, or without needing to press the return or enter key as well, might not be universally possible. The same is true for direct cursor addressing, graphics display, and input devices such as light pens, track balls, and mice.

  • Other peripheral interfaces. Your design may call for interactions with printers, plotters, scanners, and modems, among other pieces of equipment. While some de facto standards may exist for each, you may be forced, for one reason or another, to adopt “slightly” incompatible pieces.
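As an alternative to the empty-loop delay described under CPU speed above, a pause can be tied to the library's notion of time rather than to instruction timing. The following is only a sketch: the name pause_seconds is hypothetical, it still busy-waits, and its resolution is whatever granularity time provides on the target (often one second).

```c
#include <time.h>

/* Hypothetical helper: wait until at least `seconds` of calendar time
   have passed, as measured by time(). Returns the elapsed time reported
   by difftime(), or -1.0 if the calendar time is unavailable. */
double pause_seconds(double seconds)
{
    time_t start = time(NULL);
    double elapsed = 0.0;

    if (start == (time_t)-1)
        return -1.0;

    while (elapsed < seconds) {
        time_t now = time(NULL);
        if (now == (time_t)-1)
            return -1.0;
        elapsed = difftime(now, start);
    }
    return elapsed;
}
```

The length of the pause no longer depends on processor speed or system load, although it may overshoot by up to one clock tick.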

Programmer Portability

In all the discussions on portability, we continually refer to the aspect of moving code from one environment to another. And while this is an important consideration, C programmers are likely to move to a different environment more often than the software they write. For this reason, the author has coined the term programmer portability.

Programmer portability can be defined as the ease with which a C programmer can move from one environment to another. This is an issue important to any C project, not just one that involves code portability. If you adopt certain programming strategies and styles, you can make it much easier and quicker to integrate new team members into the project. Note though that, while you may have formulated a powerful approach, if it is too far from mainstream C practice, it will be difficult or expensive to teach, and it will be hard to convince other C programmers of its merits.

The Environment

When a C program is written, consider two primary environments: that in which it is compiled (that is, translated) and that in which it is executed. For the vast majority of C programs, these two environments are likely to be one and the same. However, C is used in an increasing number of situations where the execution environment has properties different from those of the translation environment.

Conceptual Models

Translation Environment

Translation Phases

Prior to C89, C compilers varied in the way in which they recognized and processed tokens. To nail down the order in which source tokens should be processed, Standard C explicitly identifies a set of rules collectively known as the phases of translation. These rules break programs that previously relied on a different order of translation.

Recommendation: Read and understand Standard C’s phases of translation, so you can see if your implementation follows them.

Standard C does not require the preprocessor to be a separate, stand-alone program, although it permits it. For the most part, the preprocessor is permitted to work without knowing the specific properties of the target implementation. (One exception is that Standard C requires preprocessing arithmetic expressions to be computed using a given type; see #if Arithmetic.)

Diagnostics

Standard C defines the circumstances in which a conforming implementation is required to issue a diagnostic. The form of the diagnostic is implementation-defined. The Standard makes no statement about information or warning messages such as “Variable x used before being initialized” and “Unreachable code.” These are considered to be quality of implementation issues best left for the marketplace to decide.

Standard C allows extensions provided they do not render a strictly conforming program invalid. A conforming compiler must be able to disable (or diagnose) extensions. Extensions to a conforming compiler are limited to assigning meaning to syntax not given semantics by Standard C, or defining the meaning of otherwise undefined or unspecified behaviors.

Execution Environments

Standard C defines two kinds of execution environments: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment.

The manner and timing of static initialization is unspecified. However, all objects in static storage must be initialized before program startup. For hosted environments, the designated C function is typically called main, although it need not be. With Standard C, function main is called at program startup. A program is not strictly conforming if an entry point other than main is used.

Recommendation: For hosted applications, always use main as the program's entry point unless you have a particularly good reason for not doing so, and you make sure you adequately document it.

Program termination is the return of control to the execution environment.

Freestanding Environment

A freestanding environment runs without the benefit of an operating system, and, as a result, program execution can begin in any manner desired. Although such application environments are somewhat nonportable by definition, much of their code can often be ported (to an upwards-compatible series of device controllers, for example) if designed properly. Even writers of embedded systems need to port to new and different environments.

The name and type of the function called at program startup is implementation-defined as is the method of program termination.

The library facilities (if any) available to a freestanding program are implementation-defined. However, Standard C requires the headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>.

Hosted Environment

Standard C permits main to have either no parameters or two, as follows:

int main(void) { /* ... */ }
int main(int argc, char *argv[]) { /* ... */ }

(Of course, argv could instead be declared char **, and the parameter names argc and argv are arbitrary.)

A common extension is that the function main receives a third argument, char *envp[], in which envp leads to a null pointer-terminated array of pointers to char, each of which points to a string that provides certain information about the environment for this execution of the process. Any program that defines more than two arguments is not strictly conforming. That is, it is not maximally portable.

Recommendation: Use the library function getenv instead of the envp parameter in main to access environment variables. Note, however, that the format of the string returned by getenv, and the set of environment variables, is implementation-defined.
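A minimal sketch of the getenv approach follows. The helper name env_or_default is hypothetical; which variables exist, and what their strings look like, remains implementation-defined.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of an environment variable, or
   a caller-supplied fallback when the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return (value != NULL) ? value : fallback;
}
```

A program written this way needs no third parameter to main, so its definition of main stays within the two forms Standard C specifies.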

Some user manuals and books erroneously suggest main be defined as having type void (or some other type) instead of int because many programs rarely, if ever, explicitly return from main (with or without a return value).

Recommendation: Always define function main as having int type and return an appropriate exit code.

Standard C requires that argc be nonnegative. Traditionally, argc is at least one even if argv[0] is set to point to an empty string.

Recommendation: Do not assume that argc is always greater than zero; Standard C permits it to be zero.

Standard C requires that argv[argc] contain the null pointer. This means that the argv array contains argc + 1 elements, not argc. This allows the argv pointer array to be processed without regard to the value of argc.
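The null pointer in argv[argc] can be used as a sentinel, as in the following sketch. The function name count_arguments is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: count the entries in an argv-style array by
   walking to the terminating null pointer, without consulting argc. */
int count_arguments(char *argv[])
{
    int n = 0;
    while (argv[n] != NULL)
        ++n;
    return n;
}
```

When passed the argv received by main, the result should equal argc, including the case in which argc is zero.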

Standard C makes no comment about the handling of quoted literals on command lines. Therefore, the ability to handle quoted strings at all, or those containing embedded white space, is implementation-dependent. If the host environment cannot handle command-line arguments containing both alphabetic cases, it must supply text arguments in lowercase.

Recommendation: Make no assumptions about special handling of quoted literals in command-line processing. Such quotes may delimit strings, or they may be considered part of the string, in which case "abc def" would result in the two arguments "abc and def" (each keeping one of the quote characters). The casing of letters might not be preserved, even in the presence of quotes. (Use tolower (or toupper) with command-line arguments before comparing them against a list of valid strings.) Even if quotes are recognized, the method of escaping a quote (so it can be passed in an argument) may vary. Standard C doesn't even require that a command-line environment exist.
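The case-insensitive comparison suggested above might be sketched as follows. The name equal_ignore_case is hypothetical; Standard C itself provides no case-insensitive string comparison function.

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if the two strings match when letter
   case is ignored, 0 otherwise. The casts to unsigned char keep the
   arguments to tolower within the range the library requires. */
int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == '\0' && *b == '\0';
}
```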

A primary use of command-line arguments is to specify switches that determine the kind of processing to be done by the program being invoked. In a text-processing utility, for example, you may wish to use multi-word switches. In this case, connect the words using an underscore as follows:

textpro /left_margin=10 /page_length=55

and ignore case during switch processing. With care, you can design a command-line argument syntax that is extremely portable. Take care though that you don't need a larger command-line buffer than a system can support. If a program can potentially have many and/or long arguments, you should put them in a configuration file and pass its name as a command-line argument. For example,

textpro /command_file=commands.txt

allows an unlimited number of arguments to be processed regardless of command-line buffer size.
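Switches of the /name=value form shown above can be parsed with nothing beyond the standard library. The following sketch handles numeric values only (so it would reject the /command_file example); parse_switch and its parameters are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for switches of the form /name=value.
   Copies the lowercased name into `name` (at most name_size - 1
   characters) and stores the decimal value in *value.
   Returns 1 on success, 0 if the argument is malformed. */
int parse_switch(const char *arg, char *name, size_t name_size, long *value)
{
    const char *eq;
    char *end;
    size_t len, i;

    if (arg[0] != '/')
        return 0;
    eq = strchr(arg, '=');
    if (eq == NULL)
        return 0;
    len = (size_t)(eq - (arg + 1));
    if (len == 0 || len >= name_size)
        return 0;
    for (i = 0; i < len; ++i)
        name[i] = (char)tolower((unsigned char)arg[1 + i]);
    name[len] = '\0';

    *value = strtol(eq + 1, &end, 10);
    return end != eq + 1 && *end == '\0';
}
```

Lowercasing the name as it is copied means the rest of the program can compare switch names with strcmp, whatever casing the host environment delivered.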

According to Standard C, argv[0] represents the “program name” (whatever that may translate to for a given implementation). If this is not available, argv[0] must point to an empty string. (Some systems that cannot determine the program's name have argv[0] point to a string such as "c" or "C".)

Recommendation: Don't assume the program's name is available. Even if argv[0] points to the program's name, the name may be as it was specified on the command-line (possibly with case conversion) or it may be the translated name the operating system used to actually locate and load the program. (Full name translation is often useful if you wish to parse the string pointed to by argv[0] to determine certain disk and directory information.)

Standard C requires that the parameters argc and argv, and the strings pointed to by argv, be modifiable by the user program and that these may not be changed by the implementation while the user program is executing.

Numerous environments support the command-line operators <, >, and >>. In such systems, these characters (and the filenames that accompany them) are handled by the command-line processor (and removed) before it passes off the remaining command-line to the execution environment. Systems that do not handle these operators in such a manner pass them through to the execution environment as part of the command-line where they can be handled or passed through to the application program. Such operators are outside the scope of Standard C.

The above-mentioned operators typically allow redirection of stdin and stdout. Some systems allow stderr to be redirected. Some systems consider stderr to be the same as stdout.

Recommendation: Don't assume universal support for the command-line redirection operators <, >, and >>. Redirection of the standard files may be possible from within a program via the freopen library function.

Recommendation: Write error messages to stderr rather than stdout even if both file pointers are treated as the same. This way, you can take advantage of systems that do allow stdout and stderr to be independently redirected.

The method used to invoke main during program startup can vary. Standard C requires that it be done as if the following code were used:

exit(main(argc, argv));

in which case, any value returned from main will be passed on as the program's exit code.

Starting with C99, dropping through the closing brace of main results in an exit code of zero.

Some implementations may restrict exit codes to unsigned integral values or to those values that fit into a byte. Refer to the library function exit for more details. Also, although some systems interpret an exit code of 0 as success, others may not. Standard C requires that 0 mean “success.” It also provides the implementation-defined macros EXIT_SUCCESS and EXIT_FAILURE in <stdlib.h>.

Recommendation: The range of values, meaning, and format of exit codes is implementation-defined. Even though exit takes an int argument, that argument may be modified, truncated, etc., by the termination code before being handed to the host system. Use EXIT_SUCCESS rather than 0 to indicate a success exit code.
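A small sketch of this recommendation follows; the helper name exit_status_for is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: map an error count to a portable exit status. */
int exit_status_for(int error_count)
{
    return (error_count == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A program might end main with return exit_status_for(errors); so that the actual values handed to the host are chosen by the implementation.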

If you are using exit codes to return information from one user program to its parent user program, you are typically free to adopt your own value conventions because the host environment probably won't be processing the exit code directly.

Program Execution

Standard C goes to some lengths to define an abstract machine. At certain specified points in the execution sequence called sequence points, all side-effects of previous evaluations shall be complete, and no side-effects of subsequent evaluations shall have taken place.

One particular problem has been the handling of terminal input and output where some implementations have used buffered I/O while others used unbuffered I/O.

An optimizing compiler is permitted to optimize across sequence points provided it can guarantee the same result as if it had followed them rigorously.

C11 added support for multiple threads of execution. Previously, multi-threaded programs used library functions and/or compiler extensions.

Environmental Considerations

Character Sets

A C program is concerned with two possible character sets: source and execution. The source character set is used to represent the source code program, and the execution character set is available at run time. Most programs execute on the same machine on which they are translated, in which case their source and execution character sets are the same. Cross-compiled programs generally run on a different machine than that used for their development, in which case the source and execution sets might be different.

The characters in the source character set, except as explicitly specified by Standard C, are implementation-defined. The characters in the execution character set (except for the '\0' character) and their values are implementation-defined. The execution character '\0' must be represented by all-bits zero.

The meaning of an unspecified character in the source text, except in a character constant, a string literal, or a comment, is implementation-defined.

While many C programs are translated and execute in an ASCII (and now Unicode) environment, other character sets are in use. As the set of upper- and/or lowercase letters may not be contiguous (as with EBCDIC), care must be taken when writing routines that handle multiple character sets. It is also possible when dealing with non-English letters that they do not have a corresponding upper- or lowercase equivalent. The collating sequence of character sets is also important when using the library function qsort.

Recommendation: If you write code that is specific to a particular character set, either conditionally compile it based on the host character set or document it as being an implementation-specific module. Use the ctype.h functions rather than testing characters against a specific set or range of integers.
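As an illustration of preferring the ctype.h functions, the sketch below counts letters without assuming anything about their code values. The function name count_letters is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the letters in a string using isalpha
   rather than range tests such as (c >= 'a' && c <= 'z'), which
   assume that the letters are contiguous. */
size_t count_letters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s))
            ++n;
    }
    return n;
}
```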

Trigraph Sequences

In certain environments, some of the required source characters are not available to programmers. This is typically because they are using a machine with a character set that does not include all the necessary punctuation characters. (It may also be because they are using a keyboard that does not have keys for all the necessary punctuation characters.)

To enable the input of characters that are not defined in the ISO 646–1983 Invariant Code Set (which is a subset of the seven-bit ASCII code set), the following trigraph sequences were introduced by C89:

Trigraph Meaning
??= #
??( [
??/ \
??) ]
??' ^
??< {
??! |
??> }
??- ~

A trigraph is a sequence of three characters, the first two of which are ??. The three characters collectively are taken to represent the corresponding character in the table above.

The addition of support for trigraphs in a compiler may change the way existing character constants or string literals are interpreted. For example,

printf("??(error at line %d)\n", msgno);

will be treated as if it had been written as

printf("[error at line %d)\n", msgno);

and sizeof("??(") will be two, not four.

If such literal strings are intended to be displayed, then the impact of moving to a system supporting trigraphs from one that doesn't will be minimal and overt—the user will see a slightly different output. However, if the program parses a string expecting to find a specific character, such as ?, it will no longer find it if it has been previously interpreted as part of a trigraph sequence.

Even though the vast majority of C programmers likely will have no use for trigraphs, a conforming implementation is required to support them. Therefore, you need to be aware of their existence so you can understand why seemingly innocent strings are being “misinterpreted.”

Recommendation: Use a search program to check if sequences of ?? occur in existing source. If they do occur in more than a few places, you may wish to search specifically for the trigraph sequences.

Recommendation: To preserve sequences that look like trigraphs but are not intended to be, use the Standard C escape sequence \? to force a literal ? character in a literal string or single-character constant. Because trigraphs are replaced before escape sequences are recognized, the two ? characters must not remain adjacent. For example, sizeof("?\?(") is four as is sizeof("\?\?(").

Recommendation: If your implementation doesn't support trigraphs, you can protect against them in the future by using the \? sequence now because the backslash is supposed to be ignored if it does not begin a recognized escape sequence.
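The \? escape might be used as in the following sketch. The escape is placed between the two question marks because trigraph replacement takes place before escape sequences are processed; the names marker and marker_length are hypothetical.

```c
#include <string.h>

/* Writing the second ? as \? keeps the characters ? ? ( from being
   read as the trigraph for [. */
const char marker[] = "?\?(error)";

/* Hypothetical helper: return the number of characters in the marker. */
size_t marker_length(void)
{
    return strlen(marker);
}
```

The string holds the same nine characters whether or not the translator recognizes trigraphs.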

While some compilers recognize trigraphs, other implementations require the use of a standalone tool to convert code containing trigraphs to code without them.

C95 added digraphs as a mechanism to allow sometimes-unavailable source tokens to have alternate spellings (see Source Tokens). Unlike trigraphs, digraphs are tokens, so they can’t be recognized inside another token, such as a character constant or string literal.

Multibyte characters

C89 introduced the notion of a multibyte character. Certain aspects of the handling of such characters are locale specific. Prior to that, some implementations used double-byte and other approaches to dealing with extended characters.

Character Display Semantics

The handling of certain escape sequences in Standard C involves locale specific or unspecified behavior.

C89 defined the escape sequences \a and \v.

Some systems treat \n as a carriage-return and a new-line, while others treat it as just a new-line.

Signals and Interrupts

Standard C places certain restrictions on the kinds of objects that can be modified by signal handlers. With the exception of the signal function, the Standard C library functions are not guaranteed to be reentrant and they are permitted to modify static data objects.
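A handler that stays within these restrictions typically does nothing more than set a flag of type volatile sig_atomic_t, leaving the real work to the main program. The following is a sketch under that assumption; all of the function and object names are hypothetical.

```c
#include <signal.h>

/* The only object the handler touches is a volatile sig_atomic_t flag. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int sig)
{
    (void)sig;          /* unused */
    interrupted = 1;
}

/* Hypothetical helper: install the handler; returns 1 on success. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) != SIG_ERR;
}

/* Hypothetical helper: report whether SIGINT has been seen. */
int was_interrupted(void)
{
    return interrupted != 0;
}
```

The main program polls was_interrupted at convenient points, so no library function is called from within the handler.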

Environmental Limits

There are a number of environmental constraints on a conforming implementation, as discussed below.

Translation Limits

As of C17, Standard C requires that “The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:

  • 127 nesting levels of blocks

  • 63 nesting levels of conditional inclusion

  • 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or void type in a declaration

  • 63 nesting levels of parenthesized declarators within a full declarator

  • 63 nesting levels of parenthesized expressions within a full expression

  • 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)

  • 31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)

  • 4095 external identifiers in one translation unit

  • 511 identifiers with block scope declared in one block

  • 4095 macro identifiers simultaneously defined in one preprocessing translation unit

  • 127 parameters in one function definition

  • 127 arguments in one function call

  • 127 parameters in one macro definition

  • 127 arguments in one macro invocation

  • 4095 characters in a logical source line

  • 4095 characters in a string literal (after concatenation)

  • 65535 bytes in an object (in a hosted environment only)

  • 15 nesting levels for #included files

  • 1023 case labels for a switch statement (excluding those for any nested switch statements)

  • 1023 members in a single structure or union

  • 1023 enumeration constants in a single enumeration

  • 63 levels of nested structure or union definitions in a single struct-declaration-list”

These numbers are somewhat misleading. In effect, Standard C does not guarantee any specific support for all combinations of limits.

Numerical Limits

A conforming implementation must document its numerical limits via a series of macros defined in the headers <limits.h> and <float.h>. Additional limits are specified in <stdint.h>, which was added by C99.

Starting with C99, the existence of the optional predefined macro __STDC_IEC_559__ indicates support for the IEC 60559 floating-point standard, as described in an annex of the C Standard.

C99 added complex types and their associated arithmetic. Starting with C11, the absence of the optional predefined macro __STDC_NO_COMPLEX__ indicates support for them. Furthermore, the existence of the optional predefined macro __STDC_IEC_559_COMPLEX__ indicates that complex support conforms to IEC 60559, as described in an annex of the C Standard.

See also <complex.h> and <fenv.h>.
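The limit macros allow a program to adapt to its target at translation time instead of assuming particular sizes. The following sketch shows one possible use; the type name count32 and the function names are hypothetical.

```c
#include <limits.h>
#include <float.h>

/* Hypothetical type name: pick a basic type able to hold values up to
   2147483647 on the current target. */
#if INT_MAX >= 2147483647
typedef int count32;
#else
typedef long count32;
#endif

/* Hypothetical helper: 1 if double offers at least `digits` decimal
   digits of precision on this target, 0 otherwise. */
int double_has_digits(int digits)
{
    return DBL_DIG >= digits;
}

/* 1 if the implementation claims IEC 60559 floating-point support. */
int claims_iec_60559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```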

Lexical Elements

Source Tokens

Standard C requires that when source input is parsed into tokens, each token must be the longest sequence of characters that can form a token. There must be no ambiguity as to what a particular construct means. For example, the text a+++++b must generate an error because the tokens found are a, ++, ++, +, and b, and the second (postfix) ++ operator has an operand that is not an lvalue. Note that a++ + ++b is valid, as the white space causes the tokens to be parsed as a, ++, +, ++, and b. Likewise for a+++ ++b.

Archaic: Prior to C89, some preprocessors allowed tokens to be created from other tokens. For example:

#define PASTE(a,b) a/**/b

PASTE(total, cost)

The intent here is that the macro expands to the single token totalcost rather than the two tokens total and cost. It relies on the non-Standard C approach of replacing the comment in the macro definition with nothing, rather than a single space. Standard C added the preprocessor token-pasting operator, ##, as a portable solution for achieving the desired behavior.

Prior to C89, some preprocessors allowed string literal tokens to be created during preprocessing. Standard C added the preprocessor stringize operator, #, as a portable solution for achieving the desired behavior.
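The two Standard C operators might be used as follows. This is a sketch only; the macro name STR and the two demonstration functions are hypothetical.

```c
#include <string.h>

/* Standard C replacements for the pre-C89 tricks. */
#define PASTE(a, b) a##b   /* joins two tokens into one */
#define STR(x)      #x     /* turns the argument into a string literal */

/* Hypothetical demonstration of token pasting. */
int paste_demo(void)
{
    int totalcost = 42;
    /* PASTE(total, cost) expands to the single identifier totalcost. */
    return PASTE(total, cost);
}

/* Hypothetical demonstration of stringizing. */
const char *stringize_demo(void)
{
    /* STR(total cost) expands to the string literal "total cost". */
    return STR(total cost);
}
```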

Recommendation: Avoid exploiting idiosyncrasies of preprocessors that follow tokenizing rules different from those defined by Standard C.

Keywords

The following tokens are defined as keywords by Standard C (a parenthesized tag names the edition that introduced the keyword):

auto  break  case  char  const (C89)
continue  default  do  double  else
enum  extern  float  for  goto
if  inline (C99)  int  long  register
restrict (C99)  return  short  signed (C89)  sizeof
static  struct  switch  typedef  union
unsigned  void  volatile (C89)  while  _Alignas (C11)
_Alignof (C11)  _Atomic (C11)  _Bool (C99)  _Complex (C99)  _Generic (C11)
_Imaginary (C99)  _Noreturn (C11)  _Static_assert (C11)  _Thread_local (C11)

Archaic: Although enum and void were not defined in K&R, they were supported by various compilers prior to C89.

Standard C does not define or reserve the keyword entry previously reserved in K&R and by some older C compilers.

C++ Consideration: Standard C++ does not define or reserve the keyword restrict. Nor does it define those keywords beginning with underscore and an uppercase letter. (However, for some, it provides alternate spellings, such as alignas, alignof, bool, and thread_local.)

Many compilers support extended keywords, some starting with one or two underscores, or with names in the programmer identifier space.

Identifiers

Spelling

K&R and C89 allowed identifiers to contain underscores, English upper- and lowercase letters, and decimal digits.

The set of external names allowed by an (older) environment might not include underscore and might not be case-sensitive, in which case, some characters in external names might be mapped to something else.

C99 added the predefined identifier __func__. C99 also added support for Universal character names in identifiers (see Universal character names), as well as any number of implementation-defined extended characters.

C++ Consideration: In addition to the keywords it shares with Standard C, Standard C++ defines the following tokens as keywords:

alignas alignof and and_eq asm
bitand bitor bool catch char8_t
char16_t char32_t class compl concept
consteval constexpr constinit const_cast co_await
co_return co_yield decltype delete dynamic_cast
explicit export false friend mutable
namespace new noexcept not not_eq
nullptr operator or or_eq private
protected public reinterpret_cast requires static_assert
static_cast template this thread_local throw
true try typeid typename using
virtual wchar_t xor xor_eq

Some of these names are defined as macros in Standard C (such as alignas in <stdalign.h>). These are discussed elsewhere.

C++ Consideration: Standard C++ gives special meaning to the following identifiers: final, import, module, and override.

Recommendation: If there is a possibility that your C code will be run through a C++ compiler, avoid using identifiers that Standard C++ defines as keywords or identifiers with special meaning.

C++ Consideration: According to Standard C++: “Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use” and “Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.”

Length and Significant-Character Limits

While Standard C places no maximum length on an identifier, the number of characters treated as significant might be limited. Specifically, the length limit for an external name may be more restrictive than that for an internal name (typically due to linker considerations). The number of significant characters in an identifier is implementation-defined. Standard C requires implementations to distinguish at least the first 31 characters in an external identifier, and the first 63 in an internal identifier.

Name Spaces

K&R defined two disjoint categories of identifiers: those associated with ordinary variables, and those associated with structure and union members and tags.

Standard C added several new categories of identifier name space. The complete set is labels; structure, union, and enumeration tags; the members of structures and unions (with each structure and union having its own name space); and all other identifiers, called ordinary identifiers.

The identifiers optionally allowed in Standard C function prototypes have their own name space. Their scope is from their name through the end of that prototype declaration. Therefore, the same identifier can be used in different prototypes, but it cannot be used twice in the same prototype.
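The separation of name spaces can be seen in the following sketch, which deliberately reuses one identifier in every category. It is meant only to show what Standard C permits, not as recommended style; the names are hypothetical.

```c
/* The same parameter name may appear in different prototypes. */
int scale(int value);
int offset(int value);

/* The identifier `node` is used in four name spaces at once. */
struct node {            /* tag */
    int node;            /* member of struct node */
};

/* Hypothetical helper for illustration only. */
int name_space_demo(void)
{
    struct node node;    /* ordinary identifier */
    node.node = 7;
    goto node;           /* label */
node:
    return node.node;
}
```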

K&R contained the statement, “Two structures may share a common initial sequence of members; that is, the same member may appear in two different structures if it has the same type in both and if all previous members are the same in both. (Actually, the compiler checks only that a name in two different structures has the same type and offset in both, but if preceding members differ the construction is nonportable.)” Standard C eliminated this restriction by endorsing the separate member-per-structure name space.

Universal Character Names

C99 added support for universal character names. They have the form \uXXXX and \UXXXXXXXX, where X is a hexadecimal digit. They may appear in identifiers, character constants, and string literals.

Constants

Standard C requires that an implementation use at least as much precision as is available in the target execution environment when handling constant expressions. It may use more precision.

Integer Constants

C89 provided the suffix U (and u) to support unsigned constants. These suffixes can be used with decimal, octal, and hexadecimal constants. long int constants may be suffixed with L (or l).

C99 added the type long long int, and constants of that type may be suffixed with ll (or LL). C99 also added the type unsigned long long int, and constants of that type may be suffixed with ull (or ULL).

K&R permitted octal constants to contain the digits 8 and 9 (which have octal value 10 and 11, respectively). Standard C does not allow these digits in octal constants.

The type of an integral constant depends on its magnitude, its radix, and the presence of optional suffix characters. This can cause problems. For example, consider a machine on which an int is 16 bits, and twos-complement representation is used. The smallest int value is -32768. However, the type of the expression -32768 is long, not int! There is no such thing as a negative integer constant; instead, we have two tokens: the integer constant 32768 and the unary minus operator. As 32768 is too big to fit in 16 bits, it has type long, and the value is negated. As such, having the function call f(-32768) without a function prototype in scope might cause an argument/parameter mismatch. If you look at the definitions of INT_MIN in <limits.h> for an implementation on such a machine, you likely will find something like the following:

#define INT_MIN (-32767 - 1)

This satisfies the requirement that the macro have the type int.

Regarding radix, on this 16-bit machine, 0xFFFF has type unsigned int while -32768 has type long.

A similar situation occurs with the smallest value for a 32-bit, twos-complement integer, -2147483648, which might have type long or long long instead of int, depending on type mapping.

Recommendation: Explicitly type integral constants (or cast them) when their type is important (e.g., as function call arguments and with the sizeof operator).
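
As a minimal sketch of this recommendation, the following functions show how a suffix fixes the type of a constant regardless of the width of int. The function names are invented for illustration only.

```c
#include <stddef.h>

/* An unsuffixed constant that fits in an int has type int everywhere. */
size_t size_of_plain_one(void) { return sizeof(1); }

/* The L suffix forces long int, whatever the width of int. */
size_t size_of_long_one(void) { return sizeof(1L); }

/* The U suffix forces an unsigned type (unsigned int for a small value). */
size_t size_of_unsigned_one(void) { return sizeof(1U); }

/*
 * Seconds in a day. Written as 24 * 60 * 60, the multiplication would be
 * carried out in int and could overflow where int has 16 bits. The L
 * suffixes make the arithmetic long on every implementation.
 */
long seconds_per_day(void) { return 24L * 60L * 60L; }
```

On an implementation with 16-bit ints, the explicitly typed version of the last expression still yields 86400, whereas the unsuffixed version might not.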

A similar problem exists when passing a zero constant to a function expecting a pointer—intending this to mean “null pointer”—but no function prototype is in scope, as in f(0). The type of zero is int, whose size/format might not match the parameter’s pointer type. Besides, for machines having pointers that do not look like integers, no implicit conversion is done to compensate for this.

The correct thing to do is to use the NULL library macro, which is most often defined using one of the following:

#define NULL 0
#define NULL 0L
#define NULL ((void *)0)
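
The same mismatch can arise for the trailing arguments of a function declared with the ellipsis notation, because no parameter type is available to convert them. The following sketch, in which count_strings and count_three are hypothetical names, casts the terminating null pointer explicitly so that its type does not depend on how NULL happens to be defined.

```c
#include <stdarg.h>
#include <stddef.h>

/* Counts the strings in a list that is terminated by a null pointer. */
size_t count_strings(const char *first, ...)
{
    va_list ap;
    size_t n = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
    {
        ++n;
    }
    va_end(ap);
    return n;
}

/* The cast guarantees that a pointer, not an int or a long, is passed. */
size_t count_three(void)
{
    return count_strings("a", "b", "c", (const char *)NULL);
}
```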

Archaic: Prior to C89, different compilers used different rules for typing integer constants. K&R required the following: “A decimal constant whose value exceeds the largest signed machine integer is taken to be long; an octal or hexadecimal constant which exceeds the largest unsigned machine integer is likewise taken to be long.”

Standard C requires the following rules for typing integer constants: “The type of an integer constant is the first of the corresponding list in which its value can be represented. Unsuffixed decimal: int, long int, unsigned long int; unsuffixed octal or hexadecimal: int, unsigned int, long int, unsigned long int; suffixed by the letter U (or u): unsigned int, unsigned long int; suffixed by the letter L (or l): long int, unsigned long int; suffixed by both U (or u) and L (or l): unsigned long int.” C99 added steps for long long and unsigned long long.

Some compilers support integer constants expressed in binary (base-2, that is); others allow separators (such as underscore) between digits for all bases. These features are not part of Standard C.

Integer constants beginning with 0 are considered to be octal. The #line preprocessing directive has the form

# line digit-sequence new-line

Note carefully that the syntax does not involve integer-constant. Instead, digit-sequence is interpreted as a decimal integer even if it has one or more leading zeros!

Floating Constants

The default type of a floating-point constant is double. C89 added support for the type long double along with the floating constant suffix F (or f) for float constants, and L (or l), for long double constants.

Recommendation: Explicitly type floating-point constants (or cast them) when their type is important (e.g., as function call arguments and with the sizeof operator).

C99 added support for writing floating constants using hexadecimal notation.
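
A short sketch of the suffixes and of the C99 hexadecimal notation follows. The function names are illustrative only, and a C99-or-later compiler is assumed.

```c
#include <stddef.h>

/* Each suffix selects a different floating type for the same value. */
size_t size_of_float_constant(void)       { return sizeof(1.0F); }
size_t size_of_double_constant(void)      { return sizeof(1.0);  }
size_t size_of_long_double_constant(void) { return sizeof(1.0L); }

/*
 * Hexadecimal floating constant: the significand 0x1.8 is 1.5, and the
 * exponent p1 scales it by 2 to the power 1, giving exactly 3.0.
 */
double hex_three(void) { return 0x1.8p1; }
```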

C99 also added the macro FLT_EVAL_METHOD (in <float.h>), whose value might allow a floating constant to be evaluated to a format whose range and precision is greater than required. For example, a compiler has the freedom to (quietly) treat 3.14f as 3.14 or even 3.14L, instead.

Enumeration Constants

The names of the values defined for an enumeration are integer constants, and Standard C defines them to be ints.

K&R did not include enumerations.

C++ Consideration: An enumeration constant has the type of its parent enumeration, which is some integral type that can represent all the enumeration constant values defined in the enumeration.

Character Constants

The mapping of characters in the source character set to characters in the execution character set is implementation-defined.

The value of a character constant that contains a character or escape sequence not represented in the execution character set is implementation-defined.

The meaning of an unspecified escape sequence (except for a backslash followed by a lowercase letter) in a character constant or string literal is implementation-defined. Note that unspecified sequences with a lowercase letter are reserved for future use by Standard C. This means that a conforming implementation is quite free to provide semantics for '\E' (for the ASCII Escape character, for example) but it should not do so for '\e'.

Recommendation: Avoid using non-standard escape sequences in character constants.

The value of a character constant that contains more than one character is implementation-defined. On a 32-bit machine, it may be possible to pack four characters into a word using int i = 'abcd';. On a 16-bit machine, something like int i = 'ab'; might be permitted.

Recommendation: Avoid using multi-character constants, as their internal representation is implementation-defined.

Standard C supports the earlier popular extension of a hexadecimal-form character constant. This commonly has the form '\xh' or '\xhh' where h is a hexadecimal digit.

K&R declared that if the character following the backslash is not one of those specified, the backslash is ignored. Standard C says that the behavior is undefined.

Because Standard C, unlike some older implementations, does not permit the digits 8 and 9 in octal constants (or octal escape sequences), previously supported character constants such as '\078' take on new meaning.

To avoid confusion with trigraphs (which have the form ??x), the character constant '\?' was defined by C89. An existing constant of the form '\?' will now have different meaning.

Recommendation: Because of differing character sets, use graphic representations of a character instead of its internal representation. For example, use 'A' instead of '\101' in ASCII environments.

Some implementations may allow '' to represent the null character—Standard C does not.

K&R did not define the constant '\"', although it clearly is necessary inside of literal strings. In Standard C, the characters '"' and '\"' are equivalent.

C89 added the notion of a wide character constant, which is written just like a character constant, but with a leading L.

C99 added support for Universal character names in character constants. See Universal character names.

C11 added support for wide character constants with prefix u (or U).

Standard C requires that an integer character constant have type int.

C++ Consideration: Standard C++ requires that an integer character constant have type char.
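
The difference is easy to observe with sizeof. The sketch below assumes it is compiled as C, not C++; the enumeration and the function names are invented for illustration.

```c
#include <stddef.h>

enum color { RED, GREEN, BLUE };

/* In C, a character constant has type int, not char. */
size_t size_of_char_constant(void) { return sizeof('A'); }

/* An object of type char always has size 1. */
size_t size_of_char_object(void)
{
    char c = 'A';
    return sizeof(c);
}

/* In C, an enumeration constant also has type int. */
size_t size_of_enum_constant(void) { return sizeof(GREEN); }
```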

String Literals

Standard C permits string literals having the same spelling to be shared, but does not require that.

On some systems, string literals are stored in read-write memory; on others, in read-only memory. Standard C states that if a program attempts to modify a string literal, the behavior is undefined.

Recommendation: Even if your implementation allows it, do not modify literal strings, because this is counterintuitive to the programmer. Also, do not rely on like strings being shared. If you have code that modifies string literals, change it to a character array initialized to that string and then modify that array. Not only does this not require literals to be modified, but it also allows you to share like strings explicitly by using the same array elsewhere.
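
A minimal sketch of the array-based approach follows. The names greeting and capitalized_greeting are hypothetical.

```c
#include <ctype.h>

/* A modifiable array initialized from the literal; the literal itself
   is never written to. */
static char greeting[] = "hello, world";

/* Capitalizes the first character of the array and returns the array. */
const char *capitalized_greeting(void)
{
    greeting[0] = (char)toupper((unsigned char)greeting[0]);
    return greeting;
}
```

Any other code that needs the same text can refer to greeting, which makes the sharing explicit.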

Recommendation: It is common to write something like the following: char *pMessage = "some text";. Assuming you are using a C89-or-later compiler, instead, declare the pointer using const char *, so any attempt to modify the underlying string will be diagnosed.

C++ Consideration: Standard C++ requires that a string literal be implicitly const-qualified, which means that the following often-used C-idiom is not valid C++:

char *message = "…";

This must be written instead as

const char *message = "…";

The maximum length of a literal string is implementation-defined, but C89 requires it to be at least 509 characters. C99 increased that to 4095.

Because Standard C, unlike some older implementations, does not permit the digits 8 and 9 in octal constants (or octal escape sequences), previously supported string literals such as "\078" take on new meaning.

K&R and Standard C permit a string literal to be continued across multiple source lines using the backslash/new-line convention as follows:

static char text[] = "a string \
of text";

However, this requires that the continuation line begin exactly in the first column. An alternate approach is to use the string concatenation capability provided by C89 (and by some compilers prior to that), as follows:

static char text[] = "a string "
"of text";

C89 added the notion of a wide string literal, which is written just like a string literal, but with a leading L (e.g., L"abc").

C99 added support for Universal character names in string literals. See Universal character names.

C11 added support for wide character string literals with prefix u (or U), and for UTF-8 string literals via the prefix u8.
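
As a sketch of how a universal character name and the u8 prefix combine, the following stores U+00E9 (e with acute accent) as UTF-8. A C11-or-later compiler is assumed, and the identifiers are invented for illustration.

```c
#include <stddef.h>

/* U+00E9 encoded as UTF-8 occupies two bytes, 0xC3 and 0xA9, and the
   literal adds the usual terminating null character. */
static const char e_acute[] = u8"\u00E9";

size_t e_acute_size(void) { return sizeof e_acute; }

/* Returns byte i of the encoded text as a non-negative value. */
int e_acute_byte(size_t i) { return (unsigned char)e_acute[i]; }
```

Because the character is spelled with a universal character name, the result does not depend on the encoding of the source file.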

Punctuators

Archaic: Prior to K&R, the compound-assignment operators were written as =op. However, K&R and Standard C write them as op= instead. For example, s =* 10 became s *= 10.

C89 added the ellipsis punctuator, ..., as part of the enhanced notation for function declarations and definitions. It also added the punctuators # and ##, which represent preprocessor-only operators.

C95 added the following digraph punctuators: <:, :>, <%, %>, %:, and %:%:.

Header Names

Standard C defines a grammar for header names. If the characters ', \, ", or /* occur in an #include directive of the form <…>, the behavior is undefined. The same is true for ', \, and /* when using an #include directive of the form "…".

In Standard C, when using the #include "…" form, the text "…" is not considered to be a string literal. In an environment using a hierarchical file system where one needs to use a \ to indicate a different folder/directory level, this backslash is not the beginning of an escape sequence, so should not itself need to be escaped.

Comments

C99 added support for line-oriented comments, which begin with //. Prior to C99, some implementations supported this as an extension.

Neither K&R nor Standard C supports nested comments, although a number of existing implementations do. The need for nested comments is primarily to allow a block of code containing comments to be disabled as follows:

/*
int i = 10; /* ... */
*/

The same effect can be achieved by using

#if 0
int i = 10; /* ... */
#endif

Standard C requires that during tokenization, a comment be replaced by one space. Some implementations replace them with nothing and, therefore, allow some clever token pasting. See Source Tokens for an example.

Conversions

Arithmetic Operands

Boolean, Characters, and Integers

Whether a plain char is treated as signed or unsigned is implementation-defined.

At the time C89 was being developed, two different sets of arithmetic conversion rules were in use: unsigned preserving (UP) and value preserving (VP). With UP, if two smaller unsigned types (e.g., unsigned char or unsigned short) are present in an expression, they are widened to unsigned int. That is, the widened value is also unsigned. The VP approach widens such values to signed int (provided they will fit), else it widens them to unsigned int.

While the same result arises from both approaches almost all of the time, there can be a problem in the following situation. Here, we have a binary operator with one operand of type unsigned short (or unsigned char) and an operand of int (or some narrower type). Consider that the program is running on a 16-bit twos-complement machine.

#include <stdio.h>

int main()
{
    unsigned char uc = 10;
    int i = 32767;
    int j;
    unsigned int uj;

    j = uc + i;
    uj = uc + i;

    printf("j = %d (%x), uj = %u (%x)\n", j, j, uj, uj);
    printf("expr shifted right = %x\n", (uc + i) >> 4);
    return 0;
}

With UP rules, uc will be promoted to unsigned int as will i with the result of uc + i being an unsigned int. With VP rules, uc will be promoted to int, the type of i, and the two will be added to produce a result of type int. This in itself is not a problem, but if (uc + i) were used as the object of a right-shift (as shown), or as an operand to /, %, <, <=, >, or >=, different results are possible. For example:

UP rules produce:

j = -32759 (8009), uj = 32777 (8009)
expr shifted right = 800

VP rules produce:

j = -32759 (8009), uj = 32777 (8009)
expr shifted right = f800

UP rules cause zero bits to replace high-order bits if the expression has unsigned type, whereas the result is implementation-defined if the object is signed (due to arithmetic versus logical shift possibilities). In the second output example above, VP produced sign-bit propagation during the shift producing a quite different result.

Note that the above example only causes concern for certain values of uc and i, not in all cases. For example, if uc were 10 and i were 30,000, the output would be:

UP rules produce:

j = 30010 (753a), uj = 30010 (753a)
expr shifted right = 753

VP rules produce:

j = 30010 (753a), uj = 30010 (753a)
expr shifted right = 753

In this case, the high bit (sign bit) of (uc + i) is not set, so both UP and VP produce the same result.

Casts can be used in such mixed-mode arithmetic to ensure that the desired results are achieved regardless of the rule used. For example,

#include <stdio.h>

int main()
{
    unsigned char uc = 10;
    int i = 32767;
    int expr1, expr2, expr3;

    expr1 = ((int) uc + i) >> 4;
    expr2 = (uc + (unsigned) i) >> 4;
    expr3 = (uc + i) >> 4;

    printf("expr1 = %x\n", expr1);
    printf("expr2 = %x\n", expr2);
    printf("expr3 = %x\n", expr3);
    return 0;
}

UP rules produce:

expr1 = f800
expr2 = 800
expr3 = 800

VP rules produce:

expr1 = f800
expr2 = 800
expr3 = f800

As demonstrated, each of the two expressions containing an explicit cast gives the same result under both sets of rules, whereas the result of the expression without a cast differs.

Although Standard C uses VP rules, some widely used compilers prior to C89 used UP rules. Code that relies on UP rules may now give a different result. Specifically, a char, a short or an int bit field (all of them signed or unsigned) or an enumerated type may be used wherever an int may be used. If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.

Note that with Standard C, the “normal” integral widening rules also apply to bit fields, and that bit fields can be signed as well as unsigned.
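
The VP rules can be observed directly. The sketch below assumes an implementation on which int can represent every unsigned char value (true of nearly all hosted systems); the function names are illustrative only.

```c
#include <stddef.h>

/* Both operands are promoted to int before the addition, so the sum is
   not reduced modulo UCHAR_MAX + 1. */
int sum_of_uchars(unsigned char a, unsigned char b) { return a + b; }

/* The type of the sum is the promoted type, not unsigned char. */
size_t size_of_uchar_sum(void)
{
    unsigned char a = 1, b = 2;
    return sizeof(a + b);
}

/*
 * Under VP rules uc is promoted to (signed) int, so 0 - 1 is -1 and the
 * test is true. Under the older UP rules the subtraction would have been
 * unsigned and the result could never be negative.
 */
int decrement_goes_negative(unsigned char uc) { return (uc - 1) < 0; }
```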

C99 added the type _Bool. C99 also allowed the addition of extended integer types.

Floating and Integer

Standard C states, “When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.”

Standard C states, “When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.”

Floating Types

Standard C requires that when a double is truncated to a float, or a long double is truncated to a double or float, if the value being converted cannot be represented, the behavior is undefined. If the value is in the range, but cannot be represented exactly, the truncated result is one of the two nearest representable values; it is implementation-defined as to which one of the two is chosen.
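
The truncation and range rules quoted for floating-to-integer conversion can be sketched as a guarded conversion. The function double_to_int_or is a hypothetical helper, and the range test assumes that double can represent INT_MIN - 1 and INT_MAX + 1 exactly, as is the case with 32-bit ints and IEEE 754 doubles.

```c
#include <limits.h>

/*
 * Converts d to int, truncating toward zero. If the integral part of d
 * cannot be represented in an int (or d is a NaN), the conversion would
 * have undefined behavior, so fallback is returned instead.
 */
int double_to_int_or(double d, int fallback)
{
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
    {
        return fallback;
    }
    return (int)d;
}
```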

Note that by using function prototypes an implementation may allow a float to be passed by value to a function without its first being widened to a double. However, even though such narrow-type preservation is permitted by Standard C, it is not required.

Complex Types

C99 added the type _Complex and its corresponding conversion rules, and the header <complex.h>.

Usual Arithmetic Conversions

These were changed by Standard C to accommodate the VP rules described in Boolean, Characters, and Integers. Expressions may also be evaluated in a “wider” mode than is actually necessary, to permit more efficient use of the hardware. Expressions may also be evaluated in a “narrower” type, provided they give the same result as if they were done in the “wide” mode.

If operands of a binary operator have different arithmetic types, this results in the promotion of one or both operands. The conversion rules defined by Standard C are similar to those defined by K&R except that the VP rules are accommodated, some new types have been added, and narrow-type arithmetic is allowed without widening.

Other Operands

Pointers

C89 introduced the concept of a pointer to void, written as void *. Such a pointer can be converted to a pointer to an object of any type without using a cast. An object pointer can be converted to a pointer to void and back again without loss of information.
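
A minimal sketch of such a round trip follows; the function name is invented for illustration.

```c
/* Returns 1 if an int pointer survives conversion to void * and back. */
int round_trip_preserves_pointer(void)
{
    int value = 42;
    void *generic = &value;   /* object pointer to void *: no cast needed */
    int *back = generic;      /* void * to object pointer: no cast in C */

    return back == &value && *back == 42;
}
```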

C++ Consideration: Standard C++ requires a cast when assigning a void pointer to a pointer to an object type.

Object pointers need not all have the same size. For example, a pointer to char need not be the same size as a pointer to int.

While a pointer to an object of one type can be converted to a pointer to a different object type, if the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

While an int and a data pointer often occupy storage of the same size, the two types are quite distinct, and nothing portable can be said about interchanging the two except that zero may be assigned or compared to a pointer. Note that this null pointer concept does not require the null pointer value to be “all-bits-zero,” although it may be implemented as such. All that Standard C requires is that (void *) 0 represent an address that will never equal the address of an object or function. In the expression p == 0, zero is converted to the type of p before being compared with it.

The result of converting an integer to any pointer type is implementation-defined. The same is true for conversion in the other direction, except that if the result cannot be represented in the integer type, the behavior is undefined.

A function pointer is quite distinct from a data pointer, and no assumptions should be made about the relative sizes of the two. The format and size of a function pointer may be quite different from that of a data pointer.

Standard C requires an explicit cast when a pointer to a function returning one type is assigned to a pointer to a function returning a different type. Standard C is even more restrictive when copying function pointers because of function prototypes. Now, the attributes of a pointer to a function not only involve that function's return type but also its argument list. While a pointer to a function of one type may be converted to a pointer to a function of another type and back again, if a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

Expressions

Order of Evaluation and Sequence Points

According to Standard C, the order in which expressions are evaluated is unspecified except for the function call operator (), the logical OR operator ||, the logical AND operator &&, the comma operator, and the conditional operator ?:. While the precedence table defines operator precedence and associativity, these can be overridden by grouping parentheses. However, according to K&R, the commutative and associative binary operators (*, +, &, |, and ^) may be arbitrarily rearranged even if grouping parentheses are present. (Note that for &, |, and ^, the ordering is unimportant because the same result is always obtained.) However, Standard C requires grouping parentheses to be honored in all expressions.

With K&R (but not Standard C) rules, even though you may write the following:

i = a + (b + c);

the expression may be evaluated as

i = (a + b) + c;

or even

i = (a + c) + b;

This can cause overflow on intermediate values if the expression is evaluated one way versus another. To force a specific order of evaluation, break up the expression into multiple statements and use intermediate variables, as follows:

i = (b + c);
i += a;

These examples cause a problem only in “boundary” conditions for integer types and even then, only on some machines. For example, integer arithmetic on a twos-complement machine is usually “well behaved.” (However, some machines raise an interrupt when integer overflow occurs and presumably, this should be avoided.)

Recommendation: If you are concerned about the order of evaluation of expressions that associate and commute, break them into separate expressions such that you can control the order. Find out the properties of integer arithmetic overflow for your target systems and see if they affect such expressions.

The potential for overflow and loss of precision errors is much higher with floating-point operands where it is impossible to represent accurately certain real numbers in a finite space. Some mathematical laws that do not always hold true using finite representation are:

(x + y) + z == x + (y + z)
(x * y) * z == x * (y * z)
(x / y) * y == x /* for non-zero y */
(x + y) - y == x
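
The failure of the first of these laws can be sketched as follows. The example assumes IEEE 754 double precision; the function names are hypothetical, and the volatile intermediate is there only to force each partial sum to be rounded to double even on hardware that evaluates in a wider format.

```c
/* Computes (x + y) + z, rounding the partial sum to double. */
double sum_left_first(double x, double y, double z)
{
    volatile double partial = x + y;
    return partial + z;
}

/* Computes x + (y + z), rounding the partial sum to double. */
double sum_right_first(double x, double y, double z)
{
    volatile double partial = y + z;
    return x + partial;
}
```

With x = 1e16, y = -1e16, and z = 1.0, the first grouping gives 1.0. In the second, -1e16 + 1.0 cannot be represented exactly and rounds back to -1e16, so the final result is 0.0.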

When expressions involve side effects, the order of evaluation may be important. For example,

void test()
{
    int i, f(void), g(void);

    i = f() + g();
}

Here, f() may be evaluated before or after g(). While the value of i might be the same in either case, if f() and g() produce side effects, this may not be true.

The order in which side effects take place is unspecified. For example, the following are expressions with unpredictable outcomes:

j = (i + 1)/i++
dest[i] = source[i++]
dest[i++] = source[i]
i & i++
i++ | i
i * --i

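In each line above, it is unspecified which subexpression containing i is evaluated first. Moreover, because i is both modified and read without an intervening sequence point, Standard C makes the behavior undefined.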

Recommendation: Even if you can determine how your compiler evaluates expressions that contain side effects, don't rely on this being true for future releases of the same product. It may even vary for the same compiler given different circumstances. For example, by changing the source code in other, possibly unrelated, ways, you may change the optimizer's view of the world such that it generates different code for the same expression. The compiler writer is under no obligation whatsoever to support predictable behavior because the behavior is allowed to be undefined.
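
One way to remove the ambiguity from an expression such as dest[i] = source[i++] is to separate the side effect into its own statement. The sketch below assumes the intent was to copy the element at the current index and then advance; the function names are invented for illustration.

```c
/* Copies one element at index *i, then advances the index. */
void copy_and_advance(int *dest, const int *source, int *i)
{
    dest[*i] = source[*i];   /* no side effect on the index here */
    ++*i;                    /* the increment is a separate full expression */
}

/* Returns 1 if exactly element 1 was copied and the index moved to 2. */
int demo_copy(void)
{
    int source[3] = {7, 8, 9};
    int dest[3] = {0, 0, 0};
    int i = 1;

    copy_and_advance(dest, source, &i);
    return dest[0] == 0 && dest[1] == 8 && dest[2] == 0 && i == 2;
}
```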

C89 introduced the notion of full expressions and sequence points, as follows: “A full expression is an expression that is not part of another expression, nor part of a declarator or abstract declarator. There is also an implicit full expression in which the non-constant size expressions for a variably modified type are evaluated; within that full expression, the evaluation of different size expressions are unsequenced with respect to one another. There is a sequence point between the evaluation of a full expression and the evaluation of the next full expression to be evaluated.”

Recommendation: Make sure you can identify all the sequence points in your code.

The results of bitwise operations (using ~, <<, >>, &, ^, and |) on signed types are inherently implementation-defined.

Recommendation: As the outcome of bitwise operations depends on the representation of integral types you should determine the nature of shift and bit-masking operations, particularly for signed types.
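
A common way to sidestep the issue is to perform shifts and masks on unsigned operands only. The following is a sketch with illustrative function names; the shift count is assumed to be less than the width of unsigned int.

```c
/* Extracts bits 4 through 7 of v using only unsigned arithmetic. */
unsigned int high_nibble(unsigned int v)
{
    return (v >> 4) & 0xFu;
}

/*
 * Shifts a possibly negative value right with zero fill. Converting to
 * unsigned int first is well defined (reduction modulo UINT_MAX + 1),
 * and a right shift of an unsigned value always brings in zero bits.
 */
unsigned int shift_right_logical(int v, unsigned int n)
{
    return (unsigned int)v >> n;
}
```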

The properties of floating-point arithmetic are implementation-defined. Bear in mind, too, that there may be differences between results obtained with software emulation and hardware execution. Also, a machine may have several different floating-point formats, any one of which might be able to be selected via a compile-time switch.

Recommendation: When using floating-point data types in expressions, identify the size, range, and representation of each such type. Also, determine if there are differences between floating-point emulation in software and the results produced by floating-point hardware. See if you can determine whether floating-point hardware is available at run-time.

Regarding floating-point expression evaluation, C99 added the following: “A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the expression evaluation method. The FP_CONTRACT pragma in <math.h> provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.”

If an arithmetic operation is invalid (e.g., division by zero) or produces a result that cannot be represented in the space provided (e.g., overflow or underflow), the result is undefined.

Primary Expressions

A parenthesized expression is a primary expression. C89 required support for at least 32 nesting levels of parenthesized expressions within a full expression. C99 increased that to 63.

A generic selection operation is a primary expression. This operator was introduced by C11, and involves the keyword _Generic.
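
As a sketch, a generic selection can be used to report the type the compiler assigns to a constant. TYPE_NAME and the functions below are hypothetical names, and a C11-or-later compiler is assumed.

```c
#include <string.h>

/* Selects a string according to the type of x; x is not evaluated. */
#define TYPE_NAME(x) _Generic((x),          \
        int: "int",                         \
        unsigned int: "unsigned int",       \
        long: "long",                       \
        unsigned long: "unsigned long",     \
        float: "float",                     \
        double: "double",                   \
        long double: "long double",         \
        default: "other")

/* Each function returns 1 if the constant has the expected type. */
int plain_constant_is_int(void)   { return strcmp(TYPE_NAME(1), "int") == 0; }
int suffixed_L_is_long(void)      { return strcmp(TYPE_NAME(1L), "long") == 0; }
int suffixed_U_is_unsigned(void)  { return strcmp(TYPE_NAME(1U), "unsigned int") == 0; }
int suffixed_F_is_float(void)     { return strcmp(TYPE_NAME(1.0F), "float") == 0; }
int char_constant_is_int(void)    { return strcmp(TYPE_NAME('A'), "int") == 0; }
```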

Postfix Operators

Array Subscripting

The format of an array reference is a[b] where a and b are expressions. One of these expressions must have type pointer to some type (other than void), while the other expression must be of integral type. Neither K&R nor Standard C requires a to be the pointer expression and b to be the integer expression, even though that is almost always the way a subscript expression is written. Specifically, a[b] can also be written as b[a], which may be surprising to many people, including C veterans.

C does not require that the integral expression in a subscript have an unsigned value—it may be signed. For example,

#include <stdio.h>

int main()
{
    int i[] = {0, 1, 2, 3, 4};
    int *pi = &i[2];
    int j;

    for (j = -2; j <= 2; ++j)
    {
        printf("x[%2d] = %d\n", j, pi[j]);
    }
    return 0;
}

x[-2] = 0
x[-1] = 1
x[ 0] = 2
x[ 1] = 3
x[ 2] = 4

Recommendation: For any given object A defined to be an array, never subscript A with a value other than 0 through n-1, where n is the maximum number of elements defined to be in A.

Recommendation: It is OK to use a negative subscript with a pointer expression provided that the expression maps into a predictable place.

The following example demonstrates the technique of having arrays begin at any arbitrary subscript. (Note though that this technique is not supported by Standard C and might not work on some implementations—those running on segmented memory architectures may cause it to fail because not all pointer arithmetic behaves in a “wraparound” manner.)

#include <stdio.h>

int main()
{
    int k[] = {1, 2, 3, 4, 5};
    int *p4 = &k[-1];
    int *yr = &k[-1983];

    printf("array p4 = %d %d %d %d %d\n",
        p4[1], p4[2], p4[3], p4[4], p4[5]);
    printf("array yr = %d %d %d %d %d\n",
        yr[1983], yr[1984], yr[1985], yr[1986], yr[1987]);
    return 0;
}

array p4 = 1 2 3 4 5
array yr = 1 2 3 4 5

By making p4 point to &k[-1], p4 has subscripts 1 to 5. It is irrelevant that no space has been allocated for the element k[-1] because we never try to access that element. All we have done is invent a pointer expression that points to the location where k[-1] would be if it existed. Then when we have an expression p4[1], which equals *(p4 + 1) or *(&k[-1] + 1), it gives *(&*(k - 1) + 1), *(k - 1 + 1), and finally *k, which is the same as k[0]. That is, p4[1] and k[0] are interchangeable, and p4[1] through p4[5] map into the array k.

The use of the pointer yr takes the same idea further and allows yr to be used like an array with subscripts ranging from 1983 to 1987. The same idea would allow an array to have subscripts -1004 to -1000, simply by initializing a pointer to &k[1004].

This works on some “well-behaved” machines having a linear address space. Here, address arithmetic is unsigned so that subtracting 10 from address 6 gives, not -4, but a large unsigned address. That is, the address arithmetic “wraps around” at both the high and the low end. While this may not be the case on every conceivable machine, it certainly works on many common ones.

Standard C says that if the result of an arithmetic operation on a pointer points inside an array or to the (non-existent) element one beyond the end, it’s OK. Otherwise, the behavior is undefined; that is, p - i + i need not result in p!
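
A strictly conforming alternative is to keep the pointer inside the array and translate the subscript instead. The following is a sketch; FIRST_YEAR, YEAR_COUNT, and value_for_year are invented for illustration.

```c
#define FIRST_YEAR 1983
#define YEAR_COUNT 5

static const int k[YEAR_COUNT] = {1, 2, 3, 4, 5};

/*
 * Returns the element for the given year, or fallback if the year lies
 * outside 1983 through 1987. No out-of-range pointer is ever formed.
 */
int value_for_year(int year, int fallback)
{
    if (year < FIRST_YEAR || year >= FIRST_YEAR + YEAR_COUNT)
    {
        return fallback;
    }
    return k[year - FIRST_YEAR];
}
```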

It is possible to portably calculate the size of each dimension of an array by knowing only the number of dimensions.

#include <stdio.h>

int main()
{
    int i[2][3][4];
    unsigned int dim1, dim2, dim3;

    dim3 = sizeof(i[0][0])/sizeof(i[0][0][0]);
    printf("dim3 = %u\n", dim3);

    dim2 = sizeof(i[0])/(dim3 * sizeof(i[0][0][0]));
    printf("dim2 = %u\n", dim2);

    dim1 = sizeof(i)/(dim2 * dim3 * sizeof(i[0][0][0]));
    printf("dim1 = %u\n", dim1);
    return 0;
}

dim3 = 4
dim2 = 3
dim1 = 2

i[0][0] is an array of four elements, so sizeof(i[0][0]) divided by sizeof(i[0][0][0]) is 4. Note that as the operand of sizeof, i[0][0] is not converted to int *; it keeps its type int [4], an array of four ints, so sizeof(i[0][0]) is 4 * sizeof(int).

Similarly, i[0] is an array of three elements each of which is an array of four ints. And finally, i is an array of two elements, each of which is an array of three elements, each of which is an array of four ints.

Function Calls

C99 required that a function declaration be in scope for each call to that function.

If a function call has no function prototype declarator in scope, and the number of arguments, or their types, after the default conversions do not match those of the formal parameters, the behavior is undefined.

If a function that accepts a variable number of arguments is called, and no prototype declarator with the ellipsis notation is in scope, the behavior is undefined.

Recommendation: Whenever you use variable length argument lists in functions, document it thoroughly and use the stdarg (or varargs) header as appropriate. Always declare such functions using a prototype with the appropriate ellipsis notation before you call them.

The order in which function arguments are evaluated is unspecified. For example,

f(i, i++);

contains an unsafe argument list; i++ may be evaluated before i.

Now consider an extension of that example, which involves an array of function pointers:

(*table[i])(i, i++);

Not only is the order in which the arguments are evaluated unspecified, so too is the point at which the expression designating the function is evaluated relative to them. Specifically, we can’t say for sure which element of the array is used! What Standard C does guarantee is that there is a sequence point at the point of the function call; that is, after all three expressions have been evaluated.

Recommendation: Never rely on the order of evaluation of the arguments in a function call or the expression designating the function being called.

Prior to C99, a function that had not been explicitly declared was treated as though it were declared with class extern and as returning type int.

Recommendation: When porting, take care that the headers in the target environment contain the necessary function declarations; otherwise, function calls will be interpreted as returning ints, whereas they would not be if a function prototype were in scope. For example, Standard C declares atof and atoi (and the malloc family) in stdlib.h.

Problems can occur when porting code that uses integral constants as function arguments. For example,

/* no prototype for g is in scope of the following call */

g(60000);

void g(int i) { /* … */ }

This program works properly on a machine with 32-bit ints. But on a 16-bit machine, the actual argument to g will be a long int, while g will be expecting an int, two quite different types.

Recommendation: Take care when passing integral constants as function arguments because the type of such a constant depends on its magnitude and the limits of the current implementation. This kind of problem may be difficult to find if the constant is hidden in a macro (such as NULL). Use casts to make sure argument types match, or call functions in the presence of a prototype.

Standard C permits structures and unions to be passed by value. However, the maximum size of a structure or union that may be passed by value is implementation-defined.

C89 required that an implementation allow at least 31 arguments in a function call. C99 increased that to 127. K&R placed no minimum limit.

Standard C permits pointers to functions to be used to invoke functions using either (*pfunct)() or pfunct(). The latter format makes the call look like a normal function call, although presumably it will cause less sophisticated source cross-reference utilities to assume that pfunct is a function name rather than a function pointer.

Recommendation: When invoking a function via a pointer, use the format (*fp)() rather than fp(), because the latter is a Standard C invention.

Recommendation: A function prototype can be used to alter the argument widening and passing mechanisms used when a function is called. Make sure that the same prototype is in scope for all calls as well as the definition.

Recommendation: Standard C requires that a strictly conforming program always have a prototype in scope (with a trailing ...) when calling a function with a variable number of arguments. Therefore, when using the printf and scanf family of routines, always #include <stdio.h>. If you don't, the behavior is undefined.

While C supports recursion, it is unspecified as to how many levels any function can recurse before stack (or other) resources might be exhausted.

Structure and Union Members

Due to the addition in C89 of structure (and union) argument passing and returning by value and structure (and union) assignment, structure (and union) expressions can exist.

K&R stated that in x->y, x may be either a pointer to a structure (or union) or an absolute machine address. Standard C requires that each structure and union have its own member name space. This requires that the first operand of the . or -> operators must have type structure (or union) or pointer to structure (or union), respectively.

On some machines, the hardware I/O page is mapped into physical memory, so device registers look just like regular memory to any task that can map to this area. To access an offset—a structure member named status, for example—from a specific physical address, previously you could use an expression of the form

0xFF010->status

With each structure and union now having its own member name space, the status member can no longer be accessed in this way. Instead, the physical address must be converted to a structure pointer, so the offset reference is unambiguous, as follows:

((struct tag1 *) 0xFF010)->status
((union tag2 *) 0xFF010)->status

When a union is accessed using a member other than that used to store the immediately previous value, the result is implementation-defined. No assumptions can be made about the degree of overlap of members in a union unless a union contains several structures, each of which has the same initial member sequence. In this special case, members in the common sequence of any of the structures can be inspected provided that the union currently contains one of those structures. For example,

struct rectype1 {
    int rectype;
    int var1a;
};

struct rectype2 {
    int rectype;
    float var2a;
};

union record {
    struct rectype1 rt1;
    struct rectype2 rt2;
} inrec;

If the union currently contains a structure of type rectype1 or rectype2, the particular type being stored can reliably be determined by inspecting either inrec.rt1.rectype or inrec.rt2.rectype. Both members are guaranteed to map to the same area.

Standard C says, “Accessing a member of an atomic structure or union object results in undefined behavior.”

Postfix Increment and Decrement Operators

Some (very old) implementations considered the result of post-increment and post-decrement operator expressions to be modifiable lvalues. This is not recognized by Standard C. Therefore, (i++)++ should generate an error.

Compound Literals

C99 added support for compound literals.

C++ Consideration: Standard C++ does not support compound literals.

Unary Operators

Prefix Increment and Decrement Operators

Some (very old) implementations considered the result of pre-increment and pre-decrement operator expressions to be modifiable lvalues. This is not recognized by Standard C. Therefore, ++(++i) should generate an error.

Address and Indirection Operators

If an invalid array reference (one with a subscript “out of range”), null pointer dereference, or dereference to an object declared with automatic storage duration in a terminated block occurs or if allocated space that has been freed is accessed, the behavior is undefined. Note that depending on how the null pointer is implemented, dereferencing it might cause catastrophic results. For example, on one implementation, an attempt to access a location within the first 512 bytes of an image generates a fatal “access violation.”

In Standard C, the use of the & operator with function names is superfluous.

When the passing of structures and unions by value was added to the language, using & with a structure or union name was no longer superfluous—its absence means “value” and its presence means “pointer to.”

Some implementations accept &bit-field and return the address of the object in which the bit field is packed. This is not permitted by K&R nor supported by Standard C.

Some implementations allow &register-variable, in which case the register class is ignored. This is not permitted by K&R nor supported by Standard C.

Some implementations allow you to take the address of a constant expression under special circumstances, such as in function argument lists. This is not permitted by K&R nor supported by Standard C.

Dereferencing a pointer may cause a fatal run-time error if the pointer was cast from some other pointer type and alignment criteria were violated. For example, consider a 16-bit machine that requires all scalar objects other than char be aligned on word (int) boundaries. As such, if you cast a char pointer containing an odd address to an int pointer, and you dereference the int pointer, a fatal “odd address” trap error will result.

Unary Arithmetic Operators

The unary plus operator was a C89 invention.

Note carefully that when using twos-complement representation for negative integers, negating INT_MIN typically quietly results in the same value, INT_MIN; there simply is no positive equivalent of that value! Strictly speaking, the overflow makes the behavior undefined. (Likewise for LONG_MIN and LLONG_MIN.)

The sizeof Operator

Until C99, the result of sizeof was a compile-time constant. However, starting with C99, if the operand is a variable-length array, the operand is evaluated at runtime.

What is the type of the result produced by sizeof? It would seem reasonable that one could use sizeof to find the size of the largest object an implementation supports, which could be an array of char that is very large, or perhaps an array with a very large number of large structs, for example. Certainly, it seems reasonable that sizeof produce an unsigned integer result, but which?

In very old implementations the type of sizeof was int (which is signed). C89 stated, “its type (an unsigned integral type) is size_t defined in the <stddef.h> header.” (See Common Definitions for more information.)

So how then to display the result using printf? Consider the following, where type is some arbitrary data type:

/*1*/ printf("sizeof(type) = %u\n", (unsigned int)sizeof(type));
/*2*/ printf("sizeof(type) = %lu\n", (unsigned long)sizeof(type));
/*3*/ printf("sizeof(type) = %llu\n", (unsigned long long)sizeof(type));
/*4*/ printf("sizeof(type) = %zu\n", sizeof(type));

Case 1 is portable for sizes up to UINT_MAX, which is guaranteed to be at least 65535; case 2 is portable for sizes up to ULONG_MAX, which is at least 4294967295; case 3 is portable for sizes up to ULLONG_MAX, which is at least 18446744073709551615 (but requires the C99 type long long); and case 4 is maximally portable, provided your implementation supports the length modifier z (introduced in C99).

Recommendation: Always use a prototype when calling functions that expect size_t type arguments so that the arguments you supply can be implicitly converted by the prototype if necessary. However, because the function prototype for printf contains an ellipsis for the trailing arguments, no implicit conversion can be specified for them.

The _Alignof Operator

This was added by C11, which stated, “its type (an unsigned integral type) is size_t defined in the <stddef.h> header.”

The header <stdalign.h> contains a macro called alignof that expands to _Alignof.

C++ Consideration: The equivalent (but different) keyword added in C++11 is alignof, which Standard C defines as a macro in <stdalign.h>.

Cast Operators

The result of casting a pointer to an integer or vice versa (except for the value zero) is implementation-defined as is the result of casting one pointer type to a pointer type of more strict alignment.

For a detailed discussion on the conversions allowed between dissimilar data pointers and dissimilar function pointers, refer to Pointers.

C11 added the restriction that a pointer type cannot be converted to any floating type, and vice versa.

Explicit casting may be necessary to get the right answer because of “unsigned preserving” versus “value preserving” conversion rules (see Boolean, Characters, and Integers).

A number of the Standard C library functions return void * values, and this is reflected in their corresponding prototypes. As void * is compatible with all other data pointer types, you will not need to explicitly cast the value returned.

C++ Consideration: Converting from void * to a data pointer type requires an explicit cast.

Using an elaborate series of casts, it is possible to write a “fairly” portable expression that will produce the offset (in bytes) of a specific member of a structure. However, this might not work on some implementations, particularly those running on word architectures. For example:

#define OFFSET(struct_type, member) \
((size_t)(char *) &((struct_type *)0)->member)

Recommendation: Standard C provides the macro offsetof (in <stddef.h>) to portably find the offset of a member within a structure. This macro should be used instead of any home-grown mechanism, where possible.

Do not assume that zero cast to a pointer type results in a value that has all-bits zero. However, a pointer with the value 0 (produced either by assignment or casting) must compare equal to zero.

Multiplicative Operators

According to Standard C, integer and floating-point division can result in undefined behavior. C89 introduced some implementation-defined behavior in the integer-division case, but that was removed in C99.

Additive Operators

If an integer is added to or subtracted from a pointer that is not pointing to a member of an array object (or to the non-existent element one beyond the end), the result is undefined. Standard C permits an integer to be subtracted from a pointer pointing to the element immediately beyond the last element in an array, provided the resultant address maps into the same array.

The length of the integer required to hold the difference between two pointers to members of the same array is implementation-defined. Standard C provides the type synonym ptrdiff_t to represent the type of such a value. This signed integral type is defined in <stddef.h>.

Bitwise Shift Operators

The result of a shift by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted is undefined.

For a left shift, if the left operand is signed and has a nonnegative value, and left-operand × 2^right-operand is not representable in the result type, the behavior is undefined.

For a right shift, if the left operand is signed and has a negative value, the resulting value is implementation-defined.

The widening rules of unsigned preserving and value preserving can cause different results with the >> operator. With UP rules, (unsigned short + int) >> 1 is the same as dividing by 2, whereas with VP, it is not because the type of the expression to be shifted is signed.

Relational Operators

If you compare pointers that are not pointing to the same aggregate, the result is undefined. “Same aggregate” means members of the same structure or elements in the same array. Notwithstanding this, Standard C endorses the widespread practice of allowing a pointer to be incremented one place beyond an object.

The widening rules of unsigned preserving and value preserving can cause different results with the >, >=, < and <= operators.

Equality Operators

A pointer may be compared to 0. However, the behavior is implementation-defined when a nonzero integral value is compared to a pointer.

Structures and unions may not be compared except by member. Depending on the presence of, and contents of, holes, structures might be able to be compared for equality using the library function memcmp.

Take care when using these operators with floating-point operands because most floating-point values can be stored only approximately.

Bitwise AND Operator

By their very nature, the values of bit masks might depend on the size/representation of integers.

Bitwise Exclusive OR Operator

By their very nature, the values of bit masks might depend on the size/representation of integers.

Bitwise Inclusive OR Operator

By their very nature, the values of bit masks might depend on the size/representation of integers.

Logical AND Operator

Standard C defines a sequence point between the evaluations of the first and second operands.

Logical OR Operator

Standard C defines a sequence point between the evaluations of the first and second operands.

Conditional Operator

Standard C defines a sequence point between the evaluation of the first operand and the evaluation of the second or third operand (whichever is evaluated).

Assignment Operators

Simple Assignment

Assigning a zero-valued integer constant expression to any pointer type is portable, but assigning any other arithmetic value is not.

The effect of assigning one pointer type to a more strictly aligned pointer type is implementation-defined.

Standard C requires an explicit cast to assign a pointer of one object type to a pointer of another object type. (A cast is not needed when assigning to or from a void pointer.)

Standard C permits a structure (or union) to be assigned only to a like typed structure (or union).

If an object is assigned to an overlapping object, the result is undefined. (This might be done with different members of a union, for example.) To assign one member of a union to another, go through a temporary variable.

Compound Assignment

Assignment operators of the (very old) form =op are not supported by Standard C. (K&R hinted that they were already archaic back in 1978.)

The result of the following expression is unpredictable because the order of evaluation of operands is undefined.

x[i] = x[i++] + 10;

This can be resolved by using a compound assignment operator, as follows:

x[i++] += 10;

because you are guaranteed that the left-hand operand is evaluated only once.

Comma Operator

Standard C defines a sequence point between the evaluations of the first and second operands.

Constant Expressions

Static initializer expressions are permitted to be evaluated during program startup rather than at compile-time.

The translation environment must use at least as much precision as the execution environment. If it uses more, a static value initialized at compile-time may have a different value than if it were initialized during startup on the target machine.

C89 introduced float, long double, and unsigned integral constants. C99 introduced signed/unsigned long long integral constants.

C99 added support for floating-point constants with binary exponents.

Standard C permits an implementation to support forms of constant expression beyond those defined by the standard (to accommodate other/extended types). However, compilers differ in their treatment of those constant expressions: some are treated as integer constant expressions.

Declarations

Perhaps the biggest impact C89 had on the C language was in the area of declarations. New type-related keywords were added along with terminology to classify them. The most significant aspect, from the programmer's viewpoint, was the adaptation of function prototypes (that is, new-style function declarations) from C++.

Ordering of Declaration Elements

A declaration may contain one or more of the following elements: storage-class specifier, type specifier, type qualifier, function specifier, and alignment specifier. Standard C permits these to be in any order; however, it does require any identifier list to come at the right end. As such,

static const unsigned long int x = 123;

can be rewritten as

int long unsigned const static x = 123;

or in any other combination, so long as x and its initializer come at the end. Similarly,

typedef unsigned long int uType;

can be rewritten as

int long unsigned typedef uType;

Some older compilers might require a specific order. It is debatable whether K&R permitted arbitrary ordering of type specifiers. The grammar on page 192 of K&R indicates that they are supported, but on page 193, it states, “the following [type specifier] combinations are acceptable: short int, long int, unsigned int and long float.” It is unclear whether this should be taken as explicitly disallowing int short, int long, etc.

Position within a Block

Prior to C99, at block scope, all declarations were required to precede all statements. However, that restriction was lifted in C99, which allowed them to be interspersed. C++ also allows this.

Storage-Class Specifiers

The auto Storage Class

It is rare to see auto actually used in code, as Standard C local variables without an explicit storage class default to auto storage class.

The method used to allocate, and the amount of storage available for, automatic variables is up to the implementation. Implementations that use a stack (or other) approach may place limits on the amount of space available for auto objects. For example, 16-bit machines may limit the stack to 64 KB or, if the entire address space is 64 KB, the sum of code, static data, and stack might be 64 KB. In that case, as the size of the code or static data grows, the stack size decreases, perhaps to the point where sufficient auto space cannot be allocated.

Some implementations check for the possibility of stack overflow when each function is entered. That is, they check the amount of stack space available before allocating that required for the function. And if insufficient space is available, they terminate the program. Some implementations actually call a function to perform the check, in which case, each time you call one of your functions having automatic class variables, you are implicitly calling another function as well.

Recommendation: On implementations that “probe the stack” each time a function is called, there might be a compile-time switch allowing such checking to be disabled. While the disabling of such checking can increase the amount of stack available, possibly to the extent of allowing a program to run when it wouldn't otherwise, it is strongly suggested you not do so during testing.

Consider the following auto declarations:

int i, values[10];
int j, k;

The location in memory of these four variables relative to each other is unspecified and can change between compilations on the same or different systems. However, we are guaranteed that the 10 elements in array values are contiguous, with addresses in ascending order.

Recommendation: Do not rely on an implementation to have a particular auto space allocation scheme. In particular, don't rely on auto variables being allocated space in exactly the same order in which they are declared.

C++ Consideration: C++11 discontinued support for auto as a storage class specifier, and gave it new semantics, as a type specifier.

The register Storage Class

The register storage-class is a hint to the implementation to place the object where it can be accessed as “fast as possible.” Such a location is typically a machine register. The number of register objects that can actually be placed in registers and the set of supported types are implementation-defined. An object with storage class register that cannot be stored in a register, for whatever reason, is treated as though it had storage class auto. Standard C permits any data declaration to have this storage class. It also allows this storage class to be applied to function parameters.

K&R stated “... only variables of certain types will be stored in registers; on the PDP-11, they are int, char and pointer.”

Recommendation: Given the advances in compiler optimization technology, the value of the register storage class on hosted implementations has largely evaporated. (In fact, this was predicted in K&R, which stated, “... future improvements in code generation may render [register declarations] unnecessary.”) It is, therefore, suggested you not use them at all, unless you can prove they are providing some value for one or more of your target implementations.

Standard C does not permit an implementation to widen the allocation space for a variable with the register storage class. That is, a register char cannot be treated as if it were register int. It must behave in all ways as a char, even if it is stored in a register whose size is wider than a char. (Some implementations can actually store more than one register char object in the same register.)

C++ Consideration: While support for this storage class existed through C++14, its use was deprecated. In C++17, the keyword register is unused, but it is reserved for future use (presumably with different semantics).

The static Storage Class

A problem can occur when trying to forward-reference a static function, as follows:

void test()
{
    static void g(void);
    void (*pfi)(void) = &g;

    (*pfi)();
}

static void g()
{
    /* … */
}

Function test has a block scope declaration in which g is declared to be a static function. This allows test to call the static function g rather than any extern function by the same name. Standard C does not permit such declarations. It does, however, allow function declarations with file scope to have storage class static, as follows:

static void g(void);

void test()
{
    void (*pfi)(void) = &g;

    (*pfi)();
}

Recommendation: Do not use the static storage class on a block scope function declaration even if your compiler allows it.

The _Thread_local Storage Class

This was added by C11. (See <threads.h>.)

C++ Consideration: The equivalent (but different) keyword added in C++11 is thread_local, which Standard C defines as a macro in <threads.h>.

Here’s how to determine if your compiler supports thread-local storage duration:

#ifdef __cplusplus /* Are we using a C++ compiler? */
    #if __cplusplus >= 201103L
        /* we have a C++ compiler that supports thread_local storage duration */
    #else
        #error "This C++ compiler does not support thread_local storage duration"
    #endif
#else /* we're using a C compiler */
    /* need C11 or later, and __STDC_NO_THREADS__ must NOT be defined */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
        && !defined(__STDC_NO_THREADS__)
        /* we have a C compiler that supports thread_local storage duration */
    #else
        #error "This C compiler does not support thread_local storage duration"
    #endif
#endif

Type Specifiers

C89 added the following keywords for use in type specifiers: enum, signed, and void. These gave rise to the following base-type declarations:

void /* function returning, only */
signed char
signed short
signed short int
signed
signed int
signed long
signed long int
enum [tag] /* … */

C89 also added support for the following new type declarations (some implementations had already supported unsigned char and unsigned long):

unsigned char
unsigned short
unsigned short int
unsigned long
unsigned long int
long double

C99 added support for the following:

signed long long
signed long long int
unsigned long long
unsigned long long int

Standard C states that whether a plain char (one without the signed or unsigned modifier) is treated as signed or unsigned is implementation-defined.

While K&R permitted long float to be a synonym for double, this practice is not supported by Standard C.

Prior to C99, a type specifier could be omitted with int being assumed; for example, in the file-scope declarations

i = 1;
extern j = 1;

C99 prohibited this.

C99 added support for a Boolean type via the type specifier _Bool. (See <stdbool.h>, which includes a workaround if this header is not available.)

C++ Consideration: The equivalent (but different) keyword in Standard C++ is bool, which Standard C defines as a macro in <stdbool.h>.

C99 added the type specifier _Complex, which gave rise to the types float _Complex, double _Complex, and long double _Complex. (See <complex.h>.)

C11 added the type specifier _Atomic, but made it optional; see the conditionally defined macro __STDC_NO_ATOMICS__ in Conditionally Defined Standard Macros. (See <stdatomic.h>.)

Representation, Size, and Alignment

The macros in <limits.h> and <float.h> define the minimum range and precision for the arithmetic types. Standard C requires the following:

  • _Bool – large enough to store the values 0 and 1

  • char – at least 8 bits

  • short int – at least 16 bits

  • int – at least 16 bits

  • long int – at least 32 bits

  • long long int – at least 64 bits

  • The range and precision of float must be less than or equal to that for double, which in turn, must be less than or equal to that for long double. All three types could have the same size and representation, all different, or some overlap thereof.

For integer values, a conforming implementation is permitted to use ones-complement, twos-complement, or signed magnitude representation. The minimum limits for signed integer types allow ones-complement. Although an implementation having 32-bit longs and using twos-complement can conform by defining LONG_MIN to have the value -2147483647, it is not unreasonable to expect it would instead use a value of -2147483648, to accurately reflect the type’s twos-complement nature.

Type float is often represented using 32-bit single precision, type double as 64-bit double precision, and type long double also as 64-bit double precision. However, on systems having a separate, extended precision, long double may be mapped to 80 or 128 bits.

Note carefully that it may be unreasonable to expect identical results from floating-point computations from a program even when it runs on multiple processors having the same size and representation of floating-point types (such as occurs with multiple IEEE-based systems). For example, on early Intel floating-point processors, all calculations were done in 80-bit extended mode, which can result in different values than if two doubles were added using strict (64-bit) double mode. Rounding modes also come into play.

Recommendation: With regard to floating-point calculations, set reasonable expectations for the reproducibility of results across different floating-point hardware and software libraries.

Standard C does not require that sizeof be recognized as an operator in preprocessor #if arithmetic expressions.

Conditionally compiling based on machine word size is common. Code that does so by testing sizeof in an #if directive assumes, perhaps, that if it isn't running on a 16-bit system, it's on a 32-bit machine. To achieve the same result, you must now use something like

#include <limits.h>

#if INT_MAX < LONG_MAX
    long total;
#else
    int total;
#endif

The sizeof compile-time operator reports the number of chars-worth of memory that are occupied by an object of some given data type. If we multiply this by the <limits.h> macro CHAR_BIT we find the number of bits allocated. However, not all bits allocated for an object need be used to represent that object’s value! Following are some examples that demonstrate this:

Case 1: Early machines from Cray Research used a 64-bit, word-addressable architecture. When a short int was declared, although 64 bits were allocated (sizeof(short) resulted in 8), only 24 or 32 bits were actually used to represent the short’s value.

Case 2: Intel floating-point processors support 32-bit single precision, 64-bit double precision, and 80-bit extended precision. As such, a compiler targeting that machine might map float, double, and long double, respectively, to these three representations. If so, one might assume that sizeof(long double) would be 10, and that might be true. However, for performance reasons, a compiler might choose to align such objects on 32-bit boundaries, resulting in 12 bytes being allocated, with two of them going unused.

Case 3: During the deliberations of C89, the issue arose as to whether integer types required a binary representation, and the committee decided that they did. As such, the description was written something like “… each physically adjacent bit represents the next-highest power of two.” However, a committee member reported that his company had a 16-bit processor on which when two 16-bit words were used to represent a long int, the high bit of the low word was not used. Essentially, there was a 1-bit hole in the middle, and shifting left or right took that into account! (Even though 31 bits is insufficient to represent a long int in Standard C, the implementation in question was a viable one for applications targeting its intended market.)

Case 4: For alignment purposes, holes (unused bits, that is) might occur in a structure between fields or after the final field, and inside containers of bit-fields.

Recommendation: Do not assume or hard code the size of any object type; obtain that size using sizeof, and use the macros in <limits.h>, <float.h>, and <stdint.h>, as appropriate.

Recommendation: Do not assume that the unused bits allocated to an object have a specific/predictable value.

Although it might be common for all data and function pointer types to have the same size and representation, which might also be that of an integer type, that is not required by Standard C. Some machines used addresses that look like signed integers, in which case, address zero is in the middle of the address space. (On such machines, the null pointer likely will not have a value of “all-bits-zero.”) On some segmented-memory architectures, both near (16-bit) and far (32-bit) pointers might be supported. What Standard C requires is that all data pointer values can be represented by the type void *; it makes no such guarantee for function pointers.

Recommendation: Unless you have a very specialized application, assume that every pointer type has a unique representation, which is different to that of any integer type, and don’t assume that the null value for any pointer type has a value of “all-bits-zero.”
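
The following sketch shows one way to follow this advice when initializing a structure that contains pointers. The type struct node and the function node_init are hypothetical names used only for illustration.

```c
#include <stddef.h>

struct node {
    struct node *next;
    void (*on_visit)(struct node *);
    int value;
};

/* Assign null pointers explicitly. Calling memset(n, 0, sizeof *n)
   would store all-bits-zero, which need not be a null pointer. */
void node_init(struct node *n, int value)
{
    n->next = NULL;
    n->on_visit = NULL;
    n->value = value;
}
```

Assigning NULL lets the compiler supply whatever representation a null pointer has for each pointer type, data or function.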

Some programs inspect and perhaps manipulate the bits in an object by creating a union of it with some integer type. Obviously, this relies on implementation-defined behavior. According to Standard C, with respect to such type punning, “One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.”
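
The common-initial-sequence guarantee can be sketched as follows. The type and function names are invented for this example; the only part relied upon is that both structures begin with members of the same types.

```c
enum shape_kind { KIND_CIRCLE, KIND_RECT };

struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Both structures begin with an int member, so it may be inspected
   through either one while the union type is visible. */
int shape_kind_of(const union shape *s)
{
    return s->c.kind;
}

double shape_area(const union shape *s)
{
    switch (shape_kind_of(s)) {
    case KIND_CIRCLE: return 3.141592653589793 * s->c.radius * s->c.radius;
    case KIND_RECT:   return s->r.width * s->r.height;
    default:          return 0.0;
    }
}
```

Only the shared leading member is read through the "wrong" structure; the remaining members are always read through the structure that was actually stored.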

Structure and Union Specifiers

While K&R did not restrict the types that could be used with bit-fields, C89 allowed int, unsigned int and signed int only, and stated, “Whether the high-order bit position of a “plain” int bit-field is treated as a sign bit is implementation-defined.”

C99 states, “A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.” C11 added, “It is implementation-defined whether atomic types are permitted.”

K&R required that consecutive bit-fields be packed into machine integers and that they not span word boundaries. Standard C declares that the container object in which bit-fields are packed is implementation-defined. The same is true for whether bit-fields span container boundaries. Standard C lets the order of allocation of bit-fields within a container be implementation-defined.
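
When the exact position of each field matters (for example, in a file format or a hardware register), one common alternative to bit-fields is to pack the values with shifts and masks. The layout and names below are assumptions made for this sketch, not something prescribed by the Standard.

```c
/* Layout chosen for this sketch: bits 0-5 hold a, bits 6-9 hold b. */
#define FIELD_A_BITS 6u
#define FIELD_A_MASK 0x3Fu
#define FIELD_B_MASK 0x0Fu

unsigned int pack_fields(unsigned int a, unsigned int b)
{
    return (a & FIELD_A_MASK) | ((b & FIELD_B_MASK) << FIELD_A_BITS);
}

unsigned int unpack_a(unsigned int packed)
{
    return packed & FIELD_A_MASK;
}

unsigned int unpack_b(unsigned int packed)
{
    return (packed >> FIELD_A_BITS) & FIELD_B_MASK;
}
```

Because the shifts and masks operate on values rather than on storage, the resulting bit positions do not depend on the container, ordering, or spanning choices an implementation makes for bit-fields.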

Standard C permits bit-fields to exist in unions without their first being declared as part of a structure, as follows:

union tag {
    int i;
    unsigned int bf1 : 6;
    unsigned int bf2 : 4;
};

K&R required that all the members in a union “begin at offset 0.” Standard C spells it out even more precisely by saying that a pointer to a union, suitably cast, points to each member, and vice versa. (If any of the members is a bit-field, the pointer points to the container in which that bit-field resides.)

C99 added support for flexible array members. C11 added support for anonymous structures and anonymous unions.

Enumeration Specifiers

Enumerated types were not part of K&R; however, some compilers implemented them well before C89 existed.

According to the C Standard, “Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration.”

Recommendation: Do not assume that an enumerated type is represented as an int—it may be any integral type.

Note that Standard C requires enumeration constants to have type int. Therefore, the type of an enumerated data object need not be the same as that of its members.
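
The distinction can be observed with a small sketch, assuming a C11 compiler (for _Generic). The function names are hypothetical.

```c
#include <stddef.h>

enum Color { red, white, blue };

/* _Generic selects on the type of its operand. The enumeration
   constant red has type int, so the first association is chosen. */
int constant_has_type_int(void)
{
    return _Generic(red, int: 1, default: 0);
}

/* The enumerated type itself may be narrower or wider than int, so
   its size is taken from sizeof rather than assumed. */
size_t color_size(void)
{
    return sizeof(enum Color);
}
```

On a given implementation, color_size() may or may not equal sizeof(int); code that writes such objects to files or passes them across library boundaries should not depend on either outcome.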

C99 added support for a trailing comma after an enumerator list, as in

enum Color { red, white, blue, };

C++ Consideration: Standard C++ extends enumerated types by allowing them to be given a specific base type (representation, that is), and by restricting the scope of an enumeration’s constants to just that enumeration type.

Atomic Type Specifiers

These are not permitted unless the implementation supports atomic types. An implementation that does not support them defines the conditionally defined macro __STDC_NO_ATOMICS__ as the integer constant 1, so test for that macro before using atomic types. (See <stdatomic.h>.)
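
A minimal sketch of such a test follows. The names counter_t, HAVE_ATOMICS, and record_hit are hypothetical, and the non-atomic fallback is only suitable for single-threaded use.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
    && !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
typedef atomic_int counter_t;
#define HAVE_ATOMICS 1
#else
typedef int counter_t;        /* fallback: not safe across threads */
#define HAVE_ATOMICS 0
#endif

static counter_t hits;        /* static storage: starts at zero */

/* Increment the counter and return its new value. */
int record_hit(void)
{
#if HAVE_ATOMICS
    return atomic_fetch_add(&hits, 1) + 1;
#else
    return ++hits;
#endif
}
```

Testing the language version as well as __STDC_NO_ATOMICS__ guards against pre-C11 compilers, which define neither the macro nor the header.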

The _Atomic type specifier has the form _Atomic ( type-name ), and is not to be confused with the _Atomic type qualifier (Type Qualifiers), which involves just that name only.

C++ Consideration: C++ does not support _Atomic. However, it does define header <atomic>, which gives access to various kinds of atomic-related support.

Type Qualifiers

C89 added the const type qualifier, borrowing it from C++. C89 also added the volatile type qualifier.

Attempting to modify a const object by means of a pointer to a type without the const qualifier results in undefined behavior.

Recommendation: Do not attempt to modify a const object by means of a pointer to a type without the const qualifier.

Attempting to reference a volatile object by a pointer to a type without the volatile qualifier results in undefined behavior.

Recommendation: Do not access a volatile object by means of a pointer to a type without the volatile qualifier.

C99 added the restrict type qualifier, and applied it to various library functions, as appropriate.

C++ Consideration: C++ does not support restrict.

C11 added the type qualifier _Atomic, which is not to be confused with the _Atomic type specifier (Atomic Type Specifiers).

Function Specifiers

C99 added the function specifier inline. This is a suggestion to the compiler, and the extent to which that hint is followed is implementation-defined. Prior to C99, some compilers supported this capability via the keyword __inline__.

Standard C permits both an inline definition and an external definition for a function, in which case, it is unspecified whether a call to the function uses the inline definition or the external definition.

C11 added the function specifier _Noreturn. It also provided the header <stdnoreturn.h>, which contains a macro called noreturn that expands to _Noreturn.
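
As an illustration, assuming a C11 implementation that provides <stdnoreturn.h>, a function that always terminates the program might be declared as follows. The functions fatal and checked_divide are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>

/* fatal never returns to its caller; noreturn expands to _Noreturn. */
noreturn void fatal(const char *message)
{
    fprintf(stderr, "fatal: %s\n", message);
    exit(EXIT_FAILURE);
}

int checked_divide(int numerator, int denominator)
{
    if (denominator == 0)
        fatal("division by zero");
    return numerator / denominator;
}
```

The specifier tells the compiler that control cannot continue past the call to fatal, which can help it diagnose unreachable code and avoid spurious warnings about missing return values.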

C++ Consideration: The equivalent (but different) approach to _Noreturn added in C++11 is the attribute noreturn.

Alignment Specifier

C11 added support for alignment specifiers using the keyword _Alignas.

The header <stdalign.h> contains a macro called alignas that expands to _Alignas.

C++ Consideration: The equivalent (but different) keyword added in C++11 is alignas.

Standard C states that, “If declarations of an object in different translation units have different alignment specifiers, the behavior is undefined.”
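
A small sketch of an alignment specifier, assuming C11 and the optional type uintptr_t, is shown below. Converting a pointer to an integer is implementation-defined, so the check in buffer_is_aligned is itself an assumption about how addresses map to integers; the names are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Request 16-byte alignment for a raw byte buffer. */
static alignas(16) unsigned char buffer[64];

/* Returns 1 if the buffer's address, viewed as an integer, is a
   multiple of 16 on this implementation. */
int buffer_is_aligned(void)
{
    return (uintptr_t)buffer % 16 == 0;
}

/* The strictest fundamental alignment the implementation supports. */
size_t max_fundamental_alignment(void)
{
    return alignof(max_align_t);
}
```

If the same object is declared in several source files, the alignment specifier belongs in a shared header so that every declaration agrees.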

Declarators

General Information

Both K&R and Standard C treat a declarator in parentheses as equivalent to one without. For example, the following is syntactically correct.

void f()
{
    int (i);
    int (g)();
    
}

The second declaration may be used to hide the function declaration from a macro with arguments that has the same name as that function.
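
To make this concrete, the sketch below defines both a function and a function-like macro called square (a name chosen for this example). A function-like macro is expanded only when its name is followed by a left parenthesis, so parenthesizing the name selects the function.

```c
#define square(x) ((x) * (x))   /* function-like macro */

/* The parentheses around the name stop the macro from expanding,
   so this defines a real function called square. */
int (square)(int x)
{
    return x * x;
}

int call_function_version(int x)
{
    return (square)(x);          /* calls the function */
}

int call_macro_version(int x)
{
    return square(x);            /* expands the macro */
}
```

The same technique applies to library functions that an implementation also provides as macros.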

Standard C requires that a declaration support at least 12 pointer, array, and function derived declarators modifying a base type. For example, ***p[4] has four modifiers. K&R gave no limit except to say that multiple type modifiers may be present. (The original Ritchie compiler supported only six type modifiers in a declarator.)

Standard C requires that an array dimension have a positive, nonzero value. That is, an array may not have size zero, as permitted by some implementations.

Array Declarators

Standard C permits array declarations to be incomplete by omitting the size information as follows:

extern int i[];
int (*pi)[];

However, the use of such objects is restricted until size information is made available. For example, sizeof(i) and sizeof(*pi) are unknown and should generate an error.

C99 added the ability to have type qualifiers and the keyword static in a declaration of a function parameter with an array type.

C++ Consideration: Standard C++ does not support these things in array-type declarators.

C99 added support for variable-length arrays (VLAs) and required such support. However, C11 made VLAs conditional; see the conditionally defined macro __STDC_NO_VLA__ in Conditionally Defined Standard Macros.
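
One way to cope with VLAs being optional is sketched below, with a fallback to malloc when they are unavailable. The macro HAVE_VLA and the function sum_first_n are hypothetical, and n is assumed to be small enough for automatic storage.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L \
    && !defined(__STDC_NO_VLA__)
#define HAVE_VLA 1
#else
#define HAVE_VLA 0
#endif

/* Fill a scratch array with 1..n and return the sum of its elements,
   or -1 if memory cannot be obtained. */
long sum_first_n(size_t n)
{
    long total = 0;
    size_t i;
#if HAVE_VLA
    long scratch[n ? n : 1];            /* variable-length array */
#else
    long *scratch = malloc((n ? n : 1) * sizeof *scratch);
    if (scratch == NULL)
        return -1;
#endif
    for (i = 0; i < n; ++i)
        scratch[i] = (long)(i + 1);
    for (i = 0; i < n; ++i)
        total += scratch[i];
#if !HAVE_VLA
    free(scratch);
#endif
    return total;
}
```

In code that must also compile as C++, or where array sizes can be large, using the malloc path unconditionally is the simpler choice.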

C++ Consideration: Standard C++ does not support variable-length arrays.

Function Declarators

Calling Non-C Functions

Some implementations allow a fortran type specifier (extension) to be used in a function declaration to indicate that function linkage suitable for Fortran (call by reference) is to be generated or that different representations for external names are to be generated. Others provide pascal and cdecl keywords for calling Pascal and C routines, respectively. Standard C does not provide any mechanism for linking to code written in other languages.

C++ Consideration: Standard C++ defines an extern "C" linkage.

Function Prototypes

Borrowing from C++, C89 introduced a new way of declaring and defining a function, which places the parameter information inside the parameter list. This approach uses what is colloquially called a function prototype. For example, what used to be written as

int CountThings(); /* declaration with no parameter information */

int CountThings(table, tableSize, value) /* definition */
char table[][20];
int tableSize;
char* value;
{
    /* … */
}

can now be written as

/* function prototype – function declaration with parameter information */

int CountThings2(char table[][20], int tableSize, char* value);

int CountThings2(char table[][20], int tableSize, char *value) /* definition */
{
    /* … */
}

Standard C continues to support the old style.

C++ Consideration: Standard C++ requires function prototype notation.

While you may well have production source code using old-style function definitions, these can co-exist with new-style function prototypes. The only potential catch is with narrow types. For example, an old-style definition having parameters of type char, short, or float would expect arguments passed in their wider forms, int, int, and double, respectively, which might not be the case if a prototype were in scope containing the narrow types.

Recommendation: Whenever possible, use function prototypes, as they can make sure functions are called with the correct argument types. Prototypes can also perform conversion of arguments. For example, calling f(int *p) without a prototype in scope, and passing in 0, does not cause that zero to be converted to int *, which on some systems, might cause a problem.
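
The conversion of a zero argument can be sketched as follows. The function names are invented for this example.

```c
#include <stddef.h>

int is_null(int *p);     /* prototype: arguments are converted to int * */

int is_null(int *p)
{
    return p == NULL;
}

int demo_null_argument(void)
{
    int value = 42;

    /* With the prototype in scope, the constant 0 is converted to a
       null pointer of type int * before the call is made. */
    return is_null(0) && !is_null(&value);
}
```

Without the prototype, the 0 would be passed as an int, which may differ from a null int * in size or representation.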

Standard C requires that all calls to functions having a variable argument list be made only in the presence of a prototype. Specifically, the following well-known program from K&R is not a conforming program:

main()
{
    printf("hello, world\n");
}

The reason for this is that, in the absence of a prototype, the compiler is permitted to assume that the number of arguments is fixed. Therefore, it may use registers or some other (presumably) more efficient method of passing arguments than it would otherwise use. Clearly, the printf function is expecting a variable argument list. Typically, it would not be able to communicate properly with code calling printf, if the calling code were compiled with the fixed list assumption. To correct the above example, you must either #include <stdio.h> (the preferred approach) or explicitly write a prototype for printf (including the trailing ellipsis) in the example prior to the function's use. [The function should also be given an explicit return type of int.]

Recommendation: Always have a prototype in scope when calling a function having a variable argument list. Make sure the prototype contains the ellipsis notation.
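
The same rule applies to variadic functions you write yourself. In this sketch, sum_ints is a hypothetical function whose prototype, ellipsis included, would normally live in a header seen by every caller.

```c
#include <stdarg.h>

/* The prototype, including the ellipsis, must be visible to callers. */
int sum_ints(int count, ...);

/* Add up count arguments, each of which must have type int. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);
    for (i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) is then compiled using the calling convention for variable argument lists, which is what the definition expects.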

It is permitted to have a dummy identifier name in prototype declarators; however, using them can cause problems as the following program demonstrates:

#define test 10

int f(int test);

Although the scope of the identifier test in the prototype begins at its declaration and ends at the end of the prototype, that name is seen by the preprocessor. Consequently, it is replaced by the constant 10, thus generating a syntax error. Even worse, if the macro test were defined as *, the prototype would be quietly changed from having a parameter of int to one of a pointer to an int. A similar problem can occur if your implementation's standard headers use dummy names that are part of the programmer's name space (i.e., without leading underscores).

Recommendation: If you must put identifiers in prototypes, name them so that they won't conflict with macro names. This can be avoided if you always spell macros in uppercase and all other identifiers in lowercase.

The declaration int f(); tells the compiler that f is a function returning int, but no information is known about the number and type of its parameters. On the other hand, int f(void); indicates there are no parameters.

C++ Consideration: In C++, the declarations int f(); and int f(void); are equivalent.

Initialization

If the value of an uninitialized object that has automatic storage duration is used before a value is assigned, the behavior is undefined.

External and static variables not explicitly initialized are assigned the value of 0, cast to their type. (This may differ from the area allocated by calloc, which is initialized to all-bits-zero.)

K&R did not allow automatic arrays, structures, and unions to be initialized. Standard C does, however, provided the initializing expressions in any initializer list are constant expressions, and no variable-length arrays are involved. An automatic structure or union can also be initialized with a (nonconstant) expression of the same type.

Standard C permits a union to be initialized explicitly. The value is stored in the union by casting it to the type of the first member specified, so member declaration order can be important! Using this rule, we see that if a static or external union is not explicitly initialized, it contains 0 cast into the first member (which may not result in all-bits-zero, as stated above).

Standard C permits automatic structures and unions to have initializers that are struct or union valued expressions.

Standard C permits bit-fields to be initialized. For example,

struct {
    unsigned int bf1 : 5;
    unsigned int bf2 : 5;
    unsigned int bf3 : 5;
    unsigned int bf4 : 1;
} bf = {1, 2, 3, 0};

K&R and Standard C require that the number of expressions in an initializer be less than or equal to the number expected, but never more. There is one case, however, where it is possible to specify implicitly one too many, yet not get a compilation error. For example,

char text[5] = "hello";

Here, the array text is initialized with the characters h, e, l, l, and o and does not contain a trailing '\0'.
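
Such an array must be handled by size, not as a string. The following sketch (with hypothetical function names) shows one safe way to inspect it.

```c
#include <stddef.h>
#include <string.h>

static const char text[5] = "hello";   /* five characters, no '\0' */

/* Compare using the array's size. strlen(text) or strcmp would read
   past the end of the array looking for a terminator. */
int text_is_hello(void)
{
    return memcmp(text, "hello", sizeof text) == 0;
}

size_t text_size(void)
{
    return sizeof text;
}
```

Note that C++ rejects this initialization, so code intended to compile as both languages should leave room for the terminator.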

Some implementations allow a trailing comma in an initialization list. This practice is endorsed by Standard C, and was permitted by K&R.

C99 added support for designated initializers.
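
A brief sketch of the three common forms, assuming a C99 compiler, follows. All of the type and function names are invented for this example.

```c
struct point { int x, y, z; };

union number { int i; double d; };

/* Members that are not named are initialized as if they had static
   storage duration, so p.y is 0. */
static const struct point p = { .z = 3, .x = 1 };

/* Array elements can be designated by index; the array has 5 elements. */
static const int sparse[] = { [4] = 40, [1] = 10 };

/* A designator lets a union be initialized through a member other
   than its first. */
static const union number n = { .d = 2.5 };

int point_sum(void)        { return p.x + p.y + p.z; }
int sparse_length(void)    { return (int)(sizeof sparse / sizeof sparse[0]); }
double number_value(void)  { return n.d; }
```

Designators also make an initializer independent of member declaration order, which removes one source of breakage when a structure definition is later rearranged.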

C++ Consideration: Prior to C++20, Standard C++ did not support designated initializers. C++20 added a more restricted form of them.

Static Assertions

C11 added support for static assertions. It also added to the header <assert.h> a macro called static_assert that expands to _Static_assert.
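
Static assertions are a convenient way to record the portability assumptions a program makes, so that a port to an unsuitable implementation fails at compile time rather than at run time. The sketch below assumes C11; struct header and header_bits are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

struct header {
    unsigned char version;
    unsigned char flags;
};

/* Checked at compile time; translation fails with the given message
   if an assumption does not hold on the target implementation. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(struct header) >= 2, "header must hold two bytes");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long needs at least 32 bits");

unsigned int header_bits(void)
{
    return (unsigned int)(sizeof(struct header) * CHAR_BIT);
}
```

The first assertion deliberately documents a non-portable assumption; the other two hold on any conforming implementation and are shown only for form.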

External Definitions

Matching External Definitions and Their Declarations

While K&R defined a model to define and reference external objects, numerous other models were also employed, and this led to some confusion. These models are described in subordinate sections below.

Standard C adopted a model that is a combination of the strict ref/def and initialization models. This approach was taken to accommodate as wide a range of environments and existing implementations as possible.

Standard C states that if an identifier with external linkage has incompatible declarations in two source files, the behavior is undefined.

Some implementations cause object modules to be loaded into an executable image simply if one or more of the external identifiers defined in them are declared in user code yet are not actually used. Standard C states that if an identifier with external linkage is not used in an expression, then there need be no external definition for it. That is, you can't force an object to be loaded simply by declaring it!

The Strict ref/def Model

/* source file 1   source file 2 */

int i;             extern int i;
int main()         void sub()
{                  {
    i = 10;            /* … */
    /* … */        }
}

With this model, the declaration of i may occur once and only once without the keyword extern. All other references to that external must have the keyword extern. This is the model specified by K&R.

The Relaxed ref/def Model

/* source file 1   source file 2 */

int i;             int i;
int main()         void sub()
{                  {
    i = 10;            /* … */;
    /* … */        }
}

In this case, neither declaration of i includes the extern keyword. If the identifier is declared (somewhere) with the extern class, a defining instance must occur elsewhere in the program. If the identifier is declared with an initializer, one and only one declaration must occur with an initializer in the program. This model is widely used in UNIX-like environments. Programs that adopt this model conform to Standard C, but are not maximally portable.

The Common Model

/* source file 1   source file 2 */

extern int i;      extern int i;

int main()         void sub()
{                  {
    i = 10;            /* … */;
    /* … */        }
}

In this model, all declarations of the external variable i may optionally contain the keyword extern. This model is intended to mimic that of Fortran's COMMON blocks.

The Initializer Model

/* source file 1   source file 2 */

int i = 0;         int i;

int main()         void sub()
{                  {
    i = 10;            /* … */;
    /* … */        }
}

Here, the defining instance is that containing an explicit initializer (even if that initializer is the default value).

Tentative Object Definitions

Standard C introduced the notion of tentative object definitions. That is, a declaration may be a definition depending on what follows it. For example,

/* tentative definition, external */
int ei1;

/* definition, external */
int ei1 = 10;

/* tentative definition, internal */
static int si1;

/* definition, internal */
static int si1 = 20;

Here, the first declarations of ei1 and si1 are tentative definitions. If they were not followed by a declaration for the same identifier containing an initializer list, these tentative definitions would be treated as definitions. However, as shown, they are followed by such declarations, so they are treated as declarations. The purpose of this is to allow two mutually referential variables to be initialized to point to each other.
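
The mutually referential case can be sketched as follows, using a hypothetical struct node with internal linkage.

```c
struct node {
    struct node *partner;
    int id;
};

static struct node first;                    /* tentative definition */
static struct node second = { &first, 2 };   /* definition; refers to first */
static struct node first  = { &second, 1 };  /* definition; refers to second */

/* Returns 1 if the two objects point at each other. */
int nodes_are_linked(void)
{
    return first.partner == &second && second.partner == &first;
}
```

Without the tentative definition of first, the initializer of second would refer to an identifier that had not yet been declared. (C++ has no tentative definitions and treats the repeated definition of first as an error.)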

Statements

Labeled Statements

K&R had labels and “ordinary” identifiers sharing the same namespace. That is, a label name was hidden if an identifier with the same name was declared in a subordinate block. For example,

void f()
{
    label: ;
    {
        int label;
        
        goto label;
    }
}

would generate a compilation error because the target of the goto statement is an identifier declared to be an int variable, not a label.

In Standard C, labels have their own namespace allowing the above example to be compiled without error.

K&R specified that the length of significance in an internal identifier (such as a label) was eight characters.

Standard C requires at least 63 characters of significance in an internal identifier, such as a label.

Compound Statement (Block)

Prior to C99, all declarations within a block had to precede all statements. However, starting with C99, the two can be interspersed.

C++ Consideration: C++ allows declarations and statements to be interspersed.

A goto or switch can be used to jump into a block. While doing so is portable, whether any “bypassed” automatic variables in the block are initialized predictably is not.

K&R permitted blocks to nest, but it gave no indication as to how deeply.

C89 required compound statements to nest to at least 15 levels. C99 increased this to 127.

Expression and Null Statements

Consider the following example that uses the volatile type qualifier (added by C89):

extern volatile int vi[5];
void test2()
{
    volatile int *pvi = &vi[2];

    vi[0];
    pvi;
    *pvi;
    *pvi++;
}

Optimizers must tread very carefully when dealing with objects having the volatile qualifier, because they can make no assumptions about the current state of such an object. In the simplest case, an implementation might evaluate every expression containing a volatile expression simply because doing so might generate an action visible in the environment. For example, the statement *pvi; could generate code to access vi[2]. That is, it might place the address of vi[2] on the bus such that it can be seen by hardware waiting to synchronize on such an access. Note that even if an implementation does this, it should not generate code for the statement pvi; because pvi is not itself volatile and evaluating the expression pvi does not involve accessing a volatile object.

Recommendation: Do not rely on expression statements such as i[0];, pi;, and *pi; to generate code. Even if i is a volatile object, it is not guaranteed that the volatile object would be accessed as a result.

Selection Statements

Regarding nesting limits on selection statements, see Compound Statement.

The if Statement

As the controlling expression is a full expression, there is a sequence point immediately following it.

The switch Statement

K&R required that the controlling expression and each case constant expression have type int.

Standard C requires that the controlling expression have some integral type. Each case expression must also be of integral type, and each expression’s value is converted to the type of the controlling expression, if necessary.

As Standard C supports enumerated data types (which are represented by an integral type), it permits their use in switch expressions and in case constant expressions. (Enumerated types are not defined in K&R.) Some implementations have a notation for specifying a range of values for a case constant expression. Note that because several different and incompatible syntaxes are in use, this feature is not supported by Standard C.

Standard C permits a character constant to contain multiple characters, as in 'ab' and 'abcd'. Character constants are permitted in case constant expressions.

Recommendation: As the internal representation of multi-character character constants is implementation-defined, they should not be used in case constant expressions.

K&R did not specify the maximum number of case values permitted in a switch statement. C89 required support for at least 257 cases for each switch statement. C99 increased that to 1023.

As the controlling expression is a full expression, there is a sequence point immediately following it.

Refer to Compound Statement for a discussion of transferring into compound statements within switch statements.

Iteration Statements

The controlling expressions in while, do, and for statements may contain expressions of the form expr1 == expr2. If expr1 and expr2 are floating-point expressions, equality may be difficult or impossible to achieve due to the implementation-defined nature of floating-point representation, rounding, etc.

Recommendation: If the controlling expressions in while, do and for statements contain floating-point expressions, note that the results of floating-point equality tests are implementation-defined. It may be more desirable to have something like fabs(expr1 - expr2) < 0.1e-5 rather than expr1 == expr2, for example.
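
A tolerance-based loop test might be sketched like this. The function names and the choice of half a step as the tolerance are assumptions made for the example, not a general-purpose comparison policy.

```c
#include <math.h>

/* Returns 1 if a and b differ by less than tolerance. */
int nearly_equal(double a, double b, double tolerance)
{
    return fabs(a - b) < tolerance;
}

/* Count how many additions of step are needed to go from 0.0 to limit.
   A test of the form x != limit might never become false, because the
   accumulated value of x may never be exactly equal to limit. */
int count_steps(double step, double limit)
{
    double x = 0.0;
    int steps = 0;

    while (!nearly_equal(x, limit, step / 2.0) && x < limit) {
        x += step;
        ++steps;
    }
    return steps;
}
```

Where the number of iterations is known in advance, an integer loop counter with the floating-point value computed from it avoids the comparison altogether.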

Some programs contain “idle” loops; that is, loops that are intended to simply pass time, perhaps as a crude approximation of actual wall-clock time. For example:

for (i = 0; i <= 1000000; ++i) { }

To address the utility of such constructs, C11 added the following: “An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3 [the expression evaluated after each iteration], may be assumed by the implementation to terminate.” In lay terms, this means that the compiler can throw away the whole loop, provided it implements any other side effects that loop contains (in this case, making sure i finishes up with the value 1000001).

Recommendation: Don’t use “idle” loops to simply pass time. Even if such loops are not optimized away, their execution time is very much dependent on factors like task/thread priority and processor speed.

C89 guaranteed at least 15 levels of nesting of selection control structures, iteration control structures, and compound statements. C99 increased this to 127. K&R did not specify a minimum limit.

The while Statement

As the controlling expression is a full expression, there is a sequence point immediately following it.

The do Statement

As the controlling expression is a full expression, there is a sequence point immediately following it.

The for Statement

As the three expressions are full expressions, there is a sequence point immediately following each one.

C99 added support for the first part of a for to be a declaration, as in int i = 0, rather than requiring i to already be defined.

C++ Consideration: Standard C++ supports this C99 feature as well. C++ differs from C regarding the scope of any variables declared in a for statement.

Jump Statements

The goto Statement

Refer to Labeled Statements for a discussion of the implications of a separate label namespace. Refer to Compound Statement to learn about the ramifications of jumping into compound statements.

The return Statement

When the form return expression; is used, as expression is a full expression, there is a sequence point immediately following it.

If the value of a function call is used, but no value is returned, the result is undefined. The exception, since C99, is main, which has an implicit return 0; on reaching its closing brace.

Standard C supports the void function type, which allows the compiler to ensure that a void function has no return value. K&R did not include the void type.

Standard C supports the ability to return structures and unions by value. It places no constraint on the size of the object being returned, although the size of such objects that can be passed to a function by value may be limited. K&R did not include the returning of structures and unions by value.

K&R (pp 68 and 70) shows the general form of the return statement to be return(expression); yet the formal definition on page 203 shows return expression;. This may appear to be a contradiction. Page 203 is correct—the parentheses are not part of the syntax; they are merely redundant grouping parentheses and are part of the expression. The confusion comes from the fact that most (if not all) of the examples using return in K&R have the returned value within parentheses. From a style point of view, the parentheses can be useful as they help to separate the return keyword from the expression, and they clearly delimit the expression if it is rather complex. However, they are never needed. (Note that in the second edition of K&R, the parentheses have been removed from the examples, and, often, main is terminated using return 0;.)

C99 added the following constraint: “A return statement without an expression shall only appear in a function whose return type is void.”

The Preprocessor

According to the original C Standard Rationale document (which was written as C89 was developed), “Perhaps the most undesirable diversity among existing C implementations can be found in preprocessing. Admittedly a distinct and primitive language superimposed upon C, the preprocessing commands accreted over time, with little central direction, and with even less precision in their documentation.”

General Information

Preprocessor versus Compiler

Many C compilers involve multiple passes, the first of which often contains the preprocessor. Using this knowledge, a compiler can often take short cuts by arranging information to be shared between the preprocessor and the various phases of the compiler proper. While this may be a useful feature for a particular implementation, you should keep in mind that other implementations may use completely separate, and noncooperating, programs for the preprocessor and the compiler.

Recommendation: Keep the ideas of preprocessing and compilation separate. One possible problem when you fail to do this will be demonstrated when the sizeof operator is used as discussed later.
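
As a brief sketch of keeping the two separate, the following selects a type in the preprocessor using a <limits.h> macro, which is usable in #if expressions, rather than sizeof, which is evaluated by the compiler proper. The names counter32 and add_counts are hypothetical.

```c
#include <limits.h>

/* "#if sizeof(int) >= 4" is not portable, because the preprocessor
   need not know anything about types. INT_MAX is an ordinary macro. */
#if INT_MAX >= 2147483647
typedef int counter32;       /* int has at least 32 bits here */
#else
typedef long counter32;      /* long always has at least 32 bits */
#endif

counter32 add_counts(counter32 a, counter32 b)
{
    return a + b;
}
```

Either branch yields a type that can hold values up to at least 2147483647, without the preprocessor needing any knowledge of object sizes.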

Although C is a free-format language, the preprocessor need not be because, strictly speaking, it is not part of the C language. The language and the preprocessor each have their own grammars, constraints, and semantics. Both are defined by Standard C.

The Directive Name Format

A preprocessing directive always begins with a # character. However, not all preprocessors require the # and the directive name to be a single token. That is, the # prefix may be separated from the directive name by spaces and/or horizontal tabs.

K&R shows the # as part of the directive name, with no intervening white space. No statement is made as to whether such white space is permitted.

Standard C permits an arbitrary number of horizontal tabs and spaces between the # and the directive name, which are considered to be separate preprocessing tokens.

Start Position of Directives

Many preprocessors permit directives to be preceded by white space allowing indenting of nested directives. Less flexible preprocessors require the # character to be the first character of a source line.

K&R states that “Lines beginning with # communicate with this preprocessor.” No definition for “beginning with” is given.

Standard C permits an arbitrary amount of white space before the # character. This white space is not restricted to horizontal tabs and spaces—any white space is allowed.

White Space Within Directives

Standard C requires that all white space appearing between the directive name and the directive's terminating new-line be horizontal tabs and/or spaces.

K&R makes no statement about the validity or nature of such embedded white space.

If you use at least one white space character to separate tokens in a directive, the actual number of such characters (and the mix of tabs and spaces) is almost always immaterial to the preprocessor. An exception has to do with benign redefinition of macros using the #define directive. This is discussed later in this chapter.

Macro Expansion Within a Directive

According to Standard C, “The preprocessing tokens within a preprocessing directive are not subject to macro expansion unless otherwise stated. [For] example, in

#define EMPTY
EMPTY # include <file.h>

the sequence of preprocessing tokens on the second line is not a preprocessing directive, because it does not begin with a # at the start of translation phase 4 (see Phases of Translation), even though it will do so after the macro EMPTY has been replaced.”

Directive Continuation Lines

K&R declared that macro definitions (with and without arguments) could be continued across multiple source lines if all lines to be continued contained a backslash immediately preceding the terminating new-line.

Standard C has generalized this notion and permits any token (not just those seen by the preprocessor, but by the language as well) to be split up/continued using the backslash/new-line sequence.
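
The most common use remains the multi-line macro definition, sketched below. The macro SWAP_INT and the function sort_pair are hypothetical names.

```c
/* Each continued line ends with a backslash immediately before the
   new-line; the final line of the definition has none. */
#define SWAP_INT(a, b)      \
    do {                    \
        int swap_tmp = (a); \
        (a) = (b);          \
        (b) = swap_tmp;     \
    } while (0)

/* Arrange the two values so that *lo <= *hi. */
void sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
}
```

Take care that no white space follows a backslash: some preprocessors accept it, but the Standard requires the new-line to come immediately after.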

In the following case, the second source line starting with #define does not begin a macro definition directive because it is a continuation line and the #, therefore, is preceded by other than spaces and/or horizontal tabs.

#define NAME … \
#define …

Trailing Tokens

Strictly speaking, the preprocessor should diagnose any tokens in excess of those expected. However, some implementations process only the tokens they expect, then ignore any tokens remaining on the directive line. If this is the case, the source line

#include <header.h> #define MAX 23

(which seems to indicate that a new-line was somehow omitted, perhaps lost during conversion for porting) would cause the header to be included. However, the macro definition will be ignored. Another example is

#ifdef DEBUG fp = fopen(name, "r");

In this case, the file is never opened regardless of whether DEBUG is defined.

K&R gives no indication as to what should happen in these cases.

Standard C requires a diagnostic if excess tokens are present.

Comments in Directives

Delimited comments are treated as a single space so they can occur anywhere that white space can. As all, or various kinds of, white space can occur in preprocessing directives, so too can delimited comments. For example, in the directives

#define NUM_ELEMS 50 /* … */
#include "global.h" /* … */
/* … */ #define FALSE 0
/* … */ #ifdef SMALL
/* … */ #define TRUE 1 /* … */

each delimited comment is replaced by a single space during preprocessing. While the first two directives should port without error, the last three have leading horizontal white space, something not universally accepted, as noted earlier.

Of course, such delimited comments can occur between directive tokens.

Note that delimited comments can be continued indefinitely across source lines without requiring backslash/new-line terminators.

Line-oriented comments may also be used with directives.

Phases of Translation

Standard C contains a detailed discussion of the manner and order in which source text is translated into tokens for processing by the compiler. Prior to C89 there were no hard and fast rules governing this area, allowing code such as the following to be interpreted in different ways by different preprocessors:

#ifdef DEBUG
    #define T
#else
    #define T /\
    *
#endif

T printf(...); /* … */

The intent here, perhaps, is to disable the printf function call by having T become the start of a comment whenever DEBUG is not defined. As one programmer put it, “To define T as /* we need to fool the preprocessor, because it detects comments before doing anything else. To do this, we place the asterisk on a continuation line. As the preprocessor doesn't see the token /*, everything works as expected. It works fine with C compilers in UNIX environments.”

But does the preprocessor detect comments before doing anything else? As the answer to this question varies by implementation, let's look at what Standard C says. The phases of translation, as they affect the preprocessor, follow:

  • Backslash/new-line pairs are removed so that continuation lines are spliced together.

  • The source is broken into preprocessing tokens and sequences of white space characters (including comments).

  • Each comment is replaced with a space character. However, whether consecutive white space characters are compressed to one such character is implementation-defined.

  • Preprocessing directives are executed, and macro invocations are expanded. For each header included here, the steps outlined above are followed over again.

A Standard C compiler, therefore, is obliged to diagnose an error when given the previous code because the #endif directive will be included in the comment started on the macro definition line.
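One way to get the intended effect without depending on the order of comment removal is to wrap the call in a macro that expands to nothing useful when DEBUG is not defined. The following is only a sketch; the names TRACE and trace_demo are assumptions made for the example, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical names: TRACE and trace_demo are for illustration only. */
#ifdef DEBUG
#define TRACE(args) printf args
#else
#define TRACE(args) ((void)0)
#endif

int trace_demo(int value)
{
    /* The doubled parentheses pass the whole printf argument list as a
       single macro argument, so no variadic macro support is needed. */
    TRACE(("value = %d\n", value));
    return value + 1;
}
```

Because the macro definition contains only complete tokens, this form should behave the same way regardless of when a given preprocessor strips comments.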

Some implementations expand macros before looking for preprocessor commands, thus accepting the following code:

#define d define
#d MAX 43

This is disallowed by Standard C.

Inspecting Preprocessor Output

Some implementations have a preprocessor separate from the compiler, in which case, an intermediate text file is produced. Other implementations, which combine the preprocessor and compiler, have a listing option that allows the final effect of all directives to appear in the compilation listing file. They may also allow intermediate expansions of macros whose definitions contain other macros to be listed. Note that some implementations are not able to preserve comments or white space when saving the intermediate code because comments may already have been reduced to spaces prior to the preprocessor directives being processed. This is the case with Standard C's phases of translation.

Recommendation: See which of your implementations allows the output of the preprocessor to be saved. One particularly useful quality assurance step is to compare the output text files produced by each of your preprocessors. This allows you to check if they expand macros and conditionally include code in the correct manner. So, when you transport a source file to a new environment, you may also wish to transport the preprocessed version of that file.

Source File Inclusion

The #include directive is used to treat the contents of the named header as if it were in-line as part of the source file being processed. A header need not correspond exactly to a text file (or be of the same name), although it often does.

Standard C requires that a header contain complete tokens. Specifically, you may not put only the start or finish of a comment, string literal, or character constant in a header. A header must also end with a new-line. This means that you cannot paste tokens together across #includes.

To help avoid name conflicts between standard headers and programmer code, Standard C requires implementers to begin their identifiers with two underscores or an underscore and an uppercase letter. The index in K&R contains only three macros—EOF, FILE, and NULL. It also lists some 20–30 library functions. No others are mentioned or required. Standard C, on the other hand, contains hundreds of reserved identifiers, most of which are macros or library function names. Add to that the system-related identifiers used by your compiler and those identifiers used by any third-party libraries, and you have a potential for naming conflicts.

Recommendation: For each of your target environments, generate a reserved identifier list in sorted alphabetical order by header, and across headers. Use this list of identifiers for two purposes: as names to stay away from when inventing your own identifier names, and to find the intersection of all sets so you know which names are common and can be used meaningfully in common code. Note that just because a macro of the same name appears in different environments does not mean it is used for the same purpose. For names you invent, use some unique prefix (not leading underscores), suffix, or naming style so the likelihood of conflict is reduced.

#include Directive Format

K&R and Standard C define the following two forms of the #include directive. For directives of the form

#include "header-name"

K&R stated, “the header is searched for first in the directory of the original source file, then in a sequence of standard places.” Standard C states that the header is searched for in an implementation-defined manner.

K&R and Standard C require that only the implementation-defined standard places are searched for directives of the form

#include <header-name>

C89 added a third form,

#include identifier

provided identifier ultimately translates to the form "…" or <…>. As a macro name is an identifier, this format allows a header name to be constructed or otherwise defined either using the token-pasting preprocessing operator ## or by defining the macro on the compiler command line. Many compilers support a command-line argument of the form -Didentifier or /define=identifier, which is equivalent to having #define identifier 1 in the source being compiled.

If your target compilers support the -D (or /define) option discussed above and the #include identifier format, you can specify a header's full device/directory path name at compilation-time rather than hard-code that information into the #include directive.

One technique to help isolate hard-coded header locations follows. A master header contains

/* hnames.h - header names, fully qualified */

#define MASTER "DBA0:[data]master.h"
#define STRUCTURES "DUA2:[templates]struct.h"

Now, if this header is included in another header, these macro names can be used as follows:

#include "hnames.h"

#include MASTER
#include STRUCTURES

If you move the code to another system or you move the headers to a different location on the same system, you simply modify the hnames.h header and recompile all modules that include it.
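The same mechanism can be tried with a header that exists everywhere. In this sketch the macro name IO_HEADER and the function include_macro_demo are hypothetical; a standard header is named only so the fragment compiles without any project-specific files. How the tokens between < and > are combined into a header name is implementation-defined, so treat this as something to test on each target.

```c
/* Hypothetical macro name; it expands to a standard header name. */
#define IO_HEADER <stdio.h>
#include IO_HEADER

#include <string.h>

int include_macro_demo(void)
{
    char buf[16];
    sprintf(buf, "%d", 123);   /* declared by the header named via IO_HEADER */
    return strcmp(buf, "123") == 0;
}
```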

Header Names

The format and spelling of the header name in the "…" and <…> formats is implementation-dependent.

A peculiar problem occurs with file systems that use the backslash to separate subdirectory and file names in file pathnames. The completely qualified name of a DOS disk file has the following format:

\dir1\dir2\ ... \filename.ext

The problem arises with directory and filenames such as

\totals\accum.h
\summary\files.h
\times\filecopy.h
\volume\backup.h

Here, either the directory or the file name (or both) begin with the sequence \x, where x is a recognizable special character sequence within a C literal string. The problem then becomes, “How do I name this header when including it?”

According to Standard C, although a header written as "…" looks like a string literal, it is not! As such, its contents must be taken verbatim.

Recommendation: If possible, avoid embedding file system device, directory, and subdirectory information in header names.

When creating a new header and headers map directly to file names on your system, keep in mind the limits on file naming across systems. For example, some file systems are case-sensitive, in which case STDIO.H, stdio.h, and Stdio.h could be three separate files.

C89 stated, “The implementation may ignore the distinctions of alphabetical case and restrict the mapping to six significant characters before the period.” C99 increased the number of significant characters to eight.

Nested Headers

A header may contain #include directives. The level of header file nesting permitted is implementation-defined. K&R states that headers may be nested, but gives no minimum requirement. Standard C requires at least eight levels of header-nesting capability (15 in C99 and later).

Recommendation: If headers are designed properly, they should be able to be included multiple times and in any order. That is, each header should be made self-sufficient by having it include any headers it relies on. Put only related things in a header and restrict nesting to three, or at most four, levels. Use #ifdef wrappers around the contents of a header so they are not included more than once in the same scope.
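A common form of such a wrapper tests a macro that the header itself defines. The sketch below simulates two inclusions of a hypothetical header, point.h, within a single file; the guard name POINT_H and the struct are assumptions made for the example.

```c
/* --- contents of a hypothetical header, point.h --- */
#ifndef POINT_H
#define POINT_H

struct point { int x; int y; };

#endif /* POINT_H */

/* --- the same text seen a second time: skipped because POINT_H is set --- */
#ifndef POINT_H
#define POINT_H

struct point { int x; int y; };   /* would otherwise be a redefinition */

#endif /* POINT_H */

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Choose guard names that cannot collide with the reserved identifiers discussed earlier; in particular, avoid leading underscores.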

#include Path Specifiers

K&R and Standard C provide only two (main) mechanisms to specify header location search paths, namely, "…" and <…>. Sometimes it is necessary or desirable to have more than this, or perhaps for testing purposes, you temporarily want to use some other location instead. Many implementations allow one or more include search paths to be specified as command-line arguments at compile-time. For example,

cc -Ipath1 -Ipath2 -Ipath3 source-file

tells the preprocessor to first search for "…" format headers using path1 then path2 and path3 and, finally, in some system default location. The lack of this feature or the support of less than the required number of paths may cause problems when porting code. Even though your compiler may support a sufficient number of these arguments, the maximum size of the command-line buffer may be such that it will be too small to accommodate a number of verbose path specifiers.

Recommendation: If this capability is present in all your implementations, check the number of paths supported by each.

Modification of Standard Headers

Recommendation: Do not modify standard headers by adding to them definitions, declarations, or other #includes. Instead, create your own miscellaneous or local header and include that in all the relevant places. When upgrading to new compiler versions or when moving to different compilers, no extra work need be done beyond making that local header available as before.

Macro Replacement

The #define directive is used to associate a string definition with a macro name. As a macro name is an identifier, it is subject to the same naming constraints as other identifiers. K&R required eight characters of significance, and Standard C requires 31 (63 in C99 and later).

Recommendation: Use the lowest common denominator length of significance for macro names.

Recommendation: Regarding the spelling of macro names, the most common convention is to use uppercase letters, digits, and underscores only.

Standard C requires that tokens specified as a part of a macro definition must be well formed (i.e., complete). Therefore, a macro definition cannot contain just the start or end part of a comment, literal string, or character constant.

Some compilers allow partial tokens in macros such that when the macro is expanded, it is pasted to the token preceding and/or following it.

Recommendation: Avoid having partial tokens in macro definitions.

The definition of a macro might contain an arithmetic expression such as

#define MIN 5
#define MAX MIN + 30

The preprocessor does not recognize this as an expression, but rather as a sequence of tokens that it substitutes wherever the macro is called. It is not permitted to treat the definition of MAX as if it were

#define MAX 35

Preprocessor arithmetic comes into play only with the conditional inclusion directive #if. However, the original definition of MAX above will be treated as an expression if the following code is used:

#if MAX

#endif

This will be expanded to

#if MIN + 30

#endif

then

#if 35

#endif
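Because the definition of MAX is substituted as a sequence of tokens rather than as the value 35, operator precedence at the point of use matters. The following sketch illustrates this; MAX_SAFE and the two functions are hypothetical names added for the comparison.

```c
#define MIN 5
#define MAX MIN + 30          /* substituted as the tokens 5 + 30 */
#define MAX_SAFE (MIN + 30)   /* hypothetical parenthesized variant */

int unparenthesized(void) { return MAX * 2; }       /* 5 + 30 * 2, i.e., 65 */
int parenthesized(void)   { return MAX_SAFE * 2; }  /* (5 + 30) * 2, i.e., 70 */
```

Parenthesizing any definition that contains an operator avoids this kind of surprise.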

An implementation might place a limit on the size of a macro's definition.

Recommendation: If you plan on having macros whose definitions are longer than 80 characters, test your environments to see what their limits are.

Macros with Arguments

A macro with arguments has the general form

#define name(arg1, arg2, ...,argn) definition

K&R does not state the maximum number of arguments allowed.

Standard C requires support for at least 31 arguments (127 in C99 and later).

Recommendation: If you plan on using macros with more than four or five arguments, check the limits of your target implementations.

While no white space is permitted between the macro name and the left parenthesis that begins the argument list in a macro definition, this constraint is not present in macro calls.

There is no requirement that all of the arguments specified in a macro definition argument list must appear within that macro's definition.

C99 added support for macros with a variable number of arguments (via ellipses notation and the special identifier __VA_ARGS__).
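A minimal sketch of a variadic macro follows. It assumes a C99 or later implementation; the names FORMAT_INTO and format_demo are hypothetical. In strictly conforming C99 code, at least one argument must be supplied for the ellipsis.

```c
#include <stdio.h>

/* Hypothetical macro: forwards a format and its arguments to snprintf.
   __VA_ARGS__ stands for everything matched by the ellipsis. */
#define FORMAT_INTO(buf, size, fmt, ...) \
    snprintf((buf), (size), fmt, __VA_ARGS__)

int format_demo(char *buf, size_t size)
{
    /* Writes "total=42" and returns the number of characters produced. */
    return FORMAT_INTO(buf, size, "%s=%d", "total", 42);
}
```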

Rescanning Macro Names

A macro definition can refer to another macro, in which case, that definition is rescanned, as necessary.

Standard C requires the definition of a macro to be “turned off” for the duration of the expansion of that macro so that “recursive death” is not suffered. That is, a macro name that appears within its own definition is not re-expanded. This allows the name of a macro to be passed as an argument to another macro.
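One use of this rule is to wrap a function in a macro of the same name. The sketch below is illustrative only; add, add_calls, and add_demo are hypothetical names.

```c
static int add_calls = 0;   /* hypothetical call counter */

static int add(int a, int b)
{
    return a + b;
}

/* Inside its own replacement list, add is not expanded again, so the
   inner name refers to the function defined above. */
#define add(a, b) (++add_calls, add((a), (b)))

int add_demo(void)
{
    return add(2, 3);   /* expands once; counts the call, then calls add */
}
```

On an implementation that does not follow the Standard C rule, such a definition could recurse without end, so it is worth testing on each target preprocessor.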

Replacement Within String Literals and Character Constants

Some implementations allow macro arguments within string literals and character constants to be replaced as follows:

#define PR(a) printf("a = %d\n", a)

Then the macro call

PR(total);

is expanded to

printf("total = %d\n", total);

On implementations that do not allow this, the macro would be expanded to

printf("a = %d\n", total);

K&R states that “text inside a string or character constant is not subject to replacement.”

Standard C does not support the replacement of macro arguments within strings and character constants. However, it does supply the (C89 addition) stringize operator (#), so that the same effect can be achieved. For example,

#define PR(a) printf(#a " = %d\n", a)

PR(total);

expands to

printf("total" " = %d\n", total);

and because Standard C permits adjacent strings to be concatenated, this becomes

printf("total = %d\n", total);
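The following sketch is a variant of the PR macro that writes into a buffer so the result can be inspected. The names PR_TO and pr_demo are hypothetical, and snprintf assumes a C99 or later library.

```c
#include <stdio.h>

/* #a becomes the spelling of the argument as a string literal, which is
   then concatenated with the adjacent literal. */
#define PR_TO(buf, size, a) snprintf((buf), (size), #a " = %d\n", (a))

int pr_demo(char *buf, size_t size)
{
    int total = 7;
    return PR_TO(buf, size, total);   /* writes "total = 7\n" */
}
```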

Command-Line Macro Definition

Many compilers allow macros to be defined using a command-line argument of the form -Didentifier or /define=identifier, which is equivalent to having #define identifier 1 in the source being compiled. Some compilers allow macros with arguments to be defined in this manner.

The size of the command-line buffer, or the number of command-line arguments, may be such that there is insufficient room to specify all the required macro definitions, particularly if you use this mechanism to specify identifiers used in numerous #include directives.

Recommendation: Verify that this capability is present in all your implementations. At least five or six identifiers should be supported provided you keep the length of each to a minimum. (Note that if you use 31-character identifier names, you might exceed your command-line buffer size.)

Macro Redefinition

Many implementations permit an existing macro to be redefined without its first being #undefed. The purpose of this (generally) is to allow the same macro definition to occur in multiple headers, all of which are included in the same scope. However, if one or more of the definitions is not the same as the others, a serious problem can occur. For example,

#define NULL 0

#define NULL 0L

causes the first part of the code to be compiled using zero as the value for NULL and the last part using zero long. This can cause serious problems when using f(NULL) because the size of the object passed to f might not be the same as that expected by f.

Standard C allows a macro to be redefined provided the definitions are the same. This is known as benign redefinition. Just what does “the same” mean? Basically, it requires that the macro definitions be spelled EXACTLY the same, and depending on how white space between tokens is processed, multiple consecutive white space characters may be significant. For example,

1. #define MACRO a macro
2. #define MACRO a macro
3. #define MACRO a<tab>macro
4. #define MACRO a  macro
5. #define MACRO example

Macros 1 and 2 are the same. Macros 3 and 4 might also be the same as 1 and 2 depending on the handling of the white space. Macro 5 would definitely be flagged as an error. Note that this does not solve the problem of having different definitions for the same macro that are not in the same scope.

Recommendation: It is legitimate to have a macro defined exactly the same in multiple places (typically in headers). In fact, this idea is encouraged for reasons stated elsewhere. However, avoid having different definitions for the same macro. As the use of multiple, consecutive white space characters may result in different spellings (as in macros 3 and 4 above), you should separate tokens by only one white space character, and the character you use should be consistent. As horizontal tabs may be converted to spaces, space separators are suggested.

By macro redefinition, we mean either the redefinition of a macro without arguments to a macro of the same name, also without arguments, or the redefinition of a macro with arguments with the same macro name with the same number and spelling of arguments.

Recommendation: Even if your implementation allows it, do not redefine a macro without arguments to one with arguments or vice versa. This is not supported by Standard C.
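As a small sketch of these rules (BUF_SIZE and buf_size are hypothetical names), an identical redefinition is accepted, while a genuinely different definition is introduced only after an explicit #undef:

```c
#define BUF_SIZE 512
#define BUF_SIZE 512      /* benign: the same token sequence */

/* A different value requires the old definition to be removed first. */
#undef BUF_SIZE
#define BUF_SIZE 1024

int buf_size(void) { return BUF_SIZE; }
```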

Predefined Standard Macros

Standard C specifies the following predefined macros:

  • __DATE__ (C89) – Date of compilation

  • __FILE__ (C89) – Name of the source file being compiled; however, no mention is made as to whether this name is a fully qualified path name

  • __LINE__ (C89) – Current line number in the source file being compiled

  • __STDC__ (C89) – Has the value 1 if the compiler conforms to some edition of Standard C (see __STDC_VERSION__). Don’t assume that the presence of this name implies conformance; that requires the value 1. An implementation might define this macro as 0 to indicate “not quite conforming,” or as 2 to indicate “contains extensions.” To determine if a compiler complies with C89, check that __STDC__ is defined to 1 and that __STDC_VERSION__ is not defined

  • __STDC_HOSTED__ (C99) – Indicates if the implementation is hosted or free-standing

  • __STDC_VERSION__ (C95) – The Standard C edition to which this compiler conforms (see __STDC__), as follows: C95 199409L, C99 199901L, C11 201112L, and C17 201710L.

  • __TIME__ (C89) – Time of compilation

Attempting to #define or #undef any of these predefined names results in undefined behavior.

Macro names beginning with __STDC_ are reserved for future standardization.

K&R contained no predefined macros. __LINE__ and __FILE__ were available in some implementations prior to C89, as were __DATE__ and __TIME__; however, the date string format varied.

Standard C requires that “any other predefined macro names begin with a leading underscore followed by an uppercase letter or a second underscore.” It also prohibits the definition of the macro __cplusplus (either predefined or in a standard header).

C++ Consideration: Standard C++ predefines __cplusplus, which expands much like __STDC_VERSION__ by encoding a version number. Also, whether a standard-conforming C++ implementation predefines __STDC__ or __STDC_VERSION__ is implementation-defined.

Macros defined via a compiler command line option are not considered to be predefined macros, even though conceptually they are defined prior to the source being processed.

Except for the macros specified by Standard C, all other predefined macro names are implementation-defined. There is no established set of names, but the GNU C compiler provides a large and rich set that other implementations may well emulate.

A conforming implementation might conditionally define other macros (see Conditionally Defined Standard Macros).
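The tests described for __STDC__ and __STDC_VERSION__ can be combined as in the following sketch. The function names are hypothetical, and the code assumes the preprocessor supports #elif and the defined operator.

```c
/* Hypothetical helper: names the Standard C edition the compiler claims. */
const char *c_edition(void)
{
#if !defined(__STDC__) || __STDC__ != 1
    return "not Standard C";
#elif !defined(__STDC_VERSION__)
    return "C89";
#elif __STDC_VERSION__ >= 201710L
    return "C17 or later";
#elif __STDC_VERSION__ >= 201112L
    return "C11";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C95";
#endif
}

/* __LINE__ and __FILE__ are replaced at the point where they appear. */
int here_line(void) { return __LINE__; }
const char *here_file(void) { return __FILE__; }
```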

Conditionally Defined Standard Macros

Standard C permits, but does not require, the following macros to also be predefined:

  • __STDC_ANALYZABLE__ (C11)

  • __STDC_IEC_559__ (C99)

  • __STDC_IEC_559_COMPLEX__ (C99) – An implementation that defines __STDC_NO_COMPLEX__ must not also define __STDC_IEC_559_COMPLEX__

  • __STDC_ISO_10646__ (C99) – Also defined by Standard C++

  • __STDC_LIB_EXT1__ (C11)

  • __STDC_MB_MIGHT_NEQ_WC__ (C11)

  • __STDC_NO_ATOMICS__ (C11)

  • __STDC_NO_COMPLEX__ (C11)

  • __STDC_NO_THREADS__ (C11)

  • __STDC_NO_VLA__ (C11)

  • __STDC_UTF_16__ (C11)

  • __STDC_UTF_32__ (C11)
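These macros are typically tested with the defined operator or #ifdef before an optional feature is used. The sketch below derives two hypothetical feature flags, HAVE_VLA and HAVE_C11_THREADS; the flag names and helper functions are assumptions made for the example.

```c
/* Variable length arrays: required by C99, optional from C11 onward. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L && \
    !defined(__STDC_NO_VLA__)
#define HAVE_VLA 1
#else
#define HAVE_VLA 0
#endif

/* <threads.h>: introduced by C11, unless the implementation opts out. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_THREADS__)
#define HAVE_C11_THREADS 1
#else
#define HAVE_C11_THREADS 0
#endif

int have_vla(void) { return HAVE_VLA; }
int have_c11_threads(void) { return HAVE_C11_THREADS; }
```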

Macro Definition Limit

The maximum number of entries that can fit in an implementation's preprocessor symbol table may vary considerably as can the amount of total string space available for macro definitions.

C89 required that at least 1024 (4095 in C99 and later) macro identifiers be able to be simultaneously defined in a source file (including all included headers). While this guarantee may allow that many macros, a conforming implementation might require that each macro definition be restricted in length. It certainly does not guarantee that many macro definitions of unlimited length and complexity.

K&R makes no statement about the limit on the number or size of concurrent macro definitions.

Recommendation: If you expect to have a large number (greater than a few hundred) of concurrent macro definitions, write a program that can generate test headers containing macros of arbitrary number and complexity, to see what each of your implementations can handle. There is also some incentive to include only those headers that need be included and to modularize headers such that they contain only related material. It is perfectly acceptable to have the same macro definition in multiple headers. For example, some implementers define NULL in several headers just so the whole of stdio.h need not be preprocessed for one macro name.

Stacking Macro Definitions

Some implementations allow the stacking of macros. That is, if a macro name is in scope and a macro of the same name is defined, the second definition hides the first. If the second definition is removed, the first definition is back in scope again. For example,

#define MAX 30

 /* MAX is 30 */


#define MAX 35

 /* MAX is 35 */

#undef MAX

 /* MAX is 30 */

Standard C does not permit the stacking of macro definitions.

K&R states that the use of #undef “causes the identifier's preprocessor definition to be forgotten,” presumably forgotten completely.

The # Stringize Operator

This was a C89 invention.

C99 added support for empty macro arguments, which each result in the string "".

The order of evaluation of # and ## operators is unspecified.

The ## Token-Pasting Operator

This was a C89 invention. It allows a macro expansion to construct a token that can then be rescanned. For example, with a macro definition of

#define PRN(x) printf("%d", value ## x)

the macro call

PRN(3);

generates the code

printf("%d", value3);

A common solution to this problem prior to Standard C follows:

#define M(a, b) a/* */b

Here, instead of the definition being a b (because the comment is replaced with a space), the implementation made it ab, thus forming a new token that was then rescanned. This practice is not supported by either K&R or Standard C. The Standard C approach to this is

#define M(a, b) a ## b

where the spaces around the ## operator are optional.

Standard C specifies that in A ## B ## C, the order of evaluation is unspecified.

An interesting situation exists in the following example:

#define INT_MIN -32767

int i = 1000-INT_MIN;

Here, the macro expands producing 1000--32767, which looks perhaps as if it should generate a syntax error because 1000 is not an lvalue. However, Standard C resolves this with its “phases of translation” by requiring that the preprocessing tokens - and 32767 retain their meaning when handed off to the compiler. That is, the two minus signs are not recognized as the autodecrement token, --, even though they are adjacent in the expanded text stream. A non-Standard implementation, however, might rescan the text producing a different token sequence by pasting the two tokens together.

Recommendation: To avoid such macro definitions being misinterpreted, surround them with parentheses, as in #define INT_MIN (-32767).

The order of evaluation of # and ## operators is unspecified.

Redefining Keywords

Some implementations (including Standard C) permit C language keywords to be redefined. For example,

#if __STDC__ != 1
#define void int
#endif

Recommendation: Do not gratuitously redefine language keywords.

The #undef Directive

#undef can be used to remove a library macro to get access to a real function. If a macro version does not exist, Standard C requires the #undef to be ignored because a nonexistent macro can be the subject of an #undef without error.
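For instance, a library routine such as toupper may be supplied as a macro as well as a function. The following sketch (upper_demo is a hypothetical name) removes any macro version so that the call is certain to reach the function:

```c
#include <ctype.h>

/* If the implementation supplies toupper as a macro, remove it; if it
   does not, the #undef has no effect. */
#undef toupper

int upper_demo(int c)
{
    return toupper(c);   /* a call to the library function */
}
```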

Refer to Stacking Macro Definitions for a discussion of the use of #undef in stacked macro implementations.

Standard C does not permit the predefined standard macros (Predefined Standard Macros) to be #undefed.

Conditional Inclusion

This capability is one of the most powerful parts of a C environment available for writing code that is to run on different target systems.

Recommendation: Make as much use as possible of conditional inclusion directives. This is made easier if you have, or you establish, a meaningful set of macros that distinguish one target environment from another. See <limits.h> and <float.h> for details of host characteristics.

#if Arithmetic

The target of an #if directive is a constant expression that is tested against the value 0.

Some implementations allow the sizeof operator to be used in the constant expression as follows:

#if sizeof(int) == 4
    int total;
#else
    long total;
#endif

Strictly speaking, the preprocessor is a macro processor and string substitution program and need not have any knowledge about data types or the C language. Remember that sizeof is a C language compile-time operator, and at this stage, we are preprocessing, not compiling.

K&R uses the same definition of constant expression for the preprocessor as it does for the language, thus implying that sizeof is permitted here. No mention is made of the use of casts in constant expressions (even in the language).

Standard C requires that the constant expression not contain a cast or an enumeration constant. With Standard C, whether the sizeof operator is supported in this context is implementation-defined. That is, while it is permitted, it is not guaranteed. Note that if an enumeration constant were present, it would be treated as an unknown macro and, as such, would default to a value of 0.

Recommendation: Do not use sizeof, casts, or enumeration constants in conditional constant expressions. To get around the inability of using sizeof, you may be able to determine certain attributes about your environment by using the header limits.h.
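As a sketch of the limits.h approach, the earlier sizeof example might be recast as a test on INT_MAX. The type name total_t and the helper function are hypothetical.

```c
#include <limits.h>

/* Pick int when it can hold at least 32 bits of range; otherwise fall
   back to long, which Standard C guarantees can. */
#if INT_MAX >= 2147483647
typedef int total_t;
#else
typedef long total_t;
#endif

int total_bits(void)
{
    return (int)(sizeof(total_t) * CHAR_BIT);
}
```

Note that this tests the range of int rather than its size in bytes, which is usually what the code actually depends on.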

C89 states that, “… the controlling constant expression which is evaluated according to the rules of using arithmetic that has at least the ranges specified in Numerical Limits, except that int and unsigned int act as if they have the same representation as, respectively, long and unsigned long.”

C99 changed this to, “… the controlling constant expression, which is evaluated according to the rules of 6.6, except that all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t defined in the header <stdint.h>.”

Floating-point constants are not permitted.

Recommendation: Do not rely on underflow or overflow because arithmetic properties vary widely on ones- and twos-complement and packed-decimal machines. Do not use the right-shift operator if signed operands are present because the result is implementation-defined when the sign bit is set.

A character constant can legitimately be part of a constant expression (where it is treated as an integer). Character constants can contain any arbitrary bit pattern (by using '\nnn' or '\xhhh'). Some implementations support character constants whose value is negative (e.g., '\377' and '\xFA' have their high bits set).

Standard C states that whether or not a single-character character constant may have a negative value is implementation-defined. K&R makes no statement.

Some implementations support multi-character constants, as does Standard C.

Recommendation: Do not use character constants whose value may be negative. Also, because the order and meaning of characters in multi-character constants is implementation-defined, do not use them in #if constant expressions.

In Standard C, if the constant expression contains a macro name that is not currently defined, the macro is treated as if it were defined with the value 0. The macro name is only interpreted that way; it does not actually become defined with that value.

K&R makes no provision for this case.

Recommendation: Do not use the fact that undefined macros evaluate to 0 in constant expressions. If a macro definition is omitted, either from a header or from the command line at compile-time, then using this default rule results in its being erroneously interpreted as being defined with the value 0. It is not always practical to test if a macro is defined first before using it. However, for macros expected to be defined on the command line, it is worth the check because it is very easy to omit the macro definition if you are typing the compile command line manually. To further help avoid such problems use command procedures or scripts to compile code, particularly when numerous and lengthy include paths and macros are present on the command line.
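One way to make such a check explicit is sketched below. TABLE_SIZE is a hypothetical macro expected from the command line (for example, via -DTABLE_SIZE=64), and the default value is an assumption made for the example.

```c
/* Supply an explicit, documented default rather than relying on an
   undefined name silently evaluating to 0 in #if. */
#ifndef TABLE_SIZE
#define TABLE_SIZE 16
#endif

#if TABLE_SIZE <= 0
#error "TABLE_SIZE must be positive"
#endif

int table_size(void) { return TABLE_SIZE; }
```

Where no sensible default exists, the #ifndef branch could instead contain an #error directive so that an omitted definition stops the compilation.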

It is possible for the constant expression to produce an error, for example, if division by 0 is encountered. (This is possible if a macro name used as a denominator has not been defined and defaults to 0.) Some implementations may flag this as an error, while others won't. Some may continue processing, assuming the value of the whole expression is 0.

Recommendation: Do not assume your implementation will generate an error if it determines the #if constant expression contains a mathematical error.

K&R does not include the ! unary operator in the operators permitted within constant expressions. This is generally considered to be either an oversight or a typographical error.

The defined Operator

Sometimes it is necessary to have nested conditional inclusion constructs such as

#ifdef VAX
    #ifdef VMS
        #ifdef DEBUG
            
        #endif
    #endif
#endif

This is supported by both K&R and Standard C. Standard C (and some implementations prior to C89) provides the defined preprocessor unary operator to make this construct more elegant. For example,

#if defined(VAX) && defined(VMS) && defined(DEBUG)

#endif

Standard C essentially reserves the identifier defined; it may not be used elsewhere as a macro name.

Recommendation: Do not use the defined operator unless all your environments support it.

The #elif Directive

The following cumbersome construct is also commonly used in writing portable code. It is supported by K&R and Standard C.

#if MAX >= TOTAL1
    
#else
    #if MAX >= TOTAL2
    
    #else
        #if MAX >= TOTAL3
            
        #else
            
        #endif
    #endif
#endif

The directive #elif simplifies nested #ifs greatly as follows.

#if MAX >= TOTAL1
    
#elif MAX >= TOTAL2
    
#elif MAX >= TOTAL3
    
#else
    
#endif

Recommendation: Do not use the #elif directive unless all your environments support it.

Nested Conditional Directives

Standard C guarantees at least eight levels of nesting.

K&R states that these directives may be nested but gives no guaranteed minimum.

Recommendation: Use no more than two or three levels of nesting with conditional directives unless all of your implementations permit more.

Line Control

The syntax of the #line directive is (ultimately) one of the following:

#line line-number
#line line-number filename

where the line number and filename are used to update the __LINE__ and __FILE__ predefined macros, respectively.

Standard C allows either a macro name or string literal in place of the filename. It also permits a macro in place of the line number, provided its value is a decimal-digit sequence (in which any leading zero is redundant and does not mean “octal”). In fact, any preprocessing token may follow #line provided that after macro expansion, one of the two forms is present.

Implementations differ in the value of __LINE__ if it is used in an item (preprocessor directive or macro invocation) that spans more than one physical line.
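A short sketch of the two-operand form of #line, assuming a hypothetical generated source whose original input was a file called grammar.y. The function names are illustrative. Each use of __LINE__ is kept on a single physical line to stay clear of the implementation differences just mentioned.

```c
/* Pretend the code below came from a hypothetical input file "grammar.y",
   as a code generator might arrange. */
int reported_line(void)
{
#line 500 "grammar.y"
    return __LINE__;            /* this source line is now numbered 500 */
}

const char *reported_file(void)
{
    return __FILE__;            /* still "grammar.y" from the #line above */
}
```

The directive affects every following line in the translation unit, so diagnostics issued after it also refer to grammar.y.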

The Null Directive

Standard C permits a null directive of the form

#

This directive has no effect and is typically found only in machine-generated code. While it has existed in implementations for many years, it was not defined in K&R.

The #pragma Directive

#pragma was a C89 invention. The intent of this directive is to provide a mechanism for implementations to extend the preprocessor's syntax. This is possible because the preprocessor ignores any pragma it does not recognize. The syntax and semantics of a #pragma directive are implementation-defined, although the general format is

#pragma token-sequence

Possible uses of pragmas are to control compilation listing pagination and line format, to enable and disable optimization, and to activate and deactivate lint-like checking. Implementers can invent pragmas for whatever purpose they desire.

A pragma directive of the form

#pragma STDC token-sequence

is reserved for use by Standard C, such as the pragmas FP_CONTRACT, FENV_ACCESS, and CX_LIMITED_RANGE (all added by C99).

Pragma operator

C99 added this unary, preprocessor-only operator, which has the following form:

_Pragma ( string-literal )

The #error Directive

This is a C89 invention. Its format is

#error token-sequence

and it causes the implementation to generate a diagnostic message that includes the token sequence specified.

One possible use is to report on macros you expected to be defined, but which were found not to be. The reverse case is also useful. For example, you are porting code containing variable-length arrays (or threading), but the conditionally defined macro __STDC_NO_VLA__ (or __STDC_NO_THREADS__) is defined (see Conditionally Defined Standard Macros).
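As an illustration of the variable-length array case, here is a minimal sketch that stops translation with a clear message when the implementation announces that it lacks the feature. The function scratch_bytes is a hypothetical helper added only so that the translation unit really depends on variable-length arrays.

```c
#include <stddef.h>

#ifdef __STDC_NO_VLA__
#error "variable-length arrays are required but this implementation does not provide them"
#endif

/* Hypothetical helper: size in bytes of a VLA of n ints (n must be > 0). */
size_t scratch_bytes(size_t n)
{
    int scratch[n];             /* variable-length array */
    return sizeof scratch;      /* evaluated at run time for a VLA */
}
```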

Non-Standard Directives

Some implementations accept other preprocessor directives. As these extensions typically relate to the implementation's specific environment, they have little or no utility in other environments. Therefore, they must be identified in code that is to be ported and implemented in some other way, if at all.

Library Introduction

Definition of Terms

With K&R, a character had type char and a string was an array of char. C89 introduced the notion of multibyte strings and shift sequences, along with wide characters (having type wchar_t) and wide strings (having type wchar_t[]). The C89 library also included functions to process all of these, and subsequent editions of the standard added more headers and functions.

Prior to C89, the C library operated in a so-called “USA-English” mode, in which, for example, the decimal point used by printf was a period. C89 introduced the notion of a locale such that the traditional C environment up until that time is defined by the "C" locale; it also defined the header <locale.h>. The behavior of some Standard C library functions is affected by the current locale; that is, they are locale-specific.

The Standard Headers

Required Contents

C89 defined the following headers: <assert.h>, <ctype.h>, <errno.h>, <float.h>, <limits.h>, <locale.h>, <math.h>, <setjmp.h>, <signal.h>, <stdarg.h>, <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, and <time.h>.

C95 added <iso646.h>, <wchar.h>, and <wctype.h>.

C99 added <complex.h>, <fenv.h>, <inttypes.h>, <stdbool.h>, <stdint.h>, and <tgmath.h>.

C11 added <stdalign.h>, <stdatomic.h>, <stdnoreturn.h>, <threads.h>, and <uchar.h>.

C17 added no new headers.

These headers are defined as having lowercase names and must be correctly located by a conforming implementation using the above spellings. Although some file systems support mixed-case file names, you should not spell the standard header names in any other way than defined by the standard.

Each header is self-contained. That is, it contains all the declarations and definitions needed to invoke the routines declared within it. That said, a header does not necessarily contain all the macro definitions whose value can be returned by its functions. For example, strtod in <stdlib.h> could return a value of HUGE_VAL, and ERANGE may be stored in errno; yet these macros are not defined in <stdlib.h>. To use them, both <errno.h> and <math.h> must be included as well.
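The strtod case can be sketched as follows. The function parse_double and its return codes are hypothetical; the point is only that three headers are needed before HUGE_VAL and ERANGE can be tested.

```c
#include <errno.h>    /* errno, ERANGE */
#include <math.h>     /* HUGE_VAL */
#include <stdlib.h>   /* strtod */

/* Returns 0 on success, -1 if nothing was converted, -2 on overflow. */
int parse_double(const char *text, double *out)
{
    char *end;
    double value;

    errno = 0;
    value = strtod(text, &end);
    if (end == text)
        return -1;                           /* no conversion performed */
    if (errno == ERANGE && (value == HUGE_VAL || value == -HUGE_VAL))
        return -2;                           /* result overflowed */
    *out = value;
    return 0;
}
```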

In order to be self-contained, several headers define the same names (such as NULL and size_t).

All functions in the Standard C library are declared using function prototypes.

Standard headers may be included in any order and multiple times in the same scope without producing ill effects. The one exception is <assert.h>, which if included multiple times, can behave differently depending on the existence of the macro NDEBUG.

To be strictly conforming, Standard C prohibits a program from including a standard header from inside an external declaration or definition. This means you should not include a standard header from within a function because a function is an external definition.

Many of the prototypes for the standard library functions contain keywords and derived types invented or adopted by C89. These include const, fpos_t, size_t, and void *. Where these are applied to functions that have existed for a number of years, they remain compatible with calls to those functions in pre-C89 times.

Standard C requires that a hosted C implementation support all the standard headers defined for that edition of the standard. In C89, a freestanding implementation needed to provide only <float.h>, <limits.h>, <stdarg.h>, and <stddef.h>. C95 added <iso646.h>. C99 added <stdbool.h> and <stdint.h>. C11 added <stdalign.h> and <stdnoreturn.h>. C17 added no new requirements.

Optional Contents

C11 added an annex called “Bounds-checking interfaces” that “specifies a series of optional extensions that can be useful in the mitigation of security vulnerabilities in programs, and comprise new functions, macros, and types declared or defined in existing standard headers.”

If an implementation defines the macro __STDC_LIB_EXT1__, it must provide all the optional extensions from that annex. These extensions apply to the following headers: <errno.h>, <stddef.h>, <stdint.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, and <wchar.h>.

An implementation that defines the macro __STDC_LIB_EXT1__ allows the associated library extensions to be excluded by having a program #define __STDC_WANT_LIB_EXT1__ to be 0 prior to #includeing a standard header containing such extensions. If instead __STDC_WANT_LIB_EXT1__ is defined to be 1, those extensions are enabled.
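A hedged sketch of how a program might opt in to these extensions while remaining portable to implementations that lack them. The function bounded_length is hypothetical; strnlen_s is one of the functions the annex adds to <string.h>. Unlike strnlen_s, the fallback branch assumes s is not a null pointer.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* must precede the #include */
#include <string.h>

/* Length of s, but never more than limit. */
size_t bounded_length(const char *s, size_t limit)
{
#ifdef __STDC_LIB_EXT1__
    return strnlen_s(s, limit);    /* bounds-checking interface available */
#else
    size_t n = 0;                  /* portable fallback */
    while (n < limit && s[n] != '\0')
        n++;
    return n;
#endif
}
```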

Reserved Identifiers

All external identifiers declared in the standard headers are reserved, whether or not their associated header is referenced. That is, don't presume that just because you never include <time.h> that you can safely define your own external function called clock. Note that macros and typedef names are not included in this reservation because they are not external names.

External identifiers that begin with an underscore are reserved. All other library identifiers should begin with two underscores or an underscore followed by an uppercase letter.

Use of Library Functions

Not all the Standard library routines validate their input arguments. In such cases, if you pass in an invalid argument, the behavior is undefined.

An implementation is permitted to implement any required routine as a macro, defined in the appropriate header, provided that macro expands “safely.” That is, no ill effects should be observed if arguments with side-effects are used. If you include a standard header, you should not explicitly declare any routine you plan to use from that header because any macro version of that routine defined in the header will cause your declaration to be expanded (probably incorrectly or producing syntax errors).

You should take care when using the address of a library routine because it may currently be defined as a macro. Therefore, you should #undef that name first or reference it using (name) instead of just name. Note that it is possible to call both a macro and a function version of the same routine in the same scope without first having to use #undef.
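The (name) technique can be sketched with toupper from <ctype.h>. The helper names upper_via_function and map_in_place are hypothetical.

```c
#include <ctype.h>

/* Parenthesizing the name suppresses any function-like macro,
   so this always calls the real function. */
int upper_via_function(int c)
{
    return (toupper)(c);
}

/* Applies fn to each character of s; fn receives an unsigned char value. */
void map_in_place(char *s, int (*fn)(int))
{
    for (; *s != '\0'; s++)
        *s = (char)fn((unsigned char)*s);
}
```

A call such as map_in_place(buffer, (toupper)) passes the address of the function version without needing #undef.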

When using a library routine, it is strongly suggested you include the appropriate header. If you choose not to, you should explicitly declare the function yourself using prototype notation, especially for routines such as printf, that take variable argument lists. The reason for doing this is that the compiler may pass arguments using a different mechanism when prototypes are used than when they are not. For example, with the correct prototype in scope, the compiler knows exactly how many arguments are expected and their types. And for fixed length argument lists, it may choose to pass the first two or three (or so) arguments in registers instead of on the stack. Therefore, if you compile your code without prototypes and the library is compiled with them, the linkage may fail.

Non-Standard Headers

The de facto Standard C library originally provided with UNIX systems contains both general-purpose and operating system-specific routines. Almost all the general-purpose ones were adopted by C89, while most of the operating system-specific ones were picked up by the IEEE POSIX Committee. A few were wanted by both groups or by neither, and were divided up amicably between the two. It should be noted that a (very) few macros and functions are defined and declared differently by the two groups. In particular, their versions of <limits.h> are not identical. However, it is the intent of both standards groups that a C program be able to be both ISO C- and POSIX-conforming at the same time.

Numerous commonly provided headers are not included in the Standard C library. These include bios.h, conio.h, curses.h, direct.h, dos.h, fcntl.h, io.h, process.h, search.h, share.h, stat.h, sys/locking.h, sys/stat.h, sys/timeb.h, sys/types.h, and values.h.

Other headers whose names were not adopted by C89 have all or some of their capabilities made available through various Standard C headers. These include malloc.h, memory.h, and varargs.h, which were reborn in, or combined with, <stdlib.h>, <string.h>, and <stdarg.h>, respectively.

<assert.h> – Diagnostics

C11 added support for static assertions (see Static Assertions), part of which involved adding to this header a macro called static_assert.
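A brief sketch of static_assert, assuming a C11 or later compiler. The particular conditions checked and the function bits_in_long are illustrative assumptions, not requirements taken from this text.

```c
#include <assert.h>
#include <limits.h>

/* Checked at compile time; translation stops with the message if false. */
static_assert(CHAR_BIT >= 8, "char must have at least 8 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long must hold at least 32 bits");

int bits_in_long(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```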

C++ Consideration: The equivalent Standard C++ header is <cassert>.

Program diagnostics

The assert Macro

Standard C requires that assert be implemented as a macro, not as an actual function. If, however, that macro definition is #undefed to access an actual function, the behavior is undefined.

The format of the message output is implementation-defined. However, Standard C intends that the expression used as the argument to assert be output in its text form (as it exists in the source code) along with the source filename and line number (represented by __FILE__ and __LINE__, respectively) of the invocation of the failing assert. Specifically, the expression MAX - MIN should be output as 100 - 20, not 80 (assuming MAX is defined to be 100, and MIN, 20).

C89 required the argument passed to assert have type int. However, C99 broadened that to be any scalar type.

As assert is a macro, take care not to give it expressions that have side-effects—you cannot rely on the macro evaluating your expression only once.
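A minimal sketch of keeping side-effects out of assert, using a hypothetical stack helper called pop. Note also that when NDEBUG is defined, Standard C makes assert expand to ((void)0), so the argument is not evaluated at all; any side-effect placed inside it would then disappear.

```c
#include <assert.h>

/* Removes and returns the top element; *depth is the element count. */
int pop(int *stack, int *depth)
{
    assert(*depth > 0);     /* pure test: harmless however often it is evaluated */
    *depth -= 1;            /* the side-effect stays outside the assert */
    return stack[*depth];
}
```

Writing assert((*depth)-- > 0) instead would make the behavior of pop depend on how, and whether, the macro evaluates its argument.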

<complex.h> – Complex Arithmetic

C99 added this header. C11 made support for complex types and operations optional.

The absence of the optional predefined macro __STDC_NO_COMPLEX__ indicates support for complex types and their associated arithmetic. Furthermore, the existence of the optional predefined macro __STDC_IEC_559_COMPLEX__ indicates that complex support conforms to IEC 60559, as described in an annex of the C Standard.

The following function names are reserved for possible future use by Standard C in this header: cerf, cerfc, cexp2, cexpm1, clog10, clog1p, clog2, clgamma, ctgamma, and those names with a suffix of f and l.

C++ Consideration: The equivalent Standard C++ header is <ccomplex>. Note that C++17 deprecated this header.

<ctype.h> – Character Handling

In Standard C, all the functions made available via <ctype.h> take an int argument. However, the value passed must either be representable in an unsigned char or be the macro EOF. If an argument has any other value, the behavior is undefined.
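One common way to meet this requirement when scanning a string is to convert each char to unsigned char before the call, as in this sketch. The function count_letters is hypothetical. The result of isalpha is used only as a truth value.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the characters in s that isalpha accepts in the current locale. */
size_t count_letters(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))   /* never pass a negative char value */
            n++;
    return n;
}
```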

C89 introduced the notion of a locale. By default, a C program runs in the "C" locale unless the setlocale function has been called (or the implementation's normal operating default locale is other than "C"). In the "C" locale, the <ctype.h> functions have the meaning they had prior to C89. When a locale other than "C" is selected, the set of characters qualifying for a particular character type test may be extended to include other implementation-defined characters. For example, implementations running in western Europe will likely include characters with diacritical marks, such as the umlaut, caret, and tilde. Therefore, whether ä, for example, tests true with isalpha is implementation-defined, based on the current locale.

Many implementations use a character representation that has more bits than are needed to represent the host character set; for example, 8-bit character systems supporting 7-bit ASCII. However, such implementations often support an extended character set using the otherwise unused bit(s). Also, a C programmer is at liberty to treat a char as a small integer, storing into it any bit pattern that will fit.

When a char contains a bit-pattern that represents something other than the machine's native character set, it should not be passed to a <ctype.h> function, unless permitted by your current locale. Even then, the results are implementation-defined. Also, you should determine whether a char is signed or not because an 8-bit char containing 0x80, for example, might be treated quite differently when it is signed versus unsigned.

Standard C requires that all <ctype.h> functions actually be implemented as functions. They may also be implemented as macros provided it is guaranteed they are safe macros. That is, their arguments are only evaluated once. Standard C allows #undef on any <ctype.h> name to get at the corresponding function version.

Recommendation: The actual value returned by the character testing functions when the argument tests true is implementation-defined. Therefore, you should use a logical, rather than an arithmetic, test on such values.

Standard C reserves all function names beginning with is or to followed by a lowercase letter (followed by any other identifier characters) for future additions to the run-time library.

The following functions have locale-specific behavior: isalpha, isblank, isgraph, islower, isprint, ispunct, isspace, isupper, toupper, and tolower.

Prior to C89, the following functions were widely available via this header: isascii, toascii, iscsym, and iscsymf. None of these are supported by Standard C.

C++ Consideration: The equivalent Standard C++ header is <cctype>.

Character Classification Functions

The isalpha Function

Use isalpha rather than something like the following:

if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))

because in some character sets (EBCDIC, for example) the upper- and lowercase letter groups do not occupy a contiguous range of internal values.

The isblank Function

C99 added this function.

The islower Function

See isalpha.

The isupper Function

See isalpha.

Character Case Mapping Functions

The tolower Function

In non-"C" locales, the mapping from upper- to lowercase may not be one-for-one. For example, an uppercase letter might be represented as two lowercase letters taken together, or, perhaps, it may not even have a lowercase equivalent. Likewise, for toupper.

<errno.h> – Errors

Historically, errno was declared as an extern int variable; however, Standard C requires that errno be a macro. (The macro could, however, expand to a call to a function of the same name.) Specifically, errno is a macro that expands to a modifiable lvalue of type int. As such, errno could be defined as something like *_Errno(), where the implementation-supplied function _Errno returns a pointer to int. #undefing errno to try to get at the underlying object results in undefined behavior.

Various standard library functions are documented as setting errno to a nonzero value when certain errors are detected. Standard C requires that this value be positive. It also states that no library routine is required to clear errno (that is, giving it the value 0), and, certainly, you should never rely on a library routine doing so.
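Because no library routine is required to clear errno, a caller that intends to inspect it typically sets it to 0 immediately before the call. The following sketch shows that pattern with strtol. The function parse_long and its return codes are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 0 on success, -1 if nothing was converted, -2 if out of range. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                       /* strtol will not clear it for us */
    value = strtol(text, &end, 10);
    if (end == text)
        return -1;                   /* no digits converted */
    if (errno == ERANGE)
        return -2;                   /* value does not fit in a long */
    *out = value;
    return 0;
}
```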

Historically, macros that define valid values for errno have been named starting with E. And although a set of names evolved for various systems, there was wide divergence on the spelling and meaning of some of those names. As a result, C89 defined only two macros: EDOM and ERANGE. C95 added EILSEQ. Additional macro definitions, beginning with E and a digit or an uppercase letter, may be specified by a standards-conforming implementation.

Here are some common E* extension macros:

E2BIG /* arg list too long */
EACCES /* permission denied */
EAGAIN /* no more processes */
EBADF /* bad file number */
EBUSY /* mount device busy */
ECHILD /* no children */
EDEADLK /* deadlock avoided */
EEXIST /* file exists */
EFAULT /* bad address */
EFBIG /* file too large */
EINTR /* interrupted system call */
EINVAL /* invalid argument */
EIO /* i/o error */
EISDIR /* is a directory */
EMFILE /* too many open files */
EMLINK /* too many links */
ENFILE /* file table overflow */
ENODEV /* no such device */
ENOENT /* no such file or directory */
ENOEXEC /* exec format error */
ENOLCK /* no locks available */
ENOMEM /* not enough core */
ENOSPC /* no space left on device */
ENOTBLK /* block device required */
ENOTDIR /* not a directory */
ENOTTY /* not a typewriter */
ENXIO /* no such device or address */
EPERM /* not owner */
EPIPE /* broken pipe */
EROFS /* read-only file system */
ESPIPE /* illegal seek */
ESRCH /* no such process */
ETXTBSY /* text file busy */
EXDEV /* cross-device link */

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <cerrno>.

<fenv.h> – Floating-Point Environment

C99 added this header.

Standard C reserves all macro names beginning with FE_ followed by an uppercase letter for future additions to this header.

C++ Consideration: The equivalent Standard C++ header is <cfenv>.

<float.h> – Characteristics of floating types

This header defines the floating-point characteristics of the target system via a series of macros whose values are largely implementation-defined.

As of C17, almost all the macros were defined in C89. The exceptions are DECIMAL_DIG and FLT_EVAL_METHOD, added in C99; and FLT_DECIMAL_DIG, DBL_DECIMAL_DIG, LDBL_DECIMAL_DIG, FLT_HAS_SUBNORM, DBL_HAS_SUBNORM, LDBL_HAS_SUBNORM, FLT_TRUE_MIN, DBL_TRUE_MIN, and LDBL_TRUE_MIN, added in C11.

Although many systems use IEEE-754 format for floating-point types, when C89 was being developed, there were three other formats in common use, all of which were accommodated by C89.

Standard C defines the values -1 through 3 for the macro FLT_ROUNDS. All other values specify implementation-defined rounding behavior.
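The five standard values can be decoded as in the following sketch, in which the function rounding_mode is a hypothetical helper. Any other value falls through to the implementation-defined case.

```c
#include <float.h>

/* Describes the rounding mode for floating-point addition. */
const char *rounding_mode(void)
{
    switch (FLT_ROUNDS) {
    case -1: return "indeterminable";
    case 0:  return "toward zero";
    case 1:  return "to nearest";
    case 2:  return "toward positive infinity";
    case 3:  return "toward negative infinity";
    default: return "implementation-defined";
    }
}
```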

Standard C defines the values -1 through 2 for the macro FLT_EVAL_METHOD. All other negative values for FLT_EVAL_METHOD characterize implementation-defined behavior. See Floating Constants regarding the possible impact of this macro’s value on floating constants.

C++ Consideration: The equivalent Standard C++ header is <cfloat>.

<inttypes.h> – Format Conversion of Integer Types

C99 added this header.

Standard C reserves all macro names beginning with PRI or SCN followed by a lowercase letter or X for future additions to this header.

C++ Consideration: The equivalent Standard C++ header is <cinttypes>.

<iso646.h> – Alternative Spellings

C95 added this header.

C++ Consideration: The equivalent Standard C++ header is <ciso646>. The macros defined by Standard C in this header are keywords in Standard C++.

<limits.h> – Numerical Limits

This header defines the integer characteristics of the target system via a series of macros whose values are largely implementation-defined.

Almost all the macros were defined in C89. The exceptions are LLONG_MIN, LLONG_MAX, and ULLONG_MAX, which were added in C99.

C++ Consideration: The equivalent Standard C++ header is <climits>.

<locale.h> – Localization

Almost all the members of the type struct lconv were defined by C89. The exceptions are int_p_cs_precedes, int_n_cs_precedes, int_p_sep_by_space, int_n_sep_by_space, int_p_sign_posn, and int_n_sign_posn, which were added in C99. Implementations may add other members.

Standard C has reserved the space of names beginning with LC_ followed by an uppercase letter for use by implementations, so they may add extra locale subcategory macros.

The locales defined by Standard C are "C" and "", the latter being the locale-specific native environment. The strings used to identify all other locales are implementation-defined.

C++ Consideration: The equivalent Standard C++ header is <clocale>.

Locale Control

The setlocale Function

If you modify the contents of the string returned by setlocale, the behavior is undefined.
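If the locale name needs to be kept or altered, one approach is to work on a private copy, as in this sketch. The function copy_locale_name is hypothetical; it relies on the standard behavior that passing a null pointer as the second argument only queries the current locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed copy of the current locale name for category,
   or NULL on failure. The caller frees the copy. */
char *copy_locale_name(int category)
{
    const char *current = setlocale(category, NULL);   /* query only */
    char *copy;

    if (current == NULL)
        return NULL;
    copy = malloc(strlen(current) + 1);
    if (copy != NULL)
        strcpy(copy, current);       /* the copy may be modified freely */
    return copy;
}
```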

<math.h> – Mathematics

C99 added the types float_t and double_t; the macros FP_FAST_FMA, FP_FAST_FMAF, FP_FAST_FMAL, FP_ILOGB0, FP_ILOGBNAN, FP_INFINITE, FP_NAN, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, HUGE_VALF, HUGE_VALL, INFINITY, MATH_ERREXCEPT, math_errhandling, MATH_ERRNO, and NAN; some function-like macros, and many functions. C99 also added the FP_CONTRACT pragma.

The macros EDOM and ERANGE, which some math functions store in errno, require <errno.h>.

In C89, the math function names created by adding a suffix of f or l were reserved for implementations of float and long double versions, respectively. However, a conforming implementation was required to support only the double set. Starting with C99, all three versions must be provided.

In the case of the float set, these functions must be called in the presence of an appropriate prototype; otherwise, float arguments will be widened to double. (Note though that specifying float in a prototype does not necessarily force such widening to be disabled; this aspect of prototypes is implementation-defined. However, it is necessary when supporting the float set.)

With the introduction of math_errhandling in C99, errno need not be set in certain circumstances.

A domain error occurs if an input argument is outside the domain over which the mathematical function is defined. In this case, an implementation-defined value is returned, and, prior to C99, errno is set to the macro EDOM.

A range error occurs if the result of the function cannot be represented as a double. If the result overflows, the function returns the value of HUGE_VAL, with the same sign as the correct value would have. Prior to C99, errno is set to the macro ERANGE. If the result underflows, the function returns 0 and errno may or may not be set to ERANGE, as the implementation defines.

C++ Consideration: The equivalent Standard C++ header is <cmath>.

<setjmp.h> – Non-Local Jumps

Standard C requires that jmp_buf be an array of suitable size to store the “current program context,” whatever that may be. C99 added that this context, “does not include the state of the floating-point status flags, of open files, or of any other component of the abstract machine.”

C++ Consideration: The equivalent Standard C++ header is <csetjmp>.

Save calling environment

The setjmp Macro

Standard C states, “It is unspecified whether setjmp is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the name setjmp, the behavior is undefined.”

If setjmp is invoked outside the contexts defined by Standard C, the behavior is undefined.

Restore Calling Environment

The longjmp Function

If longjmp attempts to restore to a context that was never saved by setjmp, the result is undefined.

If longjmp attempts to restore to a context and the parent function, which called setjmp to save that context initially, has terminated, the results are undefined.

The behavior is undefined if longjmp is invoked from a nested signal handler. Do not invoke longjmp from an exit handler, such as those registered by the atexit function.
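A minimal sketch of a setjmp and longjmp pairing that stays within these rules. The names recovery_point, give_up, and run_guarded are hypothetical. setjmp is used as an operand of a comparison that forms the entire controlling expression of an if statement, and longjmp is called only while the function that called setjmp is still active.

```c
#include <setjmp.h>

static jmp_buf recovery_point;

static void give_up(void)
{
    longjmp(recovery_point, 1);      /* does not return */
}

/* Returns 0 normally, or -1 if the work was abandoned via longjmp. */
int run_guarded(int should_fail)
{
    if (setjmp(recovery_point) != 0)
        return -1;                   /* arrived here from give_up */

    if (should_fail)
        give_up();                   /* run_guarded has not yet returned */
    return 0;
}
```

Writing int rc = setjmp(recovery_point); instead would place setjmp outside the contexts Standard C permits.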

<signal.h> – Signal Handling

C89 added type sig_atomic_t.

Standard C reserves names of the form SIG* and SIG_*, where * represents the trailing part of an identifier that begins with an uppercase letter, for other kinds of signals. The complete set of signals available in a given implementation, their semantics, and their default handling is implementation-defined.

C++ Consideration: The equivalent Standard C++ header is <csignal>.

Specify signal handling

The signal Function

signal returns a value equal to SIG_ERR if it cannot perform the requested operation. Prior to C89, signal returned -1. Do not test explicitly for a -1 return value—use the macro SIG_ERR instead. Always test the return value from signal—do not assume it did exactly as you requested.

Usually, when a signal is detected and given off to a handler, that signal will be handled in the “default” manner when next it occurs. That is, you must explicitly call signal to reset the signal mechanism from within the signal handler if you wish to continue to trap and handle the signal. (This is required by Standard C except in the case of SIGILL, where it is implementation-defined as to whether the signal is reset automatically.)
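The two points above, testing for SIG_ERR and re-establishing the handler, can be sketched as follows. The names interrupt_flag, on_interrupt, install_interrupt_handler, and interrupt_seen are hypothetical, and SIGINT is chosen only as an example.

```c
#include <signal.h>

static volatile sig_atomic_t interrupt_flag = 0;

static void on_interrupt(int sig)
{
    signal(sig, on_interrupt);       /* re-arm: handling may have been reset */
    interrupt_flag = 1;              /* only set a flag inside the handler */
}

/* Returns 0 if the handler was installed, -1 otherwise. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

int interrupt_seen(void)
{
    return interrupt_flag != 0;
}
```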

If a call to signal from within a handler returns SIG_ERR, the value of errno is indeterminate. In other circumstances, SIG_ERR is returned, and errno contains a positive value whose possible values are implementation-defined.

During program startup, an implementation is at liberty to specify that selected signals be ignored or handled by default means as appropriate. That is, the initial state of signal handling is implementation-defined.

Standard C makes no statement about the behavior when a second signal for the same handler occurs before the first is processed.

<stdalign.h> – Alignment

C11 added this header.

C++ Consideration: The equivalent Standard C++ header is <cstdalign>. Note that C++17 deprecated this header.

<stdarg.h> – Variable Arguments

This header was a C89 invention modeled closely on the UNIX <varargs.h> header. As Standard C uses a slightly different approach, the new header <stdarg.h> was defined rather than retaining <varargs.h> with a changed meaning.

C++ Consideration: The equivalent Standard C++ header is <cstdarg>.

Variable Argument List Access Macros

The va_arg Macro

Standard C requires that va_arg be a macro. If it is the subject of #undef, and an actual function of the same name is used instead, the behavior is undefined. It is unspecified whether va_end is a macro or a function.

The va_copy Macro

C99 added this facility.

Standard C states, “It is unspecified whether va_copy is a macro or identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the same name, the behavior is undefined.”

The va_end Macro

Standard C states, “It is unspecified whether va_end is a macro or identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the same name, the behavior is undefined.”

The va_start Macro

Standard C requires that va_start be a macro. If it is the subject of #undef, and an actual function of the same name is used instead, the behavior is undefined.

If register is used with the second argument of va_start, or that argument has type function or array, the behavior is undefined.
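The four macros in this header can be seen working together in the following sketch, which assumes C99 or later because it uses va_copy. The function count_above_average is hypothetical. Its last named parameter is a plain int, so it avoids the problem cases just described.

```c
#include <stdarg.h>

/* Returns how many of the count int arguments exceed their average. */
int count_above_average(int count, ...)
{
    va_list first, second;
    long sum = 0;
    int i, above = 0;

    va_start(first, count);
    va_copy(second, first);          /* independent cursor over the same list */

    for (i = 0; i < count; i++)
        sum += va_arg(first, int);
    va_end(first);

    for (i = 0; i < count; i++)
        if ((long)va_arg(second, int) * count > sum)   /* value > sum / count */
            above++;
    va_end(second);                  /* every va_start and va_copy needs a va_end */

    return above;
}
```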

<stdatomic.h> – Atomics

C11 added this header.

Standard C reserves the following names for future addition to this header:

  • Macro names beginning with ATOMIC_ followed by an uppercase letter

  • Type names beginning with atomic_ or memory_, followed by a lowercase letter

  • For the memory_order type, enumeration constants beginning with memory_order_ followed by a lowercase letter

  • Function names beginning with atomic_ followed by a lowercase letter

C17 deprecated the use of the macro ATOMIC_VAR_INIT.

C++ Consideration: There is no equivalent header.

<stdbool.h> – Boolean Type and Values

C99 added the type specifier _Bool and the corresponding header <stdbool.h>, which defines the type synonym bool and the macros true, false, and __bool_true_false_are_defined.

C++ Consideration: C++11 added <cstdbool>, to emulate <stdbool.h>’s behavior. C++17 changed the header name to <stdbool.h>, as used by Standard C. However, note that C++17 deprecated this header.

How can we write code that uses a Boolean type and port it across multiple C compilers that do and don’t support this header, or to a C++ compiler? We never use the C99 type _Bool and we don’t explicitly #include <stdbool.h>; we only ever use the names bool, true, and false. Here’s the relevant code to achieve this:

#ifndef __cplusplus /* in C mode, so no bool, true, and false keywords */
    #ifndef __bool_true_false_are_defined /* <stdbool.h> has not been #included */
        #ifdef true /* complain if any homegrown true macro defined */
            #error "A macro called >true< is defined"
        #else
            #ifdef false /* complain if any homegrown false macro defined */
                #error "A macro called >false< is defined"
            #else
                #ifdef bool /* complain if any homegrown bool macro defined */
                    #error "A macro called >bool< is defined"
                #else
                    #if __STDC_VERSION__ >= 199901L /* If <stdbool.h> exists #include it */
                        #include <stdbool.h>
                    #else
                        typedef int bool;
                        #define true 1
                        #define false 0
                        #define __bool_true_false_are_defined 1
                    #endif
                #endif
            #endif
        #endif
    #endif
#else /* in C++ mode, so have bool, true, and false keywords */
    #ifdef true /* complain if any homegrown true macro defined */
        #error "A macro called >true< is defined"
    #endif
    #ifdef false /* complain if any homegrown false macro defined */
        #error "A macro called >false< is defined"
    #endif
    #ifdef bool /* complain if any homegrown bool macro defined */
        #error "A macro called >bool< is defined"
    #endif
#endif

C++ Consideration: The equivalent Standard C++ header is <cstdbool>.

<stddef.h> – Common Definitions

C89 added this header as a repository for several miscellaneous macro definitions and types. The macros are NULL and offsetof, and the types are ptrdiff_t, size_t, and wchar_t. C11 added max_align_t. All but NULL were C89 inventions.

If the second argument to offsetof is a bit-field, the behavior is undefined.

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <cstddef>.

<stdint.h> – Integer Types

C99 added this header.

Standard C reserves the following names for future addition to this header:

  • Macro names beginning with INT or UINT, and ending with _MAX, _MIN, or _C

  • Type names beginning with int or uint, and ending with _t

C++ Consideration: The equivalent Standard C++ header is <cstdint>.

<stdio.h> – Input/Output

Files and File Systems

Many aspects of file and directory systems are implementation-dependent. So much so that Standard C cannot even make a statement about the most basic thing, a filename. Just what filenames can and must an implementation support? And as for directory and device names, there is nothing close to a common approach. And while there are standard header names, they need not map directly to filenames of the same spelling.

Some implementations may permit filenames to contain wildcards. That is, the file specifier may refer to a group of files using a convention such as *.dat to refer to all files with a type of .dat. None of the standard I/O routines is required to support such a notion.

Numerous operating systems can limit the number of open files on a per-user basis. Note, too, that not all systems permit multiple versions of the same filename in the same directory, and this has consequences when you use fopen with "w" mode, for example.

Some file systems also place disk quotas on users such that an I/O operation may fail when a file grows too big—you may not know this until an output operation fails.

Back to the filename issue. After extensive investigation, the Standard C committee found that the format of a portable filename is up to six alphabetic characters followed by a period and none or one letter. And given that some file systems are case-sensitive, these alphabetic characters should all be the same case. However, rather than restrict yourself to filenames of the lowest common denominator, you can use conditional compilation directives to deal with platform-specific file systems.

The whole concept of filename redirection at the command-line level is also implementation-dependent. If possible, it means that printf and fscanf, for example, may actually be dealing with devices other than the user's terminal. They could even be dealing with files. Note that gets behaves slightly differently to fgets from stdin, yet gets could be reading from a file if stdin were redirected.

The details of file buffering, disk sector sizes, etc., are also implementation-dependent. However, Standard C requires an implementation to be able to handle text files with lines containing at least 254 characters, including the trailing new-line.

On some systems, stdin, stdout, and stderr are special to the operating system and are maintained by it. On other systems, these may be established during program startup. Whether these files count against your maximum open file limit is implementation-dependent.

The macros BUFSIZ, FOPEN_MAX, FILENAME_MAX, and TMP_MAX expand to implementation-defined values.

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <cstdio>.

Operations on Files

The remove Function

On many systems, the file is actually deleted. However, it may be that you are removing a synonym for a file's name, rather than deleting the file itself. In such cases, when the last synonym is being removed, the file is typically deleted.

If the file being removed is currently open, the behavior is implementation-defined. (In a shared file system, another program may be accessing the file you are removing.)

The rename Function

Standard C states that the old filename is removed (as if it had been the subject of a call to remove). Presumably, this permits a filename synonym to be renamed as well. As old is removed, if old is currently open, the behavior is implementation-defined. (In a shared file system, another program may be accessing the file you are renaming.)

If a file with the new name already exists, the behavior is implementation-defined.

A file system with a hierarchical (or other) directory structure might not directly permit renaming of files across directories. In these cases, the rename might fail, or the file might actually be copied and the original removed. Standard C hints that if a file copy is needed, rename could fail; however, it does not require it to.

The tmpfile Function

If the program terminates abnormally, the temporary file might not be removed.

The location and attributes (directory name, file name, access permission, etc.) of the file created are implementation-defined.

The tmpnam Function

If you call tmpnam more than TMP_MAX times, the behavior is implementation-defined.

tmpnam has no way to communicate an error, so if you give it a non-NULL address that points to an area smaller than L_tmpnam characters, the behavior is undefined.

While the filename is guaranteed to be unique at the time tmpnam was called, a file by that name may have been created before you get a chance to use it. If this is likely to be a problem, use tmpfile instead. tmpfile always opens the file in "wb+" mode; if you need a different mode, the only standard option is to call freopen with a null filename (added by C99), and it is implementation-defined which mode changes, if any, are permitted.

The filename may include directory information. If it does, the name and attributes of the directory are implementation-defined.

File Access Functions

The fclose Function

If a program terminates abnormally, there is no guarantee that streams open for output will have their buffers flushed.

On some implementations it may not be possible to close an empty file successfully and have it retained by the file system—you might first have to write something to it.

The fflush Function

If the stream was not open for output, or it was open for update with the immediately previous operation being other than output, the behavior is undefined. However, some implementations permit input streams to be fflushed reliably.

If a program terminates abnormally, there is no guarantee that streams open for output will have their buffers flushed.

It is permissible to flush the “special” files stdout and stderr. While Standard C states that flushing input files (including stdin) produces undefined behavior, some implementations permit it.

The fopen Function

Some implementations may have difficulty seeking within text files; in which case, specifying mode '+' may also imply mode 'b'.

Some file systems permit only one version of a file by any given name; in which case, opening in 'w' mode will cause that file to be overwritten. On other systems, a new version of the file may be created.

The set, and meaning, of any mode characters following the standard mode sequences is implementation-defined. Other mode characters might be provided by your implementation to specify various file attributes.

C11 added exclusive mode x.

Some file systems append trailing '\0' characters to the end of a binary file when it is closed. Subsequently, when you open such files for append, you may be positioned beyond the end of the last character you wrote.

A file is opened with full buffering only if the implementation can determine that it is not an interactive device.

If fopen succeeds, it returns a FILE pointer to the opened stream. On failure, it returns NULL. Note that an implementation may limit the number of currently open files—FOPEN_MAX specifies the number permitted—in which case fopen will fail if you attempt to exceed this number. Standard C does not specify whether errno is set.

The freopen Function

If freopen succeeds, it returns the value of stream; otherwise, it returns NULL. Standard C does not specify if errno is set.

The setbuf Function

setbuf returns no value. The responsibility is on the programmer to make sure stream points to an open file and that buf is either NULL or a pointer to a sufficiently large buffer.

Standard C does not require an implementation to be able to implement each type of buffering (full, line, or none). And so, an implementation is at liberty to treat one or more of these buffering types as equivalent. Therefore, there is no guarantee that setbuf will be able to honor your request, even though no error code can be returned.

The setvbuf Function

mode may be one of the following: _IOFBF (fully buffered), _IOLBF (line buffered), or _IONBF (no buffering). Standard C requires that setvbuf accept these modes, although the underlying implementation need not be able to implement each of these types of buffering. And so, an implementation is at liberty to treat one or more of these buffering types to be equivalent.

When the programmer supplies the buffer, its contents are indeterminate at any particular time. (Standard C does not actually require an implementation to use the programmer's buffer if one is supplied.) The user-supplied buffer must remain in existence as long as the stream is open, so take care if you use a buffer with automatic storage duration.

setvbuf returns zero on success and nonzero on failure. A failure could result from an invalid value for mode or for some other reason. Standard C does not specify that errno is set on error.

The size of buffers allocated by setvbuf is implementation-defined although some implementations of setvbuf use size to determine the size of the internal buffer used as well.

Formatted Input/Output Functions

The fprintf Function

Standard C defines the common output formatting behavior of the *printf family of functions under fprintf with all other family member descriptions pointing there.

If there are insufficient arguments for the format, the behavior is undefined.

C89 added the conversion specifiers i, n, and p. p outputs the value of the void pointer using an implementation-defined format.

C99 added the conversion specifiers F, a, and A, and the length modifiers hh, ll, j, t, and z. It also added support for infinities and NaNs using an implementation-defined format.

If a conversion specification is invalid, the behavior is undefined. (Note that K&R stated that any specification not recognized was treated as text and passed through to stdout. For example, %@ produced @.) Standard C has reserved all unused lowercase conversion specifiers for its own use in future versions.

The behavior is undefined if any argument is, or points to, a union, a structure, or an array except for arrays with %s and void pointers with %p.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

The fscanf Function

Standard C defines the common input formatting behavior of the *scanf family of functions under fscanf with all other family member descriptions pointing there.

If there are insufficient arguments for the format, the behavior is undefined.

C89 added the conversion specifiers i, n, and p. p expects an argument of type pointer to void, in an implementation-defined format.

C99 added the length modifiers hh, ll, j, t, and z. It also added support for infinities and NaNs.

If a conversion specification is invalid, the behavior is undefined. Standard C has reserved all unused lowercase conversion specifiers for its own use in future versions.

If an error occurs, EOF is returned. Standard C makes no mention of errno being set.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

The printf Function

The output formatting issues mentioned in fprintf also apply to this function.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

printf maintains an internal buffer into which it builds the formatted string, and this buffer has a finite length. Historically, this length has been implementation-defined and not always documented, and implementations have varied widely. C89 required that an implementation be able to handle any single conversion producing at least 509 characters; C99 raised this limit to 4095.

The scanf Function

The input formatting issues mentioned in fscanf also apply to this function.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

The snprintf Function

C99 added this function.

The output formatting issues mentioned in fprintf also apply to this function.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

The sprintf Function

The output formatting issues mentioned in fprintf also apply to this function.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

The sscanf Function

The input formatting issues mentioned in fscanf also apply to this function.

Calling this function without having the appropriate prototype in scope results in undefined behavior.

The vfprintf Function

C89 added this function.

The output formatting issues mentioned in fprintf also apply to this function.

The vfscanf Function

C99 added this function.

The input formatting issues mentioned in fscanf also apply to this function.

The vprintf Function

C89 added this function.

The output formatting issues mentioned in fprintf also apply to this function.

The vscanf Function

C99 added this function.

The input formatting issues mentioned in fscanf also apply to this function.

The vsnprintf Function

C99 added this function.

The output formatting issues mentioned in fprintf also apply to this function.

The vsprintf Function

C89 added this function.

The output formatting issues mentioned in fprintf also apply to this function.

The vsscanf Function

C99 added this function.

The input formatting issues mentioned in fscanf also apply to this function.

Character Input/Output Functions

The gets Function

C11 removed this function.

The ungetc Function

C99 deprecated the use of this function at the beginning of a binary file.

Direct Input/Output Functions

The fread Function

If an error occurs, the file position indicator's value is indeterminate.

If a partial field is read, its value is indeterminate.

Standard C makes no statement about the possible translation of CR/LF pairs to new-lines on input, although some implementations do so for text files.

The fwrite Function

If an error occurs, the file position indicator's value is indeterminate.

Standard C makes no statement about the possible translation of new-lines to CR/LF pairs on output, although some implementations do so for text files.

File Positioning Functions

The fgetpos Function

C89 added this function.

On failure, a nonzero value is returned, and errno is set to an implementation-defined positive value.

The fsetpos Function

C89 added this function.

On failure, a nonzero value is returned, and errno is set to an implementation-defined positive value.

Error-Handling Functions

The perror Function

The contents and format of the message are implementation-defined.

<stdlib.h> – General Utilities

C89 defined this header.

C99 added the type lldiv_t.

The macros EXIT_SUCCESS and EXIT_FAILURE are Standard C inventions and are used as the implementation-defined success and failure exit code values used with exit.

Standard C reserves all function names beginning with str followed by a lowercase letter for future addition to this header.

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <cstdlib>.

Numeric Conversion Functions

Standard C does not require atof, atoi, atol, and atoll to set errno if an error occurs. If the result cannot be represented, the behavior is undefined.

The atoll Function

C99 added this function.

The strtod Function

The format of the floating-point number is locale-specific.

The strtof Function

C99 added this function.

The format of the floating-point number is locale-specific.

The strtol Function

The format of the integral value is locale-specific.

The strtoll Function

C99 added this function.

The format of the integral value is locale-specific.

The strtold Function

C99 added this function.

The format of the floating-point number is locale-specific.

The strtoul Function

The format of the integral value is locale-specific.

The strtoull Function

C99 added this function.

The format of the integral value is locale-specific.

Pseudo-Random Sequence Generation Functions

The rand Function

Standard C requires RAND_MAX to be at least 32767.

Memory Management Functions

NULL is returned if the space requested cannot be allocated. NEVER, EVER assume an allocation request succeeds without checking for a NULL return value.

If a zero amount of space is requested, the behavior is implementation-defined, and either NULL or a unique pointer is returned.

The size of the heap available and the details of its management and manipulation are implementation-specific.

The aligned_alloc Function

C11 added this function.

The calloc Function

The space allocated is initialized to “all-bits-zero.” Note that this is not guaranteed to be the same representation as floating-point zero or a null pointer.

The free Function

If ptr is NULL, free does nothing. Otherwise, if ptr is not a value previously returned by aligned_alloc, calloc, malloc, or realloc, or if the space has already been freed, the behavior is undefined.

The value of a pointer that refers to space that has been freed is indeterminate, and such pointers should not be dereferenced.

Note that free has no way to communicate an error if one is detected.

The malloc Function

The initial value of the space allocated is unspecified.

The realloc Function

If ptr is NULL, realloc behaves like malloc. Otherwise, if ptr is not a value previously returned by aligned_alloc, calloc, malloc, or realloc, the behavior is undefined. The same is true if ptr points to space that has been freed.

Communication with the Environment

The abort Function

It is implementation-defined as to whether or not output streams are flushed, open streams are closed, or temporary files are removed.

The exit code of the program is some implementation-defined value that represents “failure.” It is generated by a call to raise using the argument SIGABRT.

The atexit Function

Standard C requires that at least 32 functions can be registered. However, to get around any limitations in this regard, you can always register just one function and have it call the others directly. This way, the other functions can also have argument lists and return values.

The at_quick_exit Function

C11 added this function.

The _Exit Function

C99 added this function.

The getenv Function

The environment list is maintained by the host environment, and the set of names available is implementation-specific.

The behavior is undefined if you attempt to modify the contents of the string pointed to by the return value.

Some implementations supply a third argument to main, called envp. envp is an array of pointers to char (just like argv) with each pointer pointing to an environment string. Standard C does not include this.

The quick_exit Function

C11 added this function.

The system Function

Standard C does not require that a command-line processor (or equivalent) exist, in which case an implementation-defined value is returned. To ascertain whether such an environment exists, call system with a NULL argument; if a nonzero value is returned, a command-line processor is available.

The format of the string passed is implementation-defined.

Searching and Sorting Utilities

The bsearch Function

If two members compare as equal, it is unspecified as to which member is matched.

The qsort Function

If two members compare as equal, it is unspecified as to their order in the array.

Integer Arithmetic Functions

The abs Function

The behavior is undefined if the result cannot be represented.

abs could be implemented as a macro.

The div Function

If the result cannot be represented, the behavior is undefined.

C89 added this function.

The labs Function

The behavior is undefined if the result cannot be represented.

labs could be implemented as a macro.

The ldiv Function

If the result cannot be represented, the behavior is undefined.

C89 added this function.

The llabs Function

C99 added this function.

The lldiv Function

If the result cannot be represented, the behavior is undefined.

C99 added this function.

Multibyte Character Functions

The behavior of these functions is subject to the current locale, in particular, to the LC_CTYPE category.

Initial support for multibyte character processing was added by C89.

<stdnoreturn.h> – _Noreturn

C11 added this header.

C++ Consideration: There is no equivalent header.

<string.h> – String Handling

An implementation is at liberty to place certain alignment considerations on any of C's data types. Presumably, any copy you make in memory of such an aligned object should itself also be aligned appropriately. If this is not the case, it is possible that the created copy might not be accessible, or it may be misinterpreted. It is the programmer's responsibility to ensure the resultant object copy is in a format and memory location suitable for further and meaningful use.

Standard C reserves all function names beginning with str, mem, or wcs followed by a lowercase letter for future addition to this header.

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <cstring>.

Copying Functions

The memcpy Function

If the two objects overlap, the behavior is undefined.

The memmove Function

C89 added this function.

The strcpy Function

If the two strings overlap, the behavior is undefined.

The strncpy Function

If the two strings overlap, the behavior is undefined.

Concatenation Functions

The strcat Function

If the two strings overlap, the behavior is undefined.

The strncat Function

If the two strings overlap, the behavior is undefined.

Comparison Functions

Recommendation: All the comparison functions return an integer indicating less than, greater than, or equal to zero. Do not assume the positive or negative values indicating greater than and less than, respectively, have any predictable value. Always compare the return value against zero, never against a specific nonzero value.

The strcoll Function

The comparison is locale-specific.

C89 added this function.

The strxfrm Function

C89 added this function.

The strstr Function

C89 added this function.

Miscellaneous Functions

The strerror Function

The text of the message returned is implementation-defined.

The programmer should not attempt to write to the location pointed to by the returned value.

<tgmath.h> – Type-Generic Math

C99 added this header.

C++ Consideration: The equivalent Standard C++ header is <ctgmath>. Note that C++17 deprecated this header.

<threads.h> – Threads

C11 added this header.

If a C implementation supports the keyword _Thread_local (see the conditionally defined macro __STDC_NO_THREADS__ mentioned in Conditionally Defined Standard Macros), it will also provide the header <threads.h>. As such, rather than using the keyword directly, do the following:

#include <threads.h>

void f(void)
{
    thread_local static int tlsI = 0;
    /* ... */
}

where thread_local is a macro defined in that header as _Thread_local, and that matches the equivalent C++ keyword.

Standard C reserves function names, type names, and enumeration constants beginning with cnd_, mtx_, thrd_, or tss_, followed by a lowercase letter, as possible additions to this header.

C++ Consideration: There is no equivalent header.

<time.h> – Date and Time

Standard C reserves all macro names beginning with TIME_ followed by an uppercase letter for future addition to this header.

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <ctime>.

Components of Time

C89 replaced the macro CLK_TCK with CLOCKS_PER_SEC.

C11 added the macro TIME_UTC, and the type struct timespec.

The type struct timespec contains the members tv_sec and tv_nsec.

Time Manipulation Functions

The difftime Function

C89 added this function.

The mktime Function

C89 added this function.

The timespec_get Function

C11 added this function.

Time Conversion Functions

The strftime Function

C89 added this function.

C99 added the following conversion specifiers: C, D, e, F, g, G, h, n, r, R, t, T, u, V, and z.

<uchar.h> – Unicode Utilities

C11 added this header.

C++ Consideration: The equivalent Standard C++ header is <cuchar>.

<wchar.h> – Extended Multibyte and Wide Character Utilities

C95 added this header.

Standard C reserves all function names beginning with wcs followed by a lowercase letter for future addition to this header.

See Optional Contents for requirements that an implementation may need to add to this header to support the annex called “Bounds-checking interfaces” added by C11.

C++ Consideration: The equivalent Standard C++ header is <cwchar>.

<wctype.h> – Wide Character Classification and Mapping Utilities

C95 added this header.

Standard C reserves all function names beginning with is or to followed by a lowercase letter for future addition to this header.

C++ Consideration: The equivalent Standard C++ header is <cwctype>.