Coding style conventions

edit

The use of a guide or set of convention gives programmers a set of rules for code normalization or coding style that establishes how to format code, name variables, place comments or any other non language dependent structural decision that is used on the code. This is very important, as you share a project with others. Agreeing to a common set of coding standards and recommendations saves time and effort, by enabling a greater understanding and transparency of the code base, providing a common ground for undocumented structures, making for easy debugging, and increasing code maintainability. These rules may also be referred to as Source Code Style, Code Conventions, Coding Standards or a variation of those.

Many organizations have published C++ style guidelines. A list of different approaches can be found on the C++ coding conventions Reference Section. The most commonly used style in C++ programming is ANSI or Allman while much C programming is still done in the Kernighan and Ritchie (K&R) style. You should be warned that this should be one of the first decisions you make on a project and in a democratic environment, a consensus can be very hard to achieve.

Video of the presentation of Bjarne Stroustrup about Style Conventions (not only C++14) https://github.com/isocpp/CppCoreGuidelines and link to the guidelines: https://github.com/isocpp/CppCoreGuidelines%7CCppCoreGuidelines Video of the presentation of Herb Sutter about Style Conventions (not only C++14) https://www.youtube.com/watch?v=hEx5DNLWGgA link for GSL (Guidelines Support Library) https://github.com/Microsoft/GSL


Programmers tend to stick to a coding style, they have it automated and any deviation can be very hard to conform with, if you don't have a favorite style try to use the smallest possible variation to a common one or get as broad a view as you can get, permitting you to adapt easily to changes or defend your approach. There is software that can help to format or beautify the code, but automation can have its drawbacks. As seen earlier, indentation and the use of white spaces or tabs are completely ignored by the compiler. A coding style should vary depending on the lowest common denominator of the needs to standardize.

Another factor, even if yet to a minimal degree, for the selection of a coding style convention is the IDE (or the code editor) and its capabilities, this can have for instance an influence in determining how verbose code should be, the maximum length of lines, etc. Some editors now have extremely useful features like word completion, refactoring functionalities and other that can make some specifications unnecessary or outright outdated. This will make the adoption of a coding style dependent also on the target code user available software.

Field impacted by the selection of a Code Style are:

  • Re-usability
    • Self documenting code
    • Internationalization
    • Maintainability
    • Portability
  • Optimization
  • Build process
  • Error avoidance
  • Security
Standardization is important

No matter which particular coding style you pick, once it is selected, it should be kept throughout the same project. Reading code that follows different styles can become very difficult. In the next sections we try to explain why some of the options are common practice without forcing you to adopt a specific style.

Note:
Using a bad Coding Style is worse than having no Coding Style at all, since you will be extending bad practices to all the code base.

25 lines 80 columns

edit

This rule is a commonly recommended, but often countered with argument that the rule is outdated. The rule originates from the time when text-based computer terminals and dot-matrix printers often could display at most 80 columns of text. As such, greater than 80-column text would either inconveniently wrap to the next line, or worse, not display at all.

The physical limitations of the devices asides, this rule often still suggested under the argument that if you are writing code that will go further than 80 columns or 25 lines, it's time to think about splitting the code into functions. Smaller chunks of encapsulated code helps in reviewing the code as it can be seen all at once without scrolling up or down. This modularizes, and thus eases, the programmer mental representation of the project. This practice will save you precious time when you have to return to a project you haven't been working on for 6 months.

For example, you may want to split long output statements across multiple lines:

    fprintf(stdout,"The quick brown fox jumps over the lazy dog. "
                   "The quick brown fox jumps over the lazy dog.\n"
                   "The quick brown fox jumps over the lazy dog - %d", 2);


This recommended practice relates also to the 0 means success convention for functions, that we will cover on the Functions Section of this book.

Whitespace and indentation

edit

Note:
Spaces, tabs and newlines (line breaks) are called whitespace. Whitespace is required to separate adjacent words and numbers; they are ignored everywhere else except within quotes and preprocessor directives

Conventions followed when using whitespace to improve the readability of code is called an indentation style. Every block of code and every definition should follow a consistent indention style. This usually means everything within { and }. However, the same thing goes for one-line code blocks.

Use a fixed number of spaces for indentation. Recommendations vary; 2, 3, 4, 8 are all common numbers. If you use tabs for indention you have to be aware that editors and printers may deal with, and expand, tabs differently. The K&R standard recommends an indentation size of 4 spaces.

The use of tab is controversial, the basic premise is that it reduces source code portability, since the same source code loaded into different editors with distinct setting will not look alike. This is one of the primary reasons why some programmers prefer the consistency of using spaces (or configure the editor to replace the use of the tab key with the necessary number of spaces).

For example, a program could as well be written using as follows:

// Using an indentation size of 2
if ( a > 5 )  { b=a; a++; }

However, the same code could be made much more readable with proper indentation:

// Using an indentation size of 2
if ( a > 5 )  {
  b = a;
  a++;
}

// Using an indentation size of 4
if ( a > 5 )
{
    b = a;
    a++;
}

Placement of braces (curly brackets)

edit

As we have seen early on the Statements Section, compound statements are very important in C++, they also are subject of different coding styles, that recommend different placements of opening and closing braces ({ and }). Some recommend putting the opening brace on the line with the statement, at the end (K&R). Others recommend putting these on a line by itself, but not indented (ANSI C++). GNU recommends putting braces on a line by itself, and indenting them half-way. We recommend picking one brace-placement style and sticking with it.

Examples:

if (a > 5) {
  // This is K&R style
}

if (a > 5) 
{
  // This is ANSI C++ style
}

if (a > 5) 
  {
    // This is GNU style
  }

Comments

edit

Comments are portions of the code ignored by the compiler which allow the user to make simple notes in the relevant areas of the source code. Comments come either in block form or as single lines.

  • Single-line comments (informally, C++ style), start with // and continue until the end of the line. If the last character in a comment line is a \ the comment will continue in the next line.
  • Multi-line comments (informally, C style), start with /* and end with */.

Note:
Since the 1999 revision, C also allows C++ style comments, so the informal names are largely of historical interest that serves to make a distinction of the two methods of commenting.

We will now describe how a comment can be added to the source code, but not where, how, and when to comment; we will get into that later.

C style comments

edit

If you use C style comments, try to use it like this:

Comment single line:

/*void EventLoop(); /**/

Comment multiple lines:

/*
void EventLoop();
void EventLoop();
/**/

This allows you to easily uncomment. For example:

Uncomment single line:

void EventLoop(); /**/

Uncomment multiple lines:

void EventLoop();
void EventLoop();
/**/

Note:
Some compilers may generate errors/warnings.
Try to avoid using C style inside a function because of the non nesting facility of C style (most editors now have some sort of coloring ability that prevents this kind of error, but it was very common to miss it, and you shouldn't make assumptions on how the code is read).

... by removing only the start of comment and so activating the next one, you did re-activate the commented code, because if you start a comment this way it will be valid until it finds the close of comment */.

Note:
Remember that C-style comments /* like this */ do not "nest", i.e., you can't write

int function() /* This is a comment */
{              
 return 0;  
}              and this is the same comment */
               so this isn't in the comment, and will give an error*/

because of the text so this is not in the comment */ at the end of the line, which is not inside the comment; the comment ends at the first */ sequence it finds, ignoring any interim /* sequence, which might look to human readers like the start of a nested comment.

C++ style comments

edit

Examples:

// This is a single one line comment

or

if (expression) // This needs a comment
{
  statements;   
}
else
{
  statements;
}

The backslash is a continuation character and will continue the comment to the following line:

// This comment will also comment the following line \
std::cout << "This line will not print" << std::endl;
Using comments to temporarily ignore code

Comments are also sometimes used to enclose code that we temporarily want the compiler to ignore. This can be useful in finding errors in the program. If a program does not give the desired result, it might be possible to track which particular statement contains the error by commenting out code.

Example with C style comments
/* This is a single line comment */

or

/*
   This is a multiple line comment
*/
C and C++ style

Combining multi-line comments (/* */) with c++ comments (//) to comment out multiple lines of code:

Commenting out the code:

/*
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
//*/

uncommenting the code chunk

//*
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
void EventLoop();
//*/

This works because a //* is still a c++ comment. And //*/ acts as a c++ comment and a multi-line comment terminator. However this doesn't work if there are any multi-line comments are used for function descriptions.

Note on doing it with preprocessor statements

Another way (considered bad practice) is to selectively enable disable sections of code:

#if(0)   // Change this to 1 to uncomments.
void EventLoop();
#endif

this is considered a bad practice because the code often becomes illegible when several #if's are mixed, if you use them don't forget to add a comment at the #endif saying what #if it correspond

#if (FEATURE_1 == 1)
do_something;
#endif //FEATURE_1 == 1

you can prevent illegibility by using inline functions (often considered better than macros for legibility with no performance cost) containing only 2 sections in #if #else #endif

inline do_test()
  {
    #if (Feature_1 == 1)
      do_something
    #endif  //FEATURE_1 == 1
  }

and call

do_test();

in the program

Note:
The use of one-line C-style comments should be avoided as they are considered outdated. Mixing C and C++ style single-line comments is considered poor practice. One exception, that is commonly used, is to disable a specific part of code in the middle of a single line statement for test/debug purposes, in release code any need for such action should be removed.

Naming identifiers

edit

C++'s restriction about the names of identifiers and its keywords have already been covered, on the Code Section. They leave a lot of freedom in naming, one could use specific prefixes or suffixes, start names with an initial upper or lower case letter, keep all the letters in a single case or, with compound words, use a word separator character like "_" or flip the case of the first letter of each component word.

Note:
It is also important to remember to avoid collisions with the OS's APIs (depending on the portability requirements) or other standards. For instance POSIX's keywords terminate in "_t".

Hungarian notation
edit

Hungarian notation, now also referred to as Apps Hungarian, was invented by Charles Simonyi (a programmer who worked at Xerox PARC circa 1972-1981, and who later became Chief Architect at Microsoft); and has been until recently the preeminent naming convention used in most Microsoft code. It uses prefixes (like "m_" to indicate member variables and "p" to indicate pointers), while the rest of the identifier is normally written out using some form of mixed capitals. We mention this convention because you will very probably find it in use, even more probable if you do any programming in Windows, if you are interested on learning more you can check Wikipedia's entry on this notation.

This notation is considered outdated, since it is highly prone to errors and requires some effort to maintain without any real benefit in today's IDEs. Today refactoring is an everyday task, the IDEs have evolved to provide help with identifier pop-ups and the use of color schemes. All these informational aids reduce the need for this notation.

Leading underscores
edit

In most contexts, leading underscores are better avoided. They are reserved for the compiler or internal variables of a library, and can make your code less portable and more difficult to maintain. Those variables can also be stripped from a library (i.e. the variable is not accessible anymore, it is hidden from external world) so unless you want to override an internal variable of a library, do not do it.

Reusing existing names
edit

Do not use the names of standard library functions and objects for your identifiers as these names are considered reserved words and programs may become difficult to understand when used in unexpected ways.

Sensible names
edit

Always use good, unabbreviated, correctly-spelled meaningful names.

Prefer the English language (since C++ and most libraries already use English) and avoid short cryptic names. This will make it easier to read and to type a name without having to look it up.

Note:
It is acceptable to ignore this rule for loop variables and variables used within a small scope (~20 lines), they may be given short names to save space if the purpose of that variable is obvious enough. Historically the most commonly used variable name in this cases is "i".

The "i" may derive from the word "increment" or "index". The "i" is very commonly found in for loops that does fit nicely the specification for the use of such variable names.

In early Fortran compilers (upto F77), variables beginning with the letters i through n implicitly represented integers - and by convention the first few (i, j, k) were often used as loop counters.

Names indicate purpose
edit

An identifier should indicate the function of the variable/function/etc. that it represents, e.g. foobar is probably not a good name for a variable storing the age of a person.

Identifier names should also be descriptive. n might not be a good name for a global variable representing the number of employees. However, a good medium between long names and lots of typing has to be found. Therefore, this rule can be relaxed for variables that are used in a small scope or context. Many programmers prefer short variables (such as i) as loop iterators.

Capitalization
edit

Conventionally, variable names start with a lower case character. In identifiers which contain more than one natural language words, either underscores or capitalization is used to delimit the words, e.g. num_chars (K&R style) or numChars (Java style). It is recommended that you pick one notation and do not mix them within one project.

Constants
edit

When naming #defines, constant variables, enum constants. and macros put in all uppercase using '_' separators; this makes it very clear that the value is not alterable and in the case of macros, makes it clear that you are using a construct that requires care.

Note:
There is a large school of thought that names LIKE_THIS should be used only for macros, so that the name space used for macros (which do not respect C++ scopes) does not overlap with the name space used for other identifiers. As is usual in C++ naming conventions, there is not a single universally agreed standard. The most important thing is usually to be consistent.

Functions and member functions
edit

The name given to functions and member functions should be descriptive and make it clear what it does. Since usually functions and member functions perform actions, the best name choices typically contain a mix of verbs and nouns in them such as CheckForErrors() instead of ErrorCheck() and dump_data_to_file() instead of data_file(). Clear and descriptive names for functions and member functions can sometimes make guessing correctly what functions and member functions do easier, aiding in making code more self documenting. By following this and other naming conventions programs can be read more naturally.

People seem to have very different intuitions when using names containing abbreviations. It is best to settle on one strategy so the names are absolutely predictable. Take for example NetworkABCKey. Notice how the C from ABC and K from key are confused. Some people do not mind this and others just hate it so you'll find different policies in different code so you never know what to call something.

Prefixes and suffixes are sometimes useful:

  • Min - to mean the minimum value something can have.
  • Max - to mean the maximum value something can have.
  • Cnt - the current count of something.
  • Count - the current count of something.
  • Num - the current number of something.
  • Key - key value.
  • Hash - hash value.
  • Size - the current size of something.
  • Len - the current length of something.
  • Pos - the current position of something.
  • Limit - the current limit of something.
  • Is - asking if something is true.
  • Not - asking if something is not true.
  • Has - asking if something has a specific value, attribute or property.
  • Can - asking if something can be done.
  • Get - get a value.
  • Set - set a value.
Examples
edit

In most contexts, leading underscores are also better avoided. For example, these are valid identifiers:

  • i loop value
  • numberOfCharacters number of characters
  • number_of_chars number of characters
  • num_chars number of characters
  • get_number_of_characters() get the number of characters
  • get_number_of_chars() get the number of characters
  • is_character_limit() is this the character limit?
  • is_char_limit() is this the character limit?
  • character_max() maximum number of a character
  • charMax() maximum number of a character
  • CharMin() minimum number of a character

These are also valid identifiers but can you tell what they mean?:

  • num1
  • do_this()
  • g()
  • hxq

The following are valid identifiers but better avoided:

  • _num as it could be used by the compiler/system headers
  • num__chars as it could be used by the compiler/system headers
  • main as there is potential for confusion
  • cout as there is potential for confusion

The following are not valid identifiers:

  • if as it is a keyword
  • 4nums as it starts with a digit
  • number of characters as spaces are not allowed within an identifier


Explicitness or implicitness

edit

This can be defended both ways. If defaulting to implicitness, this means less typing but also may create wrong assumptions on the human reader and for the compiler (depending on the situation) to do extra work, on the other hand if you write more keywords and are explicit on your intentions the resulting code will be clearer and reduces errors (enabling hidden errors to be found), or more defined (self documented) but this may also lead to added limitations to the code's evolution (like we will see with the use of const). This is a thin line where an equilibrium must be reached in accord to the projects nature, and the available capabilities of the editor, code completion, syntax coloring and hovering tooltips reduces much of the work. The important fact is to be consistent as with any other rule.

The choice of using of inline even if the member function is implicitly inlined.

const

edit

Unless you plan on modifying it, you're arguably better off using const data types. The compiler can easily optimize more with this restriction, and you're unlikely to accidentally corrupt the data. Ensure that your methods take const data types unless you absolutely have to modify the parameters. Similarly, when implementing accessors for private member data, you should in most cases return a const. This will ensure that if the object that you're operating on is passed as const, methods that do not affect the data stored in the object still work as they should and can be called. For example, for an object containing a person, a getName() should return a const data type where as walk() might be non-const as it might change some internal data in the Person such as tiredness.

It is common practice to avoid using the typedef keyword since it can obfuscate code if not properly used or it can cause programmers to accidentally misuse large structures thinking them to be simple types. If used, define a set of rules for the types you rename and be sure to document them.

volatile

edit

This keyword informs the compiler that the variable it is qualifying as volatile (can change at anytime) is excluded from any optimization techniques. Usage of this variable should be reserved for variables that are known to be modified due to an external influence of a program (whether it's hardware update, third party application, or another thread in the application).

Since the volatile keyword impacts performance, you should consider a different design that avoids this situation: most platforms where this keyword is necessary provide an alternative that helps maintain scalable performance.

Note that using volatile was not intended to be used as a threading or synchronization primitive, nor are operations on a volatile variable guaranteed to be atomic.

Pointer declaration

edit

Due to historical reasons some programmers refer to a specific use as:

// C code style
int *z;

// C++ code style
int* z;

The second variation is by far the preferred by C++ programmers and will help identify a C programmer or legacy code.

One argument against the C++ code style version is when chaining declarations of more than one item, like:

// C code style
int *ptrA, *ptrB;

// C++ code style
int* ptrC, ptrD;

As you can see, in this case, the C code style makes it more obvious that ptrA and ptrB are pointers to ints, and the C++ code style makes it less obvious that ptrD is an int, not a pointer to an int.

It is rare to use chains of multiple objects in C++ code with the exception of the basic types and even so it is not often used and it is extremely rare to see it used in pointers or other complex types, since it will make it harder to for a human to visually parse the code.

// C++ code style
int* ptrC;
int D;

References

edit