C++ Programming
Type
So far we have explained that internally data is stored in a form the hardware can read: zeros and ones, called bits. That data is conceptually divided and labeled according to the number of bits in each set. The same bits can, however, be interpreted in a variety of ways according to established formats in order to represent meaningful information. The programmer must therefore be able to tell the compiler which interpretation is needed; this is done by using the different types.
A variable can refer to simple values like integers called a primitive type or to a set of values called a composite type that are made up of primitive types and other composite types. Types consist of a set of valid values and a set of valid operations which can be performed on these values. A variable must declare what type it is before it can be used in order to enforce value and operation safety and to know how much space is needed to store a value.
Major functions that type systems provide are:
- Safety - types make it impossible to code some operations which cannot be valid in a certain context. This mechanism effectively catches the majority of common mistakes made by programmers. For example, an expression "Hello, Wikipedia"/1 is invalid because a string literal cannot be divided by an integer in the usual sense. As discussed below, strong typing offers more safety, but it does not necessarily guarantee complete safety (see type-safety for more information).
- Optimization - static type checking might provide useful information to a compiler. For example, if a type says a value is aligned at a multiple of 4, the memory access can be optimized.
- Documentation - using types in languages also improves documentation of code. For example, the declaration of a variable as being of a specific type documents how the variable is used. In fact, many languages allow programmers to define semantic types derived from primitive types; either composed of elements of one or more primitive types, or simply as aliases for names of primitive types.
- Abstraction - types allow programmers to think about programs at a higher level, without bothering with low-level implementation. For example, programmers can think of strings as values instead of a mere array of bytes.
- Modularity - types allow programmers to express the interface between two subsystems. This localizes the definitions required for interoperability of the subsystems and prevents inconsistencies when those subsystems communicate.
Data types
Type | Size in Bits | Comments | Alternate Names |
---|---|---|---|
Primitive Types | | | |
char | ≥ 8 | | — |
signed char | same as char | | — |
unsigned char | same as char | | — |
short | ≥ 16, ≥ size of char | | short int, signed short, signed short int |
unsigned short | same as short | | unsigned short int |
int | ≥ 16, ≥ size of short | | signed, signed int |
unsigned int | same as int | | unsigned |
long | ≥ 32, ≥ size of int | | long int, signed long, signed long int |
unsigned long | same as long | | unsigned long int |
bool | ≥ size of char, ≤ size of long | | — |
wchar_t | ≥ size of char, ≤ size of long | | — |
float | ≥ size of char | | — |
double | ≥ size of float | | — |
long double | ≥ size of double | | — |
User Defined Types | | | |
struct or class | ≥ sum of size of each member | | — |
union | ≥ size of the largest member | | — |
enum | ≥ size of char | | — |
typedef | same as the type being given a name | | — |
template | ≥ size of char | — | — |
Derived Types[4] | | | |
type& (reference) | ≥ size of char | | — |
type* (pointer) | ≥ size of char | | — |
type [integer] (array) | ≥ integer × size of type | | — |
type (comma-delimited list of types/declarations) (function) | — | | — |
type aggregate_type::* (member pointer) | ≥ size of char | | — |
[1] -128 can be stored in two's-complement machines (i.e. almost all machines in existence). In other memory models (e.g. 1's complement) a smaller range is possible, e.g. -127 ←→ +127.
[2] -32768 can be stored in two's-complement machines (i.e. most machines in existence).
[3] -2147483648 can be stored in two's-complement machines (i.e. most machines in existence).
[4] The precedences in a declaration are: [], () (left associative), highest; &, *, ::* (right associative), lowest.
Standard types
There are five basic primitive types called standard types, specified by particular keywords, that each store a single value. These types stand apart from the complexities of class type variables: even if the syntax for using them is at times the same, standard types do not share class properties (for example, they don't have constructors).
The type of a variable determines what kind of values it can store:
- bool - a boolean value: true; false
- int - Integer: -5; 10; 100
- char - a character in some encoding, often something like ASCII, ISO-8859-1 ("Latin 1") or ISO-8859-15: 'a', '=', 'G', '2'.
- float - floating-point number: 1.25; -2.35*10^23
- double - double-precision floating-point number: like float but more decimals
The float and double primitive data types are called 'floating point' types and are used to represent real numbers (numbers with decimal places, like 1.435324 and 853.562). Floating point numbers and floating point arithmetic can be very tricky, due to the nature of how a computer calculates floating point numbers.
Definition vs. declaration
There is an important distinction between the declaration of a variable and its definition, two separate steps involved in the use of variables. The declaration announces the properties (the type, size, etc.), whereas the definition causes storage to be allocated in accordance with the declaration.
Variables, like functions, classes and other constructs that require declarations, may be declared many times, but each may be defined only once.
This concept will be explained further, with some particulars noted (such as inline), as we introduce other components. Here are some examples; some include concepts not yet introduced, but they will give you a broader view:
int an_integer; // defines an_integer
extern const int a = 1; // defines a
int function( int b ) { return b+an_integer; } // defines function and defines b
struct a_struct { int a; int b; }; // defines a_struct, a_struct::a, and a_struct::b
struct another_struct { // defines another_struct
int a; // defines nonstatic data member a
static int b; // declares static data member b
another_struct(): a(0) { } }; // defines a constructor of another_struct
int another_struct::b = 1; // defines another_struct::b
enum { right, left }; // defines right and left
namespace FirstNamespace { int a; } // defines FirstNamespace and FirstNamespace::a
namespace NextNamespace = FirstNamespace ; // defines NextNamespace
another_struct MyStruct; // defines MyStruct
extern int b; // declares b
extern const int c; // declares c
int another_function( int ); // declares another_function
struct aStruct; // declares aStruct
typedef int MyInt; // declares MyInt
extern another_struct yet_another_struct; // declares yet_another_struct
using NextNamespace::a; // declares NextNamespace::a
Declaration
C++ is a statically typed language. Hence, no variable can be used without specifying its type. This is why the type figures in the declaration. This way the compiler can protect you from trying to store a value of an incompatible type into a variable, e.g. storing a string in an integer variable. Declaring variables before use also allows spelling errors to be easily detected. Consider a variable used in many statements, but misspelled in one of them. Without declarations, the compiler would silently assume that the misspelled variable actually refers to some other variable. With declarations, an "Undeclared Variable" error would be flagged. Another reason for specifying the type of the variable is so the compiler knows how much space in memory must be allocated for this variable.
The simplest variable declarations look like this (the parts in []s are optional):
[specifier(s)] type variable_name [ = initial_value];
To create an integer variable for example, the syntax is
int sum;
where sum is the name you made up for the variable. This kind of statement is called a declaration. It declares sum as a variable of type int, so that sum can store an integer value. Every variable has to be declared before use, and it is common practice to declare variables as close as possible to the point where they are needed. This is unlike older versions of C (before C99), where all declarations in a block must precede all other statements and expressions.
In general, you will want to make up variable names that indicate what you plan to do with the variable. For example, if you saw these variable declarations:
char firstLetter;
char lastLetter;
int hour, minute;
you could probably make a good guess at what values would be stored in them. This example also demonstrates the syntax for declaring multiple variables with the same type in the same statement: hour and minute are both integers (int type). Notice how a comma separates the variable names.
int a = 123;
int b (456);
Those lines also declare variables, but this time the variables are initialized to some value. What this means is that not only is space allocated for the variables but the space is also filled with the given value. The two lines illustrate two different but equivalent ways to initialize a variable. The '=' in a declaration has a subtle distinction from the assignment operator in that it gives the variable an initial value instead of assigning a new value. The distinction becomes important especially when the values we are dealing with are not of simple types like integers but more complex objects like the input and output streams provided by the iostream library.
The expression used to initialize a variable need not be constant. So the lines:
int sum;
sum = a + b;
can be combined as:
int sum = a + b;
or:
int sum (a + b);
Declare a floating point variable 'f' with an initial value of 1.5:
float f = 1.5 ;
Floating point constants should always have a '.' (decimal point) somewhere in them. Any number that does not have a decimal point is interpreted as an integer, which then must be converted to a floating point value before it is used.
For example:
double a = 5 / 2;
will not set a to 2.5 because 5 and 2 are integers and integer arithmetic will apply for the division, cutting off the fractional part. A correct way to do this would be:
double a = 5.0 / 2.0;
You can also declare floating point values using scientific notation. The constant .05 in scientific notation would be 5×10^-2. The syntax for this is the significand (the base number), followed by an e, followed by the exponent. For example, to use .05 as a scientific notation constant:
double a = 5e-2;
Below is a program storing two values in integer variables, adding them and displaying the result:
// This program adds two numbers and prints their sum.
#include <iostream>
int main()
{
int a = 123; // initialized; reading an uninitialized variable is an error
int b = 456;
int sum;
sum = a + b;
std::cout << "The sum of " << a << " and " << b << " is " << sum << "\n";
return 0;
}
or, if you like to save some space, the same program can be written as:
// This program adds two numbers and prints their sum, variation 1
#include <iostream>
#include <ostream>
using namespace std;
int main()
{
int a = 123, b (456), sum = a + b;
cout << "The sum of " << a << " and " << b << " is " << sum << endl;
return 0;
}
The register keyword is a request to the compiler that the specified variable be stored in a register of the processor instead of memory as a way to gain speed, mostly because it will be heavily used. The compiler may ignore the request.
The keyword fell out of common use when compilers became better at most code optimizations than humans. Any valid program that uses the keyword will be semantically identical to one without it, unless it appears in a stringized macro (or similar context), where it can be useful to ensure that improper usage of the macro will cause a compile-time error. The keyword was deprecated in C++11 and its use as a storage class specifier was removed in C++17. This keyword relates closely to auto.
register int x=99;
Modifiers
There are several modifiers that can be applied to data types to change how they can be used or the range of numbers they can represent.
const
A variable declared with this specifier cannot be changed (it is read only). Both local and class-level variables may be declared const, indicating that you don't intend to change their value after they're initialized. You declare a variable as being constant using the const keyword. Global const variables have internal linkage by default. If you need to use a global constant across multiple files, the best option is to use a special header file that can be included across the project.
const unsigned int DAYS_IN_WEEK = 7 ;
declares an unsigned integer constant, called DAYS_IN_WEEK, with the value 7. Because this value cannot be changed, you must give it a value when you declare it. If you later try to assign another value to a constant variable, the compiler will print an error.
int main(){
const int i = 10;
i = 3; // ERROR - we can't change "i"
int &j = i; // ERROR - we promised not to
// change "i" so we can't
// create a non-const reference
// to it
const int &x = i; // fine - "x" is a const
// reference to "i"
return 0;
}
The full meaning of const is more complicated than this; when working through pointers or references, const can be applied to mean that the object pointed (or referred) to will not be changed via that pointer or reference. There may be other names for the object, and it may still be changed using one of those names so long as it was not originally defined as being truly const.
It has an advantage for programmers over the #define directive because it is understood by the compiler, not just substituted into the program text by the preprocessor, so any error messages can be much more helpful.
With pointers it can get messy...
T const *p; // p is a pointer to a const T
T *const p; // p is a const pointer to T
T const *const p; // p is a const pointer to a const T
If the pointer is a local, having a const pointer is useless. The order of T and const can be reversed:
const T *p;
is the same as
T const *p;
volatile
A hint to the compiler that a variable's value can be changed externally; therefore the compiler must avoid aggressive optimization on any code that uses the variable.
Unlike in Java, C++'s volatile specifier does not have any meaning in relation to multi-threading. Before C++11, standard C++ did not include support for multi-threading (though it was a common extension); in either case, variables that must be synchronized between threads need a synchronization mechanism such as a mutex. Keep in mind that volatile implies only safety in the presence of implicit or unpredictable actions by the same thread (or by a signal handler in the case of a volatile sig_atomic_t object). Some compilers treat accesses to non-const volatile variables and fields as synchronization operations that cannot be reordered with respect to ordinary memory operations, which would make those accesses sequentially consistent. This behavior is not part of the standard and should not be relied upon.
mutable
This specifier may only be applied to non-static, non-const member variables. It allows the variable to be modified within const member functions.
mutable is usually used when an object might be logically constant, i.e., no outside observable behavior changes, but not bitwise const, i.e. some internal member might change state.
The canonical example is the proxy pattern. Suppose you have created an image catalog application that shows all images in a long, scrolling list. This list could be modeled as:
class image {
public:
// construct an image by loading from disk
image(const char* const filename);
// get the image data
char const * data() const;
private:
// The image data
char* m_data;
};
class scrolling_images {
image const* images[1000];
};
Note that for the image class, bitwise const and logically const are the same: if m_data changes, the public function data() returns different output.
At a given time, most of those images will not be shown, and might never be needed. To avoid having the user wait for a lot of data being loaded which might never be needed, the proxy pattern might be invoked:
class image_proxy {
public:
image_proxy( char const * const filename )
: m_filename( filename ),
m_image( 0 )
{}
~image_proxy() { delete m_image; }
char const * data() const {
if ( !m_image ) {
m_image = new image( m_filename );
}
return m_image->data();
}
private:
char const* m_filename;
mutable image* m_image;
};
class scrolling_images {
image_proxy const* images[1000];
};
Note that the image_proxy does not change observable state when data() is invoked: it is logically constant. However, it is not bitwise constant since m_image changes the first time data() is invoked. This is made possible by declaring m_image mutable. If it had not been declared mutable, the image_proxy::data() would not compile, since m_image is assigned to within a constant function.
short
The short specifier can be applied to the int data type. It can decrease the number of bytes used by the variable, which decreases the range of numbers that the variable can represent. Typically, a short int is half the size of a regular int, but this will be different depending on the compiler and the system that you use. When you use the short specifier, the int type is implicit. For example:
short a;
is equivalent to:
short int a;
long
The long specifier can be applied to the int and double data types. It can increase the number of bytes used by the variable, which increases the range of numbers that the variable can represent. A long int is at least as large as an int (on some platforms it is twice the size), and a long double can, where it is larger than a double, represent larger numbers more precisely. When you use long by itself, the int type is implied. For example:
long a;
is equivalent to:
long int a;
The shorter form, with the int implied rather than stated, is more idiomatic (i.e., seems more natural to experienced C++ programmers).
Use the long specifier when you need to store larger numbers in your variables. Be aware, however, that on some compilers and systems the long specifier may not increase the size of a variable. Indeed, most common 32-bit platforms (and one 64-bit platform) use 32 bits for int and also 32 bits for long int.
The unsigned keyword is a data type specifier that makes a variable represent only non-negative integer numbers (positive numbers and zero). It can be applied only to the char, short, int and long types. For example, if an int holds values from -32768 to 32767 (as on a platform with a 16-bit int), an unsigned int will hold values from 0 to 65535. You can use this specifier when you know that your variable will never need to be negative. For example, if you declared a variable 'myHeight' to hold your height, you could make it unsigned because you know that you would never be negative inches tall.
signed
The signed specifier makes a variable represent both positive and negative numbers. It can be applied only to the char, short, int and long data types. The signed specifier is applied by default for short, int and long, so you will typically never use it in your code.
The static keyword can be used in four different ways:
- to create permanent storage for local variables in a function.
- to specify internal linkage.
- to declare member functions that act like non-member functions.
- to create a single copy of a data member.
Permanent storage
Using the static modifier gives a variable static lifetime and, on global variables, gives them internal linkage (the variables will not be accessible from code of the same project that resides in other files).
- static lifetime
- Means that a static variable is initialized only once and then exists, keeping any changes made to it, until the program ends. The relative order in which static variables defined in different source files are initialized (and therefore destroyed) is not specified.
All calls to a function share the same instance of its static variables, at the same memory location. This means that they keep their value between function calls. For example, in the following code, a static variable inside a function is used to keep track of how many times that function has been called:
#include <iostream>
void foo() {
    static int counter = 0; // initialized once, keeps its value between calls
    std::cout << "foo has been called " << ++counter << " times\n";
}
int main() {
    for( int i = 0; i < 10; ++i ) foo();
}
Enumerated data type
In programming it is often necessary to deal with data types that describe a fixed set of alternatives. For example, when designing a program to play a card game it is necessary to keep track of the suit of an individual card.
One method for doing this may be to create unique constants to keep track of the suit. For example one could define
const int Clubs=0;
const int Diamonds=1;
const int Hearts=2;
const int Spades=3;
int current_card_suit=Diamonds;
Unfortunately there are several problems with this method. The most minor problem is that this can be a bit cumbersome to write. A more serious problem is that this data is indistinguishable from integers. It becomes very easy to start using the associated numbers instead of the suits themselves. Such as:
int current_card_suit=1;
...and worse to make mistakes that may be very difficult to catch such as a typo...
current_card_suit=11;
...which produces a valid expression in C++, but would be meaningless in representing the card's suit.
One way around these difficulties is to create a new data type specifically designed to keep track of the suit of the card, one that restricts you to using only valid possibilities. We can accomplish this with an enumerated data type, using the C++ enum keyword.
The enum keyword is used to create an enumerated type named name that consists of the elements in name-list. The var-list argument is optional, and can be used to create instances of the type along with the declaration.
- Syntax
enum name {name-list} var-list;
For example, the following code creates the desired data type:
enum card_suit {Clubs,Diamonds,Hearts,Spades};
card_suit first_cards_suit=Diamonds;
card_suit second_cards_suit=Hearts;
card_suit third_cards_suit=0; //Would cause an error, 0 is an "integer" not a "card_suit"
card_suit forth_cards_suit=first_cards_suit; //OK, they both have the same type.
The first line of code creates a new data type "card_suit" that may take on only one of four possible values: "Clubs", "Diamonds", "Hearts", and "Spades". In general the enum command takes the form:
enum new_type_name { possible_value_1,
possible_value_2,
/* ..., */
possible_value_n
} Optional_Variable_With_This_Type;
The second line of code creates a new variable with this data type and initializes it to the value "Diamonds". The other lines create new variables of this new type and show some initializations that are (and are not) possible.
Internally enumerated types are stored as integers, that begin with 0 and increment by 1 for each new possible value for the data type.
enum apples { Fuji, Macintosh, GrannySmith };
enum oranges { Blood, Navel, Persian };
apples pie_filling = Navel; //error can't make an apple pie with oranges.
apples my_fav_apple = Macintosh;
oranges my_fav_orange = Navel; //This has the same internal integer value as my_fav_apple
//Many compilers will produce an error or warning letting you know you're comparing two different quantities.
if(my_fav_apple == my_fav_orange)
std::cout << "You shouldn't compare apples and oranges" << std::endl;
While enumerated types are not integers, they are in some cases converted into integers, for example when we try to send an enumerated value to standard output.
For example:
enum color {Red, Green, Blue};
color hair=Red;
color eyes=Blue;
color skin=Green;
std::cout << "My hair color is " << hair << std::endl;
std::cout << "My eye color is " << eyes << std::endl;
std::cout << "My skin color is " << skin << std::endl;
if (skin==Green)
std::cout << "I am seasick!" << std::endl;
Will produce the output:
My hair color is 0
My eye color is 2
My skin color is 1
I am seasick!
We could improve this example by introducing an array that holds the names of our enumerated type such as:
std::string color_names[3]={"Red", "Green", "Blue"};
enum color {Red, Green, Blue};
color hair=Red;
color eyes=Blue;
color skin=Green;
std::cout << "My hair color is " << color_names[hair] << std::endl;
std::cout << "My eye color is " << color_names[eyes] << std::endl;
std::cout << "My skin color is " << color_names[skin] << std::endl;
In this case hair is automatically converted to an integer when it is used to index the array. This technique is intimately tied to the fact that the color Red is internally stored as "0", Green is internally stored as "1", and Blue is internally stored as "2". Be careful! One may override these default choices for the internal values of the enumerated types.
This is done by simply setting the value in the enum, such as:
enum color {Red=2, Green=4, Blue=6};
In fact it is not necessary to specify an integer for every value of an enumerated type. Where the value is omitted, the compiler will simply use the value of the previous possible value increased by one.
Consider the following example:
enum colour {Red=2, Green, Blue=6, Orange};
Here the internal value of "Red" is 2, "Green" is 3, "Blue" is 6 and "Orange" is 7.
Be careful to keep in mind when using this that the internal values do not need to be unique.
Enumerated types are also automatically converted into integers in arithmetic expressions, which makes it useful to be able to choose particular integers for the internal representations of an enumerated type.
One may have enumerated types for the width and height of a standard computer screen. This may allow a program to do meaningful calculations, while still maintaining the benefits of an enumerated type.
enum screen_width {SMALL_WIDTH=800, MEDIUM_WIDTH=1280}; // enumerator names must be unique within a scope
enum screen_height {SMALL_HEIGHT=600, MEDIUM_HEIGHT=768};
screen_width MyScreenW=SMALL_WIDTH;
screen_height MyScreenH=SMALL_HEIGHT;
std::cout << "The number of pixels on my screen is " << MyScreenW*MyScreenH << std::endl;
It should be noted that the internal values used in an enumerated type are constant, and cannot be changed during the execution of the program.
It is perhaps useful to notice that while the enumerated types can be converted to integers for the purpose of arithmetic, they cannot be iterated through directly.
For example:
enum month { JANUARY=1, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER};
for( month cur_month = JANUARY; cur_month <= DECEMBER; cur_month=cur_month+1)
{
std::cout << cur_month << std::endl;
}
This will fail to compile. The problem is with the for loop. The first two statements in the loop are fine. We may certainly create a new month variable and initialize it. We may also compare two months, where they will be compared as integers. We may not increment the cur_month variable: "cur_month+1" evaluates to an integer, which may not be stored into a "month" data type.
In the code above we might try to fix this by replacing the for loop with:
for( int monthcount = JANUARY; monthcount <= DECEMBER; monthcount++)
{
std::cout << monthcount << std::endl;
}
This will work because we can increment the integer "monthcount".
The typedef keyword is used to give a data type a new alias.
typedef existing-type new-alias;
The intent is to make an awkwardly labeled data type easier to use, to make external code conform to the coding style, or to improve the comprehension of source code, as you can use typedef to create a shorter, easier-to-use name for that data type. For example:
typedef int Apples;
typedef int Oranges;
Apples coxes;
Oranges jaffa;
The syntax above is a simplification. More generally, after the word "typedef", the syntax looks exactly like what you would do to declare a variable of the existing type with the variable name of the new type name. Therefore, for more complicated types, the new type name might be in the middle of the syntax for the existing type. For example:
typedef char (*pa)[3]; // "pa" is now a type for a pointer to an array of 3 chars
typedef int (*pf)(float); // "pf" is now a type for a pointer to a function which
// takes 1 float argument and returns an int
This keyword is also covered in the Coding style conventions section.
Derived types
Type conversion
Type conversion or typecasting refers to changing an entity of one data type into another.
Implicit type conversion
Implicit type conversion, also known as coercion, is an automatic type conversion by the compiler. In a mixed-type expression, values of one or more types can be converted to a common type as needed so that the expression can be evaluated; the converted value is temporary and the original variable keeps its type.
For example:
double d = 2.5;
long l = 10;
int i = 3;
if (d > i) d = i;
if (i > l) l = i;
if (d == l) d *= 2;
As you can see, d, l and i belong to different data types. The compiler will automatically and temporarily convert the operands to a common data type each time a comparison or assignment is executed.
Explicit type conversion
Explicit type conversion manually converts one type into another, and is used in cases where automatic type conversion doesn't occur.
double d = 1.0;
printf ("%d\n", (int)d);
In this example, d would normally be a double and would be passed to the printf function as such. This would result in unexpected behavior, since printf would try to look for an int. The typecast in the example corrects this, and passes the integer to printf as expected.