Programming Fundamentals/String Data Type

Overview

edit

A string data type is commonly a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding.[1]

Discussion

edit

Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements. When a string appears literally in source code, it is known as a string literal or an anonymous string.[2]

The character data type represents individual or single characters. Characters comprise a variety of symbols such as the alphabet (both upper and lower case) the numeral digits (0 to 9), punctuation, etc. All computers store character data in a one-byte field as an integer value. Because a byte consists of 8 bits, this one-byte field has 28 or 256 possibilities using the positive values of 0 to 255.

C++, C#, and Java differentiate between single characters and strings using single quotes and double quotes, respectively. JavaScript, Python, and Swift do not differentiate between characters and strings and use either single quotes or double quotes to define string literals.

Language Reserved Word Example
C++ char 'A'
C++ string "Hello world!"
C# char 'A'
C# String "Hello world!"
Java char 'A'
Java String "Hello world!"
JavaScript String 'Hello world!', "Hello world!"
Python str() 'Hello world!', "Hello world!"
Swift Character "A"
Swift String "Hello world!"

Most computing devices use the ASCII (stands for American Standard Code for Information Interchange and is pronounced “ask-key”) Character Set which has established values for 0 to 127. For the values of 128 to 255 they usually use the Extended ASCII Character Set. When we hit the capital A on the keyboard, the keyboard sends a byte with the bit pattern equal to an integer 65. When the byte is sent from the memory to the monitor, the monitor converts the integer value of 65 to into the symbol of the capital A to display on the monitor.

For now, we will address only the use of strings and characters as constants. Most modern compilers that are part of an Integrated Development Environment (IDE) will color the source code to help the programmer see different features more readily. Beginning programmers will use string constants to send messages to standard output.

Key Terms

edit
ASCII
American Standard Code for Information Interchange
Character
A data type representing single text characters like the alphabet, numeral digits, punctuation, etc.
Double quote marks
Used to create string type data within most programming languages.
Single quote marks
Used to create character type data within languages that differentiate between string and character data types.
String
A series or array of characters as a single piece of data.

References

edit