Pascal Programming/Strings
The data type string(…) is used to store a finite sequence of char values. It is a special case of an array, but unlike an array[…] of char the data type string(…) has some advantages facilitating its effective usage.
The data type string(…) as presented here is an Extended Pascal extension, as defined in the ISO standard 10206.
Due to its high relevance in practice, this topic has been put into the Standard Pascal part of this Wikibook, right after the chapter on arrays.
Many compilers have a different conception of what constitutes a string. Consult their manuals for their idiosyncratic differences. Rest assured, the GPC supports string(…) as explained here.
Properties
Capacity
Definition
The declaration of a string data type always entails a maximum capacity:
program stringDemo(output);
type
address = string(60);
var
houseAndStreet: address;
begin
houseAndStreet := '742 Evergreen Trc.';
writeLn('Send complaints to:');
writeLn(houseAndStreet);
end.
After the word string follows a positive integer number surrounded by parentheses. This is not a function call.[fn 1]
Implications
Variables of the data type address as defined above will only be able to store up to 60 independent char values. Of course it is possible to store fewer, or even 0, but once this limit is set it cannot be expanded.
Inquiry
String variables “know” about their own maximum capacity: If you use writeLn(houseAndStreet.capacity), this will print 60. Every string variable automatically has a “field” called capacity. This field is accessed by writing the respective string variable’s name and the word capacity joined by a dot (.). This field is read-only: You cannot assign values to it. It can only appear in expressions.
Length
All string variables have a current length. This is the total number of valid char values a string variable currently contains. To query this number, the EP standard defines a new function called length:
program lengthDemo(output);
type
domain = string(42);
var
alphabet: domain;
begin
alphabet := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
writeLn(length(alphabet));
end.
The length function returns a non-negative integer value denoting the supplied string’s length. It also accepts char values.[fn 2] A char value has by definition a length of 1. It is guaranteed that the length of a string variable will always be less than or equal to its corresponding capacity.
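The relation between capacity and length can be observed in a small experiment. The following is a minimal sketch, assuming an EP-compliant compiler such as the GPC; the program and variable names are made up for illustration.

```pascal
program capacityVersusLength(output);
var
    name: string(20);
begin
    name := 'Ada';
    { the capacity is fixed by the declaration: 20 }
    writeLn('capacity: ', name.capacity:1);
    { the length follows the current contents: 3 }
    writeLn('length:   ', length(name):1);
    name := '';
    { an empty string has a length of 0, the capacity is still 20 }
    writeLn('length after clearing: ', length(name):1);
end.
```

The format specifier :1 merely requests a minimum width of one character, so the numbers are printed without leading blanks.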
Compatibility
You can copy entire string values using the := operator provided the variable on the LHS has the same or a greater capacity than the RHS string expression. This is different from a regular array’s behavior, which would require dimensions and size to match exactly.
program stringAssignmentDemo;
type
zipcode = string(5);
stateCode = string(2);
var
zip: zipcode;
state: stateCode;
begin
zip := '12345';
state := 'QQ';
zip := state; // ✔
// zip.capacity > state.capacity
// ↯ state := zip; ✘
end.
As long as no clipping occurs, i. e. the omission of values because of a too short capacity, the assignment is fine.
Index
It is worth noting that otherwise strings are internally regarded as arrays.[fn 3] Like a character array you can access (and alter) every array element independently by specifying a valid index surrounded by brackets. However, there is a big difference with respect to validity of an index. At any time, you are only allowed to specify indices that are within the range 1..length. This range may be empty, specifically if length is currently 0.
It is not possible to change the current length by manipulating individual string components:
program stringAccessDemo;
type
bar = string(8);
var
foo: bar;
begin
foo := 'AA'; { ✔ length ≔ 2 }
foo[2] := 'B'; { ✔ }
foo[3] := 'C'; { ↯: 3 > length }
end.
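Because only the indices 1..length are valid, a loop over all characters of a string is typically bounded by length, not by capacity. A minimal sketch, assuming an EP-compliant compiler; the names phrase and blanks are chosen for illustration only:

```pascal
program characterLoopDemo(output);
var
    phrase: string(32);
    i, blanks: integer;
begin
    phrase := 'so long and thanks';
    blanks := 0;
    { 1..length(phrase) are the only valid indices; }
    { for an empty string the loop body never runs }
    for i := 1 to length(phrase) do
    begin
        if phrase[i] = ' ' then
        begin
            blanks := blanks + 1;
        end;
    end;
    { the phrase above contains 3 blanks }
    writeLn(blanks:1);
end.
```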
Standard routines
In addition to the length function, EP also defines a few other standard functions operating on strings.
Manipulation
The following functions return strings.
Substring
In order to obtain just a part of a string (or char) expression, the function subStr(stringOrCharacter, firstCharacter, count) returns a sub-string of stringOrCharacter having the non-negative length count, starting at the positive index firstCharacter. It is important that firstCharacter + count - 1 is a valid character index in stringOrCharacter, otherwise the function causes an error.[fn 4]
program substringDemo(output);
begin
writeLn(subStr('GCUACGGAGCUUCGGAGUUAG', 7, 3));
{ char index: 1 4 7 … }
end.
GAG
Note the firstCharacter index. Here we wanted to extract the third codon. However, firstCharacter is not simply 2 * 3 but 2 * 3 + 1. Indexing characters in a string variable starts at 1. Note, a sophisticated implementation for encoding codons would not make use of string, but define a custom enumeration data type.
For string variables, the subStr function is the same as specifying myString[firstCharacter..firstCharacter+count-1].[fn 5] Evidently, if the firstCharacter value is some complicated expression, the subStr function should be preferred to prevent any programming mistakes.
This bracket notation can also be the target of an assignment, overwriting part of a string:
program substringOverwriteDemo(output);
var
m: string(35);
begin
m := 'supercalifragilisticexpialidocious ';
m[21..35] := '-yadi-yada-yada';
writeLn(m);
end.
supercalifragilistic-yadi-yada-yada
Furthermore, the third parameter to subStr can be omitted: This will simply return the rest of the given string starting at the position indicated by the second parameter.[fn 6]
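The two-parameter form can be compared with an explicit count. The following sketch reuses the RNA sequence from the earlier example and assumes an EP-compliant compiler; the constant name rna is an illustrative choice.

```pascal
program substringRestDemo(output);
const
    rna = 'GCUACGGAGCUUCGGAGUUAG';
begin
    { two-parameter form: everything from index 19 up to the end }
    writeLn(subStr(rna, 19));
    { the same request spelled out with an explicit count }
    writeLn(subStr(rna, 19, length(rna) - 19 + 1));
end.
```

Both statements print UAG, the last three characters of the 21-character sequence.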
Remove trailing spaces
The trim(source) function returns a copy of source without any trailing space characters, i. e. ' '.
In LTR scripts any blanks to the right are considered insignificant, yet in computing they take up (memory) space.
It is advisable to prune strings before writing them, for example, to a disk or other long-term storage media, or transmission via networks.
Concededly memory requirements were a more relevant issue prior to the 21st century.
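The effect of trim is easiest to see by comparing lengths. A minimal sketch, assuming an EP-compliant compiler; the variable names are made up for illustration:

```pascal
program trimDemo(output);
var
    padded, indented: string(12);
begin
    padded := 'Pascal   ';
    { 9 characters including the three trailing spaces }
    writeLn(length(padded):1);
    { 6 characters once the trailing spaces are removed }
    writeLn(length(trim(padded)):1);
    indented := '  x  ';
    { only trailing spaces are removed, leading ones stay: [  x] }
    writeLn('[', trim(indented), ']');
end.
```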
First occurrence of substring
The function index(source, pattern) finds the first occurrence of pattern in source and returns the starting index. All characters from pattern match the characters in source at the returned offset:
source index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | match |
---|---|---|---|---|---|---|---|---|
pattern at offset 1 | X | Y | X | | | | | ✘ |
pattern at offset 2 | | X | Y | X | | | | ✘ |
pattern at offset 3 | | | X | Y | X | | | ✔ |
source | Z | Y | X | Y | X | Y | X | |
Note, to obtain the second or any subsequent occurrence, you need to use a proper substring of the source
.
Because the “empty string” is, mathematically speaking, present everywhere, index(characterOrString, '')
always returns 1
.
Conversely, because any non-empty string cannot occur in an empty string, index('', nonEmptyStringOrCharacter)
always returns 0
, in the context of strings an otherwise invalid index.
The value zero is returned if pattern
does not occur in source
.
This will always be the case if pattern
is longer than source
.
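Finding a second occurrence by searching a substring, as mentioned above, could look like the following. This is only a sketch under the assumption of an EP-compliant compiler; it reuses the source and pattern values from the table, and the variable names first and second are illustrative.

```pascal
program secondOccurrenceDemo(output);
const
    source = 'ZYXYXYX';
    pattern = 'XYX';
var
    first, second: integer;
begin
    first := index(source, pattern);
    second := 0;
    { search again in the part of source that starts right after first }
    if (first > 0) and (first < length(source)) then
    begin
        second := index(subStr(source, first + 1), pattern);
        { translate the offset within the substring back to source }
        if second > 0 then
        begin
            second := second + first;
        end;
    end;
    { first is 3, second is 5 }
    writeLn(first:1, ' ', second:1);
end.
```

The check first < length(source) ensures that first + 1 is still a valid index before subStr is called.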
Operators
The EP standard introduced an additional operator for strings of any length, including single characters. The + operator concatenates two strings or characters, or any combination thereof. Unlike the arithmetic +, this operator is non-commutative, that means the order of the operands matters.
expression | result |
---|---|
'Foo' + 'bar' | 'Foobar' |
'' + '' | '' |
'9' + chr(ord('0') + 9) + ' Luftballons' | '99 Luftballons' |
Concatenation is useful if you intend to save the data somewhere. Supplying concatenated strings to routines such as write/writeLn, however, may possibly be disadvantageous: The concatenation, especially of long strings, first requires allocating enough memory to accommodate the entire resulting string. Then, all the operands are copied to their respective location. This takes time. Hence, in the case of write/writeLn it is advisable (for very long strings) to use their capability of accepting an arbitrary number of (comma-separated) parameters.
Note, the common LOC
stringVariable := 'xyz' + someStringOrCharacter + …;
is equivalent to
writeStr(stringVariable, 'xyz', someStringOrCharacter, …);
The latter is particularly useful if you also want to pad the result or need some conversion.
Writing foo:20 (minimum width of 20 characters possibly padded with spaces ' ' to the left) is only acceptable using write/writeLn/writeStr. WriteStr is an EP extension.
The GPC, the FPC and Delphi are also shipped with a function concat performing the very same task. Read the respective compiler’s documentation before using it, because there are some differences, or just stick to the standardized + operator.
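To illustrate the conversion and padding capability of writeStr, here is a minimal sketch assuming an EP-compliant compiler. The variable names line and count are illustrative.

```pascal
program writeStrDemo(output);
var
    line: string(40);
    count: integer;
begin
    count := 42;
    { the integer is converted to characters and right-aligned }
    { in a field of at least 5 characters }
    writeStr(line, 'items:', count:5, '!');
    { line now holds 'items:   42!' }
    writeLn(line);
end.
```

A plain + concatenation could not be used here, because count is an integer and not a string or char value.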
Sophisticated comparison
All functions presented in this subsection return a Boolean value.
Order
Since every character in a string has an ordinal value, we can think of a method to sort them. There are two flavors of comparing strings:
- One uses the relational operators already introduced, such as =, > or <=.
- The other one is to use dedicated functions like LT or GT.
The difference lies in their treatment of strings that vary in length. While the former will bring both strings to the same length by padding them with space characters (' '), the latter simply clips them to the shortest length, but taking into account which one was longer (if necessary).
function name | meaning | operator |
---|---|---|
EQ | equal | = |
NE | not equal | <> |
LT | less than | < |
LE | less than or equal to | <= |
GT | greater than | > |
GE | greater than or equal to | >= |
All these functions and operators are binary, that means they expect and accept only exactly two parameters or operands respectively. They can produce different results if supplied with the same input, as you will see in the next two sub-subsections.
Equality
Let’s start with equality.
- Two strings (of any length) are considered equal by the EQ function if both operands are of the same length and the values, i. e. the character sequences that actually make up the strings, are the same.
- An =‑comparison, on the other hand, augments any “missing” characters in the shorter string by using the padding character space (' ').[fn 7]
program equalDemo(output);
const
emptyString = '';
blankString = ' ';
begin
writeLn(emptyString = blankString);
writeLn(EQ(emptyString, blankString));
end.
True
False
emptyString got padded to match the length of blankString, before the actual character-by-character =‑expression took place.
To put this relationship in other words, using Pascal terms you already know:
(foo = bar) = EQ(trim(foo), trim(bar))
The actual implementation is usually different, because trim can be, especially for long strings, quite resource-consuming (time, as well as memory). As a consequence, an =‑comparison is usually used if trailing spaces are insignificant, but are still there for technical reasons (e. g. because you are using an array[1..8] of char). Only EQ ensures both strings are lexicographically the same. Note that the capacity of either string is irrelevant. The function NE, short for not equal, behaves accordingly.
Less than
A string is determined to be “less than” another one by sequentially reading both strings simultaneously from left to right and comparing corresponding characters. If all characters match, the strings are said to be equal to each other. However, if we encounter a differing character pair, processing is aborted and the relation of the current characters determines the overall string’s relation.
first operand | 'A' | 'B' | 'C' | 'D' |
---|---|---|---|---|
second operand | 'A' | 'B' | 'E' | 'A' |
determined relation | = | = | < | ⨯ |
If both strings are of equal length, the LT function and the <‑operator behave the same. LT actually even builds on top of <. Things get interesting if the supplied strings differ in length.
- The LT function first cuts both strings to the same (shorter) length (substring).
- Then a regular comparison is performed as demonstrated above. If the shortened, common-length versions turn out to be equal, the (originally) longer string is said to be greater than the other one.
A <‑comparison, on the other hand, compares all remaining “missing” characters to ' ', the space character. This can lead to differing results:
program lessThanDemo(output);
var
hogwash, malarky: string(8);
begin
{ ensure ' ' is not chr(0) or maxChar }
if not (' ' in [chr(1)..pred(maxChar)]) then
begin
writeLn('Character set presumptions not met.');
halt; { EP procedure immediately terminating the program }
end;
hogwash := '123';
malarky := hogwash + chr(0);
writeLn(hogwash < malarky, LT(hogwash, malarky));
malarky := hogwash + '4';
writeLn(hogwash < malarky, LT(hogwash, malarky));
malarky := hogwash + maxChar;
writeLn(hogwash < malarky, LT(hogwash, malarky));
end.
False True
True True
True True
In the <‑comparison, the “missing” fourth character in hogwash is presumed to be ' '. The fourth character in malarky is compared against ' '.
The situation above has been provoked artificially for demonstration purposes, but this can still become an issue if you are frequently using characters that are “smaller” than the regular space character, for instance if you are programming on a 1980s 8‑bit Atari computer using ATASCII.
The LE, GT, and GE functions act accordingly.
Details on string literals
Inclusion of delimiter
In Pascal string literals start with and are terminated by the same character. Usually this is a straight (typewriter’s) apostrophe ('). Troubles arise if you want to actually include that character in a string literal, because the character you want to include into your string is already understood as the terminating delimiter. Conventionally, two straight typewriter’s apostrophes back-to-back are regarded as an apostrophe image. In the produced computer program, they are replaced by a single apostrophe.
program apostropheDemo(output);
var
c: char;
begin
for c := '0' to '9' do
begin
writeLn('ord(''', c, ''') = ', ord(c));
end;
end.
Each double-apostrophe is replaced by a single apostrophe.
The string still needs delimiting apostrophes, so you might end up with three consecutive apostrophes like in the example above, or even four consecutive apostrophes ('''') if you want a char-value consisting of a single apostrophe.
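The four-apostrophe case and an apostrophe in the middle of a literal can be tried out as follows. This is a small sketch assuming an EP-compliant compiler (length is an EP function); the constant names are made up for illustration.

```pascal
program quoteDemo(output);
const
    { a char value consisting of one single apostrophe }
    apostrophe = '''';
    { a string literal containing one apostrophe: don't }
    contraction = 'don''t';
begin
    { prints the word enclosed in apostrophes: 'don't' }
    writeLn(apostrophe, contraction, apostrophe);
    { the doubled apostrophe counts as one character: 5 }
    writeLn(length(contraction):1);
end.
```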
Non-permissible characters
A string is a linear sequence of characters, i. e. along a single dimension. As such the only illegal “character” in strings is the one marking line breaks (new lines). The string literal in the following piece of code is unacceptable, because it spans across multiple (source code) lines.
welcomeMessage := 'Hello!
All your base are belong to us.';
You are nevertheless allowed to use the OS-specific code indicating EOLs, yet the only cross-platform (i. e. guaranteed to work regardless of the used OS) procedure is writeLn. Although not standardized, many compilers provide a constant representing the environment’s character/string necessary to produce line breaks. In FPC it is called lineEnding. Delphi has sLineBreak, which is also understood by the FPC for compatibility reasons. The GPC’s standard module GPC supplies the constant lineBreak. You will first need to import this module before you can use that identifier.
Remainder operator
The final Standard Pascal arithmetic operator you are introduced to, after learning to divide, is the remainder operator mod (short for modulo). Every integer division (div) may yield a remainder. This operator evaluates to this value.
i | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
---|---|---|---|---|---|---|---|
i mod 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
i mod 3 | 0 | 1 | 2 | 0 | 1 | 2 | 0 |
Similar to all other division operations, the mod operator does not accept a zero value as the second operand. Moreover, the second operand to mod must be positive. There are many definitions, among computer scientists and mathematicians, regarding the result if the divisor is negative. Pascal avoids any confusion by simply declaring negative divisors as illegal.
The mod operator is frequently used to ensure a certain value remains in a specific range starting at zero (x mod n always lies within 0..n-1). Furthermore, you will find modulo in number theory. For example, the definition of prime numbers says “not divisible by any other number”. This expression can be translated into Pascal like this:
expression | x is divisible by d |
---|---|
mathematical notation | d ∣ x |
Pascal expression | x mod d = 0 |
odd(x) is shorthand for x mod 2 <> 0.[fn 8]
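The divisibility test x mod d = 0 can be turned into a simple trial-division primality check. The following is a sketch, not an efficient implementation; the names isPrime and primeDemo are made up for illustration, and only Standard Pascal features are used.

```pascal
program primeDemo(output);
var
    i: integer;

{ returns true if x has no divisor d with 1 < d < x }
function isPrime(x: integer): Boolean;
var
    d: integer;
    prime: Boolean;
begin
    { numbers below 2 are not prime by definition }
    prime := x > 1;
    d := 2;
    { d <= x div d is equivalent to d * d <= x, but cannot overflow }
    while prime and (d <= x div d) do
    begin
        { x is not prime as soon as some d divides it }
        prime := x mod d <> 0;
        d := d + 1;
    end;
    isPrime := prime;
end;

begin
    { lists the primes up to 20: 2 3 5 7 11 13 17 19 }
    for i := 1 to 20 do
    begin
        if isPrime(i) then
        begin
            write(i:3);
        end;
    end;
    writeLn;
end.
```

Since d only takes positive values, the restriction that the second operand of mod must be positive is always met.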
Tasks
How do you access a single character in a variable v of the data type array[n..m] of string(c)?
Since string(…) is basically a special case of an array (namely one consisting of char values), you can access a single character from it just like usual: v[i, p] where i is a valid index in the range n..m and p refers to the character index within 1..length(v[i]).
Write a function that returns true if, and only if, a given string(…) contains non-blank characters (i. e. other characters than ' ').
program spaceTest(input, output);
type
info = string(20);
{**
\brief determines whether a string contains non-space characters
\param s the string to inspect
\return true if there are any characters other than ' '
*}
function containsNonBlanks(s: info): Boolean;
begin
containsNonBlanks := length(trim(s)) > 0;
end;
// … remaining code for testing purposes only …
Note that this function (correctly) returns false if supplied with an empty string (''). Alternatively you could have written:
containsNonBlanks := '' <> s;
This alternative relies on the padding behavior of the <> operator and therefore needs a fully EP-compliant string(…) data type to work properly. Remember, in these exercises there is no “best” solution.
Write a program that reads a string(…) and transposes every letter in it by 13 positions with respect to the original character’s place in the English alphabet, and then outputs the modified version. This algorithm is known as “Caesar cipher”. For simplicity assume all input is lower-case.
program rotate13(input, output);
const
// we will only operate ("rotate") on these characters
alphabet = 'abcdefghijklmnopqrstuvwxyz';
offset = 13;
type
integerNonNegative = 0..maxInt;
sentence = string(80);
var
secret: sentence;
i, p: integerNonNegative;
begin
readLn(secret);
for i := 1 to length(secret) do
begin
// is current character in alphabet?
p := index(alphabet, secret[i]);
// if so, rotate
if p > 0 then
begin
// The `+ 1` in the end ensures that p
// in the following expression `alphabet[p]`
// is indeed always a valid index (i.e. not zero).
p := (p - 1 + offset) mod length(alphabet) + 1;
secret[i] := alphabet[p];
end;
end;
writeLn(secret);
end.
A lookup table (e. g. an array[chr(0)..maxChar] of char) would have been acceptable, too, but care must be taken in properly populating it.
Note, it is not guaranteed that expressions such as succ('A', 13) will yield the expected result. The range 'A'..'Z' is not necessarily contiguous, so you should not make any assumptions about it. If your solution makes use of that, you must document it (e. g. “This program only runs properly on computers using the ASCII character set.”).
Write a function that determines whether a string is a palindrome, that means it can be read forward and backwards producing the same meaning/sound provided word gaps (spaces) are adjusted accordingly. For simplicity assume all characters are lower-case and there are no punctuation characters (other than whitespace).
program palindromes(input, output);
type
sentence = string(80);
{
\brief determines whether a lower-case sentence is a palindrome
\param original the sentence to inspect
\return true iff \param original can be read forward and backward
}
function isPalindrome(original: sentence): Boolean;
var
readIndex, writeIndex: integer;
derivative: sentence;
check: Boolean;
begin
check := true;
// “sentences” that have a length of one, or even zero characters
// are always palindromes
if length(original) > 1 then
begin
// ensure `derivative` has the same length as `original`
derivative := original;
// the contents are irrelevant, alternatively [in EP] you could’ve used
//writeStr(derivative, '':length(original));
// which would’ve saved us the “fill the rest with blanks” step below
writeIndex := 1;
// strip blanks
for readIndex := 1 to length(original) do
begin
// only copy significant characters
if not (original[readIndex] in [' ']) then
begin
derivative[writeIndex] := original[readIndex];
writeIndex := writeIndex + 1;
end;
end;
// fill the rest with blanks
for writeIndex := writeIndex to length(derivative) do
begin
derivative[writeIndex] := ' ';
end;
// remove trailing blanks and thus shorten length
derivative := trim(derivative);
for readIndex := 1 to length(derivative) div 2 do
begin
check := check and (derivative[readIndex] =
derivative[length(derivative) - readIndex + 1]);
end;
end;
isPalindrome := check;
end;
var
mystery: sentence;
begin
writeLn('Enter a sentence that is possibly a palindrome (no caps):');
readLn(mystery);
writeLn('The sentence you have entered is a palindrome: ',
isPalindrome(mystery));
end.
For demonstration purposes the example shows if not (original[readIndex] in [' ']) then. In fact an explicit set list would have been more adequate, i. e. if original[readIndex] in ['a', 'b', 'c', …, 'z'] then. Do not worry if you simply wrote something to the effect of if original[readIndex] <> ' ' then, this is just as fine given the task’s requirements.
What is the result of LT('', '')?
Write a Boolean function that determines whether a year in the Gregorian calendar is a leap year. Every fourth year is a leap year, but every hundredth year is not, unless it is the fourth century in a row.
One possible solution uses the mod operator you just saw:
{
\brief determines whether a year is a leap year in Gregorian calendar
\param x the year to inspect
\return true, if and only if \param x meets leap year conditions
}
function leapYear(x: integer): Boolean;
begin
leapYear := (x mod 4 = 0) and (x mod 100 <> 0) or (x mod 400 = 0);
end;
Note that a function isLeapYear is already available in Delphi’s and the FPC’s sysUtils unit or in GPC’s GPC module. Whenever possible reuse code already available.
Using your function returning the leap year property of a year, write a binary function returning the number of days in a given month and year.
One possible solution uses a case‑statement. Recall that there must be exactly one assignment to the result variable:
type
{ a valid day number in Gregorian calendar month }
day = 1..31;
{ a valid month number in Gregorian calendar year }
month = 1..12;
{
\brief determines the number of days in a given month of a Gregorian year
\param m the month whose day number count is requested
\param y the year (relevant for leap years)
\return the number of days in a given month and year
}
function daysInMonth(m: month; y: integer): day;
begin
case m of
1, 3, 5, 7, 8, 10, 12:
begin
daysInMonth := 31;
end;
4, 6, 9, 11:
begin
daysInMonth := 30;
end;
2:
begin
daysInMonth := 28 + ord(leapYear(y));
end;
end;
end;
Delphi’s and the FPC’s dateUtils unit provide a function called daysInAMonth. You are strongly encouraged to reuse it instead of your own code.
Notes:
- ↑ In fact this is a discrimination of what EP calls a “schema”. Schemata will be explained in detail in the Extensions Part of this Wikibook.
- ↑ This functionality is useful if you are handling constants you or someone might change at some point. Per definition the literal value ' ' is a char value, whereas '' (“null-string”) or '42' are string literals. In order to write generic code, length accepts all kinds of values that could denote a finite sequence of char values.
- ↑ In fact the definition essentially is packed array[1..capacity] of char.
- ↑ This means, in the case of empty strings, only the following function call could be legal: subStr('', 1, 0). It goes without saying that such a function call is useless.
- ↑ The string variable may not be bindable when using this notation.
- ↑ Omitting the third parameter in the case of empty strings or characters is not allowed. subStr('', 1) is illegal, because there is no “character 1” in an empty string. Also, subStr('Z', 1) is not allowed, because 'Z' is a char-expression and as such always has a length of 1, rendering any need for a “give me the rest of/subsequent characters of” function obsolete.
- ↑ If you are a GPC user, you will need to ensure you are in a fully-EP-compliant mode, for example by specifying ‑‑extended‑pascal on the command line. Otherwise, no padding occurs. The Standard (unextended) Pascal, as per ISO standard 7185, does not define any padding algorithm.
- ↑ The actual implementation of odd may be different. On many processor architectures it is usually something to the effect of the x86 instruction and x, 1.