Fortran/strings
Modern Fortran has a wide range of facilities for handling string or text data but some of these language-defined facilities have not been widely implemented by the compiler developers. It should be remembered that Fortran is designed for scientific computing and is probably not a good choice for writing a new word processor.
Character type
The main feature in Fortran that supports strings is the intrinsic data type character. A character literal constant can be delimited by either single or double quotes and, where necessary, the delimiter itself can be included in the literal by writing it twice in succession. The concatenation operator is // (but this cannot be used to concatenate character entities of different kind). Character scalar variables and arrays are allowed. Character variables have a substring notation to refer to and extract substrings.
Example
program string_1
implicit none
! Declarations
character (len=6) :: word1
character (len=2) :: word2
word1 = "abcdef" ! Assignment
word2 = word1(5:6) ! Substring
word1 = 'Don''t ' ! Escape the single quote by doubling it
write (*,*) word2//word1 ! Concatenation
end program string_1
In the above example, the two character variables word1 and word2 are declared to have lengths of 6 and 2 characters respectively.
In character assignment operations, if the right hand side of the assignment is shorter than the left hand side, the remaining characters on the left hand side are filled with blanks. If the right hand side is longer than the left hand side, then the right hand side is truncated. In neither case is an error raised either by the compiler or at run time.
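The padding and truncation rules can be seen in a minimal sketch such as the following. The program and variable names are illustrative only, and the brackets are printed simply to make the blanks visible.

```fortran
program assign_demo
  implicit none
  character (len=6) :: word

  word = "abc"                       ! Shorter: padded on the right with three blanks
  write (*,'(3a)') '[', word, ']'    ! [abc   ]

  word = "abcdefghij"                ! Longer: only the first six characters are kept
  write (*,'(3a)') '[', word, ']'    ! [abcdef]
end program assign_demo
```

Some compilers can be asked to warn about the truncating assignment, but it is not an error.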
character arrays and coarrays are permitted and can be declared and accessed in the same way as any other Fortran array. Where the array index and substring notations are to be combined, the array indices appear first and the substring expression appears second as illustrated in the final line of the following example:
character (len=120), dimension (10) :: text
text(1) = 'This is the first element of the array "text"'
text(2:3) = ' ' ! Elements 2 and 3 are blank.
text(4)(20:20) = '!' ! Character 20 of element 4.
Unlike some programming languages, Fortran character data and variables do not require an explicit character to terminate a string. Also, unlike C-type languages, Fortran character data do not accommodate embedded and escaped control characters (e.g. \n) and all processing of output control is done via an extensive format sub-system.
Character collating sequence
Internally, Fortran maintains a collating sequence for all the permitted characters. Non-printing characters may be included in the collating sequence. The collating sequence is not specified by the language standard but most vendors support either ASCII or EBCDIC. This collating sequence means that lexical comparisons can be performed to ascertain whether e.g. 'a'<'b', but the outcome is essentially vendor specific. Hence there is a difference between functions such as ichar and iachar, which is described below.
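As a small sketch of the difference, the following program compares two characters first with the relational operator, which uses the processor's collating sequence, and then through their ASCII positions. It also uses llt, a standard intrinsic not otherwise described on this page, which compares according to the ASCII collating sequence. The program and variable names are illustrative.

```fortran
program collate_demo
  implicit none
  character :: a = 'a', b = 'B'

  ! Depends on the processor collating sequence (ASCII, EBCDIC, ...)
  write (*,*) 'processor order, a < b: ', a < b

  ! Portable: compare the ASCII positions explicitly (97 < 66 is false)
  write (*,*) 'ASCII order,     a < b: ', iachar(a) < iachar(b)

  ! llt ("lexically less than") also compares using ASCII
  write (*,*) 'llt(a, b):              ', llt(a, b)
end program collate_demo
```

On an ASCII-based processor all three lines print F; on a processor with a different collating sequence only the first line may differ.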
Character kind
character can also have a kind, but this is vendor-specific. It can allow compilers to support Unicode, the Russian alphabet, Japanese characters etc. It is not necessary to specify the length or kind of a character variable. If a character variable is declared with neither, the result is a variable of default kind and one character long. A single number indicates the length, and two numbers indicate length and kind in that order. It is generally much clearer, but slightly more verbose, to be explicit, as shown in lines 6-8 of the following example. The compiler vendor has control over which kinds of character are supported and the integer values assigned to access the corresponding character sets.
program string_2
implicit none
character :: one
character (5) :: english_name
character (5,2) :: japanese_name
character (len=80) :: line
character (len=120, kind=3) :: unicode_line
character (kind=4, len=256) :: ebcdic_string
!...
end program string_2
The intrinsic function selected_char_kind(name) returns the positive integer kind value of the character set with the corresponding name (e.g. default, ascii, kanji, iso_10646 etc.) but the only character set that must be supported is default, and if the name is not supported then -1 will be returned. Disappointingly, vendors generally have been slow to implement more than the default kind but gfortran, for instance, is a notable exception.
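Because the kind values are vendor-specific, it is usually safer to ask for them by name than to hard-code integers. The following sketch queries three character set names and reports what the compiler in use returns; the constant names are illustrative and no particular values should be assumed.

```fortran
program kinds_demo
  implicit none
  ! selected_char_kind may be used to define named constants
  integer, parameter :: k_default = selected_char_kind('DEFAULT')
  integer, parameter :: k_ascii   = selected_char_kind('ASCII')
  integer, parameter :: k_ucs4    = selected_char_kind('ISO_10646')

  write (*,*) 'DEFAULT   kind = ', k_default
  write (*,*) 'ASCII     kind = ', k_ascii
  write (*,*) 'ISO_10646 kind = ', k_ucs4

  ! A negative value means the character set is not supported
  if (k_ucs4 < 0) write (*,*) 'ISO_10646 is not supported by this compiler'
end program kinds_demo
```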
Language-defined Intrinsic Functions and Subprograms
Fortran has a fairly limited set of intrinsic functions to support character manipulation, searching and conversion, but the basic set is enough to construct some powerful features as required. There are some strange absences, such as the ability to convert from lower case to upper case, but this can be understood and forgiven since these concepts may not exist in many of the languages or character sets that may be represented by different character kinds. Functions such as size, lbound and ubound, which apply to arrays of any data type, including character type, are not described here.
achar
achar(i, kind) returns the ith character in the ASCII collating sequence for the characters of the specified kind. The integer i must be in the range 0 ≤ i ≤ 127. kind is an optional integer. If kind is not specified the default kind is assumed. achar(72) has the value 'H'. One really useful feature of achar is that it permits access to the non-printing ASCII characters such as carriage return (achar(13)). achar will always return the ASCII character even if the processor's collating sequence is not ASCII. If kind is present, the kind parameter of the result is that specified by kind; otherwise, the kind parameter of the result is that of default character. If the processor cannot represent the result value in the kind of the result, the result is undefined. Using achar is highly recommended in preference to char, described below, because it is portable from one processor to another.
adjustl
adjustl(string) left justifies by removing leading (left) blanks from string and filling the right of string with blanks so that the result has the same length as the input string.
adjustr
adjustr(string) right justifies by removing trailing (right) blanks from string and filling the left of the string with blanks so that the result has the same length as the input string.
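A minimal sketch of both functions follows; the variable name is illustrative and the brackets only serve to show where the blanks end up.

```fortran
program adjust_demo
  implicit none
  character (len=10) :: s

  s = '  abc'                              ! Stored as two blanks, abc, five blanks
  write (*,'(3a)') '[', s, ']'             ! [  abc     ]
  write (*,'(3a)') '[', adjustl(s), ']'    ! [abc       ]
  write (*,'(3a)') '[', adjustr(s), ']'    ! [       abc]
end program adjust_demo
```

In each case the result is still 10 characters long; only the position of the blanks changes.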
char
char(i, kind) returns the ith character in the processor collating sequence for the characters of the specified kind. The integer i is not restricted to the range 0 ≤ i ≤ 127; it must lie in the range 0 ≤ i ≤ n-1, where n is the number of characters in that collating sequence. kind is an optional integer. If kind is not specified the default kind is assumed. If the processor cannot represent the result value in the kind of the result, the result is undefined.
iachar
iachar(c, kind) is the inverse of achar described above. c is a single input character and iachar(c) returns the position of c in the ASCII character set as a default integer. kind is an optional input integer and, if kind is specified, it specifies the kind of the integer returned by iachar.
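The relationship between achar and iachar can be sketched as a simple round trip. The values shown in the comments follow from the ASCII table; the program name is illustrative.

```fortran
program ascii_demo
  implicit none
  character :: c

  c = achar(72)
  write (*,*) c                    ! H
  write (*,*) iachar('H')          ! 72
  write (*,*) iachar(achar(13))    ! 13, the non-printing carriage return
  write (*,*) achar(iachar('a'))   ! a
end program ascii_demo
```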
ichar
ichar(c, kind) is the inverse of char described above. c is a single input character and ichar(c) returns the position of c in the processor collating sequence for the kind of c, as a default integer. kind is an optional input integer and, if kind is specified, it specifies the kind of the integer returned by ichar.
index
index(string, substring) returns a default integer representing the position of the first instance of substring in string searching from left to right. There are two optional arguments: back and kind. If the logical back is set true the search is conducted from right to left, and if the integer kind is specified, then the integer returned by index will be of that kind. If substring does not appear in string the result is 0.
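A short sketch of index with and without the back argument; the string is an arbitrary example.

```fortran
program index_demo
  implicit none
  character (len=*), parameter :: s = 'banana'

  write (*,*) index(s, 'an')                 ! 2: first occurrence from the left
  write (*,*) index(s, 'an', back=.true.)    ! 4: first occurrence from the right
  write (*,*) index(s, 'xy')                 ! 0: not found
end program index_demo
```

Note that with back=.true. the result is still the position counted from the left of string.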
len
len(c, kind) returns an integer representing the declared length of character c. This can be extremely useful in subprograms which receive character dummy arguments. c can be a character array. kind is an optional integer which controls the kind of the integer returned by len.
len_trim
len_trim(c, kind) returns the length of c excluding any trailing blanks (but including leading blanks). If c is only blanks the result is 0. Hence expressions like len_trim(adjustl(c)) can be used to count the number of characters in c between the first and last non-blank characters. kind is an optional integer which controls the kind of the integer returned by len_trim.
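The three lengths discussed above can be compared in a minimal sketch; the variable name and contents are illustrative.

```fortran
program length_demo
  implicit none
  character (len=10) :: c

  c = '  abc'                           ! Two leading blanks, five trailing blanks
  write (*,*) len(c)                    ! 10: the declared length
  write (*,*) len_trim(c)               ! 5: trailing blanks are not counted
  write (*,*) len_trim(adjustl(c))      ! 3: leading blanks are removed first
end program length_demo
```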
new_line
new_line(c) is a character function that returns the new line character for the current processor. The kind of the returned character will be the same as the kind of c. A blank character may be returned if the character kind from which c is drawn does not contain a relevant newline character. This function is not likely to be used except in some very specific circumstances.
repeat
repeat(string, ncopies) concatenates integer ncopies of the string. Hence repeat('=',72) is a string of 72 equals signs. string must be scalar but can be of any length. Trailing blanks in string are included in the result.
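A brief sketch of repeat, including the effect of a trailing blank in string; the literals are arbitrary examples.

```fortran
program repeat_demo
  implicit none

  write (*,'(a)') repeat('=', 20)                 ! A rule of 20 equals signs
  write (*,'(3a)') '[', repeat('ab ', 3), ']'     ! [ab ab ab ]
  write (*,*) len(repeat('ab ', 3))               ! 9: the trailing blanks count
end program repeat_demo
```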
scan
scan(string, set, back, kind) returns a default integer (or an integer of the optional kind) that represents the first position at which any character in set appears in string. If no character of set appears in string the result is 0. To search right to left, the optional logical back must be set true. string can be an array, in which case the result is an integer array. If string is an array then set can be an array of the same size and shape as string and each element of set is scanned for in the corresponding element of string. index, described above, differs from scan in that it succeeds only when every character of its substring argument is found, contiguously and in the given order, whereas scan succeeds as soon as any one character of set is found.
selected_char_kind
selected_char_kind(name) is an integer function that returns the kind value of the character set named. The only set that must be supported by the language standard is name='DEFAULT'. If name is not supported the result is -1.
trim
trim(string) is a character valued function that returns a string with the trailing blanks removed. If string is all blanks the result has zero length.
verify
verify(string, set, back, kind) is an integer function that returns the position of the first character in string that is not in set. So verify is roughly the complement of scan. In verify, back and kind are both optional and have the same role as described in scan above. If every character in string is also in set (or string has zero length), then the function returns 0.
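The complementary behaviour of scan and verify can be sketched by testing a string against a set of digits. The constant names are illustrative.

```fortran
program scan_verify_demo
  implicit none
  character (len=*), parameter :: digits = '0123456789'
  character (len=*), parameter :: s = 'abc123'

  write (*,*) scan(s, digits)                 ! 4: first character that is a digit
  write (*,*) scan(s, digits, back=.true.)    ! 6: last character that is a digit
  write (*,*) verify(s, digits)               ! 1: first character that is not a digit
  write (*,*) verify('2024', digits)          ! 0: every character is a digit
end program scan_verify_demo
```

A test such as verify(trim(text), digits) == 0 is a common way to check that a field contains only digits before converting it.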
Regular expressions
Fortran does not have any language-defined regex or sorting capability for character data. Fortran does not have a language-defined text tokenizer but, with a little ingenuity, list directed input can provide a partial solution. However, there are Fortran libraries that wrap C regex libraries.
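One way to read the remark about list-directed input is sketched below: a line held in a character variable is read with the * format into an array of character elements, so that blanks and commas act as separators. This is only a partial tokenizer under stated assumptions: the number of tokens (3) and their maximum length (10) are fixed in advance here, and characters that are special in list-directed input, such as quotes, slashes and the r* repeat form, will not be treated as ordinary text. All names are illustrative.

```fortran
program tokens_demo
  implicit none
  character (len=40) :: line
  character (len=10) :: words(3)
  integer :: i, ios

  line = 'alpha, beta gamma'

  ! List-directed read from the character variable: blanks and commas separate values
  read (line, *, iostat=ios) words
  if (ios /= 0) stop 'could not split the line'

  do i = 1, size(words)
    write (*,'(3a)') '[', trim(words(i)), ']'   ! [alpha] then [beta] then [gamma]
  end do
end program tokens_demo
```

If the line holds fewer than three tokens, iostat returns a non-zero value rather than the program failing.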
I/O of character data
read formatting
read for character data can be list-directed or formatted using the "a" or "an" forms of the a edit descriptor. In the "a" form, the width is taken from the length of the corresponding item in the list. In the "an" form, the integer n specifies the number of characters to transfer. The general edit descriptor "gn" can also be used.
Example
character (120) :: line
open (10, file="test.dat")
read (10,'(a)') line ! Read up to 120 characters into line
read (10,'(a5)') line(116:) ! Read 5 characters and put them at the end of line
write formatting
The a and g edit descriptors exist for write as described above. The "a" form will write the whole character variable including all the trailing blanks so it is common to use trim or adjustl or both.
Example
character (len=512) :: line
!...
write (10,'(a)') trim(adjustl(line))
Internal Read and Write
Fortran has many hidden secrets and one of the most useful is that read and write statements can be used on character variables as if they were files. This explains the otherwise mystifying lack of functions to convert numbers to strings and vice versa. The character variable is treated as an 'internal file'.
Example
character (120) :: text_in, text_out
integer :: i
real :: x
!...
write (text_in,'(A,I0)') 'i = ', i ! Formatted
!...
read (text_out,*) x ! List-directed
In addition to type conversion, this internal read/write can be used as a very flexible and bullet-proof method of reading files where the contents may be of uncertain format. The external file is read line by line into a character variable, scan and verify can be used on the line to determine what is present and then an internal file read is done on the character variable to convert to real, integer, complex etc. as appropriate.
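The approach can be sketched end to end as follows. The text is first inspected with verify and then converted with an internal read, and iostat is used so that a failed conversion can be handled rather than stopping the program. The names and the sample strings are illustrative; a real reader would obtain text from an external file.

```fortran
program convert_demo
  implicit none
  character (len=*), parameter :: digits = '0123456789'
  character (len=32) :: text
  integer :: i, ios
  real :: x

  ! Number to string: write into the character variable
  i = 42
  write (text, '(a,i0)') 'i = ', i
  write (*,'(a)') trim(text)                 ! i = 42

  ! String to integer: only attempted if the text consists solely of digits
  text = '1250'
  if (verify(trim(text), digits) == 0) then
    read (text, *, iostat=ios) i
    if (ios == 0) write (*,*) 'integer: ', i
  end if

  ! String to real: list-directed internal read, checked with iostat
  text = '3.75'
  read (text, *, iostat=ios) x
  if (ios == 0) then
    write (*,*) 'real: ', x
  else
    write (*,*) 'could not convert: ', trim(text)
  end if
end program convert_demo
```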
Recent Extensions
character(:), allocatable
The length of a character scalar can be deferred by declaring it allocatable with a length of :, which frees it from being declared with a specific length. The resulting scalar can then be explicitly allocated, or it can be automatically allocated on assignment as shown in the following example.
Example
character (:), allocatable :: string
!...
string = 'abcdef'
!...
string = '1234567890'
!...
string = trim(line)
!...
It is even possible to declare an array of deferred-length elements, as illustrated below.
Example
character (:), dimension (:), allocatable :: strings
However, this feature should be used carefully and some restrictions apply; in particular, every element of such an array has the same length.
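A minimal sketch of such an array follows, using an explicit allocate statement to set both the common element length and the number of elements. The length of 8 and the size of 3 are arbitrary choices for illustration.

```fortran
program string_array_demo
  implicit none
  character (:), dimension (:), allocatable :: strings
  integer :: i

  ! The element length and the array size are both chosen at run time
  allocate (character (len=8) :: strings(3))

  strings(1) = 'one'
  strings(2) = 'two'
  strings(3) = 'three'

  write (*,*) len(strings), size(strings)     ! 8 and 3
  do i = 1, size(strings)
    write (*,'(3a)') '[', trim(strings(i)), ']'
  end do
end program string_array_demo
```

Each element is padded with blanks to the common length, so trim is still needed on output.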
Actual/Dummy arguments of type character
It is frequently the case that a procedure may be written with a character dummy argument where the length of that argument is not known in advance. Modern Fortran allows dummy arguments to be declared with assumed length using len=*. Functions of type character can be written so that the result has a length related to the length of the dummy arguments.
Example
call this('Hello')
call this('Goodbye')
!...
subroutine this(string)
implicit none
character (len=*), intent (in) :: string
character (len=len(string)+5) :: temp
!...
end subroutine
In the above example, the character variable temp is declared to have 5 more characters than string, no matter how long the actual argument is. In the next example, a function returns a string, the length of which is related to the length of one or more arguments.
Example
string = that('thing', 7)
!...
function that(in_string, n) result (out_string)
implicit none
character (len=*), intent (in) :: in_string
integer, intent(in) :: n
character (len=len(in_string)*n) :: out_string
!...
end function
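The fragment above omits the body of the function. As a sketch only, the following complete program fills it in with repeat, which is an assumption made for illustration and not necessarily what was originally intended; the module name that_mod and the program name are likewise hypothetical. Placing the function in a module gives it the explicit interface that a result with a computed length requires.

```fortran
module that_mod
  implicit none
contains
  function that(in_string, n) result (out_string)
    character (len=*), intent (in) :: in_string
    integer, intent (in) :: n
    ! The result length is computed from the arguments on entry
    character (len=len(in_string)*n) :: out_string
    out_string = repeat(in_string, n)     ! Assumed body, for illustration only
  end function that
end module that_mod

program that_demo
  use that_mod
  implicit none
  character (:), allocatable :: string

  string = that('thing', 3)
  write (*,'(a)') string                  ! thingthingthing
  write (*,*) len(string)                 ! 15
end program that_demo
```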
In circumstances where the character function has to return a string and the length of this string is not simply related to the inputs, the deferred-length, allocatable form described above can be used, as illustrated in the case conversion examples below.
character parameters
character parameters can be declared without explicitly stating the length, for example:
character (*), parameter :: place = 'COEFF_LIST_initialise'
Approaches to Case Conversion
Here are some further examples of the ideas above, applied to case conversion for languages where case conversion exists as a concept. In the first example, the ASCII character set functions iachar and achar are used to check each character in a string consecutively.
Example
function up_case(in) result (out)
implicit none
character (*), intent (in) :: in
character (:), allocatable :: out
integer :: i, j
out = in ! Copy the whole string
do i = 1, len_trim(out) ! Each character
j = iachar(out(i:i)) ! Get the ASCII position
select case (j)
case (97:122) ! The lower case characters
out(i:i) = achar(j-32) ! Offset to the upper case
end select
end do
end function up_case
An alternative approach that does not rely on the ASCII functions iachar and achar could be as follows:
Example
function to_upper(in) result (out)
implicit none
character (*), intent (in) :: in
character (:), allocatable :: out
integer :: i, j
character (*), parameter :: upp = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
character (*), parameter :: low = 'abcdefghijklmnopqrstuvwxyz'
out = in ! Transfer all characters
do i = 1, len_trim(out) ! All non-blanks
j = index(low, out(i:i)) ! Is ith character in low
if (j>0) out(i:i) = upp(j:j) ! Yes, then subst with upp
end do
end function to_upper
Which routine is quicker will depend on the relative speed of the index and iachar intrinsics. In one informal test, the first method above seemed to be slightly more than twice as fast as the second method, but this will vary from vendor to vendor.