MATLAB Programming/Strings


Declaring Strings

Strings (character vectors) are declared using single quotes ( ' ):

 >> fstring = 'hello'
 fstring =
 hello

To include a single quote in a string, type it twice:

 >> fstring = ''''
 fstring = 
 '
 >> fstring = 'you''re'
 fstring =
 you're
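The same idea as a short script. It also checks that the doubled quote is stored as a single character:

```matlab
% Two consecutive single quotes inside a single-quoted literal
% produce one single-quote character in the result
fstring = 'you''re';
disp(fstring)                      % displays: you're
numChars = length(fstring)         % 6, because '' is stored as one character
isQuote = fstring(4) == char(39)   % true: char(39) is the single-quote character
```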

Concatenate Strings

In MATLAB, multiple strings can be concatenated (linked together as a chain) using square brackets:

<concatstring>=[<str1>,<str2>,<str3>,...];

Here is an example that uses this syntax to build a sentence from three pieces:

>> subject='The quick brown fox ' %sentence beginning

subject =
    'The quick brown fox '
    
>> verb = 'jumps over '

verb =
    'jumps over '
    
>> object='the lazy dog'

object =
    'the lazy dog'
    
>> phrase=[subject,verb,object]

phrase =
    'The quick brown fox jumps over the lazy dog'
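When the pieces are already stored in a list, the same square-bracket concatenation can be applied in a loop. This is a minimal sketch. The cell array `parts` is a hypothetical container for the three fragments above:

```matlab
% Hypothetical list of sentence fragments
parts = {'The quick brown fox ', 'jumps over ', 'the lazy dog'};
phrase = '';                       % start from an empty character vector
for k = 1:numel(parts)
    phrase = [phrase, parts{k}];   %#ok<AGROW> append the next fragment
end
disp(phrase)                       % The quick brown fox jumps over the lazy dog
```

For many fragments, `[parts{:}]` gives the same result without the loop.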

Inputting strings

To let the user type in a string, use the input function with the 's' option. This option returns the entry as text instead of evaluating it:

>> name=input('Enter your name: ','s')
Enter your name: Matlab_User

name =
    'Matlab_User'

String manipulations

Count the repeating words

Tongue twister of the woodchuck

Consider the following tongue twister:

How much wood would a woodchuck chuck

if a woodchuck could chuck wood?

He would chuck, he would, as much as he could,

and chuck as much wood as a woodchuck would

if a woodchuck could chuck wood.

Suppose we want to know how many times the word wood appears in this tongue twister. We can use the count function.

>> % Declare the woodchuck twister as a character vector
>> twister = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood as a woodchuck would if a woodchuck could chuck wood.'

twister =
    'How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood as a woodchuck would if a woodchuck could chuck wood.'

>> count(twister,"wood")

ans =
     8
 

Note that count counts every occurrence of the pattern, not only whole words. Therefore, it also counts the "wood" inside each "woodchuck".
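To count only the standalone word, you can use a regular expression with MATLAB's word anchors `\<` (start of word) and `\>` (end of word). This is a sketch that assumes R2016b or later, which is needed for count:

```matlab
twister = ['How much wood would a woodchuck chuck if a woodchuck could chuck wood? ', ...
           'He would chuck, he would, as much as he could, and chuck as much wood ', ...
           'as a woodchuck would if a woodchuck could chuck wood.'];
allMatches  = count(twister, 'wood')              % 8: includes "wood" inside "woodchuck"
wholeWords  = numel(regexp(twister, '\<wood\>'))  % 4: "wood" as a separate word only
inWoodchuck = count(twister, 'woodchuck')         % 4
```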

As another example, let's count the word "the" in this famous pangram:

The quick brown fox jumps over the lazy dog

>> phrase = 'The quick brown fox jumps over the lazy dog';
>> % count is case-sensitive by default, so it does not count 'The' with a capital 'T'
>> count(phrase,'the')

ans =
     1

>> % Set 'IgnoreCase' to true to make the match case-insensitive
>> count(phrase,'the','IgnoreCase',true)

ans =
     2

Finding lengths of string

At times you might need to find the length of a string. The length function returns the number of characters it contains:

>> length(phrase)

ans =
    43

The index table in the next section confirms that the string contains exactly 43 characters, spaces included.
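Because length counts characters rather than words, finding the length of each word takes one more step. This is a minimal sketch that assumes strsplit is available (R2013a or later):

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';
nChars = length(phrase)                % 43: spaces are counted too
[nRows, nCols] = size(phrase)          % 1 and 43: a character row vector
words = strsplit(phrase);              % split on whitespace
wordLengths = cellfun(@length, words)  % 3 5 5 3 5 4 3 4 3
```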

Extracting words from Strings

To extract part of a string, index it with a range of positions: stringName(firstIndex:lastIndex).

We use the same example phrase as above.

Note that each space is also a character and has its own index.

 Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
 Char   T  h  e     q  u  i  c  k     b  r  o  w  n     f  o  x     j  u

 Index 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
 Char   m  p  s     o  v  e  r     t  h  e     l  a  z  y     d  o  g

Based on this table, the words brown fox and lazy dog occupy indices 11:19 and 36:43 respectively. To extract them in MATLAB, type the following commands:

>> phrase(11:19)

ans =
    'brown fox'

>> phrase(36:43)

ans =
    'lazy dog'
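Counting indices by hand is error-prone for longer strings. This sketch lets strfind locate the start index and uses the `end` keyword for the last characters. The variable `target` is just an illustrative name:

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';
target = 'brown fox';
startIdx = strfind(phrase, target)                          % 11
extracted = phrase(startIdx:startIdx + length(target) - 1)  % 'brown fox'
lastWords = phrase(end-7:end)                               % 'lazy dog' (indices 36:43)
```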

Lowercase and Uppercase of Strings

To convert a string to all uppercase or all lowercase characters, use the upper and lower functions respectively.

>> % Convert the string to uppercase
>> upper(phrase)

ans =
    'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG'

>> % Convert the string to lowercase
>> lower(phrase)

ans =
    'the quick brown fox jumps over the lazy dog'
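upper and lower can also be applied to selected characters through indexing. This sketch capitalizes the first letter of every word. It assumes words are separated by single spaces:

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';
titled = lower(phrase);
% Index of the first letter of each word: position 1 plus the position after each space
firstLetters = [1, find(phrase == ' ') + 1];
titled(firstLetters) = upper(titled(firstLetters));
disp(titled)   % The Quick Brown Fox Jumps Over The Lazy Dog
```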

Reverse Strings

To reverse a string, use the reverse function. flip also works, because it reverses the elements of any array. The result runs from the last character to the first.

>> reverse(phrase)

ans =
    'god yzal eht revo spmuj xof nworb kciuq ehT'
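reverse and flip give the same result on a character vector, and so does indexing with a negative step. This sketch compares the three and then runs a simple case-insensitive palindrome check. The word 'Level' is just an example:

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';
r1 = reverse(phrase);          % reverse (R2016b or later)
r2 = flip(phrase);             % flip reverses the elements of any array
r3 = phrase(end:-1:1);         % plain indexing with a step of -1
isSame = isequal(r1, r2, r3)   % true

word = 'Level';
isPalindrome = strcmpi(word, flip(word))   % true: 'Level' vs 'leveL', ignoring case
```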

Replace Characters in Strings

To replace certain words in a string, use the replace function.

Its syntax is replace(stringName,oldword,newword). It replaces every occurrence of oldword in stringName with newword.

>> % Change 'brown fox' to 'orange fox'
>> replace(phrase,'brown','orange')

ans =
    'The quick orange fox jumps over the lazy dog'

At times you might want to replace several words in one go. To do so, put the old words and the new words in two cell arrays. Make sure both arrays list the words in matching order: the first old word is replaced by the first new word, the second by the second, and so on.

>> % Cell array of the words to be replaced
>> old={'fox','dog'}

old =
  1×2 cell array
    {'fox'}    {'dog'}

>> % Cell array of the replacement words, in the same order
>> new={'cheetah','sloth'}

new =
  1×2 cell array
    {'cheetah'}    {'sloth'}
    
>> % Replace 'fox' with 'cheetah' and 'dog' with 'sloth'
>> replace(phrase,old,new)

ans =
    'The quick brown cheetah jumps over the lazy sloth'
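For this input, replacing both words in one call gives the same result as two successive calls. The sketch also shows that replace is case-sensitive, so 'the' does not match 'The':

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';
oneCall  = replace(phrase, {'fox', 'dog'}, {'cheetah', 'sloth'});
twoCalls = replace(replace(phrase, 'fox', 'cheetah'), 'dog', 'sloth');
sameResult = strcmp(oneCall, twoCalls)   % true for this input
caseDemo = replace(phrase, 'the', 'a')   % 'The quick brown fox jumps over a lazy dog'
```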

Strings as a Character Array

Character strings in MATLAB are arrays of characters. To see this, execute the following code:

 >> fstring = 'hello';
 >> class(fstring)
 ans = char

Because strings are arrays, many array manipulation functions work on them, including size and transpose. Strings can also be indexed to access specific elements.
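A few array operations applied to the character vector 'hello':

```matlab
fstring = 'hello';
sz = size(fstring)       % 1 5: one row, five columns
col = fstring';          % transpose: a 5-by-1 column of characters
firstChar = fstring(1)   % 'h'
fstring(1) = 'j';        % assign a new character to one element
disp(fstring)            % jello
```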

Performing arithmetic operations on character arrays converts them into doubles.

 >> fstring2 = 'world';
 >> fstring + fstring2
 ans = 223   212   222   216   211

These numbers are the ASCII codes of the corresponding characters, added together pairwise. The codes themselves can be obtained with the double function, which turns the character array into an array of doubles:

 >> double(fstring)
 ans = 104   101   108   108   111

The 'char' function can turn an array of integer-valued doubles back into characters. Attempting to turn a decimal into a character causes MATLAB to round down:

 >> char(104)
 ans = h
 >> char(104.6)
 ans = h
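Because characters convert to and from their numeric codes, simple arithmetic on the codes can transform text. This sketch is only valid for the lowercase ASCII letters a to z. The offset 32 is the distance between 'a' and 'A' in ASCII:

```matlab
fstring = 'hello';
codes = double(fstring)            % 104 101 108 108 111
shifted = char(codes + 1)          % 'ifmmp': every character moved up by one code
upperByCode = char(fstring - 32)   % 'HELLO'
```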

Special String Functions

Since MATLAB strings are character arrays, some special functions are available that operate on entire strings rather than on individual characters:

deblank

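deblank removes trailing whitespace (and null characters) from the end of a string. Leading whitespace is left in place. strtrim removes both leading and trailing whitespace.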
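A quick comparison of deblank and strtrim on a string padded on both sides:

```matlab
padded = '   hello   ';
trailingRemoved = deblank(padded)   % '   hello': leading spaces are kept
bothRemoved = strtrim(padded)       % 'hello'
```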

findstr

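findstr(bigstring, smallstring) checks whether the shorter string is contained in the longer one. If it is, findstr returns the starting index of every occurrence; otherwise it returns []. findstr searches the longer argument for the shorter one regardless of argument order, and MATLAB recommends strfind instead.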
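Because MATLAB recommends strfind over findstr, here is a sketch of the same search with strfind. Unlike findstr, strfind always searches its first argument for its second:

```matlab
bigstring = 'abcabc';
idx = strfind(bigstring, 'bc')                    % 2 5: start of every occurrence
noMatch = strfind(bigstring, 'xyz')               % []: not found
isContained = ~isempty(strfind(bigstring, 'ca'))  % true ('ca' starts at index 3)
```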

strrep

strrep(string1, replaced, replacement) replaces all instances of replaced in string1 with replacement.
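A minimal strrep example using a made-up sentence:

```matlab
s = 'the cat and the hat';
s2 = strrep(s, 'the', 'a')   % 'a cat and a hat': every instance is replaced
```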

strcmp

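Unlike numeric arrays, strings do not compare reliably with the relational operator ==. It compares character by character, returning a logical array, and raises an error when the strings have different lengths. To compare whole strings, use the strcmp function as follows: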

 >> string1 = 'a';
 >> strcmp(string1, 'a')
 ans = 1
 >> strcmp(string1, 'A')
 ans = 0

Note that strcmp is case-sensitive, so 'a' and 'A' are not the same (strcmpi ignores case). In addition, strcmp does not discard whitespace:

 >> strcmp(string1, ' a')
 ans = 0

The strings must be exactly the same in every respect.

If the inputs are numeric arrays then the strcmp function will return 0 even if the values are the same. Thus it's only useful for strings. Use the == operator for numeric values.

 >> strcmp(1,1)
 ans = 0
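This sketch shows why == is not a substitute for strcmp. It compares strings of equal and unequal length. The try/catch is only there to show that == raises an error when the lengths differ:

```matlab
a = 'abc';
b = 'abd';
elementwise = (a == b)                         % 1 1 0: one result per character
same = strcmp(a, b)                            % false: one result for the whole string
sameIgnoringCase = strcmpi('Hello', 'hello')   % true
lengthsDiffer = false;
try
    tmp = ('abc' == 'ab');   %#ok<NASGU> sizes do not match
catch
    lengthsDiffer = true;    % == errors on strings of different lengths
end
differentLengths = strcmp('abc', 'ab')         % false, without an error
```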

num2str

num2str converts numbers to characters. It is useful when you want to display a value with a limited number of decimal places, for example with disp:

>> % Display pi to 9 decimal places
>> num2str(pi,'%1.9f')

ans =
    '3.141592654'
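num2str also accepts a number of significant digits instead of a format. Its output can be concatenated with other text before display:

```matlab
piText = num2str(pi, '%1.9f')   % '3.141592654'
fewDigits = num2str(pi, 4)      % '3.142': 4 significant digits
msg = ['pi is approximately ', num2str(pi, '%.2f')];
disp(msg)                       % pi is approximately 3.14
```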

Displaying values of string variables

If all you want to do is display the value of a string, you can omit the semicolon as is standard in MATLAB.

If you want to display a string in the command window in combination with other text, one way is to use array notation combined with either the 'display' or the 'disp' function:

 >> fstring = 'hello';
 >> display( [ fstring 'world'] )
 helloworld

MATLAB doesn't put the space in between the two strings. If you want one there you must put it in yourself.

This syntax is also used to concatenate two or more strings into one variable, which allows insertion of unusual characters into strings:

 >> fstring = ['you' char(39) 're']
 fstring = you're

Any other function that returns a string can also be used in the array.
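This sketch combines the points above. It adds an explicit space, a character produced by char, and the output of another function, all inside the same brackets:

```matlab
fstring = 'hello';
withSpace = [fstring ' ' 'world']               % 'hello world': the space is added explicitly
quoted = ['you' char(39) 're']                  % you're
mixed = ['Length: ' num2str(length(fstring))]   % 'Length: 5'
```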

You can also use the "strcat" function to concatenate strings. With two character vectors it gives much the same result as square brackets. It is especially useful with a cell array of strings, because it appends the same text to every element at once. Be aware that for character array inputs strcat removes trailing whitespace, so you cannot use it to add a trailing space that way. Cell array inputs keep their trailing whitespace. Here is the syntax for the cell array use:

 >> strCell = {'A', 'B'};
 >> strcat(strCell, '_')
 ans =
   1×2 cell array
     {'A_'}    {'B_'}
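The whitespace rule for strcat is easy to trip over. This sketch contrasts strcat with square brackets, following the behavior described in the MATLAB documentation for strcat:

```matlab
a = strcat('hello ', 'world')     % 'helloworld': trailing space of a char input is removed
b = ['hello ', 'world']           % 'hello world': square brackets keep it
c = strcat({'hello '}, 'world')   % {'hello world'}: cell inputs keep trailing whitespace
d = strcat({'A', 'B'}, '_')       % {'A_', 'B_'}: same suffix added to every cell
```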

Finally, although MATLAB doesn't have a printf function, you can do essentially the same thing in two ways: pass 1 (standard output) as the file identifier to fprintf, or omit the file identifier altogether. The format specifiers are essentially the same as in C.

 >> X = 9.2;
 >> fprintf(1, '%1.3f\n', X);
 9.200

The "9.200" is printed to the screen. fprintf is convenient compared to display because you don't have to call num2str on every number in the string: just put the appropriate format specifier where you want each value to appear.

 >> X = 9.2;
 >> fprintf(1, 'The value of X is %1.3f meters per second \n', X);
 The value of X is 9.200 meters per second
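This sketch shows two related tricks. fprintf without a file identifier also writes to the screen, and sprintf returns the formatted text instead of printing it. When given more values than the format consumes, the format is reused:

```matlab
X = 9.2;
fprintf('The value of X is %1.3f meters per second\n', X);     % no file identifier: prints to the screen
msg = sprintf('The value of X is %1.3f meters per second', X)   % same text returned as a char vector
fprintf('%d squared is %d\n', [1 2 3; 1 4 9]);   % format reused, reading the matrix column by column
```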

Cell arrays of strings

In many applications (particularly when parsing text files or reading Excel sheets that contain text) you will encounter cell arrays of strings.

You can use the function "iscellstr" to tell if all of the elements in a given cell array are strings or not.

 >> notStrCell = {'AA', []};
 >> iscellstr(notStrCell)
 ans = 0

This is useful because functions that work with cell arrays of strings fail if given something that is not a cell array of strings. In particular, they fail if any element of the cell array is the empty array ( [] ), which is common when a text file or spreadsheet contains empty cells. Check for this case (for example with iscellstr) before calling cell-string manipulation functions.
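One way to guard against empty cells is to replace them with empty character vectors before calling cell-string functions. In this sketch `rawCells` is hypothetical data, and the only non-text entries are assumed to be empty arrays:

```matlab
rawCells = {'AA', [], 'BB'};               % hypothetical data with an empty cell
before = iscellstr(rawCells)               % false
emptyMask = cellfun(@isempty, rawCells);   % true where a cell is empty
rawCells(emptyMask) = {''};                % replace each empty entry with ''
after = iscellstr(rawCells)                % true
```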

Searching a cell array of strings can be done with the "strmatch", "strfind", and "regexp" functions. strmatch looks for elements of a cell array of strings whose first characters exactly match the string you pass to it, and returns the indices of all matching elements. If you give it the 'exact' option, it returns only the indices of elements that are exactly the same as what you passed. (MATLAB now recommends strncmp or validatestring instead of strmatch.) For example:

 >> strCell = {'Aa', 'AA'};
 >> strmatch('A', strCell)
 ans =
      1
      2
 >> strmatch('A', strCell, 'exact')
 ans = []
 >> strmatch('Aa', strCell, 'exact')
 ans = 1
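Because strmatch is not recommended in current MATLAB, here is a sketch of the same searches with strncmp and strcmp. Both return logical arrays, which find turns into indices:

```matlab
strCell = {'Aa', 'AA'};
prefixIdx = find(strncmp(strCell, 'A', 1))   % 1 2: elements whose first character is 'A'
exactIdx  = find(strcmp(strCell, 'Aa'))      % 1: exact match
noneIdx   = find(strcmp(strCell, 'A'))       % empty: no element equals 'A' exactly
```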

Strfind looks for a specific string within a cell array of strings, but it tries to find it in any part of each string. For each element x of the given cell array of strings, it will return an empty array if there is no match found in x and the starting index (remember, strings are arrays of characters) of all matches in x if a match to the query is found.

 >> strCell = {'Aa', 'AA'};
 >> strfind(strCell, 'A')
 ans =      % a cell array with two elements (same size as strCell):
   [1]      % index of the "A" in the first cell
   [1 2]    % indices of both "A"s in the second cell
 >> strfind(strCell, 'a')
 ans =
   [2]      % 'a' is the second character of 'Aa'
   []       % 'a' is not found in 'AA'

The "cellfun" / "isempty" combination is very useful for identifying cases where the string was or was not found. You can use the find function in combination with these two functions to return the index of all the cells in which the query string was found.

 >> strCell = {'Aa', 'AA'};
 >> idxCell = strfind(strCell, 'a');
 >> isFound = ~cellfun('isempty', idxCell); % true for each cell in which 'a' was found
 >> foundIdx = find(isFound)                % only 'Aa' contains a lowercase 'a'
 foundIdx = 1
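In R2016b and later, contains returns the same logical mask in one step. Its 'IgnoreCase' option makes the search case-insensitive:

```matlab
strCell = {'Aa', 'AA'};
foundIdx = find(contains(strCell, 'a'))                           % 1
foundAnyCase = find(contains(strCell, 'a', 'IgnoreCase', true))   % 1 2
```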

The strfind function also accepts a 'ForceCellOutput' option, which makes it return a cell array even for a single input string. See the documentation for details.

The regexp function works much like strfind, but instead of looking for literal strings it finds matches within the cell array of strings using regular expressions. Regular expressions are a powerful way to match patterns within strings, not just fixed substrings. Entire books have been written about them, so they cannot be covered in detail here. Good resources include regular-expressions.info and the MATLAB documentation for the MATLAB-specific syntax. Note that MATLAB implements some, but not all, of the extended regular expression features available in languages such as Perl.
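Here is a small regexp sketch on a hypothetical list of file names. With the 'match' and 'once' options, each non-matching element gives an empty result, so the cellfun/isempty pattern described above still applies:

```matlab
files = {'file1.txt', 'notes.doc', 'file22.txt'};              % hypothetical file names
matches = regexp(files, '^file\d+\.txt$', 'match', 'once');    % '' where there is no match
matchIdx = find(~cellfun(@isempty, matches))                   % 1 3
digits = regexp('file22.txt', '\d+', 'match', 'once')          % '22'
```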

Older versions of MATLAB did not have built-in functions for some common string operations found in other languages, such as string splitting. Newer versions provide them: strsplit and strjoin were added in R2013a, and split and join in R2016b.
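Here is a minimal sketch of splitting and joining with strsplit and strjoin, assuming R2013a or later:

```matlab
parts = strsplit('red,green,blue', ',')   % {'red', 'green', 'blue'}
joined = strjoin(parts, ' | ')            % 'red | green | blue'
```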