MATLAB Programming/Strings
Declaring Strings
Strings are declared using single quotes ( ' ):
>> fstring = 'hello'
fstring =
hello
Including a single quote in a string is done this way:
>> fstring = ''''
fstring =
'
>> fstring = 'you''re'
fstring =
you're
Concatenate Strings
In MATLAB, multiple strings can be concatenated (linked together as a chain) using square brackets.
<concatstring>=[<str1>,<str2>,<str3>,...];
Here is an example that builds a sentence from three pieces using concatenation.
>> subject='The quick brown fox ' %sentence beginning
subject =
'The quick brown fox '
>> verb = 'jumps over '
verb =
'jumps over '
>> object='the lazy dog'
object =
'the lazy dog'
>> phrase=[subject,verb,object]
phrase =
'The quick brown fox jumps over the lazy dog'
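Square brackets join the pieces exactly as given, so any separating space has to be part of one of the pieces or added explicitly. A minimal sketch (the variable names `first`, `second`, `noSpace`, and `joined` are illustrative and not part of the example above):

```matlab
% Two pieces with no leading or trailing spaces
first  = 'quick';
second = 'fox';

% Brackets add nothing between the pieces
noSpace = [first, second];

% Insert the separator explicitly as its own piece
joined = [first, ' ', second];

disp(noSpace)
disp(joined)
```

This is why `subject` and `verb` in the example above each end with a space.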
Inputting strings
To read a string typed by the user, use the input function with the 's' option, which returns the typed text as a string without evaluating it:
>> name=input('Enter your names: ','s')
Enter your names: Matlab_User
name =
'Matlab_User'
String manipulations
Count the repeating words
Consider the following tongue twister:
How much wood would a woodchuck chuck
if a woodchuck could chuck wood?
He would chuck, he would, as much as he could,
and chuck as much wood as a woodchuck would
if a woodchuck could chuck wood.
We would like to know how many times the word wood appears in that tongue twister. We can use the count function.
>>%Declare the woodchuck twister as a character vector
>> twister = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood as a woodchuck would if a woodchuck could chuck wood.'
twister =
'How much wood would a woodchuck chuck if a woodchuck could chuck wood? He would chuck, he would, as much as he could, and chuck as much wood as a woodchuck would if a woodchuck could chuck wood.'
>> count(twister,"wood")
ans =
8
Note that count counts occurrences of a pattern anywhere in the string.
It therefore also counts the "wood" inside each "woodchuck".
As another example, we can count the word "the" in this well-known sentence:
The quick brown fox jumps over the lazy dog
>> phrase = 'The quick brown fox jumps over the lazy dog';
>> % count is case-sensitive by default, so it does not count 'The' with a capital T
>> count(phrase,'the')
ans =
1
>> % use the IgnoreCase option to make the search case-insensitive
>> count(phrase,'the','IgnoreCase',true)
ans =
2
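If only whole words should be counted, one possible approach (not the method used above) is a regular expression with word-boundary anchors. The sketch below assumes MATLAB's `regexp` function and its `\<` and `\>` anchors; the variable names `wholeWords` and `nWholeWords` are illustrative.

```matlab
% The tongue twister, split over several lines for readability
twister = ['How much wood would a woodchuck chuck if a woodchuck could chuck wood? ', ...
           'He would chuck, he would, as much as he could, and chuck as much wood ', ...
           'as a woodchuck would if a woodchuck could chuck wood.'];

% \< and \> anchor the pattern to the start and end of a word,
% so 'wood' inside 'woodchuck' does not match
wholeWords  = regexp(twister, '\<wood\>', 'match');
nWholeWords = numel(wholeWords);

disp(nWholeWords)
```

For this text the whole-word count is 4, half of what count reports for the bare pattern.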
Finding lengths of string
At times you may need to find the length of a string, such as a word or a whole sentence. This is where the length(string) function comes to the rescue.
>> length(phrase)
ans =
43
As the index table in the next section shows, the string contains exactly 43 characters, spaces included.
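To get the length of each word rather than of the whole sentence, one option is to split the sentence first. This is a sketch that assumes the `strsplit` and `cellfun` functions are available; `words` and `wordLengths` are illustrative names.

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';

% Split on single spaces to get a cell array of words
words = strsplit(phrase, ' ');

% Apply length to every word; the result is a numeric row vector
wordLengths = cellfun(@length, words);

disp(length(phrase))     % total characters, spaces included
disp(wordLengths)        % one entry per word
```

The nine word lengths plus the eight separating spaces add up to the total length of the phrase.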
Extracting words from Strings
To extract part of a string, index it with stringName(indexnumberfirst:indexnumberlast).
We use the same example phrase as above.
Note that a space also counts as a character and occupies an index (shown as ␣ in the table).
Index Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 |
String | T | h | e | ␣ | q | u | i | c | k | ␣ | b | r | o | w | n | ␣ | f | o | x | ␣ | j | u | m | p | s | ␣ | o | v | e | r | ␣ | t | h | e | ␣ | l | a | z | y | ␣ | d | o | g |
Based on this example, suppose we want to extract the phrases brown fox and lazy dog.
They occupy index numbers (11:19) and (36:43) respectively. In MATLAB, we can type the following commands:
>> phrase(11:19)
ans =
'brown fox'
>> phrase(36:43)
ans =
'lazy dog'
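Counting index numbers by hand is error-prone for longer strings. As a sketch of one alternative, the start index can be looked up with `strfind` (covered later on this page) and the end index computed from the length of the target; `target`, `startIdx`, `stopIdx`, and `extracted` are illustrative names, and the sketch assumes the target occurs exactly once.

```matlab
phrase = 'The quick brown fox jumps over the lazy dog';
target = 'lazy dog';                        % substring to extract

startIdx = strfind(phrase, target);         % index where the match begins
stopIdx  = startIdx + length(target) - 1;   % index where the match ends

extracted = phrase(startIdx:stopIdx);

disp([startIdx, stopIdx])
disp(extracted)
```

If the target could occur more than once, `strfind` returns several start indices and the code would need to pick one of them first.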
Lowercase and Uppercase of Strings
To convert a string to upper or lower case, use the upper and lower functions. They return the string with every letter in uppercase or lowercase respectively.
>> %Convert the string to uppercase
>> upper(phrase)
ans =
'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG'
>> %Convert the string to lowercase
>> lower(phrase)
ans =
'the quick brown fox jumps over the lazy dog'
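A common use of case conversion is comparing two strings without regard to case. The sketch below uses `strcmp` (described later on this page) and the built-in `strcmpi`; the variable names are illustrative.

```matlab
a = 'MATLAB';
b = 'matlab';

sameExact  = strcmp(a, b);                  % false: the case differs
sameFolded = strcmp(lower(a), lower(b));    % true once both are lowercased
sameIgnore = strcmpi(a, b);                 % built-in case-insensitive comparison

disp([sameExact, sameFolded, sameIgnore])
```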
Reverse Strings
To reverse a string, use the reverse function (flip also works on character vectors). The result lists the characters from the last index number to the first.
>> reverse(phrase)
ans =
'god yzal eht revo spmuj xof nworb kciuq ehT'
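Reversal can also be written with plain indexing, which works in releases that do not have reverse. A minimal sketch, assuming a row character vector; `word`, `backwards`, and `isPalindrome` are illustrative names.

```matlab
word = 'level';

% Index from the last element down to the first
backwards = word(end:-1:1);

% flip gives the same result for a row character vector
sameAsFlip = strcmp(backwards, flip(word));

% A palindrome reads the same in both directions
isPalindrome = strcmp(word, backwards);

disp(backwards)
disp(isPalindrome)
```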
Replace Characters in Strings
To replace certain words in a string, use the replace function.
The syntax of the replace function is as follows: replace(stringName,oldword,newword)
>>% We don't want a brown fox; we want to change it to an orange fox
>> replace(phrase,'brown','orange')
ans =
'The quick orange fox jumps over the lazy dog'
At times you might want to replace multiple words in one go. To do so, declare the old and new words as cell arrays, and make sure the old and new words appear in the same order.
>>%declare a cell array of the old words that are going to be replaced
>> old={'fox','dog'}
old =
1×2 cell array
{'fox'} {'dog'}
>>%declare a cell array of the new words that will do the replacing
>> new={'cheetah','sloth'}
new =
1×2 cell array
{'cheetah'} {'sloth'}
>> % Replace the old words (fox) and (dog) with (cheetah) and (sloth). Make sure both lists are in the same order
>> replace(phrase,old,new)
ans =
'The quick brown cheetah jumps over the lazy sloth'
Strings as a Character Array
Strings in MATLAB are arrays of characters. To see this, execute the following code:
>> fstring = 'hello';
>> class(fstring)
ans = char
Because strings are arrays, many array manipulation functions work including: size, transpose, and so on. Strings may be indexed to access specific elements.
Performing arithmetic operations on character arrays converts them into doubles.
>> fstring2 = 'world';
>> fstring + fstring2
ans = 223 212 222 216 211
Each number is the sum of the ASCII codes of the two characters in the same position. The codes themselves can be obtained with the double function, which turns a character array into an array of doubles.
>> double(fstring)
ans = 104 101 108 108 111
The char function can turn an array of integer-valued doubles back into characters. Given a non-integer value, char discards the fractional part (it rounds down):
>> char(104)
ans = h
>> char(104.6)
ans = h
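Because characters behave as numbers under arithmetic, simple transformations can be written as arithmetic followed by char. The sketch below converts lowercase ASCII letters to uppercase by subtracting the fixed offset between the two cases; it is only an illustration (upper is the normal way to do this), it assumes the input contains lowercase letters only, and the names `offset` and `shouted` are illustrative.

```matlab
word = 'hello';

% Distance between a lowercase letter and its uppercase form in ASCII
offset = 'a' - 'A';

% Arithmetic yields doubles; char converts them back to characters
shifted = word - offset;
shouted = char(shifted);

disp(class(shifted))   % arithmetic result is numeric
disp(shouted)
```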
Special String Functions
Since MATLAB strings are character arrays, some special functions are available for working with entire strings rather than just their individual characters:
deblank
deblank removes trailing white space from the end of the string. Leading white space is left in place.
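The sketch below contrasts deblank with strtrim, which strips white space from both ends. The variable names are illustrative.

```matlab
padded = '   hello   ';

onlyTrailing = deblank(padded);   % removes white space at the end only
bothEnds     = strtrim(padded);   % removes white space at both ends

% Brackets make the remaining spaces visible
disp(['[', onlyTrailing, ']'])
disp(['[', bothEnds, ']'])
```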
findstr
findstr(bigstring, smallstring) looks for the shorter string inside the longer one and returns the starting index of every occurrence. If there is none, it returns []. MathWorks documentation marks findstr as not recommended in current releases and points to strfind instead.
strrep
strrep(string1, replaced, replacement) replaces all instances of replaced in string1 with replacement.
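A short sketch of searching and replacing on one string, using strfind (the search function covered later on this page) together with strrep. The variable names are illustrative.

```matlab
s = 'hello world';

% Starting index of every occurrence of 'o'
idx = strfind(s, 'o');

% No match gives an empty result
none = strfind(s, 'z');

% Replace every 'o' with the digit '0'
t = strrep(s, 'o', '0');

disp(idx)
disp(isempty(none))
disp(t)
```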
strcmp
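Strings, unlike numeric arrays, do not compare reliably with the relational operators. The == operator compares character arrays element by element and raises an error when their lengths differ. To compare whole strings, use the strcmp function as follows: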
>> string1 = 'a';
>> strcmp(string1, 'a')
ans = 1
>> strcmp(string1, 'A')
ans = 0
Note that MATLAB strings are case sensitive so that 'a' and 'A' are not the same. In addition the strcmp function does not discard whitespace:
>> strcmp(string1, ' a')
ans = 0
The strings must be exactly the same in every respect.
If the inputs are numeric arrays then the strcmp function will return 0 even if the values are the same. Thus it's only useful for strings. Use the == operator for numeric values.
>> strcmp(1,1)
ans = 0
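The difference between == and strcmp on character arrays can be seen in a short sketch. The try/catch is there only to record that the mismatched-length comparison raises an error; all variable names are illustrative.

```matlab
% == compares position by position and returns one logical per character
elementwise = ('abc' == 'abd');

% strcmp returns a single logical for the whole string
wholeMatch = strcmp('abc', 'abd');

% With different lengths, == raises an error, while strcmp just returns false
try
    tmp = ('abc' == 'abcd'); %#ok<NASGU>
    sizeMismatchError = false;
catch
    sizeMismatchError = true;
end
differentLengths = strcmp('abc', 'abcd');

disp(elementwise)
disp(wholeMatch)
disp(sizeMismatchError)
```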
num2str
num2str converts numbers to characters. It is useful together with the disp function when you want to control how many decimal places are displayed.
>>%Limit the display of the value of pi to 9 decimal places
>> num2str(pi,'%1.9f')
ans =
'3.141592654'
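A typical use is building a message for disp by concatenating text with a formatted number. A minimal sketch; `radius`, `area`, and `msg` are illustrative names.

```matlab
radius = 2.5;
area   = pi * radius^2;

% num2str with a format specifier limits the output to two decimal places
msg = ['Area: ', num2str(area, '%.2f'), ' square units'];

disp(msg)
```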
Displaying values of string variables
If all you want to do is display the value of a string, you can omit the semicolon as is standard in MATLAB.
If you want to display a string in the command window in combination with other text, one way is to use array notation combined with either the 'display' or the 'disp' function:
>> fstring = 'hello';
>> display( [ fstring 'world'] )
helloworld
MATLAB doesn't put the space in between the two strings. If you want one there you must put it in yourself.
This syntax is also used to concatenate two or more strings into one variable, which allows insertion of unusual characters into strings:
>> fstring = ['you' char(39) 're']
fstring = you're
Any other function that returns a string can also be used in the array.
You can also use the "strcat" function to concatenate strings. With two strings it does much the same as the bracket syntax above, but it is especially useful with a cell array of strings because it lets you concatenate the same thing to all of the strings at once. Be careful with white space: for character array inputs, strcat removes trailing white space from each input, so it cannot be used to append a space that way (cell array inputs keep their white space). Here's the syntax for the cell array use.
>> strCell = {'A', 'B'};
>> strcat(strCell, '_')
ans =
1×2 cell array
{'A_'} {'B_'}
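The white-space behaviour of strcat is easiest to see side by side with bracket concatenation. This sketch assumes the behaviour documented for MATLAB's strcat (trailing white space removed from character array inputs, kept for cell array inputs); the variable names are illustrative.

```matlab
% Character array inputs: strcat drops the trailing space of 'Hello '
viaStrcat = strcat('Hello ', 'world');

% Brackets keep every character
viaBrackets = ['Hello ', 'world'];

% Cell array input: the trailing space is kept and the result is a cell array
viaCell = strcat({'Hello '}, 'world');

disp(viaStrcat)
disp(viaBrackets)
disp(viaCell{1})
```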
Finally, although MATLAB doesn't have a printf function, you can do essentially the same thing by using 1 as your file identifier in the fprintf function (the file identifier can also be omitted, in which case fprintf prints to the screen). The format identifiers are essentially the same as they are in C.
>> X = 9.2;
>> fprintf(1, '%1.3f\n', X);
9.200
The "9.200" is printed to the screen. fprintf is nice compared to display because you don't have to call num2str on all of the numbers in a string. Just use the appropriate format identifier in the place you want it.
>> X = 9.2;
>> fprintf(1, 'The value of X is %1.3f meters per second \n', X);
The value of X is 9.200 meters per second
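When the formatted text is needed as a string rather than printed, sprintf accepts the same format identifiers and returns the result. A minimal sketch; `label` is an illustrative name.

```matlab
X = 9.2;

% sprintf formats like fprintf but returns the text instead of printing it
label = sprintf('The value of X is %1.3f meters per second', X);

disp(label)
```

The returned string can then be stored, concatenated, or passed to other functions.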
Cell arrays of strings
In many applications (particularly those where you are parsing text files, reading Excel sheets with text, etc.) you will encounter cell arrays of strings.
You can use the function "iscellstr" to tell if all of the elements in a given cell array are strings or not.
>> notStrCell = {'AA', []};
>> iscellstr(notStrCell)
ans = 0
This is useful since functions that work with cell arrays of strings will fail if provided with something that's not a cell array of strings. In particular, they all fail if any elements of the provided cell array are the empty array ( [] ) which is somewhat frustrating if the provided text file contains empty cells. You must catch this exception before calling cellstr manipulation functions.
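One way to handle empty cells before calling string functions is to replace each empty array with an empty string. This is a sketch of that idea rather than the only approach; `raw`, `isMissing`, and `cleaned` are illustrative names.

```matlab
% A cell array as it might come from a file with an empty cell
raw = {'AA', [], 'B'};

% Logical mask marking the empty cells
isMissing = cellfun(@isempty, raw);

% Replace every empty cell with an empty character array
cleaned = raw;
cleaned(isMissing) = {''};

okBefore = iscellstr(raw);       % false: [] is not a character array
okAfter  = iscellstr(cleaned);   % true: every element is now a character array

disp([okBefore, okAfter])
```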
Searching a cell array of strings can be done with the "strmatch", "strfind", and "regexp" functions. strmatch looks for strings within a cell array of strings whose first characters exactly match the string you pass to it, and returns the indices of all strings in the array for which it found a match. If you give it the 'exact' option, it returns only the indices of elements that are exactly the same as what you passed. (MathWorks documentation marks strmatch as not recommended in current releases.) For example:
>> strCell = {'Aa', 'AA'};
>> strmatch('A', strCell)
ans = 1, 2
>> strmatch('A', strCell, 'exact')
ans = []
>> strmatch('Aa', strCell, 'exact')
ans = 1
strfind looks for a specific string within a cell array of strings, but it tries to find it in any part of each string. For each element x of the given cell array of strings, it returns an empty array if no match is found in x, and the starting indices (remember, strings are arrays of characters) of all matches in x otherwise.
>> strCell = {'Aa', 'AA'};
>> strfind(strCell, 'A')
ans = % answer is a cell array with two elements (same size as strCell):
1 % Index of the beginning of string "A" in the first cell
1 2 % Index of each instance of the beginning of string "A" in the second cell
>> strfind(strCell, 'a')
ans =
2
[] % 'a' is not found
The "cellfun" / "isempty" combination is very useful for identifying cases where the string was or was not found. You can use the find function in combination with these two functions to return the index of all the cells in which the query string was found.
>> strCell = {'Aa', 'AA'};
>> idxCell = strfind(strCell, 'a');
>> isFound = ~cellfun('isempty', idxCell); % 0 where the cell of idxCell is empty and 1 otherwise
>> foundIdx = find(isFound)
foundIdx = 1
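In newer releases the same question ("which cells contain the query?") can be answered in one step with the contains function. The sketch assumes a release that provides contains and compares it with the cellfun/isempty combination; the variable names follow the example above.

```matlab
strCell = {'Aa', 'AA'};

% cellfun/isempty combination: true where strfind found a match
idxCell  = strfind(strCell, 'a');
isFound  = ~cellfun('isempty', idxCell);
foundIdx = find(isFound);

% contains returns the logical mask directly
isFoundDirect = contains(strCell, 'a');

disp(foundIdx)
disp(isequal(isFound, isFoundDirect))
```

Only the first cell, 'Aa', contains a lowercase 'a', so both approaches mark cell 1.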
strfind returns every match. To keep only the first or last match, index into the result, for example idx(1) or idx(end). See the documentation for details.
The regexp function works the same way as strfind, but instead of looking for strings literally, it tries to find matches within the cell array of strings using regular expressions. Regular expressions are a powerful way to match patterns within strings (not just specific strings within strings). Entire books have been written about regular expressions, so they cannot be covered in as much detail here. However, some good resources online include regular-expressions.info and the MATLAB documentation for the MATLAB-specific syntax. Note that MATLAB implements some, but not all, of the extended regular expressions available in other languages such as Perl.
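As a small illustration of regexp on a cell array and on a single string, the sketch below finds elements that start with "A" and pulls runs of digits out of a sentence. The sample data and variable names are made up for the example.

```matlab
strCell = {'Aa', 'AA', 'bA'};

% '^A' matches an 'A' only at the start of each string;
% the result is a cell array of start indices, empty where nothing matched
startIdx    = regexp(strCell, '^A');
startsWithA = ~cellfun('isempty', startIdx);

% '\d+' matches one or more digits; 'match' returns the matched text
digits = regexp('order 66 shipped 2024', '\d+', 'match');

disp(startsWithA)
disp(digits)
```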
Older versions of MATLAB lacked built-in functions for some string operations that are common in other languages, such as string splitting. Current releases include strsplit and strjoin for splitting and joining. For anything still missing, user-written functions can often be found with a web search.
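A minimal sketch of splitting and rejoining, assuming a release that provides strsplit and strjoin. The sample text and variable names are illustrative.

```matlab
csvLine = 'alpha,beta,gamma';

% Split on commas into a cell array of strings
parts = strsplit(csvLine, ',');

% Join the pieces again with a different separator
rejoined = strjoin(parts, ' | ');

disp(numel(parts))
disp(rejoined)
```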