Ada Programming/Libraries/GNAT.String Split
Introduction
Exploding a string into several components based on a set of separators can be done in many different ways. In this article we're going to focus on a solution involving the GNAT.String_Split package.
Caveat
If you use the following example in a program of your own, the result will be a less portable program. The GNAT packages are found only in the GPL and GCC GNAT compilers, so your program probably won't compile with other Ada compilers.
The Problem
You want to split a string into a set of individual components, such as
This is a string
into
This
is
a
string
And this is exactly what you can do with the GNAT.String_Split package.
The GNAT.String_Split Solution
Let's dive straight into the code necessary to solve our string split problem. Create a file named explode.adb and add this to it:
--  A procedure to illustrate the use of the GNAT.String_Split package. This
--  is just the simplest, most basic usage; the package can do a lot more, like
--  splitting on a char set, re-splitting the string with new separators, and
--  returning the separators found before and after each substring. Left as an
--  exercise for the reader. ;)
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with GNAT.String_Split;

procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;

   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of substrings";
   --  The input data would normally be read from some external source or
   --  whatever. Latin_1.HT is a horizontal tab.

   Subs : String_Split.Slice_Set;
   --  Subs is populated by the actual substrings.

   Seps : constant String := " " & Latin_1.HT;
   --  Just an arbitrary simple set of whitespace.
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
   --  Introduce our job.

   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
   --  Create the split, using Multiple mode to treat strings of multiple
   --  whitespace characters as a single separator.
   --  This populates the Subs object.

   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
   --  Report results, starting with the count of substrings created.

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Loop through the substrings.
      declare
         Sub : constant String := String_Split.Slice (Subs, I);
         --  Pull the next substring out into a string object for easy handling.
      begin
         Put_Line (String_Split.Slice_Number'Image (I) &
                   " -> " &
                   Sub &
                   " (length" & Positive'Image (Sub'Length) &
                   ")");
         --  Output the individual substrings, and their length.
      end;
   end loop;
end Explode;
You compile and execute the Explode program like this:
$ gnatmake explode.adb
$ ./explode
You should see output similar to this:
Splitting 'This becomes a bunch of substrings' at whitespace.
Got 6 substrings:
1 -> This (length 4)
2 -> becomes (length 7)
3 -> a (length 1)
4 -> bunch (length 5)
5 -> of (length 2)
6 -> substrings (length 10)
The comments in the example should more or less explain what's going on, but for the sake of clarity, we're going to do a step-by-step walk-through of the code, starting with the dependencies and use clauses:
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with GNAT.String_Split;

procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;
The three with lines list the packages on which our program depends. When the compiler encounters these, it retrieves those packages from its library. The "procedure Explode is" line marks the start of our program, specifically the declarative part, where we declare and initialize our constants and variables. It also names our program Explode. Note the use clauses. Adding these enables us to do this:
Put_Line ("Some text");
instead of this
Ada.Text_IO.Put_Line ("Some text");
in the program. Very handy.
As an exercise, try commenting out the three use clauses and prefixing the full package names to all types and subprograms in the program.
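To give an idea of what that exercise leads to, here is a minimal sketch of a split written without any use clauses. The procedure name Explode_Qualified and the input "one two three" are made up for illustration; only the GNAT.String_Split and Ada.Text_IO calls already used in this article appear in it.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Explode_Qualified is
   --  Hypothetical example: no use clauses, so every name is spelled out.
   Subs : GNAT.String_Split.Slice_Set;
begin
   GNAT.String_Split.Create (S          => Subs,
                             From       => "one two three",
                             Separators => " ",
                             Mode       => GNAT.String_Split.Multiple);

   --  Print each slice on its own line.
   for I in 1 .. GNAT.String_Split.Slice_Count (Subs) loop
      Ada.Text_IO.Put_Line (GNAT.String_Split.Slice (Subs, I));
   end loop;
end Explode_Qualified;
```

The code behaves the same as a version with use clauses; the trade-off is purely between brevity and seeing at a glance where each name comes from.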
Next up we have this:
   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of substrings";
This is the String we're going to split into individual components. Latin_1.HT is a constant declared in Ada.Characters.Latin_1. It is the horizontal tab character, which we concatenate into the string. Since we don't change the value of Data throughout the program, we've declared it as a constant.
Subs : String_Split.Slice_Set;
The Subs variable is the container for the individual components, or "slices".
Seps : constant String := " " & Latin_1.HT;
These are our separators. In this case we want to split the string on spaces (" ") and horizontal tabs (Latin_1.HT). Note that the separators are NOT included as part of the resulting Slice_Set. Try experimenting with different separators.
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
begin marks the beginning of the body of our program. Immediately after begin we output a short message.
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
This is the meat of the program. In this one statement the Data String is split into individual slices based on the Seps separators, and the resulting slices are placed in the Subs Slice_Set. Note the Mode => String_Split.Multiple parameter. When using Multiple mode, String_Split.Create will treat a run of consecutive separator characters (here spaces and horizontal tabs) as one separator.
As an exercise, try changing Multiple to Single and see what happens.
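If you want to compare the two modes side by side, the following sketch may help. The procedure Compare_Modes, the helper function Count and the input "a  b" (with two consecutive spaces) are all hypothetical; the sketch assumes the Separator_Mode type is reachable as String_Split.Separator_Mode, which is where the Single and Multiple literals used above come from.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Compare_Modes is
   use GNAT;

   Data : constant String := "a  b";
   --  Hypothetical input: two consecutive spaces between "a" and "b".

   --  Hypothetical helper: split Data in the given mode and return the
   --  number of slices that Create produced.
   function Count (Mode : String_Split.Separator_Mode) return Natural is
      Subs : String_Split.Slice_Set;
   begin
      String_Split.Create (S          => Subs,
                           From       => Data,
                           Separators => " ",
                           Mode       => Mode);
      return Natural (String_Split.Slice_Count (Subs));
   end Count;
begin
   Ada.Text_IO.Put_Line
     ("Multiple:" & Natural'Image (Count (String_Split.Multiple)) & " slices");
   Ada.Text_IO.Put_Line
     ("Single:" & Natural'Image (Count (String_Split.Single)) & " slices");
end Compare_Modes;
```

In Single mode every separator character cuts the string, so two adjacent separators should yield an empty slice between them and the Single count should come out one higher than the Multiple count for this input.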
   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
This is the line that's responsible for the output:
Got 6 substrings:
Yes, it looks like an awfully long line for very little output, but there's method to the madness:
String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs))
That line is responsible for the "6" part of the output. What it does is transform the numeric value 6 into the String value " 6", and it does so using the Image attribute. String_Split.Slice_Count (Subs) returns a Slice_Number, which is basically just an Integer with a value >= 0, and Image then converts this to a String suitable for output. Image puts a leading space in front of non-negative values, which is why there is no trailing space after "Got".
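The behaviour of Image can be seen in isolation with a small sketch. The procedure name Image_Demo and the constant Count are made up for illustration, with Count standing in for the value returned by Slice_Count; Ada.Strings.Fixed.Trim is the standard library function for removing the leading space when it is not wanted.

```ada
with Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Image_Demo is
   Count : constant Natural := 6;
   --  Hypothetical stand-in for the value returned by Slice_Count.

   Raw     : constant String := Natural'Image (Count);
   --  Image of a non-negative number starts with a space.

   Trimmed : constant String :=
     Ada.Strings.Fixed.Trim (Raw, Ada.Strings.Left);
   --  Trim removes that leading space.
begin
   Ada.Text_IO.Put_Line ("[" & Raw & "]");      --  prints "[ 6]"
   Ada.Text_IO.Put_Line ("[" & Trimmed & "]");  --  prints "[6]"
end Image_Demo;
```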
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Loop through the substrings.
      declare
         Sub : constant String := String_Split.Slice (Subs, I);
         --  Pull the next substring out into a string object for easy handling.
      begin
         Put_Line (String_Split.Slice_Number'Image (I) &
                   " -> " &
                   Sub &
                   " (length" & Positive'Image (Sub'Length) &
                   ")");
         --  Output the individual substrings, and their length.
      end;
   end loop;
Here we start a loop that repeats String_Split.Slice_Count (Subs) times, which in our case is 6. So on the first iteration I is 1 and on the final iteration I is 6. Inside the loop we declare a new block. This enables us to locally initialize the Sub constant, which on each iteration is initialized anew with the next slice from our split. This is done using the String_Split.Slice function, which takes our Subs Slice_Set and the I loop counter as parameters and returns a String. In the body of the block we output each slice, along with its index in the Subs Slice_Set and its length. As you can see, we once again make use of the Image attribute to convert numeric values to Strings.
You can get rid of the block inside the loop like this:
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Loop through the substrings.
      Put_Line
        (String_Split.Slice_Number'Image (I) &
         " -> " &
         String_Split.Slice (Subs, I) &
         " (length" & Positive'Image (String_Split.Slice (Subs, I)'Length) &
         ")");
      --  Output the individual substrings, and their length.
   end loop;
As you can see, we're no longer using the Sub constant. Instead we call String_Split.Slice (Subs, I) directly. It works just the same, but it is perhaps a bit less readable.
Another option is to use an Ada.Strings.Unbounded.Unbounded_String. You can see a possible solution here:
foobar.adb
with Ada.Characters.Latin_1;
with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;
with GNAT.String_Split;

procedure Foobar is
   use Ada.Characters;
   use Ada.Strings.Unbounded;
   use Ada.Text_IO;
   use Ada.Text_IO.Unbounded_IO;
   use GNAT;

   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of substrings";
   --  The input data would normally be read from some external source or
   --  whatever. Latin_1.HT is a horizontal tab.

   Subs : String_Split.Slice_Set;
   --  Subs is populated by the actual substrings.

   Seps : constant String := " " & Latin_1.HT;
   --  Just an arbitrary simple set of whitespace.

   Sub : Unbounded_String;
   --  Object to hold a slice.
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");
   --  Introduce our job.

   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);
   --  Create the split, using Multiple mode to treat strings of multiple
   --  whitespace characters as a single separator.
   --  This populates the Subs object.

   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");
   --  Report results, starting with the count of substrings created.

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Loop through the substrings.
      --  Note that we've avoided the block from the first example. This is
      --  possible because our Sub variable is now an Unbounded_String, which
      --  does not have to be declared with an initial length.
      Sub := To_Unbounded_String (String_Split.Slice (Subs, I));
      --  Pull the next substring out into an Unbounded_String object for
      --  easy handling. String_Split.Slice returns a String, which we convert
      --  to an Unbounded_String using the aptly named To_Unbounded_String
      --  function.
      Put (String_Split.Slice_Number'Image (I));
      Put (" -> ");
      Put (Sub);
      Put (" (length" & Positive'Image (Length (Sub)) & ")");
      New_Line;
   end loop;
end Foobar;
Finally we have:
end Explode;
This simply ends the program.
And with that, we've concluded this small tutorial on how to split a string into individual parts (slices) based on a set of separators. I hope you enjoyed reading it as much as I enjoyed writing it.
See also
Wikibook
External examples
- Search for examples of GNAT.String_Split in: Rosetta Code, GitHub (gists), any Alire crate or this Wikibook.
- Search for posts related to GNAT.String_Split in: Stack Overflow, comp.lang.ada or any Ada related page.