Perl Programming/Functions

A Perl function is a named group of statements that can be run repeatedly from different places in a program. Functions are a key organizing component in all but the smallest of programs.

Introduction

So far we have written just a few lines of Perl at a time. Our example programs have started at the top of a file and proceeded to the bottom of the file, with a little jumping around using control-flow keywords like if, else and while. In many instances, though, it is useful to add another layer of organization on our programs.

For example, when something goes wrong in a typical program, it prints an "error message". The code to print an error message might look like

print STDOUT "something went wrong!\n";

Other messages might look slightly different, like

print STDOUT "something ELSE went wrong!\n";

We could pepper our code with hundreds of lines just like this, and things would work just fine…for a while. But sooner or later, we'd want to separate out error messages from "status messages" that report harmless information about our program. To do this, we could prefix all of our error messages with the word "ERROR" and all of the status messages with "STATUS". The code for a typical error message would change to

print STDOUT "ERROR: something went wrong!\n";

The problem is, with hundreds of error messages, it would be a hassle to go change them all. This is where subroutines can help out.

Wikipedia defines a subroutine as "a sequence of instructions that perform a specific task, as part of a larger program. Subroutines can be called from different locations in a program, thus allowing programs to access the subroutine repeatedly without the subroutine's code having been written more than once."

All that gobbledygook means we can wrap up our error message code in one place, as follows:

sub print_error_message {
  my($message) = @_;
  print STDOUT "ERROR: " . $message . "\n";
}

Whenever something goes wrong in our program, we can activate, or call, this subroutine, with whatever message we prefer:

print_error_message("something bad happened");

print_error_message("something really horrible happened");

print_error_message("something sort of annoying happened");

And see messages such as

ERROR: something bad happened

If we need to change the formatting of our error message, say, to include a few exclamation points, it's simple enough to change the subroutine:

sub print_error_message {
  my($message) = @_;
  print STDOUT "ERROR: " . $message . "!!!\n";
}

This has been an admittedly simple example, and subroutines have a few other advantages, but that's it in a nutshell. Type it in one place, fix it in one place, change it in one place.

Now for a little more detail about subroutines. Much of what follows will use the following subroutine, which, if it isn't obvious already, adds two numbers together, and returns their sum.

sub add_two_numbers {
  my($x, $y) = @_;
  my $sum = $x + $y;
  return $sum;
}

Parts of a subroutine

Name

sub add_two_numbers {
  my($x, $y) = @_;
  my $sum = $x + $y;
  return $sum;
}

The first line of a function begins with the keyword sub followed by the function's name. Any name made up of letters, digits and underscores that does not begin with a digit and is not a reserved Perl word (such as for, while, if and else) is a valid function name. Subroutines with names that describe what the subroutine does make for easier-to-read programs.

Prototype (rarely used)

sub add_two_numbers($$) {

The optional ($$) specifies how many arguments this subroutine expects. ($$) says "this function requires two scalar values". Perl prototypes are NOT what most people with experience with other languages expect them to be: instead, the prototypes alter the context of the parameters to the subroutine.

It is possible to disable a prototype by calling the function with a leading &, but this is not recommended.

Prototypes are seldom used, as it's much easier to use the normal method of passing parameters.
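To make the point about context concrete, here is a small sketch. The subroutine names first_plain and first_proto are made up for this illustration. Both return their first argument, but the ($) prototype forces the array into scalar context, so the prototyped version receives the number of elements rather than the first element. Calling it with a leading & bypasses the prototype again.

```perl
use strict;
use warnings;

# Hypothetical helpers: both return the first argument they receive.
sub first_plain {
    my ($first) = @_;
    return $first;
}

sub first_proto($) {
    my ($first) = @_;
    return $first;
}

my @items = (10, 20, 30);

my $plain  = first_plain(@items);    # array is flattened: first argument is 10
my $proto  = first_proto(@items);    # ($) imposes scalar context: argument is 3
my $bypass = &first_proto(@items);   # & ignores the prototype: first argument is 10

print "$plain $proto $bypass\n";     # prints "10 3 10"
```

This surprising difference is one reason prototypes are usually left out of everyday code.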

Body

The body of a subroutine does the "work", and consists of three primary sections.

Reading arguments

The pieces of information handed to a subroutine are called arguments or actual parameters. For instance, in

add_two_numbers(3, 4);

3 and 4 are the arguments to the subroutine add_two_numbers.

Perl passes arguments to a subroutine in an array represented by @_. Usually it is more convenient to give meaningful names to these arguments, so the first line of a function often looks like

sub add_two_numbers {
  my($x, $y) = @_; # reading parameters
  my $sum = $x + $y;
  return $sum;
}

that puts the contents of @_ into two variables named $x and $y. $x and $y are called formal parameters. The distinction between formal parameters and actual parameters (arguments) is subtle and for the most part unimportant. It is described somewhat in the Wikipedia article Parameter (computer science). Be careful not to confuse the special variable $_ with @_, the array of arguments passed to a function.
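As a rough sketch, the same two arguments can be read from @_ in several ways. The three subroutine names below are invented for this comparison. The last one also shows that $_[0] is an element of the array @_ and has nothing to do with the scalar $_.

```perl
use strict;
use warnings;

# Read both arguments in one list assignment (the style used in this chapter).
sub add_with_list_assignment {
    my ($x, $y) = @_;
    return $x + $y;
}

# Remove arguments from the front of @_ one at a time.
sub add_with_shift {
    my $x = shift;    # inside a subroutine, shift works on @_ by default
    my $y = shift;
    return $x + $y;
}

# Index into @_ directly; $_[0] is the first element of @_, not $_.
sub add_with_indexing {
    return $_[0] + $_[1];
}

print add_with_list_assignment(3, 4), "\n";   # prints 7
print add_with_shift(3, 4), "\n";             # prints 7
print add_with_indexing(3, 4), "\n";          # prints 7
```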

Some subroutines don't require any arguments, for example

sub hello_world {
  print STDOUT "Hello World!\n";
}

will print "Hello World!" to STDOUT. This subroutine doesn't need any extra information about how to do its job and accordingly doesn't need any arguments.

Most modern programming languages save programmers the trouble of explicitly breaking the argument array into variables. Traditionally, Perl does not (recent versions of Perl offer optional subroutine signatures, which are not covered here). On the other hand, this makes writing subroutines with a variable number of arguments very easy.
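For instance, a subroutine that accepts any number of arguments only has to loop over @_. The name sum_all is a hypothetical helper used for illustration here:

```perl
use strict;
use warnings;

# Add up however many numbers the caller passes in.
sub sum_all {
    my @numbers = @_;    # copy every argument, however many there are
    my $total = 0;
    foreach my $number (@numbers) {
        $total += $number;
    }
    return $total;
}

print sum_all(3, 4), "\n";            # prints 7
print sum_all(1, 2, 3, 4, 5), "\n";   # prints 15
print sum_all(), "\n";                # prints 0
```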

In a programming context, parameter means almost the same thing as argument (see parameter for details). The two are often used interchangeably with no loss of understanding.

Important note: global and local variables

Unlike programming languages such as C or Java, all variables created or used within Perl subroutines are, by default, global variables. This means that any piece of your program outside of your subroutine may modify these variables, and that your subroutine may be, unknowingly, modifying variables that it has no business modifying. In small programs this is often convenient, but as programs get longer this often leads to complexity and is considered poor practice.

The best way to avoid this trap is to place the keyword my in front of all your variables the first time they appear. This tells Perl that you only want these variables to be available inside the nearest enclosing group of curly braces. In effect, these local variables act as a "scratch space" for use within your subroutines that disappears when the subroutine returns. The line use strict; at the top of your program will instruct Perl to force you to use my in front of your variables, to prevent you from accidentally creating global variables.

An alternative to my that you may see in some older Perl programs is the local keyword. local is somewhat similar to my, but is more complicated to deal with. It's better to stick with my in your own programs.
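The difference can be seen in a short sketch. The names $counter, bump_global and bump_private are invented for this example. Because use strict is switched on, the global variable has to be declared explicitly with our; without strict, simply using the name would create it.

```perl
use strict;
use warnings;

our $counter = 0;    # a global (package) variable, visible to the whole program

sub bump_global {
    $counter = $counter + 1;    # modifies the global $counter
    return $counter;
}

sub bump_private {
    my $counter = 100;          # a new local variable that hides the global one
    $counter = $counter + 1;    # only the local copy changes
    return $counter;            # the local copy disappears after this
}

bump_global();
bump_private();
print "$counter\n";    # prints 1: only bump_global touched the global
```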

"Scope" describes whether a variable is local or global, and a couple of other complexities. See scope for a technical discussion.

The interesting part of a subroutine

In the very middle of the subroutine you're likely to find the more interesting "guts" of it all. In our add_two_numbers subroutine, this is the part that actually does the adding:

sub add_two_numbers {
  my($x, $y) = @_;
  my $sum = $x + $y; # the interesting part
  return $sum;
}

In this middle section, you can do just about anything your heart desires: arithmetic, printing to files, or even calling other subroutines.

The return statement

Finally, some subroutines "return" some piece of information, called the "return value", using the return keyword.

sub add_two_numbers {
  my($x, $y) = @_;
  my $sum = $x + $y;
  return $sum; # the return statement
}

For example

$sum = add_two_numbers(4, 5);

will set $sum to 9 (the sum of 4 and 5).

return can also be used without any return value as a shortcut to leave a subroutine before getting to the closing }.
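A minimal sketch of such an early exit follows. The subroutine safe_divide is a hypothetical example. When the divisor is zero, it leaves immediately with a bare return, and the caller receives no value (undef).

```perl
use strict;
use warnings;

# Divide two numbers, but bail out early if the division is impossible.
sub safe_divide {
    my ($numerator, $denominator) = @_;
    return if $denominator == 0;    # bare return: leave now, with no value
    return $numerator / $denominator;
}

my $good = safe_divide(6, 3);
my $bad  = safe_divide(6, 0);

print "$good\n";                                                    # prints 2
print defined($bad) ? "got a value\n" : "no value returned\n";      # prints "no value returned"
```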

Invoking subroutines

Subroutines may be declared anywhere within a Perl program. They may be invoked like so:

add_two_numbers(4, 5); # the safest approach

add_two_numbers 4, 5; # only if predeclared

&add_two_numbers(4, 5); # older Perl syntax, but still valid

If no "&" prefix is used, parentheses are required unless the subroutine has been predeclared.
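"Predeclared" means that Perl has already seen the subroutine's name before reaching the call. One way to arrange that, sketched here, is a forward declaration: a sub line with a name but no body. The variable names are made up for this example.

```perl
use strict;
use warnings;

sub add_two_numbers;    # forward declaration: the body comes later

my $with_parens    = add_two_numbers(4, 5);     # always works
my $without_parens = add_two_numbers 4, 5;      # works because of the declaration above
my $with_ampersand = &add_two_numbers(4, 5);    # older style, still valid

print "$with_parens $without_parens $with_ampersand\n";    # prints "9 9 9"

sub add_two_numbers {
    my ($x, $y) = @_;
    return $x + $y;
}
```

Without the forward declaration, the line that omits the parentheses would fail to compile under use strict, which is why the parenthesized form is the safest habit.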

Functions calling functions

On their own, functions provide a major stepping stone towards good code, but combining functions together really unleashes their power.

As you might expect, calling a function from within another function doesn't look any different from calling a function from the part of your program that is sitting outside any curly-braces.

This function adds two numbers, then multiplies their sum by 3. Bear with us on the uselessness of these functions. As you build your own programs to solve your unique problems, you'll see their usefulness immediately.

sub add_two_numbers_and_mult_by_three {
  my($x, $y) = @_;    # read parameters
  my $sum = add_two_numbers($x, $y);   # add x and y, put result in sum
  my $sum_times_three = $sum*3;     # multiply by three
  return $sum_times_three;     # return result
}

The line

my $sum = add_two_numbers($x, $y);   # add x and y, put result in sum

calls our function add_two_numbers and puts the result into our $sum variable. Easy stuff, huh?

In this function, we've actually written much more code than we need. It could be pared down to something much smaller, but equally readable:

sub add_two_numbers_and_mult_by_three {
  my($x, $y) = @_;    # read parameters
  return 3*add_two_numbers($x, $y);   # add x and y, mult by 3, return
}

Functions calling themselves — recursion

We've seen functions calling other functions, but one neat concept in programming is when functions call themselves. This is called recursion. At first it seems like this might cause a so-called infinite loop, but it's really quite standard programming.

In math, the factorial function multiplies a positive integer by each positive integer less than itself. For example, "5 factorial" (usually written 5!) is calculated by multiplying 5 times 4 times 3 times 2 times 1. Of course, the 1 doesn't change the result. The factorial function is useful in calculating things like the number of different possible ways to seat your relatives at the dinner table.

Factorial makes a natural example for recursion, though it can be written just as easily with a while loop.

sub factorial {
  my($num) = @_;
  if ($num <= 1) {
    return 1;   # stop at 1 (or below): there is nothing left to multiply
  } else {
    return $num*factorial($num - 1);   # call factorial function recursively
  }
}

The self-referential line here is

return $num*factorial($num - 1);   # call factorial function recursively

which calls the factorial function from within the factorial function. This would go on forever, but we have a sort of stop-sign that prevents it:

if ($num <= 1) {
  return 1;   # stop at 1 (or below): there is nothing left to multiply
}

This stops the sequence of calls to factorial and prevents an infinite loop. Testing for <= 1 rather than == 1 means that a call such as factorial(0) also stops instead of recursing forever.

Written in a sort of longhand

 factorial(5)
 = 5*factorial(4)
 = 5*4*factorial(3)
 = 5*4*3*factorial(2)
 = 5*4*3*2*factorial(1)
 = 5*4*3*2*1
 = 120
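For comparison, here is a sketch of the while-loop version mentioned earlier. The name factorial_loop is made up to keep it apart from the recursive factorial. It performs the same multiplications, but counts down in a loop instead of calling itself.

```perl
use strict;
use warnings;

# Compute a factorial with a loop instead of recursion.
sub factorial_loop {
    my ($num) = @_;
    my $result = 1;
    while ($num > 1) {
        $result = $result * $num;    # multiply in the current number
        $num = $num - 1;             # then move down to the next one
    }
    return $result;
}

print factorial_loop(5), "\n";    # prints 120
```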

We've just barely touched on recursion. For some programming problems, it is a very natural solution. For others, it's a little… unnatural. Suffice it to say, it's a tool every programmer should carry in his or her belt.

Functions vs procedures vs subroutines

While reading programming literature and talking to programmers, you might run across the three terms "functions", "procedures" and "subroutines". Most of the time these are used interchangeably, but for the purists out there:

A function always returns a value, and, given the same arguments, always returns the same value. Functions are a lot like the functions you may have used in math class.

A procedure, unlike a function, may return no value at all. Unlike a function, a procedure often interacts with its external environment beyond its argument array. For instance, a procedure may read and write to files.

A subroutine refers to a sequence of instructions that may be either a function or a procedure.
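In Perl, all three are written with sub, so the difference lies only in how the code behaves. As an illustration with invented names, square below is a function in the purist's sense, while print_status_message is a procedure:

```perl
use strict;
use warnings;

# A function: same argument in, same value out, and nothing else happens.
sub square {
    my ($number) = @_;
    return $number * $number;
}

# A procedure: it acts on the outside world (here, STDOUT) and returns no value.
sub print_status_message {
    my ($message) = @_;
    print STDOUT "STATUS: " . $message . "\n";
    return;
}

print square(4), "\n";                        # prints 16
print_status_message("everything is fine");   # prints "STATUS: everything is fine"
```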

The official grammar

The syntax for defining a subroutine is:

sub NAME PROTOTYPE ATTRIBUTES BLOCK

If this makes any sense to you, you probably don't need to be reading this book ;)

