GLPK/Unix Batch Execution

These scripts demonstrate running batch jobs on the Unix command line. For the sake of generality, the demonstration is divided into two parts: the first creates a set of problem instances, and the second runs them in a set of batch queues.

Part one consists of four scripts: tst1, tst2a, tst2b and tst3. tst1 is an awk script which creates a set of points on a line with random errors in both the X and Y values. tst2a and tst2b wrap that data in the statements required for a MathProg data file; tst2b additionally reorders the points at random. tst3 calls tst1 and either tst2a or tst2b a number of times, creating a set of data files in MathProg format.

These examples are from a Solaris system, where the awk(1) described in The AWK Programming Language by Aho, Weinberger and Kernighan is called nawk. On other systems this is simply awk, and the prior implementation is called oawk, for old awk. Many Linux systems use the GNU implementation, gawk. Minor editing of the references to nawk may therefore be required.

All of these scripts will work, with suitable minor changes, on Solaris, Linux, FreeBSD, OpenBSD and Mac OS X, and also on Windows with the addition of a Unix environment such as Cygwin.
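One hedged way to adapt the scripts to a system without nawk is to rewrite the interpreter line with sed; the path /usr/bin/gawk below is an assumption, so check yours with `command -v gawk` first. The here document stands in for one of the scripts:

```shell
# Rewrite an interpreter line from nawk to gawk.
# /usr/bin/gawk is an assumed path -- verify with: command -v gawk
sed '1s|^#!/bin/nawk|#!/usr/bin/gawk|' <<'EOF'
#!/bin/nawk -f
BEGIN{ print "hello" }
EOF
```

Applied to a real file the same substitution would be, for example, `sed '1s|^#!/bin/nawk|#!/usr/bin/gawk|' tst1 > tst1.new`.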

#!/bin/nawk -f

#-----------------------------------------------------------------------
#  tst1
#
# generate the specified number of points on a line with random errors
# in the X & Y values
#-----------------------------------------------------------------------

BEGIN{

   srand();

   n = ARGV[1];
   ARGV[1] = "";

   for( i=0; i<n; i++ ){

      printf( "%d " ,i+1  );
      
      printf( "%f " ,i/3.0+rand() );
      printf( "%f " ,i/7.0+rand() );

      printf( "\n" );

   }
}
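A quick sanity check of tst1, run here inline with plain awk rather than nawk, shows the expected output shape: one line per point, with three fields (index, noisy x, noisy y):

```shell
# Same program as tst1, run inline: five numbered points with
# random errors added to the x and y values.
awk 'BEGIN{
   srand();
   n = ARGV[1]; ARGV[1] = "";
   for (i = 0; i < n; i++)
      printf("%d %f %f\n", i + 1, i/3.0 + rand(), i/7.0 + rand());
}' 5
```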
#!/bin/sh
#--------------------------------------------------------------
# tst2a
#
# This script uses Bourne shell here files to add the MathProg
# statements.
#--------------------------------------------------------------

# create the first part of the MathProg data file statements

cat <<EOF
data;

param : I :   x    y :=

EOF

# pass stdin through to stdout

cat 

# add the last part of the MathProg data file

cat <<EOF
;
end;
EOF
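Piping a couple of sample data lines through the same pair of here documents shows what tst2a produces:

```shell
# Wrap two sample points in MathProg data statements, exactly as
# tst2a does: header here document, pass-through cat, trailer.
printf '1 0.50 0.30\n2 0.90 0.60\n' | {
   cat <<EOF
data;

param : I :   x    y :=

EOF
   cat
   cat <<EOF
;
end;
EOF
}
```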

#!/bin/sh
#--------------------------------------------------------------
# tst2b
#
# This script uses Bourne shell here files to add the MathProg
# statements.  It then uses the Unix utilities, awk and sort to
# randomly reorder the points.
#--------------------------------------------------------------

# write out the first part of the MathProg data file

cat <<EOF
data;

param : I :   x    y :=

EOF

# randomly reorder the input

nawk 'BEGIN{srand();}{ print $0 ,rand()}' ${1}  \
| sort -k 4n | nawk '{print NR ,$2, $3}'

# write out the last part of the MathProg data file

cat <<EOF
;
end;
EOF
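The shuffle in tst2b works by appending a random key as a fourth field, sorting numerically on that key, and renumbering with NR. A three-line example (with awk standing in for nawk) makes the mechanics visible:

```shell
# Append a random sort key, order by it, then renumber with NR.
# The line count and the (x, y) pairs survive; only the order changes.
printf '1 a b\n2 c d\n3 e f\n' \
   | awk 'BEGIN{srand();}{ print $0, rand() }' \
   | sort -k 4n \
   | awk '{ print NR, $2, $3 }'
```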
#!/bin/sh

#-----------------------------------------------------------------------
# tst3
#
# create a set of data files in MathProg format using tst1 to generate 
# the data and either tst2a or tst2b to add the required MathProg 
# statements.
#
# ./tst1 generates points on a line with random errors in the X & Y values
# ./tst2a just sets up the MathProg statements
# ./tst2b also randomly reorders the data using Unix command line 
# utilities.  
#
# tst3 takes two arguments specifying the number of points on the line 
# and the number of data files to create.  The sleep following tst2b 
# is to ensure that a new seed is used for each instance.
#-----------------------------------------------------------------------
 
if [ ${#} -ne 2 ]
   then

   echo "usage:"
   echo "./tst3 <npoints> <ninstances>"
   exit 1
fi

J=1

while [ ${J} -le ${2} ]
   do

   #./tst1 ${1}  | ./tst2a >${J}.dat
   ./tst1 ${1}  | ./tst2b >${J}.dat; sleep 1;

   J=`expr ${J} + 1`

done
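tst3 counts with the external expr(1) utility, which works in any Bourne shell. On a POSIX shell the same loop can use built-in arithmetic expansion instead, avoiding one process launch per iteration:

```shell
# The tst3 counting loop using POSIX $((...)) arithmetic in place
# of expr; run here for 3 iterations.
J=1
while [ ${J} -le 3 ]; do
   echo "instance ${J}"
   J=$((J + 1))
done
```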

tst4 gets a list of jobs based on the file extension .dat, splits the list into 4 groups and runs them in background queues. There are many options for implementing batch queues on Unix; this is a minimalist example suited to the needs of an individual researcher using a single multicore workstation. Should you need to run a large number of long-running jobs on a cluster, consider a more full-featured queueing system.

All of the console output from glpsol is directed to log files for each queue named Q0.log, Q1.log, etc.

#!/bin/sh 

#-----------------------------------------------------------------------
# tst4
#
# This script runs a collection of jobs in a set of 4 parallel queues.
#
# This can be extended to as many cores in a multicore processor as
# you wish.  If you plan to run a very large number of jobs that will
# require significant time to complete it is suggested that you use 
# N-1 queues where N is the number of cores in your system.  This 
# will ensure that you have a core free for interactive use.
#
# The jobs are identified by the extension .dat, however, any naming
# will work.
#
# White space matters.  In particular, the "\" must be followed by
# a newline (aka linefeed).
#-----------------------------------------------------------------------

# get a list of all jobs in a temporary file

/bin/ls *.dat >/tmp/tmp.$$

# break the list into 4 sublists

Q0=`nawk 'NR%4==0' /tmp/tmp.$$`
Q1=`nawk 'NR%4==1' /tmp/tmp.$$`
Q2=`nawk 'NR%4==2' /tmp/tmp.$$`
Q3=`nawk 'NR%4==3' /tmp/tmp.$$`

# remove the temporary file

rm /tmp/tmp.$$

# fire off the queues by putting shell loops into the background

(for I in ${Q0};                                       \
   do                                                  \
   glpsol -m tst.mod -d ${I} -o ${I}_log -y ${I}_out ; \
done ) >Q0.log 2>&1 &


(for I in ${Q1};                                       \
   do                                                  \
   glpsol -m tst.mod -d ${I} -o ${I}_log -y ${I}_out ; \
done ) >Q1.log 2>&1 &


(for I in ${Q2};                                       \
   do                                                  \
   glpsol -m tst.mod -d ${I} -o ${I}_log -y ${I}_out ; \
done ) >Q2.log 2>&1 &


(for I in ${Q3};                                       \
   do                                                  \
   glpsol -m tst.mod -d ${I} -o ${I}_log -y ${I}_out ; \
done ) >Q3.log 2>&1 &
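The NR%4 selections in tst4 deal the job list round-robin into four disjoint sublists; a six-job example (with awk standing in for nawk) shows the dealing:

```shell
# Deal six job names into queues by line number modulo 4.
jobs='1.dat
2.dat
3.dat
4.dat
5.dat
6.dat'
Q1=`printf '%s\n' "$jobs" | awk 'NR%4==1'`
Q2=`printf '%s\n' "$jobs" | awk 'NR%4==2'`
echo $Q1    # prints: 1.dat 5.dat
echo $Q2    # prints: 2.dat 6.dat
```

If the invoking shell should block until all four queues drain, a plain `wait` after the last background loop will wait for every background job.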

The MathProg model is a minor modification of cf12a.mod from the examples directory of the distribution.

# set of points

set I;

# independent variable

param x {i in I};

# dependent variable

param y {i in I};

# define equation variables

var a;

var b;

var u {i in I}, >= 0;

var v {i in I}, >= 0;

# define objective function

minimize error: sum {i in I} u[i] + sum {i in I} v[i];

# define equation constraint

s.t. equation {i in I} : b * x[i] + a + u[i] - v[i] = y[i];

solve;

printf "y = %.4fx + %.4f\n", b, a;

end;
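The model is least-absolute-deviations (L1) regression: u[i] and v[i] are the positive and negative parts of the residual y[i] - (b*x[i] + a), so minimizing their sum minimizes the total absolute error. A tiny numeric check with awk, for the hypothetical fitted line y = 2x + 1 and three sample points, one of them an outlier:

```shell
# Sum of absolute residuals for y = 2x + 1 over the points
# (0,1), (1,3), (2,9): residuals 0, 0 and 4.
printf '1 0 1\n2 1 3\n3 2 9\n' \
   | awk '{ r = $3 - (2*$2 + 1); r = (r < 0 ? -r : r); s += r }
          END { printf("total absolute error = %g\n", s) }'
# prints: total absolute error = 4
```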