Shell Programming/Optimization
If performance is a major concern, you should use a compiled language, such as C, or at least a higher-performance interpreted language, such as Python, rather than a shell script. Further, performance is generally dominated by the external commands that are called (and the overhead of calling them) and by the shell itself, rather than by a particular script or its logic. Thus the primary methods of optimization are to call fewer or cheaper external commands, and to use a more optimized shell – typically a simpler one designed for scripting, such as ash or dash, rather than a complex interactive shell such as bash.
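To gauge how much the choice of shell itself contributes, a simple check is to time the same script under each shell (this assumes both shells are installed and that the script avoids bash-specific constructs):

time bash ./foo
time dash ./foo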
However, some finer optimization is possible, and profiling can be used to identify whether a bottleneck is caused by a particular external command or by the script itself.
Profiling
Most simply, run the script in trace mode (in bash, use the -x flag or put set -x at the top of the script) and see where execution visibly hangs – this is a very quick way to identify bottlenecks.
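For example, with tracing enabled each command is printed before it executes, so a long pause after a printed line points at the slow command (sleep here is a hypothetical stand-in for a slow step):

#!/bin/bash
set -x        # print each command before executing it
sleep 2       # stand-in for a slow external command
echo done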
Next, use time to measure the time used by the script or any particular command. This may be a shell built-in, as in bash, but the (external) GNU time program provides more sophisticated analytics. You can call these as follows (the last assumes a location for the time binary):
time ./foo
`which time` -v ./foo
/usr/bin/time -v ./foo
Particularly useful data are:
- Real/User/Sys time (wall clock, user space CPU time, OS CPU time) – bottom line
- Context switches – count switches to an external program or the kernel, and back
- Page faults
This lets you quickly measure the effect of changes.
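Since the full -v report is long, a sketch like the following picks out just the fields above (GNU time writes its report to stderr, hence the redirection):

/usr/bin/time -v ./foo 2>&1 | grep -E 'wall clock|User time|System time|context switches|page faults'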
Beware that this measurement is noisy – particularly if other tasks are running – and there is overhead both from running a script and from the time command itself. Further, performance will generally be slower on the first run, due to caches being cold. To handle the noise and cold caches, kill other tasks (or check that the system is not loaded, via top) and run the script a few times. To calibrate the baseline overhead, time the external true command, the true built-in (if available), and an empty script, to see the overall process and script overhead.
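A minimal calibration sketch (assuming /bin/true as the path to the external command):

time /bin/true        # external command: full process overhead
time true             # shell built-in: no process is spawned
echo '#!/bin/bash' > ./empty.sh && chmod +x ./empty.sh
time ./empty.sh       # empty script: shell start-up overhead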
For longer-running scripts, on Linux you can see some information at /proc/$pid/status, which you can monitor via watch cat /proc/$pid/status. Note that you can use grep to select only certain data, such as context switches (ctxt).
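For example, assuming a hypothetical PID of 1234, the following polls only the context-switch counters (the relevant fields in /proc/$pid/status are voluntary_ctxt_switches and nonvoluntary_ctxt_switches, both matched by ctxt):

watch -n 1 'grep ctxt /proc/1234/status'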
For system calls, strace – particularly strace -c – breaks down the cost of system calls.
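A typical invocation for a shell script (the -f flag follows child processes, which matters because most of a script's work happens in the external commands it spawns):

strace -c -f ./foo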
More finely, you can time sections of a script by using date and comparing successive timestamps. For example, using GNU date with nanosecond precision:
date --rfc-3339=ns
This may be sufficient to identify bottlenecks, though binary searching can be tedious.
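A sketch of bracketing a suspect section with timestamps (GNU date's %s.%N format and the bc calculator are assumed to be available):

start=$(date +%s.%N)
# ... suspect section of the script ...
end=$(date +%s.%N)
echo "section took $(echo "$end - $start" | bc) seconds"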
To profile an entire script, the easiest approach is to turn on trace mode (in bash) and pipe the trace output through a command that timestamps each line; you can alternatively set PS4 in bash, either calling date each time (major overhead) or using a built-in date format specifier (in Bash 4.2 or later). Some simple manipulation (using tee and paste) can produce an annotated transcript with the time required for each line. See F. Hauri’s answer to How to profile a bash shell script? – note that (currently) line numbers are off by one (the time shown is the cost of the previous line) – or use bashProfiler.
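A minimal sketch of the PS4 approach, using command substitution to call date for every traced line (simple, but with the major overhead noted above; the single quotes defer the substitution so date runs each time a line is traced):

#!/bin/bash
PS4='+ $(date --rfc-3339=ns) '   # prepend a timestamp to every trace line
set -x
# ... rest of the script ...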
This analysis is sufficient for scripts with simple flow – a list of commands – but does not detect hotspots from loops or recursion. More sophisticated profilers analogous to gprof are not available for shell scripts, but (assuming script lines are unique, which can be ensured by adding manual line numbers if necessary) simple processing of the timestamped logs can let you sum across lines and identify hotspots.
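As a sketch of such processing, assume a hypothetical trace.log where each line starts with an epoch timestamp (e.g. from date +%s.%N) followed by the traced command; awk can then charge each interval to the line that was running and sum the totals:

awk 'NR > 1 { sum[prev] += $1 - t }                       # charge the interval to the previous line
     { t = $1; prev = substr($0, index($0, " ") + 1) }    # remember timestamp and command text
     END { for (l in sum) printf "%10.6f %s\n", sum[l], l }' trace.log | sort -rn | head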
Built-ins
A key technique for optimizing shell scripts is to replace calls to external commands with calls to built-ins. This eliminates the substantial process overhead, and the associated context switches and page faults, which are the main causes of slow performance. A good example is using string manipulation operations in bash, rather than making an external call to sed.
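For instance, a substitution can be done with bash parameter expansion instead of spawning a sed process (a sketch with a hypothetical variable):

msg='Hello World'
msg=$(printf '%s' "$msg" | sed 's/World/Bash/')   # external: fork/exec of sed
msg=${msg//World/Bash}                            # built-in: no process spawned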
A simple way to see the difference (in bash) is to profile calling the built-in true versus the external true program:
#!/bin/bash
# true-int-1000: call the built-in true 1000 times
for i in {1..1000} ; do true ; done
#!/bin/bash
# true-ext-1000: call the external true 1000 times
for i in {1..1000} ; do /bin/true ; done
Profile via:
/usr/bin/time -v ./true-int-1000
/usr/bin/time -v ./true-ext-1000
The script that calls the external command will be much slower, primarily because of the many additional context switches (at least 2000 more, due to switching to true and then back) and page faults – the actual command does nothing (successfully).