Interpreted Programs

Interpreters execute programs in a different way. They do not produce native binary code, at least not in general. Instead, an interpreter converts a program to an intermediate representation, usually a tree, and uses an algorithm to traverse this tree, emulating the semantics of each of its nodes. In the previous chapter we implemented a small interpreter in Prolog for a programming language whose programs represent arithmetic expressions. Even though that was a very simple interpreter, it contained all the steps of the interpretation process: we had a tree representing the abstract syntax of a program, and a visitor going over every node of this tree performing some interpretation-related task.

The source program is meaningless to the interpreter in its original format, e.g., a sequence of ASCII characters. Thus, like a compiler, an interpreter must parse the source program. However, unlike a compiler, an interpreter does not need to parse all the source code before executing it. That is, only those pieces of the program text that are reachable by the execution flow of the program need to be translated. Thus, the interpreter performs a kind of lazy translation.

Advantages and disadvantages of interpretation over compilation

The main advantage of an interpreter over a compiler is portability. The binary code produced by the compiler, as we have emphasized before, is tailored specifically to a target computer architecture. The interpreter, on the other hand, processes the source code directly. With the rise of the World Wide Web, and the possibility of downloading and executing programs from remote servers, portability became a very important issue. Because client web applications must run on many different machines, it is not practical for the browser to download the binary representation of the remote software. Source code must come instead.

A compiled program usually runs faster than an interpreted program, because there are fewer intermediaries between the compiled program and the underlying hardware. However, we must bear in mind that compiling a program is a lengthy process, as we have seen before. Therefore, if the program is meant to be executed only once, or at most a few times, then interpreting it might be faster than compiling and running it. This type of scenario is common in client web applications. For instance, JavaScript programs are usually interpreted instead of compiled. These programs are downloaded from a remote web server, and once the browser session expires, their code is usually lost.

Changing a program's source code is a common task during the development of an application. When using a compiler, each change implies a potentially long waiting time. The compiler needs to translate the modified files and link all the binaries to create an executable program before that program can run. The larger the program, the longer this delay. On the other hand, because an interpreter does not translate all the source code before running it, the time necessary to test the modifications is significantly shorter. Therefore, interpreters tend to favour the development of software prototypes.

Example: bash scripts: Bash is a typical interpreter commonly used in the Linux operating system. This interpreter provides users with a command-line interface; that is, it gives users a prompt where they can type commands. These commands are read and then interpreted. Commands can also be grouped into a single file. A bash script is a file containing a list of commands to be executed by the bash shell. Bash is a scripting language. In other words, bash makes it very easy for the user to call applications implemented in programming languages other than bash itself. Such a script can be used to automatically execute a sequence of commands that the user often needs. The following lines are very simple commands that could be stored in a script file called, for instance, my_info.sh:

#! /bin/bash
# script to present some information
clear
echo 'System date:'
date
echo 'Current directory:'
pwd

The first line (#! /bin/bash) in the script specifies which shell should be used to interpret the commands in the script. Usually an operating system provides more than one shell. In this case we are using bash. The second line (# script to present some information) is a comment and does not have any effect when the script is executed. The life cycle of a bash script is much simpler than the life cycle of a C program. The script file can be edited using a text editor such as vim. After that, it is necessary to change its permissions in the Linux file system to make it executable, for instance with the command chmod +x my_info.sh. A script is called by prefixing the file's name with its location in the filesystem. So, a user can run the script in a shell by typing "path/my_info.sh", where path indicates the path necessary to find the script. Below, the script is in the current directory, so the path is a single dot:

$> ./my_info.sh
System date:
Mon Jun 18 10:18:46 BRT 2012
Current directory:
/home/IPL/shell

Virtual Machines

A virtual machine is hardware emulated in software. It combines an interpreter, a runtime support system and a collection of libraries that the interpreted code can use. Typically the virtual machine interprets an assembly-like program representation. Therefore, the virtual machine bridges the gap between compilers and interpreters. The compiler transforms the program, converting it from a high-level language into low-level bytecodes. These bytecodes are then interpreted by the virtual machine.

One of the most important goals of virtual machines is portability. A virtualized program is executed directly by the virtual machine in such a way that this program's developer can be oblivious to the hardware where this virtual machine runs. As an example, Java programs are virtualized. In fact, the Java Virtual Machine (JVM) is probably the best-known virtual machine in use today. Any hardware that supports the Java virtual machine can run Java programs. The virtual machine, in this case, ensures that a program has the same semantics on all these different platforms. A slogan that describes this characteristic of Java programs is "write once, run anywhere". This slogan illustrates the cross-platform benefits of Java. In order to guarantee this uniform behaviour, every JVM is distributed with a very large software library, the Java Application Programming Interface (API). Parts of this library are treated in a special way by the compiler, and are implemented directly at the virtual machine level. Java threads, for instance, are handled in such a way.

The Java programming language is very popular nowadays. The portability of the Java runtime environment is one of the key factors behind this popularity. Java was initially conceived as a programming language for embedded devices. However, by the time Java was released, the World Wide Web was also making its revolutionary début. In the early 1990s, programs that could be downloaded and executed in web browsers were in high demand. Java would fill this niche with Java applets. Today Java applets have fallen out of favour when compared to alternatives such as JavaScript and Flash programs. However, by the time other technologies began to be popular on the client side of web applications, Java was already one of the most used programming languages in the world. And, many years past the initial web revolution, the world watches a new unfolding in computer history: the rise of smartphones as general-purpose hardware. Again portability is at a premium, and again Java is an important player in this new market. The Android virtual machine, Dalvik, is meant to run Java programs.

Just-in-Time Compilation

In general a compiled program will run faster than its interpreted version. However, there are situations in which the interpreted code is faster. As an example, the shootout benchmark game contains some Java benchmarks that are faster than the equivalent C programs. The core technology behind this efficiency is the Just-in-Time Compiler, or JIT for short. The JIT compiler translates a program to binary code while this program is being interpreted. This setup opens up many possibilities for speculative code optimizations. In other words, the JIT compiler has access to the runtime values that are being manipulated by the program; thus, it can use these values to produce better code. Another advantage of the JIT compiler is that it does not need to compile every part of the program, but only those pieces of it that are reachable by the execution flow. And even in this case, the interpreter might decide to compile only the heavily executed parts of a function, instead of the whole function body.

The program below provides an example of a toy JIT compiler. If executed correctly on an x86 processor, the program will print Result = 1234. Depending on the protection mechanisms adopted by the operating system, the program might not execute correctly. In particular, systems that apply Data Execution Prevention (DEP) will not run this program to the end. Our "JIT compiler" dumps some machine instructions into an array called program, and then diverts execution to this array.

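#include <stdio.h>
#include <stdlib.h>

int main(void) {
  unsigned char *program;
  int (*fnptr)(void);
  int a;
  program = malloc(1000);
  if (program == NULL) {
    return EXIT_FAILURE;
  }
  /* x86 machine code: mov eax, 0x00001234 (little-endian immediate) */
  program[0] = 0xB8;
  program[1] = 0x34;
  program[2] = 0x12;
  program[3] = 0;
  program[4] = 0;
  /* x86 machine code: ret (the result is returned in eax) */
  program[5] = 0xC3;
  /* Treat the array as a function: not portable, and blocked by DEP */
  fnptr = (int (*)(void)) program;
  a = fnptr();
  printf("Result = %X\n", a);
  free(program);
  return 0;
}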

In general, a JIT works in a way similar to the program above. It compiles the interpreted code, and dumps the result of this compilation, the binary code, into a memory array that is marked as executable. Then the JIT changes the execution flow of the interpreter to point to the newly written memory area. In order to give the reader a general picture of a JIT compiler, the figure below shows TraceMonkey, one of the compilers used by the Mozilla Firefox browser to run JavaScript programs.

TraceMonkey is a trace-based JIT compiler. It does not compile whole functions. Rather, it converts to binary code only the most heavily executed paths inside a function. TraceMonkey is built on top of a JavaScript interpreter called SpiderMonkey. SpiderMonkey interprets bytecodes. In other words, the JavaScript source file is converted to a sequence of assembly-like instructions, and these instructions are interpreted by SpiderMonkey. The interpreter also monitors the program paths that are executed most often. After a certain program path reaches an execution threshold, it is recorded as a trace, that is, a linear sequence of instructions. The trace is then transformed into native code by nanojit, a JIT compiler also used in the Tamarin JavaScript engine. Once the execution of this trace finishes, either due to normal termination or due to an exceptional condition, control comes back to the interpreter, which might find other traces to compile.
