Introduction to Emacs Lisp

Emacs Lisp is a programming language that belongs to the Lisp family of languages (which includes Scheme and Common Lisp). Lisp is the second-oldest programming language still in modern use (after Fortran), but although the Lisp community remains active, it is now very small. As a result, most developers never have cause to learn Lisp, and a great many developers who use Emacs consider Emacs Lisp alien territory.

Although Emacs Lisp is not a mainstream programming language, it is a powerful language that is used to implement most of Emacs itself. This means that as an Emacs extension author you are using the same language, with access to all the same libraries, as was originally used to write the editor. This makes Emacs uniquely powerful: although other editors such as Eclipse accept extensions that modify their behaviour, few if any other editors let you change their behaviour at run time in the same language that the editor itself is written in.

Simple examples of Emacs Lisp


Emacs Lisp is such a fundamental part of the Emacs way of thinking that Emacs doesn't make you go somewhere special to execute Lisp expressions; you can do it from any buffer. Right in the middle of writing a C function you can just drop some Emacs Lisp in there, execute it, and see a result right away.

Go into any Emacs buffer and type the following:

(+ 1 2)

Position the cursor after the close parenthesis and type C-x C-e (control and x, followed by control and e). The expression will be evaluated and the result will be displayed in the minibuffer. This expression simply adds the values 1 and 2, so the resultant value 3 should appear in the minibuffer.

Though the calculation is simple, the way it's expressed might catch some people out. A function call in Emacs Lisp always takes the same form: an open parenthesis, the name of the function, the arguments to that function, and finally a close parenthesis. All the expression above means is that we're calling the + function (addition) with the arguments 1 and 2.

This way of writing operations is known as Polish notation (also called Polish prefix notation or simply prefix notation). Seeing addition written this way seems unnatural to most programmers at first, since they are used to seeing mathematical expressions written with the operator in the middle, so-called infix notation. However, most programmers are also familiar with calling functions, where the function name appears before the arguments. Most programming languages make a distinction between operators, which use infix notation, and functions, which use prefix notation. In languages of the Lisp family there is no such distinction, and all calls use prefix notation. Though this takes some getting used to, it has the benefit that all code follows the same structure, which makes it much easier for Lisp code to read and write Lisp code, and so makes the language well suited to metaprogramming (writing code that manipulates code).

In case this bothers you, it's worth reminding yourself just how arbitrary the dividing line between functions and operators can be. Particularly in object-oriented languages that allow operators to be redefined or specialised for classes, operators are really just syntactic sugar for function calls anyway.

There can be any number of arguments in a function call (assuming the function supports it, which most functions will do if it is meaningful to do so):

(+ 1 2 3 4)
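As a quick check, the calls below (a small sketch, not an exhaustive list) show + taking different numbers of arguments, including none at all, where it returns 0. The * function behaves the same way, returning 1 when called with no arguments.

```elisp
;; + accepts any number of numeric arguments.
(+ 1 2 3 4)   ; => 10
(+ 5)         ; => 5
(+)           ; => 0, the sum of no numbers
;; * works the same way.
(* 2 3 4)     ; => 24
(*)           ; => 1, the product of no numbers
```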

The arguments to a function can themselves be function calls. For example, the following expression adds the first three square numbers:

(+ (* 1 1) (* 2 2) (* 3 3))
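Written out step by step, the reduction looks like this: each argument is evaluated before + is applied to the results. The comments show the intermediate form and the final value.

```elisp
;; Each inner call is evaluated before the outer + is applied:
;;   (+ (* 1 1) (* 2 2) (* 3 3))
;;   => (+ 1 4 9)
;;   => 14
(+ (* 1 1) (* 2 2) (* 3 3))   ; => 14
```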

The first element after the open parenthesis must always be a function identifier.

The parentheses are an integral part of the expression. If you remove them from (+ 1 2), there is no longer a function call there (to be precise, there are three separate and unconnected Lisp expressions, +, 1 and 2, and no function evaluation). Unlike in many languages, extra parentheses aren't harmless either. Consider the expression:

((+ 1 2 3))

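It is tempting to read this the way you would read arithmetic, evaluating the inner expression first so that it reduces to:

(6)

which would ask Lisp to call a function named 6 with no arguments, and there is no such function. In fact Emacs Lisp does not evaluate the first element of a list to find out which function to call; it expects that element to identify the function directly, normally by a name such as +. Since (+ 1 2 3) is not a function name, Emacs signals an invalid-function error as soon as it looks at the outer list. Either way, the extra pair of parentheses turns a valid expression into an error. You can regard the parentheses as being like the parentheses on a function call in C, rather than like simple arithmetic parentheses for modifying operator precedence.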

In fact, one of the benefits of Lisp's prefix notation is that there is never any ambiguity about the order in which operations are applied, so there is never any need for extra parentheses to disambiguate. For example, in infix notation an expression like 5 * 4 + 3 requires the reader to know the precedence rules in order to tell whether it means (5 * 4) + 3 or 5 * (4 + 3). In Lisp the first reading is written (+ (* 5 4) 3) and the second (* 5 (+ 4 3)), so there is never any ambiguity.
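Evaluating both readings side by side makes the difference concrete; the comments give the value each form returns.

```elisp
;; (5 * 4) + 3 in infix notation:
(+ (* 5 4) 3)   ; => 23
;; 5 * (4 + 3) in infix notation:
(* 5 (+ 4 3))   ; => 35
```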

Interacting with Emacs Lisp


Although you can execute Emacs Lisp expressions from any buffer in Emacs, it's most convenient to set aside a buffer for the purpose, to avoid messing up buffers that have important work in them. When you start Emacs it creates a special buffer for you, *scratch*, which isn't associated with a file (unless you decide to save its content later). This makes a good choice for writing and executing Emacs Lisp expressions.

The *scratch* buffer has another advantage for Lisp execution, which is that, by default, it starts in Lisp interaction mode. In this mode you can execute any Emacs Lisp expression with C-j and the result will be inserted permanently into the buffer, rather than appearing temporarily in the minibuffer. You can put any buffer into Lisp interaction mode with M-x lisp-interaction-mode.

Defining functions


As you would expect, you can define your own functions in Emacs Lisp, and they will work the same way as built-in functions. You define a function with a defun expression:

(defun my-add (x y)
   (+ x y))

Try typing this into Emacs and evaluating the expression. Emacs should reply with:

my-add

In practice Emacs replies with the name of the function you have just defined, my-add in this case, though code shouldn't rely on the exact value that defun returns. This illustrates an important point: every Lisp expression has a value. This is like the return value from a function in C, except that you don't have to explicitly return anything. Emacs will simply take the value of the last expression in the function body and treat it as the return value. There are no void functions in Emacs Lisp, although the caller is free to ignore any uninteresting return values.
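The sketch below illustrates the last-expression rule with a hypothetical function, my-sign-word, whose body contains two forms. The call to message is evaluated for its side effect (it prints a line in the echo area), but only the value of the final if form is returned to the caller.

```elisp
;; my-sign-word is a hypothetical example function.
(defun my-sign-word (n)
  "Return a word describing the sign of N."
  (message "Checking %s" n)               ; evaluated, but its value is discarded
  (if (< n 0) "negative" "non-negative")) ; last form: its value is returned

(my-sign-word -5)   ; => "negative"
(my-sign-word 7)    ; => "non-negative"
```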

Having defined a new function, you can now use the function in the same way as the built-in functions:

(my-add 1 2)

This gives the expected result 3. Note that, unlike the built-in + function, this addition function can't take an arbitrary number of arguments. Don't worry: it's perfectly possible to write a function that works that way, and we'll learn how later.
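For comparison, here is my-add again with a call that respects its two-parameter limit. Passing a third argument is rejected: Emacs signals a wrong-number-of-arguments error rather than silently ignoring the extra value.

```elisp
(defun my-add (x y)
  (+ x y))

(my-add 1 2)      ; => 3
;; (my-add 1 2 3) signals a wrong-number-of-arguments error,
;; because my-add declares exactly two parameters.
```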

Code and Data


The simplest data structure in any Lisp is the list; indeed, lists give the Lisp programming language its name, which is a shortening of List Processing. The name is perhaps a misnomer, since Lisp can be used for far richer data structures than just lists, but the ubiquitous list provides a natural starting point for exploring Lisp programming.

Type the following into a Lisp evaluation buffer and execute it (C-j if you're in Lisp interaction mode, C-x C-e otherwise):

(list 1 2 3)

Emacs will reply with:

(1 2 3)

Lists can contain any number of items (or zero items), and the items need not all have the same type. In fact, lists can contain other lists as entries, giving a nested data structure:

(list 1 2 "buckle my shoe" (list 3 4))

If you evaluate this, Emacs will reply with:

(1 2 "buckle my shoe" (3 4))
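A nested list counts as a single element of the list that contains it, which the built-in length function makes visible. In this small sketch, calling list with no arguments gives the empty list, which Emacs Lisp represents as nil.

```elisp
;; The nested list (3 4) counts as one element of the outer list.
(length (list 1 2 "buckle my shoe" (list 3 4)))   ; => 4
;; A list with zero items is allowed too.
(length (list))                                     ; => 0
(list)                                              ; => nil
```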

When you enter an expression, Emacs evaluates it and returns the result to you. The expression (list 1 2 3) calls the list function with three arguments. The result of calling the list function is a list, which is returned to you: (1 2 3). So if (1 2 3) is correct syntax for a list, why must you call the list function, rather than typing the list in directly? Try executing the following Lisp expression in Emacs:

(1 2 3)

Emacs will respond with an error message[1]:

Lisp error: (invalid-function 1)

The problem here is that by entering the list and asking Emacs to evaluate it, you're asking for it to be treated as Lisp code, not as data. Lisp code consists of one or more lists of items, where the first item in each list is required to be a function identifier, and the remainder of the items in the list are arguments to the function (which can themselves be expressions to be evaluated). Since you've attempted to evaluate the expression (1 2 3), Emacs assumes that 1 must be a function identifier, and when it fails to find such a function it throws an error. The same problem doesn't happen with (list 1 2 3), since list is a function; this is code, not data.

There's an important point in the above paragraph that bears repeating for emphasis: Lisp code is simply Lisp data, and all Lisp data can be treated as code. This is perhaps the key thing that differentiates Lisp from all other mainstream programming languages. It makes it relatively easy for Lisp code to generate further Lisp code at run time. This one simple design decision greatly increases the richness of the language, since higher-order functionality that would require special language support in other languages can simply be written using ordinary Lisp code.
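A minimal sketch of code as data, using the built-in eval function (not otherwise covered in this chapter) and the quote form described in the following paragraphs: the same three-element list is first returned as data, then run as code.

```elisp
;; Build a three-element list: the symbol +, then 1 and 2.
;; (quote +) hands over the symbol + itself instead of evaluating it.
(list (quote +) 1 2)          ; => (+ 1 2), just data
;; eval treats that same list as code and runs it.
(eval (list (quote +) 1 2))   ; => 3
```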

If the only difference between Lisp code and data is that code is evaluated and data is not, then we must have some way of communicating to Lisp whether or not some particular data is intended to be evaluated. As we saw above, one way to do this is with a trivial function like list, which just returns its arguments. This is one way round the problem, but it isn't an elegant one. By using list, we haven't succeeded in preventing evaluation from taking place; we've just written an expression where the evaluation is trivial. More seriously, with nested lists you have to modify each level of nested list (such as (list 1 2 (list 3 4))) rather than just the outer one.

Since the default behaviour of Lisp is to evaluate lists, all we need in order to control evaluation is some way of preventing evaluation. This is done with the quote operator, which is written like any other Lisp function call but causes its argument not to be evaluated. quote takes only one argument, but the argument can be a list, which of course includes nested sub-lists. Try evaluating the following:

(quote (1 2))
(quote (1 (2 3) 4))
(quote 1 2)

The third form will throw an error, since quote accepts only one argument. The second form shows that the sub-expression (2 3) isn't evaluated either, even though normal evaluation works by evaluating each of the arguments in turn.

The behaviour of the nested expression shows something important: quote isn't an ordinary Lisp function. If you tried to implement quote yourself you'd find it impossible, since the arguments to quote would be evaluated before your custom function was even called. Since the error (evaluating (1 2) in the first form above) would be thrown before your function was called, there's no way for your code to recover from it.

Instead of being a Lisp function, quote is one of a small number of special forms. A special form is written just like a function call, but its behaviour is built into the Lisp interpreter in a way that no ordinary function could provide.
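In recent versions of Emacs you can ask whether a symbol names a special form with the built-in special-form-p function. In this sketch, (quote quote) simply passes the symbol quote itself, unevaluated, to special-form-p; setq, covered later in this chapter, is another special form.

```elisp
;; special-form-p reports whether a symbol names a special form.
(special-form-p (quote quote))   ; => t
(special-form-p (quote setq))    ; => t
(special-form-p (quote +))       ; => nil, + is an ordinary function
```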

You might wonder why it's OK to evaluate certain expressions but not others. For example, all the following expressions can be evaluated, even though none of them contain functions:

12
"twelve"
nil
:thing

Even though none of these expressions are quoted, they all evaluate to themselves. The reason for this is that Lisp regards certain values as self-evaluating, which means that they return their own value when evaluated as a Lisp expression. This works for numbers, strings, nil, and keywords (symbols whose names begin with a colon, such as :thing), and in general wherever there is no possibility of ambiguity. It can't work with lists, since ordinary code is expressed with the syntax of a list, so if lists evaluated to themselves then no code could be evaluated.

Since quote is used so often, Emacs Lisp provides a convenient shorthand for it: the apostrophe character ('). The following two expressions are equivalent:

(quote (1 2))
'(1 2)

Both expressions evaluate to the same list, (1 2).
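The apostrophe is handled by the Lisp reader, which turns 'x into (quote x) before anything is evaluated. The sketch below checks this in two ways: the two spellings give equal lists, and evaluating ''(1 2), which the reader expands to (quote (quote (1 2))), yields the list (quote (1 2)), whose first element is the symbol quote.

```elisp
;; The long and short spellings produce equal lists.
(equal (quote (1 2)) '(1 2))   ; => t
;; ''(1 2) reads as (quote (quote (1 2))); evaluating it returns
;; the list (quote (1 2)), so its first element is the symbol quote.
(car ''(1 2))                  ; => quote
```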


Variables and scope


Unlike some programming languages based on the functional paradigm, Emacs Lisp has mutable variables that will be immediately familiar to most developers. Variables in Emacs Lisp are untyped, which means that a variable can hold any value you care to give it: numbers, strings, even functions can be assigned to variables. The same variable can hold an integer at one point and a function definition a few lines later (though most developers will recognise that this is not good practice).

The simplest way to work with variables in Emacs Lisp is using global variables. As the name implies, these are available to read and write anywhere in the program, and they keep their values until they are changed or Emacs exits.

One thing to be careful of when using global variables in Emacs Lisp is that there is no concept of namespacing that can be used to separate references to global variables in your library from global variables in someone else's library, or in the Emacs core. It's therefore good practice to prefix your global variables with a string that is specific to your library.

Global variables are widely used within Emacs (and within third-party Emacs Lisp packages) to hold simple configuration settings for a module. Although unrestrained use of global variables makes for code that is hard to follow, when used judiciously they provide a way of controlling configuration with minimal overhead. Global variables that are used for configuration will often have documentation associated with them. You can read this documentation by moving the cursor over the variable in question and typing C-h v (describe-variable), which offers the variable at point as its default.
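As an illustration of both conventions, prefixing and documentation, here is a hypothetical configuration variable for an imaginary library called my-notes, declared with the built-in defvar form. The string after the value is the variable's documentation, which is what C-h v displays. Note that defvar only sets the value if the variable doesn't already have one, so an earlier setting is not overwritten.

```elisp
;; A hypothetical configuration variable for an imaginary library, my-notes.
;; The my-notes- prefix keeps it from clashing with other libraries.
(defvar my-notes-directory "~/notes/"
  "Directory in which my-notes stores its files.")

;; A second defvar does not change an existing value:
(defvar my-notes-directory "~/elsewhere/")
my-notes-directory   ; => "~/notes/"
```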

You can assign a value to an existing variable, or create a new one, by using the setq special form:

(setq some-variable 12)

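The setq form evaluates its second argument and assigns the result to the variable named by its first argument. The first argument is not evaluated, which is why setq (short for "set quoted") is a special form rather than an ordinary function.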
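A short sketch of setq in action, using a hypothetical variable named my-value, also shows the earlier point that variables are untyped: the same variable holds a number and then a string. Like every Lisp expression, setq has a value, namely the value it assigned.

```elisp
;; setq creates or updates a global variable (my-value is a hypothetical name).
(setq my-value 12)        ; => 12, setq returns the value it assigned
my-value                  ; => 12
;; The same variable can later hold a value of a different type.
(setq my-value "twelve")
my-value                  ; => "twelve"
```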

  • Scoped variables
  • let and let*
  • Buffer-local variables

Functions as first-class objects

  • Passing functions as arguments
  • Lambda functions
  • Comparison between lambda and defun

Functions that write functions

  • define-skeleton as an example

Notes

  1. In fact, Emacs will give you a full backtrace to show where the problem occurred; details have been left out here for clarity, but don't be surprised if your error message looks more complex than described here