Tcl Programming/Print version

Tcl: the Tool Command language

Introduction

So what is Tcl?

The name Tcl is derived from "Tool Command Language" and is pronounced "tickle". Tcl is a radically simple open-source interpreted programming language that provides common facilities such as variables, procedures, and control structures as well as many useful features that are not found in any other major language. Tcl runs on almost all modern operating systems such as Unix, Macintosh, and Windows (including Windows Mobile).

While Tcl is flexible enough to be used in almost any application imaginable, it does excel in a few key areas, including: automated interaction with external programs, embedding as a library into application programs, language design, and general scripting.

Tcl was created in 1988 by John Ousterhout and is distributed under a BSD style license (which allows you everything GPL does, plus closing your source code). The current stable version, in February 2008, is 8.5.1 (8.4.18 in the older 8.4 branch).

The first major GUI extension that works with Tcl is Tk, a toolkit aimed at rapid GUI development. That is why Tcl is now more commonly called Tcl/Tk.

The language features far-reaching introspection, and its syntax, while simple, is very different from the Fortran/Algol/C++/Java world. Although Tcl is a string-based language, there are quite a few object-oriented extensions for it, such as Snit, incr Tcl, and XOTcl, to name a few.

Tcl was originally developed as a reusable command language for experimental computer aided design (CAD) tools. The interpreter is implemented as a C library that could be linked into any application. It is very easy to add new functions to the Tcl interpreter, so it is an ideal reusable "macro language" that can be integrated into many applications.

However, Tcl is a programming language in its own right, which can be roughly described as a cross-breed between

  • LISP/Scheme (mainly for its tail-recursion capabilities),
  • C (control structure keywords, expr syntax) and
  • Unix shells (but with more powerful structuring).

One language, many styles

Although a language where "everything is a command" might appear to be "imperative" and "procedural", the flexibility of Tcl allows one to use functional or object-oriented styles of programming very easily. See "Tcl examples" below for ideas of what one can do.

The traditional, "procedural" approach would be

proc mean list {
   set sum 0.
   foreach element $list {set sum [expr {$sum + $element}]}
   return [expr {$sum / [llength $list]}]
}


Here is another style (not very fast on long lists, but depending on nothing but Tcl). It works by building up an expression, where the elements of the list are joined with a plus sign, and then evaluating that:

proc mean list {expr double([join $list +])/[llength $list]}

From Tcl 8.5, with math operators exposed as commands, and the {*} expansion operator, this style is better:

proc mean list {expr {[tcl::mathop::+ {*}$list]/double([llength $list])}}

or, if you have imported the tcl::mathop operators, just

proc mean list {expr {[+ {*}$list]/double([llength $list])}}

Note that all of the above are valid stand-alone Tcl scripts.

It is also very easy to implement other programming languages (be they (reverse) Polish notation, or whatever) in Tcl for experimenting. One might call Tcl a "CS Lab". For instance, here's how to compute the average of a list of numbers in Tcl (after first writing somewhat more Tcl to implement a J-like functional language - see Tacit programming in the examples):

Def mean = fork /. sum llength

or, one could implement a RPN language similar to FORTH or Postscript and write:

 : mean  dup sum swap size double / ;


A more practical aspect is that Tcl is very open for "language-oriented programming" - when solving a problem, specify a (little) language which most simply describes and solves that problem - then go implement that language...

Why should I use Tcl?

Good question. The general recommendation is: "Use the best tool for the job". A good craftsman has a good set of tools, and knows how to use them best.

Tcl is a competitor to other scripting languages like awk, Perl, Python, PHP, Visual Basic, Lua, Ruby, and whatever else will come along. Each of these has strengths and weaknesses, and when some are similar in suitability, it finally becomes a matter of taste.

Points in favour of Tcl are:

  • simplest syntax (which can be easily extended)
  • cross-platform availability: Mac, Unix, Windows, ...
  • strong internationalization support: everything is a Unicode string
  • robust, well-tested code base
  • the Tk GUI toolkit speaks Tcl natively
  • BSD license, which allows open-source use like GPL, as well as closed-source
  • a very helpful community, reachable via newsgroup, Wiki, or chat :)

Tcl is not the best solution for every problem. It is however a valuable experience to find out what is possible with Tcl.

Example: a tiny web server

Before spoon-feeding the bits and pieces of Tcl, a slightly longer example might be appropriate, just so you get a feeling for how it looks. The following is, in under 50 lines of code, a complete little web server that serves static content (HTML pages, images), but also provides a subset of CGI functionality: if a URL ends with .tcl, a Tcl interpreter is called with it, and the result (a dynamically generated HTML page) is served.

Note that no extension package was needed - Tcl can, with the socket command, already do such tasks pretty nicely. A socket is a channel that can be written to with puts. The fcopy command copies asynchronously (in the background) from one channel to another, where the source is either a process pipe (the "|tclsh" part) or an open file.

This server was tested to work pretty well even on a 200 MHz Windows 95 machine over a 56k modem, serving several clients concurrently. Also, because of the brevity of the code, this is an educational example for how (part of) HTTP works.

# DustMotePlus - with a subset of CGI support
set root      c:/html
set default   index.htm
set port      80
set encoding  iso8859-1
proc bgerror msg {puts stdout "bgerror: $msg\n$::errorInfo"}
proc answer {sock host2 port2} {
    fileevent $sock readable [list serve $sock]
}
proc serve sock {
    fconfigure $sock -blocking 0
    gets $sock line
    if {[fblocked $sock]} {
        return
    }
    fileevent $sock readable ""
    set tail /
    regexp {(/[^ ?]*)(\?[^ ]*)?} $line -> tail args
    if {[string match */ $tail]} {
        append tail $::default
    }
    set name [string map {%20 " " .. NOTALLOWED} $::root$tail]
    if {[file readable $name]} {
        puts $sock "HTTP/1.0 200 OK"
        if {[file extension $name] eq ".tcl"} {
            set ::env(QUERY_STRING) [string range $args 1 end]
            set name [list |tclsh $name]
        } else {
            puts $sock "Content-Type: text/html;charset=$::encoding\n"
        }
        set inchan [open $name]
        fconfigure $inchan -translation binary
        fconfigure $sock   -translation binary
        fcopy $inchan $sock -command [list done $inchan $sock]
    } else {
        puts $sock "HTTP/1.0 404 Not found\n"
        close $sock
    }
}
proc done {file sock bytes {msg {}}} {
    close $file
    close $sock
}
socket -server answer $port
puts "Server ready..."
vwait forever

And here's a little "CGI" script I tested it with (save as time.tcl):

# time.tcl - tiny CGI script.
if {![info exists env(QUERY_STRING)]} {
    set env(QUERY_STRING) ""
}
puts "Content-type: text/html\n"
puts "<html><head><title>Tiny CGI time server</title></head>
<body><h1>Time server</h1>
Time now is: [clock format [clock seconds]]
<br>
Query was: $env(QUERY_STRING)
<hr>
<a href=index.htm>Index</a>
</body></html>"

Where to get Tcl/Tk

On most Linux systems, Tcl/Tk is already installed. You can find out by typing tclsh at a console prompt (xterm or such). If a "%" prompt appears, you're already set. Just to make sure, type info pa at the % prompt to see the patchlevel (e.g. 8.4.9) and info na to see where the executable is located in the file system.

Tcl is an open source project. The sources are available from http://tcl.sourceforge.net/ if you want to build it yourself.

For all major platforms, you can download a binary ActiveTcl distribution from ActiveState. Besides Tcl and Tk, this also contains many popular extensions - it's called the canonical "Batteries Included" distribution.

Alternatively, you can get Tclkit: a Tcl/Tk installation wrapped in a single file, which you don't need to unwrap. When run, the file mounts itself as a virtual file system, allowing access to all its parts.

January 2006 saw the release of a new and promising one-file vfs distribution of Tcl: eTcl. Free binaries for Linux, Windows, and Windows Mobile 2003 can be downloaded from http://www.evolane.com/software/etcl/index.html . Especially on PocketPCs, this provides several features that have so far been missing from other ports, such as sockets and window "retreat"; it can be extended by providing a startup script, and by installing pure-Tcl libraries.

First steps

To see whether your installation works, you might save the following text to a file hello.tcl and run it (type tclsh hello.tcl at a console on Linux, double-click on Windows):

package require Tk
pack [label .l -text "Hello world!"]

It should bring up a little grey window with the greeting.

To make a script directly executable (on Unix/Linux, and Cygwin on Windows), use this first line (the # being flush left):

#!/usr/bin/env tclsh

or (in an older, deprecated tricky style):

#! /bin/sh
# the next line restarts using tclsh \
exec tclsh "$0" ${1+"$@"}

This way, the shell can determine which executable to run the script with.

An even simpler way, and highly recommended for beginners as well as experienced users, is to start up tclsh or wish interactively. You will see a % prompt in the console, can type commands to it, and watch its responses. Even error messages are very helpful here, and don't cause a program abort - don't be afraid to try whatever you like! Example:

$ tclsh
info patchlevel
8.4.12
expr 6*7
42
expr 42/0
divide by zero

You can even write programs interactively, best as one-liners:

proc ! x {expr {$x<=2? $x: $x*[! [incr x -1]]}}
! 5
120

For more examples, see the chapter "A quick tour".


Syntax

Syntax is just the rules for how a language is structured. A simple syntax of English could say (ignoring punctuation for the moment):

  • A text consists of one or more sentences
  • A sentence consists of one or more words

Simple as this is, it also describes Tcl's syntax very well - if you say "script" for "text", and "command" for "sentence". There's also the difference that a Tcl word can again contain a script or a command. So

if {$x < 0} {set x 0}

is a command consisting of three words: if, a condition in braces, a command (also consisting of three words) in braces.

Take this for example

is a well-formed Tcl command: it calls Take (which must have been defined before) with the three arguments "this", "for", and "example". It is up to the command how it interprets its arguments, e.g.

puts acos(-1)

will write the string "acos(-1)" to the stdout channel, and return the empty string "", while

expr acos(-1)

will compute the arc cosine of -1 and return 3.14159265359 (an approximation of Pi), or

string length acos(-1)

will invoke the string command, which again dispatches to its length sub-command, which determines the length of the second argument and returns 8.

Quick summary

A Tcl script is a string that is a sequence of commands, separated by newlines or semicolons.

A command is a string that is a list of words, separated by blanks. The first word is the name of the command, the other words are passed to it as its arguments. In Tcl, "everything is a command" - even what in other languages would be called declaration, definition, or control structure. A command can interpret its arguments in any way it wants - in particular, it can implement a different language, like expr.

A word is a string that is a simple word, or one that begins with { and ends with the matching } (braces), or one that begins with " and ends with the matching ". Braced words are not evaluated by the parser. In quoted words, substitutions can occur before the command is called:

  • $[A-Za-z0-9_]+ substitutes the value of the given variable. Or, if the variable name contains characters outside that regular expression, another layer of bracing helps the parser to get it right:
puts "Guten Morgen, ${Schüler}!"

If the code said $Schüler, it would be parsed as the value of the variable $Sch, immediately followed by the constant string üler.

  • (Part of) a word can be an embedded script: a string in [] brackets whose contents are evaluated as a script (see above) before the current command is called.

In short: Scripts and commands contain words. Words can again contain scripts and commands. (This can lead to words more than a page long...)

Arithmetic and logic expressions are not part of the Tcl language itself, but the language of the expr command (also used in some arguments of the if, for, while commands) is basically equivalent to C's expressions, with infix operators and functions. See separate chapter on expr below.
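
For illustration, here are a few arbitrarily chosen expressions in that C-like notation (the variable x is just an example):

set x 3
expr {$x > 2 ? $x*$x : -$x}   ;# ternary operator, as in C --> 9
expr {(1 + 2) * 4 % 5}        ;# infix operators with C-like precedence --> 2
expr {abs(-6.5) + 1}          ;# math functions --> 7.5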


The man page: 11 rules

Here is the complete manpage for Tcl (8.4) with the "endekalogue", the 11 rules. (From 8.5 onward there is a twelfth rule regarding the {*} feature).

The following rules define the syntax and semantics of the Tcl language:

(1) Commands A Tcl script is a string containing one or more commands. Semi-colons and newlines are command separators unless quoted as described below. Close brackets are command terminators during command substitution (see below) unless quoted.

(2) Evaluation A command is evaluated in two steps. First, the Tcl interpreter breaks the command into words and performs substitutions as described below. These substitutions are performed in the same way for all commands. The first word is used to locate a command procedure to carry out the command, then all of the words of the command are passed to the command procedure. The command procedure is free to interpret each of its words in any way it likes, such as an integer, variable name, list, or Tcl script. Different commands interpret their words differently.

(3) Words Words of a command are separated by white space (except for newlines, which are command separators).

(4) Double quotes If the first character of a word is double-quote (") then the word is terminated by the next double-quote character. If semi-colons, close brackets, or white space characters (including newlines) appear between the quotes then they are treated as ordinary characters and included in the word. Command substitution, variable substitution, and backslash substitution are performed on the characters between the quotes as described below. The double-quotes are not retained as part of the word.

(5) Braces If the first character of a word is an open brace ({) then the word is terminated by the matching close brace (}). Braces nest within the word: for each additional open brace there must be an additional close brace (however, if an open brace or close brace within the word is quoted with a backslash then it is not counted in locating the matching close brace). No substitutions are performed on the characters between the braces except for backslash-newline substitutions described below, nor do semi-colons, newlines, close brackets, or white space receive any special interpretation. The word will consist of exactly the characters between the outer braces, not including the braces themselves.

(6) Command substitution If a word contains an open bracket ([) then Tcl performs command substitution. To do this it invokes the Tcl interpreter recursively to process the characters following the open bracket as a Tcl script. The script may contain any number of commands and must be terminated by a close bracket (]). The result of the script (i.e. the result of its last command) is substituted into the word in place of the brackets and all of the characters between them. There may be any number of command substitutions in a single word. Command substitution is not performed on words enclosed in braces.

(7) Variable substitution If a word contains a dollar-sign ($) then Tcl performs variable substitution: the dollar-sign and the following characters are replaced in the word by the value of a variable. Variable substitution may take any of the following forms:

$name

Name is the name of a scalar variable; the name is a sequence of one or more characters that are a letter, digit, underscore, or namespace separators (two or more colons).

$name(index)

Name gives the name of an array variable and index gives the name of an element within that array. Name must contain only letters, digits, underscores, and namespace separators, and may be an empty string. Command substitutions, variable substitutions, and backslash substitutions are performed on the characters of index.

${name}

Name is the name of a scalar variable. It may contain any characters whatsoever except for close braces. There may be any number of variable substitutions in a single word. Variable substitution is not performed on words enclosed in braces.

(8) Backslash substitution If a backslash (\) appears within a word then backslash substitution occurs. In all cases but those described below the backslash is dropped and the following character is treated as an ordinary character and included in the word. This allows characters such as double quotes, close brackets, and dollar signs to be included in words without triggering special processing. The following table lists the backslash sequences that are handled specially, along with the value that replaces each sequence.

\a
Audible alert (bell) (0x7).
\b
Backspace (0x8).
\f
Form feed (0xc).
\n
Newline (0xa).
\r
Carriage-return (0xd).
\t
Tab (0x9).
\v
Vertical tab (0xb).
\<newline>whiteSpace
A single space character replaces the backslash, newline, and all spaces and tabs after the newline. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it will be replaced even when it occurs between braces, and the resulting space will be treated as a word separator if it isn't in braces or quotes.
\\
Literal backslash (\), no special effect.
\ooo
The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0.
\xhh
The hexadecimal digits hh give an eight-bit hexadecimal value for the Unicode character that will be inserted. Any number of hexadecimal digits may be present; however, all but the last two are ignored (the result is always a one-byte quantity). The upper bits of the Unicode character will be 0.
\uhhhh
The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted.

Backslash substitution is not performed on words enclosed in braces, except for backslash-newline as described above.

(9) Comments If a hash character (#) appears at a point where Tcl is expecting the first character of the first word of a command, then the hash character and the characters that follow it, up through the next newline, are treated as a comment and ignored. The comment character only has significance when it appears at the beginning of a command.

(10) Order of substitution Each character is processed exactly once by the Tcl interpreter as part of creating the words of a command. For example, if variable substitution occurs then no further substitutions are performed on the value of the variable; the value is inserted into the word verbatim. If command substitution occurs then the nested command is processed entirely by the recursive call to the Tcl interpreter; no substitutions are performed before making the recursive call and no additional substitutions are performed on the result of the nested script. Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like

set y [set x 0][incr x][incr x]

will always set the variable y to the value 012.

(11) Substitution and word boundaries Substitutions do not affect the word boundaries of a command. For example, during variable substitution the entire value of the variable becomes part of a single word, even if the variable's value contains spaces.

Comments

The first rule for comments is simple: comments start with # where the first word of a command is expected, and continue to the end of line (which can be extended, by a trailing backslash, to the following line):

# This is a comment \
going over three lines \
with backslash continuation

One of the problems new users of Tcl meet sooner or later is that comments behave in an unexpected way. For example, if you comment out part of code like this:

# if {$condition} {
    puts "condition met!"
# }

This happens to work, but any unbalanced braces in comments may lead to unexpected syntax errors. The reason is that Tcl's grouping (determining word boundaries) happens before the # characters are considered.
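
For illustration, here is a minimal made-up case: the proc body below never finds its closing brace, because the { inside the comment is counted during grouping, so sourcing this fragment fails with a "missing close-brace" error.

proc demo {} {
    # this comment contains an unbalanced brace {
    puts "never reached"
}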

To add a comment behind a command on the same line, just add a semicolon:

puts "this is the command" ;# that is the comment

Comments are only taken as such where a command is expected. In data (like the comparison values in switch), a # is just a literal character:

if $condition {# good place
   switch -- $x {
       #bad_place {because switch tests against it}
       some_value {do something; # good place again}
   }
}

To comment out multiple lines of code, it is easiest to use "if 0":

if 0 {
    puts "This code will not be executed"
    This block is never parsed, so can contain almost any code
    - except unbalanced braces :)
}

Data types

In Tcl, all values are strings, and the phrase "Everything is a string" is often used to illustrate this fact. But just as 2 can be interpreted in English as "the number 2" or "the character representing the number 2", two different functions in Tcl can interpret the same value in two different ways. The command expr, for example, interprets "2" as a number, but the command string length interprets "2" as a single character. All values in Tcl can be interpreted either as characters or something else that the characters represent. The important thing to remember is that every value in Tcl is a string of characters, and each string of characters might be interpreted as something else, depending on the context. This will become more clear in the examples below. For performance reasons, versions of Tcl since 8.0 keep track of both the string value and how that string value was last interpreted. This section covers the various "types" of things that Tcl values (strings) get interpreted as.

Strings

A string is a sequence of zero or more characters (where all 16-bit Unicodes are accepted in almost all situations, see in more detail below). The size of strings is automatically administered, so you only have to worry about that if the string length exceeds the virtual memory size.

In contrast to many other languages, strings in Tcl don't need quotes for markup. The following is perfectly valid:

set greeting Hello!

Quotes (or braces) are rather used for grouping:

set example "this is one word"
set another {this is another}

The difference is that inside quotes, substitutions (like of variables, embedded commands, or backslashes) are performed, while in braces, they are not (similar to single quotes in shells, but nestable):

set amount 42
puts "You owe me $amount" ;#--> You owe me 42
puts {You owe me $amount} ;#--> You owe me $amount

In source code, quoted or braced strings can span multiple lines, and the physical newlines are part of the string too:

set test "hello
world
in three lines"

To reverse a string, we let an index i first point at its end and, decrementing i until it reaches zero, append the indexed character to the end of the result res:

proc sreverse str {
    set res ""
    for {set i [string length $str]} {$i > 0} {} {
        append res [string index $str [incr i -1]]
    }
    set res
}

sreverse "A man, a plan, a canal - Panama"
amanaP - lanac a ,nalp a ,nam A


Hex-dumping a string:

proc hexdump string {
    binary scan $string H* hex
    regexp -all -inline .. $hex
}

hexdump hello
68 65 6c 6c 6f

Finding a substring in a string can be done in various ways:

string first  $substr  $str ;# returns the position from 0, or -1 if not found
string match *$substr* $str ;# returns 1 if found, 0 if not
regexp $substr  $str ;# the same

The matching is done with exact match in string first, with glob-style match in string match, and as a regular expression in regexp. If there are characters in substr that are special to glob or regular expressions, using string first is recommended.
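
A small contrived example of why that matters - a dot is an ordinary character to string first, but matches any character in a regular expression:

set str abc
set sub a.c
string first $sub $str   ;#--> -1  (no literal "a.c" in "abc")
regexp $sub $str         ;#-->  1  ("." matches any character)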

Lists

Many strings are also well-formed lists. Every simple word is a list of length one, and elements of longer lists are separated by whitespace. For instance, a string that corresponds to a list of three elements:

set example {foo bar grill}

Strings with unbalanced quotes or braces, or non-space characters directly following closing braces, cannot be parsed as lists directly. You can explicitly split them to make a list.
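
A short sketch of that (the strings are arbitrary examples) - split always returns a well-formed list, whatever characters the input contains:

split "comma,separated,values" ,   ;#--> comma separated values
split "unbalanced{brace here"      ;#--> a valid two-element list, with the brace backslash-quoted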

The "constructor" for lists is of course called list. It's recommended to use when elements come from variable or command substitution (braces won't do that). As Tcl commands are lists anyway, the following is a full substitute for the list command:

proc list args {set args}
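
A small example of why this is recommended - variable values are only substituted outside braces (the names are arbitrary):

set x "two words"
set l1 [list $x end]   ;#--> {two words} end  - a proper two-element list
set l2 {$x end}        ;#--> $x end           - two literal words, no substitution
llength $l1            ;#--> 2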

Lists can contain lists again, to any depth, which makes modelling of matrices and trees easy. Here's a string that represents a 4 x 4 unit matrix as a list of lists. The outer braces group the entire thing into one string, which includes the literal inner braces and whitespace, including the literal newlines. The list parser then interprets the inner braces as delimiting nested lists.

{{1 0 0 0}
 {0 1 0 0}
 {0 0 1 0}
 {0 0 0 1}}

The newlines are valid list element separators, too.

Tcl's list operations are demonstrated in some examples:

set      x {foo bar}
llength  $x        ;#--> 2
lappend  x  grill  ;#--> foo bar grill
lindex   $x 1      ;#--> bar (indexing starts at 0)
lsearch  $x grill  ;#--> 2 (the position, counting from 0)
lsort    $x        ;#--> bar foo grill
linsert  $x 2 and  ;#--> foo bar and grill
lreplace $x 1 1 bar, ;#--> foo bar, grill

Note that of the commands above, only lappend mutates its variable. To change an element of a list (of a list...) in place, the lset command is useful - just give as many indexes as needed:

set test {{a b} {c d}}
{a b} {c d}
lset test 1 1 x
{a b} {c x}

The lindex command also takes multiple indexes:

lindex $test 1 1
x

Example: To find out whether an element is contained in a list (from Tcl 8.5, there's the in operator for that):

proc in {list el} {expr {[lsearch -exact $list $el] >= 0}}
in {a b c} b
1
in {a b c} d
0

Example: remove an element from a list variable by value (converse to lappend), if present:

proc lremove {_list el} {
  upvar 1 $_list list
  set pos [lsearch -exact $list $el]
  set list [lreplace $list $pos $pos]
}

set t {foo bar grill}
foo bar grill
lremove t bar
foo grill
set t
foo grill

A simpler alternative, which also removes all occurrences of el:

proc lremove {_list el} {
  upvar 1 $_list list
  set list [lsearch -all -inline -not -exact $list $el]
}

Example: To draw a random element from a list L, we first determine its length (using llength), multiply that with a random number > 0.0 and < 1.0, truncate that to integer (so it lies between 0 and length-1), and use that for indexing (lindex) into the list:

proc ldraw L {
   lindex $L [expr {int(rand()*[llength $L])}]
}

Example: Transposing a matrix (swapping rows and columns), using integers as generated variable names:

proc transpose matrix {
   foreach row $matrix {
       set i 0
       foreach el $row {lappend [incr i] $el}
   }
   set res {}
   set i 0
   foreach e [lindex $matrix 0] {lappend res [set [incr i]]}
   set res
}

transpose {{1 2} {3 4} {5 6}}
{1 3 5} {2 4 6}

Example: pretty-printing a list of lists which represents a table:

proc fmtable table {
   set maxs {}
   foreach item [lindex $table 0] {
       lappend maxs [string length $item]
   }
   foreach row [lrange $table 1 end] {
       set i 0
       foreach item $row max $maxs {
           if {[string length $item]>$max} {
               lset maxs $i [string length $item]
           }
           incr i
       }
   }
   set head +
   foreach max $maxs {append head -[string repeat - $max]-+}
   set res $head\n
   foreach row $table {
       append res |
       foreach item $row max $maxs {append res [format " %-${max}s |" $item]}
       append res \n
   }
   append res $head
}

Testing:

fmtable {
   {1 short "long field content"}
   {2 "another long one" short}
   {3 "" hello}
}
+---+------------------+--------------------+
| 1 | short            | long field content |
| 2 | another long one | short              |
| 3 |                  | hello              |
+---+------------------+--------------------+

Enumerations: Lists can also be used to implement enumerations (mappings from symbols to non-negative integers). Example of a nice wrapper around lsearch/lindex:

proc makeEnum {name values} {
   interp alias {} $name: {} lsearch $values
   interp alias {} $name@ {} lindex $values
}

makeEnum fruit {apple blueberry cherry date elderberry}

This assigns "apple" to 0, "blueberry" to 1, etc.

fruit: date
3
fruit@ 2
cherry

Numbers

Numbers are strings that can be parsed as such. Tcl supports integers (32-bit or even 64-bit wide) and "double" floating-point numbers. From Tcl 8.5 on, bignums (integers of arbitrarily large precision) are supported. Arithmetic is done with the expr command, which takes basically the same syntax of operators (including the ternary x?y:z), parentheses, and math functions as C. See below for a detailed discussion of expr.

Control the display format of numbers with the format command which does appropriate rounding:

expr 2/3.
0.666666666667
format %.2f [expr 2/3.]
0.67

Up to the 8.4 version (the present version is 8.5), Tcl honored the C convention that an integer starting with 0 is parsed as octal, so

0377 == 0xFF == 255

This changes in 8.5, though - too often people stumbled over "08", meant as an hour or month, raising a syntax error because 8 is not a valid octal digit. In the future you'd have to write 0o377 if you really mean octal. You can do number base conversions with the format command, where the format is %x for hex, %d for decimal, and %o for octal, and the input number should have the C-like markup to indicate its base:

format %x 255
ff
format %d 0xff
255
format %o 255
377
format %d 0377
255

Variables with integer value can be most efficiently modified with the incr command:

incr i    ;# default increment is 1
incr j 2
incr i -1 ;# decrement with negative value
incr j $j ;# double the value

The maximal positive integer can be written in hexadecimal form as a 7 followed by several "F" characters. Tcl 8.4 can use "wide integers" of 64 bits, and the maximum integer there is

expr 0x7fffffffffffffff
9223372036854775807

Demonstration: one more, and it turns into the minimum integer:

expr 0x8000000000000000
-9223372036854775808

Bignums: from Tcl 8.5, integers can be of arbitrary size, so there is no maximum integer anymore. Say, you want a big factorial:

proc tcl::mathfunc::fac x {expr {$x < 2? 1: $x * fac($x-1)}}

expr fac(100)


93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

IEEE special floating-point values: Also from 8.5, Tcl supports a few special values for floating-point numbers, namely Inf (infinity) and NaN (Not a Number):

set i [expr 1/0.]
Inf
expr {$i+$i}
Inf
expr {$i+1 == $i}
1
set j NaN ;# special because it isn't equal to itself
NaN
expr {$j == $j}
0

Booleans

Tcl supports booleans as numbers in a similar fashion to C, with 0 being false and any other number being true. It also supports them as the strings "true", "false", "yes" and "no", and a few others (see below). The canonical "true" value (as returned by Boolean expressions) is 1.

foreach b {0 1 2 13 true false on off no yes n y a} {puts "$b -> [expr {$b?1:0}]"}
0 -> 0
1 -> 1
2 -> 1
13 -> 1
true -> 1
false -> 0
on -> 1
off -> 0
no -> 0
yes -> 1
n -> 0
y -> 1
expected boolean value but got "a"

Characters

Characters are abstractions of writing elements (e.g. letters, digits, punctuation characters, Chinese ideographs, ligatures...). In Tcl since 8.1, characters are internally represented with Unicode, which can be seen as unsigned integers between 0 and 65535 (recent Unicode versions have even crossed that boundary, but the Tcl implementation currently uses a maximum of 16 bits). Any Unicode U+XXXX can be specified as a character constant with an \uXXXX escape. It is recommended to only use ASCII characters (\u0000-\u007f) in Tcl scripts directly, and escape all others.

Convert between numeric Unicode and characters with

set char [format %c $int]
set int  [scan $char %c]

Watch out that int values above 65535 produce 'decreasing' characters again, while negative int even produces two bogus characters. format does not warn, so better test before calling it.
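
Such a test could look like the following sketch (safeChar is a hypothetical helper name):

proc safeChar int {
    # reject code points that this Tcl build cannot represent as one character
    if {$int < 0 || $int > 65535} {error "code point $int out of range"}
    format %c $int
}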

Sequences of characters are called strings (see above). There is no special data type for a single character, so a single character is just a string of length 1 (everything is a string). In UTF-8, which is used internally by Tcl, the encoding of a single character may take from one to three bytes of space. To determine the byte length of a single character:

string bytelength $c ;# assuming [string length $c]==1

String routines can be applied to single characters too, e.g. [string toupper]. Find out whether a character is in a given set (a character string) with

expr {[string first $char $set]>=0}

As Unicodes for characters fall in distinct ranges, checking whether a character's code lies within a range allows a more-or-less rough classification of its category:

proc inRange {from to char} {
    # generic range checker
    set int [scan $char %c]
    expr {$int>=$from && $int <= $to}
}
interp alias {} isGreek {}    inRange 0x0386 0x03D6
interp alias {} isCyrillic {} inRange 0x0400 0x04F9
interp alias {} isHangul {}   inRange 0xAC00 0xD7A3

This is a useful helper to convert all characters beyond the ASCII set to their \u.... escapes (so the resulting string is strict ASCII):

proc u2x s {
   set res ""
   foreach c [split $s ""] {
     scan $c %c int
     append res [expr {$int<128? $c :"\\u[format %04.4X $int]"}]
   }
   set res
}
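
For example, with the helper above:

% u2x "Äpfel über alles"
\u00C4pfel \u00FCber alles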

Internal representation

In the main Tcl implementation, which is written in C, each value has both a string representation (UTF-8 encoded) and a structured representation. This is an implementation detail which allows for better performance, but has no semantic impact on the language. Tcl tracks both representations, making sure that if one is changed, the other one is updated to reflect the change the next time it is used. For example, if the string representation of a value is "8", and the value was last used as a number in an [expr] command, the structured representation will be a numeric type like a signed integer or a double-precision floating point number. If the value "one two three" was last used in one of the list commands, the structured representation will be a list structure. There are various other "types" on the C side which may be used as the structured representation. As of Tcl 8.5, only the most recent structured representation of a value is stored, and it is replaced with a different representation when necessary. This "dual-porting" of values helps avoid repeated parsing or "stringification", which otherwise would happen often because each time a value is encountered in source code, it is interpreted as a string prior to being interpreted in its current context. But to the programmer, the view that "everything is a string" is still maintained.

These values are stored in reference-counted structures termed objects (a term that has many meanings). From the perspective of all code that uses values (as opposed to code implementing a particular representation), they are immutable. In practice, this is implemented using a copy-on-write strategy.

Variables

Variables can be local or global, and scalar or array. Their names can be any string not containing a colon (which is reserved for use in namespace separators) but for the convenience of $-dereference one usually uses names of the pattern [A-Za-z0-9_]+, i.e. one or more letters, digits, or underscores.

Variables need not be declared beforehand. They are created when first assigned a value, if they did not exist before, and can be unset when no longer needed:

set foo    42     ;# creates the scalar variable foo
set bar(1) grill  ;# creates the array bar and its element 1
set baz    $foo   ;# assigns to baz the value of foo
set baz [set foo] ;# the same effect
info exists foo   ;# returns 1 if the variable foo exists, else 0
unset foo         ;# deletes the variable foo

Retrieving a variable's value with the $foo notation is only syntactic sugar for [set foo]. The latter is more powerful though, as it can be nested, for deeper dereferencing:

set foo   42
set bar   foo
set grill bar
puts [set [set [set grill]]] ;# gives 42

Some people might expect $$$grill to deliver the same result, but it doesn't, because of the Tcl parser. When it encounters the first and second $ sign, it tries to find a variable name (consisting of one or more letters, digits, or underscores) in vain, so these $ signs are left literally as they are. The third $ allows substitution of the variable grill, but no backtracking to the previous $'s takes place. So the evaluation result of $$$grill is $$bar. Nested [set] commands give the user more control.
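
To see this interactively (continuing the example above):

% puts $$$grill
$$bar
% puts [set [set $grill]]
42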

Local vs. global

A local variable exists only in the procedure where it is defined, and is freed as soon as the procedure finishes. By default, all variables used in a proc are local.

Global variables exist outside of procedures, as long as they are not explicitly unset. They may be needed for long-living data, or implicit communication between different procedures, but in general it's safer and more efficient to use globals as sparingly as possible. Example of a very simple bank with only one account:

 set balance 0 ;# this creates and initializes a global variable

 proc deposit {amount} {
    global balance
    set balance [expr {$balance + $amount}]
 }

 proc withdraw {amount} {
    set ::balance [expr {$::balance - $amount}]
 }

This illustrates two ways of referring to global variables - either with the global command, or by qualifying the variable name with the :: prefix. The variable amount is local in both procedures, and its value is that of the first argument to the respective procedure.

Introspection:

info vars ;#-- lists all visible variables
info locals
info globals

To make all global variables visible in a procedure (not recommended):

eval global [info globals]

Scalar vs. array

All of the value types discussed above in Data types can be put into a scalar variable, which is the normal kind.

Arrays are collections of variables, indexed by a key that can be any string, and in fact implemented as hash tables. What other languages call "arrays" (vectors of values indexed by an integer), would in Tcl rather be lists. Some illustrations:

#-- The key is specified in parens after the array name
set         capital(France) Paris

#-- The key can also be substituted from a variable:
set                  country France
puts       $capital($country)

#-- Setting several elements at once:
array set   capital         {Italy Rome  Germany Berlin}

#-- Retrieve all keys:
array names capital    ;#-- Germany Italy France -- quasi-random order

#-- Retrieve keys matching a glob pattern:
array names capital F* ;#-- France

A fanciful array name is "" (the empty string, therefore we might call this the "anonymous array" :) which makes nice reading:

set (example) 1
puts $(example)

Note that arrays themselves are not values. They can be passed in and out of procedures not as $capital (which would try to retrieve the value), but by reference. The dict type (available from Tcl 8.5) might be better suited for these purposes, while otherwise providing hash table functionality, too.
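
Here is a minimal sketch of passing an array by name, reusing the capital array from above (countryOf and its argument names are made up for this example):

proc countryOf {_arr city} {
    upvar 1 $_arr capital          ;# link the caller's array to a local name
    foreach country [array names capital] {
        if {$capital($country) eq $city} {return $country}
    }
    return ""
}
countryOf capital Paris   ;#--> France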

System variables

At startup, tclsh provides the following global variables:

argc
number of arguments on the command line
argv
list of the arguments on the command line
argv0
name of the executable or script (first word on command line)
auto_index
array with instructions from where to load further commands
auto_oldpath
(same as auto_path ?)
auto_path
list of paths to search for packages
env
array, mirrors the environment variables
errorCode
type of the last error, or {}, e.g. ARITH DIVZERO {divide by zero}
errorInfo
last error message, or {}
tcl_interactive
1 if interpreter is interactive, else 0
tcl_libPath
list of library paths
tcl_library
path of the Tcl system library directory
tcl_patchLevel
detailed version number, e.g. 8.4.11
tcl_platform
array with information on the operating system
tcl_rcFileName
name of the initial resource file
tcl_version
brief version number, e.g. 8.4

One can use temporary environment variables to control a Tcl script from the command line, at least in Unixoid systems including Cygwin. Example scriptlet:

set foo 42
if [info exists env(DO)] {eval $env(DO)}
puts foo=$foo

This script will typically report

 foo=42

To remote-control it without editing, set the DO variable before the call:

DO='set foo 4711' tclsh myscript.tcl

which will evidently report

foo=4711

Dereferencing variables

A reference is something that refers, or points, to another something (if you pardon the scientific expression). In C, references are done with pointers (memory addresses); in Tcl, references are strings (everything is a string), namely names of variables, which via a hash table can be resolved (dereferenced) to the "other something" they point to:

puts foo       ;# just the string foo
puts $foo      ;# dereference variable with name of foo
puts [set foo] ;# the same

This can be done more than one time with nested set commands. Compare the following C and Tcl programs, that do the same (trivial) job, and exhibit remarkable similarity:

#include <stdio.h>
int main(void) {
  int    i =      42;
  int *  ip =     &i;
  int ** ipp =   &ip;
  int ***ippp = &ipp;
  printf("hello, %d\n", ***ippp);
  return 0;
}

...and Tcl:

set i    42
set ip   i
set ipp  ip
set ippp ipp
puts "hello, [set [set [set [set ippp]]]]"

The asterisks in C correspond to calls to set in Tcl dereferencing. There is no corresponding operator to the C & because, in Tcl, special markup is not needed in declaring references. The correspondence is not perfect; there are four set calls and only three asterisks. This is because mentioning a variable in C is an implicit dereference. In this case, the dereference is used to pass its value into printf. Tcl makes all four dereferences explicit (thus, if you only had 3 set calls, you'd see hello, i). A single dereference is used so frequently that it is typically abbreviated with $varname, e.g.

puts "hello, [set [set [set $ippp]]]"

has set where C uses asterisks, and $ for the last (default) dereference.

The hashtable for variable names is either global, for code evaluated in that scope, or local to a proc. You can still "import" references to variables in scopes that are "higher" in the call stack, with the upvar and global commands. (The latter being automatic in C if the names are unique. If there are identical names in C, the innermost scope wins).

Variable traces

One special feature of Tcl is that you can associate traces with variables (scalars, arrays, or array elements) that are evaluated optionally when the variable is read, written to, or unset.

Debugging is one obvious use for that. But there are more possibilities. For instance, you can introduce constants where any attempt to change their value raises an error:

proc const {name value} {
  uplevel 1 [list set $name $value]
  uplevel 1 [list trace var $name w {error constant ;#} ]
}

const x 11
incr x
can't set "x": constant

The trace callback gets three words appended: the name of the variable; the array key (if the variable is an array, else ""), and the mode:

  • r - read
  • w - write
  • u - unset

If the trace is just a single command like above, and you don't want to handle these three, use a comment ";#" to shield them off.

Another possibility is tying local objects (or procs) with a variable - if the variable is unset, the object/proc is destroyed/renamed away.
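
As a small illustration of the debugging use mentioned above, a write trace can log every assignment (logWrite and the variable y are made up for this sketch; the callback reaches the traced variable via upvar):

proc logWrite {name key op} {
    upvar 1 $name var
    puts "$name is now: $var"
}
trace var y w logWrite
set y 42    ;# prints: y is now: 42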

Commands and Functions

Commands

Commands are basically divided into C-defined ones, procedures, and aliases (and, from Tcl 8.5 onwards, ensembles). You can rename any command with

rename oldname newname

To delete a command (make it no more reachable), use the empty string as new name:

rename oldname {}

Introspection: Get a list of all defined commands with

info commands

C-defined commands

These are implemented in C and registered to be available in the Tcl interpreter. They can come from the Tcl core, or a loaded shared library (DLL) - for instance Tk.

To get a list of built-in commands, subtract the result of info procs from that of info commands:

set builtins {}
set procs [info procs]
foreach cmd [info commands] {
   if {[lsearch -exact $procs $cmd] == -1} {lappend builtins $cmd}
}

The following C-defined commands are available in a fresh tclsh. For detailed documentation, see the respective man pages, e.g. at http://www.tcl.tk/man/tcl8.5/TclCmd/ - I will characterize each only very briefly:

after
group of commands for timed events
after msec ?script?
waits the given time, or executes the script after that time
append varName arg..
appends the arguments to a string variable
array
group of commands for arrays
binary
group of commands for binary scanning and formatting
break
terminate current loop
case
deprecated, use switch
catch script ?varName?
catch possible error in script
cd path
change working directory
clock
group of commands dealing with date and time
close handle
closes a channel (file, socket, etc.)
concat list..
make one space-separated list of the arguments
continue
start next turn of current loop
encoding
group of commands dealing with character set encoding
eof handle
1 if channel is at end of file, else 0
error message ?info? ?code?
raise an error with the given message
eval arg..
evaluate the arguments as script
exec file arg..
execute a separate process
exit ?int?
terminate this process, return status 0..127
expr arg..
arithmetic and logic engine, using C-like syntax and functions (variables referenced with $name). In addition, from Tcl 8.4 there are eq and ne operators for string equal or not; from 8.5, also in and ni operators for list inclusion or not
expr {"foo" in {foo bar grill}} == 1

The argument to expr should in most cases be {braced}. This prevents the Tcl parser from substituting variables in advance, while expr itself has to parse the value from the string. In a braced expression expr can parse variable references itself, and get their numeric value directly where possible. Much faster, usually. The only exception, where bracing should not be used, is if you want to substitute operators from variables:

foreach op {+ - * /} {puts [expr 1 $op 2]}
fblocked handle
returns 1 if the last input operation exhausted all available input, else 0
fconfigure handle -option value...
configure a channel, e.g. its encoding or line-end translation
fcopy handle1 handle2
copy data from handle1 to handle2
file
group of commands dealing with files
fileevent
group of commands dealing with events from channels (readable, writable) but not files
flush handle
make sure the channel's buffer is written out. Useful after puts -nonewline
for initbody condition stepbody body
loop, somehow similar to C's for
foreach varlist list ?varlist list...? body
loop over one or more lists. Each varlist can be a single varName or multiple varNames. Example:
% foreach {x y} {1 0  1 2  0 2  0 0} {puts "x:$x, y:$y"}
x:1, y:0
x:1, y:2
x:0, y:2
x:0, y:0
format fstring arg..
put the arguments %-formatted into fstring, similar to C's sprintf()
gets handle ?varName?
read a line from handle. If variable is given, assigns the line to it and returns the number of characters read; else returns the line. Guaranteed to be safe against buffer overflows
glob ?-options? pattern..
list of files matching the glob pattern (which can contain * and ? wildcards)
global varName..
declare the given variable(s) as global
history
list the last interactive commands
if condition ?then? body1 ?elseif condition body2...? ??else? bodyN?
conditional
incr varName ?amount?
increments the integer variable by given amount (defaults to 1). Use negative amount to decrement
info
group of commands for introspection
interp
group of commands for interpreters
join list ?separator?
Turn a list into a string, with separator between elements (defaults to " ")
lappend varName arg..
appends the arguments to the list variable. Can also be used to make sure a variable exists:
lappend x ;# corresponds to: if {![info exists x]} {set x ""}
lindex list int..
retrieve an element from the list by integer index(es)
linsert list int arg..
inserts the arguments at int position into list
list ?arg..?
creates a list from the arguments
llength list
length of the list
load filename ?name?
loads a shared library (DLL)
lrange list from to
returns a sublist at integer indexes from-to
lreplace list from to arg..
replaces the sublist in list with the arguments
lsearch ?-options? list element
searches the list for the element, returns its integer index, or -1 if not found. Can be used to select a subset of elements from a list (using the -all option)
lset varName int.. value
sets an existing element in the named list variable, indexed by integer(s), to the given value
lsort ?-options? list
sorts the list
namespace
group of commands dealing with namespaces
open name ?mode ?permissions??
opens a file or pipe, returns the handle
package
group of commands dealing with packages
pid ?handle?
returns the id of the current process. Can also return the list of pids for a pipeline given the pipeline channel
proc name arglist body
defines a procedure
puts ?-nonewline? ?channel? string
outputs a line to the given channel (default stdout). To prevent errors from a closed pipe (like more or head), use
proc puts! str {if [catch {puts $str}] exit}
pwd
returns the current working directory
read handle ?int?
reads int bytes from handle (all if int not given)
regexp ?-options? re string ?varName...?
regular expression matching of re in string, possibly assigning parenthesized submatches to the given variables
regsub ?-options? re value substring ?varName?
substitutes occurrences of the regular expression re in value with substring. If varName is given, assigns the new value to it, and returns the number of substitutions; else returns the new value
rename cmdName1 cmdName2
renames a command from cmdName1 to cmdName2, or deletes cmdName1 if cmdName2 is {}
return ?value?
exits from the current proc or sourced script
scan string format ?varName...?
extracts values by %-format in string to the given variables. Similar to C's sscanf()
seek channelId offset ?origin?
moves pointer in file to the given position
set varName ?value?
sets variable to value if given, and returns the variable's value
socket ?-myaddr addr? ?-myport myport? ?-async? host port
open the client side of a TCP connection as a channel
socket -server command ?-myaddr addr? port
open the server side of a TCP connection, register a handler callback command for client requests
source filename
evaluate the contents of the given file
split list ?charset?
splits a string into a list, using any of the characters in charset string as delimiters (defaults to " ")
string
group of commands dealing with strings
subst ?-options? string
performs command, variable, and/or backslash substitutions in string
switch ?-options? ?--? value alternatives
performs one of the alternatives if the value matches
tell handle
return the byte position inside a file
time body ?int?
runs the body for int times (default 1), returns how many microseconds per iteration were used
trace
group of commands to tie actions to variables or commands
unset varName..
delete the given variable(s)
update ?idletasks?
services events
uplevel ?level? body
evaluates the body up in the call stack
upvar ?level? varName localVarName...
ties the given variables up in the call stack to the given local variables. Used for calling by reference, e.g. for arrays
variable varName ?value ?varName value...??
declare the variables as being non-local in a namespace
vwait varName
suspend execution until the given variable has changed. Starts event loop, if not active yet
while condition body
performs body as long as condition is not 0

Procedures

Procedures in Tcl cover what other languages call procedures, subroutines, or functions. They always return a result (even if it is the empty string ""), so to call them functions might be most appropriate. But for historical reasons, the Tcl command to create a function is called proc and thus people most often call them procedures.

proc name argumentlist body

Examples:

proc sum {a b} {return [expr {$a+$b}]}

The return is redundant, as the proc returns anyway when reaching its end, and returning its last result:

proc sum {a b} {expr {$a+$b}}

The following variant is more flexible, as it takes any number of arguments (the special argument name args collects all remaining arguments into a list, which is the value of the parameter args):

proc sum args {
    set res 0
    foreach arg $args {set res [expr {$res + $arg}]}
    return $res
}

An elegant but less efficient alternative builds a string by joining the args with plus signs, and feeds that to expr:

proc sum args {expr [join $args +]}

If an argument in a proc definition is a list of two elements, the second is taken as default value if not given in the call ("Sir" in this example):

proc greet {time {person Sir}} {return "good $time, $person"}
% greet morning John
good morning, John
% greet evening
good evening, Sir

Introspection: Get the names of all defined procedures with

info procs

There are also info subcommands to get the argument list, possible default arguments, and the body of a proc. The following example combines them to recreate the textual form of a proc, given its name (corp being proc in reverse):

proc corp name {
   set argl {}
   foreach arg [info args $name] {
      if [info default $name $arg def] {lappend arg $def}
      lappend argl $arg
   }
   list proc $name $argl [info body $name]
}
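
For example, applied to the greet proc defined above, corp reconstructs the definition including the default argument:

% corp greet
proc greet {time {person Sir}} {return "good $time, $person"}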

Using rename, you can overload any command, including the C-coded ones. First rename the original command to something else, then reimplement it with the same signature, where ultimately the original is called. Here is for instance an overloaded proc that reports if a procedure is defined more than once with the same name:

rename proc _proc
_proc proc {name argl body} {
   if {[info procs $name] eq $name} {
       puts "proc $name redefined in [info script]"
   }
   _proc $name $argl $body
}

Named arguments: Arguments to commands are mostly by position. But it's very easy to add the behavior known from Python or Ada, that arguments can be named in function calls, which documents the code a bit better, and allows any order of arguments.

The idea (as found in Welch's book) is to use an array (here called "" - the "anonymous array") keyed by argument names. Initially, you can set some default values, and possibly override them with the args of the proc (which has to be paired, i.e. contain an even number of elements):

proc named {args defaults} {
   upvar 1 "" ""
   array set "" $defaults
   foreach {key value} $args {
     if {![info exists ($key)]} {
        set names [lsort [array names ""]]
        error "bad option '$key', should be one of: $names"
     }
     set ($key) $value
   }
}

Usage example:

proc replace {s args} {
  named $args {-from 0 -to end -with ""}
  string replace $s $(-from) $(-to) $(-with)
}

Testing:

% replace suchenwirth -from 4 -to 6 -with xx
suchxxirth
% replace suchenwirth -from 4 -to 6 -witha xx
bad option '-witha', should be one of: -from -to -with

Argument passing by name or value

edit

Normally, arguments to commands are passed by value (as constants, or with $ prefixed to a variable name). This securely prevents side-effects, as the command will only get a copy of the value, and not be able to change the variable.

However, sometimes just that is wanted. Imagine you want a custom command to set a variable to zero. In this case, at call time specify the name of the variable (without $), and in the proc use upvar to link the name (in the scope "1 up", i.e. the caller's) to a local variable. I usually put a "_" before arguments that are variable names (e.g. _var), and upvar to the same name without "_" (e.g. var):

% proc zero _var {upvar 1 $_var var; set var 0}
% set try 42
42
% zero try
0
% set try
0

If you often use call by reference, you could indicate such arguments with a special pattern (e.g. &arg) and have the following code generate the necessary upvars:

proc use_refs { {char &}} {
   foreach v [uplevel 1 {info locals}] {
       if [string match $char* $v] {
           uplevel 1 "upvar 1 \${$v} [string range $v 1 end]"
       }
   }
}

That's all. This command is preferably called first inside a proc, and upvars all arguments that begin with a specific character, the default being "&" - it runs code like

upvar 1 ${&foo} foo

in the caller's scope. Testing:

proc test_refs {a &b} {
   use_refs
   puts a=$a,b=$b
   set b new_value
}
% set bar 42
42
% test_refs foo bar
a=foo,b=42

So the values of a (by value) and b (by reference) are readable; and the side effect of changing b in the caller did also happen:

% set bar
new_value

Variable scope

edit

Inside procedures, variables are by default local. They exist only in the proc, and are cleared up on return. However, you can tie local variables to others higher in the call stack (e.g. in the caller), up to the topmost global scope. Examples:

proc demo arg {
   global g
   set    g 0            ;# will effect a lasting change in g
   set local 1           ;# will disappear soon
   set ::anotherGlobal 2 ;# another way to address a global variable
   upvar 1 $arg myArg    ;# make myArg point at a variable 1-up
   set          myArg 3  ;# changes that variable in the calling scope
}

Aliases

edit

One can also define a command as an alias to a sequence of one or more words, which will be substituted for it before execution. (The funny {} arguments are names of the source and target interpreter, which typically is the current one, named by the empty string {} or ""). Examples:

interp alias {} strlen {} string length
interp alias {} cp     {} file copy -force
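
After these definitions, the aliases can be used like any other command:

% strlen "hello world"
11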

Introspection: Get the names of all defined aliases with

interp aliases

Advanced concepts

edit

Interpreters

edit

Tcl being an interpreted (plus on-the-fly byte-compiled) language, an interpreter is of course a central kind of object. Every time Tcl is running, at least one interpreter is running, which takes scripts and evaluates them.

One can also create further "slave" interpreters to encapsulate data and processes, which again can have their "sub-slaves", etc., ultimately forming a tree hierarchy. Examples:

% interp create helper
helper
% helper eval {expr 7*6}
42
% interp delete helper
% helper eval {expr 1+2}
invalid command name "helper"

By deleting an interpreter, all its global (and namespaced) variables are freed too, so you can use this for modularisation and encapsulation, if needed.

In particular, safe interpreters have intentionally limited capabilities (for instance, access to the file system or the Web) so that possibly malicious code from over the Web cannot create major havoc.
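
A minimal sketch (the interpreter name jail is arbitrary) - in a safe interpreter, commands like open, exec, and socket are hidden:

% interp create -safe jail
jail
% jail eval {expr 6*7}
42
% jail eval {open /etc/passwd}
invalid command name "open"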

Introspection: The following command lists the sub-interpreters ("slaves") of the current interpreter:

% interp slaves

Ensembles

edit

Ensembles (from Tcl 8.5 on) are commands that are composed out of sub-commands according to a standard pattern. Examples include Tcl's built-in chan and clock commands. Dispatching of subcommands, as well as informative error messages for non-existing subcommands, is built in. Subcommands are given in a dict structure passed as the -map option, with alternating name and action. Very simple example:

namespace ensemble create -command foo -map \
      {bar {puts Hello} grill {puts World}}

creates a command foo that can be called like

% foo bar
Hello
% foo grill
World
% foo help
unknown or ambiguous subcommand "help": must be bar, or grill

Obviously, ensembles are also a good base for implementing object orientation, where the command is the name of the object, and the map contains its methods.

Introspection: Serialize an ensemble's map with

namespace ensemble configure $name -map
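
For the foo ensemble created above, this returns the map it was created with:

% namespace ensemble configure foo -map
bar {puts Hello} grill {puts World}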

Namespaces

edit

Namespaces are containers for procedures, non-local variables, and other namespaces. They form a tree structure, with the root at the global namespace named "::". Their names are built with :: as separators too, so ::foo::bar is a child of ::foo, which is a child of :: (similar to pathnames on Unix, where / is the separator as well as the root).

In a nutshell, a namespace is a separate area, or scope, where procedures and variables are visible and private to that scope.

To create a namespace, just evaluate some script (which may be empty) in it:

namespace eval ::foo {}

Now you can use it to define procedures or variables:

proc ::foo::test {} {puts Hello!}
set  ::foo::var 42

To get rid of a namespace (and clean up all its variables, procs, and child namespaces):

namespace delete ::foo

Introspection:

namespace children ::
info vars namespace::*
info commands namespace::*

The following code gives an approximate size in bytes consumed by the variables and children of a Tcl namespace (of which ::, the global namespace, is of particular interest - all the (grand)*children are also added). If you call this proc repeatedly, you can observe whether data are piling up:

proc namespace'size ns {
  set sum [expr wide(0)]
  foreach var [info vars ${ns}::*] {
      if {[info exists $var]} {
          upvar #0 $var v
          if {[array exists v]} {
              incr sum [string bytelength [array get v]]
          } else {
              incr sum [string bytelength $v]
          }
      }
  }
  foreach child [namespace children $ns] {
      incr sum [namespace'size $child]
  }
  set sum
}

Usage example:

% puts [namespace'size ::]
179914

Threads

edit

Tcl users have traditionally been skeptical about threads (lightweight concurrent sub-processes) - the event loop model has proved pretty powerful, and simpler to debug. Originally from Tk, the event loop has moved into Tcl and serves

  • fileevents (more on channels than real files)
  • timed events
  • UI events (mouse or keyboard actions by the user)

However, there is a growing tendency to enable threads in Tcl builds. The underlying model is that every thread runs in an interpreter of its own, so it is mostly encapsulated from the rest of the world. Communication between threads must be done with explicit methods.
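
As a minimal sketch of that model (assuming the Thread extension is installed; the variable name worker is arbitrary):

package require Thread
set worker [thread::create {thread::wait}]  ;# new thread, with its own interpreter and event loop
puts [thread::send $worker {expr {6*7}}]    ;# synchronous send: evaluated in the worker, prints 42
thread::release $worker                     ;# let the worker thread terminate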

Packages and extensions

edit

Packages are Tcl's recommended way to modularize software, especially supportive libraries. The user most often uses the command

package require name ?version?

One can write packages in pure Tcl, or as wrappers for extensions which come with one or more compiled shared libraries (plus optionally one or more Tcl scripts). Popular extensions are:

  • BWidget (adds useful widgets to Tk - more below)
  • Expect (automates interaction with other interactive programs, e.g. for remote sessions)
  • Img (adds support of additional image file formats to Tk)
  • snack (sound input/output)
  • Snit (OO extension, with support for "megawidgets" in Tk)
  • sqlite (a tiny yet powerful SQL database)
  • tcllib (a collection of pure-Tcl packages - see below)
  • TclOO (the canonical object-orientation extension from 8.5)
  • tcltcc (a built-in C compiler - see below)
  • TclX (collection of system-related extensions, like signal handling)
  • tdom (XML parser, SAX or DOM, with XPath query support)
  • Tk (the cross-platform GUI toolkit, see below)
  • tkcon (a vastly extended console)
  • XOTcl (advanced dynamic OO extension)

A little example package

edit

The following script creates the trivial but educational package futil, which in a namespace of the same name implements two procs for reading and writing complete text files, and the little introspection helper function, futil::?. The command to register the package (package provide) is executed only after all went well - this way, buggy source code, which raises an error during package require, will not be registered. (Other bugs you'd have to spot and fix for yourself...)

Common Tcl distribution practice has the good habit of thorough testing, typically in a separate test directory. On the other hand, including a self-test in the same file with the code makes editing easier, so after the package provide comes a section only executed if this file is sourced as a top-level script, which exercises the commands defined in futil. Whether the string read should be equal to the string written is debatable - the present implementation appends \n to the string if it doesn't end in one, because some tools complain or misbehave if they don't see a final newline.

If the tests do not run into an error either, even the required construction of a package index is fired - assuming the simplified case that the directory contains only one package. Otherwise, you'd better remove these lines, and take care of index creation yourself.

A script that uses this package will only have to contain the two lines

lappend ::auto_path <directory of this file>
package require futil

You can even omit the first line, if you install (copy) the directory with the source and pkgIndex.tcl below ${tcl_install_directory}/lib.

namespace eval futil {
    set version 0.1
}

But now back to the single script that makes up the package (it would make sense to save it as futil.tcl). We provide a read and a write function in the futil namespace, plus a little introspection function ? that returns the names of available functions:

proc futil::read {filename} {
   set fp [open $filename]
   set string [::read $fp] ;# prevent name conflict with itself
   close $fp
   return $string
}
proc futil::write {filename string} {
   set fp [open $filename w]
   if {[string index $string end]!="\n"} {append string \n}
   puts -nonewline $fp $string
   close $fp
}
proc futil::? {} {lsort [info procs ::futil::*]}
# If execution comes this far, we have succeeded ;-)
package provide futil $futil::version
#--------------------------- Self-test code
if {[info exists argv0] && [file tail [info script]] eq [file tail $argv0]} {
   puts "package futil contains [futil::?]"
   set teststring {
       This is a teststring
       in several lines...}
   puts teststring:'$teststring'
   futil::write test.tmp $teststring
   set string2 [futil::read test.tmp]
   puts string2:'$string2'
   puts "strings are [expr {$teststring==$string2? {}:{not}}] equal"
   file delete test.tmp ;# don't leave traces of testing
   # Simple index generator, if the directory contains only this package
   pkg_mkIndex -verbose [file dirname [info script]] [file tail [info script]]
}

Tcllib

edit

Tcllib is a collection of packages in pure Tcl. It can be obtained from SourceForge, but is also part of ActiveTcl. The following list of contained packages may not be complete, as Tcllib is steadily growing...

  • aes - Advanced Encryption Standard.
  • asn - asn.1 BER encoder/decoder
  • http/autoproxy - code to automate the use of HTTP proxy servers
  • base64 - Base64 encoding and decoding of strings and files.
  • bee - BitTorrent serialization encoder/decoder.
  • bibtex - Neil Madden's parser for bibtex files. Not fully complete yet, therefore not set for installation.
  • calendar - Calendar operations (see also tcllib calendar module).
  • cmdline - Various forms of command line and option processing.
  • comm - Socket based interprocess communication. Emulates the form of Tk's send command.
  • control - procedures for tcl flow structures such as assert, do/until, do/while, no-op
  • counter - procedures for counters and histograms
  • crc - Computation of various CRC checksums for strings and files.
  • csv - manipulate comma separated value data
  • des - Data Encryption Standard. ::DES::des (not yet installed)
  • dns - interact with the Domain Name Service. dns::address, dns::cleanup, dns::cname, dns::configure, dns::name, dns::reset, dns::resolve, dns::status, dns::wait,
  • doctools - System for writing manpages/documentation in a simple, yet powerful format.
  • exif - exif::analyze exif::fieldnames
  • fileutil - Utilities for operating on files, emulating various unix command line applications (cat, find, file(type), touch, ...).
  • ftp - Client side implementation of FTP (File Transfer Protocol). In dire need of a rewrite.
  • ftpd - Server side implementation of FTP
  • grammar_fa - Operations on finite automatons.
  • html - generate HTML from a Tcl script. html::author, html::author, html::bodyTag, html::cell, html::checkbox, html::checkSet, html::checkValue, html::closeTag, html::default, html::description, html::description, html::end, html::eval, html::extractParam, html::font, html::for, html::foreach, html::formValue, html::getFormInfo, html::getTitle, html::h, html::h1, html::h2, html::h3, html::h4, html::h5, html::h6, html::hdrRow, html::head, html::head, html::headTag, html::if, html::init, html::init, html::keywords, html::keywords, html::mailto, html::meta, html::meta, html::minorList, html::minorMenu, html::openTag, html::paramRow, html::passwordInput, html::passwordInputRow, html::radioSet, html::radioValue, html::refresh, html::refresh, html::row, html::select, html::selectPlain, html::set, html::submit, html::tableFromArray, html::tableFromList, html::tagParam, html::textarea, html::textInput, html::textInputRow, html::title, html::title, html::urlParent, html::varEmpty, html::while,
  • htmldoc - This is not a true module but the place where tcllib 1.3 installed the tcllib documentation in HTML format.
  • htmlparse - procedures to permit limited manipulation of strings containing HTML. ::htmlparse::parse, ::htmlparse::debugCallback, ::htmlparse::mapEscapes, ::htmlparse::2tree, ::htmlparse::removeVisualFluff, ::htmlparse::removeFormDefs,
  • ident - RFC 1413 ident client protocol implementation
  • imap4 - currently undocumented code for interacting with an IMAP server
  • inifile - code to manipulate an initialization file. ::ini::open, ::ini::close, ::ini::commit, ::ini::sections, ::ini::keys, ::ini::value
  • dns/ip - Manipulation of IP addresses. ::ip::version, ::ip::is, ::ip::normalize, ::ip::equal, ::ip::prefix
  • irc - Internet Relay Chat procedures. irc::config, irc::connection,
  • javascript - generate Javascript for including in HTML pages. javascript::BeginJS, javascript::EndJS, javascript::MakeMultiSel, javascript::MakeClickProc, javascript::makeSelectorWidget, javascript::makeSubmitButton, javascript::makeProtectedSubmitButton, javascript::makeMasterButton, javascript::makeParentCheckbox, javascript::makeChildCheckbox
  • jpeg - edit comment blocks, get image dimensions and information, read exif data of images in the JPG format
  • ldap - Client side implementation of LDAP (Lightweight Directory Access Protocol).
  • log - general procedures for adding log entries to files ::log::levels, ::log::logMsg, ::log::lv2longform, ::log::lv2color,::log::lv2priority,
  • logger - ::logger::walk, ::logger::services, ::logger::enable, ::logger::disable (part of the log module)
  • math - general mathematical procedures. ::math::calculus, ::math::combinatorics, ::math::cov, ::math::fibonacci, ::math::integrate, ::math::interpolate, ::math::max, ::math::mean, ::math::min, ::math::optimize, ::math::product, ::math::random, ::math::sigma, ::math::statistics, ::math::stats, ::math::sum
  • md4 -  ::md4::md4, ::md4::hmac, ::md4::MD4Init, ::md4::MD4Update, ::md4::MD4Final
  • md5 - [fill in the description of this module] ::md5::md5, ::md5::hmac, ::md5::test, ::md5::time, ::md5::<<<
  • md5crypt -  ::md5crypt::md5crypt, ::md5crypt::aprcrypt
  • mime - ::mime::initialize, ::mime::parsepart, ::mime::finalize, ::smtp::sendmessage
  • multiplexer - [fill in the external interfaces]
  • ncgi - procedures for use in a CGI application. ::ncgi::reset, ::ncgi::urlStub, ::ncgi::urlStub
  • nntp - routines for interacting with a usenet news server. ::nntp::nntp, ::nntp::NntpProc, ::nntp::NntpProc, ::nntp::okprint, ::nntp::message,
  • ntp - network time protocol procedure ::ntp::time
  • png - edit comment blocks, get image dimensions and information for Portable Network Graphics format.
  • pop3 - Post Office Protocol functions for reading mail from a pop3 server. ::pop3::open, ::pop3::close, ::pop3::status,
  • pop3d - Post Office Protocol Server. pop3d::new
  • profiler -  ::profiler::tZero, ::profiler::tMark, ::profiler::stats, ::profiler::Handler, ::profiler::profProc, ::profiler::init
  • rc4 - stream encryption. ::rc4::rc4
  • report - format text in various report styles. ::report::report , ::report::defstyle, ::report::rmstyle,
  • sha1 -  ::sha1::sha1, ::sha1::hmac
  • smtpd - ::smtpd::start, ::smtpd::stop, ::smtpd::configure, ::smtpd::cget
  • snit - Snit's Not Incr Tcl - OO package. Delegation based. ::snit::type, ::snit::widget, ::snit::widgetadaptor
  • soundex::knuth - string matching based on theoretical sound of the letters
  • stooop - OO package. stooop::class, stooop::virtual, stooop::new, stooop::delete, stooop::classof
  • struct1 - Version 1 of struct (see below), provided for backward compatibility.
struct::list, ::struct::graph, ::struct::matrix, ::struct::queue, ::struct::stack, ::struct::Tree, ::struct::record, ::struct::skiplist, ::struct::prioqueue, new: ::struct::sets
  • tar - untar, list, and stat files in tarballs and create new tarballs
  • textutil - Utilities for working with larger bodies of texts. textutil::expand - the core for the expand macro processor.
  • tie - Persistence for Tcl arrays.
  • treeql - Tree Query Language, inspired by COST.
  • uri - Handling of uri/urls (splitting, joining, ...)
  • uuid - Creation of unique identifiers.

TclOO

edit

TclOO is a loadable package to provide a foundation for object orientation, designed so that specific OO flavors like Itcl, Snit, or XOTcl can build on it. But it's a usable OO system in itself, offering classes, multiple inheritance, mixins and filters. Here is some example code to give you an impression:

#!/usr/bin/env tclsh85
package require TclOO
namespace import oo::*
class create Account {
   constructor { {ownerName undisclosed}} {
       my variable total overdrawLimit owner
       set total 0
       set overdrawLimit 10
       set owner $ownerName
   }
   method deposit amount {
       my variable total
       set total [expr {$total + $amount}]
   }
   method withdraw amount {
       my variable {*}[info object vars [self]] ;# "auto-import" all variables
       if {($amount - $total) > $overdrawLimit} {
           error "Can't overdraw - total: $total, limit: $overdrawLimit"
       }
       set total [expr {$total - $amount}]
   }
   method transfer {amount targetAccount} {
       my variable total
       my withdraw $amount
       $targetAccount deposit $amount
       set total
   }
   destructor {
       my variable total
       if {$total} {puts "remaining $total will be given to charity"}
   }
}
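
A short interactive sketch of how this class might be used (the object names acct and savings are arbitrary):

% Account create acct "John Doe"
::acct
% acct deposit 100
100
% acct withdraw 30
70
% acct transfer 50 [Account create savings]
20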

tcltcc

edit

Tcltcc is a loadable package that wraps the Tiny C compiler (tcc) for use with Tcl. It can be used to

  • compile C code directly to memory
  • produce dynamic loadable libraries (DLLs) or executable files.

Convenience functions generate wrapper code, so that the user only needs to write the really substantial C code.

Examples:

Wrap a C function into a Tcl command "on the fly" (here shown in an interactive session):

% package require tcc
0.2
% namespace import tcc::*
% cproc sigmsg {int i} char* {return Tcl_SignalMsg(i);} 
% sigmsg 4
illegal instruction

Produce a DLL with a fast implementation of Fibonacci numbers:

% set d [tcc::dll]
% $d ccode {
     static int fib(int n) {return n <= 2? 1 : fib(n-1) + fib(n-2);}
  }
% $d cproc fiboy {int n} int {return fib(n);}
% $d write -name fiboy
% load fiboy[info sharedlibextension]
% fiboy 20
6765

Produce a tclsh with an extra square command:

% set code [tcc::wrapCmd square {double x} double x_square {return x*x;}]
% append code {
    int AppInit(Tcl_Interp *interp) {
       int rc;
       rc = Tcl_CreateObjCommand(interp,"square",x_square,NULL,NULL);
           return Tcl_Init(interp);
    }
    int main(int argc, char *argv[]) {
        Tcl_Main(argc, argv, AppInit);
        return 0;
    }
}
% tcc $::tcc::dir exe t
% t add_file    $::tcc::dir/c/crt1.c
% t add_library tcl8.5
% t compile     $code
% t output_file mytclsh.exe
% exec mytclsh.exe {<<puts [square 5]}
25.0

Tcltcc is open source, LGPL licensed, and available at http://code.google.com/p/tcltcc/ . At its current early stage (October 2007), the full functionality is only available on Windows 95/XP platforms, but in-memory compilation works on Linux too.

tDOM

edit

tDOM is a popular extension for XML/HTML processing, allowing both SAX-style "on-the-fly" parsing and the DOM approach of representing a whole XML element in memory.

Here is an example of a SAX-style application. The expat parser that comes with tDOM is instrumented with callbacks for element start, character data, and processing instructions. Elements, attributes, characters and processing instructions are counted, and a tally is kept for each element type.

 #!/usr/bin/env tclsh
 package require tdom
 #--- Callbacks for certain parser events
 proc el {name attlist} {
     global g
     incr ::nEl
     incr ::nAtt [llength $attlist]
     inc g($name)
 }
 proc ch data {
    incr ::nChar [string length $data]
 }
 proc pi {target data} {
    incr ::nPi
 }
 proc inc {varName {increment 1}} {
    upvar 1 $varName var
    if {![info exists var]} {set var 0}
    incr var $increment
 }
 #--- "main" loop
 if {![llength $argv]} {puts "usage: $argv0 file..."}
 foreach file $argv {
     foreach i {nEl nAtt nChar nPi} {set $i 0} ;# reset counters
     array unset g
     set p [expat -elementstartcommand el \
            -characterdatacommand          ch \
            -processinginstructioncommand  pi ]
     if [catch {$p parsefile $file} res] {
                puts "error:$res"
     } else {
        puts "$file:\n$nEl elements, $nAtt attributes, $nChar characters,\
            $nPi processing instructions"
        foreach name [lsort [array names g]] {
            puts [format %-20s%7d $name $g($name)]
        }
    }
    $p free
 }

expr: the arithmetic & logical unit

edit

Overview

edit

Arithmetic and logical operations (plus some string comparisons) are in Tcl concentrated in the expr command. It takes one or more arguments, evaluates them as an expression, and returns the result. The language of the expr command (also used in condition arguments of the if, for, while commands) is basically equivalent to C's expressions, with infix operators and functions. Unlike C, references to variables have to be done with $var. Examples:

set a [expr {($b + sin($c))/2.}]
if {$a > $b && $b > $c} {puts "ordered"}
for {set i 10} {$i >= 0} {incr i -1} {puts $i...} ;# countdown

The difference between Tcl syntax and expr syntax can be contrasted like this:

[f $x $y]  ;# Tcl:  embedded command
 f($x,$y)  ;# expr: function call, comma between arguments

In another contrast to Tcl syntax, whitespace between "words" is optional (but still recommended for better readability :) And string constants must always be quoted (or braced if you wish):

if {$keyword eq "foo"} ...

Then again, Tcl commands can always be embedded into expressions, in square brackets as usual:

proc max {x y} {expr {$x>$y? $x: $y}}
expr {[max $a $b] + [max $c $d]}

In expressions with numbers of mixed types, integers are coerced to doubles:

% expr 1+2.
3.0

It is important to know that division of two integers is done as integer division:

% expr 1/2
0

You get the (probably expected) floating-point division if at least one operand is double:

% expr 1/2.
0.5

If you want to evaluate a string input by the user, but always use floating-point division, just transform it, before the call to expr, by replacing "/" with "*1./" (multiply with floating-point 1. before every division):

expr [string map {/ *1./} $input]

Brace your expressions

edit

In most cases it is safer and more efficient to pass a single braced argument to expr. Exceptions are:

  • constant expressions, with no variables or embedded commands to substitute
  • expressions into which operators or whole sub-expressions are deliberately substituted before evaluation

The reason is that the Tcl parser substitutes in unbraced expressions first, and expr then parses the result again. This double evaluation can let malicious code succeed:

% set e {[file delete -force *]}
% expr $e   ;# will delete all files and directories
% expr {$e} ;# will just return the string value of e

That braced expressions evaluate much faster than unbraced ones can easily be tested:

% proc unbraced x {expr  $x*$x}
% proc braced x   {expr {$x*$x}}
% time {unbraced 42} 1000
197 microseconds per iteration
% time {braced 42} 1000
34 microseconds per iteration

The precision of the string representation of floating-point numbers is controlled by the tcl_precision variable (default 12 significant digits). In an unbraced expression, the nested [expr 1./3] is substituted in as such a string, so the following example returns nonzero because the second term was clipped to 12 digits in making the string representation:

% expr 1./3-[expr 1./3]
3.33288951992e-013

while this braced expression works more like expected:

% expr {1./3-[expr 1./3]}
0.0

Operators

edit

Arithmetic, bitwise, and logical operators are like those in C, as is the conditional operator known from C:

  • c?a:b -- if c is true, evaluate a, else b

The conditional operator can be used for compact functional code (note that the following example requires Tcl 8.5 so that fac() can be called inside its own definition):

% proc tcl::mathfunc::fac x {expr {$x<2? 1 : $x*fac($x-1)}}
% expr fac(5)
120

Arithmetic operators

edit

The arithmetic operators are also like those found in C:

  • + addition
  • - (binary: subtraction. unary: change sign)
  • * multiplication
  • / (integer division if both operands are integers)
  • % (modulo, only on integers)
  • ** power (available from Tcl 8.5)
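
A few quick examples (the ** operator requires Tcl 8.5):

% expr {7 / 2}
3
% expr {7 % 2}
1
% expr {2 ** 10}
1024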

Bitwise operators

edit

The following operators work on integers only:

  • & (AND)
  • | (OR)
  • ^ (XOR)
  • ~ (NOT)
  • << shift left
  • >> shift right
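
For example:

% expr {6 & 3}
2
% expr {6 | 3}
7
% expr {6 ^ 3}
5
% expr {1 << 4}
16
% expr {~0}
-1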

Logical operators

edit

The following operators take integers (where 0 is considered false, everything else true) and return a truth value, 0 or 1:

  • && (and)
  • || (or)
  • ! (not - unary)

Comparison operators

edit

If the operands on both sides are numeric, these operators compare them as numbers. Otherwise, string comparison is done. They return a truth value, 0 (false) or 1 (true):

  • == equal
  • != not equal
  • > greater than
  • >= greater than or equal
  • < less than
  • <= less than or equal

As truth values are integers, you can use them as such for further computing, as the sign function demonstrates:

proc sgn x {expr {($x>0) - ($x<0)}}
% sgn 42
1
% sgn -42
-1
% sgn 0
0

String operators

edit

The following operators work on the string representation of their operands:

  • eq string-equal
  • ne not string-equal

Examples how "equal" and "string equal" differ:

% expr {1 == 1.0}
1
% expr {1 eq 1.0}
0

List operators

edit

From Tcl 8.5, the following operators are also available:

  • a in b - 1 if a is a member of list b, else 0
  • a ni b - 1 if a is not a member of list b, else 0

Before 8.5, it's easy to write an equivalent function

proc in {list el} {expr {[lsearch -exact $list $el]>=0}}

Usage example:

if [in $keys $key] ...

which you can rewrite, once 8.5 is available wherever your work is to run, with

if {$key in $keys} ...

Functions

edit

The following functions are built-in:

  • abs(x) - absolute value
  • acos(x) - arc cosine. acos(-1) = 3.14159265359 (Pi)
  • asin(x) - arc sine
  • atan(x) - arc tangent
  • atan2(y,x)
  • ceil(x) - next-highest integral value
  • cos(x) - cosine
  • cosh(x) - hyperbolic cosine
  • double(x) - convert to floating-point number
  • exp(x) - e to the x-th power. exp(1) = 2.71828182846 (Euler number, e)
  • floor(x) - next-lower integral value
  • fmod(x,y) - floating-point modulo
  • hypot(y,x) - hypotenuse (sqrt($y*$y+$x*$x), but at higher precision)
  • int(x) - convert to integer (32-bit)
  • log(x) - logarithm to base e
  • log10(x) - logarithm to base 10
  • pow(x,y) - x to the y-th power
  • rand() - random number > 0.0 and < 1.0
  • round(x) - round a number to nearest integral value
  • sin(x) - sine
  • sinh(x) - hyperbolic sine
  • sqrt(x) - square root
  • srand(x) - initialize random number generation with seed x
  • tan(x) - tangent
  • tanh(x) - hyperbolic tangent
  • wide(x) - convert to wide (64-bit) integer
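
A few examples:

% expr {hypot(3,4)}
5.0
% expr {int(2.718)}
2
% expr {round(2.718)}
3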

Find out which functions are available with info functions:

% info functions
round wide sqrt sin double log10 atan hypot rand abs acos atan2 srand
sinh floor log int tanh tan asin ceil cos cosh exp pow fmod

Exporting expr functionalities

edit

If you don't want to write [expr {$x+5}] every time you need a little calculation, you can easily export operators as Tcl commands:

foreach op {+ - * / %} {proc $op {a b} "expr {\$a $op \$b}"}

After that, you can call these operators like in LISP:

% + 6 7
13
% * 6 7
42

Of course, one can refine this by allowing variable arguments at least for + and *, or the single-argument case for -:

proc - {a {b ""}} {expr {$b eq ""? -$a: $a-$b}}

Similarly, expr functions can be exposed:

foreach f {sin cos tan sqrt} {proc $f x "expr {${f}(\$x)}"}

In Tcl 8.5, the operators can be called as commands in the ::tcl::mathop namespace:

% tcl::mathop::+ 6 7
13

You can import them into the current namespace, for shorthand math commands:

% namespace import ::tcl::mathop::*
% + 3 4 ;# way shorter than [expr {3 + 4}]
7
% * 6 7
42

User-defined functions

edit

From Tcl 8.5, you can provide procs in the ::tcl::mathfunc namespace, which can then be used inside expr expressions:

% proc tcl::mathfunc::fac x {expr {$x < 2? 1: $x * fac($x-1)}}
% expr fac(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

This is especially useful for recursive functions, or functions whose arguments need some expr calculations:

% proc ::tcl::mathfunc::fib n {expr {$n<2? 1: fib($n-2)+fib($n-1)}} 
% expr fib(6)
13

Interaction and debugging

edit

Tcl itself is quite a good teacher. Don't be afraid to do something wrong - it will most often deliver a helpful error message. When tclsh is called with no arguments, it starts in an interactive mode and displays a "%" prompt. The user types something in and sees what comes out: either the result or an error message.

Trying isolated test cases interactively, and pasting the command into the editor when satisfied, can greatly reduce debugging time (there is no need to restart the application after every little change - just make sure it's the right one, before restarting.)

A quick tour

edit

Here's a commented session transcript:

% hello
invalid command name "hello"

OK, so we're supposed to type in a command. Although it doesn't look so, here's one:

% hi
     1  hello
     2  hi

Interactive tclsh tries to guess what we mean, and "hi" is the unambiguous prefix of the "history" command, whose results we see here. Another command worth remembering is "info":

% info
wrong # args: should be "info option ?arg arg ...?"

The error message tells us there should be at least one option, and optionally more arguments.

% info option
bad option "option": must be args, body, cmdcount, commands, complete, default,
exists, functions, globals, hostname, level, library, loaded, locals, nameofexecutable,
patchlevel, procs, script, sharedlibextension, tclversion, or vars

Another helpful error: "option" is not an option, but the valid ones are listed. To get information about commands, it makes sense to type the following:

% info commands
tell socket subst lremove open eof tkcon_tcl_gets pwd glob list exec pid echo 
dir auto_load_index time unknown eval lrange tcl_unknown fblocked lsearch gets 
auto_import case lappend proc break dump variable llength tkcon auto_execok return
pkg_mkIndex linsert error bgerror catch clock info split thread_load loadvfs array
if idebug fconfigure concat join lreplace source fcopy global switch which auto_qualify
update tclPkgUnknown close clear cd for auto_load file append format tkcon_puts alias 
what read package set unalias pkg_compareExtension binary namespace scan edit trace seek 
while flush after more vwait uplevel continue foreach lset rename tkcon_gets fileevent 
regexp tkcon_tcl_puts observe_var tclPkgSetup upvar unset encoding expr load regsub history
exit interp puts incr lindex lsort tclLog observe ls less string

Oh my, quite many... How many?

% llength [info commands]
115

Now for a more practical task - let's let Tcl compute the value of Pi.

% expr acos(-1)
3.14159265359

Hm.. can we have that with more precision?

% set tcl_precision 17
17
% expr acos(-1)
3.1415926535897931

Back to the first try, where "hello" was an invalid command. Let's just create a valid one:

% proc hello {} {puts Hi!}

Silently acknowledged. Now testing:

% hello
Hi!

Errors are exceptions

edit

What in Tcl is called error is in fact more like an exception in other languages - you can deliberately raise an error, and also catch errors. Examples:

if {$username eq ""} {error "please specify a user name"}
if [catch {open $filename w} fp] {
   error "$filename is not writable"
}

One reason for errors can be an undefined command name. One can use this playfully, together with catch, as in the following example of a multi-loop break, that terminates the two nested loops when a matrix element is empty:

if [catch {
   foreach row $matrix {
      foreach col $row {
          if {$col eq ""} throw
      }
   }
}] {puts "empty matrix element found"}

The throw command does not exist in normal Tcl, so it throws an error, which is caught by the catch around the outer loop.

The errorInfo variable

edit

This global variable provided by Tcl contains the last error message and the traceback of the last error. Silly example:

% proc foo {} {bar x}
% proc bar {input} {grill$input}
% foo
invalid command name "grillx"
% set errorInfo
invalid command name "grillx"
   while executing
"grill$input"
   (procedure "bar" line 1)
   invoked from within
"bar x"
   (procedure "foo" line 1)
   invoked from within
"foo"

If no error has occurred yet, errorInfo will contain the empty string.

The errorCode variable

edit

In addition, there is the errorCode variable, which holds a list of typically three elements:

  • category (POSIX, ARITH, ...)
  • abbreviated code for the last error
  • human-readable error text

Examples:

% open not_existing
couldn't open "not_existing": no such file or directory
% set errorCode
POSIX ENOENT {no such file or directory}
% expr 1/0
divide by zero
% set errorCode
ARITH DIVZERO {divide by zero}
% foo
invalid command name "foo"
% set errorCode
NONE

Tracing procedure calls

edit

For a quick overview of how procedures are called, what they return, and when, tracing command execution is a valuable tool. Let's take the following factorial function as an example:

proc fac x {expr {$x<2? 1 : $x * [fac [incr x -1]]}} 

We need to supply a handler that will be called with different numbers of arguments (two on enter, four on leave). Here's a very simple one:

proc tracer args {puts $args}

Now we instruct the interpreter to trace enter and leave of fac:

trace add execution fac {enter leave} tracer

Let's test it with the factorial of 7:

fac 7

which gives, on stdout:

{fac 7} enter
{fac 6} enter
{fac 5} enter
{fac 4} enter
{fac 3} enter
{fac 2} enter
{fac 1} enter
{fac 1} 0 1 leave
{fac 2} 0 2 leave
{fac 3} 0 6 leave
{fac 4} 0 24 leave
{fac 5} 0 120 leave
{fac 6} 0 720 leave
{fac 7} 0 5040 leave

So we can see how recursion goes down to 1, then returns in backward order, stepwise building up the final result. The 0 that comes as second word in "leave" lines is the return status, 0 being TCL_OK.

Stepping through a procedure

edit

To find out how exactly a proc works (and what goes wrong where), you can also register commands to be called before and after a command inside a procedure is called (going down transitively to all called procs). You can use the following step and interact procedures for this:

proc step {name {yesno 1}} {
   set mode [expr {$yesno? "add" : "remove"}]
   trace $mode execution $name {enterstep leavestep} interact
}
proc interact args {
   if {[lindex $args end] eq "leavestep"} {
       puts ==>[lindex $args 2]
       return
   }
   puts -nonewline "$args --"
   while 1 {
       puts -nonewline "> "
       flush stdout
       gets stdin cmd
       if {$cmd eq "c" || $cmd eq ""} break
       catch {uplevel 1 $cmd} res
       if {[string length $res]} {puts $res}
   }
}
#----------------------------Test case, a simple string reverter:
proc sreverse str {
   set res ""
   for {set i [string length $str]} {$i > 0} {} {
       append res [string index $str [incr i -1]]
   }
   set res
}
#-- Turn on stepping for sreverse:
step sreverse
sreverse hello
#-- Turn off stepping (you can also type this command from inside interact):
step sreverse 0
puts [sreverse Goodbye]

The above code gives the following transcript when sourced into a tclsh:

{set res {}} enterstep -->
==>
{for {set i [string length $str]} {$i > 0} {} {
       append res [string index $str [incr i -1]]
   }} enterstep -->
{string length hello} enterstep -->
==>5
{set i 5} enterstep -->
==>5
{incr i -1} enterstep -->
==>4
{string index hello 4} enterstep -->
==>o
{append res o} enterstep -->
==>o
{incr i -1} enterstep -->
==>3
{string index hello 3} enterstep -->
==>l
{append res l} enterstep -->
==>ol
{incr i -1} enterstep -->
==>2
{string index hello 2} enterstep -->
==>l
{append res l} enterstep -->
==>oll
{incr i -1} enterstep -->
==>1
{string index hello 1} enterstep -->
==>e
{append res e} enterstep -->
==>olle
{incr i -1} enterstep -->
==>0
{string index hello 0} enterstep -->
==>h
{append res h} enterstep -->
==>olleh
==>
{set res} enterstep -->
==>olleh
eybdooG

Debugging

edit

The simplest way to inspect why something goes wrong is inserting a puts command before the place where it happens. Say if you want to see the values of variables x and y, just insert

puts x:$x,y:$y

(if the string argument contains no spaces, it need not be quoted). The output will go to stdout - the console from where you started the script. On Windows or Mac, you might need to add the command

console show

to get the substitute console Tcl creates for you, when no real one is present.

If at some time you want to see details of what your program does, and at others not, you can define and redefine a dputs command that either calls puts or does nothing:

proc d+ {} {proc dputs args {puts $args}}
proc d- {} {proc dputs args {}}
d+ ;# initially, tracing on... turn off with d-
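
A call in your code might then look like this:

dputs "x is now $x" ;# prints its arguments (as a list) while tracing is on, and nothing after d-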

For more debugging comfort, add the proc interact from above to your code, and put a call to interact before the place where the error happens. Some useful things to do at such a debugging prompt:

info level 0    ;# shows how the current proc was called
info level      ;# shows how deep you are in the call stack
uplevel 1 ...   ;# execute the ... command one level up, i.e. in the caller of the current proc
set ::errorInfo ;# display the last error message in detail

Assertions

edit

Checking data for certain conditions is a frequent operation in coding. Absolutely intolerable conditions can just throw an error:

  if {$temperature > 100} {error "ouch... too hot!"}

Where the error occurred is evident from ::errorInfo, which will look a bit clearer (no mention of the error command) if you code

  if {$temperature > 100} {return -code error "ouch... too hot!"}

If you don't need hand-crafted error messages, you can factor such checks out to an assert command:

proc assert condition {
   set s "{$condition}"
   if {![uplevel 1 expr $s]} {
       return -code error "assertion failed: $condition"
   }
}

Use cases look like this:

  assert {$temperature <= 100}

Note that the condition is inverted compared to the error checks above - as "assert" means roughly "take for granted", the positive case is specified, and the error is raised if it is not satisfied.

Tests for internal conditions (that do not depend on external data) can be used during development, and when the coder is sure they are bullet-proof to always succeed, (s)he can turn them off centrally in one place by defining

proc assert args {}

This way, assertions are compiled to no bytecode at all, and can remain in the source code as a kind of documentation.

If assertions are tested, it only happens at the position where they stand in the code. Using a trace, it is also possible to specify a condition once, and have it tested whenever a variable's value changes:

proc assertt {varName condition} {
   uplevel 1 [list trace var $varName w "assert $condition ;#"]
}

The ";#" at the end of the trace causes the additional arguments name element op, that are appended to the command prefix when a trace fires, to be ignored as a comment.

Testing:

% assertt list {[llength $list]<10}
% set list {1 2 3 4 5 6 7 8}
1 2 3 4 5 6 7 8
% lappend list 9 10
can't set "list": assertion failed: 10<10

The error message isn't as clear as could be, because the [llength $list] is already substituted in it. But I couldn't find an easy solution to that quirk in this breakfast fun project - backslashing the $condition in the assertt code sure didn't help. Better ideas welcome.

To make the assertion condition more readable, we could quote the condition one more time, i.e.

 % assertt list {{[llength $list]<10}}
 % set list {1 2 3 4 5 6 7 8}
 1 2 3 4 5 6 7 8
 % lappend list 9 10
 can't set "list": assertion failed: [llength $list]<10
 %

In this case, when the trace fires, the argument for assert is {[llength $list]<10}.


In any case, these few lines of code give us a kind of bounds checking - the size of Tcl's data structures is in principle only bounded by the available virtual memory, but runaway loops may be harder to debug, compared to a few assertt calls for suspicious variables:

assertt aString {[string length $aString]<1024}

or

assertt anArray {[array size anArray] < 1024*1024}

Tcllib has a control::assert with more bells and whistles.

A tiny testing framework

edit

Bugs happen. The earlier found, the easier for the coder, so the golden rule "Test early. Test often" should really be applied.

One easy way is adding self-tests to a file of Tcl code. When the file is loaded as part of a library, just the proc definitions are executed. If however you feed this file directly to a tclsh, that fact is detected, and the "e.g." calls are executed. If the result is not the one expected, this is reported on stdout; and in the end, you even get a little statistics.

Here's a file that implements and demonstrates "e.g.":

# PROLOG -- self-test: if this file is sourced at top level:
if {[info exists argv0]&&[file tail [info script]] eq [file tail $argv0]} {
   set Ntest 0; set Nfail 0
   proc e.g. {cmd -> expected} {
       incr ::Ntest
       catch {uplevel 1 $cmd} res
       if {$res ne $expected} {
           puts "$cmd -> $res, expected $expected"
           incr ::Nfail
       }
   }
} else {proc e.g. args {}} ;# does nothing, compiles to nothing
##------------- Your code goes here, with e.g. tests following
proc sum {a b} {expr {$a+$b}}
e.g. {sum 3 4} -> 7
proc mul {a b} {expr {$a*$b}}
e.g. {mul 7 6} -> 42
# testing a deliberate error (this way, it passes):
e.g. {expr 1/0} -> "divide by zero"
## EPILOG -- show statistics:
e.g. {puts "[info script] : tested $::Ntest, failed $::Nfail"} -> ""

Guarded proc

edit

In more complex Tcl software, it may happen that a procedure is defined twice with different body and/or args, causing hard-to-track errors. The Tcl command proc itself doesn't complain if it is called with an existing name. Here is one way to add this functionality. Early in your code, you overload the proc command like this:

 rename proc _proc
 _proc proc {name args body} {
 	set ns [uplevel namespace current]
 	if {[info commands $name]!="" || [info commands ${ns}::$name]!=""} {
 		puts stderr "warning: [info script] redefines $name in $ns"
 	}
 	uplevel [list _proc $name $args $body]
 }

From the time this code is sourced, any attempt to override an existing proc name will be reported to stderr (on a Windows wish, it would show on the console in red). You may make it really strict by adding an "exit" after the "puts stderr ...", or by throwing an error.

Known feature: proc names with wildcards will run into this trap, e.g.

  proc * args {expr [join $args *]*1}

will always lead to a complaint because "*" fits any proc name. Fix (some regsub magic on 'name') left as an exercise.

Windows wish console

edit

While on Unixes, the standard channels stdin, stdout, and stderr are the same as the terminal you started wish from, a Windows wish doesn't typically have these standard channels (and is mostly started with double-click anyway). To help this, a console was added that takes over the standard channels (stderr even coming in red, stdin in blue). The console is normally hidden, but can be brought up with the command

 console show

You can also use the partially documented "console" command. "console eval <script>" evals the given script in the Tcl interpreter that manages the console. The console's text area is actually a text widget created in this interpreter. For example:

       console eval {.console config -font Times}

will change the font of the console to "Times". Since the console is a Tk text widget, you can use all text widget commands and options on it (for example, changing colors, bindings...).

console eval {winfo children .}

tells you more about the console widget: it is a toplevel with children .menu, .console (text), and .sb (scrollbar). You can resize the whole thing with

console eval {wm geometry . ${W}x$H+$X+$Y}

where $W and $H are dimensions in character cells (default 80x24), but $X and $Y are in pixels.

And more again: you can even add widgets to the console - try

console eval {pack [button .b -text hello -command {puts hello}]}

The button appears between the text widget and the scroll bar, and looks and behaves as expected. There is also a way back: the main interpreter is visible in the console interpreter under the name consoleinterp.

Remote debugging

edit

Here's a simple experiment on how to connect two Tcl processes so that one (call it "debugger") can inspect and control the other ("debuggee"). Both must have an event loop running (which is true when Tk runs, or when started with e.g. vwait forever).

As this goes over a socket connection, the two processes could be on different hosts and operating systems (though I've so far tested only the localhost variety). Use at your own risk, of course... :^)

The "debuggee" contains in my experiments the following code, in addition to its own:

proc remo_server {{port 3456}} {
   set sock [socket -server remo_accept $port]
}
proc remo_accept {socket adr port} {
   fileevent $socket readable [list remo_go $socket]
}
proc remo_go {sock} {
   gets $sock line
   catch {uplevel \#0 $line} res
   puts $sock $res
   if [catch {flush $sock}] {close $sock}
}
remo_server

The "debugger" in this version (remo.tcl) runs only on Windows in a wish, as it needs a console, but you could modify it to avoid these restrictions:

#!/usr/bin/env wish
console show
wm withdraw .
set remo [socket localhost 3456]
fileevent $remo readable "puts \[gets $remo\]"
proc r args {puts $::remo [join $args]; flush $::remo}
puts "remote connection ready - use r to talk"

Now from remo you can call any Tcl command in the "debuggee", where it is executed in global scope, so in particular you can inspect (and modify) global variables. But you could also redefine procs on the fly, or whatever tickles your fancy... Examples from a remo session, showing that the two have different pids, how errors are reported, and that quoting is different from normal (needs more work):

10 % pid
600
11 % r pid
2556
12 % r wm title . "Under remote control"
wrong # args: should be "wm title window ?newTitle?"
13 % r wm title . {"Under remote control"}

Tcl in internationalization

edit

"Everything is a string", the Tcl mantra goes. A string is a (finite-length) sequence of characters. Now, what is a character? A character is not the same as a glyph, the writing element that we see on screen or paper - that represents it, but the same glyph can stand for different characters, or the same character be represented with different glyphs (think e.g. of a font selector).

Also, a character is not the same as a byte, or sequence of bytes, in memory. That again may represent a character, but not unequivocally, once we leave the safe haven of ASCII.

Let's try the following working definition: "A character is the abstract concept of a small writing unit". This often amounts to a letter, digit, or punctuation sign - but a character can be more or less than that. More: Ligatures, groups of two or more letters, can at times be treated as one character (arranged even in more than one line, as seen in Japanese U+337F ㍿ or Arabic U+FDFA ﷺ). Less: little marks (diacritics) added to a character, like the two dots on ü in Nürnberg (U+00FC), can turn that into a new "precomposed" character, as in German; or they may be treated as a separate, "composing character" (U+0308 in the example) which in rendering is added to the preceding glyph (u, U+0075) without advancing the rendering position - a more sensible treatment of the function of these two dots, "trema", in Spanish, Dutch, or even (older) English orthography: consider the spelling "coöperation" in use before c. 1950. Such composition is the software equivalent of "dead keys" on a typewriter.

Although an abstract concept, a character may of course have attributes, most importantly a name: a string, of course, which describes its function, usage, pronunciation etc. Various sets of names have been formalized in Postscript (/oumlaut) or HTML (&ouml;). Very important in technical applications is of course the assignment of a number (typically a non-negative integer) to identify a character - this is the essence of encodings, where the numbers are more formally called code points. Other attributes may be predicates like "is upper", "is lower", "is digit".

The relations between the three domains are not too complicated: an encoding controls how a 1..n sequence of bytes is interpreted as a character, or vice versa; the act of rendering turns an abstract character into a glyph (typically by a pixel or vector pattern). Conversely, making sense of a set of pixels to correctly represent a sequence of characters, is the much more difficult art of OCR, which will not be covered here.

Pre-Unicode encodings

edit

Work on encodings, mapping characters to numbers (code points), has a longer history than electronic computing. Francis Bacon (1561-1626) is reported to have used, around 1580, a five-bit encoding where bits were represented as "a" or "b", of the English/Latin alphabet (without the letters j and u!), long before Leibniz discussed binary arithmetics in 1679. An early encoding in practical use was the 5-bit Baudot/CCIT-2 teletype (punch tape) code standardized in 1932, which could represent digits and some punctuations by switching between two modes. I have worked on Univac machines that used six bits per "Fieldata" character, as hardware words were 36 bits long. While IBM used 8 bits in the EBCDIC code, the more famous American Standard Code for Information Interchange (ASCII) did basically the same job in 7 bits per character, which was sufficient for upper/lowercase basic Latin (English) as well as digits and a number of punctuations and other "special" characters - as hardware tended to 8-bit bytes as smallest memory unit, one was left for parity checks or other purposes.

The most important purpose, outside the US, was of course to accommodate more letters required to represent the national writing system - Greek, Russian, or the mixed set of accented or "umlauted" characters used in virtually every country in Europe. Even England needed a code point for the Pound Sterling sign. The general solution was to use the 128 additional positions available when ASCII was implemented as 8-bit bytes, hex 80..FF. A whole flock of such encodings were defined and used:

  • ISO standard encodings iso8859-.. (1-15)
  • MS/DOS code pages cp...
  • Macintosh code pages mac...
  • Windows code pages cp1...

East Asian encodings

edit

The East Asian countries China, Japan, and Korea all use character sets numbering in the thousands, so the "high ASCII" approach was not feasible there. Instead, the ASCII concept was extended to a 2x7 bit pattern, where the 94 printing ASCII characters indicate row and column in a 94x94 matrix. This way, all character codes were in practice two bytes wide, and thousands of Hanzi/Kanji/Hangul could be accommodated, plus hundreds of others (ASCII, Greek, Russian alphabets, many graphic characters). These national multibyte encodings are:

  • JIS C-6226 (Japan, 1978; significantly revised 1983, 1990)
  • GB 2312-80 (Mainland China, 1980)
  • KS C-5601 (South Korea, 1987)

If the 2x7 pattern was directly implemented, files in such encodings could not be told apart from ASCII files, except for unreadability. However, it does get some use in Japanese e-mails (in prevailing conventions developed prior to 8-bit-clean mail servers), with ANSI escape codes using the ESC control character being used to declare sections of text as ASCII or JIS C-6226, in what has become known as "JIS encoding" (or more properly as ISO-2022-JP).

Elsewhere, in order to handle both types of strings transparently in a more practical manner, the "high ASCII" approach was extended so that a byte in 00..7F was taken at ASCII face value, while bytes with high bit set (80..FF) were interpreted as halves of multibyte codes. For instance, the first Chinese character in GB2312, row 16 col 1 (decimally 1601 for short), gives the two bytes

16 + 32 + 128 = 176 = 0xB0
 1 + 32 + 128 = 161 = 0xA1

This implementation became known as "Extended UNIX Code" (EUC) in the national flavors euc-cn (China), -jp (Japan), -kr (Korea).
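
If the euc-cn encoding is available in your Tcl installation, this can be checked directly - the character at row 16, column 1 of GB2312 is 啊 (U+554A):

% binary scan [encoding convertto euc-cn \u554a] H* hex; set hex
b0a1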

To add to the "ideograph soup" confusion, unlike euc-cn and euc-kr, euc-jp (Japan) was not widely adopted on the Windows or Macintosh platforms, which instead tend to use the incompatible ShiftJIS, which re-arranges the codes to make space for older single-byte codes for phonetic katakana characters. Also, Taiwan and Hong Kong use their own "Big 5" encoding, which doesn't use the 94×94 structure. Unlike EUC encodings, ASCII bytes can appear as the second byte of two-byte codes in these encodings, which is also true of common extensions to the EUC encodings (GBK extending euc-cn, Unified Hangul Code extending euc-kr).

Unicode as pivot for all other encodings

edit

The Unicode standard is an attempt to unify all modern character encodings into one consistent 16-bit representation. Consider a page with a 16x16 table filled with EuroLatin-1 (ISO 8859-1), the lower half being the ASCII code points. Call that "page 00" and imagine a book of 256 or more such pages (with all kinds of other characters on them, in majority CJK), then you have a pretty clear concept of the Unicode standard, in which a character's code position is "U+" hex (page number*256+cell number), for instance, U+20A4 is a version of the Pound Sterling sign. Initiated by the computer industry (www.unicode.org), Unicode has grown together with ISO 10646, a parallel standard providing an up-to-31-bits encoding (one left for parity?) with the same scope. Software must support Unicode strings to be fit for i18n. From Unicode version 3.1, the 16-bit limit was transcended for some rare writing systems, but also for the CJK Unified Ideographs Extension B - apparently, even 65536 code positions are not enough. The total count in Unicode 3.1 is 94,140 encoded characters, of which 70,207 are unified Han ideographs; the next biggest group are over 14000 Korean Hangul. And the number is growing.

Unicode implementations: UTF-8, UCS-2/UTF-16

edit

UTF-8 is made to cover 7-bit ASCII, Unicode, and ISO 10646. Characters are represented as sequences of 1..6 eight-bit bytes - termed octets in the character set business - (for ASCII: 1, for Unicode: 2..4) as follows:

  • ASCII 0x00..0x7F (Unicode page 0, left half): 0x00..0x7F. Nothing changed.
  • Unicode, pages 00..07: 2 bytes, 110aaabb 10bbbbbb, where aaa are the rightmost bits of page#, bb.. are the bits of the second Unicode byte. These pages cover European/Extended Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic.
  • Unicode, pages 08..FE: 3 bytes, 1110aaaa 10aaaabb 10bbbbbb. These cover the rest of the Basic Multilingual Plane, including Hangul, Kanji, and what else. This means that East Asian texts are 50% longer in UTF-8 than in pure 16 bit Unicode.
  • Unicode, supplementary planes: 4 bytes, 11110ppp 10ppaaaa 10aaaabb 10bbbbbb. These were not part of the original design of Unicode (only ISO 10646), but they were added to the Unicode standard when it became clear that one plane would not be sufficient for Unicode's goals. They mostly cover Emoji, ancient writing systems, niche writing systems, enormous numbers of obscure Kanji/Hanzi, and a few very new writing systems which postdate the original design of Unicode.
  • ISO 10646 codes beyond Unicode: 4..6 bytes. Since the current approval process keeps the standards in sync by preventing ISO 10646 being allocated beyond the 17 Unicode planes, these are guaranteed not to exist in the foreseeable future.

The general principle of UTF-8 is that the first byte either is a single-byte character (if below 0x80), or indicates the length of a multi-byte code by the number of 1's before the first 0, and is then filled up with data bits. All other bytes start with bits 10 and are then filled up with 6 data bits. It follows from this that bytes in UTF-8 encoding fall in distinct ranges:

  00..7F - plain ASCII
  80..BF - non-initial bytes of multibyte code
  C2..FD - initial bytes of multibyte code (C0, C1 are not legal!)
  FE, FF - never used, so can be used to detect a UTF-16 byte-order mark (and thus, a non-UTF-8 file).

The distinction between initial and non-initial bytes helps in plausibility checks, or in re-synchronizing after missing data. Besides, UTF-8 is independent of byte order (as opposed to UCS-2/UTF-16, see below). Tcl however shields these UTF-8 details from us: characters are just characters, no matter whether 7 bit, 16 bit, or (in the future) more.
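
If you ever do want to look at the raw UTF-8 bytes of a character, a quick sketch using only encoding convertto and binary scan (the Euro sign U+20AC should encode as the three bytes E2 82 AC):

binary scan [encoding convertto utf-8 \u20ac] H* hex
puts $hex  ;# -> e282ac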

The byte sequence EF BB BF is the UTF-8 equivalent of \uFEFF. Windows Notepad, for instance, switches to the UTF-8 encoding when a file starts with these three bytes, and writes them when saving a file as UTF-8. This convention isn't always used elsewhere, but if a file starts with it, it will generally override an otherwise declared character encoding.

The UCS-2 representation (in Tcl just called the "unicode" encoding) is much more easily explained: each character code is written as a 16-bit "short" unsigned integer. The practical complication is that the two memory bytes making up a "short" can be arranged in "big-endian" (Motorola, Sparc) or "little-endian" (Intel) byte order. Hence, the following rules were defined for Unicode:

  • Code point U+FEFF was defined as Byte Order Mark (BOM), later renamed to "Zero-width non-breaking space", although actually using it in its secondary whitespace role is now considered obsolete.
  • Code point U+FFFE (as well as FFFF) is a guaranteed non-character, and will never be a valid Unicode character. They are intended for use as sentinels, or for other internal use, as well as for detecting if the Byte Order Mark is being read incorrectly.

This way, a Unicode-reading application (even Notepad/W2k) can easily detect that something's wrong when it encounters the byte sequence FFFE, and swap the following byte pairs - a minimal and elegant way of dealing with varying byte orders.

While Unicode was originally intended to fit entirely within UCS-2, with the entirety of ISO 10646 requiring a 32-bit "long" (so-called UCS-4 or UTF-32), this distinction was later scrapped since one sixteen-bit plane was no longer considered sufficient to achieve Unicode's goal. Therefore, sixteen "supplementary" planes were added to Unicode, with the original 16-bit plane being kept as "Plane 0", the "Basic Multilingual Plane". In order to use characters from supplementary planes in interfaces expecting a UCS-2 stream, the range U+D800–U+DFFF was guaranteed never to be used for Unicode characters. This allows characters in supplementary planes to be unambiguously represented in an otherwise UCS-2 stream; this scheme is known as UTF-16.

A supplementary character is represented in big-endian UTF-16 as follows, where ssss represents one less than the plane number. In little-endian, the first two bytes and the last two bytes are each swapped, but the sequence as a whole isn't reversed, because it is treated like a sequence of two UCS-2 characters.

  110110ss ssaaaaaa 110111aa bbbbbbbb
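
As a worked example of that bit pattern, here is a small sketch (the proc name is made up) that splits a supplementary code point into its two big-endian surrogate halves; U+1F600, an Emoji, should yield D83D DE00:

proc surrogates codepoint {
   set v [expr {$codepoint - 0x10000}]
   format "%04X %04X" [expr {0xD800 + ($v >> 10)}] [expr {0xDC00 + ($v & 0x3FF)}]
}
puts [surrogates 0x1F600]  ;# -> D83D DE00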

For XML, an encoding self-identification is defined with the encoding attribute in the leading tag. This is only useful for documents which can be treated as ASCII up to that point though, so UTF-16/UCS-2 has to be detected beforehand or otherwise indicated.

Tcl and encodings

edit

From Tcl 8.1, i18n support was brought to string processing, and it was wisely decided to

  • use Unicode as general character set
  • use UTF-8 as standard internal encoding
  • provide conversion support for the many other encodings in use.

However, as unequal-length byte sequences make simple tasks such as indexing into a string, or determining its length in characters, more complex, the internal representation is converted to fixed-length 16-bit UCS-2/UTF-16 in such cases. (This brings new problems with recent Unicode versions that cross the 16-bit barrier... When practical use justifies it, this will have to change to UCS-4, i.e. 4 bytes per character.)

Not all i18n issues are therefore automatically solved for the user. One still has to analyze seemingly simple tasks like uppercase conversion (the Turkish dotted/undotted I is an anomaly) or sorting ("collation order" is not necessarily the numeric order of the Unicode code points, which lsort would apply by default), and write custom routines if more correct behavior is required. Other locale-dependent i18n issues like number/currency formatting and date/time handling also belong to this group. I recommend starting from the defaults Tcl provides and, if necessary, customizing the appearance as desired. International data exchange is severely hampered if localized numeric data are exchanged, one side using a period, the other a comma as decimal point...

Strictly speaking, the Tcl implementation "violates the UTF-8 spec, which explicitly forbids non-canonical representation of characters and requires that malformed UTF-8 sequences in the input be errors. ... I think that to be an advantage. But the spec says 'MUST' so we're at least technically non-compliant." (Kevin B. Kenny in the Tcl chat, 2003-05-13)

If textual data are internal to your Tcl script, all you have to know is the \uxxxx notation, which is substituted with the character whose Unicode code point is U+xxxx (hexadecimal). This notation can be used wherever Tcl substitution takes place, even in braced regexps and string map pairlists; otherwise you can force it by applying subst to the string in question.

To demonstrate that for instance scan works transparently, here's a one-liner to format any Unicode character as HTML hex entity:

proc c2html c {format "&#x%4.4x;" [scan $c %c]}

Conversely it takes a few lines more:

proc html2u string {
   while {[regexp {&#[xX]([0-9A-Fa-f]+);} $string matched hex]} {
       regsub -all $matched $string [format %c 0x$hex] string
   }
   set string
}
% html2u "this is a &x20ac; sign"
this is a € sign

For all other purposes, two commands basically provide all i18n support:

fconfigure $ch -encoding $e

enables conversion from/to encoding e for an open channel (file or socket) if different from system encoding;

encoding convertfrom/to $e $string

does what it says, the other encoding being always Unicode.

For instance, I could easily decode the bytes EF BB BF from a hexdump with

format %x [scan [encoding convertfrom utf-8 \xef\xbb\xbf] %c]

in an interactive tclsh, and found that it stood for the famous byte-order mark FEFF. Internally to Tcl, (almost) everything is a Unicode string. All communication with the operating system is done in the "system encoding", which you can query (but best not change) with the [encoding system] command. Typical values are iso8859-1 or -15 on European Linuxes, and cp1252 on European Windowses.
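
For example, re-encoding a whole file is just a matter of setting the encoding on both channels and copying the contents (a sketch with made-up file names, assuming the input really is Shift-JIS):

set in  [open input-sjis.txt]
fconfigure $in -encoding shiftjis
set out [open output-utf8.txt w]
fconfigure $out -encoding utf-8
puts -nonewline $out [read $in]
close $in
close $out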

Introspection: Find out what encodings are available in your installation with

encoding names

You can add new encodings by producing an .enc file and copying that in the directory lib/tcl8.4/encoding (or similar) where the other .enc files are situated. For the format of encoding files (which are text files, consisting mostly of hex digits), see the man page http://www.tcl.tk/man/tcl8.4/TclLib/Encoding.htm . The basename of your .enc file (without the .enc extension) will be the name under which it can be addressed, e.g. for an encoding iso4711 name the file iso4711.enc.

Localization: message catalogs

edit

Finally, the msgcat package supports localization ("l10n") of apps by allowing message catalogs for translation of strings, typically for GUI display, from a base language (typically English) to a target language selected by the current locale. For example, an app to be localized for France might contain a catalog file fr.msg with, for simplicity, only the line

msgcat::mcset fr File Fichier

In the app itself, all you need is

package require msgcat
namespace import msgcat::mc
msgcat::mclocale fr ;#(1)
#...
pack [button .b -text [mc File]]

to have the button display the localized text for "File", namely "Fichier", as obtained from the message catalog. For other locales, only a new message catalog has to be produced by translating from the base language. Instead of explicit setting as in (1), typically the locale information might come from an environment (LANG) or registry variable.
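
Here is one way to derive the locale from the LANG environment variable (e.g. LANG=fr_FR.UTF-8 gives "fr_fr"); note that on most Unix systems msgcat already initializes its locale from the environment by itself, so explicit code like this is only needed to override or make that step visible:

if {[info exists ::env(LANG)]} {
    msgcat::mclocale [string tolower [lindex [split $::env(LANG) .@] 0]]
}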

Tk: text rendering, fonts

edit

Rendering international strings on displays or printers can pose the biggest problems. First, you need fonts that contain the characters in question. Fortunately, more and more fonts with international characters are available, a pioneer being Bitstream Cyberbit that contains roughly 40000 glyphs and was for some time offered for free download on the Web. Microsoft's Tahoma font also added support for most alphabet writings, including Arabic. Arial Unicode MS delivered with Windows 2000 contains just about all the characters in the Unicode, so even humble Notepad can get truly international with that.

But having a good font is still not enough. While strings in memory are arranged in logical order, with addresses increasing from beginning to end of text, they may need to be rendered in other ways, with diacritics shifted to various positions of the preceding character, or most evident for the languages that are written from right to left ("r2l"): Arabic, Hebrew. (Tk still lacks automatic "bidi"rectional treatment, so r2l strings have to be directed "wrongly" in memory to appear right when rendered - see A simple Arabic renderer on the Wiki).

Correct bidi treatment has consequences for cursor movement, line justification, and line wrapping as well. Vertical lines progressing from right to left are popular in Japan and Taiwan - and mandatory if you had to render Mongolian.

Indian scripts like Devanagari are alphabets with about 40 characters, but the sequence of consonants and vowels is partially reversed in rendering, and consonant clusters must be rendered as ligatures of the two or more characters involved - the pure single letters would look very ugly to an Indian. An Indian font for one writing system already contains several hundred glyphs. Unfortunately, Indian ligatures are not contained in the Unicode (while Arabic ones are), so various vendor standards apply for coding such ligatures.

A little i18n tester

edit
 

Here's a little script that shows you what exotic characters your system has available. It creates a text window and tries to show some sample text for the specified languages (the screenshot is from a PocketPC in the Bitstream Cyberbit font):

pack [text .t -font {Helvetica 16}]
.t insert end "
Arabic \uFE94\uFEF4\uFE91\uFEAE\uFECC\uFEDF\uFE8D\uFE94\uFEE4\uFEE0\uFEDC\uFEDF\uFE8D
Trad. Chinese      \u4E2D\u570B\u7684\u6F22\u5B57
Simplified Chinese \u6C49\u8BED
Greek   \u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AE\
\u03B3\u03BB\u03CE\u03C3\u03C3\u03B1
Hebrew  \u05DD\u05D9\u05DC\u05E9\u05D5\u05E8\u05D9\
\u05DC\u05D9\u05D0\u05E8\u05E9\u05D9
Japanese \u65E5\u672C\u8A9E\u306E\u3072\u3089\u304C\u306A,\
\u6F22\u5B57\u3068\u30AB\u30BF\u30AB\u30CA
Korean          \uB300\uD55C\uBBFC\uAD6D\uC758 \uD55C\uAE00 (\u3CA2\u3498)
Russian         \u0420\u0443\u0441\u0441\u043A\u0438\u0439\
\u044F\u0437\u044B\u043A
"

Only a plain base font is specified, so otherwise you see the defaults (and notice how Tk manages to find characters). You can then configure the text widget for the fonts you'd like to see.

Input methods in Tcl/Tk

edit

To get outlandish characters not seen on the keyboard into the machine, they may at lowest level be specified as escape sequences, e.g. "\u2345". But most user input will come from keyboards, for which many layouts exist in different countries. In CJK countries, there is a separate coding level between keys and characters: keystrokes, which may stand for the pronunciation or geometric components of a character, are collected in a buffer and converted into the target code when enough context is available (often supported by on-screen menus to resolve ambiguities).

Finally, a "virtual keyboard" on screen, where characters are selected by mouse click, is especially helpful for non-frequent use of rarer characters, since the physical keyboard gives no hints which key is mapped to which other code. This can be implemented by a set of buttons, or minimally with a canvas that holds the provided characters as text items, and bindings to <1>, so clicking on a character inserts its code into the widget which has keyboard focus. See iKey: a tiny multilingual keyboard.

The term "input methods" is often used for operating-system-specific i18n support, but I have no experiences with this, doing i18n from a German Windows installation. So far I'm totally content with hand-crafted pure Tcl/Tk solutions - see taiku on the Wiki.

Transliterations: The Lish family

edit

The Lish family is a set of transliterations, all designed to convert strings in lowly 7-bit ASCII to appropriate Unicode strings in some major non-Latin writing systems. The name comes from the common suffix "lish" as in English, which is actually the neutral element of the family, faithfully returning its input ;-) Some rules of thumb:

  • One *lish character should unambiguously map to one target character, wherever applicable
  • One target letter should be represented by one *lish letter (A-Za-z), wherever applicable. Special characters and digits should be avoided for coding letters
  • Mappings should be intuitive and/or follow established practices
  • In languages that distinguish case, the corresponding substitutes for upper- and lowercase letters should also correspond casewise in lower ASCII.

The Tclers' Wiki http://mini.net/tcl/ has the members of the Lish family available for copy'n'paste. The ones I most frequently use are

  • Arblish, which does context glyph selection and right-to-left conversion;
  • Greeklish;
  • Hanglish for Korean Hangul, which computes Unicodes from initial-vowel-final letters;
  • Ruslish for Cyrillic.

Calling examples, that return the Unicodes for the specified input:

  arblish   dby w Abw Zby
  greeklish Aqh'nai
  hanglish  se-qul
  heblish   irwsliM
  ruslish   Moskva i Leningrad

Greeklish

edit

It all began with Greeklish, which is not my invention, but used by Greeks on the Internet for writing Greek without Greek fonts or character set support. I just extended the practice I found with the convention of marking accented vowels with a trailing apostrophe (so it's not a strict 1:1 transliteration anymore). Special care was taken to convert "s" at word end to "c", so it produces the final-sigma. Here is the code:

proc greeklish str {
  regsub -all {s([ \t\n.,:;])} $str {c\1} str
  string map {
   A' \u386 E' \u388 H' \u389 I' \u38a O' \u38c U' \u38e W' \u38f
   a' \u3ac e' \u3ad h' \u3ae i' \u3af o' \u3cc u' \u3cd w' \u3ce
   A \u391 B \u392 G \u393 D \u394 E \u395 Z \u396 H \u397 Q \u398
   I \u399 K \u39a L \u39b M \u39c N \u39d J \u39e O \u39f P \u3a0
   R \u3a1 S \u3a3 T \u3a4 U \u3a5 F \u3a6 X \u3a7 Y \u3a8 W \u3a9
   a \u3b1 b \u3b2 g \u3b3 d \u3b4 e \u3b5 z \u3b6 h \u3b7 q \u3b8
   i \u3b9 k \u3ba l \u3bb m \u3bc n \u3bd j \u3be o \u3bf p \u3c0
   r \u3c1 c \u3c2 s \u3c3 t \u3c4 u \u3c5 f \u3c6 x \u3c7 y \u3c8 
   w \u3c9 ";" \u387 ? ";"
  } $str
}

Testing:

% greeklish Aqh'nai
Αθήναι
% greeklish "eis thn po'lin"
εις την πόλιν

Hanglish

edit

Even though the Korean Hangul writing has many thousands of syllable characters, it is possible to compute the Unicode from the spelling of a syllable and vice versa. Here's how:

proc hangul2hanglish s {
   set lead {g gg n d dd r m b bb s ss "" j jj c k t p h}
   set vowel {a ae ya yai e ei ye yei o oa oai oi yo u ue uei ui yu w wi i}
   set tail {"" g gg gs n nj nh d l lg lm lb ls lt lp lh m b bs s ss ng j c k t p h}
   set res ""
   foreach c [split $s ""] {
       scan $c %c cnum
       if {$cnum>=0xAC00 && $cnum<0xD7A3} {
           incr cnum -0xAC00
           set l [expr {$cnum / (28*21)}]
           set v [expr {($cnum/28) % 21}]
           set t [expr {$cnum % 28}]
           append res  [lindex $lead $l ]
           append res  [lindex $vowel $v]
           append res "[lindex $tail $t] "
       } else {append res $c}
   }
   set res
}
proc hanglish2uc hanglish {
   set L ""; set V "" ;# in case regexp doesn't hit
   set hanglish [string map {
       AE R SH S R L NG Q YE X YAI F AI R YA V YO Y YU Z VI F
   } [string toupper $hanglish]]
   regexp {^([GNDLMBSQJCKTPH]+)?([ARVFEIXOYUZW]+)([GNDLMBSQJCKTPH]*)$} \
       $hanglish ->  L V T ;# lead cons.-vowel-trail cons.
   if {$L==""} {set L Q}
   if {$V==""} {return $hanglish}
   set l [lsearch {G GG N D DD L M B BB S SS Q J JJ C K T P H} $L]
   set v [lsearch {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I} $V]
   set t [lsearch {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH  \
       M B BS S SS Q J C K T P H} $T] ;# trailing consonants
   if {[min $l $v $t] < 0} {return $hanglish}
   format %c [expr {$l*21*28 + $v*28 + $t + 0xAC00}]
}
proc min args {lindex [lsort -real $args] 0}
proc hanglish argl {
   set res ""
   foreach i $argl {
       foreach j [split $i -] {append res [hanglish2uc $j]}
   }
   append res " "
}

Collation

edit

Collation is "the logical ordering of character or wide-character strings according to defined precedence rules. These rules identify a collation sequence between the collating elements, and such additional rules that can be used to order strings consisting of multiple collating elements."

Tcl's lsort sorts according to numerical Unicode values, which may not be correct in some locales. For instance, in Portuguese, accented letters should sort as if they weren't, but in Unicode sequence come after "z".

The following simple code takes a map in which collation differences can be listed as {from to from to...}, sorts the mapped items, and retrieves only the original elements:

proc collatesort {list map} {
   set l2 {}
   foreach e $list {
      lappend l2 [list $e [string map $map $e]]
   }
   set res {}
   foreach e [lsort -index 1 $l2] {lappend res [lindex $e 0]}
   set res
}

Testing, Portuguese:

% collatesort {ab ãc ãd ae} {ã a}
ab ãc ãd ae

Spanish (ll sorts after lz):

% collatesort {llano luxación leche} {ll lzz}
leche luxación llano

German (umlauts sorted as if "ä" was "ae"):

% lsort {Bar Bär Bor}
Bar Bor Bär
% collatesort {Bar Bär Bor} {ä ae}
Bär Bar Bor

Regular expressions

edit

Overview

edit

Another language (I wouldn't want to call it "little") embedded inside Tcl is regular expressions. They may not look like this to you - but they follow (many) rules, indeed. The purpose is to describe a string pattern to match with - for searching, extracting, or replacing substrings.

Regular expressions are used in the regexp and regsub commands, and optionally in lsearch and switch. Note that this language is very different from Tcl itself, so in most cases it is best to brace a regular expression, to prevent the Tcl parser from misinterpreting it.

Before the gory details begin, let's start with some examples:

regexp {[0-9]+[a-z]} $input 

returns 1 if $input contains one or more digits, followed by a lowercase letter.

set result [regsub -all {[A-Z]} $input ""]

deletes all uppercase letters from $input, and saves that to the result variable.

lsearch -all -inline -regexp $input {^-}

returns all elements in the list $input which start with a "-".

Character classes

edit

Many characters just stand for themselves. E.g.

a

indeed matches the character "a". Any Unicode character can be specified in the \uXXXX format. In brackets (not the same as Tcl's command brackets), a set of alternatives (a "class") is defined:

[abc]

matches "a", "b", or "c". A dash (-) between two characters spans the range between them, e.g.

[0-9]

matches one decimal digit. To have literal "-" in a set of alternatives, put it first or last:

[0-9-]

matches a digit or a minus sign. A bracketed class can be negated by starting it with ^, e.g.

[^0-9]

matches any character that is not a decimal digit. The period "." represents one instance of any character. If a literal "." is intended (or in general, to escape any character that has a special meaning to the regexp syntax), put a backslash "\" before it - and make sure the regular expression is braced, so the Tcl parser doesn't eat up the backslash early...).
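
For instance, the difference between a plain and an escaped dot:

% regexp {1.5} "Version 125"
1
% regexp {1\.5} "Version 125"
0
% regexp {1\.5} "Version 1.5"
1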

Quantifiers

edit

To repeat a character (set) other than once, there are quantifiers put behind it, e.g.

a+     matches one or more "a"s,
a?     matches zero or one "a",
a*     matches zero or more "a"s.

There is also a way of numeric quantification, using braces (again, not the same as Tcl's):

a{2}   matches two "a"s - same as "aa"
a{1,5} matches one to five "a"s
a{1,}  one or more - like a+
a{0,1} zero or one - like a?
a{0,}  zero or more - like a*

The + and * quantifiers act "greedy", i.e. they consume the longest possible substring. For non-greedy behavior, which provides the shortest possible match, add a "?" in behind. Examples:

% regexp -inline {<(.+)>} <foo><bar><grill>
<foo><bar><grill> foo><bar><grill

This matches up to the last closing bracket.

% regexp -inline {<(.+?)>} <foo><bar><grill>
<foo> foo

This matches until the first close-bracket.

Anchoring

edit

By default, a regular expression can match anywhere in a string. You can limit that to the beginning (^) and/or end ($):

regexp {^a.+z$} $input

succeeds if input begins with "a" and ends with "z", and has one or more characters between them.

Grouping

edit

A part of a regular expression can be grouped by putting parentheses () around it. This can have several purposes:

  • regexp and regsub can extract or refer to such substrings
  • the operator precedence can be controlled

The "|" (or) operator has high precedence. So

foo|bar grill

matches the strings "foo" or "bar grill", while

(foo|bar) grill

matches "foo grill" or "bar grill".

(a|b|c) ;# is another way to write [abc]

For extracting substrings into separate variables, regexp allows additional arguments:

regexp ?options? re input fullmatch part1 part2...

Here, variable fullmatch will receive the substring of input that matched the regular expression re, while part1 etc. receive the parenthesized submatches. As fullmatch is often not needed, it has become an eye-candy idiom to use the variable name "->" in that position, e.g.

regexp {(..)(...)} $input -> first second

places the first two characters of input in the variable first, and the next three in the variable second. If $input was "ab123", first will hold "ab", and second will contain "123".

In regsub and regexp, you can refer to parenthesized submatches with \1 for the first, \2 for the second, etc. \0 is the full match, as with regexp above. Example:

% regsub {(..)(...)} ab123 {\2\1=\0}
123ab=ab123

Here \1 contains "ab", \2 contains "123", and \0 is the full match "ab123". Another example, how to find four times the same lowercase letter in a row (the first occurrence, plus then three):

regexp {([a-z])\1{3}} $input

More examples

edit

Parse the contents in angle brackets (note that the result contains the full match and the submatch in paired sequence, so use foreach to extract only the submatches):

% regexp -all -inline {<([^>]+)>} x<a>y<b>z<c>d
<a> a <b> b <c> c

Insert commas between groups of three digits into the integer part of a number:

% regsub -all {\d(?=(\d{3})+($|\.))} 1234567.89 {\0,}
1,234,567.89

In other countries, you might use an apostrophe (') as separator, or make groups of four digits (used in Japan).

In the opposite direction, converting such formatted numbers back to the regular way for use in calculations, the task consists simply of removing all commas. This can be done with regsub:

% regsub -all , 1,234,567.89 ""
1234567.89

but as the task involves only constant strings (comma and empty string), it is more efficient not to use regular expressions here, but use a string map command:

% string map {, ""} 1,234,567.89
1234567.89

Working with files

edit

Files and channels

edit

In addition to the functionalities known from C's stdio, Tcl offers more commands that deal with files, similar to what shells provide, though often a bit more verbose. Here are some examples:

glob *.tcl

List all files in the current directory that match the pattern *.tcl.

file copy /from/path/file.ext /to/path
file delete /from/path/file.ext
file rename before.txt after.txt
cd /an/other/directory
pwd

To let code temporarily execute in another directory, use this pattern:

set here [pwd]
cd $someotherdir
#... code here
cd $here

More precisely, many "file" operations work in general on "channels", which can be

  • standard channels (stdin, stdout, stderr)
  • files as opened with open ...
  • pipes as opened with open |...
  • sockets (TCP)

File names

edit

Files are often addressed with path names, which indicate their position in the directory tree. Like Unix, Tcl uses "/" as path separator, while on Windows "\" is the native way - which brings trouble not only to Tcl, but even to C, because "\" is the escape character in both, so that e.g. \t is parsed as horizontal tab, \n as newline, etc. Fortunately Windows natively accepts "/" as well, so you can use the forward slash as path separator in both Tcl and C programs without worries. However, you still have to take care of escape sequences. One stopgap measure is

  • to escape the escape character, i.e. write \\ for \, or
  • brace backslashed names, e.g. {\foo\bar\grill.txt}

But Tcl allows the "normal" separator / in almost all situations, so you're better off with that. Unfortunately, things are harder for Mac users, since classic MacOS (before X) accepts only ":" as file separator.

If you need to, here's ways to convert between the two forms:

% file normalize \\foo\\bar\\grill.txt
C:/foo/bar/grill.txt
% file nativename /foo/bar/grill.txt
\foo\bar\grill.txt

You can even use the file join command: file join arg1 arg2 ... argN

Tcl will then take care of all platform-dependent details to create a platform-independent path. For example:

set somepath [file join foo bar grill.txt]

will result in the following path (on a Windows machine): foo/bar/grill.txt

Input and output

edit

Tcl's input/output commands are pretty closely based on those from C's stdio (just strip off the leading f):

  • set handle [open filename ?mode?]
  • set data [read $handle ?int?]
  • tell $handle
  • seek $handle offset ?from?
  • puts ?-nonewline? ?$handle? content
  • gets $handle ?varname?
  • close $handle

C's printf functionality is split in two steps:

  • format the data into a string with format (which is very much like sprintf)
  • output the resulting string with puts. For example,
puts $handle [format "%05d %s" $number $text]

To process a text file line by line, you have two choices. If the file is smaller than several megabytes, you can read it just in one go:

set f [open $filename]
foreach line [split [read $f] \n] {
    # work with $line here ...
}
close $f

For files of arbitrary size, you can read the lines one by one (though this is slightly slower than the above approach):

set f [open $filename]
while {[gets $f line] >= 0} {
    # work with $line here ...
}
close $f

Finally, if you can format your file so that it is executable Tcl code, the following reading method is fastest:

source $filename

To "touch a file", i.e. create it if not exists, and in any case update its modification time, you can use this:

proc touch name {close [open $name a]}

"Binary" files

edit

All files are made of bytes, which are made of bits, which are binary. The term "binary" with files relates mostly to the fact that they can contain bytes of any value, and line-ends (Carriage Return+Newline in the DOS/Windows world) are not to be translated. Tcl can handle "binary" files without a problem -- just configure the channel as binary after opening:

set fp [open tmp.jpg]
fconfigure $fp -translation binary
set content [read $fp]
close $fp

Now the variable content holds the file's contents, byte for byte.

To test whether a file is "binary", in the sense that it contains NUL bytes:

proc binary? filename {
   set f [open $filename]
   set data [read $f 1024]
   close $f
   expr {[string first \x00 $data]>=0}
}

The file command

edit

Many useful operations with files are collected in the file command. The first argument tells which operation to do:

  • file atime name ?time?
  • file attributes name
  • file attributes name ?option?
  • file attributes name ?option value option value...?
  • file channels ?pattern? - returns the handles of currently open files
  • file copy ?-force? ?--? source target
  • file copy ?-force? ?--? source ?source ...? targetDir
  • file delete ?-force? ?--? pathname ?pathname ... ?
  • file dirname name - e.g. [file dirname /foo/bar/grill.txt] -> /foo/bar
  • file executable name
  • file exists name
  • file extension name - e.g. [file extension /foo/bar/grill.txt] -> .txt
  • file isdirectory name
  • file isfile name
  • file join name ?name ...?
  • file link ?-linktype? linkName ?target?
  • file lstat name varName
  • file mkdir dir ?dir ...? - creates one or more directories (folders)
  • file mtime name ?time?
  • file nativename name
  • file normalize name
  • file owned name
  • file pathtype name
  • file readable name
  • file readlink name
  • file rename ?-force? ?--? source target
  • file rename ?-force? ?--? source ?source ...? targetDir
  • file rootname name - e.g. [file rootname /foo/bar/grill.txt] -> /foo/bar/grill
  • file separator ?name?
  • file size name
  • file split name - e.g. [file split /foo/bar/grill.txt] -> {/ foo bar grill.txt}
  • file stat name varName
  • file system name
  • file tail name - e.g. [file tail /foo/bar/grill.txt] -> grill.txt
  • file type name
  • file volumes - Windows: returns your "drive letters", e.g {A:/ C:/}
  • file writable name
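
As a small combined example (a sketch; it assumes there are .tcl files in the current directory), here is how glob and a few file subcommands can report name, size, and modification time:

foreach f [lsort [glob -nocomplain *.tcl]] {
    puts [format "%-24s %8d bytes  %s" [file tail $f] [file size $f] \
        [clock format [file mtime $f] -format "%Y-%m-%d %H:%M"]]
}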

Tcl examples

edit

Most of these example scripts first appeared in the Tclers' Wiki http://wiki.tcl.tk . The author (Richard Suchenwirth) declares them to be fully in the public domain. The following scripts are plain Tcl, they don't use the Tk GUI toolkit (there's a separate chapter for those).

Sets as lists

edit

Tcl's lists are well suited to represent sets. Here are typical set operations. If you use the tiny testing framework explained earlier, the e.g. lines make up the self-test; otherwise they just illustrate how the operations should work.

proc set'contains {set el} {expr {[lsearch -exact $set $el]>=0}}

e.g. {set'contains {A B C} A} -> 1
e.g. {set'contains {A B C} D} -> 0

proc set'add {_set args} {
   upvar 1 $_set set
   foreach el $args {
       if {![set'contains $set $el]} {lappend set $el}
   }
   set set
}

set example {1 2 3}
e.g. {set'add example 4} -> {1 2 3 4}
e.g. {set'add example 4} -> {1 2 3 4}

proc set'remove {_set args} {
   upvar 1 $_set set
   foreach el $args {
       set pos [lsearch -exact $set $el]
       set set [lreplace $set $pos $pos]
   }
   set set
}

e.g. {set'remove example 3} -> {1 2 4}

proc set'intersection {a b} {
   foreach el $a {set arr($el) ""}
   set res {}
   foreach el $b {if {[info exists arr($el)]} {lappend res $el}}
   set res
}

e.g. {set'intersection {1 2 3 4} {2 4 6 8}} -> {2 4}

proc set'union {a b} {
   foreach el $a {set arr($el) ""}
   foreach el $b {set arr($el) ""}
   lsort [array names arr]
}

e.g. {set'union {1 3 5 7} {2 4 6 8}} -> {1 2 3 4 5 6 7 8}

proc set'difference {a b} {
   eval set'remove a $b
}

e.g. {set'difference {1 2 3 4 5} {2 4 6}} -> {1 3 5}

Hex-dumping a file

edit

The following example code opens a file, configures it to binary translation (i.e. line-ends \r\n are not standardized to \n as usual in C), and prints as many lines as needed which each contain 16 bytes in hexadecimal notation, plus, where possible, the ASCII character.

proc file'hexdump filename {
   set fp [open $filename]
   fconfigure $fp -translation binary
   set n 0
   while {![eof $fp]} {
       set bytes [read $fp 16]
       regsub -all {[^\x20-\xfe]} $bytes . ascii
       puts [format "%04X %-48s %-16s" $n [hexdump $bytes] $ascii]
       incr n 16
   }
   close $fp
}

proc hexdump string {
   binary scan $string H* hex
   regexp -all -inline .. $hex
}

The "main routine" is a single line that dumps all files given on the command line:

foreach file $argv {file'hexdump $file}

Sample output, the script applied to itself:

...> tclsh hexdump.tcl hexdump.tcl
0000 0d 0a 20 70 72 6f 63 20 66 69 6c 65 27 68 65 78  .. proc file'hex
0010 64 75 6d 70 20 66 69 6c 65 6e 61 6d 65 20 7b 0d  dump filename {.
0020 0a 20 20 20 20 73 65 74 20 66 70 20 5b 6f 70 65  .    set fp [ope
0030 6e 20 24 66 69 6c 65 6e 61 6d 65 5d 0d 0a 20 20  n $filename]..
...

Roman numerals

edit

Roman numerals are an additive (and partially subtractive) system with the following letter values:

I=1 V=5 X=10 L=50 C=100 D=500 M=1000; MCMXCIX = 1999

Here's some Tcl routines for dealing with Roman numerals.

Sorting roman numerals: I,V,X already come in the right order; for the others we have to introduce temporary collation transformations, which we'll undo right after sorting:

proc roman:sort list {
   set map {IX VIIII L Y XC YXXXX C Z D {\^} ZM {\^ZZZZ} M _}
   foreach {from to} $map {
       regsub -all $from $list $to list
   }
   set list [lsort $list]
   foreach {from to} [lrevert $map] {
       regsub -all $from $list $to list
   }
   set list
}
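
Note that roman:sort relies on lrevert, a list-reversing helper that is not a built-in command (Tcl 8.5 and later provide lreverse). A minimal pure-Tcl version could look like this:

proc lrevert list {
   set res {}
   foreach e $list {set res [linsert $res 0 $e]}
   set res
}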

Roman numerals from integer:

proc roman:numeral {i} {
       set res ""
       foreach {value roman} {
           1000 M 900 CM 500 D 400 CD 100 C 90 XC 50 L 40 XL 
           10 X 9 IX 5 V 4 IV 1 I} {
               while {$i>=$value} {
                       append res $roman
                       incr i -$value
               }
       }
       set res
}

Roman numerals parsed into integer:

proc roman:get {s} {
       array set r_v {M 1000 D 500 C 100 L 50 X 10 V 5 I 1}
       set last 99999; set res 0
       foreach i [split [string toupper $s] ""] {
               if [catch {set val $r_v($i)}] {
                   error "un-Roman digit $i in $s"
               }
               incr res $val
               if {$val>$last} {incr res [expr -2*$last]}
               set last $val
       }
       set res
}

Custom control structures

edit

As "control structures" are really nothing special in Tcl, just a set of commands, it is easier than in most other languages to create one's own. For instance, if you would like to simplify the for loop

for {set i 0} {$i < $max} {incr i} {...}

for the typical simple cases so you can write instead

loop i 0 $max {...}

here is an implementation that even returns a list of the results of each iteration:

proc loop {_var from to body} {
   upvar 1 $_var var
   set res {}
   for {set var $from} {$var < $to} {incr var} {lappend res [uplevel 1 $body]}
   return $res
 }

using this, a string reverse function can be had as a one-liner:

proc sreverse {str} {
   join [loop i 0 [string length $str] {string index $str end-$i}] ""
}

Range-aware switch

edit

Another example is the following range-aware switch variation. A range (numeric or strings) can be given as from..to, and the associated scriptlet gets executed if the tested value lies inside that range.

Like in switch, fall-through collapsing of several cases is indicated by "-", and "default" as final condition fires if none else did. Different from switch, numbers are compared by numeric value, no matter whether given as decimal, octal or hex.

proc rswitch {value body} {
  set go 0
  foreach {cond script} $body {
     if {[regexp {(.+)\.\.(.+)} $cond -> from to]} {
          if {$value >= $from && $value <= $to} {incr go}
     } else {
         if {$value == $cond} {incr go}
     }
     if {$go && $script ne "-"} { #(2)
         uplevel 1 $script
         break
     }
  }
  if {$cond eq "default" && !$go} {uplevel 1 $script} ;#(1)
}

Testing:

% foreach i {A K c z 0 7} {
     puts $i
     rswitch $i {
        A..Z {puts upper} 
        a..z {puts lower} 
        0..9 {puts digit}
     }
}
A
upper
K
upper
c
lower
z
lower
0
digit
7
digit
% rswitch 0x2A {42 {puts magic} default {puts df}}
magic

The K combinator

edit

A very simple control structure (one might also call it a result dispatcher) is the K combinator, which is almost terribly simple:

proc K {a b} {return $a}

It can be used in all situations where you want to deliver a result that is not the last. For instance, reading a file in one go:

proc readfile filename {
   set f [open $filename]
   set data [read $f]
   close $f
   return $data
}

can be simplified, without need for the data variable, to:

proc readfile filename {
   K [read [set f [open $filename]]] [close $f]
}

Another example, popping a stack:

proc pop _stack {
   upvar 1 $_stack stack
   K [lindex $stack end] [set stack [lrange $stack 0 end-1]]
}

This is in some ways similar to LISP's PROG1 construct: evaluate the contained expressions, and return the result of the first one.

Rational numbers

edit

Rational numbers, a.k.a. fractions, can be thought of as pairs of integers {numerator denominator}, such that their "real" numerical value is numerator/denominator (and not integer nor "double" division!). They can be more precise than any "float" or "double" numbers on computers, as those can't exactly represent any fractions whose denominator isn't a power of 2 — consider 1/3, which cannot at any precision be exactly represented as a floating-point number to base 2, nor as a decimal fraction (base 10), even as a bignum.

An obvious string representation of a rational is of course "n/d". The following "constructor" does that, plus it normalizes the signs, reduces to lowest terms, and returns just the integer n if d==1:

proc rat {n d} {
  if {!$d} {error "denominator can't be 0"}
  if {$d<0} {set n [- $n]; set d [- $d]}
  set g [gcd $n $d]
  set n [/ $n $g]
  set d [/ $d $g]
  expr {$d==1? $n: "$n/$d" }
}

Conversely, this "deconstructor" splits zero or more rational or integer strings into num and den variables, such that [ratsplit 1/3 a b] assigns 1 to a and 3 to b:

proc ratsplit args {
   foreach {r _n _d} $args {
      upvar 1 $_n n  $_d d
      foreach {n d} [split $r /] break
      if {$d eq ""} {set d 1}
   }
}

#-- Four-species math on "rats":
proc rat+ {r s} {
   ratsplit $r a b $s c d
   rat [+ [* $a $d] [* $c $b]] [* $b $d]
}
proc rat- {r s} {
   ratsplit $r a b $s c d
   rat [- [* $a $d] [* $c $b]] [* $b $d]
}
proc rat* {r s} {
   ratsplit $r a b $s c d
   rat [* $a $c] [* $b $d]
}
proc rat/ {r s} {
   ratsplit $r a b $s c d
   rat [* $a $d] [* $b $c]
}

Arithmetical helper functions can be wrapped with func if they only consist of one call of expr:

proc func {name argl body} {proc $name $argl [list expr $body]}

#-- Greatest common denominator:
func gcd {u v} {$u? [gcd [% $v $u] $u]: abs($v)}

#-- Binary expr operators exported:
foreach op {+ * / %} {func $op {a b} \$a$op\$b}

#-- "-" can have 1 or 2 operands:
func - {a {b ""}} {$b eq ""? -$a: $a-$b}

#-- a little tester reports the unexpected:
proc ? {cmd expected} {
   catch {uplevel 1 $cmd} res
   if {$res ne $expected} {puts "$cmd -> $res, expected $expected"}
}

#-- The test suite should silently pass when this file is sourced:
? {rat 42 6} 7
? {rat 1 -2} -1/2
? {rat -1 -2} 1/2
? {rat 1 0} "denominator can't be 0"
? {rat+ 1/3 1/3} 2/3
? {rat+ 1/2 1/2} 1
? {rat+ 1/2 1/3} 5/6
? {rat+ 1 1/2}    3/2
? {rat- 1/2 1/8} 3/8
? {rat- 1/2 1/-8} 5/8
? {rat- 1/7 1/7} 0
? {rat* 1/2 1/2} 1/4
? {rat/ 1/4 1/4} 1
? {rat/ 4 -6} -2/3

Docstrings

edit

Languages like Lisp and Python have the docstring feature, where a string in the beginning of a function can be retrieved for on-line (or printed) documentation. Tcl doesn't have this mechanism built-in (and it would be hard to do it exactly the same way, because everything is a string), but a similar mechanism can easily be adopted, and it doesn't look bad in comparison:

  • Common Lisp: (documentation 'foo 'function)
  • Python: foo.__doc__
  • Tcl: docstring foo

If the docstring is written in comments at the top of a proc body, it is easy to parse it out. In addition, for all procs, even without docstring, you get the "signature" (proc name and arguments with defaults). The code below also serves as usage example:

proc docstring procname {
   # reports a proc's args and leading comments.
   # Multiple documentation lines are allowed.
   set res "{usage: $procname [uplevel 1 [list info args $procname]]}"
   # This comment should not appear in the docstring
   foreach line [split [uplevel 1 [list info body $procname]] \n] {
       if {[string trim $line] eq ""} continue
       if ![regexp {\s*#(.+)} $line -> line] break
       lappend res [string trim $line]
   }
   join $res \n
}
proc args procname {
   # Signature of a proc: arguments with defaults
   set res ""
   foreach a [info args $procname] {
       if [info default $procname $a default] {
           lappend a $default
       }
       lappend res $a
   }
   set res
}

Testing:

% docstring docstring
usage: docstring procname
reports a proc's args and leading comments.
Multiple documentation lines are allowed.

% docstring args
usage: args procname
Signature of a proc: arguments with defaults

Factorial

edit

Factorial (n!) is a popular function with super-exponential growth. Mathematically put,

  0! = 1
  n! = n (n-1)! if n >0, else undefined

In Tcl, we can have it pretty similarly:

proc fact n {expr {$n<2? 1: $n * [fact [incr n -1]]}}

But this very soon crosses the limits of integers, giving wrong results.

A math book showed me the Stirling approximation to n! for large n (at Tcl's precisions, "large" is > 20 ...), so I built that in:

proc fact n {expr {
    $n<2? 1:
    $n>20? pow($n,$n)*exp(-$n)*sqrt(2*acos(-1)*$n):
           wide($n)*[fact [incr n -1]]}
}

Just in case somebody needs approximated large factorials... But for n>143 we reach the domain limit of floating point numbers. In fact, the float limit is at n>170, so an intermediate result in the Stirling formula must have busted at 144. For such few values it is most efficient to just look them up in a pre-built table, as Tcllib's math::factorial does.

How big is A4?

edit

Letter and Legal paper formats are popular in the US and other places. In Europe and elsewhere, the most widely used paper format is called A4. To find out how big a paper format is, one can measure an instance with a ruler, or look up appropriate documentation. The A formats can also be deduced from the following axioms:

  • A0 has an area of one square meter
  • A(n) has half the area of A(n-1)
  • The ratio between the longer and the shorter side of an A format is constant

How much this ratio is, can easily be computed if we consider that A(n) is produced from A(n-1) by halving it parallel to the shorter side, so

2a : b = b : a, 
2a² = b², 
b = sqrt(2) a, hence 
b : a = sqrt(2) : 1

So here is my Tcl implementation, which returns a list of height and width in centimeters (10000 cm² = 1 m²) with two fractional digits, which delivers a sufficient precision of 1/10 mm:

proc paperA n {
   set w [expr {sqrt(10000/(pow(2,$n) * sqrt(2)))}]
   set h [expr {$w * sqrt(2)}]
   format "%.2f %.2f" $h $w
}
% paperA 4
29.73 21.02

Bit vectors

edit

Here is a routine for querying or setting single bits in vectors, where bits are addressed by non-negative integers. Implementation is as a "little-endian" list of integers, where bits 0..31 are in the first list element, 32..63 in the second, etc.

Usage: bit varName position ?bitval?

If bitval is given, sets the bit at numeric position position to 1 if bitval != 0, else to 0; in any case returns the bit value at specified position. If variable varName does not exist in caller's scope, it will be created; if it is not long enough, it will be extended to hold at least $position+1 bits, e.g. bit foo 32 will turn foo into a list of two integers, if it was only one before. All bits are initialized to 0.

proc bit {varName pos {bitval {}}} {
   upvar 1 $varName var
   if {![info exist var]} {set var 0}
   set element [expr {$pos/32}]
   while {$element >= [llength $var]} {lappend var 0}
   set bitpos [expr {1 << $pos%32}]
   set word [lindex $var $element]
   if {$bitval != ""} {
       if {$bitval} {
           set word [expr {$word | $bitpos}]
       } else {
           set word [expr {$word & ~$bitpos}]
       }
       lset var $element $word
   }
   expr {($word & $bitpos) != 0}
}

#---------------------- now testing...
if {[file tail [info script]] == [file tail $argv0]} {
   foreach {test      expected} {
       {bit foo 5 1}  1
       {set foo}      32
       {bit foo 32 1} {32 1}
   } {
       catch {eval $test} res
       puts $test:$res/$expected
   }
}

This may be used for Boolean properties of numerically indexed sets of items. Example: An existence map of ZIP codes between 00000 and 99999 can be kept in a list of 3125 integers (where each element requires about 15 bytes overall), while implementing the map as an array would take 100000 * 42 bytes in worst case, but still more than a bit vector if the population isn't extremely sparse — in that case, a list of 1-bit positions, retrieved with lsearch, might be more efficient in memory usage. Runtime of bit vector accesses is constant, except when a vector has to be extended to much larger length.
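
A tiny sketch of that ZIP code idea, using the bit routine above (the codes are made up):

bit zipmap 10001 1; bit zipmap 60601 1; bit zipmap 94103 1
puts [bit zipmap 60601]  ;# -> 1
puts [bit zipmap 12345]  ;# -> 0, that bit was never set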

Bit vectors can also be used to indicate set membership (set operations would run faster if processing 32 bits in one go with bitwise operators (&, |, ~, ^)) — or pixels in a binary image, where each row could be implemented by a bit vector.

Here's a routine that returns the numeric indices of all set bits in a bit vector:

proc bits bitvec {
   set res {}
   set pos 0
   foreach word $bitvec {
       for {set i 0} {$i<32} {incr i} {
           if {$word & 1<<$i} {lappend res $pos}
           incr pos
       }
   }
   set res
}
% bit foo 47 1
1
% bit foo 11 1
1
% set foo
2048 32768
% bits $foo
11 47

Sieve of Eratosthenes: The following procedure exercises the bit vector functions by letting bits represent integers, and clearing the bits of all multiples of smaller primes. The numbers whose bits are still set at the end are the primes, and are returned:

proc sieve max {
   set maxroot [expr {sqrt($max)}]
   set primes [string repeat " 0xFFFFFFFF" [expr {($max+31)/32}]]
   bit primes 0 0; bit primes 1 0
   for {set i [expr $max+1]} {$i<=(($max+31)/32)*32} {incr i} {
       bit primes $i 0 ;# mask out excess bits
   }
   for {set i 2} {$i<=$maxroot} {incr i} {
      if {[bit primes $i]} {
          for {set j [expr $i<<1]} {$j<=$max} {incr j $i} {
              bit primes $j 0
          }
      }
   }
   bits $primes
}
% time {set res [sieve 10000]}
797000 microseconds per iteration

Here's code to count the number of 1-bits in a bit vector, represented as an integer list. It does so by adding the values of the hex digits:

proc bitcount intlist {
   array set bits {
      0 0  1 1  2 1  3 2  4 1  5 2  6 2  7 3
      8 1  9 2  a 2  b 3  c 2  d 3  e 3  f 4
   }
   set sum 0
   foreach int $intlist {
      foreach nybble [split [format %x $int] ""] {
         incr sum $bits($nybble)
      }
   }
   set sum
}

Stacks and queues

edit

Stacks and queues are containers for data objects with typical access methods:

  • push: add one object to the container
  • pop: retrieve and remove one object from the container

In Tcl it is easiest to implement stacks and queues with lists, and the push method is most naturally lappend, so we only have to code a single generic line for all stacks and queues:

interp alias {} push {} lappend

It is pop operations in which stacks, queues, and priority queues differ:

  • in a stack, the most recently pushed object is retrieved and removed (last in first out, LIFO)
  • in a (normal) queue, it is the least recently pushed object (first in first out, FIFO)
  • in a priority queue, the object with the highest priority comes first.

Priority (a number) has to be assigned at pushing time — by pushing a list of two elements, the item itself and the priority, e.g.

push toDo [list "go shopping" 2]
push toDo {"answer mail" 3}
push toDo {"Tcl coding" 1}  ;# most important thing to do

In frequent parlance, priority 1 is the "highest", and the number increases for "lower" priorities — but you could push in an item with 0 for "ultrahigh" ;-) Popping a stack can be done like this:

proc pop name {
   upvar 1 $name stack
   set res [lindex $stack end]
   set stack [lrange $stack 0 end-1]
   set res
}

Popping a queue is similarly structured, but with so different details that I found no convenient way to factor out things:

proc qpop name {
   upvar 1 $name queue
   set res [lindex $queue 0]
   set queue [lrange $queue 1 end]
   set res
}
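
A quick interactive contrast of the two, using the same push for both containers:

% push S a; push S b; push S c
a b c
% pop S
c
% push Q a; push Q b; push Q c
a b c
% qpop Q
a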

Popping a priority queue requires sorting out which item has highest priority. Sorting can be done when pushing, or when popping, and since our push is so nicely generic I prefer the second choice (as the number of pushs and pops should be about equal, it does not really matter). Tcl's lsort is stable, so items with equal priority will remain in the order in which they were queued:

proc pqpop name {
   upvar 1 $name queue
   set queue [lsort -real -index 1 $queue]
   qpop queue ;# fall back to standard queue, now that it's sorted
}

A practical application is e.g. in state space searching, where the kind of container of the to-do list determines the strategy:

  • stack is depth-first
  • (normal) queue is breadth-first
  • priority queue is any of the more clever ways: A*, Greedy, ...

Recent-use lists: A variation that can be used both in a stack or queue fashion is a list of values in order of their last use (which may come handy in an editor to display the last edited files, for instance). Here, pushing has to be done by dedicated code because a previous instance would have to be removed:

proc rupush {listName value} {
     upvar 1 $listName list
     if {![info exist list]} {set list {}}
     set pos [lsearch $list $value]
     set list [lreplace $list $pos $pos]
     lappend list $value
}
% rupush tmp hello
hello
% rupush tmp world
hello world
% rupush tmp again
hello world again
% rupush tmp world
hello again world

The first element is the least recently, the last the most recently used. Elements are not removed by the popping, but (if necessary) when re-pushing. (One might truncate the list at front if it gets too long).

Functions

edit

Functions in Tcl are typically written with the proc command. But I notice more and more that, on my way to functional programming, my proc bodies are a single call to expr which does all the rest (often with the powerful x?y:z operator). So what about a thin abstraction (wrapper) around this recurring pattern?

proc func {name argl body} {proc $name $argl [list expr $body]}

(I might have called it fun as well... it sure is.) That's all. A collateral advantage is that all expressions are braced, without me having to care. But to not make the page look so empty, here's some examples for func uses:

func fac n     {$n<2? 1: $n*[fac [incr n -1]]}
func gcd {u v} {$u? [gcd [expr $v%$u] $u]: $v}
func min {a b} {$a<$b? $a: $b}
func sgn x     {($x>0)-($x<0)} ;# courtesy rmax

Pity we have to make expr explicit again, in nested calls like in gcd ... But func isn't limited to math functions (which, especially when recursive, come out nice), but for expr uses in testing predicates as well:

func atomar list          {[lindex $list 0] eq $list}
func empty  list          {[llength $list] == 0}
func in    {list element} {[lsearch -exact $list $element] >= 0}
func limit {x min max}    {$x<$min? $min: $x>$max? $max: $x}
func ladd  {list e}       {[in $list $e]? $list: [lappend list $e]}

Exposing expr binary arithmetic operators as Tcl commands goes quite easy too:

foreach op {+ * / %} {func $op {a b} "\$a $op \$b"}

For "-", we distinguish unary and binary form:

func - {a {b ""}} {$b eq ""? -$a: $a-$b}

Having the modulo operator exposed, gcd now looks nicer:

func gcd {u v} {$u? [gcd [% $v $u] $u]: abs($v)}

For unary not I prefer that name to "!", as it might also stand for factorial — and see the shortest function body I ever wrote :^) :

func not x {!$x}

Without big mention, functions implemented by recursion have a pattern for which func is well suited (see fac and gcd above). Another example is this integer range generator (starts from 1, and is inclusive, so [iota1 5] == {1 2 3 4 5}):

func iota1 n {$n == 1? 1: [concat [iota1 [- $n 1]] $n]}

Experiments with Boolean functions

edit

"NAND is not AND." Here are some Tcl codelets to demonstrate how all Boolean operations can be expressed in terms of the single NAND operator, which returns true if not both his two inputs are true (NOR would have done equally well). We have Boolean operators in expr, so here goes:

proc nand {A B} {expr {!($A && $B)}}

The only unary operator NOT can be written in terms of nand:

proc not {A} {nand $A $A}

.. and everything else can be built from them too:

proc and {A B} {not [nand $A $B]}

proc or {A B} {nand [not $A] [not $B]}

proc nor {A B} {not [or $A $B]}

proc eq {A B} {or [and $A $B] [nor $A $B]}

proc ne {A B} {nor [and $A $B] [nor $A $B]}

Here are some testing tools — to see whether an implementation is correct, look at its truth table, here done as the four results for the A,B combinations 0,0 0,1 1,0 1,1 — side note: observe how easily functions can be passed in as arguments:

proc truthtable f {
   set res {}
   foreach A {0 1} {
       foreach B {0 1} {
           lappend res [$f $A $B]
       }
   }
   set res
}

% truthtable and
0 0 0 1

% truthtable nand
1 1 1 0

% truthtable or
0 1 1 1

% truthtable nor
1 0 0 0

% truthtable eq
1 0 0 1

To see how efficient the implementation is (in terms of NAND units used), try this, which relies on the fact that Boolean functions contain no lowercase letters apart from the operator names:

proc nandcount f {
   regsub -all {[^a-z]} [info body $f] " " list
   set nums [string map {nand 1 not 1 and 2 nor 4 or 3 eq 6} $list]
   expr [join $nums +]
}

As a very different idea, having nothing to do with NAND as elementary function, the following generic code "implements" Boolean functions very intuitively, by just giving their truth table for look-up at runtime:

proc booleanFunction {truthtable a b} {
   lindex $truthtable [expr {!!$a+!!$a+!!$b}]
}

interp alias {} and  {} booleanFunction {0 0 0 1}
interp alias {} or   {} booleanFunction {0 1 1 1}
interp alias {} nand {} booleanFunction {1 1 1 0}

Solving cryptarithms

edit

Cryptarithms are puzzles where digits are represented by letters, and the task is to find out which. The following "General Problem Solver" (for small values of General) uses heavy metaprogramming: it

  • builds up a nest of foreachs suiting the problem,
  • quick kills (with continue) to force unique values for the variables, and
  • returns the first solution found, or else an empty string:
proc solve {problem {domain0 {0 1 2 3 4 5 6 7 8 9}}} {
   set vars [lsort -u [split [regsub -all {[^A-Z]} $problem ""] ""]]
   set map {= ==}
   set outers {}
   set initials [regexp -all -inline {[^A-Z]([A-Z])} /$problem]
   set pos [lsearch $domain0 0]
   set domain1 [lreplace $domain0 $pos $pos]
   foreach var $vars {
       append body "foreach $var \$domain[expr [lsearch $initials $var]>=0] \{\n"
       lappend map $var $$var
       foreach outer $outers {
           append body "if {$$var eq $$outer} continue\n"
       }
       lappend outers $var
       append epilog \}
   }
   set test [string map $map $problem]
   append body "if {\[expr $test\]} {return \[subst $test\]}" $epilog
   if 1 $body
}

This works fine on some well-known cryptarithms:

% solve SEND+MORE=MONEY
9567+1085==10652

% solve SAVE+MORE=MONEY
9386+1076==10462

% solve YELLOW+YELLOW+RED=ORANGE
143329+143329+846==287504

Database experiments

edit

A simple array-based database

edit

There are lots of complex databases around. Here I want to explore how a database can be implemented in the Tcl spirit of simplicity, and how far that approach takes us. Consider the following model:

  • A database is a set of records
  • A record is a nonempty set of fields with a unique ID
  • A field is a pair of tag and nonempty value, both being strings

Fields may well be implemented as array entries, so we could have an array per record, or better one array for the whole database, where the key is composed of ID and tag. Unique IDs can be had by just counting up (incrementing the highest ID so far). The process of creating a simple database consists only of setting an initial value for the ID:

set db(lastid) 0

Let's consider a library application for an example. Adding a book to the database can be simply done by

set id [incr db(lastid)]
set db($id,author) "Shakespeare, William"
set db($id,title) "The Tempest"
set db($id,printed) 1962
set db($id,label) S321-001

Note that, as we never specified what fields a record shall contain, we can add whatever we see fit. For easier handling, it's a good idea to classify records somehow (we'll want to store more than books), so we add

set db($id,isa) book

Retrieving a record is as easy as this (though the fields come in undefined order):

array get db $id,*

and deleting a record is only slightly more involved:

foreach i [array names db $id,*] {unset db($i)}

or, even easier and faster from Tcl 8.3 on:

array unset db $id,*

Here's how to get a "column", all fields of a given tag:

array get db *,title

But real columns may have empty fields, which we don't want to store. Retrieving fields that may not physically exist needs a tolerant access function:

proc db'get {_db id field} {
   upvar $_db db
   if {[array names db $id,$field]=="$id,$field"} {
       return $db($id,$field)
   } else {return ""}
}
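
For example, with the book added above ($id still holding its ID), an existing field comes back as stored, and a field that was never set should just yield an empty string:

% db'get db $id title
The Tempest
% db'get db $id publisher
%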

In a classical database we have to define tables: which fields of what type and of which width. Here we can do what we want, even retrieve which fields we have used so far (using a temporary array to keep track of field names):

proc db'fields {_db} {
  upvar $_db db
  foreach i [array names db *,*] {
     set tmp([lindex [split $i ,] 1]) ""
  }
  lsort [array names tmp]
}
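
With only the single book record entered so far, this should report something like:

% db'fields db
author isa label printed title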

Searching for records that meet a certain condition can be done sequentially. For instance, we want all books printed before 1980:

foreach i [array names db *,printed] {
   if {$db($i)<1980} {
       set id [lindex [split $i ,] 0]
       puts "[db'get db $id author]: [db'get db $id title] $db($i)"
   }
}

We might also store our patrons in the same database (here in a different style):

set i [incr db(lastid)]
array set db [list $i,name "John F. Smith" $i,tel (123)456-7890 $i,isa patron]

Without a concept of "tables", we can now introduce structures like in relational databases. Assume John Smith borrows "The Tempest". We have the patron's and book's ID in variables and do double bookkeeping:

lappend db($patron,borrowed) $book ;# might have borrowed other books
set db($book,borrower) $patron
set db($book,dueback) 2001-06-12

When he returns the book, the process is reversed:

set pos [lsearch $db($patron,borrowed) $book]
set db($patron,borrowed) [lreplace $db($patron,borrowed) $pos $pos]
unset db($book,borrower) ;# we're not interested in empty fields
unset db($book,dueback)

The dueback field (%Y-%m-%d format is good for sorting and comparing) is useful for checking whether books have not been returned in time:

set today [clock format [clock seconds] -format %Y-%m-%d]
foreach i [array names db *,dueback] {
   if {$db($i)<$today} {
       set book [lindex [split $i ,] 0] ;# or: set book [idof $i] - see below
       set patron $db($book,borrower)
       #write a letter
       puts "Dear $db($patron,name), "
       puts "please return $db($book,title) which was due on\
       $db($book,dueback)"
   }
}

Likewise, parts of the accounting (e.g. orders to, and bills from, booksellers) can be added with little effort, and cross-related also to external files (just set the value to the filename).

Indexes: As shown, we can retrieve all data by sequential searching over array names. But if the database grows in size, it's a good idea to create indexes which cross-reference tags and values to IDs. For instance, here's how to make an authors' index in four lines:

foreach i [array names db *,author] {
   set book [lindex [split $i ,] 0]
   lappend db(author=[string toupper $db($i)]) $book
}
# and then..
foreach i [lsort [array names db author=SHAK*]] {
   puts "[lindex [split $i =] 1]:" ;# could be wrapped as 'valueof'
   foreach id $db($i) {
       puts "[db'get db $id title] - [db'get db $id label]"
   }
}

gives us a list of books for all authors matching the given glob pattern (we reuse Tcl's functionality instead of reinventing it...). Indexes are useful for repeated information that is likely to be searched. In particular, indexing the isa field allows iterating over "tables" (which we still don't explicitly have! ;-):

regsub -all isa= [array names db isa=*] "" tables
foreach patron $db(isa=patron) {...}

And beyond industry-standard SQL, we can search multiple indices in one query:

array names db *=*MARK*

gives you all (case-independent) occurrences of MARK, be it in patron's names, book's authors or titles. As versatile as good old grep...

Persistence: Databases are supposed to exist between sessions, so here's how to save a database to a file:

set fp [open Library.db w]
puts $fp [list array set db [array get db]]
close $fp

and loading a database is even easier (when re-loading, it is better to unset the array first):

source Library.db

If you use characters outside your system encoding (it's no problem to write Japanese book titles in Kanji), you'll have to fconfigure (e.g. -encoding utf-8) on saving and loading, but that's just a few more LOC. Saving also goes a good way towards what is ceremonially called "committing" (you'll need write-locking for multi-user systems), while loading (without saving before) might be called a "one-level rollback", where you want to discard your latest changes.

Notice that so far we have only defined one short proc, all other operations were done with built-in Tcl commands only. For clearer code, it is advisable to factor out frequent operations into procs, e.g.

proc idof {index} {lindex [split $index ,] 0}
proc db'add {_db data} {
   upvar $_db db
   set id [incr db(lastid)]
   foreach {tag value} $data {set db($id,$tag) $value}
   # might also update indexes here
}
proc db'tablerow {_db id tags} {
   upvar $_db db
   set res {}
   foreach tag $tags {lappend res [db'get db $id $tag]}
   set res
}
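
A short usage sketch with made-up data (the book details here are just an example):

db'add db {author "Verne, Jules" title "The Mysterious Island" printed 1874 isa book}
puts [db'tablerow db $db(lastid) {author title printed}]
# prints: {Verne, Jules} {The Mysterious Island} 1874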

Of course, with growing databases we may reach memory limits: arrays need some extra storage for administration. On the other hand, the present approach is pretty economical, since it does not use field widths (all strings are "shrink-wrapped") and omits empty fields, while at the same time allowing you to add whatever fields you wish. A further optimization could be to tally value strings, and replace the frequent ones with "@$id", where db(@$id) holds the value once, and only db'get has to be adapted to redirect the query.

Also, memory limits on modern computers are quite high... so only at some point in the future might you have to (but maybe not want to) change to a complex database ;-)

On the limits: Tcl arrays may get quite large (one app was reported to store 800000 keys in Greek characters), and at some point enumerating all keys with array names db (which produces one long list) may exceed your available memory, causing the process to swap. In that situation, you can fall back to the (otherwise slower, and uglier) use of a dedicated iterator:

set search [array startsearch db]
while {[array anymore db $search]} {
   set key [array nextelement db $search]
   # now do something with db($key) - but see below!
}
array donesearch db $search

But neither can you filter the keys you will get with a glob pattern, nor may you add or delete array elements in the loop — the search will be immediately terminated.

Tables as lists of lists

edit

Tables are understood here as rectangular (matrix) arrangements of data in rows (one row per "item"/"record") and columns (one column per "field"/"element"). They are for instance the building blocks of relational databases and spreadsheets. In Tcl, a sensible implementation for compact data storage would be as a list of lists. This way, they are "pure values" and can be passed e.g. through functions that take a table and return a table. No con-/destructors are needed, in contrast to the heavier-weight matrix structure in Tcllib. I know there are many table implementations in Tcl, but as so often I wanted to build one "with my bare hands" and as simple as possible. As you see below, many functionalities can be "implemented" by just using Tcl's list functions.

A nice table also has a header line, that specifies the field names. So to create such a table with a defined field structure, but no contents yet, one just assigns the header list:

set tbl {{firstname lastname phone}}

Note the double bracing, which makes sure tbl is a 1-element list. Adding "records" to the table is as easy as

lappend tbl {John Smith (123)456-7890}

Make sure the fields (cells) match those in the header. Here single bracing is correct. If a field content contains spaces, it must be quoted or braced too:

lappend tbl {{George W} Bush 234-5678}

Sorting a table can be done with lsort -index, taking care that the header line stays on top:

proc tsort args {
   set table [lindex $args end]
   set header [lindex $table 0]
   set res [eval lsort [lrange $args 0 end-1] [list [lrange $table 1 end]]]
   linsert $res 0 $header
}
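
For instance, sorting the little table from above by last name (column index 1) should keep the header row in place:

% tsort -index 1 $tbl
{firstname lastname phone} {{George W} Bush 234-5678} {John Smith (123)456-7890}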

Removing a row (or contiguous sequence of rows) by numeric index is a job for lreplace:

set tbl [lreplace $tbl $from $to]

Simple printing of such a table, a row per line, is easy with

puts [join $tbl \n]

Accessing fields in a table is more fun with the field names than the numeric indexes, which is made easy by the fact that the field names are in the first row:

proc t@ {tbl field} {lsearch [lindex $tbl 0] $field}
% t@ $tbl phone
2

You can then access cells:

puts [lindex $tbl $rownumber [t@ $tbl lastname]]

and replace cell contents like this:

lset tbl $rownumber [t@ $tbl phone] (222)333-4567

Here is how to filter a table by giving pairs of field name and glob-style expression — in addition to the header line, all rows that satisfy at least one of those come through (you can force AND behavior by just nesting such calls):

proc trows {tbl args} {
   set conditions {}
   foreach {field condition} $args {
       lappend conditions [t@ $tbl $field] $condition
   }
   set res [list [lindex $tbl 0]]
   foreach row [lrange $tbl 1 end] {
       foreach {index condition} $conditions {
           if [string match $condition [lindex $row $index]] {
              lappend res $row
              break; # one hit is sufficient
           }
       }
   }
   set res
}
% trows $tbl lastname Sm*
{firstname lastname phone} {John Smith (123)456-7890}

This filters (and, if wanted, rearranges) columns, sort of what is called a "view":

proc tcols {tbl args} {
   set indices {}
   foreach field $args {lappend indices [t@ $tbl $field]}
   set res {}
   foreach row $tbl {
       set newrow {}
       foreach index $indices {lappend newrow [lindex $row $index]}
       lappend res $newrow
   }
   set res
}
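
For example, a two-column "view" of the same table should give:

% tcols $tbl lastname phone
{lastname phone} {Smith (123)456-7890} {Bush 234-5678}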

Programming Languages Laboratory

edit

In the following few chapters you'll see how easy it is to emulate or explore other programming languages with Tcl.

GOTO: a little state machine

edit

The GOTO "jumping" instruction is considered harmful in programming for many years now, but still it might be interesting to experiment with. Tcl has no goto command, but it can easily be created. The following code was created in the Tcl chatroom, instigated by the quote: "A computer is a state machine. Threads are for people who can't program state machines."

So here is one model of a state machine in ten lines of code. The "machine" itself takes a list of alternating labels and state code; if a state's code does not end in a goto or break, the same state is repeated (an implicit endless loop) until it is left with goto or break. The goto command is defined "locally", and deleted after leaving the state machine — it is not meaningfully used outside of it. Execution starts at the first of the states.

proc statemachine states {
   array set S $states
   proc goto label {
       uplevel 1 set this $label
       return -code continue
   }
   set this [lindex $states 0]
   while 1 {eval $S($this)}
   rename goto {}
}

Testing: a tiny state machine that greets you as often as you wish, and ends if you only hit Return on the "how often?" question:

statemachine {
   1 {
       puts "how often?"
       gets stdin nmax
       if {$nmax eq ""} {goto 3}
       set n 0
       goto 2
   } 2 {
       if {[incr n] > $nmax} {goto 1}
       puts "hello"
   } 3 {puts "Thank you!"; break}
}

Playing Assembler

edit

In this weekend fun project to emulate machine language, I picked those parts of Intel 8080A/8085 Assembler (because I had a detailed reference handy) that are easily implemented and still somehow educational (or nostalgic ;-).

Of course this is no real assembler. The memory model is constant-size instructions (strings in array elements), which are implemented as Tcl procs. So an "assembler" program in this plaything will run even slower than in pure Tcl, and consume more memory — while normally you associate speed and conciseness with "real" assembler code. But it looks halfway like the real thing: you get sort of an assembly listing with symbol table, and can run it — I'd hardly start writing an assembler in C, but in Tcl it's fun for a sunny Sunday afternoon...

namespace eval asm {
   proc asm body {
       variable mem
       catch {unset mem} ;# good for repeated sourcing
       foreach line [split $body \n] {
           foreach i {label op args} {set $i ""}
           regexp {([^;]*);} $line -> line ;# strip off comments
           regexp {^ *(([A-Z0-9]+):)? *([A-Z]*) +(.*)} [string toupper $line]\
                ->  -   label           op       args
            puts label=$label,op=$op,args=$args ;# debug trace of the parsed fields
           if {$label!=""} {set sym($label) $PC}
           if {$op==""}     continue
           if {$op=="DB"}  {set mem($PC) [convertHex $args]; incr PC; continue}
           if {$op=="EQU"} {set sym($label) [convertHex $args]; continue}
           if {$op=="ORG"} {set PC [convertHex $args]; continue}
           regsub -all ", *" $args " " args ;# normalize commas
           set mem($PC) "$op $args"
           incr PC
       }
       substituteSymbols sym
       dump   sym
   }
   proc convertHex s {
       if [regexp {^([0-9A-F]+)H$} [string trim $s] -> s] {set s [expr 0x$s]}
       set s
   }
   proc substituteSymbols {_sym} {
       variable mem
       upvar $_sym sym
       foreach i [array names mem] {
           set tmp [lindex $mem($i) 0]
           foreach j [lrange $mem($i) 1 end] {
               if {[array names sym $j] eq $j} {set j $sym($j)}
               lappend tmp $j
           }
           set mem($i) $tmp
       }
   }
   proc dump {_sym} {
       variable mem
       upvar $_sym sym
       foreach i [lsort -integer [array names mem]] {
           puts [format "%04d %s" $i $mem($i)]
       }
       foreach i [lsort [array names sym]] {
           puts [format "%-10s: %04x" $i $sym($i)]
       }
   }
   proc run { {pc 255}} {
       variable mem
       foreach i {A B C D E Z} {set ::$i 0}
       while {$pc>=0} {
           incr pc
           #puts "$mem($pc)\tA:$::A B:$::B C:$::C D:$::D E:$::E Z:$::Z"
           eval $mem($pc)
       }
   }
#----------------- "machine opcodes" implemented as procs
   proc ADD  {reg reg2}  {set ::Z [incr ::$reg [set ::$reg2]]}
   proc ADI  {reg value} {set ::Z [incr ::$reg $value]}
   proc CALL {name}      {[string tolower $name] $::A}
   proc DCR  {reg}       {set ::Z [incr ::$reg -1]}
   proc INR  {reg}       {set ::Z [incr ::$reg]}
   proc JMP  where       {uplevel 1 set pc [expr $where-1]}
   proc JNZ  where       {if $::Z {uplevel 1 JMP $where}}
   proc JZ   where       {if !$::Z {uplevel 1 JMP $where}}
   proc MOV  {reg adr}   {variable mem; set ::$reg $mem($adr)}
   proc MVI  {reg value} {set ::$reg $value}
}

Now testing:

asm::asm {
       org  100h    ; the canonical start address in CP/M (0100 hex)
       jmp  START   ; idiomatic: get over the initial variable(s)
DONE:  equ  0       ; warm start in CP/M ;-)
MAX:   equ  5
INCR:  db   2       ; a variable (though we won't vary it)
;; here we go...
START: mvi  c,MAX   ; set count limit
       mvi  a,0     ; initial value
       mov  b,INCR
LOOP:  call puts    ; for now, fall back to Tcl for I/O
       inr  a
       add  a,b     ; just to make adding 1 more complicated
       dcr  c       ; counting down..
       jnz  LOOP    ; jump on non-zero to LOOP
       jmp  DONE    ; end of program
       end
}

The mov b,INCR part is an oversimplification. For a real 8080, one would have to say

LXI H,INCR ; load double registers H+L with the address INCR
MOV B,M    ; load byte to register B from the address pointed to in HL

Since the pseudo-register M can also be used for writing back, it cannot be implemented by simply copying the value. Rather, one could use read and write traces on variable M, causing it to load from, or store to, mem($HL). Maybe another weekend...

Functional programming (Backus 1977)

edit

John Backus turned 80 these days. For creating FORTRAN and the BNF style of language description, he received the ACM Turing Award in 1977. In his Turing Award lecture,

Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. (Comm. ACM 21.8, Aug. 1978, 613-641)

he developed an amazing framework for functional programming, from theoretical foundations to implementation hints, e.g. for installation, user privileges, and system self-protection. In a nutshell, his FP system comprises

  • a set O of objects (atoms or sequences)
  • a set F of functions that map objects into objects (f : O |-> O)
  • an operation, application (very roughly, eval)
  • a set FF of functional forms, used to combine functions or objects to form new functions in F
  • a set D of definitions that map names to functions in F

I'm far from having digested it all, but like so often, interesting reading prompts me to do Tcl experiments, especially on weekends. I started with Backus' first Functional Program example,

Def Innerproduct = (Insert +) o (ApplyToAll x) o Transpose

and wanted to bring it to life — slightly adapted to Tcl style, especially by replacing the infix operator "o" with a Polish prefix style:

Def Innerproduct = {o {Insert +} {ApplyToAll *} Transpose}

Unlike procs or lambdas, more like APL or RPN, this definition needs no variables — it declares (from right to left) what to do with the input; the result of each step is the input for the next step (to the left of it). In an RPN language, the example might look like this:

/Innerproduct {Transpose * swap ApplyToAll + swap Insert} def

which has the advantage that execution goes from left to right, but requires some stack awareness (and some swaps to set the stack right ;^)

Implementing Def, I took an easy route by just creating a proc that adds an argument and leaves it to the "functional" to do the right thing (with some quoting heaven :-)

proc Def {name = functional} {
   proc $name x "\[$functional\] \$x"
}

For functional composition, where, say for two functions f and g,

[{o f g} $x] == [f [g $x]]

again a proc is created that does the bracket nesting:

proc o args {
   set body return
   foreach f $args {append body " \[$f"}
   set name [info level 0]
   proc $name x "$body \$x [string repeat \] [llength $args]]"
   set name
}
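
A quick check with two toy helper functions (succ and sqr are just made up for this test):

proc succ x {expr {$x + 1}}
proc sqr  x {expr {$x * $x}}
puts [[o sqr succ] 3] ;# sqr(succ(3)) -> 16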

Why Backus used Transpose on the input wasn't clear to me at first, but as he (like we Tclers) represents a matrix as a list of rows, which are again lists (also known as vectors), it made much sense to me later. This code for transposing a matrix uses the fact that variable names can be any string, including those that look like integers, so the column contents are collected into variables named 0 1 2 ... and finally turned into the result list:

proc Transpose matrix {
   set cols [iota [llength [lindex $matrix 0]]]
   foreach row $matrix {
       foreach element $row col $cols {
           lappend $col $element
       }
   }
   set res {}
   foreach col $cols {lappend res [set $col]}
   set res
}

An integer range generator produces the variable names, e.g. iota 3 => {0 1 2}:

proc iota n {
   set res {}
   for {set i 0} {$i<$n} {incr i} {lappend res $i}
   set res
}

#-- This "functional form" is mostly called map in more recent FP:
proc ApplyToAll {f list} {
   set res {}
   foreach element $list {lappend res [$f $element]}
   set res
}

...and Insert is better known as fold, I suppose. My oversimple implementation assumes that the operator is one that expr understands:

proc Insert {op arguments} {expr [join $arguments $op]}

#-- Prefix multiplication comes as a special case of this:
interp alias {} * {} Insert *

#-- Now to try out the whole thing:
Def Innerproduct = {o {Insert +} {ApplyToAll *} Transpose}
puts [Innerproduct {{1 2 3} {6 5 4}}]

which returns 28 just as Dr. Backus ordered (= 1*6 + 2*5 + 3*4). Ah, the joys of weekend Tcl'ing... — and belatedly, Happy Birthday, John! :)

Another example, cooked up by myself this time, computes the average of a list. For this we need to implement the construction operator, which is sort of inverse mapping — while mapping a function over a sequence of inputs produces a sequence of outputs of that function applied to each input, Backus' construction maps a sequence of functions over one input to produce a sequence of results of each function to that input, e.g.

[f,g](x) == <f(x),g(x)>

Of course I can't use circumfix brackets as operator name, so let's call it constr:

proc constr args {
   set functions [lrange $args 0 end-1]
   set x [lindex $args end]
   set res {}
   foreach f $functions {lappend res [eval $f [list $x]]}
   set res
}

#-- Testing:
Def mean = {o {Insert /} {constr {Insert +} llength}}
puts [mean {1 2 3 4 5}]

which correctly returns 3. However, as integer division takes place, it would be better to make it

proc double x {expr {double($x)}}

Def mean    = {o {Insert /} {constr {Insert +} dlength}}
Def dlength = {o double llength}

puts [mean {1 2 3 4}]

giving the correct result 2.5. However, the auxiliary definition for dlength cannot be inlined into the definition of mean — so this needs more work... But this version, which maps double first, works:

Def mean = {o {Insert /} {constr {Insert +} llength} {ApplyToAll double}}

One more experiment, just to get the feel:

Def hypot  = {o sqrt {Insert +} {ApplyToAll square}}
Def square = {o {Insert *} {constr id id}}

proc sqrt x {expr {sqrt($x)}}
proc id x   {set x}

puts [hypot {3 4}]

which gives 5.0. Compared to an RPN language, hypot would be

/hypot {dup * swap dup * + sqrt} def

which is shorter and simpler, but meddles more directly with the stack.

An important functional form is the conditional, which at Backus looks like

p1 -> f; p2 -> g; h

meaning, translated to Tcl,

if {[p1 $x]} then {f $x} elseif {[p2 $x]} then {g $x} else {h $x}

Let's try that, rewritten Polish-ly to:

cond p1 f p2 g h

proc cond args {
   set body ""
   foreach {condition function} [lrange $args 0 end-1] {
       append body "if {\[$condition \$x\]} {$function \$x} else"
   }
   append body " {[lindex $args end] \$x}"
   set name [info level 0]
   proc $name x $body
   set name
}

#-- Testing, with K in another role as Konstant function :)
Def abs = {cond {> 0} -- id}

proc > {a b} {expr {$a>$b}}
proc < {a b} {expr {$a<$b}}
proc -- x {expr -$x}
puts [abs -42],[abs 0],[abs 42]

Def sgn = {cond {< 0} {K 1} {> 0} {K -1} {K 0}}
proc K {a b} {set a}

puts [sgn 42]/[sgn 0]/[sgn -42]

#--Another famous toy example, reading a file's contents:
Def readfile = {o 1 {constr read close} open}

#--where Backus' selector (named just as integer) is here:
proc 1 x {lindex $x 0}

Reusable functional components

edit

Say you want to make a multiplication table for an elementary school kid near you. Easily done in a few lines of Tcl code:

proc multable {rows cols} {
   set res ""
   for {set i 1} {$i <= $rows} {incr i} {
       for {set j 1} {$j <= $cols} {incr j} {
           append res [format %4d [expr {$i*$j}]]
       }
       append res \n
   }
   set res
}

The code does not puts its results directly, but returns them as a string — you might want to do other things with it, e.g. save it to a file for printing. Testing:

% multable 3 10
  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30

Or print the result directly from wish:

 catch {console show}
 puts "[multable 3 10]"

Here's a different way to do it à la functional programming:

proc multable2 {rows cols} {
   formatMatrix %4d [outProd * [iota 1 $rows] [iota 1 $cols]]
}

The body is nice and short, but consists entirely of unfamiliar commands. They are however more reusable than the multable proc above. The first formats a matrix (a list of lists, to Tcl) with newlines and aligned columns for better display:

proc formatMatrix {fm matrix} {
   join [lmap row $matrix {join [lmap i $row {format $fm $i}] ""}] \n
}

Short again, and slightly cryptic, as is the "outer product" routine, which takes a function f and two vectors, and produces a matrix where f was applied to every pair of a x b — in APL they had special compound operators for this job, in this case "°.x":

proc outProd {f a b} {
   lmap i $a {lmap j $b {$f $i $j}}
}

Again, lmap (the collecting foreach) figures prominently, so here it is in all its simplicity:

proc lmap {_var list body} {
   upvar 1 $_var var
   set res {}
   foreach var $list {lappend res [uplevel 1 $body]}
   set res
}

#-- We need multiplication from expr exposed as a function:
proc * {a b} {expr {$a * $b}}

#-- And finally, iota is an integer range generator:
proc iota {from to} {
   set res {}
   while {$from <= $to} {lappend res $from; incr from}
   set res
}

With these parts in place, we can see that multable2 works as we want:

% multable2 3 10
  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30

So why write six procedures, where one did the job already? A matter of style and taste, in a way — multable is 10 LOC and depends on nothing but Tcl, which is good; multable2 describes quite concisely what it does, and builds on a few other procs that are highly reusable.

Should you need a unit matrix (where the main diagonal is 1, and the rest is 0), just call outProd with a different function (equality, ==):

% outProd == [iota 1 5] [iota 1 5]
{1 0 0 0 0} {0 1 0 0 0} {0 0 1 0 0} {0 0 0 1 0} {0 0 0 0 1}

which just requires expr's equality to be exposed too:

proc == {a b} {expr {$a == $b}}

One of the fascinations of functional programming is that one can do the job in a simple and clear way (typically a one-liner), while using a collection of reusable building-blocks like lmap and iota. And formatMatrix and outProd are so general that one might include them in some library, while the task of producing a multiplication table may not come up any more for a long time...

Modelling an RPN language

edit

Tcl strictly follows Polish notation, where an operator or function always precedes its arguments. It is however easy to build an interpreter for a language in Reverse Polish Notation (RPN) like Forth, Postscript, or Joy, and experiment with it.

The "runtime engine" is just called "r" (not to be confused with the R language), and it boils down to a three-way switch done for each word, in eleven lines of code:

  • "tcl" evaluates the top of stack as a Tcl script
  • known words in the ::C array are recursively evaluated in "r"
  • other words are just pushed

Joy's rich quoting for types ([list], {set}, "string", 'char) conflicts with the Tcl parser, so lists in "r" are {braced} if their length isn't 1, and (parenthesized) if it is — but the word shall not be evaluated now. This looks better to me than /slashing as in Postscript.

As everything is a string, and to Tcl "a" is {a} is a, Joy's polymorphy has to be made explicit. I added converters between characters and integers, and between strings and lists (see the dictionary below). For Joy's sets I haven't bothered yet — they are restricted to the domain 0..31, probably implemented with bits in a 32-bit word.

Far as this is from Joy, it was mostly triggered by the examples in Manfred von Thun's papers, so I tongue-in-cheek still call it "Pocket Joy" — it was for me, at last, on the iPaq... The test suite at the end should give many examples of what one can do in "r".

proc r args {
   foreach a $args {
     dputs [info level]:$::S//$a
     if {$a eq "tcl"} {
             eval [pop]
     } elseif [info exists ::C($a)] {
             eval r $::C($a)
     } else {push [string trim $a ()]}
   }
   set ::S
}

# That's it. Stack (list) and Command array are global variables:

set S {}; catch {unset C} ;# good for repeated sourcing

#-- A tiny switchable debugger:

proc d+ {} {proc dputs s {puts $s}}
proc d- {}  {proc dputs args {}}
d- ;#-- initially, debug mode off

Definitions are in Forth style — ":" as the initial word — as they look much more compact than Joy's "DEFINE n == args;":

proc : {n args} {set ::C($n) $args}

expr functionality is exposed for binary operators and one-arg functions:

proc 2op op {
   set t [pop]
   push [expr {[pop]} $op {$t}]
}
foreach op {+ - * / > >= != <= <} {: $op [list 2op $op] tcl}
: =    {2op ==} tcl

proc 1f  f {push [expr $f\([pop])]}
foreach f {abs double exp int sqrt sin cos acos tan} {: $f [list 1f $f] tcl}

interp alias {} pn {} puts -nonewline

#----- The dictionary has all one-liners:
: .      {pn "[pop] "} tcl
: .s    {puts $::S} tcl
: '      {push [scan [pop] %c]} tcl   ;# char -> int
: `     {push [format %c [pop]]} tcl  ;# int -> char
: and  {2op &&} tcl
: at     1 - swap {push [lindex [pop] [pop]]} tcl
: c      {set ::S {}} tcl ;# clear stack
: choice {choice [pop] [pop] [pop]} tcl
: cleave {cleave [pop] [pop] [pop]} tcl
: cons {push [linsert [pop] 0 [pop]]} tcl
: dup  {push [set x [pop]] $x} tcl
: dupd {push [lindex $::S end-1]} tcl
: emit {pn [format %c [pop]]} tcl
: even  odd not
: explode  {push [split [pop] ""]} tcl  ;# string -> char list
: fact  1 (*) primrec
: filter  split swap pop
: first  {push [lindex [pop] 0]} tcl
: fold  {rfold [pop] [pop] [pop]} tcl
: gcd  swap {0 >} {swap dupd rem swap gcd} (pop) ifte
: has  swap in
: i      {eval r [pop]} tcl
: ifte   {rifte [pop] [pop] [pop]} tcl
: implode  {push [join [pop] ""]} tcl ;# char list -> string
: in  {push [lsearch [pop] [pop]]} tcl 0 >=
: map  {rmap [pop] [pop]} tcl
: max  {push [max [pop] [pop]]} tcl
: min  {push [min [pop] [pop]]} tcl
: newstack  c
: not   {1f !} tcl
: odd  2 rem
: of  swap at
: or    {2op ||} tcl
: pop  (pop) tcl
: pred 1 -
: primrec {primrec [pop] [pop] [pop]} tcl
: product 1 (*) fold
: qsort (lsort) tcl
: qsort1 {lsort -index 0} tcl
: rem  {2op %} tcl
: rest  {push [lrange [pop] 1 end]} tcl
: reverse {} swap (swons) step
: set  {set ::[pop] [pop]} tcl
: $     {push [set ::[pop]]} tcl
: sign  {0 >}  {0 <} cleave -
: size  {push [llength [pop]]} tcl
: split  {rsplit [pop] [pop]} tcl
: step  {step [pop] [pop]} tcl
: succ  1 +
: sum   0 (+) fold
: swap  {push [pop] [pop]} tcl
: swons  swap cons
: xor  !=

Helper functions written in Tcl:

proc rifte {else then cond} {
   eval r dup $cond
   eval r [expr {[pop]? $then: $else}]
}
proc choice {z y x} {
   push [expr {$x? $y: $z}]
}
proc cleave { g f x} {
   eval [list r $x] $f [list $x] $g
}
proc max {x y} {expr {$x>$y?$x:$y}}
proc min {x y} {expr {$x<$y? $x:$y}}
proc rmap {f list} {
   set res {}
   foreach e $list {
      eval [list r $e] $f
      lappend res [pop]
   }
   push $res
}
proc step {f list} {
   foreach e $list {eval [list r ($e)] $f}
}
proc rsplit {f list} {
   foreach i {0 1} {set $i {}}
   foreach e $list {
      eval [list r $e] $f
      lappend [expr {!![pop]}] $e
   }
   push $0 $1
}
proc primrec {f init n} {
   if {$n>0} {
      push $n
      while {$n>1} {
          eval [list r [incr n -1]] $f
      }
   } else {push $init}
}
proc rfold {f init list} {
   push $init
   foreach e $list {eval [list r $e] $f}
}

#------------------ Stack routines
proc push args {
  foreach a $args {lappend ::S $a}
}
proc pop {} {
   if [llength $::S] {
      K [lindex $::S end] \
         [set ::S [lrange $::S 0 end-1]]
   } else {error "stack underflow"}
}
proc K {a b} {set a}

#------------------------ The test suite:
proc ? {cmd expected} {
   catch {uplevel 1 $cmd} res
   if {$res ne $expected} {puts "$cmd->$res, not $expected"}
}
? {r 2 3 +} 5
? {r 2 *}   10
? {r c 5 dup *} 25
: sqr dup *
: hypot sqr swap sqr + sqrt
? {r c 3 4 hypot} 5.0
? {r c {1 2 3} {dup *} map} {{1 4 9}}
? {r size} 3
? {r c {2 5 3} 0 (+) fold} 10
? {r c {3 4 5} product} 60
? {r c {2 5 3} 0 {dup * +} fold} 38
? {r c {1 2 3 4} dup sum swap size double /} 2.5
? {r c {1 2 3 4} (sum)  {size double} cleave /} 2.5
: if0 {1000 >} {2 /} {3 *} ifte
? {r c 1200 if0} 600
? {r c 600 if0}  1800
? {r c 42 sign}   1
? {r c 0 sign}     0
? {r c -42 sign} -1
? {r c 5 fact} 120
? {r c 1 0 and} 0
? {r c 1 0 or}   1
? {r c 1 0 and not} 1
? {r c 3 {2 1} cons} {{3 2 1}}
? {r c {2 1} 3 swons} {{3 2 1}}
? {r c {1 2 3} first} 1
? {r c {1 2 3} rest} {{2 3}}
? {r c {6 1 5 2 4 3} {3 >} filter} {{6 5 4}}
? {r c 1 2 {+ 20 * 10 4 -} i} {60 6}
? {r c 42 succ} 43
? {r c 42 pred} 41
? {r c {a b c d} 2 at} b
? {r c 2 {a b c d} of} b
? {r c 1 2 pop} 1
? {r c A ' 32 + succ succ `} c
? {r c {a b c d} reverse} {{d c b a}}
? {r c 1 2 dupd} {1 2 1}
? {r c 6 9 gcd} 3
? {r c true yes no choice} yes
? {r c false yes no choice} no
? {r c {1 2 3 4} (odd) split} {{2 4} {1 3}}
? {r c a {a b c} in} 1
? {r c d {a b c} in} 0
? {r c {a b c} b has} 1
? {r c {a b c} e has} 0
? {r c 3 4 max} 4
? {r c 3 4 min}  3
? {r c hello explode reverse implode} olleh
: palindrome dup explode reverse implode =
? {r c hello palindrome} 0
? {r c otto palindrome}  1

#-- reading (varname $) and setting (varname set) global Tcl vars
set tv 42
? {r c (tv) $ 1 + dup (tv) set} 43
? {expr $tv==43} 1

Tacit programming

edit

The J programming language is the "blessed successor" to APL, where "every function is an infix or prefix operator" (x?y dyadic, or ?y monadic, for ? being any pre- or user-defined function).

"Tacit programming" (tacit: implied; indicated by necessary connotation though not expressed directly) is one of the styles possible in J, and means coding by combining functions, without reference to argument names. This idea may have been first brought up in Functional programming (Backus 1977), if not in Forth and Joy, and it's an interesting simplification compared to the lambda calculus.

For instance, here's a breathtakingly short J program to compute the mean of a list of numbers:

mean=.+/%#

Let's chew this, byte by byte :)

=.   is assignment to a local variable ("mean") which can be called
+/%# is the "function body"
+    (dyadic) is addition
/    folds the operator on its left over the list on its right
+/   hence being the sum of a list
%    (dyadic) is division, going double on integer arguments when needed
#    (monadic) is tally, like Tcl's [llength] resp. [string length]

Only implicitly present is a powerful function combinator called "fork". When J parses three operators in a row, gfh, where f is dyadic and g and h are monadic, they are combined like the following Tcl version does:

proc fork {f g h x} {$f [$g $x] [$h $x]}

In other words, f is applied to the results of applying g and h to the single argument. Note that +/ is considered one operator, which applies the "adverb" folding to the "verb" addition (one might well call it "sum"). When two operands occur together, the "hook" pattern is implied, which might in Tcl be written as:

proc hook {f g x} {$f $x [$g $x]}

As KBK pointed out in the Tcl chatroom, the "hook" pattern corresponds to Schönfinkel/Curry's S combinator (see Hot Curry and Combinator Engine), while "fork" is called S' there.

Unlike in earlier years when I was playing APL, this time my aim was not to parse and emulate J in Tcl — I expected hard work for a dubitable gain, and this is a weekend fun project after all. I rather wanted to explore some of these concepts and how to use them in Tcl, so that in slightly more verbose words I could code (and call)

Def mean = fork /. sum llength

following Backus' FP language with the "Def" command. So let's get the pieces together. My "Def" creates an interp alias, which is a good and simple Tcl way to compose partial scripts (the definition, here) with one or more arguments, also known as "currying":

proc Def {name = args} {eval [list interp alias {} $name {}] $args}

The second parameter, "=", is for better looks only and evidently never used.

Testing early and often is a virtue, as is documentation — to make the following code snippets clearer, I tuned my little tester for better looks, so that the test cases in the source code also serve as well readable examples — they look like comments but are code! The cute name "e.g." was instigated by the fact that in J, "NB." is used as comment indicator, both being well known Latin abbreviations:

proc e.g. {cmd -> expected} {
   catch {uplevel 1 $cmd} res
   if {$res != $expected} {puts "$cmd -> $res, not $expected"}
}

Again, the "->" argument is for eye-candy only — but it feels better to me at least. See the examples soon to come.

For recursive functions and other arithmetics, func makes better reading, by accepting expr language in the body:

proc func {name argl body} {proc $name $argl [list expr $body]}

We'll use this to turn expr's infix operators into dyadic functions, plus the "slashdot" operator that makes division always return a real number, hence the dot :

foreach op {+ - * /} {func $op {a b} "\$a $op \$b"}
        e.g. {+ 1 2} -> 3
        e.g. {/ 1 2} -> 0        ;# integer division
func /. {a b} {double($a)/$b}
        e.g. {/. 1 2} -> 0.5     ;# "real" division

#-- Two abbreviations for frequently used list operations:
proc head list {lindex $list 0}
          e.g. {head {a b c}} -> a
proc tail list {lrange $list 1 end}
          e.g. {tail {a b c}} -> {b c}

For "fold", this time I devised a recursive version:

func fold {neutral op list} {
   $list eq [] ? $neutral
   : [$op [head $list] [fold $neutral $op [tail $list]]]
}
        e.g. {fold 0 + {1 2 3 4}} -> 10

#-- A "Def" alias does the same job:
Def sum = fold 0 +
        e.g. {sum      {1 2 3 4}} -> 10

#-- So let's try to implement "mean" in tacit Tcl!
Def mean = fork /. sum llength
         e.g. {mean {1 2 3 40}} -> 11.5

Tacit enough (one might have picked fancier names like +/ for "sum" and # as alias for llength), but in principle it is equivalent to the J version, and doesn't name a single argument. Also, the use of llength demonstrates that any good old Tcl command can go in here, not just the artificial Tacit world that I'm just creating...

In the next step, I want to reimplement the "median" function, which for a sorted list returns the central element if its length is odd, or the mean of the two elements adjacent to the (virtual) center for even length. In J, it looks like this:

median=.(mean@:{~medind@#)@sortu
medind=.((<.,>.)@half) ` half @.(2&|)
half=.-:@<:                        NB. halve one less than rt. argument
sortu=.{~/:                        NB. sort upwards

which may better explain why I wouldn't want to code in J :^) J has ASCIIfied the zoo of APL strange character operators, at the cost of using braces and brackets as operators too, without regard for balancing, and extending them with dots and colons, so e.g.

-   monadic: negate; dyadic: minus
-.  monadic: not
-:  monadic: halve

J code sometimes really looks like an accident in a keyboard factory... I won't go into all details of the above code, just some:

@ ("atop") is strong linkage, sort of functional composition
<. (monadic) is floor()
>. (monadic) is ceil()

(<.,>.) is building a list of the floor and the ceiling of its single argument, the comma being the concatenation operator here, comparable to Backus' "construction" or Joy's cleave. The pattern

a ` b @. c

is a kind of conditional in J, which could in Tcl be written

if {[$c $x]} {$a $x} else {$b $x}

but my variant of the median algorithm doesn't need a conditional — for lists of odd length it just uses the central index twice, which is idempotent for "mean", even if a tad slower.

J's "from" operator { takes zero or more elements from a list, possibly repeatedly. For porting this, lmap is a good helper, even though not strictly functional:

proc lmap {_v list body} {
   upvar 1 $_v v
   set res {}
   foreach v $list {lappend res [uplevel 1 $body]}
   set res
}
e.g. {lmap i {1 2 3 4} {* $i $i}} -> {1 4 9 16}

#-- So here's my 'from':
proc from {indices list} {lmap i $indices {lindex $list $i}}
          e.g. {from {1 0 0 2} {a b c}} -> {b a a c}

We furtheron borrow some more content from expr:

func ceil  x {int(ceil($x))}
func floor x {int(floor($x))}
   e.g. {ceil 1.5}  -> 2
   e.g. {floor 1.5} -> 1
   e.g. {fork list floor ceil 1.5} -> {1 2}

We'll need functional composition, and here's a recursive de-luxe version that takes zero or more functions, hence the name o*:

func o* {functions x} {
   $functions eq []? $x
   : [[head $functions] [o* [tail $functions] $x]]
}
e.g. {o* {} hello,world} -> hello,world

Evidently, identity, which could be written as

proc I x {set x}

is the neutral element of variadic functional composition, when called with no functions at all.

If composite functions like 'fork' are arguments to o*, we'd better let unknown know that we want auto-expansion of the first word:

proc know what {proc unknown args $what\n[info body unknown]}
know {
   set cmd [head $args]
   if {[llength $cmd]>1} {return [eval $cmd [tail $args]]}
}

Also, we need a numeric sort that's good for integers as well as reals ("Def" serves for all kinds of aliases, not just combinations of functions):

Def sort = lsort -real
         e.g. {sort {2.718 10 1}} -> {1 2.718 10}
         e.g. {lsort {2.718 10 1}} -> {1 10 2.718} ;# lexicographic

#-- And now for the median test:
Def median = o* {mean {fork from center sort}}
Def center = o* {{fork list floor ceil} {* 0.5} -1 llength}

func -1 x {$x - 1}
        e.g. {-1 5} -> 4 ;# predecessor function, at least for integers

#-- Trying the whole thing out:
e.g. {median {1 2 3 4 5}} -> 3
e.g. {median {1 2 3 4}}   -> 2.5

As this file gets tacitly sourced, I am pretty confident that I've reached my goal for this weekend — even though my median doesn't remotely look like the J version: it is as "wordy" as Tcl usually is. But the admittedly still very trivial challenge was met in truly function-level style, concerning the definitions of median, center and mean — no variable left behind. And that is one, and not the worst, Tcl way of Tacit programming...

Vector arithmetics

edit

APL and J (see Tacit programming) have the feature that arithmetics can be done with vectors and arrays as well as scalar numbers, in the varieties (for any operator @):

  • scalar @ scalar → scalar (like expr does)
  • vector @ scalar → vector
  • scalar @ vector → vector
  • vector @ vector → vector (all of same dimensions, element-wise)

Here are some experiments on how to do this in Tcl. First, lmap is a collecting foreach — it maps the specified body over a list:

proc lmap {_var list body} {
    upvar 1 $_var var
    set res {}
    foreach var $list {lappend res [uplevel 1 $body]}
    set res
}

#-- We need basic scalar operators from expr factored out:
foreach op {+ - * / % ==} {proc $op {a b} "expr {\$a $op \$b}"}

The following generic wrapper takes one binary operator (could be any suitable function) and two arguments, which may be scalars, vectors, or even matrices (lists of lists), as it recurses as often as needed. Note that as my lmap above only takes one list, the two-list case had to be made explicit with foreach.

proc vec {op a b} {
    if {[llength $a] == 1 && [llength $b] == 1} {
        $op $a $b
    } elseif {[llength $a]==1} {
        lmap i $b {vec $op $a $i}
    } elseif {[llength $b]==1} {
        lmap i $a {vec $op $i $b}
    } elseif {[llength $a] == [llength $b]} {
        set res {}
        foreach i $a j $b {lappend res [vec $op $i $j]}
        set res
    } else {error "length mismatch [llength $a] != [llength $b]"}
}

Tests are done with this minimal "framework":

proc e.g. {cmd -> expected} {
    catch $cmd res
    if {$res ne $expected} {puts "$cmd -> $res, not $expected"}
}

Scalar + Scalar

e.g. {vec + 1 2} -> 3

Scalar + Vector

e.g. {vec + 1 {1 2 3 4}} -> {2 3 4 5}

Vector / Scalar

e.g. {vec / {1 2 3 4} 2.} -> {0.5 1.0 1.5 2.0}

Vector + Vector

e.g. {vec + {1 2 3} {4 5 6}} -> {5 7 9}

Matrix * Scalar

e.g. {vec * {{1 2 3} {4 5 6}} 2} -> {{2 4 6} {8 10 12}}

Multiplying a 3x3 matrix element-wise with another (note that this is not the usual matrix product):

e.g. {vec * {{1 2 3} {4 5 6} {7 8 9}} {{1 0 0} {0 1 0} {0 0 1}}} -> \
 {{1 0 0} {0 5 0} {0 0 9}}

The dot product of two vectors is a scalar. That's easily had too, given a sum function:

proc sum list {expr [join $list +]+0}
sum [vec * {1 2} {3 4}]

should result in 11 (= (1*3)+(2*4)).

Here's a little application for this: a vector factorizer, that produces the list of divisors for a given integer. For this we again need a 1-based integer range generator:

proc iota1 x {
    set res {}
    for {set i 1} {$i<=$x} {incr i} {lappend res $i}
    set res
}
e.g. {iota1 7}           -> {1 2 3 4 5 6 7}

#-- We can compute the modulo of a number by its index vector:
e.g. {vec % 7 [iota1 7]} -> {0 1 1 3 2 1 0}

#-- and turn all elements where the remainder is 0 to 1, else 0:
e.g. {vec == 0 [vec % 7 [iota1 7]]} -> {1 0 0 0 0 0 1}

At this point, a number is prime if the sum of the latest vector is 2. But we can also multiply out the 1s with the divisors from the index vector:

e.g. {vec * [iota1 7] [vec == 0 [vec % 7 [iota1 7]]]} -> {1 0 0 0 0 0 7}

#-- Hence, 7 is only divisible by 1 and itself, hence it is a prime.
e.g. {vec * [iota1 6] [vec == 0 [vec % 6 [iota1 6]]]} -> {1 2 3 0 0 6}

So 6 is divisible by 2 and 3; the non-zero elements in (lrange $divisors 1 end-1) give the "proper" divisors. And three nested calls to vec are sufficient to produce the divisors list :)

Just for comparison, here's how it looks in J:

   iota1=.>:@i.
   iota1 7
1 2 3 4 5 6 7
   f3=.iota1*(0&=@|~iota1)
   f3 7
1 0 0 0 0 0 7
   f3 6
1 2 3 0 0 6

Integers as Boolean functions

edit

Boolean functions, in which arguments and result are in the domain {true, false}, or {1, 0} as expr has it, and operators are e.g. {AND, OR, NOT} resp. {&&, ||, !}, can be represented by their truth table, which for example for {$a && $b} looks like:

a b  a&&b
0 0  0
1 0  0
0 1  0
1 1  1

As all but the last column just enumerate all possible combinations of the arguments, first column least-significant, the full representation of a&&b is the last column, a sequence of 0s and 1s which can be seen as a binary integer, reading from bottom up: 1 0 0 0 == 8. So 8 is the associated integer of a&&b, but not only of this — we get the same integer for !(!a || !b), but then again, these functions are equivalent.

To try this in Tcl, here's a truth table generator that I borrowed from a little proving engine, but without the lsort used there — the order of cases delivered makes best sense when the first bit is least significant:

proc truthtable n {
   # make a list of 2**n lists, each with n truth values 0|1
   set res {}
   for {set i 0} {$i < (1<<$n)} {incr i} {
       set case {}
       for {set j  0} {$j <$n} {incr j } {
           lappend case [expr {($i & (1<<$j)) != 0}]
       }
       lappend res $case
   }
   set res
}
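
For two variables, this should enumerate the four cases, first bit least significant:

% truthtable 2
{0 0} {1 0} {0 1} {1 1}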

Now we can write n(f), which, given a Boolean function of one or more arguments, returns its characteristic number, by iterating over all cases in the truth table, and setting a bit where appropriate:

proc n(f) expression {
   set vars [lsort -unique [regsub -all {[^a-zA-Z]} $expression " "]]
   set res 0
   set bit 1
   foreach case [truthtable [llength $vars]] {
       foreach $vars $case break
       set res [expr $res | ((($expression)!=0)*$bit)]
       incr bit $bit ;#-- <<1, or *2
   }
   set res
}

Experimenting:

% n(f) {$a && !$a} ;#-- contradiction is always false
0
% n(f) {$a || !$a} ;#-- tautology is always true
3
% n(f) {$a}        ;#-- identity is boring
2
% n(f) {!$a}       ;#-- NOT
1
% n(f) {$a && $b}  ;#-- AND
8
% n(f) {$a || $b}  ;#-- OR
14
% n(f) {!($a && $b)} ;#-- de Morgan's laws:
7
% n(f) {!$a || !$b}  ;#-- same value = equivalent
7

So the characteristic integer is not the same as the Goedel number of a function, which would encode the structure of operators used there.

% n(f) {!($a || $b)} ;#-- interesting: same as unary NOT
1
% n(f) {!$a && !$b}
1

Getting more daring, let's try a distributive law:

% n(f) {$p && ($q || $r)}
168
% n(f) {($p && $q) || ($p && $r)}
168

Daring more: what if we postulate the equivalence?

% n(f) {(($p && $q) || ($p && $r)) == ($p && ($q || $r))}
255

Without proof, I just claim that every function of n arguments whose characteristic integer is 2^(2^n) - 1 is a tautology (or a true statement — all bits are 1). Conversely, postulating non-equivalence turns out to be false in all cases, hence a contradiction:

% n(f) {(($p && $q) || ($p && $r)) != ($p && ($q || $r))}
0

So again, we have a little proving engine, and simpler than last time.

In the opposite direction, we can call a Boolean function by its number and provide one or more arguments — if we give more than the function can make sense of, non-false excess arguments lead to constant falsity, as the integer can be considered zero-extended:

proc f(n) {n args} {
   set row 0
   set bit 1
   foreach arg $args {
       set row [expr {$row | ($arg != 0)*$bit}]
       incr bit $bit
   }
   expr !!($n &(1<<$row))
}

Trying again, starting at OR (14):

% f(n) 14 0 0
0
% f(n) 14 0 1
1
% f(n) 14 1 0
1
% f(n) 14 1 1
1

So f(n) 14 indeed behaves like the OR function — little surprise, as its truth table (the results of the four calls), read bottom-up, 1110, is decimal 14 (8 + 4 + 2). Another test, inequality:

% n(f) {$a != $b}
6
% f(n) 6 0 0
0
% f(n) 6 0 1
1
% f(n) 6 1 0
1
% f(n) 6 1 1
0

Trying to call 14 (OR) with more than two args:

% f(n) 14 0 0 1
0
% f(n) 14 0 1 1
0
% f(n) 14 1 1 1
0

The constant 0 result is a subtle indication that we did something wrong :)

Implication (if a then b, a -> b) can in expr be expressed as $a <= $b — just note that the "arrow" seems to point the wrong way. Let's try to prove "Modus Barbara" — "if a implies b and b implies c, then a implies c":

% n(f) {(($a <= $b) && ($b <= $c)) <= ($a <= $c)}
255

With less abstract variable names, one might as well write

% n(f) {(($Socrates <= $human) && ($human <= $mortal)) <= ($Socrates <= $mortal)}
255

But this has been verified long ago, by Socrates' death :^)

Let unknown know

edit

To extend Tcl, i.e. to make it understand and do things that before raised an error, the easiest way is to write a proc. Any proc must however be called in compliance with Tcl's fundamental syntax: first word is the command name, then the arguments separated by whitespace. Deeper changes are possible with the unknown command, which is called if a command name is, well, unknown, and in the standard version tries to call executables, to auto-load scripts, or do other helpful things (see the file init.tcl). One could edit that file (not recommended), or rename unknown to something else and provide one's own unknown handler, that falls through to the original proc if unsuccessful, as shown in Radical language modification.

Here is a simpler way that allows one to extend unknown "in place" and incrementally: we let unknown "know" what action it shall take under what conditions. The know command is called with a script, typically an if whose condition tests the failing command and whose body handles it and returns a result. If the condition does not fire, or the body does not return, the call falls through to the previously known rules and finally to the original unknown handler. In that script you may use the variable args, which holds the problem command unknown was invoked with.

proc know what {
   if ![info complete $what] {error "incomplete command(s) $what"}
   proc unknown args $what\n[info body unknown]
} ;# RS

The extending code what is prepended to the previous unknown body. This means that subsequent calls to know stack up, last condition being tried first, so if you have several conditions that fire on the same input, let them be "known" from generic to specific.

Here's a little debugging helper, to find out why "know" conditions don't fire:

proc know? {} {puts [string range [info body unknown] 0 511]}

Now testing what new magic this handful of code allows us to do. This simple example invokes expr if the "command" is digestible for it:

% know {if {![catch {expr $args} res]} {return $res}}
% 3+4
7
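
As a further, purely illustrative rule (the .. range notation is made up here, not a Tcl feature), we can teach unknown to expand integer ranges; since each know rule is tried before the previously added ones and falls through when its condition does not match, it coexists peacefully with the expr rule above:

know {
   if {[regexp {^(-?[0-9]+)\.\.(-?[0-9]+)$} [lindex $args 0] -> from to]} {
       set res {}
       for {set i $from} {$i <= $to} {incr i} {lappend res $i}
       return $res
   }
}

% 3..7
3 4 5 6 7
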

If we had no if

edit

Imagine the makers of Tcl had failed to provide the if command. All the rest would be there. Taking more steps towards functional programming, I came upon this interesting problem, and will shortly demonstrate that it can easily be solved in pure Tcl.

We still have the canonical truth values 0 and 1 as returned from expr with a comparison operator. The idea in the paper I read is to use them as names of very simple functions:

proc 0 {then else} {uplevel 1 $else}
proc 1 {then else} {uplevel 1 $then} ;# the famous K combinator

Glory be to the 11 rules of the Tcl man page that this is already a crude though sufficient reimplementation:

set x 42
[expr $x<100] {puts Yes} {puts No}

The bracketed expr command is evaluated first, returning 0 or 1 as result of the comparison. This result (0 or 1) is substituted for the first word of this command. The other words (arguments) are not substituted because they're curly-braced, so either 0 or 1 is invoked, and does its simple job. (I used uplevel instead of eval to keep all side effects in caller's scope). Formally, what happened to the bracketed call is that it went through "applicative order" evaluation (i.e., do it now), while the braced commands wait for "normal order" evaluation (i.e., do when needed, maybe never — the need is expressed through eval/upvar or similar commands).

Though slick at first sight, we actually have to type more. As a second step, we create the If command that wraps the expr invocation:

proc If {cond then else} {
   [uplevel 1 [list expr ($cond)!=0]] {uplevel 1 $then} {uplevel 1 $else}
}
If {$x>40} {puts Indeed} {puts "Not at all"}

This again passes impromptu tests, and adds the feature that any non-zero value counts as true and returns 1 — if we neglect the other syntactic options of if, especially the elseif chaining. However, this is no fundamental problem — consider that

if A then B elseif C then D else E

can be rewritten as

if A then B else {if C then D else E}

so the two-way If is about as mighty as the real thing, give or take a few braces and redundant keywords (then, else).

Luckily we have an if in Tcl (and it certainly fares better in byte-code compilation), but on leisurely evenings it's not the microseconds that count (for me at least) — it's rather reading on the most surprising (or fundamental) ideas, and demonstrating how easily Tcl can bring them to life...

Brute force meets Goedel

edit

Never afraid of anything (as long as everything is a string), a discussion in the Tcl chatroom brought me to try the following: let the computer write ("discover") its own software, only given specifications of input and output. In truly brute force, up to half a million programs are automatically written and (a suitable subset of them) tested to find the one that passes the tests.

To make things easier, this flavor of "software" is in a very simple RPN language similar to, but much smaller than, the one presented in Playing bytecode: stack-oriented like Forth, each operation being one byte (ASCII char) wide, so we don't even need whitespace in between. Arguments are pushed on the stack, and the result of the "software", the stack at end, is returned. For example, in

ebc ++ {1 2 3}

execution of the script "++" should sum its three arguments (1+(2+3)), and return 6.

Here's the "bytecode engine" (ebc: execute byte code), which retrieves the implementations of bytecodes from the global array cmd:

proc ebc {code argl} {
   set ::S $argl
   foreach opcode [split $code ""] {
       eval $::cmd($opcode)
   }
   set ::S
}

Let's now populate the bytecode collection. The set of all defined bytecodes will be the alphabet of this little RPN language. It may be interesting to note that this language has truly minimal syntax — the only rule is: each script ("word") composed of any number of bytecodes is well-formed. It just remains to check whether it does what we want.

Binary expr operators can be treated generically:

foreach op {+ - * /} {
   set cmd($op) [string map "@ $op" {swap; push [expr {[pop] @ [pop]}]}]
}

#-- And here's some more hand-crafted bytecode implementations
set cmd(d) {push [lindex $::S end]} ;# dup
set cmd(q) {push [expr {sqrt([pop])}]}
set cmd(^) {push [swap; expr {pow([pop],[pop])}]}
set cmd(s) swap

#-- The stack routines imply a global stack ::S, for simplicity
interp alias {} push {} lappend ::S
proc pop {}  {K [lindex $::S end] [set ::S [lrange $::S 0 end-1]]}
proc K {a b} {set a}
proc swap {} {push [pop] [pop]}
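
With these definitions in place, the bytecode engine can already be exercised interactively (assuming the code above has been sourced); the second program will reappear below as a discovered successor function:

% ebc ++ {1 2 3}
6
% ebc dd/+ {5}
6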

Instead of enumerating all possible bytecode combinations beforehand (which grows exponentially by alphabet and word length), I use this code from Mapping words to integers to step over their sequence, uniquely indexed by an increasing integer. This is something like the Goedel number of the corresponding code. Note that with this mapping, all valid programs (bytecode sequences) correspond to one unique non-negative integer, and longer programs have higher integers associated:

proc int2word {int alphabet} {
   set word ""
   set la [llength $alphabet]
   while {$int > 0} {
       incr int -1
       set word  [lindex $alphabet [expr {$int % $la}]]$word
       set int   [expr {$int/$la}]
   }
   set word
}
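
To see the mapping in action, here are a few calls over a small example alphabet {a b c} (the alphabet is just for illustration):

% int2word 1 {a b c}
a
% int2word 4 {a b c}
aa
% int2word 5 {a b c}
ab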

Now out for discovery! The toplevel proc takes a paired list of inputs and expected output. It tries in brute force all programs up to the specified maximum Goedel number and returns the first one that complies with all tests:

proc discover0 args {
   set alphabet [lsort [array names ::cmd]]
   for {set i 1} {$i<10000} {incr i} {
       set code [int2word $i $alphabet]
       set failed 0
       foreach {inputs output} $args {
           catch {ebc $code $inputs} res
           if {$res != $output} {incr failed; break}
       }
       if {!$failed} {return $code}
   }
}

But iterating over many words is still pretty slow, at least on my 200 MHz box, and many useless "programs" are tried. For instance, if the test has two inputs and wants one output, the stack balance is -1 (one less out than in). This is provided e.g. by one of the binary operators +-*/. But the program "dd" (which just duplicates the top of stack twice) has a stack balance of +2, and hence can never pass the example test. So, on a morning dogwalk, I thought out this strategy:

  • measure the stack balance for each bytecode
  • iterate once over very many possible programs, computing their stack balance
  • partition them (put into distinct subsets) by stack balance
  • perform each 'discovery' call only on programs of matching stack balance

Here's this version. Single bytecodes are executed, only to measure their effect on the stack. The balance of longer programs can be computed by just adding the balances of their individual bytecodes:

proc bc'stack'balance bc {
   set stack {1 2} ;# a bytecode will consume at most two elements
   expr {[llength [ebc $bc $stack]]-[llength $stack]}
}
proc stack'balance code {
   set res 0
   foreach bc [split $code ""] {incr res $::balance($bc)}
   set res
}
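
A quick check of the per-bytecode balances (assuming the bytecode table from above is loaded): "d" pushes one extra element, "+" consumes two and pushes one:

% bc'stack'balance d
1
% bc'stack'balance +
-1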

The partitioning will run for some seconds (depending on nmax — I tried with several ten thousand), but it's needed only once. The size of the partitions is further reduced by excluding programs which contain redundant code that has no effect, like swapping the stack twice, or swapping before an addition or multiplication. A program without such extravaganzas is shorter and yet does the same job, so it will have been tested earlier anyway.

proc partition'programs nmax {
   global cmd partitions balance
   #-- make a table of bytecode stack balances
   set alphabet [array names cmd]
   foreach bc $alphabet {
       set balance($bc) [bc'stack'balance $bc]
   }
   array unset partitions ;# for repeated sourcing
   for {set i 1} {$i<=$nmax} {incr i} {
       set program [int2word $i $alphabet]
       #-- "peephole optimizer" - suppress code with redundancies
       set ok 1
       foreach sequence {ss s+ s*} {
           if {[string first $sequence $program]>=0} {set ok 0}
       }
       if {$ok} {
           lappend partitions([stack'balance $program]) $program
       }
   }
   set program ;# see how far we got
}

The discoverer, Second Edition, determines the stack balance of the first test case, and tests only those programs of the matching partition:

proc discover args {
   global partitions
   foreach {in out} $args break
   set balance [expr {[llength $out]-[llength $in]}]
   foreach code $partitions($balance) {
       set failed 0
       foreach {input output} $args {
           catch {ebc $code $input} res
           if {$res != $output} {incr failed; break}
       }
       if {!$failed} {return $code}
   }
}

But now for the trying. The partitioning helps very much in reducing the number of candidates. For the 1000 programs with Goedel numbers 1..1000, it retains only a fraction for each stack balance:

-2: 75 
-1: 155 (this and 0 will be the most frequently used) 
0: 241 
1: 274 
2: 155 
3: 100

Simple starter — discover the successor function (add one):

% discover 5 6  7 8
dd/+

Not bad: duplicate the number twice, divide by itself to get the constant 1, and add that to the original number. However, it fails to work if we add the successor of 0 as another test case:

% discover 5 6  7 8  0 1

Nothing coming — because zero division made the last test fail. If we give only this test, another solution is found:

% discover 0 1
d^

"Take x to the x-th" power" — pow(0,0) gives indeed 1, but that's not the generic successor function.

More experiments to discover the hypot() function:

% discover {4 3} 5
d/+

Hm — the 3 is duplicated, divided by itself (=1), which is added to 4. Try to swap the inputs:

% discover {3 4} 5
q+

Another dirty trick: get square root of 4, add to 3 — presto, 5. The correct hypot() function would be

d*sd*+q

but my program set (nmax=30000) ends at 5-byte codes, so even by giving another test to force discovery of the real thing, it would never reach a 7-byte code. OK, I bite the bullet, set nmax to 500000, wait 5 minutes for the partitioning, and then:

% discover {3 4} 5  {11 60}  61
sd/+

Hm.. cheap trick again — it was discovered that the solution is just the successor of the second argument. Like in real life, test cases have to be carefully chosen. So I tried with another a^2+b^2=c^2 set, and EUREKA! (after 286 seconds):

% discover {3 4} 5  {8 15} 17
d*sd*+q

After partitioning, 54005 programs had the -1 stack balance, and the correct result was on position 48393 in that list...

And finally, with the half-million set of programs, here's a solution for the successor function too:

% discover  0 1  4711 4712
ddd-^+

"d-" subtracts top of stack from itself, pushing 0; the second duplicate to the 0-th power gives 1, which is added to the original argument. After some head-scratching, I find it plausible, and possibly it is even the simplest possible solution, given the poorness of this RPN language.

Lessons learned:

  • Brute force is simple, but may demand very much patience (or faster hardware)
  • The sky, not the skull, is the limit to what we can do with Tcl :)

Object orientation

edit

OO (Object Orientation) is a style in programming languages popular since Smalltalk, and especially C++, Java, etc. For Tcl, there have been several OO extensions/frameworks (incr Tcl, XOTcl, stooop, Snit to name a few) in different flavors, but none can be considered a standard followed by a majority of users. However, most of these share the following features:

  • classes can be defined, with variables and methods
  • objects are created as instances of a class
  • objects are called with messages to perform a method

Of course, there are some who say: "Advocating object-orientated programming is like advocating pants-oriented clothing: it covers your behind, but often doesn't fit best" ...

Bare-bones OO

edit

Quite a bunch of what is called OO can be done in pure Tcl without a "framework", only that the code might look clumsy and distracting. Just choose how to implement instance variables:

  • in global variables or namespaces
  • or just as parts of a transparent value, with TOOT

The task of frameworks, be they written in Tcl or C, is just to hide away the gory details of the implementation — in other words, to sugar it :) On the other hand, one understands a clockwork best when it's outside the clock, and all parts are visible — so to get a good understanding of OO, it might be most instructive to look at a simple implementation.

As an example, here's a Stack class with push and pop methods, and an instance variable s — a list that holds the stack's contents:

namespace eval Stack {set n 0}

proc Stack::Stack {} { #-- constructor
  variable n
  set instance [namespace current]::[incr n]
  namespace eval $instance {variable s {}}
  interp alias {} $instance {} ::Stack::do $instance
}

The interp alias makes sure that calling the object's name, like

::Stack::1 push hello

is understood and rerouted as a call to the dispatcher below:

::Stack::do ::Stack::1 push hello

The dispatcher imports the object's variables (only s here) into local scope, and then switches on the method name:

proc Stack::do {self method args} { #-- Dispatcher with methods
  upvar #0 ${self}::s s
  switch -- $method {
      push {eval lappend s $args}
      pop  {
          if ![llength $s] {error "stack underflow"}
          K [lindex $s end] [set s [lrange $s 0 end-1]]
      }
      default {error "unknown method $method"}
  }
}
proc K {a b} {set a}

A framework would just have to make sure that the above code is functionally equivalent to, e.g. (in a fantasy OO style):

class Stack {
   variable s {}
   method push args {eval lappend s $args}
   method pop {} {
          if ![llength $s] {error "stack underflow"}
          K [lindex $s end] [set s [lrange $s 0 end-1]]
   }
}

which, I admit, reads definitely better. But bare-bones has its advantages too: in order to see how a clockwork works, you'd better have all parts visible :)

Now testing in an interactive tclsh:

% set s [Stack::Stack] ;#-- constructor
::Stack::1             ;#-- returns the generated instance name

% $s push hello
hello
% $s push world
hello world

% $s pop
world
% $s pop
hello
% $s pop
stack underflow       ;#-- clear enough error message

% namespace delete $s ;#-- "destructor"

TOOT: transparent OO for Tcl

edit

Transparent OO for Tcl, or TOOT for short, is an amazing combination of Tcl's concept of transparent values and the power of OO concepts. In TOOT, the values of objects are represented as a list of length 3: the class name (so much for "runtime type information" :-), a "|" as separator and indicator, and the values of the object, e.g.

{class | {values of the object}}

Here's my little take on toot in a nutshell. Classes in C++ started out as structs, so I take a minimal struct as example, with generic get and set methods. We will export the get and set methods:

namespace eval toot {namespace export get set}

proc toot::struct {name members} {
   namespace eval $name {namespace import -force ::toot::*}
   #-- membership information is kept in an alias:
   interp alias {} ${name}::@ {} lsearch $members
}

The two generic accessor functions will be inherited by "struct"s

proc toot::get {class value member} {
   lindex $value [${class}::@ $member]
}

The set method does not change the instance (it couldn't, as it sees it only "by value") — it just returns the new composite toot object, for the caller to do with it what he wants:

proc toot::set {class value member newval} {
   ::set pos [${class}::@ $member]
   list $class | [lreplace $value $pos $pos $newval]
}

For the whole thing to work, here's a simple overloading of unknown — see "Let unknown know". It augments the current unknown code, at the top, with a handler for

{class | values} method args

patterns, which converts it to the form

::toot::(class)::(method) (class) (values) (args)

and returns the result of calling that form:

proc know what {proc unknown args $what\n[info body unknown]}

Now to use it (I admit the code is no easy reading):

know {
   set first [lindex $args 0]
   if {[llength $first]==3 && [lindex $first 1] eq "|"} {
       set class [lindex $first 0]
       return [eval ::toot::${class}::[lindex $args 1] \
           $class [list [lindex $first 2]] [lrange $args 2 end]]
   }
}

Testing: we define a "struct" named foo, with two obvious members:

toot::struct foo {bar grill}

Create an instance as pure string value:

set x {foo | {hello world}}
puts [$x get bar] ;# -> hello (value of the "bar" member)

Modify part of the foo, and assign it to another variable:

set y [$x set grill again]
puts $y ;# -> foo | {hello again}

Struct-specific methods can be just procs in the right namespace. The first and second arguments are the class (disregarded here, as the dash shows) and the value, the rest is up to the coder. This silly example demonstrates member access and some string manipulation:

proc toot::foo::upcase {- values which string} {
   string toupper [lindex $values [@ $which]]$string
}

puts [$y upcase grill !] ;# -> AGAIN!

A little deterministic Turing machine

edit

At university, I never learned much about Turing machines. Only decades later, a hint in the Tcl chatroom pointed me to http://csc.smsu.edu/~shade/333/project.txt , an assignment to implement a Deterministic Turing Machine (i.e. one with at most one rule per state and input character), which gives clear instructions and two test cases for input and output, so I decided to try my hand in Tcl.

Rules in this little challenge are of the form a bcD e, where

  • a is the state in which they can be applied
  • b is the character that must be read from tape if this rule is to apply
  • c is the character to write to the tape
  • D is the direction to move the tape after writing (R(ight) or L(eft))
  • e is the state to transition to after the rule was applied

Here's my naive implementation, which takes the tape just as the string it initially is. I only had to take care that when moving beyond its ends, I had to attach a space (written as _) on that end, and adjust the position pointer when at the beginning. Rules are also taken as strings, whose parts can easily be extracted with string index — as it's used so often here, I alias it to @.

proc dtm {rules tape} {
   set state 1
   set pos 0
   while 1 {
       set char [@ $tape $pos]
       foreach rule $rules {
           if {[@ $rule 0] eq $state && [@ $rule 2] eq $char} {
               #puts rule:$rule,tape:$tape,pos:$pos,char:$char
               #-- Rewrite tape at head position.
               set tape [string replace $tape $pos $pos [@ $rule 3]]
               #-- Move tape Left or Right as specified in rule.
               incr pos [expr {[@ $rule 4] eq "L"? -1: 1}]
               if {$pos == -1} {
                   set pos 0
                   set tape _$tape
               } elseif {$pos == [string length $tape]} {
                   append tape _
               }
               set state [@ $rule 6]
               break
           }
       }
       if {$state == 0} break
   }
   #-- Highlight the head position on the tape.
   string trim [string replace $tape $pos $pos \[[@ $tape $pos]\]] _
}

interp alias {} @ {} string index

Test data from http://csc.smsu.edu/~shade/333/project.txt

set rules {
   {1 00R 1}
   {2 01L 0}
   {1 __L 2}
   {2 10L 2}
   {2 _1L 0}
   {1 11R 1}
}
set tapes {
   0
   10011
   1111
}
set rules2 {
   {3 _1L 2}
   {1 _1R 2}
   {1 11L 3}
   {2 11R 2}
   {3 11R 0}
   {2 _1L 1}
}
set tapes2 _

Testing:

foreach tape $tapes {puts [dtm $rules $tape]}
puts *
puts [dtm $rules2 $tapes2]

reports the results as wanted in the paper, on stdout:

>tclsh turing.tcl
[_]1
1[0]100
[_]10000
*
1111[1]1

Streams

edit

Streams are a powerful concept in (not only functional) programming. In SICP chapter 3.5, streams are introduced as data structures characterized as "delayed lists", whose elements are produced and returned only on demand (deferred evaluation). This way, a stream can promise to be a potentially endless source of data, while taking only finite time to process and deliver what's really wanted. Other streams may provide a finite but very large number of elements, which would be impractical to process in one go. In Tcl, the two ways of reading a file are a good example:

  • read $fp returns the whole contents, which then can be processed;
  • while {[gets $fp line]>-1} {...} reads line by line, interleaved with processing

The second construct may be less efficient, but is robust for gigabyte-sized files. A simpler example is pipes in Unix/DOS (use TYPE for cat there):

cat foo.bar | more

where the "cat" delivers lines of the file as long as "more" will take them, and waits otherwise (after all, stdin and stdout are just streams...). Such process chains can be emulated in Tcl with the following rules:

A stream is modelled here as a procedure that returns one stream item on each call. The special item "" (the empty string) indicates that the stream is exhausted. Streams are interesting if they don't deliver the same result on every call, which requires them to maintain state between calls e.g. in static variables (here implemented with the fancy remember proc) — examples are intgen that delivers ever increasing integers, or gets $fp where the file pointer advances at each call, so potentially all lines of the file are returned over time.

A filter takes one or more streams, and possibly other arguments, and reacts like a stream too. Hence, streams can be (and typically are) nested for processing purposes. If a filter meets end-of-stream, it should return that too. Filters may be characterized as "selectors" (who may return only part of their input, like "grep") and/or "appliers" who call a command on their input and return the result. Note that on infinite streams, selectors may never return, e.g. if you want the second even prime... Streams in general should not be written in brackets (then the Tcl parser would eagerly evaluate them before evaluating the command), but braced, and stream consumers eval the stream at their discretion.

Before we start, a word of warning: maintaining state of a procedure is done with default arguments that may be rewritten. To prevent bugs from procedures whose defaults have changed, I've come up with the following simple architecture — procs with static variables are registered as "sproc"s, which remembers the initial defaults, and with a reset command you can restore the initial values for one or all sprocs:

proc sproc {name head body} {
   set ::sproc($name) $head
   proc $name $head $body
}

proc reset { {what *}} {
   foreach name [array names ::sproc $what] {
       proc $name $::sproc($name) [info body $name]
   }
}

Now let's start with a simple stream source, "cat", which as a wrapper for gets returns the lines of a file one by one until exhausted (EOF), in which case an empty string is returned (this requires that empty lines in the file, which would otherwise look like end-of-stream, are represented as a single blank):

sproc cat {filename {fp {}} } {
   if {$fp==""} {
       remember fp [set fp [open $filename]]
   }
   if {[gets $fp res]<0} {
       remember fp [close $fp] ;# which returns an empty string ;-)
   } elseif {$res==""} {set res " "} ;# not end of stream!
   set res
}

proc remember {argn value} {
   # - rewrite a proc's default arg with given value
   set procn [lindex [info level -1] 0] ;# caller's name
   set argl {}
   foreach arg [info args $procn] {
       if [info default $procn $arg default] {
           if {$arg==$argn} {set default $value}
           lappend argl [list $arg $default]
       } else {
           lappend argl $arg
       }
   }
   proc $procn $argl [info body $procn]
   set value
}
# This simple but infinite stream source produces all positive integers:
sproc intgen { {seed -1}} {remember seed [incr seed]}

# This produces all (well, very many) powers of 2:
sproc powers-of-2 { {x 0.5}} {remember x [expr $x*2]}

# A filter that reads and displays a stream until user stops it:
proc more {stream} {
   while 1 {
       set res [eval $stream]
       if {$res==""} break ;# encountered end of stream
       puts -nonewline $res; flush stdout
       if {[gets stdin]=="q"} break
   }
}

Usage example:

more {cat streams.tcl}

which crudely emulates the Unix/DOS pipe mentioned above (you'll have to hit ↵ Enter after every line, and q↵ Enter to quit..). more is the most important "end-user" of streams, especially if they are infinite. Note however that you need stdin for this implementation, which excludes wish on Windows (one might easily write a UI-more that reacts on mouse clicks, though).

A more generic filter takes a condition and a stream, and on each call returns an element of the input stream where the condition holds — if ever one comes along:

proc filter {cond stream} {
   while 1 {
       set res [eval $stream]
       if {$res=="" || [$cond $res]} break
   }
   set res
}

# Here is a sample usage with famous name:
proc grep {re stream} {
   filter [lambda [list x [list re $re]] {regexp $re $x}] $stream
}

#.... which uses the (less) famous function maker:
proc lambda {args body} {
   set name [info level 0]
   proc $name $args $body
   set name
}
# Usage example: more {grep this {cat streams.tcl}}

Friends of syntactic sugar might prefer shell style:

$ cat streams.tcl | grep this | more

and guess what, we can have that in Tcl too (and not in Scheme !-), by writing a proc, that also resets all sprocs, with the fancy name "$" (in Unix, this could be the shell prompt that you don't type, but for Tcl we always have to have the command name as first word):

proc $ args {
    reset
    set cmd {}
    foreach arg $args {
       if {$arg != "|"} {
           lappend tmp $arg
       } else {
           set cmd [expr {$cmd==""? $tmp: [lappend tmp $cmd]}]
           set tmp {}
       }
   }
   uplevel 1 [lappend tmp $cmd]
}

To prove that we haven't cheated by using exec, let's introduce a line counter filter:

sproc -n {stream {n 0}} {
   set res [eval $stream]
   if {$res!=""} {set res [remember n [incr n]]:$res}
}

This can be added to filter chains, to count lines in the original file, or only the results from grep:

$ cat streams.tcl | -n | grep this | more
$ cat streams.tcl | grep this | -n | more

We further observe that more has a similar structure to filter, so we could also rewrite it in terms of that:

proc more2 stream {
   filter [lambda x {
       puts -nonewline $x; flush stdout
       expr {[gets stdin]=="q"}
   }] $stream
}

# Here is another stream producer that returns elements from a list:
sproc streamlist {list {todo {}} {firstTime 1} } {
   if $firstTime {set todo $list; remember firstTime 0}
   remember todo [lrange $todo 1 end]
   lindex $todo 0
}

# This one repeats its list endlessly, so better use it with 'more':
sproc infinite-streamlist {list {todo {}} } {
   initially todo $list
   remember  todo [lrange $todo 1 end]
   lindex   $todo 0
}

# This is sugar for first-time assignment of static variables:
proc initially {varName value} {
   upvar 1 $varName var
   if {$var==""} {set var $value}
}

# But for a simple constant stream source, just use [subst]:
# more {subst 1} ;# will produce as many ones as you wish

# This filter collects its input (should be finite ;-) into a list:
proc collect stream {
   set res {}
   while 1 {
       set element [eval $stream]
       if {$element==""} break
       lappend res $element
   }
   set res
}
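
For a quick test of collect together with the streamlist source (in a fresh interpreter, or after a reset, since streamlist keeps static state):

% collect {streamlist {foo bar grill}}
foo bar grill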

The sort filter is unusual in that it consumes its whole (finite!) input, sorts it, and acts as a stream source on the output:

sproc sort {stream {todo {}} {firstTime 1}} {
   if $firstTime {
       set todo [lsort [collect $stream]]
       remember firstTime 0
   }
   remember todo [lrange $todo 1 end]
   lindex $todo 0
}
# $ streamlist {foo bar grill a} | sort | collect => a bar foo grill

proc apply {f stream} {$f [eval $stream]}

#... This can be plugged into a filter chain to see what's going on:
proc observe stream {apply [lambda y {puts $y; set y}] $stream}

# ... or, to get a stream of even numbers, starting from 0:
more {apply [lambda x {expr $x*2}] intgen}

Now for the example in SICP: find the second prime in the interval between 10000 and 1000000.

sproc interval {from to {current {}} } {
   initially current $from
   if {$current<=$to} {
       remember current [expr $current+1]
   }
}
proc prime? x {
   if {$x<2} {return 0}
   set max [expr sqrt($x)]
   set try 2
   while {$try<=$max} {
       if {$x%$try == 0} {return 0}
       incr try [expr {2-($try==2)}]
   }
   return 1
}
proc stream-index {stream index} {
   for {set i 0} {$i<=$index} {incr i} {
       set res [eval $stream]
   }
   set res
}
sproc stream-range {stream from to {pos 0}} {
   while {$pos<$from} {
       set res [eval $stream] ;# ignore elements before 'from'
       if {$res==""} return   ;# might be end-of-stream
       incr pos
   }
   if {$to!="end" && $pos > $to} return
   remember pos [incr pos]
   eval $stream
}

stream-index {filter prime? {interval 10000 1000000}} 1 ==> 10009
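
stream-range can be tried the same way; here it picks elements 3 through 6 from the integer generator (reset first, so the sprocs start from their initial state):

% reset
% collect {stream-range intgen 3 6}
3 4 5 6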

Another idea from SICP is a "smoothing" function, that averages each pair of values from the input stream. For this we need to introduce a short-term memory also in the filter:

sproc average {stream {previous {}} } {
   if {$previous==""} {set previous [eval $stream]}
   remember previous [set current [eval $stream]]
   if {$current!=""} {expr {($previous+$current)/2.}}
}

which, tested on a n-element stream, returns n-1 averages:

collect {average {streamlist {1 2 3 4 5}}} ==> 1.5 2.5 3.5 4.5

Yet another challenge was to produce an infinite stream of pairs {i j} of positive integers, i <= j, ordered by their sum, so that more pairs produces consecutively

{1 1} {1 2} {1 3} {2 2} {1 4} {2 3} {1 5} {2 4} {3 3} {1 6} ...

Here's my solution which does that:

sproc pairs { {last {}} } {
   if {$last==""} {
       set last [list 1 1] ;# start of iteration
   } else {
       foreach {a b} $last break
       if {$a >= $b-1} {
           set last [list 1 [expr {$a+$b}]] ;# next sum level
       } else {
           set last [list [incr a] [incr b -1]]
       }
   }
   remember last $last
}

Ramanujan numbers: The pairs generator can be used to find Ramanujan numbers, which can be represented as the sum of two integer cubes in more than one way. Here I use a global array for recording results:

sproc Ramanujan {stream {firstTime 1}} {
   if $firstTime {unset -nocomplain ::A; remember firstTime 0}
   while 1 {
       set pair [eval $stream]
       foreach {a b} $pair break
       set n [expr {$a*$a*$a + $b*$b*$b}]
       if [info exists ::A($n)] {
           lappend ::A($n) $pair
           break
       } else {set ::A($n) [list $pair]}
   }
   list $n $::A($n)
}

more {Ramanujan pairs} ;# or: $ pairs | Ramanujan | more

delivers in hardly noticeable time the R. numbers 1729, 4104, 13832... Or, how's this infinite Fibonacci number generator, which on more fibo produces all the F. numbers (0,1,1,2,3,5,8,13,21...) you might want?

sproc fibo { {a ""} {b ""}} {
   if {$a==""} {
       remember a 0
   } elseif {$b==""} {
       remember b 1
   } else {
       if {$b > 1<<30} {set b [expr double($b)]}
       remember a $b
       remember b [expr $a+$b]
   }
}

Discussion: With the above code, it was possible to reproduce quite some behavior of streams as documented in SICP, not as data structures but with Tcl procs (though procs are data too, in some sense...). What's missing is the capability to randomly address parts of a stream, as is possible in Scheme (and of course their claim to do without assignment, or mutable data...). Tcl lists just don't follow LISP's CAR/CDR model (though KBK demonstrated in Tcl and LISP that this structure can be emulated, also with procs), but rather C's flat *TclObject[] style. The absence of lexical scoping also led to constructs like sproc/reset, which are a stopgap but aren't exactly elegant — but Tcl's clear line between either local or global variables allows something like closures only by rewriting default arguments as done in remember (or like in Python).

Don't take this as a fundamental critique of Tcl, though — its underlying model is far more simple and elegant than LISP's (what with "special forms", "reader macros"...), and yet powerful enough to do just about everything possible...

Playing with Laws of Form

edit

After many years, I re-read

G. Spencer-Brown, "Laws of Form". New York: E.P. Dutton 1979

which is sort of a mathematical thriller, if you will. Bertrand Russell commented that the author "has revealed a new calculus, of great power and simplicity" (somehow sounds like Tcl ;^). In a very radical simplification, a whole world is built up by two operators, juxtaposition without visible symbol (which could be likened to or) and an overbar-hook (with the meaning of not) that I can't type here — it's a horizontal stroke over zero or more operands, continued at right by a vertical stroke going down to the baseline. In these Tcl experiments, I use the empty string "" for the blank (unmarked) state and angle-brackets <> for the overbar-hook (with zero or more operands in between).

One point that was new for me is that the distinction between operators and operands is not cast in stone. Especially constants (like "true" and "false" in Boolean algebras) can be equally well expressed as neutral elements of operators, if these are considered variadic, and having zero arguments. This makes sense, even in Tcl, where one might implement them as

proc and args {
   foreach arg $args {if {![uplevel 1 expr $arg]} {return 0}}
   return 1
}

proc or args {
   foreach arg $args {if {[uplevel 1 expr $arg]} {return 1}}
   return 0
}

which, when called with no arguments, return 1 or 0, respectively. So [or] == 0 and [and] == 1. In Spencer-Brown's terms, [] (which is "", the empty string with no arguments) is false ("nil" in LISP), and [<>] is the negation of "", i.e. true. His two axioms are:

<><> == <> "to recall is to call       -- (1 || 1) == 1"
<<>> ==    "to recross is not to cross -- !!0 == 0"

and these can be implemented by a string map that is repeated as long as it makes any difference (sort of a trampoline) to simplify any expression consisting only of operators and constants (which are operators with zero arguments):

proc lf'simplify expression {
   while 1 {
       set res [string map {<><> <> <<>> ""} $expression]
       if {$res eq $expression} {return $res}
       set expression $res
   }
}

Testing:

% lf'simplify <<><>><>
<>

which maps <><> to <>, <<>> to "", and returns <> for "true".

% lf'simplify <a>a
<a>a

In the algebra introduced here, with a variable "a", no further simplification was so far possible. Let's change that — "a" can have only two values, "" or <>, so we might try to solve the expression by assuming all possible values for a, and see if they differ. If they don't, we have found a fact that isn't dependent on the variable's value, and the resulting constant is returned, otherwise the unsolved expression:

proc lf'solve {expression var} {
   set results {}
   foreach value {"" <>} {
       set res [lf'simplify [string map [list $var $value] $expression]]
       if {![in $results $res]} {lappend results $res}
       if {[llength $results] > 1} {return $expression}
   }
   set results
}

with a helper function in that reports containment of an element in a list:

proc in {list element} {expr {[lsearch -exact $list $element] >= 0}}

Testing:

% lf'solve <a>a a
<>

which means, in expr terms, {(!$a || $a) == 1}, for all values of a. In other words, a tautology. All of Boole's algebra can be expressed in this calculus:

* (1) not a       == !$a       == <a>
* (2) a or b      == $a || $b  == ab
* (3) a and b     == $a && $b  == <<a><b>>
* (4) a implies b == $a <= $b  == <a>b

We can test it with the classic "ex contradictione quodlibet" (ECQ) example — "if p and not p, then q" for any q:

% lf'solve <<p><<p>>>q p
q

So formally, q is true, whatever it is :) If this sounds overly theoretic, here's a tricky practical example in puzzle solving, Lewis Carroll's last sorites (pp. 123f.). The task is to conclude something from the following premises:

  • The only animals in this house are cats
  • Every animal is suitable for a pet, that loves to gaze at the moon
  • When I detest an animal, I avoid it
  • No animals are carnivorous, unless they prowl at night
  • No cats fail to kill mice
  • No animals ever take to me, except what are in this house
  • Kangaroos are not suitable for pets
  • None but carnivora kill mice
  • I detest animals that do not take to me
  • Animals that prowl at night always love to gaze at the moon

These are encoded to the following one-letter predicates:

a: avoided by me
c: cat
d: detested by me
h: house, in this
k: kill mice
m: moon, love to gaze at
n: night, prowl at
p: pet, suitable for
r: (kanga)roo
t: take to me
v: (carni)vorous

So the problem set can be restated, in Spencer-Brown's terms, as

<h>c <m>p <d>a <v>n <c>k <t>h <r><p> <k>v td <n>m

At first I didn't understand why all premises can just be written in a row, which amounts to an implicit "or", but it seems to work out well. As we've seen that <x>x is true for any x, we can cancel out such tautologies. For this, we reformat the expression to a list of values of type x or !x, which is in turn dumped into a local array for existence checking. And when both x and !x exist, they are removed from the expression:

proc lf'cancel expression {
   set e2 [string map {"< " ! "> " ""} [split $expression ""]]
   foreach term $e2 {if {$term ne ""} {set a($term) ""}}
   foreach var [array names a ?] {
       if [info exists a(!$var)] {
           set expression [string map [list <$var> "" $var ""] $expression]
       }
   }
   set expression
}

puts [lf'cancel {<h>c <m>p <d>a <v>n <c>k <t>h <r><p> <k>v td <n>m}]

which results in:

  • a <r>

translated back: "I avoid it, or it's not a kangaroo", or, reordered, "<r> a" which by (4) means, "All kangaroos are avoided by me".

A little IRC chat bot

edit

Here is a simple example of a "chat bot" — a program that listens on an IRC chatroom, and sometimes also says something, according to its programming. The following script

  • connects to channel #tcl on IRC
  • listens to what is said
  • if someone mentions its name (minibot), tries to parse the message and answer.
#!/usr/bin/env tclsh
set ::server irc.freenode.org
set ::chan   #tcl
set ::me     minibot
proc recv {} {
    gets $::fd line
    puts $line
    # handle PING messages from server
    if {[lindex [split $line] 0] eq "PING"} {
       send "PONG [info hostname] [lindex [split $line] 1]"; return
    }
    if {[regexp {:([^!]*)![^ ].* +PRIVMSG ([^ :]+) +(.*[Mm]inibot)(.+)} $line -> \
        nick target msg cmd]} {
           if {$nick eq "ijchain"} {regexp {<([^>]+)>(.+)} $msg -> nick msg}
           set hit 0
           foreach pattern [array names ::patterns] {
               if [string match "*$pattern*" $cmd] {
                   set cmd [string trim $cmd {.,:? }]
                   if [catch {mini eval $::patterns($pattern) $cmd} res] {
                       set res $::errorInfo
                   }
                   foreach line [split $res \n] {
                       send "PRIVMSG $::chan :$line"
                   }
                   incr hit
                   break
               }
           }
           if !$hit {send "PRIVMSG $::chan :Sorry, no idea."}
    }
}

#----------- Patterns for response:

set patterns(time) {clock format [clock sec] ;#}
set patterns(expr) safeexpr
proc safeexpr args {expr [string map {\[ ( \] ) expr ""} $args]}
set patterns(eggdrop) {set _ "Please check http://wiki.tcl.tk/6601" ;#}
set patterns(toupper) string
set patterns(Windows) {set _ "I'd prefer not to discuss Windows..." ;#}
set {patterns(translate "good" to Russian)} {set _ \u0425\u043E\u0440\u043E\u0448\u043E ;#}
set patterns(Beijing) {set _ \u5317\u4EAC ;#}
set patterns(Tokyo) {set _ \u4E1C\u4EAC ;#}
set {patterns(your Wiki page)} {set _ http://wiki.tcl.tk/20205 ;#}
set patterns(zzz) {set _ "zzz well!" ;#}
set patterns(man) safeman
proc safeman args {return http://www.tcl.tk/man/tcl8.4/TclCmd/[lindex $args 1].htm}
set {patterns(where can I read about)} gotowiki
proc gotowiki args {return "Try http://wiki.tcl.tk/[lindex $args end]"}
set patterns(thank) {set _ "You're welcome." ;#}
set patterns(worry) worry
proc worry args {
   return "Why do [string map {I you my your your my you me} $args]?"
}

#-- let the show begin... :^)
interp create -safe mini
foreach i {safeexpr safeman gotowiki worry} {
    interp alias mini $i {} $i
}
proc in {list element} {expr {[lsearch -exact $list $element]>=0}}
proc send str {puts $::fd $str;flush $::fd}

set ::fd [socket $::server 6667]
fconfigure $fd  -encoding utf-8
send "NICK minibot"
send "USER $::me 0 * :PicoIRC user"
send "JOIN $::chan"
fileevent $::fd readable recv

vwait forever

Examples from the chat:

suchenwi  minibot, which is your Wiki page?
<minibot> http://wiki.tcl.tk/20205
suchenwi  ah, thanks
suchenwi  minibot expr 6*7
<minibot> 42
suchenwi  minibot, what's your local time?
<minibot> Sun Oct 21 01:26:59 (MEZ) - Mitteleurop. Sommerzeit 2007

Tk: the cross-platform GUI toolkit

edit

Introduction

edit

The Tk (Tool Kit) is the most popular Tcl extension for designing graphical user interfaces (GUI) on Macintosh, Unix/Linux, or Windows operating systems.

With little effort, it allows you to put together useful

  • windows (forms, as some call them) consisting of
  • widgets, which are managed by
  • geometry managers. Also, you can easily define
  • bindings of mouse or keyboard events to trigger the actions you want.

Example: calculator

edit

Here is a very simple, complete Tcl/Tk script that implements a calculator:

package require Tk
pack [entry .e -textvar e -width 50]
bind .e <Return> {
   set e  [regsub { *=.*} $e ""] ;# remove evaluation (Chris)
   catch {expr [string map {/ *1./} $e]} res
   append e " = $res"
} 

It creates an entry widget named .e, into which you can type from the keyboard, with an associated variable e (which will mirror the entry's content), and manages it with pack.

One binding is defined: if, with keyboard focus on .e, the <Return> key is hit, then

  • first, the result of the previous evaluation is cleared by deleting everything from the " =" on (the regsub line),
  • all division operators (/) in the variable e are mapped to "*1./" (this forces floating point division),
  • the resulting string is fed to expr to evaluate it as an arithmetic/logic expression
  • as there may be errors in the user input, the expr call is wrapped into a catch which assigns either the result of expr, or the error message if one occurred, into the variable res
  • finally, an equal sign and the value of the res variable are appended to e, making input and result immediately visible in the entry.

Example: a tiny IRC client

edit

As before in the Tcl section, here's a working little script again: a client for IRC (Internet Relay Chat) in 38 lines of code, that features a text and an entry widget:


package require Tk
set ::server irc.freenode.org
set ::chan   #tcl
set ::me     $tcl_platform(user)
text .t -height 30 -wrap word -font {Arial 9}
.t tag config bold   -font [linsert [.t cget -font] end bold]
.t tag config italic -font [linsert [.t cget -font] end italic]
.t tag config blue   -foreground blue
entry .cmd
pack .cmd -side bottom -fill x
pack .t -fill both -expand 1
bind .cmd <Return> post
proc recv {} {
    gets $::fd line
    if {[regexp {:([^!]*)![^ ].* +PRIVMSG ([^ :]+) +:(.*)} $line -> \
        nick target msg]} {
        set tag ""
        if [regexp {\001ACTION(.+)\001} $msg -> msg] {set tag italic}
        if [in {azbridge ijchain} $nick] {regexp {<([^>]+)>(.+)} $msg -> nick msg}
       .t insert end $nick\t bold $msg\n $tag
    } else {.t insert end $line\n italic}
    .t yview end
}
proc in {list element} {expr {[lsearch -exact $list $element]>=0}}
proc post {} {
    set msg [.cmd get]
    if [regexp {^/me (.+)} $msg -> action] {set msg "\001ACTION $action\001"}
    foreach line [split $msg \n] {send "PRIVMSG $::chan :$line"}
    .cmd delete 0 end
    set tag ""
    if [regexp {\001ACTION(.+)\001} $msg -> msg] {set tag italic}
    .t insert end $::me\t {bold blue} $msg\n [list blue $tag]
    .t yview end
}
proc send str {puts $::fd $str; flush $::fd}
set ::fd [socket $::server 6667]
send "NICK $::me"
send "USER $::me 0 * :PicoIRC user"
send "JOIN $::chan"
fileevent $::fd readable recv
bind . <Escape> {exec wish $argv0 &; exit}

The last line is a quick debugging helper: if you modified the script in the editor, save to disk, then hit <Escape> in the app to start it anew.

Widgets

edit

Widgets are GUI objects that, when mapped by a geometry manager, correspond to a rectangular area on the screen, with different properties and capabilities.

Widgets are named with path names, somewhat resembling file system path names, except that the separator for widgets is a period ".". For example, .foo.bar is a child of .foo, which is a child of "." Parent-child relations occur typically with containers, e.g. toplevel, frame, canvas, text. The name parts (i.e. between the dots) can be almost any string that does not contain a "." and does not start with a capital letter (widget class names with a capital initial have a special meaning: they denote class-wide options).

If you have a proc that takes a parent widget w and adds children, it may be important to distinguish whether w is "." or another - because you can't concatenate a child widget name directly to "." - $w.kid will make an invalid name if $w == ".". Here's how to avoid this:

set w2 [expr {$w eq "."? "": $w}]
button $w2.$kid ...

Widgets are created by a command named after their class, which then takes the path name (requiring that the parent widgets already exist at that time) and any number of -key value options that make up the original configuration, e.g.

button .b -text "Hello!" -command {do something}

After creation, communication with the widget goes via a created command that corresponds to its name. For instance, the configuration options can be queried

set text [.b cget -text]

or changed

.b configure -text Goodbye! -background red

Some "methods" like cget, configure are generic to all widget classes, others are specific to one or a few. For instance, both text and entry accept the insert method. See the manual pages for all the details.

Widgets appear on screen only after they have been given to a geometry manager (see more below). Example:

text .t -wrap word
pack .t -fill both -expand 1

As widget creation commands return the pathname, these two steps can also be nested like

pack [text .t -wrap word] -fill both -expand 1

The destroy command deletes a widget and all of its child widgets, if present:

destroy .b

The following sub-chapters describe the widgets available in Tk.


button

edit

With text and/or image, calls a configurable command when clicked. Example:

button .b -text Hello! -command {puts "hello, world"}

canvas

edit

Scrollable graphic surface for line, rectangle, polygon, oval, and text items, as well as bitmaps and photo images and whole embedded windows. See for example "A tiny drawing program" below. Example:

pack [canvas .c -background white]
.c create line 50 50 100 100 150 50 -fill red -width 3
.c create text 100 50 -text Example

Pan a canvas (scroll it inside its window with middle mouse-button held down) by inheriting the text widget bindings:

 bind .c <2> [bind Text <2>]
 bind .c <B2-Motion> [bind Text <B2-Motion>]

entry

edit

One-line editable text field, horizontally scrollable (see example above). You can specify a validation function to constrain what is entered. Example:

entry .e -width 20 -textvariable myVar
set myVar "this text appears in the entry"
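
For example, keystroke validation can restrict an entry to integer input. This is a minimal sketch, where %P is substituted with the prospective new content of the entry:

pack [entry .num -width 10 -validate key -validatecommand {string is integer %P}]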


frame

edit

Container for several widgets, often used with pack, or for wrapping "megawidgets"

label

edit

One- or multiline field for text display, can be tied to a text variable to update when that variable changes. Linebreaks are specified by \n in the text to be displayed.
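
A minimal sketch (widget and variable names are made up): the label updates itself whenever the variable changes:

pack [label .status -textvariable statusVar]
set statusVar "Ready."
# later updates are reflected immediately:
set statusVar "3 files loaded"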

labelframe

edit

A container similar to a frame, decorated with a thin rectangle around, and a label at top-left position. Example (a tiny radio band selector):

pack [labelframe .lf -text Band]
pack [radiobutton .lf.am -text AM -variable band -value AM]
pack [radiobutton .lf.fm -text FM -variable band -value FM]
set band AM


listbox

edit

Multiline display of a list, scrollable. Single or multiple items can be selected with the mouse.
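
A small example (the item names are arbitrary): fill a listbox, then query it later:

pack [listbox .lb -selectmode extended]
foreach fruit {apple banana cherry date} {.lb insert end $fruit}
# .lb curselection returns the indices of the selected lines
# .lb get 0 end returns all items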

menu

edit

To add a menu to a GUI application, take steps of three kinds:

  • Create the toplevel horizontal menu (needed only once):
. configure -menu [menu .m]
  • For each item in the top menu, create a cascaded submenu, e.g.
.m add cascade -label File -menu [menu .m.mFile]
  • For each entry in a submenu, add it like this:
.m.mFile add command -label Open -command {openFile ...}
.m.mFile add separator

As these commands are a bit verbose, one can wrap them into a little helper:

proc m+ {head name {cmd ""}} {
   if {![winfo exists .m.m$head]} {
        .m add cascade -label $head -menu [menu .m.m$head -tearoff 0]
   }
   if [regexp ^-+$ $name] {
           .m.m$head add separator
   } else {.m.m$head add command -label $name -comm $cmd}
}

Demo example - now menu items can be created in a much clearer way:

. configure -menu [menu .m]
m+ File Open {.t insert end "opened\n"}
m+ File Save {.t insert end "saved\n"}
m+ File -----
m+ File Exit exit
m+ Edit Cut ...
pack [text .t -wrap word] -fill both -expand 1

radiobutton

edit

A button with a selector field which can be on or off, and a label. Clicking the selector field with the mouse changes the value of an associated global variable. Typically, multiple radiobuttons are tied to the same variable. For examples, see labelframe above.

scrollbar

edit

Horizontal or vertical scrollbars (vertical being the default; for horizontal, specify -orient horizontal, or -ori hori if you like it short) can be tied to canvas, entry, listbox or text widgets. The interaction between a scrollbar and its scrolled widget goes with callbacks, in which one notifies the other:

  • scrollbar to widget: xview or yview method
  • widget to scrollbar: set method

Arguments to these methods will be automatically added when the callbacks are invoked.

For example, here's how to connect a text widget with a vertical scrollbar:

pack [scrollbar .y -command ".t yview"] -side right -fill y
pack [text .t -wrap word -yscrollc ".y set"] \
     -side right -fill both -expand 1

With most widgets, scrollbars adjust automatically to the widget's contents. For canvas widgets, you need to update the scrollregion after adding new items, most simply like this:

$canvas configure -scrollregion [$canvas bbox all]

text

edit

Scrollable editable multiline text with many formatting options. Can also contain images and embedded widgets. The default wrapping setting is "none", so you might need a horizontal scrollbar to see all of long lines. In many cases it's more user-friendly to configure a text widget as -wrap word - then you only need a vertical scrollbar at most.

Positions in a text widget are specified as line.column, where line starts from 1, and column from 0, so 1.0 is the very first character in a text widget. Example how to delete all contents:

$t delete 1.0 end

For highlighting part of the contents, you can define tags and assign them to subsequences:

$t tag configure ul -underline 1
$t insert end "The next word is " {} underlined ul ", the rest is not."

toplevel

edit

Standalone frame window, mostly with decorations (title bar, buttons) from the window manager. When you start Tk, you receive an initially empty toplevel named "." (one dot). If you want more toplevels, create them like this:

toplevel .mySecondWindow

Such toplevels are logically children of ".". To assign a nice title to a toplevel, use

wm title $toplevel "This is the title"

You can also control the geometry (size and position) of a toplevel with

wm geometry $toplevel ${width}x$height+$x+$y
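
For example (the numbers are of course arbitrary):

wm geometry .mySecondWindow 300x200+100+80 ;# 300 wide, 200 high, near the top-left corner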

Geometry managers

edit

The purpose of geometry managers is to compute the required height, width, and location of widgets, and map them on screen. Besides grid, pack and place, the canvas and text widgets can also manage embedded widgets.

Calls to geometry managers always start with the manager's name, then (mostly) one or more widget names, then any number of -name value options

grid

edit

This geometry manager is best suited for tabular window layout consisting of rows and columns. Example, to put three widgets in a horizontal row:

grid .1 .2 .3 -sticky news

The -sticky option indicates which sides of its cell the widget should stick to, given as compass directions; "news" is north-east-west-south, i.e. all four sides.

Here's a minimal solution to display a table (list of lists) in a grid of labels:


package require Tk
proc table {w content args} {
    frame $w -bg black
    set r 0
    foreach row $content {
        set fields {}
        set c 0
        foreach col $row {
            lappend fields [label $w.$r/$c -text $col]
            incr c
        }
        eval grid $fields -sticky news -padx 1 -pady 1
        incr r
    }
    set w
}
#--- Test:
table .t {
   {Row Head1 Head2}
   {1   foo   42}
   {2   bar   1234}
   {3   grill testing}
}
pack .t
#--- Changing the contents, given row and column number:
after 2000 .t.3/2 config -text Coucou

pack

edit

This used to be the workhorse manager, but in recent years has been less popular than grid. Anyway, it is still good for cases where you have widgets aligned in only one direction (horizontally or vertically). For more complex layouts, one used to insert intermediate frames, but grid makes such jobs just easier. Example:

pack .x -fill both -expand 1 -side left

place

edit

This geometry manager is not often used, mostly for special applications, like when you want to highlight the current tab of a tab notebook. It allows pixel-precise placement of widgets, but is less dynamic in reaction to resizing of the toplevel or inner widgets.
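
A small sketch of relative placement (names are illustrative): center a button inside a frame:

pack [frame .f -width 200 -height 100]
place [button .f.b -text Centered] -in .f -relx 0.5 -rely 0.5 -anchor center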

Dialogs

edit

Dialogs are toplevels whose purpose is to show a message or ask a question. You don't have to assign a widget path name to them. Just call them as functions and evaluate the result (often "" if the dialog was canceled).

tk_getOpenFile

edit

A file selector dialog (limited to existing files). Returns the selected file with path name, or "" if canceled.
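
A typical usage pattern (the -title and -filetypes values are just examples):

set file [tk_getOpenFile -title "Select a Tcl script" \
    -filetypes {{{Tcl Scripts} {.tcl}} {{All Files} *}}]
if {$file ne ""} {
    set fp [open $file]
    puts [read $fp]
    close $fp
}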

tk_getSaveFile

edit

A file selector dialog, which also allows specification of a not existing file. Returns the selected file with path name, or "" if canceled.

tk_messageBox

edit

A simple dialog that displays a string and can be closed with an "OK" button. Example:

tk_messageBox -message "hello, world!"

tk_chooseColor

edit

Displays a dialog for color selection. The return value is the selected color in a representation Tcl understands; on Microsoft Windows systems this is typically a hexadecimal string of the form "#RRGGBB". If the dialog is canceled, the empty string is returned instead. The dialog can be given a preselected color with the -initialcolor option, a parent widget with -parent, and a title caption with -title.

 tk_chooseColor -initialcolor #FF0000 -parent . -title "What tincture do you wish?"

Custom dialogs

edit

Besides the prefabricated dialogs that come with Tk, it's also not too hard to build custom ones. As a very simple example, here's a "value dialog" that prompts the user to type in a value:


proc value_dialog {string} {
   set w [toplevel .[clock seconds]]
   wm resizable $w 0 0
   wm title $w "Value request"
   label  $w.l -text $string
   entry  $w.e -textvar $w -bg white
   bind $w.e <Return> {set done 1}
   button $w.ok     -text OK     -command {set done 1}
   button $w.c      -text Clear  -command "set $w {}"
   button $w.cancel -text Cancel -command "set $w {}; set done 1"
   grid $w.l  -    -        -sticky news
   grid $w.e  -    -        -sticky news
   grid $w.ok $w.c $w.cancel
   vwait done
   destroy $w
   set ::$w
}

Test:

set test [value_dialog "Give me a value please:"]
puts test:$test
pack [ label .l -text "Value: '$test' " ]

For a more elaborate example, here is a record editor dialog (multiple fields, each with a label and entry (or text for multi-line input)):

proc editRecord {title headers fields} {
    set oldfocus [focus]
    set w [toplevel .[clock clicks]]
    wm resizable $w 1 0
    wm title $w $title
    set n 0
    foreach h $headers f $fields {
        if ![regexp {(.+)([=+])} $h -> hdr type] {set hdr $h; set type ""}
        label $w.h$n -text $hdr -anchor ne
        switch -- $type {
            = {label $w.e$n -width [string length $f] -text $f -anchor w -bg white}
            + {[text $w.e$n -width 20 -height 6] insert end $f}
            default {[entry $w.e$n -width [string length $f]] insert end $f}
        }
        grid $w.h$n $w.e$n -sticky news
        incr n
    }
    button $w.ok -text OK -width 5 -command [list set $w 1]
    button $w.cancel -text Cancel -command [list set $w 0]
    grid $w.ok $w.cancel -pady 5
    grid columnconfigure $w 1 -weight 1
    vwait ::$w
    if [set ::$w] { #-- collect the current entry contents
        set n 0
        foreach h $headers f $fields {
            regexp {([^=+].+)([=+]?)} $h -> hdr type
            switch -- $type {
                "" {lappend res [$w.e$n get]}
                =  {lappend res [$w.e$n cget -text]}
                +  {lappend res [$w.e$n get 1.0 end]}
            }
            incr n
        }
    } else {set res {}}
    destroy $w
    unset ::$w ;#-- clean up the vwait variable
    focus $oldfocus
    return $res
}

Quick test:

editRecord Test {foo= bar grill+} {one two three}

Megawidgets made easy

edit

The term "megawidgets" is popular for compound widgets that in themselves contain other widgets, even though they will hardly number a million (what the prefix Mega- suggests), more often the child widgets' number will not exceed ten.

To create a megawidget, one needs one proc with the same signature as Tk widget creation commands. This proc will, when called, create another proc named after the widget, and dispatch methods either to specific handlers, or the generic widget command created by Tk.

A little notebook

edit

Plain Tk does not contain a "notebook" widget, with labeled tabs on top that raise one of the "pages", but it's easy to make one. This example demonstrates how the tabs are implemented as buttons in a frame, and how the original Tk command named like the frame is "overloaded" to accept the additional add and raise methods:

proc notebook {w args} {
   frame $w
   pack [frame $w.top] -side top -fill x -anchor w
   rename $w _$w
   proc $w {cmd args} { #-- overloaded frame command
       set w [lindex [info level 0] 0]
       switch -- $cmd {
           add     {notebook'add   $w $args}
           raise   {notebook'raise $w $args}
           default {eval [linsert $args 0 _$w $cmd]}
       }
   }
   return $w
}
proc notebook'add {w title} {
   set btn [button $w.top.b$title -text $title -command [list $w raise $title]]
   pack $btn -side left -ipadx 5
   set f [frame $w.f$title -relief raised -borderwidth 2]
   pack $f -fill both -expand 1
   $btn invoke
   bind $btn <3> "destroy {$btn}; destroy {$f}" ;# (1)
   return $f
}
proc notebook'raise {w title} {
   foreach i [winfo children $w.top] {$i config -borderwidth 0}
   $w.top.b$title config -borderwidth 1
   set frame $w.f$title
   foreach i [winfo children $w] {
       if {![string match *top $i] && $i ne $frame} {pack forget $i}
   }
   pack $frame -fill both -expand 1
}

Test and demo code:

package require Tk
pack [notebook .n] -fill both -expand 1
set p1 [.n add Text]
pack   [text $p1.t -wrap word] -fill both -expand 1
set p2 [.n add Canvas]
pack   [canvas $p2.c -bg yellow] -fill both -expand 1
set p3 [.n add Options]
pack   [button $p3.1 -text Console -command {console show}]
.n raise Text
wm geometry . 400x300

Binding events

Events within Tcl/Tk include actions performed by the user, such as pressing a key or clicking the mouse. To react to mouse and keyboard activity, the bind command is used. As shown in the calculator example:

 pack [entry .e -textvar e -width 50]
 bind .e <Return> {

The bind command operates on the widget .e and associates a script with its <Return> event (the user pressing the Return key). The opening brace starts the script that is executed whenever that event occurs in the widget.
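
Here is the same idea as a complete, minimal script you can run on its own (the message shown is made up for illustration):

package require Tk
pack [entry .e -textvar e -width 50]
bind .e <Return> {
    tk_messageBox -message "You typed: $e"
}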

BWidget

BWidget is an extension to Tk written in pure Tcl (therefore it can even run on Windows Mobile-driven cell phones). It offers megawidgets (all class names start with an uppercase letter) such as

  • ComboBox
  • NoteBook
  • Tree
 

Screenshot of a NoteBook and a Tree, on a PocketPC under Windows/CE
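
For example, a NoteBook with two pages can be created like this (a minimal sketch, assuming the standard BWidget NoteBook API):

package require BWidget
pack [NoteBook .nb] -fill both -expand 1
.nb insert end page1 -text "First"
.nb insert end page2 -text "Second"
pack [label [.nb getframe page1].l -text "Hello from page 1"]
pack [label [.nb getframe page2].l -text "Hello from page 2"]
.nb raise page1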

Tree examples

Here is a "hello world" example of a Tree widget (this is a complete script). The root node is constantly called root, for the others you have to make up names:

package require BWidget
pack [Tree .t]
.t insert end root n1  -text hello
.t insert end root n2  -text world
.t insert end n2   n21 -text (fr:monde)
.t insert end n2   n22 -text (de:Welt)

The famous typewriter test sentence represented as a syntax tree:

package require BWidget
pack [Tree .t -height 16] -fill both -expand 1
foreach {from to text} {
   root S S
   S   np1  NP
   S   vp   VP
   np1 det1 Det:The
   np1 ap1  AP
   ap1 adj1 Adj:quick
   ap1 adj2 Adj:brown
   ap1 n1   N:fox
   vp  v    V:jumps
   vp  pp   PP
   pp  prep Prep:over
   pp  np2  NP
   np2 det2 Det:the
   np2 ap2  AP
   ap2 adj3 Adj:lazy
   ap2 n2   N:dog
   
} {.t insert end $from $to -text $text}
.t opentree S

Tk resources

Colors

Colors in Tk can be specified in three ways:

  • a symbolic name, like: red green blue yellow magenta cyan
  • a hex string preceded by a #: #RGB, #RRGGBB, #RRRRGGGGBBBB
  • a list of three non-negative integers

The last form (a list of three integers) is only returned by commands, for example by winfo rgb. To specify a color to a command, you'll have to hex-format it. For instance, white can be written as #FFFFFF.

To turn a symbolic name into its RGB components:

winfo rgb . $colorname
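
Going the other way - from the three integers that winfo rgb returns back to a hex string a command will accept - can be done with a little helper like this (a small sketch, not part of Tk; winfo rgb delivers 16-bit values, so they are scaled down to 8 bits here):

proc name2hex {w color} {
    foreach {r g b} [winfo rgb $w $color] break
    format #%02x%02x%02x [expr {$r/256}] [expr {$g/256}] [expr {$b/256}]
}
# name2hex . red  -->  #ff0000  (on a true-color display)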

Here is the list of defined color names (as from X11's rgb.txt):

set COLORS { snow {ghost white} {white smoke} gainsboro {floral white}
   {old lace} linen {antique white} {papaya whip} {blanched almond}
   bisque {peach puff} {navajo white} moccasin cornsilk ivory {lemon
   chiffon} seashell honeydew {mint cream} azure {alice blue}
   lavender {lavender blush} {misty rose} white black {dark slate
   gray} {dim gray} {slate gray} {light slate gray} gray {light grey}
   {midnight blue} navy {cornflower blue} {dark slate blue} {slate
   blue} {medium slate blue} {light slate blue} {medium blue} {royal
   blue} blue {dodger blue} {deep sky blue} {sky blue} {light sky
   blue} {steel blue} {light steel blue} {light blue} {powder blue}
   {pale turquoise} {dark turquoise} {medium turquoise} turquoise
   cyan {light cyan} {cadet blue} {medium aquamarine} aquamarine
   {dark green} {dark olive green} {dark sea green} {sea green}
   {medium sea green} {light sea green} {pale green} {spring green}
   {lawn green} green chartreuse {medium spring green} {green yellow}
   {lime green} {yellow green} {forest green} {olive drab} {dark
   khaki} khaki {pale goldenrod} {light goldenrod yellow} {light
   yellow} yellow gold {light goldenrod} goldenrod {dark goldenrod}
   {rosy brown} {indian red} {saddle brown} sienna peru burlywood
   beige wheat {sandy brown} tan chocolate firebrick brown {dark
   salmon} salmon {light salmon} orange {dark orange} coral {light
   coral} tomato {orange red} red {hot pink} {deep pink} pink {light
   pink} {pale violet red} maroon {medium violet red} {violet red}
   magenta violet plum orchid {medium orchid} {dark orchid} {dark
   violet} {blue violet} purple {medium purple} thistle snow2 snow3
   snow4 seashell2 seashell3 seashell4 AntiqueWhite1 AntiqueWhite2
   AntiqueWhite3 AntiqueWhite4 bisque2 bisque3 bisque4 PeachPuff2
   PeachPuff3 PeachPuff4 NavajoWhite2 NavajoWhite3 NavajoWhite4
   LemonChiffon2 LemonChiffon3 LemonChiffon4 cornsilk2 cornsilk3
   cornsilk4 ivory2 ivory3 ivory4 honeydew2 honeydew3 honeydew4
   LavenderBlush2 LavenderBlush3 LavenderBlush4 MistyRose2 MistyRose3
   MistyRose4 azure2 azure3 azure4 SlateBlue1 SlateBlue2 SlateBlue3
   SlateBlue4 RoyalBlue1 RoyalBlue2 RoyalBlue3 RoyalBlue4 blue2 blue4
   DodgerBlue2 DodgerBlue3 DodgerBlue4 SteelBlue1 SteelBlue2
   SteelBlue3 SteelBlue4 DeepSkyBlue2 DeepSkyBlue3 DeepSkyBlue4
   SkyBlue1 SkyBlue2 SkyBlue3 SkyBlue4 LightSkyBlue1 LightSkyBlue2
   LightSkyBlue3 LightSkyBlue4 SlateGray1 SlateGray2 SlateGray3
   SlateGray4 LightSteelBlue1 LightSteelBlue2 LightSteelBlue3
   LightSteelBlue4 LightBlue1 LightBlue2 LightBlue3 LightBlue4
   LightCyan2 LightCyan3 LightCyan4 PaleTurquoise1 PaleTurquoise2
   PaleTurquoise3 PaleTurquoise4 CadetBlue1 CadetBlue2 CadetBlue3
   CadetBlue4 turquoise1 turquoise2 turquoise3 turquoise4 cyan2 cyan3
   cyan4 DarkSlateGray1 DarkSlateGray2 DarkSlateGray3 DarkSlateGray4
   aquamarine2 aquamarine4 DarkSeaGreen1 DarkSeaGreen2 DarkSeaGreen3
   DarkSeaGreen4 SeaGreen1 SeaGreen2 SeaGreen3 PaleGreen1 PaleGreen2
   PaleGreen3 PaleGreen4 SpringGreen2 SpringGreen3 SpringGreen4
   green2 green3 green4 chartreuse2 chartreuse3 chartreuse4
   OliveDrab1 OliveDrab2 OliveDrab4 DarkOliveGreen1 DarkOliveGreen2
   DarkOliveGreen3 DarkOliveGreen4 khaki1 khaki2 khaki3 khaki4
   LightGoldenrod1 LightGoldenrod2 LightGoldenrod3 LightGoldenrod4
   LightYellow2 LightYellow3 LightYellow4 yellow2 yellow3 yellow4
   gold2 gold3 gold4 goldenrod1 goldenrod2 goldenrod3 goldenrod4
   DarkGoldenrod1 DarkGoldenrod2 DarkGoldenrod3 DarkGoldenrod4
   RosyBrown1 RosyBrown2 RosyBrown3 RosyBrown4 IndianRed1 IndianRed2
   IndianRed3 IndianRed4 sienna1 sienna2 sienna3 sienna4 burlywood1
   burlywood2 burlywood3 burlywood4 wheat1 wheat2 wheat3 wheat4 tan1
   tan2 tan4 chocolate1 chocolate2 chocolate3 firebrick1 firebrick2
   firebrick3 firebrick4 brown1 brown2 brown3 brown4 salmon1 salmon2
   salmon3 salmon4 LightSalmon2 LightSalmon3 LightSalmon4 orange2
   orange3 orange4 DarkOrange1 DarkOrange2 DarkOrange3 DarkOrange4
   coral1 coral2 coral3 coral4 tomato2 tomato3 tomato4 OrangeRed2
   OrangeRed3 OrangeRed4 red2 red3 red4 DeepPink2 DeepPink3 DeepPink4
   HotPink1 HotPink2 HotPink3 HotPink4 pink1 pink2 pink3 pink4
   LightPink1 LightPink2 LightPink3 LightPink4 PaleVioletRed1
   PaleVioletRed2 PaleVioletRed3 PaleVioletRed4 maroon1 maroon2
   maroon3 maroon4 VioletRed1 VioletRed2 VioletRed3 VioletRed4
   magenta2 magenta3 magenta4 orchid1 orchid2 orchid3 orchid4 plum1
   plum2 plum3 plum4 MediumOrchid1 MediumOrchid2 MediumOrchid3
   MediumOrchid4 DarkOrchid1 DarkOrchid2 DarkOrchid3 DarkOrchid4
   purple1 purple2 purple3 purple4 MediumPurple1 MediumPurple2
   MediumPurple3 MediumPurple4 thistle1 thistle2 thistle3 thistle4
   gray1 gray2 gray3 gray4 gray5 gray6 gray7 gray8 gray9 gray10
   gray11 gray12 gray13 gray14 gray15 gray16 gray17 gray18 gray19
   gray20 gray21 gray22 gray23 gray24 gray25 gray26 gray27 gray28
   gray29 gray30 gray31 gray32 gray33 gray34 gray35 gray36 gray37
   gray38 gray39 gray40 gray42 gray43 gray44 gray45 gray46 gray47
   gray48 gray49 gray50 gray51 gray52 gray53 gray54 gray55 gray56
   gray57 gray58 gray59 gray60 gray61 gray62 gray63 gray64 gray65
   gray66 gray67 gray68 gray69 gray70 gray71 gray72 gray73 gray74
   gray75 gray76 gray77 gray78 gray79 gray80 gray81 gray82 gray83
   gray84 gray85 gray86 gray87 gray88 gray89 gray90 gray91 gray92
   gray93 gray94 gray95 gray97 gray98 gray99
}

In addition, the following are defined on Windows:

set WINDOWSCOLORS {
   SystemButtonFace SystemButtonText SystemDisabledText SystemHighlight
   SystemHightlightText SystemMenu SystemMenuText SystemScrollbar
   SystemWindow SystemWindowFrame SystemWindowText
}

Cursors

For every widget, you can specify how the mouse cursor should look when over it. Here's the list of defined cursor names:

set cursors {
   X_cursor arrow based_arrow_down based_arrow_up boat bogosity
   bottom_left_corner bottom_right_corner bottom_side bottom_tee
   box_spiral center_ptr circle clock coffee_mug cross cross_reverse
   crosshair diamond_cross dot dotbox double_arrow draft_large draft_small
   draped_box exchange fleur gobbler gumby hand1 hand2 heart icon
   iron_cross left_ptr left_side left_tee leftbutton ll_angle lr_angle
   man middlebutton mouse pencil pirate plus question_arrow right_ptr
   right_side right_tee rightbutton rtl_logo sailboat sb_down_arrow
   sb_h_double_arrow sb_left_arrow sb_right_arrow sb_up_arrow
   sb_v_double_arrow shuttle sizing spider spraycan star target tcross
   top_left_arrow top_left_corner top_right_corner top_side top_tee 
   trek ul_angle umbrella ur_angle watch xterm
}
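
To actually use one of these, set the widget's -cursor option, for instance (the widget names here are just examples):

pack [label .busy -text "please wait..." -cursor watch]
pack [text .t -height 5 -cursor xterm]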

A little tool that presents the cursor names, and shows each cursor shape when mousing over:

set ncols 4
for {set i 0} {$i<$ncols} {incr i} {
   lappend cols col$i
}
set nmax [expr {int([llength $cursors]*1./$ncols)}]
foreach col $cols {
   set $col [lrange $cursors 0 $nmax]
   set cursors [lrange $cursors [expr $nmax+1] end]
}
label .top -text "Move the cursor over a name to see how it looks" \
   -relief ridge
grid .top -columnspan $ncols -sticky news -ipady 2
for {set i 0} {$i<[llength $col0]} {incr i} {
   set row {}
   foreach col $cols {
       set name [lindex [set $col] $i]
       if {$name eq ""} break
       lappend row .l$name
       label .l$name -text $name -anchor w
       bind .l$name <Enter> [list %W config -cursor $name]
   }
   eval grid $row -sticky we
}

Fonts

Fonts are provided by the windowing system. Which are available depends on the local installation. Find out which fonts are available with

font families

The typical description of a font is a list of up to three elements:

family size ?style?

Example:

set f {{Bitstream Cyberbit} 10 bold}

Family is a name like Courier, Helvetica, Times, ... It is best to pick one of the names delivered by font families, though there may be some mappings like "Helvetica" -> "Arial".

Size is the point size (a typographer's point is 1/72 of an inch) if positive, or the pixel size if negative. Normal display fonts often have a size of 9 or 10.

Style can be a list of zero or more of bold, italic, underline, and overstrike.
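
A small usage sketch (Courier is assumed to be installed; otherwise pick a family reported by font families). Named fonts have the advantage that reconfiguring them updates all widgets that use them:

pack [label .l -text "Hello, fonts" -font {Courier 12 bold}]
font create myfont -family Courier -size 10 -weight bold
.l configure -font myfont
font configure myfont -size 14   ;# .l follows the change automatically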

Images: photos and bitmaps

Tk allows simple yet powerful operations on images. These come in two varieties: bitmaps and photos. Bitmaps are rather limited in functionality: they can be specified in XBM format and are rendered in configurable foreground and background colors.

Photos offer many more possibilities - you can load and save them in different file formats (Tk itself supports PPM and GIF; for others, use the Img extension), copy them with many options like scaling/subsampling or mirroring, and get or set the color of single pixels.
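
A minimal sketch of both varieties (the inline XBM data is made up; logo.gif is an assumed file name):

image create bitmap box -foreground red -background yellow -data {
    #define box_width 8
    #define box_height 8
    static unsigned char box_bits[] = {
        0xff, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xff };
}
image create photo logo -file logo.gif
pack [label .b -image box] [label .p -image logo]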

Setting pixels also works on rectangular regions in one go, as the following example demonstrates: it creates a photo image of a tricolore flag (three evenly spaced vertical bands - like France, Italy, Belgium, Ireland and many more; the default is France). You specify the width; the height will be 2/3 of that. The procedure returns the image name - just write it out with -format gif if you want to save it:

proc tricolore {w {colors {blue white red}}} {
   set im [image create photo]
   set fromx 0
   set dx [expr $w/3]
   set tox $dx
   set toy [expr $w*2/3]
   foreach color $colors {
       $im put $color -to $fromx 0 $tox $toy
       incr fromx $dx; incr tox $dx
   }
   set im
}
# Test - display flag on canvas:
pack [canvas .c -width 200 -height 200 -background grey]
.c create image 100 100 -image [tricolore 150]
# Test - save image of flag to file:
[tricolore 300] write tric.gif -format gif

Debugging Tk programs

A Tk program under development can be very rapidly debugged by adding such bindings:

bind . <F1> {console show}

This works only on Windows and Macintosh (pre-OS-X) and brings up a console in which you can interact with the Tcl interpreter, inspect or modify global variables, configure widgets, etc.

bind . <Escape> {eval [list exec wish $argv0] $argv &; exit}

This starts a new instance of the current program (assuming you edited a source file and saved it to disk), and then terminates the current instance.

For short debugging output, one can also use the window's title bar. For example, to display the current mouse coordinates when it moves:

bind . <Motion> {wm title . %X/%Y}

Other languages

Other programming languages have modules that interface and use Tcl/Tk:

  • In R (programming language), there's a tcltk library, invoked with the command library(tcltk)
  • In Python, there's a tkinter module, invoked with import tkinter or from tkinter import *
  • Common Lisp can communicate with Tcl/Tk via several externally available libraries, including CL-TK and LTK

Tk examples

The following examples originally appeared in the Tcler's Wiki http://wiki.tcl.tk . They are all in the public domain - no rights reserved.

A funny cookbook

This funny little program produces random cooking recipes. Small as it is, it can produce 900 different recipes, though they might not be to everybody's taste... The basic idea is to pick an arbitrary element from a list, which is easily done in Tcl with the following:

proc ? L {lindex $L [expr {int(rand()*[llength $L])}]}

This is used several times in:

proc recipe {} {
  set a {
    {3 eggs} {an apple} {a pound of garlic}
    {a pumpkin} {20 marshmallows}
  }
  set b {
    {Cut in small pieces} {Dissolve in lemonade}
    {Bury in the ground for 3 months}
    {Bake at 300 degrees} {Cook until tender}
  }
  set c {parsley snow nutmeg curry raisins cinnamon}
  set d {
     ice-cream {chocolate cake} spinach {fried potatoes} rice {soy sprouts}
  }
  return "   Take [? $a].
  [? $b].
  Top with [? $c].
  Serve with [? $d]."
}

And as modern programs always need a GUI, here is a minimal one that appears when you source this file at top level, and shows a new recipe every time you click on it:

if {[file tail [info script]]==[file tail $argv0]} {
  package require Tk
  pack [text .t -width 40 -height 5]
  bind .t <1> {showRecipe %W; break}
  proc showRecipe w {
    $w delete 1.0 end
    $w insert end [recipe]
  }
  showRecipe .t
}

Enjoy!

A little A/D clock

This is a clock that shows time either analog or digital - just click on it to toggle.

#!/usr/bin/env tclsh
package require Tk

proc every {ms body} {eval $body; after $ms [info level 0]}

proc drawhands w {
    $w delete hands
    set secSinceMidnight [expr {[clock sec]-[clock scan 00:00:00]}]
    foreach divisor {60 3600 43200} length {45 40 30} width {1 3 7} {
       set angle [expr {$secSinceMidnight * 6.283185 / $divisor}]
       set x [expr {50 + $length * sin($angle)}]
       set y [expr {50 - $length * cos($angle)}]
       $w create line 50 50 $x $y -width $width -tags hands
    }
}
proc toggle {w1 w2} {
    if [winfo ismapped $w2] {
        foreach {w2 w1} [list $w1 $w2] break ;# swap
    }
    pack forget $w1
    pack $w2
}
#-- Creating the analog clock:
canvas .analog -width 100 -height 100 -bg white
every 1000 {drawhands .analog}
pack .analog

#-- Creating the digital clock:
label .digital -textvar ::time -font {Courier 24}
every 1000 {set ::time [clock format [clock sec] -format %H:%M:%S]}

bind . <1> {toggle .analog .digital}

A little pie chart

 

Arc elements of a canvas are by default rendered as pie slices (part of the circumference of a circle, connected by radius lines to the center). Hence it is rather easy to produce a pie chart. The following code is a bit more complex, as it also determines positions for the labels of the slices:

proc piechart {w x y width height data} {
   set coords [list $x $y [expr {$x+$width}] [expr {$y+$height}]]
   set xm  [expr {$x+$width/2.}]
   set ym  [expr {$y+$height/2.}]
   set rad [expr {$width/2.+20}]
   set sum 0
   foreach item $data {set sum [expr {$sum + [lindex $item 1]}]}
   set start 270
   foreach item $data {
       foreach {name n color} $item break
       set extent [expr {$n*360./$sum}]
       $w create arc $coords -start $start -extent $extent -fill $color
       set angle [expr {($start-90+$extent/2)/180.*acos(-1)}]
       set tx [expr $xm-$rad*sin($angle)]
       set ty [expr $ym-$rad*cos($angle)]
       $w create text $tx $ty -text $name:$n  -tag txt
       set start [expr $start+$extent]
   }
   $w raise txt
}

Testing:

pack [canvas .c -bg white]
piechart .c 50 50 150 150 {
   {SPD  199 red}
   {CDU  178 gray}
   {CSU   23 blue}
   {FDP   60 yellow}
   {Grüne 58 green}
   {Linke 55 purple}
}

A little 3D bar chart

 

The following script displays a bar chart on a canvas, with pseudo-3-dimensional bars - a rectangle in front as specified, embellished with two polygons, one for the top and one for the side:

proc 3drect {w args} {
   if [string is int -strict [lindex $args 1]] {
      set coords [lrange $args 0 3]
   } else {
      set coords [lindex $args 0]
   }
   foreach {x0 y0 x1 y1} $coords break
   set d [expr {($x1-$x0)/3}]
   set x2 [expr {$x0+$d+1}]
   set x3 [expr {$x1+$d}]
   set y2 [expr {$y0-$d+1}]
   set y3 [expr {$y1-$d-1}]
   set id [eval [list $w create rect] $args]
   set fill [$w itemcget $id -fill]
   set tag [$w gettags $id]
   $w create poly $x0 $y0 $x2 $y2 $x3 $y2 $x1 $y0 \
       -fill [dim $fill 0.8] -outline black
   $w create poly $x1 $y1 $x3 $y3 $x3 $y2 $x1 $y0 \
       -fill [dim $fill 0.6] -outline black -tag $tag
}

For a more plastic look, the fill color of the polygons is reduced in brightness ("dimmed"):

proc dim {color factor} {
  foreach i {r g b} n [winfo rgb . $color] d [winfo rgb . white] {
     set $i [expr int(255.*$n/$d*$factor)]
  }
  format #%02x%02x%02x $r $g $b
}

Draw a simple scale for the y axis, and return the scaling factor:

proc yscale {w x0 y0 y1 min max} {
  set dy   [expr {$y1-$y0}]
  regexp {([1-9]+)} $max -> prefix
  set stepy [expr {1.*$dy/$prefix}]
  set step [expr {$max/$prefix}]
  set y $y0
  set label $max
  while {$label>=$min} {
     $w create text $x0 $y -text $label -anchor w
     set y [expr {$y+$stepy}]
     set label [expr {$label-$step}]
  }
  expr {$dy/double($max)}
}

An interesting sub-challenge was to round numbers very roughly, to one or at most two significant digits - by default rounding up; add "-" to round down:

proc roughly {n {sgn +}} {
  regexp {(.+)e([+-])0*(.+)} [format %e $n] -> mant sign exp
  set exp [expr $sign$exp]
  if {abs($mant)<1.5} {
     set mant [expr $mant*10]
     incr exp -1
  }
  set t [expr round($mant $sgn 0.49)*pow(10,$exp)]
  expr {$exp>=0? int($t): $t}
}

So here is my little bar chart generator. Given a canvas pathname, a bounding rectangle, and the data to display (a list of {name value color} triples), it figures out the geometry. A gray "ground plane" is drawn first. Note how negative values are tagged with "d"(eficit), so they look like they "drop through the plane".

proc bars {w x0 y0 x1 y1 data} {
   set vals 0
   foreach bar $data {
      lappend vals [lindex $bar 1]
   }
   set top [roughly [max $vals]]
   set bot [roughly [min $vals] -]
   set f [yscale $w $x0 $y0 $y1 $bot $top]
   set x [expr $x0+30]
   set dx [expr ($x1-$x0-$x)/[llength $data]]
   set y3 [expr $y1-20]
   set y4 [expr $y1+10]
   $w create poly $x0 $y4 [expr $x0+30] $y3 $x1 $y3 [expr $x1-20] $y4 -fill gray65
   set dxw [expr $dx*6/10]
   foreach bar $data {
      foreach {txt val col} $bar break
      set y [expr {round($y1-($val*$f))}]
      set y1a $y1
      if {$y>$y1a} {swap y y1a}
      set tag [expr {$val<0? "d": ""}]
      3drect $w $x $y [expr $x+$dxw] $y1a -fill $col -tag $tag
      $w create text [expr {$x+12}] [expr {$y-12}] -text $val
      $w create text [expr {$x+12}] [expr {$y1a+2}] -text $txt -anchor n
      incr x $dx
   }
   $w lower d
}

Generally useful helper functions:

proc max list {
   set res [lindex $list 0]
   foreach e [lrange $list 1 end] {
      if {$e>$res} {set res $e}
   }
   set res
}
proc min list {
   set res [lindex $list 0]
   foreach e [lrange $list 1 end] {
      if {$e<$res} {set res $e}
   }
   set res
}
proc swap {_a _b} {
   upvar 1 $_a a $_b b
   foreach {a b} [list $b $a] break
}

Testing the whole thing (see screenshot):

pack [canvas .c -width 240 -height 280]
bars .c 10 20 240 230 {
  {red 765 red}
  {green 234 green}
  {blue 345 blue}
  {yel-\nlow 321 yellow}
  {ma-\ngenta 567 magenta}
  {cyan -123 cyan}
  {white 400 white}
}
.c create text 120 10 -anchor nw -font {Helvetica 18} -text "Bar Chart\nDemo"

A little calculator

 

Here is a small calculator in Tcl/Tk. In addition to the buttons on screen, you can use any of expr's other functionalities via keyboard input.

package require Tk
wm title . Calculator
grid [entry .e -textvar e -just right] -columnspan 5
bind .e <Return> =
set n 0
foreach row {
   {7 8 9 + -}
   {4 5 6 * /}
   {1 2 3 ( )}
   {C 0 . =  }
} {
   foreach key $row {
       switch -- $key {
           =       {set cmd =}
           C       {set cmd {set clear 1; set e ""}}
           default {set cmd "hit $key"}
       }
       lappend keys [button .[incr n] -text $key -command $cmd]
   }
   eval grid $keys -sticky we ;#-padx 1 -pady 1
   set keys [list]
}
grid .$n -columnspan 2 ;# make last key (=) double wide
proc = {} {
   regsub { =.+} $::e "" ::e ;# maybe clear previous result
   if [catch {set ::res [expr [string map {/ *1.0/} $::e]]}] {
       .e config -fg red
   }
   append ::e = $::res 
   .e xview end
   set ::clear 1
}
proc hit {key} {
   if $::clear {
       set ::e ""
       if ![regexp {[0-9().]} $key] {set ::e $::res}
       .e config -fg black
       .e icursor end
       set ::clear 0
   }
   .e insert end $key
}
set clear 0
focus .e           ;# allow keyboard input
wm resizable . 0 0

And, as Cameron Laird noted, this thingy is even programmable: enter for example

[proc fac x {expr {$x<2? 1: $x*[fac [incr x -1]]}}]

into the entry, disregard warnings; now you can do

[fac 10]

and receive [fac 10] = 3628800.0 as result...

A little slide rule

The slide rule was an analog, mechanical device for approximate engineering calculations, made obsolete by the pocket calculator in the 1970s-80s. The basic principle is that multiplication is done by adding logarithms; hence most of the scales are logarithmic, with uneven increments.
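
The principle in one line of Tcl: adding the logarithms of 3 and 7, then exponentiating again, yields their product:

expr {pow(10, log10(3) + log10(7))}   ;# => 21.0, up to floating-point rounding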

 

This fun project recreates a slide rule (roughly an Aristo-Rietz Nr. 89 with 7 scales - top-notch ones had up to 24) with a white "body" and a beige "slide" which you can move left or right while mouse button 1 is held down, or in pixel increments with the <Shift-Left>/<Shift-Right> cursor keys. Finally, the blue line represents the "cursor" (also called the runner), which you can move with the mouse over the whole thing to read off values. Fine movements with <Left>/<Right>.

Due to rounding errors (integer pixels), this plaything is even less precise than a physical slide rule was, but maybe you still enjoy the memories... The screenshot shows how I found out that 3 times 7 is approx. 21... (check the A and B scales).

proc ui {} {
   set width 620
   pack [canvas .c -width $width -height 170 -bg white]
   pack [label .l -textvariable info -fg blue] -fill x
   .c create rect 0 50 $width 120 -fill grey90
   .c create rect 0 50 $width 120 -fill beige -outline beige \
       -tag {slide slidebase}
   .c create line 0 0 0 120 -tag mark -fill blue
   drawScale .c K  x3    10 5    5 log10 1 1000 186.6666667
   drawScale .c A  x2    10 50  -5 log10 1 100 280
   drawScale .c B  x2    10 50   5 log10 1 100 280 slide
   drawScale .c CI 1/x   10 90 -5 -log10 1 10  560 slide
   drawScale .c C  x     10 120 -5 log10 1 10  560 slide
   drawScale .c D  x     10 120  5 log10 1 10  560
   drawScale .c L "lg x" 10 160 -5 by100  0 10   5600
   bind .c <Motion> {.c coords mark %x 0 %x 170; set info [values .c]}
   bind .c <1> {set x %x}
   bind .c <B1-Motion> {%W move slide [expr {%x-$x}] 0; set x %x}
   bind . <Shift-Left>  {.c move slide -1 0; set info [values .c]}
   bind . <Shift-Right> {.c move slide  1 0; set info [values .c]}
   bind . <Left>  {.c move mark -1 0; set info [values .c]}
   bind . <Right> {.c move mark  1 0; set info [values .c]}
}
proc drawScale {w name label x y dy f from to fac {tag {}}} {
   set color [expr {[string match -* $f]? "red": "black"}]
   $w create text $x [expr $y+2*$dy] -text $name -tag $tag -fill $color
   $w create text 600 [expr $y+2*$dy] -text $label -tag $tag -fill $color
   set x [expr {[string match -* $f]? 580: $x+10}]
   set mod 5
   set lastlabel ""
   set lastx 0
   for {set i [expr {$from*10}]} {$i<=$to*10} {incr i} {
       if {$i>100} {
           if {$i%10} continue ;# coarser increments
           set mod 50
       }
       if {$i>1000} {
           if {$i%100} continue ;# coarser increments
           set mod 500
       }
       set x0 [expr $x+[$f [expr {$i/10.}]]*$fac]
       set y1 [expr {$i%(2*$mod)==0? $y+2.*$dy: $i%$mod==0? $y+1.7*$dy: $y+$dy}]
       set firstdigit [string index $i 0]
       if {$y1==$y+$dy && abs($x0-$lastx)<2} continue
       set lastx $x0
       if {$i%($mod*2)==0 && $firstdigit != $lastlabel} {
           $w create text $x0 [expr $y+3*$dy] -text $firstdigit \
              -tag $tag -font {Helvetica 7} -fill $color
           set lastlabel $firstdigit
       }
       $w create line $x0 $y $x0 $y1 -tag $tag -fill $color
   }
}
proc values w {
   set x0 [lindex [$w coords slidebase] 0]
   set x1 [lindex [$w coords mark] 0]
   set lgx [expr {($x1-20)/560.}]
   set x [expr {pow(10,$lgx)}]
   set lgxs [expr {($x1-$x0-20)/560.}]
   set xs [expr {pow(10,$lgxs)}]
   set res     K:[format %.2f [expr {pow($x,3)}]]
   append res "  A:[format %.2f [expr {pow($x,2)}]]"
   append res "  B:[format %.2f [expr {pow($xs,2)}]]"
   append res "  CI:[format %.2f [expr {pow(10,-$lgxs)*10}]]"
   append res "  C:[format %.2f $xs]"
   append res "  D:[format %.2f $x]"
   append res "  L:[format %.2f $lgx]"
}
proc pow10 x {expr {pow(10,$x)}}
proc log10 x {expr {log10($x)}}
proc -log10 x {expr {-log10($x)}}
proc by100  x {expr {$x/100.}}
#--------------------------------
ui
bind . <Escape> {exec wish $argv0 &; exit}

A minimal doodler

Here is a tiny but complete script that allows doodling (drawing with the mouse) on a canvas widget:

proc doodle {w {color black}} {
   bind $w <1>         [list doodle'start %W %x %y $color]
   bind $w <B1-Motion> {doodle'move %W %x %y}
}
proc doodle'start {w x y color} {
   set ::_id [$w create line $x $y $x $y -fill $color]
}
proc doodle'move {w x y} {
   $w coords $::_id [concat [$w coords $::_id] $x $y]
}
pack [canvas .c -bg white] -fill both -expand 1
doodle       .c
bind .c <Double-3> {%W delete all}

 

And here it comes again, but this time with explanations:

The "Application Program Interface" (API) for this, if you want such ceremonial language, is the doodle command, where you specify which canvas widget should be enabled to doodle, and in which color (defaults to black):}

proc doodle {w {color black}} {
   bind $w <1>         [list doodle'start %W %x %y $color]
   bind $w <B1-Motion> {doodle'move %W %x %y}
}

It registers two bindings for the canvas, one (<1>) when the left mouse-button is clicked, and the other when the mouse is moved with button 1 (left) down. Both bindings just call one internal function each.

On left-click, a line item is created on the canvas in the specified fill color, but with no extent yet, as start and end points coincide. The item ID (a number assigned by the canvas) is kept in a global variable, as it will have to persist long after this procedure has returned:

proc doodle'start {w x y color} {
   set ::_id [$w create line $x $y $x $y -fill $color]
}

The left-motion procedure obtains the coordinates (alternating x and y) of the globally known doodling line object, appends the current coordinates to it, and makes this the new coordinates - in other words, extends the line to the current mouse position:

proc doodle'move {w x y} {
   $w coords $::_id [concat [$w coords $::_id] $x $y]
}

That's all we need to implement doodling - now let's create a canvas to test it, and pack it so it can be drawn as big as you wish:

pack [canvas .c -bg white] -fill both -expand 1

And this line turns on the doodle functionality created above (defaulting to black):

doodle       .c

Add a binding for double-right-click/double-button-3, to clear the canvas (added by MG, Apr 29 04)

bind .c <Double-3> {%W delete all}

A tiny drawing program

Here is a tiny drawing program on a canvas. Radio buttons on top allow choice of draw mode and fill color. In "Move" mode, you can of course move items around. Right-click on an item to delete it.

 

A radio is an obvious "megawidget" to hold a row of radiobuttons. This simple one allows text or color mode:

proc radio {w var values {col 0}} {
   frame $w
   set type [expr {$col? "-background" : "-text"}]
   foreach value $values {
       radiobutton $w.v$value $type $value -variable $var -value $value \
           -indicatoron 0
       if $col {$w.v$value config -selectcolor $value -borderwidth 3}
   }
   eval pack [winfo children $w] -side left
   set ::$var [lindex $values 0]
   set w
}

Depending on draw mode, the mouse events "Down" and "Motion" have different handlers, which are dispatched by names that look like array elements. So for a mode X, we need a pair of procs, down(X) and move(X). Values used between calls are kept in global variables.

First, the handlers for free-hand line drawing:

proc down(Draw) {w x y} {
   set ::ID [$w create line $x $y $x $y -fill $::Fill]
}
proc move(Draw) {w x y} {
   $w coords $::ID [concat [$w coords $::ID] $x $y]
}
#-- Movement of an item
proc down(Move) {w x y} {
   set ::ID [$w find withtag current]
   set ::X $x; set ::Y $y
}
proc move(Move) {w x y} {
   $w move $::ID [expr {$x-$::X}] [expr {$y-$::Y}]
   set ::X $x; set ::Y $y
}
#-- Clone an existing item
proc serializeCanvasItem {c item} {
   set data [concat [$c type $item] [$c coords $item]]
   foreach opt [$c itemconfigure $item] {
       # Include any configuration that deviates from the default
       if {[lindex $opt end] != [lindex $opt end-1]} {
           lappend data [lindex $opt 0] [lindex $opt end]
           }
       }
   return $data
   }
proc down(Clone) {w x y} {
   set current [$w find withtag current]
   if {[string length $current] > 0} {
       set itemData [serializeCanvasItem $w [$w find withtag current]]
       set ::ID [eval $w create $itemData]
       set ::X $x; set ::Y $y
   }
}
interp alias {} move(Clone) {} move(Move)
#-- Drawing a rectangle
proc down(Rect) {w x y} {
   set ::ID [$w create rect $x $y $x $y -fill $::Fill]
}
proc move(Rect) {w x y} {
   $w coords $::ID [lreplace [$w coords $::ID] 2 3 $x $y]
}
#-- Drawing an oval (or circle, if you're careful)
proc down(Oval) {w x y} {
   set ::ID [$w create oval $x $y $x $y -fill $::Fill]
}
proc move(Oval) {w x y} {
   $w coords $::ID [lreplace [$w coords $::ID] 2 3 $x $y]
}

Polygons are drawn by clicking the corners. When a corner is close enough to the first one, the polygon is closed and drawn.

proc down(Poly) {w x y} {
   if [info exists ::Poly] {
       set coords [$w coords $::Poly]
       foreach {x0 y0} $coords break
       if {hypot($y-$y0,$x-$x0)<10} {
           $w delete $::Poly
           $w create poly [lrange $coords 2 end] -fill $::Fill
           unset ::Poly
       } else {
           $w coords $::Poly [concat $coords $x $y]
       }
   } else {
       set ::Poly [$w create line $x $y $x $y -fill $::Fill]
   }
}
proc move(Poly) {w x y} {#nothing}
#-- With a little more coding, the Fill mode allows changing an item's fill color:
proc down(Fill) {w x y} {$w itemconfig current -fill $::Fill}
proc move(Fill) {w x y} {}
#-- Building the UI
set modes {Draw Move Clone Fill Rect Oval Poly}
set colors {
   black white magenta brown red orange yellow green green3 green4
   cyan blue blue4 purple
}
grid [radio .1 Mode $modes] [radio .2 Fill $colors 1] -sticky nw
grid [canvas .c -relief raised -borderwidth 1] - -sticky news
grid rowconfig . 0 -weight 0
grid rowconfig . 1 -weight 1
#-- The current mode is retrieved at runtime from the global Mode variable:
bind .c <1>         {down($Mode) %W %x %y}
bind .c <B1-Motion> {move($Mode) %W %x %y}
bind .c <3>         {%W delete current}

For saving the current image, you need the Img extension, so just omit the following binding if you don't have Img:

bind . <F1> {
   package require Img
   set img [image create photo -data .c]
   set name [tk_getSaveFile -filetypes {{GIFF .gif} {"All files" *}}\
       -defaultextension .gif]
   if {$name ne ""} {$img write $name; wm title . $name}
}
#-- This is an always useful helper in development:
bind . <Escape> {exec wish $argv0 &; exit}

A minimal editor

Here's an utterly simple editor, in 26 lines of code, which just allows you to load and save files, and of course edit, cut and paste, and whatever else is built into the text widget anyway. And it has a bit of "online help"... ;-)

It is always a good idea to start a source file with some explanation of its name, purpose, author, and date. I have recently picked up the habit of putting this information into a string variable (which in Tcl can easily span multiple lines), so the same info is presented to the reader of the source code, and can be displayed as online help:

set about "minEd - a minimal editor
Richard Suchenwirth 2003
F1: help
F2: load
F3: save
"

The visible part of a Graphical User Interface (GUI) consists of widgets. For this editor, I of course need a text widget, and a vertical scrollbar. With the option "-wrap word" for the text widget, another horizontal scrollbar is not needed - lines longer than the window just wrap at word boundaries.

Tk widgets come on the screen in two steps: first, they are created with an initial configuration; then, handed to a "geometry manager" for display. As widget creation commands return the pathname, they can be nested into the manager command (pack in this case), to keep all settings for a widget in one place. This may lead to over-long lines, though.

Although the scrollbar comes to the right of the text, I create and pack it first. The reason is that when the window is made smaller by the user, the widgets packed last are the first to lose visibility.

These two lines also illustrate the coupling between a scrollbar and the widget it controls:

  • the scrollbar sends a yview message to it when moved
  • the widget sends a set message to the scrollbar when the view changes, for instance from cursor keys

And these two lines already give us an editor for arbitrarily long text, with built-in capabilities of cut, copy, and paste - see the text man page. Only file I/O has to be added by us to make it really usable.

pack [scrollbar .y -command ".t yview"] -side right -fill y
pack [text .t -wrap word -yscrollc ".y set"] -side right -fill both -expand 1

Are you targeting 8.4 or later? If so, add -undo 1 to the text widget's options and get full undo/redo support!

pack [text .t -wrap word -yscrollc ".y set" -undo 1] -side right -fill both -expand 1

The other important part of a GUI are the bindings - what event shall trigger what action. For simplicity, I've limited the bindings here to a few of the function keys on top of typical keyboards:

bind . <F1> {tk_messageBox -message $about}

Online help is done with a no-frills tk_messageBox showing the "about" text defined at the start of the file. The other bindings call custom commands, which get a filename argument from Tk's file selector dialogs:

bind . <F2> {loadText .t [tk_getOpenFile]}
bind . <F3> {saveText .t [tk_getSaveFile]}

These dialogs can also be configured in a number of ways, but even in this simple form they are quite powerful - they allow navigation around the file system, etc. On Windows they call the native file selectors, which offer a history of previously opened files, a detail view (size/date etc.), and more.
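
For instance, the F2 binding above could restrict the open dialog to likely text files (a hypothetical refinement, not part of the 26-line editor):

bind . <F2> {loadText .t [tk_getOpenFile -title "Open file" \
    -filetypes {{"Text files" {.txt .tcl}} {"All files" *}}]}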

When this editor is called with a filename on the command line, that file is loaded on startup (simple as it is, it can only handle one file at a time):

if {$argv != ""} {loadText .t [lindex $argv 0]}

The procedures for loading and saving text both start with a sanity check of the filename argument - if it's an empty string, as produced by file selector dialogs when the user cancels, they return immediately. Otherwise, they transfer file content to text widget or vice-versa. loadText adds the "luxury" that the name of the current file is also put into the window title. Then it opens the file, clears the text widget, reads all file contents in one go, and puts them into the text widget.

proc loadText {w fn} {
   if {$fn==""} return
   wm title . [file tail $fn]
   set fp [open $fn]
   $w delete 1.0 end
   $w insert end [read $fp]
   close $fp
}

saveText takes care not to save the extra newline that text widgets append at end, by limiting the range to "end - 1 c"(haracter).

proc saveText {w fn} {
   if {$fn==""} return
   set fp [open $fn w]
   puts -nonewline $fp [$w get 1.0 "end - 1 c"]
   close $fp
}

File watch

Some editors (e.g. PFE, MS Visual Studio) pop up an alert dialog when a file was changed on disk while being edited - that might lead to edit conflicts. Emacs shows a more subtle warning at the first attempt to change a file that has changed on disk.

Here I try to emulate this feature. It is oversimplified because it does not refresh the mtime (file modification time) it checks against once you save the file from the editor itself. So make sure to call text'watch'file again after saving.

Using the global variable ::_twf it is at least possible to avoid false alarms - for a more serious implementation one might use a namespaced array of watchers, indexed by file name, in case you want multiple edit windows.

proc text'watch'file {w file {mtime -}} {
   set checkinterval 1000 ;# modify as needed
   if {$mtime eq "-"} {
       if [info exists ::_twf] {after cancel $::_twf}
       set file [file join [pwd] $file]
       text'watch'file $w $file [file mtime $file]
   } else {
       set newtime [file mtime $file]
       if {$newtime != $mtime} {
           set answer [tk_messageBox -type yesno -message \
               "The file\n$file\nhas changed on disk. Reload it?"]
           if {$answer eq "yes"} {text'read'file $w $file}
           text'watch'file $w $file
       } else {set ::_twf [after $checkinterval [info level 0]]}
   }
}
proc text'read'file {w file} {
   set f [open $file]
   $w delete 1.0 end
   $w insert end [read $f]
   close $f
}
#-- Testing:
pack [text .t -wrap word] -fill both -expand 1
set file textwatch.tcl
text'read'file  .t $file
text'watch'file .t $file

The dialog should come up when you change the file externally - say by touch-ing it, editing it in another editor, or in pure Tcl with:

file mtime $filename [clock seconds]

Tiny presentation graphics

This is a crude little canvas-based presentation graphics tool that runs on PocketPCs, but also on bigger boxes (one might scale fonts and dimensions there). Switch pages with the Left/Right cursor keys, or with left/right mouse clicks (though a stylus cannot right-click).

Not many features, but the code is very compact, and it comes with a cute little language for content specification - see the example at the end (which shows what I presented at the 2003 Euro-Tcl convention in Nuremberg...).

proc slide args {
  global slides
  if {![info exist slides]} slide'init
  incr slides(N)
  set slides(title,$slides(N)) [join $args]
}
proc slide'line {type args} {
  global slides
  lappend slides(body,$slides(N)) [list $type [join $args]]
}
foreach name {* + -} {interp alias {} $name {} slide'line $name}
proc slide'init {} {
  global slides
  array set slides {
     canvas .c  N 0  show 1 dy 20
     titlefont {Tahoma 22 bold} * {Tahoma 14 bold} + {Tahoma 12}
     - {Courier 10}
  }
  pack [canvas .c -bg white] -expand 1 -fill both
  foreach e {<1> <Right>} {bind . $e {slide'show 1}}
  foreach e {<3> <Left>} {bind . $e {slide'show -1}}
  wm geometry . +0+0
  after idle {slide'show 0}
}
proc slide'show delta {
  upvar #0 slides s
  incr s(show) $delta
  if {$s(show)<1 || $s(show)>$s(N)} {
     incr s(show) [expr -$delta]
  } else {
     set c $s(canvas)
     $c delete all
     set x 10; set y 20
     $c create text $x $y -anchor w -text $s(title,$s(show))\
        -font $s(titlefont) -fill blue
     incr y $s(dy)
     $c create line $x $y 2048 $y -fill red -width 4
     foreach line $s(body,$s(show)) {
        foreach {type text} $line break
        incr y $s(dy)
        $c create text $x $y -anchor w -text $text \
        -font $s($type)
     }
  }
}
bind . <Up> {exec wish $argv0 &; exit} ;# dev helper

The rest is data - or is it code? Anyway, here's my show:

slide i18n - Tcl for the world
+ Richard Suchenwirth, Nuremberg 2003
+
* i18n: internationalization
+ 'make software work with many languages'
+
* l10n: localization
+ 'make software work with the local language'
slide Terminology
* Glyphs:
+ visible elements of writing
* Characters:
+ abstract elements of writing
* Byte sequences:
+ physical text data representation
* Rendering: character -> glyph
* Encoding: character <-> byte sequence
slide Before Unicode
* Bacon (1580), Baudot: 5-bit encodings
* Fieldata (6 bits), EBCDIC (8 bits)
* ASCII (7 bits)
+ world-wide "kernel" of encodings
* 8-bit codepages: DOS, Mac, Windows
* ISO 8859-x: 16 varieties
slide East Asia
* Thousands of characters/country
+ Solution: use 2 bytes, 94x94 matrix
+ Japan: JIS C-6226/0208/0212
+ China: GB2312-80
+ Korea: KS-C 5601
+
* coexist with ASCII in EUC encodings
slide Unicode covers all
* Common standard of software industry
* kept in synch with ISO 10646
+ Used to be 16 bits, until U 3.1
+ Now needs up to 31 bits
* Byte order problem:
+ little-endian/big-endian
+ U+FEFF "Byte Order Mark"
+ U+FFFE --illegal--
slide UTF-8
* Varying length: 1..3(..6) bytes
+ 1 byte: ASCII
+ 2 bytes: pages 00..07, Alphabets
+ 3 bytes: pages 08..FF, rest of BMP
+ >3 bytes: higher pages
+
* Standard in XML, coming in Unix
slide Tcl i18n
* Everything is a Unicode string (BMP)
+ internal rep: UTF-8/UCS-2
* Important commands:
- fconfigure \$ch -encoding \$e
- encoding convertfrom \$e \$s
- encoding convertto   \$e \$s
+
* msgcat supports l10n:
- {"File" -> [mc "File"]}
slide Tk i18n
* Any widget text is Unicoded
* Automatic font finding
+ Fonts must be provided by system
+
* Missing: bidi treatment
+ right-to-left conversion (ar,he)
slide Input i18n
* Keyboard rebinding (bindtags)
* East Asia: keyboard buffering
+ Menu selection for ambiguities
+
* Virtual keyboard (buttons, canvas)
* String conversion: *lish family
- {[ruslish Moskva]-[greeklish Aqh'nai]}
slide i18n - Tcl for the world
+
+
+ Thank you.

Timeline display

Yet another thing to do with a canvas: visualizing history on a horizontal time-line, with a year scale displayed on top. The following kinds of objects are so far available:

  • "eras", displayed in yellow below the timeline in boxes
  • "background items" that are grey and stretch over all the canvas in height
  • normal items, which get displayed as stacked orange bars

 

You can zoom in with <1>, out with <3> (both only in x direction). On mouse motion, the current year is displayed in the toplevel's title. Normal items can be a single year, like the Columbus example, or a range of years, for instance for lifetimes of persons. (The example shows that Mozart didn't live long...)

namespace eval timeliner {
   variable ""
   array set "" {-zoom 1  -from 0 -to 2000}
}
proc timeliner::create {w args} {
   variable ""
   array set "" $args
   #-- draw time scale
   for {set x [expr ($(-from)/50)*50]} {$x<=$(-to)} {incr x 10} {
       if {$x%50 == 0} {
           $w create line $x 8 $x 0
           $w create text $x 8 -text $x -anchor n
       } else {
           $w create line $x 5 $x 0
       }
   }
   bind $w <Motion> {timeliner::title %W %x ; timeliner::movehair %W %x}
   bind $w <1> {timeliner::zoom %W %x 1.25}
   bind $w <2> {timeliner::hair %W %x}
   bind $w <3> {timeliner::zoom %W %x 0.8}
}
proc timeliner::movehair {w x} {
   variable ""
   if {[llength [$w find withtag hair]]} {
       set x [$w canvasx $x]
       $w move hair [expr {$x - $(x)}] 0
       set (x) $x
   }
}
proc timeliner::hair {w x} {
   variable ""
   if {[llength [$w find withtag hair]]} {
       $w delete hair
   } else {
       set (x) [$w canvasx $x]
       $w create line $(x) 0 $(x) [$w cget -height] \
                 -tags hair -width 1 -fill red
   }
}
proc timeliner::title {w x} {
   variable ""
   wm title . [expr int([$w canvasx $x]/$(-zoom))]
}
proc timeliner::zoom {w x factor} {
   variable ""
   $w scale all 0 0 $factor 1
   set (-zoom) [expr {$(-zoom)*$factor}]
   $w config -scrollregion [$w bbox all]
   if {[llength [$w find withtag hair]]} {
       $w delete hair
       set (x) [$w canvasx $x]
       $w create line $(x) 0 $(x) [$w cget -height] \
                 -tags hair -width 1 -fill red
   }
}

This command adds an object to the canvas. The code for "item" took me some effort, as it had to locate a free "slot" on the canvas, searching top-down:

proc timeliner::add {w type name time args} {
   variable ""
   regexp {(\d+)(-(\d+))?} $time -> from - to
   if {$to eq ""} {set to $from}
   set x0 [expr {$from*$(-zoom)}]
   set x1 [expr {$to*$(-zoom)}]
   switch -- $type {
       era    {set fill yellow; set outline black; set y0 20; set y1 40}
       bgitem {set fill gray; set outline {}; set y0 40; set y1 1024}
       item   {
           set fill orange
           set outline yellow
           for {set y0 60} {$y0<400} {incr y0 20} {
               set y1 [expr {$y0+18}]
               if {[$w find overlap [expr $x0-5] $y0 $x1 $y1] eq ""} break
           }
       }
   }
   set id [$w create rect $x0 $y0 $x1 $y1 -fill $fill -outline $outline]
   if {$type eq "bgitem"} {$w lower $id}
   set x2 [expr {$x0+5}]
   set y2 [expr {$y0+2}]
   set tid [$w create text $x2 $y2 -text $name -anchor nw]
   foreach arg $args {
       if {$arg eq "!"} {
           $w itemconfig $tid -font "[$w itemcget $tid -font] bold"
       }
   }
   $w config -scrollregion [$w bbox all]
}

Here's a sample application, featuring a concise history of music in terms of composers:

scrollbar .x -ori hori -command {.c xview}
pack      .x -side bottom -fill x
canvas    .c -bg white -width 600 -height 300 -xscrollcommand {.x set}
pack      .c -fill both -expand 1
timeliner::create .c -from 1400 -to 2000

These nifty shorthands for adding items make data specification a breeze - compare the original call, and the shorthand:

   timeliner::add .c item Purcell 1659-1695
   - Purcell 1659-1695

With an additional "!" argument you can make the text of an item bold:

foreach {shorthand type} {* era  x bgitem - item} {
   interp alias {} $shorthand {} timeliner::add .c $type
}

Now for the data to display (written pretty readably):

* {Middle Ages} 1400-1450
- Dufay 1400-1474
* Renaissance    1450-1600
- Desprez 1440-1521
- Luther 1483-1546
- {Columbus discovers America} 1492
- Palestrina 1525-1594 !
- Lasso 1532-1594
- Byrd 1543-1623
* Baroque        1600-1750
- Dowland 1563-1626
- Monteverdi 1567-1643
- Schütz 1585-1672
- Purcell 1659-1695
- Telemann 1681-1767
- Rameau 1683-1764
- Bach,J.S. 1685-1750 !
- Händel 1685-1759
x {30-years war} 1618-1648
* {Classic era}  1750-1810
- Haydn 1732-1809 !
- Boccherini 1743-1805
- Mozart 1756-1791 !
- Beethoven 1770-1828 !
* {Romantic era} 1810-1914
- {Mendelssohn Bartholdy} 1809-1847
- Chopin 1810-1849
- Liszt 1811-1886
- Verdi 1813-1901
x {French revolution} 1789-1800
* {Modern era}   1914-2000
- Ravel 1875-1937 !
- Bartók 1881-1945
- Stravinskij 1882-1971
- Varèse 1883-1965
- Prokof'ev 1891-1953
- Milhaud 1892-1974
- Honegger 1892-1955
- Hindemith 1895-1963
- Britten 1913-1976
x WW1 1914-1918
x WW2 1938-1945

Fun with functions

 

My teenage daughter hates math. In order to motivate her, I beefed up an earlier little function plotter which before only took one function, in strict Tcl (expr) notation, from the command line. Now there's an entry widget, and the accepted language has also been enriched: beyond expr's rules, you can omit dollar and multiplication signs, like 2x+1; powers can be written as x3 instead of ($x*$x*$x); and in simple cases you can omit parentheses around function arguments, like sin x2. Hitting <Return> in the entry widget displays the function's graph.

If you need some ideas, click on the "?" button to cycle through a set of demo functions, from boring to bizarre (e.g. if rand() is used). Besides default scaling, you can zoom in or out. Moving the mouse pointer over the canvas displays x and y coordinates, and the display changes to white if you're on a point on the curve.

The target was not reached: my daughter still hates math. But at least I had hours of Tcl (and function) fun again, surfing in the Cartesian plane... hope you enjoy it too!

proc main {} {
   canvas .c -bg white -borderwidth 0
   bind   .c <Motion> {displayXY .info %x %y}
   frame  .f
     label  .f.1 -text "f(x) = "
     entry  .f.f -textvar ::function -width 40
       bind .f.f <Return> {plotf .c $::function}
     button .f.demo -text " ? " -pady 0 -command {demo .c}
     label  .f.2 -text " Zoom: "
     entry  .f.fac -textvar ::factor -width 4
       set                  ::factor 32
       bind .f.fac <Return>               {zoom .c 1.0}
     button .f.plus  -text " + " -pady 0 -command {zoom .c 2.0}
     button .f.minus -text " - " -pady 0 -command {zoom .c 0.5}
     eval pack [winfo children .f] -side left -fill both
   label  .info -textvar ::info -just left
   pack .info .f -fill x -side bottom
   pack .c -fill both -expand 1
   demo .c
}
set ::demos {
       "cos x3" 2 1-x 0.5x2 x3/5 "sin x" "sin x2" 1/x sqrt(x)
       "tan x/5" x+1/x x abs(x) "exp x" "log x" "log x2"
       round(x) "int x%2" "x-int x" "0.2tan x+1/tan x" x*(rand()-0.5)
       x2/5-1/(2x) "atan x" sqrt(1-x2) "abs(x-int(x*2))" (x-1)/(x+1)
       "sin x-tan x" "sin x-tan x2" "x-abs(int x)" 0.5x-1/x
       -0.5x3+x2+x-1 3*sin(2x) -0.05x4-0.2x3+1.5x2+2x-3 "9%int x"
       0.5x2/(x3-3x2+4) "abs x2-3 int x" "int x%3"
}
proc displayXY {w cx cy} {
       set x [expr {double($cx-$::dx)/$::factor}]
       set y [expr {double(-$cy+$::dy)/$::factor}]
       set ::info [format "x=%.2f y=%.2f" $x $y]
       catch {
       $w config -fg [expr {abs([expr $::fun]-$y)<0.01?"white":"black"}]
       } ;# may divide by zero, or other illegal things
}
proc zoom {w howmuch} {
   set ::factor [expr round($::factor*$howmuch)]
   plotf $w $::function
}
proc plotf {w function} {
   foreach {re subst} {
       {([a-z]) +(x[0-9]?)} {\1(\2)}   " " ""   {([0-9])([a-z])} {\1*\2}
       x2 x*x   x3 x*x*x    x4 x*x*x*x   x \$x   {e\$xp} exp
   } {regsub -all $re $function $subst function}
   set ::fun $function
   set ::info "Tcl: expr $::fun"
   set color [lpick {red blue purple brown green}]
   plotline $w [fun2points $::fun] -fill $color
}
proc lpick L {lindex $L [expr {int(rand()*[llength $L])}]}
proc fun2points {fun args} {
   array set opt {-from -10.0 -to 10.0 -step .01}
   array set opt $args
   set res "{"
   for {set x $opt(-from)} {$x<= $opt(-to)} {set x [expr {$x+$opt(-step)}]} {
       if {![catch {expr $fun} y]} {
           if {[info exists lasty] && abs($y-$lasty)>100} {
               append res "\} \{" ;# incontinuity
           }
           append res " $x $y"
           set lasty $y
       } else {append res "\} \{"}
   }
   append res "}"
}
proc plotline {w points args} {
   $w delete all
   foreach i $points {
       if {[llength $i]>2} {eval $w create line $i $args -tags f}
   }
   set fac $::factor
   $w scale all 0 0 $fac -$fac
   $w create line -10000 0 10000 0      ;# X axis
   $w create line 0 -10000 0 10000      ;# Y axis
   $w create line $fac 0     $fac -3    ;# x=1 tick
   $w create line -3   -$fac 0    -$fac ;# y=1 tick
   set ::dx [expr {[$w cget -width]/2}]
   set ::dy [expr {[$w cget -height]/2}]
   $w move all $::dx $::dy
   $w raise f
}
proc demo {w} {
   set ::function [lindex $::demos 0] ;# cycle through...
   set ::demos [concat [lrange $::demos 1 end] [list $::function]]
   set ::factor 32
   plotf $w $::function
}
main

Functional imaging

In Conal Elliott's Pan project ("Functional Image Synthesis", [1]), images (of arbitrary size and resolution) are produced and manipulated in an elegant functional way. Functions written in Haskell (see Playing Haskell) are applied, mostly in functional composition, to pixels to return their color value. FAQ: "Can we have that in Tcl too?"

 

As the funimj demo below shows, in principle yes; but it takes some patience (or a very fast CPU) - for a 200x200 image the function is called 40000 times, which takes 9..48 seconds on my P200 box. Still, the output often is worth waiting for... and the time used to write this code was negligible, as the Haskell original could with few modifications be represented in Tcl. Functional composition had to be rewritten to Tcl's Polish notation - Haskell's

foo 1 o bar 2 o grill

(where "o" is the composition operator) would in Tcl look like

o {foo 1} {bar 2} grill

As the example shows, additional arguments can be specified; only the last argument is passed through the generated "function nest":

proc f {x} {foo 1 [bar 2 [grill $x]]}

But the name of the generated function is much nicer than "f": namely, the complete call to "o" is used, so the example proc has the name

"o {foo 1} {bar 2} grill"

which is pretty self-documenting ;-) I implemented "o" like this:

proc o args {
   # combine the functions in args, return the created name
   set name [info level 0]
   set body "[join $args " \["] \$x"
   append body [string repeat \] [expr {[llength $args]-1}]]
   proc $name x $body
   set name
}
# Now for the rendering framework:
proc fim {f {zoom 100} {width 200} {height -}} {
   # produce a photo image by applying function f to pixels
   if {$height=="-"} {set height $width}
   set im [image create photo -height $height -width $width]
   set data {}
   set xs {}
   for {set j 0} {$j<$width} {incr j} {
       lappend xs [expr {($j-$width/2.)/$zoom}]
   }
   for {set i 0} {$i<$height} {incr i} {
       set row {}
       set y [expr {($i-$height/2.)/$zoom}]
       foreach x $xs {
           lappend row [$f [list $x $y]]
       }
       lappend data $row
   }
   $im put $data
   set im
}

Basic imaging functions ("drawers") have the common functionality point -> color, where point is a pair {x y} (or, after applying a polar transform, {r a}...) and color is a Tk color name, like "green" or #010203:

proc  vstrip p {
   # a simple vertical bar
   b2c [expr {abs([lindex $p 0]) < 0.5}]
}
proc udisk p {
   # unit circle with radius 1
   foreach {x y} $p break
   b2c [expr {hypot($x,$y) < 1}]
}
proc xor {f1 f2 p} {
   lappend f1 $p; lappend f2 $p
   b2c [expr {[eval $f1] != [eval $f2]}]
}
proc and {f1 f2 p} {
   lappend f1 $p; lappend f2 $p
   b2c [expr {[eval $f1] == "#000" && [eval $f2] == "#000"}]
}
proc checker p {
   # black and white checkerboard
   foreach {x y} $p break
   b2c [expr {int(floor($x)+floor($y)) % 2 == 0}]
}
proc gChecker p {
   # greylevels correspond to fractional part of x,y
   foreach {x y} $p break
   g2c [expr {(fmod(abs($x),1.)*fmod(abs($y),1.))}]
}
proc bRings p {
   # binary concentric rings
   foreach {x y} $p break
   b2c [expr {round(hypot($x,$y)) % 2 == 0}]
}
proc gRings p {
   # grayscale concentric rings
   foreach {x y} $p break
   g2c [expr {(1 + cos(3.14159265359 * hypot($x,$y))) / 2.}]
}
proc radReg {n p} {
   # n wedge slices starting at (0,0)
   foreach {r a} [toPolars $p] break
   b2c [expr {int(floor($a*$n/3.14159265359))%2 == 0}]
}
proc xPos p {b2c [expr {[lindex $p 0]>0}]}
proc cGrad p {
   # color gradients - best watched at zoom=100
   foreach {x y} $p break
   if {abs($x)>1.} {set x 1.}
   if {abs($y)>1.} {set y 1.}
   set r [expr {int((1.-abs($x))*255.)}]
   set g [expr {int((sqrt(2.)-hypot($x,$y))*180.)}]
   set b [expr {int((1.-abs($y))*255.)}]
   c2c $r $g $b
}

Beyond the examples in Conal Elliott's paper, I found out that function imaging can also be abused for a (slow and imprecise) function plotter, which displays the graph for y = f(x) if you call it with $y + f($x) as first argument:

proc fplot {expr p} {
   foreach {x y} $p break
   b2c [expr abs($expr)<=0.04] ;# double eval required here!
}
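
For instance, with the framework loaded, this composition plots the parabola y = x*x (see the button bar code below for how such compositions are hooked up):

o {fplot {$y + $x*$x}}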

Here is a combinator for two binary images that shows, in different colors, where both, only the first, only the second, or neither of them is "true" - nice but slow:

proc bin2 {f1 f2 p} {
   set a [eval $f1 [list $p]]
   set b [eval $f2 [list $p]]
   expr {
       $a == "#000" ?
           $b == "#000" ? "green"
           : "yellow"
       : $b == "#000" ? "blue"
       : "black"
   }
}
#--------------------------------------- Pixel converters:
proc g2c {greylevel} {
   # convert 0..1 to #000000..#FFFFFF
   set hex [format %02X [expr {round($greylevel*255)}]]
   return #$hex$hex$hex
}
proc b2c {binpixel} {
   # 0 -> white, 1 -> black
   expr {$binpixel? "#000" : "#FFF"}
}
proc c2c {r g b} {
   # make Tk color name: {0 128 255} -> #0080FF
   format #%02X%02X%02X $r $g $b
}
proc bPaint {color0 color1 pixel} {
   # convert a binary pixel to one of two specified colors
   expr {$pixel=="#000"? $color0 : $color1}
}
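
A few quick sanity checks of the converters (return values worked out by hand):

g2c 0.5          ;# -> #808080
b2c 1            ;# -> #000
c2c 0 128 255    ;# -> #0080FF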

This painter colors a grayscale image in hues of the given color. It normalizes the given color by dividing by the corresponding values for "white", but is pretty slow too:

proc gPaint {color pixel} {
   set abspixel [lindex [rgb $pixel] 0]
   set rgb [rgb $color]
   set rgbw [rgb white]
   foreach var {r g b} in $rgb ref $rgbw {
       set $var [expr {round(double($abspixel)*$in/$ref/$ref*255.)}]
   }
   c2c $r $g $b
}

This proc caches the results of [winfo rgb] calls, because these are quite expensive, especially on remote X displays - rmax

proc rgb {color} {
   upvar "#0" rgb($color) rgb
   if {![info exists rgb]} {set rgb [winfo rgb . $color]}
   set rgb
}
#------------------------------ point -> point transformers
proc fromPolars p {
   foreach {r a} $p break
   list [expr {$r*cos($a)}] [expr {$r*sin($a)}]
}
proc toPolars p {
   foreach {x y} $p break
   # for Sun, we have to make sure atan2 gets no two 0's
   list [expr {hypot($x,$y)}] [expr {$x||$y? atan2($y,$x): 0}]
}
proc radInvert p {
   foreach {r a} [toPolars $p] break
   fromPolars [list [expr {$r? 1/$r: 9999999}] $a]
}
proc rippleRad {n s p} {
   foreach {r a} [toPolars $p] break
   fromPolars [list [expr {$r*(1.+$s*sin($n*$a))}] $a]
}
proc slice {n p} {
   foreach {r a} $p break
   list $r [expr {$a*$n/3.14159265359}]
}
proc rotate {angle p} {
   foreach {x y} $p break
   set x1 [expr {$x*cos(-$angle) - $y*sin(-$angle)}]
   set y1 [expr {$y*cos(-$angle) + $x*sin(-$angle)}]
   list $x1 $y1
}
proc swirl {radius p} {
   foreach {x y} $p break
   set angle [expr {hypot($x,$y)*6.283185306/$radius}]
   rotate $angle $p
}
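
A quick round-trip check of the polar transformers:

toPolars {3.0 4.0}                ;# -> 5.0 0.9272952180016122
fromPolars [toPolars {3.0 4.0}]   ;# -> back to (approximately) 3.0 4.0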

Now comes the demo program. It shows the predefined basic image operators, and some combinations, on a button bar. Click on one, have some patience, and the corresponding image will be displayed on the canvas to the right. You can also experiment with image operators in the entry widget at bottom - hit <Return> to try. The text of sample buttons is also copied to the entry widget, so you can play with the parameters, or rewrite it as you wish. Note that a well-formed funimj composition consists of:

  • the composition operator "o"
  • zero or more "painters" (color -> color)
  • one "drawer" (point -> color)
  • zero or more "transformers" (point -> point)

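For example, the following combines one of each kind - a painter (bPaint), a drawer (checker) and a transformer (swirl) - and expands to bPaint brown beige [checker [swirl 16 $x]]:

o {bPaint brown beige} checker {swirl 16}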

proc fim'show {c f} {
   $c delete all
   set ::try $f ;# prepare for editing
   set t0 [clock seconds]
   . config -cursor watch
   update ;# to make the cursor visible
   $c create image 0 0 -anchor nw -image [fim $f $::zoom]
   wm title . "$f: [expr {[clock seconds]-$t0}] seconds"
   . config -cursor {}
}
proc fim'try {c varName} {
   upvar #0 $varName var
   $c delete all
   if [catch {fim'show $c [eval $var]}] {
       $c create text 10 10 -anchor nw -text $::errorInfo
   }
}

Composed functions need only be mentioned once - that creates them - and they can later be picked up by info procs. The "o" looks nicely bullet-ish here...

o bRings
o cGrad
o checker
o gRings
o vstrip
o xPos
o {bPaint brown beige} checker
o checker {slice 10} toPolars
o checker {rotate 0.1}
o vstrip {swirl 1.5}
o checker {swirl 16}
o {fplot {$y + exp($x)}}
o checker radInvert
o gRings {rippleRad 8 0.3}
o xPos {swirl .75}
o gChecker
o {gPaint red} gRings
o {bin2 {radReg 7} udisk}
#----------------------------------------------- testing
frame .f2
set c [canvas .f2.c]
set e [entry .f2.e -bg white -textvar try]
bind $e <Return> [list fim'try $c ::try]
scale .f2.s -from 1 -to 100 -variable zoom -ori hori -width 6
#--------------------------------- button bar:
frame .f
set n 0
foreach imf [lsort [info procs "o *"]] {
   button .f.b[incr n] -text $imf -anchor w -pady 0 \
       -command [list fim'show $c $imf]
}
set ::zoom 25
eval pack [winfo children .f] -side top -fill x -ipady 0
eval pack [winfo children .f2] -side top -fill x
pack .f .f2 -side left -anchor n
bind . <Escape> {exec wish $argv0 &; exit} ;# dev helper
bind . ? {console show} ;# dev helper, Win/Mac only

TkPhotoLab

The following code can be used for experiments in image processing, including

  • convolutions (see below)
  • conversion from color to greylevel
  • conversion from greylevel to faux color
  • brightness and contrast modification

Tcl is not the fastest at heavy number-crunching, which is what going over many thousands of pixels amounts to, but I wouldn't consider C for a fun project ;) So take your time, or get a real CPU. At least you can watch the progress, as the target image is updated after every row.

[Image: TkPhotoLab.jpg - edge enhancement by the Laplace5 filter]

The demo UI shows two images: the original on the left, the processing result on the right. You can push the result back to the left side with Options/Accept. See the menus for what goodies I have supplied. But what most interested me were "convolutions", for which you can edit the matrix (fixed at 3x3 - slow enough..) and click "Apply" to run it over the input image. The "C" button resets the matrix to all zeroes.

Convolution is a technique where each target pixel is colored according to the sum of the products of a given matrix's elements with the corresponding source pixel and its neighbors. As an example, the convolution matrix

1 1 1
1 1 1
1 1 1

colors the pixel in the middle with the average of itself and its eight neighbors (after normalization, each element contributes 1/9), which blurs the picture, while the matrix

0 0 0
0 1 0
0 0 0

just reproduces the input picture unchanged. These two (the Laplace5 and Laplace9 filters from the menu below)

 0 -1  0        -1 -1 -1
-1  5 -1   or:  -1  9 -1
 0 -1  0        -1 -1 -1

enhance horizontal and vertical edges and make the image look crisper.
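
A worked example for a single pixel: the all-ones blur matrix is normalized so that each element contributes 1/9, and the new red value is then just the average of the nine red values in the neighborhood (green and blue are treated the same way):

# red values of a 3x3 neighborhood, row by row:
#     0  90 180
#    30 120 210
#    60 150 240
expr {(0+90+180 + 30+120+210 + 60+150+240)/9.0}   ;# -> 120.0

The convolute proc below does exactly this for all three channels. It normalizes the matrix first, and when the matrix sums to zero (as pure edge filters do) it adds a shift of 128, so the results center around mid-gray: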

proc convolute {inimg outimg matrix} {
   set w [image width  $inimg]
   set h [image height $inimg]
   set matrix [normalize $matrix]
   set shift  [expr {[matsum $matrix]==0? 128: 0}]
   set imat [photo2matrix $inimg]
   for {set i 1} {$i<$h-1} {incr i} {
       set row {}
       for {set j 1} {$j<$w-1} {incr j} {
          foreach var {rsum gsum bsum} {set $var 0.0}
          set y [expr {$i-1}]
          foreach k {0 1 2} {
             set x [expr {$j-1}]
             foreach l {0 1 2} {
                if {[set fac [lindex $matrix $k $l]]} {
                    foreach {r g b} [lindex $imat $y $x] {}
                    set rsum [expr {$rsum + $r * $fac}]
                    set gsum [expr {$gsum + $g * $fac}]
                    set bsum [expr {$bsum + $b * $fac}]
                }
                incr x
             }
             incr y
           }
           if {$shift} {
               set rsum [expr {$rsum + $shift}]
               set gsum [expr {$gsum + $shift}]
               set bsum [expr {$bsum + $shift}]
           }
           lappend row [rgb [clip $rsum] [clip $gsum] [clip $bsum]]
       }
       $outimg put [list $row] -to 1 $i
       update idletasks
   }
}
proc alias {name args} {eval [linsert $args 0 interp alias {} $name {}]}
alias rgb   format #%02x%02x%02x
proc lambda {argl body} {K [set n [info level 0]] [proc $n $argl $body]}
proc K      {a b} {set a}
proc clip   x {expr {$x>255? 255: $x<0? 0: int($x)}}
proc photo2matrix image {
   set w [image width  $image]
   set h [image height $image]
   set res {}
   for {set y 0} {$y<$h} {incr y} {
       set row {}
       for {set x 0} {$x<$w} {incr x} {
           lappend row [$image get $x $y]
       }
       lappend res $row
   }
   set res
}
proc normalize matrix {
    #-- make sure all matrix elements add up to 1.0
    set sum [matsum $matrix]
    if {$sum==0} {return $matrix} ;# no-op on zero sum
    set res {}
    foreach inrow $matrix {
        set row {}
        foreach el $inrow {lappend row [expr {1.0*$el/$sum}]}
        lappend res $row
    }
    set res
}
proc matsum matrix {expr [join [join $matrix] +]}
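
A usage sketch for convolute (assuming the procs above are loaded, the Img package is available for JPEG, and a file named input.jpg - a placeholder name - exists):

package require Tk
package require Img
set src [image create photo -file input.jpg]
set dst [image create photo -width [image width $src] -height [image height $src]]
convolute $src $dst {{0 -1 0} {-1 5 -1} {0 -1 0}} ;# Laplace5 edge enhancement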

The following routines could also be generified into one (a sketch of such a generification follows them below):

proc color2gray image {
   set w [image width  $image]
   set h [image height $image]
   for {set i 0} {$i<$h} {incr i} {
       set row {}
       for {set j 0} {$j<$w} {incr j} {
           foreach {r g b} [$image get $j $i] break
           set y [expr {int(0.299*$r + 0.587*$g + 0.114*$b)}]
           lappend row [rgb $y $y $y]
       }
       $image put [list $row] -to 0 $i
       update idletasks
   }
}
proc color2gray2 image {
   set i -1
   foreach inrow [photo2matrix $image] {
       set row {}
       foreach pixel $inrow {
           foreach {r g b} $pixel break
           set y [expr {int(($r + $g + $b)/3.)}]
           lappend row [rgb $y $y $y]
       }
       $image put [list $row] -to 0 [incr i]
       update idletasks
   }
}
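
Here is a sketch of the generification hinted at above (mapPixels is a hypothetical name, not part of the original code): a single driver that maps a {r g b} -> color function over every pixel of an image.

proc mapPixels {f image} {
   set i -1
   foreach inrow [photo2matrix $image] {
       set row {}
       foreach pixel $inrow {lappend row [$f $pixel]}
       $image put [list $row] -to 0 [incr i]
       update idletasks
   }
}
# color2gray2, for instance, then reduces to:
#   mapPixels [lambda p {
#       foreach {r g b} $p break
#       set y [expr {int(($r+$g+$b)/3.)}]
#       rgb $y $y $y
#   }] $image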

An experiment in classifying graylevels into unreal colors:

proc gray2color image {
   set i -1
   set colors {black darkblue blue purple red orange yellow white}
   set n [llength $colors]
   foreach inrow [photo2matrix $image] {
       set row {}
       foreach pixel $inrow {
           set index [expr {[lindex $pixel 0]*$n/256}]
           lappend row [lindex $colors $index]
       }
       $image put [list $row] -to 0 [incr i]
       update idletasks
   }
}
proc grayWedge image {
   $image blank
   for {set i 0} {$i<256} {incr i} {
       $image put [rgb $i $i $i] -to $i 0 [expr {$i+1}] 127
   }
}

A number of algorithms are very similar, differing only in a few commands at their center. Hence I made them generic: they take the name of a function that is applied to each of a pixel's red, green and blue values (or, in the two-image case, to a pair of such values). They are instantiated by an alias that supplies the function as a lambda:

proc generic_1 {f target source} {
   set w [image width  $source]
   set h [image height $source]
   for {set i 0} {$i<$h} {incr i} {
       set row {}
       for {set j 0} {$j<$w} {incr j} {
           foreach {r g b} [$source get $j $i] break
           lappend row [rgb [$f $r] [$f $g] [$f $b]]
       }
       $target put [list $row] -to 0 $i
       update idletasks
   }
}
alias invert    generic_1 [lambda x {expr {255-$x}}]
alias contrast+ generic_1 [lambda x {clip [expr {128+($x-128)*1.25}]}]
alias contrast- generic_1 [lambda x {clip [expr {128+($x-128)*0.8}]}]
proc generic_2 {f target with} {
   set w [image width  $target]
   set h [image height $target]
   for {set i 0} {$i<$h} {incr i} {
       set row {}
       for {set j 0} {$j<$w} {incr j} {
           foreach {r g b} [$target get $j $i] break
           foreach {r1 g1 b1} [$with get $j $i] break
           lappend row [rgb [$f $r $r1] [$f $g $g1] [$f $b $b1]]
       }
       $target put [list $row] -to 0 $i
       update idletasks
   }
}
alias blend      generic_2 [lambda {a b} {expr {($a+$b)/2}}]
alias difference generic_2 [lambda {a b} {expr {255-abs($a-$b)}}]
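
Further point operations can be added in the same one-line style; for instance (hypothetical names, not part of the original set):

alias brighten generic_1 [lambda x {clip [expr {$x + 32}]}]
alias darken   generic_1 [lambda x {clip [expr {$x - 32}]}]
# used like the others, e.g.: brighten $im2 $im1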

A histogram counts how often each color value occurs in the current image, separately for red, green and blue. For graylevel images the displayed "curves" should overlap exactly, so you see only the blue dots, which are drawn last.

proc histogram {image {channel 0}} {
   set w [image width  $image]
   set h [image height $image]
   for {set i 0} {$i<256} {incr i} {set hist($i) 0}
   for {set i 0} {$i<$h} {incr i} {
       for {set j 0} {$j<$w} {incr j} {
           incr hist([lindex [$image get $j $i] $channel])
       }
   }
   set res {}
   for {set i 0} {$i<256} {incr i} {lappend res $hist($i)}
   set res
}
proc drawHistogram {target input} {
   $target blank
   set a [expr {6000./([image height $input]*[image width $input])}]
   foreach color {red green blue} channel {0 1 2} {
       set i -1
       foreach val [histogram $input $channel] {
           $target put $color -to [incr i] \
               [clip [expr {int(128-$val*$a)}]]
       }
       update idletasks
   }
}
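
A usage sketch (assuming an image has been loaded into $im1, as in the demo below): render its histogram into a fresh 256x128 photo image.

set histImg [image create photo -width 256 -height 128]
drawHistogram $histImg $im1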

Demo UI:

if {[file tail [info script]] eq [file tail $argv0]} {
   package require Img ;# for JPEG etc.
   proc setFilter {w matrix} {
       $w delete 1.0 end
       foreach row $matrix {$w insert end [join $row \t]\n}
       set ::info "Click 'Apply' to use this filter"
   }
   label .title -text TkPhotoLab -font {Helvetica 14 italic} -fg blue
   label .( -text ( -font {Courier 32}
   set txt [text .t -width 20 -height 3]
   setFilter .t {{0 -1 0} {-1 5 -1} {0 -1 0}}
   label .) -text ) -font {Courier 32}
   button .c -text C -command {setFilter .t {{0 0 0} {0 0 0} {0 0 0}}}
   grid .title .( .t .) .c -sticky news
   button .apply -text Apply -command applyConv
   grid x ^ ^ ^ .apply -sticky ew
   grid [label .0 -textvar info] - - -sticky w
   grid [label .1] - [label .2] - - -sticky new
   proc loadImg { {fn ""}} {
       if {$fn==""} {set fn [tk_getOpenFile]}
       if {$fn != ""} {
           cd [file dirname [file join [pwd] $fn]]
           set ::im1 [image create photo -file $fn]
           .1 config -image $::im1
           set ::im2 [image create photo]
           .2 config -image $::im2
           $::im2 copy $::im1 -shrink
           set ::info "Loaded image 1 from $fn"
       }
   }
   proc saveImg { {fn ""}} {
       if {$fn==""} {set fn [tk_getSaveFile]}
       if {$fn != ""} {
           $::im2 write $fn -format JPEG
           set ::info "Saved image 2 to $fn"
       }
   }
   proc applyConv {} {
       set ::info "Convolution running, have patience..."
       set t0 [clock clicks -milliseconds]
       convolute $::im1 $::im2 [split [$::txt get 1.0 end] \n]
        set dt [expr {([clock clicks -milliseconds]-$t0)/1000.}]
       set ::info "Ready after $dt sec"
   }

A little wrapper for simplified menu creation - see below for its use:

   proc m+ {head name {cmd ""}} {
       if {![winfo exists .m.m$head]} {
           .m add cascade -label $head -menu [menu .m.m$head -tearoff 0]
       }
       if [regexp ^-+$ $name] {
           .m.m$head add separator
       } else {.m.m$head add command -label $name -comm $cmd}
   }
   . config -menu [menu .m]
   m+ File Open.. loadImg
   m+ File Save.. saveImg
   m+ File ---
   m+ File Exit   exit
   m+ Edit Blend      {blend $im2 $im1}
   m+ Edit Difference {difference $im2 $im1}
   m+ Edit ---
   m+ Edit Negative   {invert     $im2 $im1}
   m+ Edit Contrast+  {contrast+  $im2 $im1}
   m+ Edit Contrast-  {contrast-  $im2 $im1}
   m+ Edit ---
   m+ Edit Graylevel  {$im2 copy $im1 -shrink; color2gray  $im2}
   m+ Edit Graylevel2 {$im2 copy $im1 -shrink; color2gray2 $im2}
   m+ Edit "Add Noise" {
       generic_1 [lambda x {expr {rand()<.01? int(rand()*255):$x}}] $im2 $im1
   }
   m+ Edit gray2color {$im2 copy $im1 -shrink; gray2color $im2}
   m+ Edit Octary {generic_1 [lambda x {expr {$x>127? 255:0}}] $im2 $im1}
   m+ Edit ---
   m+ Edit HoriMirror {$im2 copy $im1 -shrink -subsample -1 1}
   m+ Edit VertMirror {$im2 copy $im1 -shrink -subsample 1 -1}
   m+ Edit "Upside down" {$im2 copy $im1 -shrink -subsample -1 -1}
   m+ Edit ---
   m+ Edit "Zoom x 2" {$im2 copy $im1 -shrink -zoom 2}
   m+ Edit "Zoom x 3" {$im2 copy $im1 -shrink -zoom 3}
   m+ Options "Accept (1<-2)" {$im1 copy $im2 -shrink}
   m+ Options ---
   m+ Options "Gray wedge" {grayWedge $im2}
   m+ Options Histogram  {drawHistogram $im2 $im1}
   m+ Filter Clear {setFilter .t {{0 0 0} {0 0 0} {0 0 0}}}
   m+ Filter ---
   m+ Filter Blur0  {setFilter .t {{1 1 1} {1 0 1} {1 1 1}}}
   m+ Filter Blur1  {setFilter .t {{1 1 1} {1 1 1} {1 1 1}}}
   m+ Filter Gauss2 {setFilter .t {{1 2 1} {2 4 2} {1 2 1}}}
   m+ Filter ---
   m+ Filter Laplace5 {setFilter .t {{0 -1 0} {-1 5 -1} {0 -1 0}}}
   m+ Filter Laplace9 {setFilter .t {{-1 -1 -1} {-1 9 -1} {-1 -1 -1}}}
   m+ Filter LaplaceX {setFilter .t {{1 -2 1} {-2 5 -2} {1 -2 1}}}
   m+ Filter ---
   m+ Filter Emboss   {setFilter .t {{2 0 0} {0 -1 0} {0 0 -1}}}
   m+ Filter HoriEdge {setFilter .t {{-1 -1 -1} {0 0 0} {1 1 1}}}
   m+ Filter VertEdge {setFilter .t {{-1 0 1} {-1 0 1} {-1 0 1}}}
   m+ Filter SobelH   {setFilter .t {{1 2 1} {0 0 0} {-1 -2 -1}}}
   m+ Filter SobelV   {setFilter .t {{1 0 -1} {2 0 -2} {1 0 -1}}}
   bind . <Escape> {exec wish $argv0 &; exit}
   bind . <F1> {console show}
   loadImg aaa.jpg
}

Appendix

Resources

For Web development, there are:

  • pure-Tcl Web servers like Tclhttpd, which can be embedded in your Tcl-enabled applications to provide Web-based interfaces,
  • Tcl-based Web servers like AOLServer, which are much faster than their pure-Tcl cousins and able to support a heavy load, and
  • Tcl Web server modules (see Apache Tcl for examples) that enable Tcl Web applications with bog-standard Web servers.

ActiveState Community License

This is the license of the "Batteries-included" ActiveTcl distribution. Note that you cannot redistribute ActiveTcl "outside your organization" without written permission. The part between "AGREEMENT" and "Definitions:" is the more free, BSD-style license of Tcl and Tk itself.

Preamble:

The intent of this document is to state the conditions under which the Package (ActiveTcl) may be copied and distributed, such that ActiveState maintains control over the development and distribution of the Package, while allowing the users of the Package to use the Package in a variety of ways.

The Package may contain software covered by other licenses:

TCL LICENSE AGREEMENT

This software is copyrighted by the Regents of the University of California, Sun Microsystems, Inc., Scriptics Corporation, and other parties. The following terms apply to all files associated with the software unless explicitly disclaimed in individual files.

The authors hereby grant permission to use, copy, modify, distribute, and license this software and its documentation for any purpose, provided that existing copyright notices are retained in all copies and that this notice is included verbatim in any distributions. No written agreement, license, or royalty fee is required for any of the authorized uses. Modifications to this software may be copyrighted by their authors and need not follow the licensing terms described here, provided that the new terms are clearly indicated on the first page of each file where they apply.

IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

GOVERNMENT USE: If you are acquiring this software on behalf of the U.S. government, the Government shall have only "Restricted Rights" in the software and related documentation as defined in the Federal Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2). If you are acquiring the software on behalf of the Department of Defense, the software shall be classified as "Commercial Computer Software" and the Government shall have only "Restricted Rights" as defined in Clause 252.227-7013 (c) (1) of DFARs. Notwithstanding the foregoing, the authors grant the U.S. Government and others acting in its behalf permission to use and distribute the software in accordance with the terms specified in this license.

Definitions:

"ActiveState" refers to ActiveState Corp., the Copyright Holder of the Package.

"Package" refers to those files, including, but not limited to, source code, binary executables, images, and scripts, which are distributed by the Copyright Holder.

"You" is you, if you are thinking about copying or distributing this Package.

Terms:

1. You may use this Package for commercial or non-commercial purposes without charge.

2. You may make and give away verbatim copies of this Package for personal use, or for use within your organization, provided that you duplicate all of the original copyright notices and associated disclaimers. You may not distribute copies of this Package, or copies of packages derived from this Package, to others outside your organization without specific prior written permission from ActiveState (although you are encouraged to direct them to sources from which they may obtain it for themselves).

3. You may apply bug fixes, portability fixes, and other modifications derived from ActiveState. A Package modified in such a way shall still be covered by the terms of this license.

4. ActiveState's name and trademarks may not be used to endorse or promote packages derived from this Package without specific prior written permission from ActiveState.

5. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

ActiveState Community License Copyright (C) 2001-2003 ActiveState Corp. All rights reserved.