Tcl Programming/Examples

Most of these example scripts first appeared in the Tclers' Wiki http://wiki.tcl.tk . The author (Richard Suchenwirth) declares them to be fully in the public domain. The following scripts are plain Tcl, they don't use the Tk GUI toolkit (there's a separate chapter for those).

Sets as lists

Tcl's lists are well suited to represent sets. Here's typical set operations. If you use the tiny testing framework explained earlier, the e.g. lines make the self-test; otherwise they just illustrate how the operations should work.

proc set'contains {set el} {expr {[lsearch -exact $set $el]>=0}}

e.g. {set'contains {A B C} A} -> 1
e.g. {set'contains {A B C} D} -> 0

proc set'add {_set args} {
   upvar 1 $_set set
   foreach el $args {
       if {![set'contains $set $el]} {lappend set $el}
   }
   set set
}

set example {1 2 3}
e.g. {set'add example 4} -> {1 2 3 4}
e.g. {set'add example 4} -> {1 2 3 4}

proc set'remove {_set args} {
   upvar 1 $_set set
   foreach el $args {
       set pos [lsearch -exact $set $el]
       set set [lreplace $set $pos $pos]
   }
   set set
}

e.g. {set'remove example 3} -> {1 2 4}

proc set'intersection {a b} {
   foreach el $a {set arr($el) ""}
   set res {}
   foreach el $b {if {[info exists arr($el)]} {lappend res $el}}
   set res

e.g. {set'intersection {1 2 3 4} {2 4 6 8}} -> {2 4}

proc set'union {a b} {
   foreach el $a {set arr($el) ""}
   foreach el $b {set arr($el) ""}
   lsort [array names arr]
}

e.g. {set'union {1 3 5 7} {2 4 6 8}} -> {1 2 3 4 5 6 7 8}

proc set'difference {a b} {
   eval set'remove a $b
}

e.g. {set'difference {1 2 3 4 5} {2 4 6}} -> {1 3 5}

Hex-dumping a file

The following example code opens a file, configures it to binary translation (i.e. line-ends \r\n are not standardized to \n as usual in C), and prints as many lines as needed which each contain 16 bytes in hexadecimal notation, plus, where possible, the ASCII character.

proc file'hexdump filename {
   set fp [open $filename]
   fconfigure $fp -translation binary
   set n 0
   while {![eof $fp]} {
       set bytes [read $fp 16]
       regsub -all {[^\x20-\xfe]} $bytes . ascii
       puts [format "%04X %-48s %-16s" $n [hexdump $bytes] $ascii]
       incr n 16
   }
   close $fp
}

proc hexdump string {
   binary scan $string H* hex
   regexp -all -inline .. $hex
}

The "main routine" is a single line that dumps all files given on the command line:

foreach file $argv {file'hexdump $file}

Sample output, the script applied to itself:

...> tclsh hexdump.tcl hexdump.tcl
0000 0d 0a 20 70 72 6f 63 20 66 69 6c 65 27 68 65 78  .. proc file'hex
0010 64 75 6d 70 20 66 69 6c 65 6e 61 6d 65 20 7b 0d  dump filename {.
0020 0a 20 20 20 20 73 65 74 20 66 70 20 5b 6f 70 65  .    set fp [ope
0030 6e 20 24 66 69 6c 65 6e 61 6d 65 5d 0d 0a 20 20  n $filename]..
...

Roman numerals

Roman numerals are an additive (and partially subtractive) system with the following letter values:

I=1 V=5 X=10 L=50 C=100 D=500 M=1000; MCMXCIX = 1999

Here's some Tcl routines for dealing with Roman numerals.

Sorting roman numerals: I,V,X already come in the right order; for the others we have to introduce temporary collation transformations, which we'll undo right after sorting:

proc roman:sort list {
   set map {IX VIIII L Y XC YXXXX C Z D {\^} ZM {\^ZZZZ} M _}
   foreach {from to} $map {
       regsub -all $from $list $to list
   }
   set list [lsort $list]
   foreach {from to} [lrevert $map] {
       regsub -all $from $list $to list
   }
   set list
}

Roman numerals from integer:

proc roman:numeral {i} {
       set res ""
       foreach {value roman} {
           1000 M 900 CM 500 D 400 CD 100 C 90 XC 50 L 40 XL 
           10 X 9 IX 5 V 4 IV 1 I} {
               while {$i>=$value} {
                       append res $roman
                       incr i -$value
               }
       }
       set res
}

Roman numerals parsed into integer:

proc roman:get {s} {
       array set r_v {M 1000 D 500 C 100 L 50 X 10 V 5 I 1}
       set last 99999; set res 0
       foreach i [split [string toupper $s] ""] {
               if [catch {set val $r_v($i)}] {
                   error "un-Roman digit $i in $s"
               }
               incr res $val
               if {$val>$last} {incr res [expr -2*$last]}
               set last $val
       }
       set res
}

Custom control structures

As "control structures" are really nothing special in Tcl, just a set of commands, it is easier than in most other languages to create one's own. For instance, if you would like to simplify the for loop

for {set i 0} {$i < $max} {incr i} {...}

for the typical simple cases so you can write instead

loop i 0 $max {...}

here is an implementation that even returns a list of the results of each iteration:

proc loop {_var from to body} {
   upvar 1 $_var var
   set res {}
   for {set var $from} {$var < $to} {incr var} {lappend res [uplevel 1 $body]}
   return $res
 }

using this, a string reverse function can be had as a one-liner:

proc sreverse {str} {
   join [loop i 0 [string length $str] {string index $str end-$i}] ""
}

Range-aware switch

Another example is the following range-aware switch variation. A range (numeric or strings) can be given as from..to, and the associated scriptlet gets executed if the tested value lies inside that range.

Like in switch, fall-through collapsing of several cases is indicated by "-", and "default" as final condition fires if none else did. Different from switch, numbers are compared by numeric value, no matter whether given as decimal, octal or hex.

proc rswitch {value body} {
  set go 0
  foreach {cond script} $body {
     if {[regexp {(.+)\.\.(.+)} $cond -> from to]} {
          if {$value >= $from && $value <= $to} {incr go}
     } else {
         if {$value == $cond} {incr go}
     }
     if {$go && $script ne "-"} { #(2)
         uplevel 1 $script
         break
     }
  }
  if {$cond eq "default" && !$go} {uplevel 1 $script} ;#(1)
}

Testing:

% foreach i {A K c z 0 7} {
     puts $i
     rswitch $i {
        A..Z {puts upper} 
        a..z {puts lower} 
        0..9 {puts digit}
     }
}
A
upper
K
upper
c
lower
z
lower
0
digit
7
digit
% rswitch 0x2A {42 {puts magic} default {puts df}}
magic

The K combinator

A very simple control structure (one might also call it a result dispatcher) is the K combinator, which is almost terribly simple:

proc K {a b} {return $a}

It can be used in all situations where you want to deliver a result that is not the last. For instance, reading a file in one go:

proc readfile filename {
   set f [open $filename]
   set data [read $f]
   close $f
   return $data
}

can be simplified, without need for the data variable, to:

proc readfile filename {
   K [read [set f [open $filename]]] [close $f]
}

Another example, popping a stack:

proc pop _stack {
   upvar 1 $_stack stack
   K [lindex $stack end] [set stack [lrange $stack 0 end-1]]
}

This is in some ways similar to LISP's PROG1 construct: evaluate the contained expressions, and return the result of the first one.

Rational numbers

Rational numbers, a.k.a. fractions, can be thought of as pairs of integers {numerator denominator}, such that their "real" numerical value is numerator/denominator (and not in integer nor "double" division!). They can be more precise than any "float" or "double" numbers on computers, as those can't exactly represent any fractions whose denominator isn't a power of 2 — consider ¹⁄₃ which can not at any precision be exactly represented as floating-point number to base 2, nor as decimal fraction (base 10), even if bignum.

An obvious string representation of a rational is of course "n/d". The following "constructor" does that, plus it normalizes the signs, reduces to lowest terms, and returns just the integer n if d==1:

proc rat {n d} {
  if {!$d} {error "denominator can't be 0"}
  if {$d<0} {set n [- $n]; set d [- $d]}
  set g [gcd $n $d]
  set n [/ $n $g]
  set d [/ $d $g]
  expr {$d==1? $n: "$n/$d" }
}

Conversely, this "deconstructor" splits zero or more rational or integer strings into num and den variables, such that [ratsplit 1/3 a b] assigns 1 to a and 3 to b:

proc ratsplit args {
   foreach {r _n _d} $args {
      upvar 1 $_n n  $_d d
      foreach {n d} [split $r /] break
      if {$d eq ""} {set d 1}
   }
}

#-- Four-species math on "rats":
proc rat+ {r s} {
   ratsplit $r a b $s c d
   rat [+ [* $a $d] [* $c $b]] [* $b $d]
}
proc rat- {r s} {
   ratsplit $r a b $s c d
   rat [- [* $a $d] [* $c $b]] [* $b $d]
}
proc rat* {r s} {
   ratsplit $r a b $s c d
   rat [* $a $c] [* $b $d]
}
proc rat/ {r s} {
   ratsplit $r a b $s c d
   rat [* $a $d] [* $b $c]
}

Arithmetical helper functions can be wrapped with func if they only consist of one call of expr:

proc func {name argl body} {proc $name $argl [list expr $body]}

#-- Greatest common denominator:
func gcd {u v} {$u? [gcd [% $v $u] $u]: abs($v)}

#-- Binary expr operators exported:
foreach op {+ * / %} {func $op {a b} \$a$op\$b}

#-- "-" can have 1 or 2 operands:
func - {a {b ""}} {$b eq ""? -$a: $a-$b}

#-- a little tester reports the unexpected:
proc ? {cmd expected} {
   catch {uplevel 1 $cmd} res
   if {$res ne $expected} {puts "$cmd -> $res, expected $expected"}
}

#-- The test suite should silently pass when this file is sourced:
? {rat 42 6} 7
? {rat 1 -2} -1/2
? {rat -1 -2} 1/2
? {rat 1 0} "denominator can't be 0"
? {rat+ 1/3 1/3} 2/3
? {rat+ 1/2 1/2} 1
? {rat+ 1/2 1/3} 5/6
? {rat+ 1 1/2}    3/2
? {rat- 1/2 1/8} 3/8
? {rat- 1/2 1/-8} 5/8
? {rat- 1/7 1/7} 0
? {rat* 1/2 1/2} 1/4
? {rat/ 1/4 1/4} 1
? {rat/ 4 -6} -2/3

Docstrings

Languages like Lisp and Python have the docstring feature, where a string in the beginning of a function can be retrieved for on-line (or printed) documentation. Tcl doesn't have this mechanism built-in (and it would be hard to do it exactly the same way, because everything is a string), but a similar mechanism can easily be adopted, and it doesn't look bad in comparison:

Common Lisp: (documentation 'foo 'function)
Python: foo.__doc__
Tcl: docstring foo

If the docstring is written in comments at the top of a proc body, it is easy to parse it out. In addition, for all procs, even without docstring, you get the "signature" (proc name and arguments with defaults). The code below also serves as usage example: }

proc docstring procname {
   # reports a proc's args and leading comments.
   # Multiple documentation lines are allowed.
   set res "{usage: $procname [uplevel 1 [list info args $procname]]}"
   # This comment should not appear in the docstring
   foreach line [split [uplevel 1 [list info body $procname]] \n] {
       if {[string trim $line] eq ""} continue
       if ![regexp {\s*#(.+)} $line -> line] break
       lappend res [string trim $line]
   }
   join $res \n
}
proc args procname {
   # Signature of a proc: arguments with defaults
   set res ""
   foreach a [info args $procname] {
       if [info default $procname $a default] {
           lappend a $default
       }
       lappend res $a
   }
   set res
}

Testing:

% docstring docstring
usage: docstring procname
reports a proc's args and leading comments.
Multiple documentation lines are allowed.

% docstring args
usage: args procname
Signature of a proc: arguments with defaults

Factorial

Factorial (n!) is a popular function with super-exponential growth. Mathematically put,

  0! = 1
  n! = n (n-1)! if n >0, else undefined

In Tcl, we can have it pretty similarly:

proc fact n {expr {$n<2? 1: $n * [fact [incr n -1]]}}

But this very soon crosses the limits of integers, giving wrong results.

A math book showed me the Stirling approximation to n! for large n (at Tcl's precisions, "large" is > 20 ...), so I built that in:

proc fact n {expr {
    $n<2? 1:
    $n>20? pow($n,$n)*exp(-$n)*sqrt(2*acos(-1)*$n):
           wide($n)*[fact [incr n -1]]}
}

Just in case somebody needs approximated large factorials... But for n>143 we reach the domain limit of floating point numbers. In fact, the float limit is at n>170, so an intermediate result in the Stirling formula must have busted at 144. For such few values it is most efficient to just look them up in a pre-built table, as Tcllib's math::factorial does.

How big is A4?

Letter and Legal paper formats are popular in the US and other places. In Europe and elsewhere, the most widely used paper format is called A4. To find out how big a paper format is, one can measure an instance with a ruler, or look up appropriate documentation. The A formats can also be deduced from the following axioms:

A0 has an area of one square meter
A(n) has half the area of A(n-1)
The ratio between the longer and the shorter side of an A format is constant

How much this ratio is, can easily be computed if we consider that A(n) is produced from A(n-1) by halving it parallel to the shorter side, so

2a : b = b : a, 
2 a2 = b2, 
b=sqrt(2) a, hence 
b : a = sqrt(2) : 1

So here is my Tcl implementation, which returns a list of height and width in centimeters (10000 cm² = 1 m²) with two fractional digits, which delivers a sufficient precision of 1/10 mm: }

proc paperA n {
   set w [expr {sqrt(10000/(pow(2,$n) * sqrt(2)))}]
   set h [expr {$w * sqrt(2)}]
   format "%.2f %.2f" $h $w
}
% paperA 4
29.73 21.02

Bit vectors

Here is a routine for querying or setting single bits in vectors, where bits are addressed by non-negative integers. Implementation is as a "little-endian" list of integers, where bits 0..31 are in the first list element, 32..63 in the second, etc.

Usage: bit varName position ?bitval?

If bitval is given, sets the bit at numeric position position to 1 if bitval != 0, else to 0; in any case returns the bit value at specified position. If variable varName does not exist in caller's scope, it will be created; if it is not long enough, it will be extended to hold at least $position+1 bits, e.g. bit foo 32 will turn foo into a list of two integers, if it was only one before. All bits are initialized to 0.

proc bit {varName pos {bitval {}}} {
   upvar 1 $varName var
   if {![info exist var]} {set var 0}
   set element [expr {$pos/32}]
   while {$element >= [llength $var]} {lappend var 0}
   set bitpos [expr {1 << $pos%32}]
   set word [lindex $var $element]
   if {$bitval != ""} {
       if {$bitval} {
           set word [expr {$word | $bitpos}]
       } else {
           set word [expr {$word & ~$bitpos}]
       }
       lset var $element $word
   }
   expr {($word & $bitpos) != 0}
}

#---------------------- now testing...
if {[file tail [info script]] == [file tail $argv0]} {
   foreach {test      expected} {
       {bit foo 5 1}  1
       {set foo}      32
       {bit foo 32 1} {32 1}
   } {
       catch {eval $test} res
       puts $test:$res/$expected
   }
}

This may be used for Boolean properties of numerically indexed sets of items. Example: An existence map of ZIP codes between 00000 and 99999 can be kept in a list of 3125 integers (where each element requires about 15 bytes overall), while implementing the map as an array would take 100000 * 42 bytes in worst case, but still more than a bit vector if the population isn't extremely sparse — in that case, a list of 1-bit positions, retrieved with lsearch, might be more efficient in memory usage. Runtime of bit vector accesses is constant, except when a vector has to be extended to much larger length.

Bit vectors can also be used to indicate set membership (set operations would run faster if processing 32 bits on one go with bitwise operators (&, |, ~, ^)) — or pixels in a binary imary image, where each row could be implemented by a bitvector.

Here's a routine that returns the numeric indices of all set bits in a bit vector:

proc bits bitvec {
   set res {}
   set pos 0
   foreach word $bitvec {
       for {set i 0} {$i<32} {incr i} {
           if {$word & 1<<$i} {lappend res $pos}
           incr pos
       }
   }
   set res
}
% bit foo 47 1
1
% bit foo 11 1
1
% set foo
2048 32768
% bits $foo
11 47

Sieve of Erastothenes: The following procedure exercises the bit vector functions by letting bits represent integers, and unsetting all that are divisible. The numbers of the bits finally still set are supposed to be primes, and returned:

proc sieve max {
   set maxroot [expr {sqrt($max)}]
   set primes [string repeat " 0xFFFFFFFF" [expr {($max+31)/32}]]
   bit primes 0 0; bit primes 1 0
   for {set i [expr $max+1]} {$i<=(($max+31)/32)*32} {incr i} {
       bit primes $i 0 ;# mask out excess bits
   }
   for {set i 2} {$i<=$maxroot} {incr i} {
      if {[bit primes $i]} {
          for {set j [expr $i<<1]} {$j<=$max} {incr j $i} {
              bit primes $j 0
          }
      }
   }
   bits $primes
}
% time {set res [sieve 10000]}
797000 microseconds per iteration

Here's code to count the number of 1-bits in a bit vector, represented as an integer list. It does so by adding the values of the hex digits:

proc bitcount intlist {
   array set bits {
      0 0  1 1  2 1  3 2  4 1  5 2  6 2  7 3
      8 1  9 2  a 2  b 3  c 2  d 3  e 3  f 4
   }
   set sum 0
   foreach int $intlist {
      foreach nybble [split [format %x $int] ""] {
         incr sum $bits($nybble)
      }
   }
   set sum
}

Stacks and queues

Stacks and queues are containers for data objects with typical access methods:

push: add one object to the container
pop: retrieve and remove one object from the container

In Tcl it is easiest to implement stacks and queues with lists, and the push method is most naturally lappend, so we only have to code a single generic line for all stacks and queues:

interp alias {} push {} lappend

It is pop operations in which stacks, queues, and priority queues differ:

in a stack, the most recently pushed object is retrieved and removed (last in first out, LIFO)
in a (normal) queue, it is the least recently pushed object (first in first out, FIFO)
in a priority queue, the object with the highest priority comes first.

Priority (a number) has to be assigned at pushing time — by pushing a list of two elements, the item itself and the priority, e.g..

push toDo [list "go shopping" 2]
push toDo {"answer mail" 3}
push toDo {"Tcl coding" 1}  ;# most important thing to do

In a frequent parlage, priority 1 is the "highest", and the number increases for "lower" priorities — but you could push in an item with 0 for "ultrahigh" ;-) Popping a stack can be done like this:

proc pop name {
   upvar 1 $name stack
   set res [lindex $stack end]
   set stack [lrange $stack 0 end-1]
   set res
}

Popping a queue is similarly structured, but with so different details that I found no convenient way to factor out things:

proc qpop name {
   upvar 1 $name queue
   set res [lindex $queue 0]
   set queue [lrange $queue 1 end]
   set res
}

Popping a priority queue requires sorting out which item has highest priority. Sorting can be done when pushing, or when popping, and since our push is so nicely generic I prefer the second choice (as the number of pushs and pops should be about equal, it does not really matter). Tcl's lsort is stable, so items with equal priority will remain in the order in which they were queued:

proc pqpop name {
   upvar 1 $name queue
   set queue [lsort -real -index 1 $queue]
   qpop queue ;# fall back to standard queue, now that it's sorted
}

A practical application is e.g. in state space searching, where the kind of container of the to-do list determines the strategy:

stack is depth-first
(normal) queue is breadth-first
priority queue is any of the more clever ways: A*, Greedy, ...

Recent-use lists: A variation that can be used both in a stack or queue fashion is a list of values in order of their last use (which may come handy in an editor to display the last edited files, for instance). Here, pushing has to be done by dedicated code because a previous instance would have to be removed:

proc rupush {listName value} {
     upvar 1 $listName list
     if {![info exist list]} {set list {}}
     set pos [lsearch $list $value]
     set list [lreplace $list $pos $pos]
     lappend list $value
}
% rupush tmp hello
hello
% rupush tmp world
hello world
% rupush tmp again
hello world again
% rupush tmp world
hello again world

The first element is the least recently, the last the most recently used. Elements are not removed by the popping, but (if necessary) when re-pushing. (One might truncate the list at front if it gets too long).

Functions

Functions in Tcl are typically written with the proc command. But I notice more and more that, on my way to functional programming, my proc bodies are a single call to expr which does all the rest (often with the powerful x?y:z operator). So what about a thin abstraction (wrapper) around this recurring pattern?

proc func {name argl body} {proc $name $argl [list expr $body]}

(I might have called it fun as well... it sure is.) That's all. A collateral advantage is that all expressions are braced, without me having to care. But to not make the page look so empty, here's some examples for func uses:

func fac n     {$n<2? 1: $n*[fac [incr n -1]]}
func gcd {u v} {$u? [gcd [expr $v%$u] $u]: $v}
func min {a b} {$a<$b? $a: $b}
func sgn x     {($x>0)-($x<0)} ;# courtesy rmax

Pity we have to make expr explicit again, in nested calls like in gcd ... But func isn't limited to math functions (which, especially when recursive, come out nice), but for expr uses in testing predicates as well:

func atomar list          {[lindex $list 0] eq $list}
func empty  list          {[llength $list] == 0}
func in    {list element} {[lsearch -exact $list $element] >= 0}
func limit {x min max}    {$x<$min? $min: $x>$max? $max: $x}
func ladd  {list e}       {[in $list $e]? $list: [lappend list $e]}

Exposing expr binary arithmetic operators as Tcl commands goes quite easy too:

foreach op {+ * / %} {func $op {a b} "\$a $op \$b"}

For "-", we distinguish unary and binary form:

func - {a {b ""}} {$b eq ""? -$a: $a-$b}

Having the modulo operator exposed, gcd now looks nicer:

func gcd {u v} {$u? [gcd [% $v $u] $u]: abs($v)}

For unary not I prefer that name to "!", as it might also stand for factorial — and see the shortest function body I ever wrote :^) :

func not x {!$x}

Without big mention, functions implemented by recursion have a pattern for which func is well suited (see fac and gcd above). Another example is this integer range generator (starts from 1, and is inclusive, so [iota1 5] == {1 2 3 4 5}):

func iota1 n {$n == 1? 1: [concat [iota1 [- $n 1]] $n]}

Experiments with Boolean functions

"NAND is not AND." Here are some Tcl codelets to demonstrate how all Boolean operations can be expressed in terms of the single NAND operator, which returns true if not both his two inputs are true (NOR would have done equally well). We have Boolean operators in expr, so here goes:

proc nand {A B} {expr {!($A && $B)}}

The only unary operator NOT can be written in terms of nand:

proc not {A} {nand $A $A}

.. and everything else can be built from them too:

proc and {A B} {not [nand $A $B]}

proc or {A B} {nand [not $A] [not $B]}

proc nor {A B} {not [or $A $B]}

proc eq {A B} {or [and $A $B] [nor $A $B]}

proc ne {A B} {nor [and $A $B] [nor $A $B]}

Here's some testing tools — to see whether an implementation is correct, look at its truth table, here done as the four results for A,B combinations 0,0 0,1 1,0 1,1 — side note: observe how easily functions can be passed in as arguments:

proc truthtable f {
   set res {}
   foreach A {0 1} {
       foreach B {0 1} {
           lappend res [$f $A $B]
       }
   }
   set res
}

% truthtable and
0 0 0 1

% truthtable nand
1 1 1 0

% truthtable or
0 1 1 1

% truthtable nor
1 0 0 0

% truthtable eq
1 0 0 1

To see how efficient the implementation is (in terms of NAND units used), try this, which relies on the fact that Boolean functions contain no lowercase letters apart from the operator names:

proc nandcount f {
   regsub -all {[^a-z]} [info body $f] " " list
   set nums [string map {nand 1 not 1 and 2 nor 4 or 3 eq 6} $list]
   expr [join $nums +]
}

As a very different idea, having nothing to do with NAND as elementary function, the following generic code "implements" Boolean functions very intuitively, by just giving their truth table for look-up at runtime:

proc booleanFunction {truthtable a b} {
   lindex $truthtable [expr {!!$a+!!$a+!!$b}]
}

interp alias {} and  {} booleanFunction {0 0 0 1}
interp alias {} or   {} booleanFunction {0 1 1 1}
interp alias {} nand {} booleanFunction {1 1 1 0}

Solving cryptarithms

Cryptarithms are puzzles where digits are represented by letters, and the task is to find out which. The following "General Problem Solver" (for small values of General) uses heavy metaprogramming: it

builds up a nest of foreachs suiting the problem,
quick kills (with continue) to force unique values for the variables, and
returns the first solution found, or else an empty string:

proc solve {problem {domain0 {0 1 2 3 4 5 6 7 8 9}}} {
   set vars [lsort -u [split [regsub -all {[^A-Z]} $problem ""] ""]]
   set map {= ==}
   set outers {}
   set initials [regexp -all -inline {[^A-Z]([A-Z])} /$problem]
   set pos [lsearch $domain0 0]
   set domain1 [lreplace $domain0 $pos $pos]
   foreach var $vars {
       append body "foreach $var \$domain[expr [lsearch $initials $var]>=0] \{\n"
       lappend map $var $$var
       foreach outer $outers {
           append body "if {$$var eq $$outer} continue\n"
       }
       lappend outers $var
       append epilog \}
   }
   set test [string map $map $problem]
   append body "if {\[expr $test\]} {return \[subst $test\]}" $epilog
   if 1 $body
}

This works fine on some well-known cryptarithms:

% solve SEND+MORE=MONEY
9567+1085==10652

% solve SAVE+MORE=MONEY
9386+1076==10462

% solve YELLOW+YELLOW+RED=ORANGE
143329+143329+846==287504

Database experiments

A simple array-based database

There are lots of complex databases around. Here I want to explore how a database can be implemented in the Tcl spirit of simplicity, and how far that approach takes us. Consider the following model:

A database is a set of records
A record is a nonempty set of fields with a unique ID
A field is a pair of tag and nonempty value, both being strings

Fields may well be implemented as array entries, so we could have an array per record, or better one array for the whole database, where the key is composed of ID and tag. Unique IDs can be had by just counting up (incrementing the highest ID so far). The process of creating a simple database consists only of setting an initial value for the ID:

set db(lastid) 0

Let's consider a library application for an example. Adding a book to the database can be simply done by

set id [incr db(lastid)]
set db($id,author) "Shakespeare, William"
set db($id,title) "The Tempest"
set db($id,printed) 1962
set db($id,label) S321-001

Note that, as we never specified what fields a record shall contain, we can add whatever we see fit. For easier handling, it's a good idea to classify records somehow (we'll want to store more than books), so we add

set db($id,isa) book

Retrieving a record is as easy as this (though the fields come in undefined order):

array get db $id,*

and deleting a record is only slightly more convolved:

foreach i [array names db $id,*] {unset db($i)}

or, even easier and faster from Tcl 8.3 on:

array unset db $id,*

Here's how to get a "column", all fields of a given tag:

array get db *,title

But real columns may have empty fields, which we don't want to store. Retrieving fields that may not physically exist needs a tolerant access function:

proc db'get {_db id field} {
   upvar $_db db
   if {[array names db $id,$field]=="$id,$field"} {
       return $db($id,$field)
   } else {return ""}
}

In a classical database we have to define tables: which fields of what type and of which width. Here we can do what we want, even retrieve which fields we have used so far (using a temporary array to keep track of field names):

proc db'fields {_db} {
  upvar $_db db
  foreach i [array names db *,*] {
     set tmp([lindex [split $i ,] 1]) ""
  }
  lsort [array names tmp]
}

Searching for records that meet a certain condition can be done sequentially. For instance, we want all books printed before 1980:

foreach i [array names *,printed] {
   if {$db($i)<1980} {
       set id [lindex [split $i ,] 0]
       puts "[db'get db $id author]: [db'get db $id title] $db($i)"
   }
}

We might also store our patrons in the same database (here in a different style):

set i [incr $db(lastid)]
array set db [list $i,name "John F. Smith" $i,tel (123)456-7890 $i,isa  patron}

Without a concept of "tables", we can now introduce structures like in relational databases. Assume John Smith borrows "The Tempest". We have the patron's and book's ID in variables and do double bookkeeping:

lappend db($patron,borrowed) $book ;# might have borrowed other books
set db($book,borrower) $patron
set db($book,dueback) 2001-06-12

When he returns the book, the process is reversed:

set pos [lsearch $db($patron,borrowed) $book]
set db($patron,borrowed) [lreplace $db($patron,borrowed) $pos $pos]
unset db($book,borrower) ;# we're not interested in empty fields
unset db($book,dueback)

The dueback field (%Y-%M-%d format is good for sorting and comparing) is useful for checking whether books have not been returned in time:

set today [clock format [clock seconds] -format %Y-%M-%d]]
foreach i [array names db *,dueback] {
   if {$db($i)<$today} {
       set book [lindex [split $i ,] 0] ;# or: set book [idof $i] - see below
       set patron $db($book,borrower)
       #write a letter
       puts "Dear $db($patron,name), "
       puts "please return $db($book,title) which was due on\
       $db($book,dueback)"
   }
}

Likewise, parts of the accounting (e.g. orders to, and bills from, booksellers) can be added with little effort, and cross-related also to external files (just set the value to the filename).

Indexes: As shown, we can retrieve all data by sequential searching over array names. But if the database grows in size, it's a good idea to create indexes which cross-reference tags and values to IDs. For instance, here's how to make an authors' index in four lines:

foreach i [array names db *,author] {
   set book [lindex [split $i ,] 0]
   lappend db(author=[string toupper $db($i)]) $book
}
# and then..
foreach i [lsort [array names db author=SHAK*]] {
   puts "[lindex [split $i =] 1]:" ;# could be wrapped as 'valueof'
   foreach id $db($i) {
       puts "[db'get db $id title] - [db'get db $id label]"
   }
}

gives us a books list of all authors matching the given glob pattern (we reuse Tcl's functionality, instead of reinventing it...). Indexes are useful for repeated information that is likely to be searched. Especially, indexing the isa field allows iterating over "tables" (which we still don't explicitly have!;-):

regsub -all isa= [array names db isa=*] "" tables
foreach patron $db(isa=patron) {...}

And beyond industry-standard SQL, we can search multiple indices in one query:

array names db *=*MARK*

gives you all (case-independent) occurrences of MARK, be it in patron's names, book's authors or titles. As versatile as good old grep...

Persistence: Databases are supposed to exist between sessions, so here's how to save a database to a file:

set fp [open Library.db w]
puts $fp [list array set db [array get db]]
close $fp

and loading a database is even easier (on re-loading, better unset the array before):

source Library.db

If you use characters outside your system encoding (no problem to write Japanese book titles in Kanji), you'll have to fconfigure (e.g -encoding utf-8) on saving and loading, but that's just a few more LOC. Saving also goes a good way to what is ceremonially called "committing" (you'll need write-locking for multi-user systems), while loading (without saving before) might be called a "one-level rollback", where you want to discard your latest changes.

Notice that so far we have only defined one short proc, all other operations were done with built-in Tcl commands only. For clearer code, it is advisable to factor out frequent operations into procs, e.g.

proc idof {index} {lindex [split $index ,] 0}
proc db'add {_db data} {
   upvar $_db db
   set id [incr db(lastid)]
   foreach {tag value} $data {set db($id,$tag) $value}
   # might also update indexes here
}
proc db'tablerow {_db id tags} {
   upvar $_db db
   set res {}
   foreach tag $tags {lappend res [db'get db $id $tag]}
   set res
}

Of course, with growing databases we may reach memory limits: arrays need some extra storage for administration. On the other hand, the present approach is pretty economic, since it does not use field widths (all strings are "shrink-wrapped"), and omits empty fields, while at the same time allowing to add whatever fields you wish. A further optimization could be to tally value strings, and replace the frequent ones with "@$id", where db(@$id) holds the value once, and only db'get has to be adapted to redirect the query.

Also, memory limits on modern computers are somewhere up high... so only at some time in the future you might have (but maybe not want) to change to a complex database ;-)

On the limits: Tcl arrays may get quite large (one app was reported to store 800000 keys in Greek characters), and at some point enumerating all keys with array names db (which produces one long list) may exceed your available memory, causing the process to swap. In that situation, you can fall back to the (otherwise slower, and uglier) use of a dedicated iterator:

set search [array startsearch db]
while {[array anymore db $search]} {
   set key [array nextelement db $search]
   # now do something with db($key) - but see below!
}
array donesearch db $search

But neither can you filter the keys you will get with a glob pattern, nor may you add or delete array elements in the loop — the search will be immediately terminated.

Tables as lists of lists

Tables are understood here as rectangular (matrix) arrangements of data in rows (one row per "item"/"record") and columns (one column per "field"/"element"). They are for instance the building blocks of relational databases and spreadsheets. In Tcl, a sensible implementation for compact data storage would be as a list of lists. This way, they are "pure values" and can be passed e.g. through functions that take a table and return a table. No con-/destructors are needed, in contrast to the heavierweight matrix in Tcllib. I know there are many table implementations in Tcl, but like so often I wanted to build one "with my bare hands" and as simple as possible. As you see below, many functionalities can be "implemented" by just using Tcl's list functions.

A nice table also has a header line, that specifies the field names. So to create such a table with a defined field structure, but no contents yet, one just assigns the header list:

set tbl { {firstname lastname phone}}

Note the double bracing, which makes sure tbl is a 1-element list. Adding "records" to the table is as easy as

lappend tbl {John Smith (123)456-7890}

Make sure the fields (cells) match those in the header. Here single bracing is correct. If a field content contains spaces, it must be quoted or braced too:

lappend tbl {{George W} Bush 234-5678}

Sorting a table can be done with lsort -index, taking care that the header line stays on top:

proc tsort args {
   set table [lindex $args end]
   set header [lindex $table 0]
   set res [eval lsort [lrange $args 0 end-1] [list [lrange $table 1 end]]]
   linsert $res 0 $header
}

Removing a row (or contiguous sequence of rows) by numeric index is a job for lreplace:

set tbl [lreplace $tbl $from $to]

Simple printing of such a table, a row per line, is easy with

puts [join $tbl \n]

Accessing fields in a table is more fun with the field names than the numeric indexes, which is made easy by the fact that the field names are in the first row:

proc t@ {tbl field} {lsearch [lindex $tbl 0] $field}
% t@ $tbl phone
2

You can then access cells:

puts [lindex $tbl $rownumber [t@ $tbl lastname]]

and replace cell contents like this:

lset tbl $rownumber [t@ $tbl phone] (222)333-4567

Here is how to filter a table by giving pairs of field name and glob-style expression — in addition to the header line, all rows that satisfy at least one of those come through (you can force AND behavior by just nesting such calls):

proc trows {tbl args} {
   set conditions {}
   foreach {field condition} $args {
       lappend conditions [t@ $tbl $field] $condition
   }
   set res [list [lindex $tbl 0]]
   foreach row [lrange $tbl 1 end] {
       foreach {index condition} $conditions {
           if [string match $condition [lindex $row $index]] {
              lappend res $row
              break; # one hit is sufficient
           }
       }
   }
   set res
}
% trows $tbl lastname Sm*
{firstname lastname} phone {John Smith (123)456-7890}

This filters (and, if wanted, rearranges) columns, sort of what is called a "view":

proc tcols {tbl args} {
   set indices {}
   foreach field $args {lappend indices [t@ $tbl $field]}
   set res {}
   foreach row $tbl {
       set newrow {}
       foreach index $indices {lappend newrow [lindex $row $index]}
       lappend res $newrow
   }
   set res
}

Programming Languages Laboratory

In the following few chapters you'll see how easy it is to emulate or explore other programming languages with Tcl.

GOTO: a little state machine

The GOTO "jumping" instruction is considered harmful in programming for many years now, but still it might be interesting to experiment with. Tcl has no goto command, but it can easily be created. The following code was created in the Tcl chatroom, instigated by the quote: "A computer is a state machine. Threads are for people who can't program state machines."

So here is one model of a state machine in ten lines of code. The "machine" itself takes a list of alternating labels and state code; if a state code does not end in a goto or break, the same state will be repeated as long as not left, with goto or break (implicit endless loop). The goto command is defined "locally", and deleted after leaving the state machine — it is not meaningfully used outside of it. Execution starts at the first of the states.

proc statemachine states {
   array set S $states
   proc goto label {
       uplevel 1 set this $label
       return -code continue
   }
   set this [lindex $states 0]
   while 1 {eval $S($this)}
   rename goto {}
}

Testing: a tiny state machine that greets you as often as you wish, and ends if you only hit Return on the "how often?" question:

statemachine {
   1 {
       puts "how often?"
       gets stdin nmax
       if {$nmax eq ""} {goto 3}
       set n 0
       goto 2
   } 2 {
       if {[incr n] > $nmax} {goto 1}
       puts "hello"
   } 3 {puts "Thank you!"; break}
}

Playing Assembler

In this weekend fun project to emulate machine language, I picked those parts of Intel 8080A/8085 Assembler (because I had a detailed reference handy) that are easily implemented and still somehow educational (or nostalgic ;-).

Of course this is no real assembler. The memory model is constant-size instructions (strings in array elements), which are implemented as Tcl procs. So an "assembler" program in this plaything will run even slower than in pure Tcl, and consume more memory — while normally you associate speed and conciseness with "real" assembler code. But it looks halfway like the real thing: you get sort of an assembly listing with symbol table, and can run it — I'd hardly start writing an assembler in C, but in Tcl it's fun for a sunny Sunday afternoon... }

namespace eval asm {
   proc asm body {
       variable mem
       catch {unset mem} ;# good for repeated sourcing
       foreach line [split $body \n] {
           foreach i {label op args} {set $i ""}
           regexp {([^;]*);} $line -> line ;# strip off comments
           regexp {^ *(([A-Z0-9]+):)? *([A-Z]*) +(.*)} [string toupper $line]\
                ->  -   label           op       args
                puts label=$label,op=$op,args=$args
           if {$label!=""} {set sym($label) $PC}
           if {$op==""}     continue
           if {$op=="DB"}  {set mem($PC) [convertHex $args]; incr PC; continue}
           if {$op=="EQU"} {set sym($label) [convertHex $args]; continue}
           if {$op=="ORG"} {set PC [convertHex $args]; continue}
           regsub -all ", *" $args " " args ;# normalize commas
           set mem($PC) "$op $args"
           incr PC
       }
       substituteSymbols sym
       dump   sym
   }
   proc convertHex s {
       if [regexp {^([0-9A-F]+)H$} [string trim $s] -> s] {set s [expr 0x$s]}
       set s
   }
   proc substituteSymbols {_sym} {
       variable mem
       upvar $_sym sym
       foreach i [array names mem] {
           set tmp [lindex $mem($i) 0]
           foreach j [lrange $mem($i) 1 end] {
               if {[array names sym $j] eq $j} {set j $sym($j)}
               lappend tmp $j
           }
           set mem($i) $tmp
       }
   }
   proc dump {_sym} {
       variable mem
       upvar $_sym sym
       foreach i [lsort -integer [array names mem]] {
           puts [format "%04d %s" $i $mem($i)]
       }
       foreach i [lsort [array names sym]] {
           puts [format "%-10s: %04x" $i $sym($i)]
       }
   }
   proc run { {pc 255}} {
       variable mem
       foreach i {A B C D E Z} {set ::$i 0}
       while {$pc>=0} {
           incr pc
           #puts "$mem($pc)\tA:$::A B:$::B C:$::C D:$::D E:$::E Z:$::Z"
           eval $mem($pc)
       }
   }
#----------------- "machine opcodes" implemented as procs
   proc ADD  {reg reg2}  {set ::Z [incr ::$reg [set ::$reg2]]}
   proc ADI  {reg value} {set ::Z [incr ::$reg $value]}
   proc CALL {name}      {[string tolower $name] $::A}
   proc DCR  {reg}       {set ::Z [incr ::$reg -1]}
   proc INR  {reg}       {set ::Z [incr ::$reg]}
   proc JMP  where       {uplevel 1 set pc [expr $where-1]}
   proc JNZ  where       {if $::Z {uplevel 1 JMP $where}}
   proc JZ   where       {if !$::Z {uplevel 1 JMP $where}}
   proc MOV  {reg adr}   {variable mem; set ::$reg $mem($adr)}
   proc MVI  {reg value} {set ::$reg $value}
}

Now testing:

asm::asm {
       org  100     ; the canonical start address in CP/M
       jmp  START   ; idiomatic: get over the initial variable(s)
DONE:  equ  0       ; warm start in CP/M ;-)
MAX:   equ  5
INCR:  db   2       ; a variable (though we won't vary it)
;; here we go...
START: mvi  c,MAX   ; set count limit
       mvi  a,0     ; initial value
       mov  b,INCR
LOOP:  call puts    ; for now, fall back to Tcl for I/O
       inr  a
       add  a,b     ; just to make adding 1 more complicated
       dcr  c       ; counting down..
       jnz  LOOP    ; jump on non-zero to LOOP
       jmp  DONE    ; end of program
       end
}

The mov b,INCR part is an oversimplification. For a real 8080, one would have to say

LXI H,INCR ; load double registers H+L with the address INCR
MOV B,M    ; load byte to register B from the address pointed to in HL

Since the pseudo-register M can also be used for writing back, it cannot be implemented by simply copying the value. Rather, one could use read and write traces on variable M, causing it to load from, or store to, mem($HL). Maybe another weekend...

Functional programming (Backus 1977)

John Backus turned 80 these days. For creating FORTRAN and the BNF style of language description, he received the ACM Turing Award in 1977. In his Turing Award lecture,

Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. (Comm. ACM 21.8, Aug. 1978, 613-641)

he developed an amazing framework for functional programming, from theoretical foundations to implementation hints, e.g. for installation, user privileges, and system self-protection. In a nutshell, his FP system comprises

a set O of objects (atoms or sequences)
a set F of functions that map objects into objects (f : O |-> O}
an operation, application (very roughly, eval)
a set FF of functional forms, used to combine functions or objects to form new functions in F
a set D of definitions that map names to functions in F

I'm far from having digested it all, but like so often, interesting reading prompts me to do Tcl experiments, especially on weekends. I started with Backus' first Functional Program example,

Def Innerproduct = (Insert +) o (ApplyToAll x) o Transpose

and wanted to bring it to life — slightly adapted to Tcl style, especially by replacing the infix operator "o" with a Polish prefix style:

Def Innerproduct = {o {Insert +} {ApplyToAll *} Transpose}

Unlike procs or lambdas, more like APL or RPN, this definition needs no variables — it declares (from right to left) what to do with the input; the result of each step is the input for the next step (to the left of it). In an RPN language, the example might look like this:

/Innerproduct {Transpose * swap ApplyToAll + swap Insert} def

which has the advantage that execution goes from left to right, but requires some stack awareness (and some swaps to set the stack right ;^)

Implementing Def, I took an easy route by just creating a proc that adds an argument and leaves it to the "functional" to do the right thing (with some quoting heaven :-) }

proc Def {name = functional} {
   proc $name x "\[$functional\] \$x"
}

For functional composition, where, say for two functions f and g,

[{o f g} $x] == [f [g $x]]

again a proc is created that does the bracket nesting:

proc o args {
   set body return
   foreach f $args {append body " \[$f"}
   set name [info level 0]
   proc $name x "$body \$x [string repeat \] [llength $args]]"
   set name
}

Why Backus used Transpose on the input, wasn't first clear to me, but as he (like we Tclers) represents a matrix as a list of rows, which are again lists (also known as vectors), it later made much sense to me. This code for transposing a matrix uses the fact that variable names can be any string, including those that look like integers, so the column contents are collected into variables named 0 1 2 ... and finally turned into the result list:

proc Transpose matrix {
   set cols [iota [llength [lindex $matrix 0]]]
   foreach row $matrix {
       foreach element $row col $cols {
           lappend $col $element
       }
   }
   set res {}
   foreach col $cols {lappend res [set $col]}
   set res
}

An integer range generator produces the variable names, e.g iota 3 => {0 1 2}

proc iota n {
   set res {}
   for {set i 0} {$i<$n} {incr i} {lappend res $i}
   set res
}

#-- This "functional form" is mostly called map in more recent FP:
proc ApplyToAll {f list} {
   set res {}
   foreach element $list {lappend res [$f $element]}
   set res
}

...and Insert is better known as fold, I suppose. My oversimple implementation assumes that the operator is one that expr understands:

proc Insert {op arguments} {expr [join $arguments $op]}

#-- Prefix multiplication comes as a special case of this:
interp alias {} * {} Insert *

#-- Now to try out the whole thing:
Def Innerproduct = {o {Insert +} {ApplyToAll *} Transpose}
puts [Innerproduct {{1 2 3} {6 5 4}}]

which returns 28 just as Dr. Backus ordered (= 1*6 + 2*5 + 3*4). Ah, the joys of weekend Tcl'ing... — and belatedly, Happy Birthday, John! :)

Another example, cooked up by myself this time, computes the average of a list. For this we need to implement the construction operator, which is sort of inverse mapping — while mapping a function over a sequence of inputs produces a sequence of outputs of that function applied to each input, Backus' construction maps a sequence of functions over one input to produce a sequence of results of each function to that input, e.g.

[f,g](x) == <f(x),g(x)>

Of course I can't use circumfix brackets as operator name, so let's call it constr:

proc constr args {
   set functions [lrange $args 0 end-1]
   set x [lindex $args end]
   set res {}
   foreach f $functions {lappend res [eval $f [list $x]]}
   set res
}

#-- Testing:
Def mean = {o {Insert /} {constr {Insert +} llength}}
puts [mean {1 2 3 4 5}]

which returns correctly 3. However, as integer division takes place, it would be better to make that

proc double x {expr {double($x)}}

Def mean    = {o {Insert /} {constr {Insert +} dlength}}
Def dlength = {o double llength}

puts [mean {1 2 3 4}]

giving the correct result 2.5. However, the auxiliary definition for dlength cannot be inlined into the definition of mean — so this needs more work... But this version, that maps double first, works:

Def mean = {o {Insert /} {constr {Insert +} llength} {ApplyToAll double}}

One more experiment, just to get the feel:

Def hypot  = {o sqrt {Insert +} {ApplyToAll square}}
Def square = {o {Insert *} {constr id id}}

proc sqrt x {expr {sqrt($x)}}
proc id x   {set x}

puts [hypot {3 4}]

which gives 5.0. Compared to an RPN language, hypot would be

/hypot {dup * swap dup * + sqrt} def

which is shorter and simpler, but meddles more directly with the stack.

An important functional form is the conditional, which at Backus looks like

p1 -> f; p2 -> g; h

meaning, translated to Tcl,

if {[p1 $x]} then {f $x} elseif {[p2 $x]} then {g $x} else {h $x}

Let's try that, rewritten Polish-ly to:

cond p1 f p2 g h

proc cond args {
   set body ""
   foreach {condition function} [lrange $args 0 end-1] {
       append body "if {\[$condition \$x\]} {$function \$x} else"
   }
   append body " {[lindex $args end] \$x}"
   set name [info level 0]
   proc $name x $body
   set name
}

#-- Testing, with K in another role as Konstant function :)
Def abs = {cond {> 0} -- id}

proc > {a b} {expr {$a>$b}}
proc < {a b} {expr {$a<$b}}
proc -- x {expr -$x}
puts [abs -42],[abs 0],[abs 42]

Def sgn = {cond {< 0} {K 1} {> 0} {K -1} {K 0}}
proc K {a b} {set a}

puts [sgn 42]/[sgn 0]/[sgn -42]

#--Another famous toy example, reading a file's contents:
Def readfile = {o 1 {constr read close} open}

#--where Backus' selector (named just as integer) is here:
proc 1 x {lindex $x 0}

Reusable functional components

Say you want to make a multiplication table for an elementary school kid near you. Easily done in a few lines of Tcl code:

proc multable {rows cols} {
   set res ""
   for {set i 1} {$i <= $rows} {incr i} {
       for {set j 1} {$j <= $cols} {incr j} {
           append res [format %4d [expr {$i*$j}]]
       }
       append res \n
   }
   set res
}

The code does not directly puts its results, but returns them as a string — you might want to do other things with it, e.g. save it to a file for printing. Testing:

% multable 3 10
  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30

Or print the result directly from wish:

 catch {console show}
 puts "[multable 3 10]"

Here's a different way to do it à la functional programming:

proc multable2 {rows cols} {
   formatMatrix %4d [outProd * [iota 1 $rows] [iota 1 $cols]]
}

The body is nice and short, but consists of all unfamiliar commands. They are however better reusable than the multable proc above. The first formats a matrix (a list of lists to Tcl) with newlines and aligned columns for better display:

proc formatMatrix {fm matrix} {
   join [lmap row $matrix {join [lmap i $row {format $fm $i}] ""}] \n
}

Short again, and slightly cryptic, as is the "outer product" routine, which takes a function f and two vectors, and produces a matrix where f was applied to every pair of a x b — in APL they had special compound operators for this job, in this case "°.x":

proc outProd {f a b} {
   lmap i $a {lmap j $b {$f $i $j}}
}

Again, lmap (the collecting foreach) figures prominently, so here it is in all its simplicity:

proc lmap {_var list body} {
   upvar 1 $_var var
   set res {}
   foreach var $list {lappend res [uplevel 1 $body]}
   set res
}

#-- We need multiplication from expr exposed as a function:
proc * {a b} {expr {$a * $b}}

#-- And finally, iota is an integer range generator:
proc iota {from to} {
   set res {}
   while {$from <= $to} {lappend res $from; incr from}
   set res
}

With these parts in place, we can see that multable2 works as we want:

% multable2 3 10
  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30

So why write six procedures, where one did the job already? A matter of style and taste, in a way — multable is 10 LOC and depends on nothing but Tcl, which is good; multable2 describes quite concisely what it does, and builds on a few other procs that are highly reusable.

Should you need a unit matrix (where the main diagonal is 1, and the rest is 0), just call outProd with a different function (equality, ==):

% outProd == [iota 1 5] [iota 1 5]
{1 0 0 0 0} {0 1 0 0 0} {0 0 1 0 0} {0 0 0 1 0} {0 0 0 0 1}

which just requires expr's equality to be exposed too:

proc == {a b} {expr {$a == $b}}

One of the fascinations of functional programming is that one can do the job in a simple and clear way (typically a one-liner), while using a collection of reusable building-blocks like lmap and iota. And formatMatrix and outProd are so general that one might include them in some library, while the task of producing a multiplication table may not come up any more for a long time...

Modelling an RPN language

Tcl follows strictly the Polish notation, where an operator or function always precedes its arguments. It is however easy to build an interpreter for a language in Reverse Polish Notation (RPN) like Forth, Postscript, or Joy, and experiment with it.

The "runtime engine" is just called "r" (not to be confused with the R language), and it boils down to a three-way switch done for each word, in eleven lines of code:

"tcl" evaluates the top of stack as a Tcl script
known words in the ::C array are recursively evaluated in "r"
other words are just pushed

Joy's rich quoting for types ([list], {set}, "string", 'char) conflict with the Tcl parser, so lists in "r" are {braced} if their length isn't 1, and (parenthesized) if it is — but the word shall not be evaluated now. This looks better to me than /slashing as in Postscript.

As everything is a string, and to Tcl "a" is {a} is a , Joy's polymorphy has to be made explicit. I added converters between characters and integers, and between strings and lists (see the dictionary below). For Joy's sets I haven't bothered yet — they are restricted to the domain 0..31, probably implemented with bits in a 32-bit word.

Far as this is from Joy, it was mostly triggered by the examples in Manfred von Thun's papers, so I tongue-in-cheek still call it "Pocket Joy" — it was for me, at last, on the iPaq... The test suite at end should give many examples of what one can do in "r". }

proc r args {
   foreach a $args {
     dputs [info level]:$::S//$a
     if {$a eq "tcl"} {
             eval [pop]
     } elseif [info exists ::C($a)] {
             eval r $::C($a)
     } else {push [string trim $a ()]}
   }
   set ::S
}

# That's it. Stack (list) and Command array are global variables:

set S {}; unset C

#-- A tiny switchable debugger:

proc d+ {} {proc dputs s {puts $s}}
proc d- {}  {proc dputs args {}}
d- ;#-- initially, debug mode off

Definitions are in Forth style — ":" as initial word, as they look much more compact than Joy's DEFINE n == args;

proc : {n args} {set ::C($n) $args}

expr functionality is exposed for binary operators and one-arg functions:

proc 2op op {
   set t [pop]
   push [expr {[pop]} $op {$t}]
}
foreach op {+ - * / > >= != <= <} {: $op [list 2op $op] tcl}
: =    {2op ==} tcl

proc 1f  f {push [expr $f\([pop])]}
foreach f {abs double exp int sqrt sin cos acos tan} {: $f [list 1f $f] tcl}

interp alias {} pn {} puts -nonewline

#----- The dictionary has all one-liners:
: .      {pn "[pop] "} tcl
: .s    {puts $::S} tcl
: '      {push [scan [pop] %c]} tcl   ;# char -> int
: `     {push [format %c [pop]]} tcl  ;# int -> char
: and  {2op &&} tcl
: at     1 - swap {push [lindex [pop] [pop]]} tcl
: c      {set ::S {}} tcl ;# clear stack
: choice {choice [pop] [pop] [pop]} tcl
: cleave {cleave [pop] [pop] [pop]} tcl
: cons {push [linsert [pop] 0 [pop]]} tcl
: dup  {push [set x [pop]] $x} tcl
: dupd {push [lindex $::S end-1]} tcl
: emit {pn [format %c [pop]]} tcl
: even  odd not
: explode  {push [split [pop] ""]} tcl  ;# string -> char list
: fact  1 (*) primrec
: filter  split swap pop
: first  {push [lindex [pop] 0]} tcl
: fold  {rfold [pop] [pop] [pop]} tcl
: gcd  swap {0 >} {swap dupd rem swap gcd} (pop) ifte
: has  swap in
: i      {eval r [pop]} tcl
: ifte   {rifte [pop] [pop] [pop]} tcl
: implode  {push [join [pop] ""]} tcl ;# char list -> string
: in  {push [lsearch [pop] [pop]]} tcl 0 >=
: map  {rmap [pop] [pop]} tcl
: max  {push [max [pop] [pop]]} tcl
: min  {push [min [pop] [pop]]} tcl
: newstack  c
: not   {1f !} tcl
: odd  2 rem
: of  swap at
: or    {2op ||} tcl
: pop  (pop) tcl
: pred 1 -
: primrec {primrec [pop] [pop] [pop]} tcl
: product 1 (*) fold
: qsort (lsort) tcl
: qsort1 {lsort -index 0} tcl
: rem  {2op %} tcl
: rest  {push [lrange [pop] 1 end]} tcl
: reverse {} swap (swons) step
: set  {set ::[pop] [pop]} tcl
: $     {push [set ::[pop]]} tcl
: sign  {0 >}  {0 <} cleave -
: size  {push [llength [pop]]} tcl
: split  {rsplit [pop] [pop]} tcl
: step  {step [pop] [pop]} tcl
: succ  1 +
: sum   0 (+) fold
: swap  {push [pop] [pop]} tcl
: swons  swap cons
: xor  !=

Helper functions written in Tcl:

proc rifte {else then cond} {
   eval r dup $cond
   eval r [expr {[pop]? $then: $else}]
}
proc choice {z y x} {
   push [expr {$x? $y: $z}]
}
proc cleave { g f x} {
   eval [list r $x] $f [list $x] $g
}
proc max {x y} {expr {$x>$y?$x:$y}}
proc min {x y} {expr {$x<$y? $x:$y}}
proc rmap {f list} {
   set res {}
   foreach e $list {
      eval [list r $e] $f
      lappend res [pop]
   }
   push $res
}
proc step {f list} {
   foreach e $list {eval [list r ($e)] $f}
}
proc rsplit {f list} {
   foreach i {0 1} {set $i {}}
   foreach e $list {
      eval [list r $e] $f
      lappend [expr {!![pop]}] $e
   }
   push $0 $1
}
proc primrec {f init n} {
   if {$n>0} {
      push $n
      while {$n>1} {
          eval [list r [incr n -1]] $f
      }
   } else {push $init}
}
proc rfold {f init list} {
   push $init
   foreach e $list {eval [list r $e] $f}
}

#------------------ Stack routines
proc push args {
  foreach a $args {lappend ::S $a}
}
proc pop {} {
   if [llength $::S] {
      K [lindex $::S end] \
         [set ::S [lrange $::S 0 end-1]]
   } else {error "stack underflow"}
}
proc K {a b} {set a}

#------------------------ The test suite:
proc ? {cmd expected} {
   catch {uplevel 1 $cmd} res
   if {$res ne $expected} {puts "$cmd->$res, not $expected"}
}
? {r 2 3 +} 5
? {r 2 *}   10
? {r c 5 dup *} 25
: sqr dup *
: hypot sqr swap sqr + sqrt
? {r c 3 4 hypot} 5.0
? {r c {1 2 3} {dup *} map} { {1 4 9}}
? {r size} 3
? {r c {2 5 3} 0 (+) fold} 10
? {r c {3 4 5} product} 60
? {r c {2 5 3} 0 {dup * +} fold} 38
? {r c {1 2 3 4} dup sum swap size double /} 2.5
? {r c {1 2 3 4} (sum)  {size double} cleave /} 2.5
: if0 {1000 >} {2 /} {3 *} ifte
? {r c 1200 if0} 600
? {r c 600 if0}  1800
? {r c 42 sign}   1
? {r c 0 sign}     0
? {r c -42 sign} -1
? {r c 5 fact} 120
? {r c 1 0 and} 0
? {r c 1 0 or}   1
? {r c 1 0 and not} 1
? {r c 3 {2 1} cons} { {3 2 1}}
? {r c {2 1} 3 swons} { {3 2 1}}
? {r c {1 2 3} first} 1
? {r c {1 2 3} rest} { {2 3}}
? {r c {6 1 5 2 4 3} {3 >} filter} { {6 5 4}}
? {r c 1 2 {+ 20 * 10 4 -} i} {60 6}
? {r c 42 succ} 43
? {r c 42 pred} 41
? {r c {a b c d} 2 at} b
? {r c 2 {a b c d} of} b
? {r c 1 2 pop} 1
? {r c A ' 32 + succ succ `} c
? {r c {a b c d} reverse} { {d c b a}}
? {r c 1 2 dupd} {1 2 1}
? {r c 6 9 gcd} 3
? {r c true yes no choice} yes
? {r c false yes no choice} no
? {r c {1 2 3 4} (odd) split} { {2 4} {1 3}}
? {r c a {a b c} in} 1
? {r c d {a b c} in} 0
? {r c {a b c} b has} 1
? {r c {a b c} e has} 0
? {r c 3 4 max} 4
? {r c 3 4 min}  3
? {r c hello explode reverse implode} olleh
: palindrome dup explode reverse implode =
? {r c hello palindrome} 0
? {r c otto palindrome}  1

#-- reading (varname $) and setting (varname set) global Tcl vars
set tv 42
? {r c (tv) $ 1 + dup (tv) set} 43
? {expr $tv==43} 1

Tacit programming

The J programming language is the "blessed successor" to APL, where "every function is an infix or prefix operator", x?y (dyadic) or ?y (monadic), for ? being any pre- or user-defined function).

"Tacit programming" (tacit: implied; indicated by necessary connotation though not expressed directly) is one of the styles possible in J, and means coding by combining functions, without reference to argument names. This idea may have been first brought up in Functional programming (Backus 1977), if not in Forth and Joy, and it's an interesting simplification compared to the lambda calculus.

For instance, here's a breathtakingly short J program to compute the mean of a list of numbers:

mean=.+/%#

Let's chew this, byte by byte :)

=.   is assignment to a local variable ("mean") which can be called
+/%# is the "function body"
+    (dyadic) is addition
/    folds the operator on its left over the list on its right
+/   hence being the sum of a list
%    (dyadic) is division, going double on integer arguments when needed
#    (monadic) is tally, like Tcl's [llength] resp. [string length]

Only implicitly present is a powerful function combinator called "fork". When J parses three operators in a row, gfh, where f is dyadic and g and h are monadic, they are combined like the following Tcl version does:

proc fork {f g h x} {$f [$g $x] [$h $x]}

In other words, f is applied to the results of applying g and h to the single argument. Note that +/ is considered one operator, which applies the "adverb" folding to the "verb" addition (one might well call it "sum"). When two operands occur together, the "hook" pattern is implied, which might in Tcl be written as:

proc hook {f g x} {$f $x [$g $x]}

As KBK pointed out in the Tcl chatroom, the "hook" pattern corresponds to Schönfinkel/Curry's S combinator (see Hot Curry and Combinator Engine), while "fork" is called S' there.

Unlike in earlier years when I was playing APL, this time my aim was not to parse and emulate J in Tcl — I expected hard work for a dubitable gain, and this is a weekend fun project after all. I rather wanted to explore some of these concepts and how to use them in Tcl, so that in slightly more verbose words I could code (and call)

Def mean = fork /. sum llength

following Backus' FP language with the "Def" command. So let's get the pieces together. My "Def" creates an interp alias, which is a good and simple Tcl way to compose partial scripts (the definition, here) with one or more arguments, also known as "currying":

proc Def {name = args} {eval [list interp alias {} $name {}] $args}

The second parameter, "=", is for better looks only and evidently never used.

Testing early and often is a virtue, as is documentation — to make the following code snippets clearer, I tuned my little tester for better looks, so that the test cases in the source code also serve as well readable examples — they look like comments but are code! The cute name "e.g." was instigated by the fact that in J, "NB." is used as comment indicator, both being well known Latin abbreviations:

proc e.g. {cmd -> expected} {
   catch {uplevel 1 $cmd} res
   if {$res != $expected} {puts "$cmd -> $res, not $expected"}
}

Again, the "->" argument is for eye-candy only — but it feels better to me at least. See the examples soon to come.

For recursive functions and other arithmetics, func makes better reading, by accepting expr language in the body:

proc func {name argl body} {proc $name $argl [list expr $body]}

We'll use this to turn expr's infix operators into dyadic functions, plus the "slashdot" operator that makes division always return a real number, hence the dot :

foreach op {+ &mdash; * /} {func $op {a b} "\$a $op \$b"}
        e.g. {+ 1 2} -> 3
        e.g. {/ 1 2} -> 0        ;# integer division
func /. {a b} {double($a)/$b}
        e.g. {/. 1 2} -> 0.5     ;# "real" division

#-- Two abbreviations for frequently used list operations:
proc head list {lindex $list 0}
          e.g. {head {a b c}} -> a
proc tail list {lrange $list 1 end}
          e.g. {tail {a b c}} -> {b c}

For "fold", this time I devised a recursive version:

func fold {neutral op list} {
   $list eq [] ? $neutral
   : [$op [head $list] [fold $neutral $op [tail $list]]]
}
        e.g. {fold 0 + {1 2 3 4}} -> 10

#-- A "Def" alias does the same job:
Def sum = fold 0 +
        e.g. {sum      {1 2 3 4}} -> 10

#-- So let's try to implement "mean" in tacit Tcl!
Def mean = fork /. sum llength
         e.g. {mean {1 2 3 40}} -> 11.5

Tacit enough (one might have picked fancier names like +/ for "sum" and # as alias for llength), but in principle it is equivalent to the J version, and doesn't name a single argument. Also, the use of llength demonstrates that any good old Tcl command can go in here, not just the artificial Tacit world that I'm just creating...

In the next step, I want to reimplement the "median" function, which for a sorted list returns the central element if its length is odd, or the mean of the two elements adjacent to the (virtual) center for even length. In J, it looks like this:

median=.(mean@:\{~medind@#)@sortu
medind=.((<.,>.)@half) ` half @.(2&|)
half=.-:@<:                        NB. halve one less than rt. argument
sortu=.\{~/:                       NB. sort upwards

which may better explain why I wouldn't want to code in J :^) J has ASCIIfied the zoo of APL strange character operators, at the cost of using braces and brackets as operators too, without regard for balancing, and extending them with dots and colons, so e.g.

-   monadic: negate; dyadic: minus
-.  monadic: not
-:  monadic: halve

J code sometimes really looks like an accident in a keyboard factory... I won't go into all details of the above code, just some:

@ ("atop") is strong linkage, sort of functional composition
<. (monadic) is floor()
>. (monadic) is ceil()

(<.,>.) is building a list of the floor and the ceiling of its single argument, the comma being the concatenation operator here, comparable to Backus' "construction" or Joy's cleave. The pattern

a ` b @. c

is a kind of conditional in J, which could in Tcl be written

if {[$c $x]} {$a $x} else {$b $x}

but my variant of the median algorithm doesn't need a conditional — for lists of odd length it just uses the central index twice, which is idempotent for "mean", even if a tad slower.

J's "from" operator { takes zero or more elements from a list, possibly repeatedly. For porting this, lmap is a good helper, even though not strictly functional:

proc lmap {_v list body} {
   upvar 1 $_v v
   set res {}
   foreach v $list {lappend res [uplevel 1 $body]}
   set res
}
e.g. {lmap i {1 2 3 4} {* $i $i}} -> {1 4 9 16}

#-- So here's my 'from':
proc from {indices list} {lmap i $indices {lindex $list $i}}
          e.g. {from {1 0 0 2} {a b c}} -> {b a a c}

We furtheron borrow some more content from expr:

func ceil  x {int(ceil($x))}
func floor x {int(floor($x))}
   e.g. {ceil 1.5}  -> 2
   e.g. {floor 1.5} -> 1
   e.g. {fork list floor ceil 1.5} -> {1 2}

We'll need functional composition, and here's a recursive de-luxe version that takes zero or more functions, hence the name o*:

func o* {functions x} {
   $functions eq []? $x
   : [[head $functions] [o* [tail $functions] $x]]
}
e.g. {o* {} hello,world} -> hello,world

Evidently, identity as could be written

proc I x {set x}

is the neutral element of variadic functional composition, when called with no functions at all.

If composite functions like 'fork' are arguments to o*, we'd better let unknown know that we want auto-expansion of first word:

proc know what {proc unknown args $what\n[info body unknown]}
know {
   set cmd [head $args]
   if {[llength $cmd]>1} {return [eval $cmd [tail $args]]}
}

Also, we need a numeric sort that's good for integers as well as reals ("Def" serves for all kinds of aliases, not just combinations of functions):

Def sort = lsort -real
         e.g. {sort {2.718 10 1}} -> {1 2.718 10}
         e.g. {lsort {2.718 10 1}} -> {1 10 2.718} ;# lexicographic

#-- And now for the median test:
Def median = o* {mean {fork from center sort}}
Def center = o* {{fork list floor ceil} {* 0.5} -1 llength}

func -1 x {$x &mdash; 1}
        e.g. {-1 5} -> 4 ;# predecessor function, when for integers

#-- Trying the whole thing out:
e.g. {median {1 2 3 4 5}} -> 3
e.g. {median {1 2 3 4}}   -> 2.5

As this file gets tacitly sourced, I am pretty confident that I've reached my goal for this weekend — even though my median doesn't remotely look like the J version: it is as "wordy" as Tcl usually is. But the admittedly still very trivial challenge was met in truly function-level style, concerning the definitions of median, center and mean — no variable left behind. And that is one, and not the worst, Tcl way of Tacit programming...

Vector arithmetics

APL and J (see Tacit programming) have the feature that arithmetics can be done with vectors and arrays as well as scalar numbers, in the varieties (for any operator @):

scalar @ scalar → scalar (like expr does)
vector @ scalar → vector
scalar @ vector → vector
vector @ vector → vector (all of same dimensions, element-wise)

Here's experiments how to do this in Tcl. First lmap is a collecting foreach — it maps the specified body over a list:

proc lmap {_var list body} {
    upvar 1 $_var var
    set res {}
    foreach var $list {lappend res [uplevel 1 $body]}
    set res
}

#-- We need basic scalar operators from expr factored out:
foreach op {+ - * / % ==} {proc $op {a b} "expr {\$a $op \$b}"}

The following generic wrapper takes one binary operator (could be any suitable function) and two arguments, which may be scalars, vectors, or even matrices (lists of lists), as it recurses as often as needed. Note that as my lmap above only takes one list, the two-list case had to be made explicit with foreach.

proc vec {op a b} {
    if {[llength $a] == 1 && [llength $b] == 1} {
        $op $a $b
    } elseif {[llength $a]==1} {
        lmap i $b {vec $op $a $i}
    } elseif {[llength $b]==1} {
        lmap i $a {vec $op $i $b}
    } elseif {[llength $a] == [llength $b]} {
        set res {}
        foreach i $a j $b {lappend res [vec $op $i $j]}
        set res
    } else {error "length mismatch [llength $a] != [llength $b]"}
}

Tests are done with this minimal "framework":

proc e.g. {cmd -> expected} {
    catch $cmd res
    if {$res ne $expected} {puts "$cmd -> $res, not $expected"}
}

Scalar + Scalar

e.g. {vec + 1 2} -> 3

Scalar + Vector

e.g. {vec + 1 {1 2 3 4}} -> {2 3 4 5}

Vector / Scalar

e.g. {vec / {1 2 3 4} 2.} -> {0.5 1.0 1.5 2.0}

Vector + Vector

e.g. {vec + {1 2 3} {4 5 6}} -> {5 7 9}

Matrix * Scalar

e.g. {vec * {{1 2 3} {4 5 6}} 2} -> {{2 4 6} {8 10 12}}

Multiplying a 3x3 matrix with another:

e.g. {vec * {{1 2 3} {4 5 6} {7 8 9}} {{1 0 0} {0 1 0} {0 0 1}}} -> \
 {{1 0 0} {0 5 0} {0 0 9}}

The dot product of two vectors is a scalar. That's easily had too, given a sum function:

proc sum list {expr [join $list +]+0}
sum [vec * {1 2} {3 4}]

should result in 11 (= (1*3)+(2*4)).

Here's a little application for this: a vector factorizer, that produces the list of divisors for a given integer. For this we again need a 1-based integer range generator:

proc iota1 x {
    set res {}
    for {set i 1} {$i<=$x} {incr i} {lappend res $i}
    set res
}
e.g. {iota1 7}           -> {1 2 3 4 5 6 7}

#-- We can compute the modulo of a number by its index vector:
e.g. {vec % 7 [iota1 7]} -> {0 1 1 3 2 1 0}

#-- and turn all elements where the remainder is 0 to 1, else 0:
e.g. {vec == 0 [vec % 7 [iota1 7]]} -> {1 0 0 0 0 0 1}

At this point, a number is prime if the sum of the latest vector is 2. But we can also multiply out the 1s with the divisors from the i ndex vector:

e.g. {vec * [iota1 7] [vec == 0 [vec % 7 [iota1 7]]]} -> {1 0 0 0 0 0 7}

#-- Hence, 7 is only divisible by 1 and itself, hence it is a prime.
e.g. {vec * [iota1 6] [vec == 0 [vec % 6 [iota1 6]]]} -> {1 2 3 0 0 6}

So 6 is divisible by 2 and 3; non-zero elements in (lrange $divisors 1 end-1) gives the "proper" divisors. And three nested calls to vec are sufficient to produce the divisors list :)

Just for comparison, here's how it looks in J:

   iota1=.>:@i.
   iota1 7
1 2 3 4 5 6 7
   f3=.iota1*(0&=@|~iota1)
   f3 7
1 0 0 0 0 0 7
   f3 6
1 2 3 0 0 6

Integers as Boolean functions

Boolean functions, in which arguments and result are in the domain {true, false}, or {1, 0} as expr has it, and operators are e.g. {AND, OR, NOT} resp. {&&, ||, !}, can be represented by their truth table, which for example for {$a && $b} looks like:

As all but the last column just enumerate all possible combinations of the arguments, first column least-significant, the full representation of a&&b is the last column, a sequence of 0s and 1s which can be seen as binary integer, reading from bottom up: 1 0 0 0 == 8. So 8 is the associated integer of a&&b, but not only of this — we get the same integer for !(!a || !b), but then again, these functions are equivalent.

To try this in Tcl, here's a truth table generator that I borrowed from a little proving engine, but without the lsort used there — the order of cases delivered makes best sense when the first bit is least significant: }

proc truthtable n {
   # make a list of 2**n lists, each with n truth values 0|1
   set res {}
   for {set i 0} {$i < (1<<$n)} {incr i} {
       set case {}
       for {set j  0} {$j <$n} {incr j } {
           lappend case [expr {($i & (1<<$j)) != 0}]
       }
       lappend res $case
   }
   set res
}

Now we can write n(f), which, given a Boolean function of one or more arguments, returns its characteristic number, by iterating over all cases in the truth table, and setting a bit where appropriate:

proc n(f) expression {
   set vars [lsort -unique [regsub -all {[^a-zA-Z]} $expression " "]]
   set res 0
   set bit 1
   foreach case [truthtable [llength $vars]] {
       foreach $vars $case break
       set res [expr $res | ((($expression)!=0)*$bit)]
       incr bit $bit ;#-- <<1, or *2
   }
   set res
}

Experimenting:

% n(f) {$a && !$a} ;#-- contradiction is always false
0
% n(f) {$a || !$a} ;#-- tautology is always true
3
% n(f) {$a}        ;#-- identity is boring
2
% n(f) {!$a}       ;#-- NOT
1
% n(f) {$a && $b}  ;#-- AND
8
% n(f) {$a || $b}  ;#-- OR
14
% n(f) {!($a && $b)} ;#-- de Morgan's laws:
7
% n(f) {!$a || !$b}  ;#-- same value = equivalent
7

So the characteristic integer is not the same as the Goedel number of a function, which would encode the structure of operators used there.

% n(f) {!($a || $b)} ;#-- interesting: same as unary NOT
1
% n(f) {!$a && !$b}
1

Getting more daring, let's try a distributive law:

% n(f) {$p && ($q || $r)}
168
% n(f) {($p && $q) || ($p && $r)}
168

Daring more: what if we postulate the equivalence?

% n(f) {(($p && $q) || ($p && $r)) == ($p && ($q || $r))}
255

Without proof, I just claim that every function of n arguments whose characteristic integer is 2^(2^n) — 1 is a tautology (or a true statement — all bits are 1). Conversely, postulating non-equivalence turns out to be false in all cases, hence a contradiction:

% n(f) {(($p && $q) || ($p && $r)) != ($p && ($q || $r))}
0

So again, we have a little proving engine, and simpler than last time.

In the opposite direction, we can call a Boolean function by its number and provide one or more arguments — if we give more than the function can make sense of, non-false excess arguments lead to constant falsity, as the integer can be considered zero-extended:

proc f(n) {n args} {
   set row 0
   set bit 1
   foreach arg $args {
       set row [expr {$row | ($arg != 0)*$bit}]
       incr bit $bit
   }
   expr !!($n &(1<<$row))
}

Trying again, starting at OR (14):

% f(n) 14 0 0
0
% f(n) 14 0 1
1
% f(n) 14 1 0
1
% f(n) 14 1 1
1

So f(n) 14 indeed behaves like the OR function — little surprise, as its truth table (the results of the four calls), read bottom-up, 1110, is decimal 14 (8 + 4 + 2). Another test, inequality:

% n(f) {$a != $b}
6
% f(n) 6 0 0
0
% f(n) 6 0 1
1
% f(n) 6 1 0
1
% f(n) 6 1 1
0

Trying to call 14 (OR) with more than two args:

% f(n) 14 0 0 1
0
% f(n) 14 0 1 1
0
53 % f(n) 14 1 1 1
0

The constant 0 result is a subtle indication that we did something wrong :)

Implication (if a then b, a -> b) can in expr be expressed as $a <= $b — just note that the "arrow" seems to point the wrong way. Let's try to prove "Modus Barbara" — "if a implies b and b implies c, then a implies c":

% n(f) {(($a <= $b) && ($b <= $c)) <= ($a <= $c)}
255

With less abstract variable names, one might as well write

% n(f) {(($Socrates <= $human) && ($human <= $mortal)) <= ($Socrates <= $mortal)}
255

But this has been verified long ago, by Socrates' death :^)

Let unknown know

To extend Tcl, i.e. to make it understand and do things that before raised an error, the easiest way is to write a proc. Any proc must however be called in compliance with Tcl's fundamental syntax: first word is the command name, then the arguments separated by whitespace. Deeper changes are possible with the unknown command, which is called if a command name is, well, unknown, and in the standard version tries to call executables, to auto-load scripts, or do other helpful things (see the file init.tcl). One could edit that file (not recommended), or rename unknown to something else and provide one's own unknown handler, that falls through to the original proc if unsuccessful, as shown in Radical language modification.

Here is a simpler way that allows to extend unknown "in place" and incrementally: We let unknown "know" what action it shall take under what conditions. The know command is called with a condition that should result in an integer when given to expr, and a body that will be executed if cond results in nonzero, returning the last result if not terminated with an explicit return. In both cond and body you may use the variable args that holds the problem command unknown was invoked with.

proc know what {
   if ![info complete $what] {error "incomplete command(s) $what"}
   proc unknown args $what\n[info body unknown]
} ;# RS

The extending code what is prepended to the previous unknown body. This means that subsequent calls to know stack up, last condition being tried first, so if you have several conditions that fire on the same input, let them be "known" from generic to specific.

Here's a little debugging helper, to find out why "know" conditions don't fire:

proc know? {} {puts [string range [info body unknown] 0 511]}

Now testing what new magic this handful of code allows us to do. This simple example invokes expr if the "command" is digestible for it:

% know {if {![catch {expr $args} res]} {return $res}}
% 3+4
7

If we had no if

Imagine the makers of Tcl had failed to provide the if command. All the rest would be there. Doing more steps towards functional programming, I came upon this interesting problem, and will shortly demonstrate that it can easily be solved in pure-Tcl.

We still have the canonical truth values 0 and 1 as returned from expr with a comparison operator. The idea in the paper I read is to use them as names of very simple functions:

proc 0 {then else} {uplevel 1 $else}
proc 1 {then else} {uplevel 1 $then} ;# the famous K combinator

Glory be to the 11 rules of man Tcl that this is already a crude though sufficient reimplementation:

set x 42
[expr $x<100] {puts Yes} {puts No}

The bracketed expr command is evaluated first, returning 0 or 1 as result of the comparison. This result (0 or 1) is substituted for the first word of this command. The other words (arguments) are not substituted because they're curly-braced, so either 0 or 1 is invoked, and does its simple job. (I used uplevel instead of eval to keep all side effects in caller's scope). Formally, what happened to the bracketed call is that it went through "applicative order" evaluation (i.e., do it now), while the braced commands wait for "normal order" evaluation (i.e., do when needed, maybe never — the need is expressed through eval/upvar or similar commands).

Though slick at first sight, we actually have to type more. As a second step, we create the If command that wraps the expr invocation:

proc If {cond then else} {
   [uplevel 1 [list expr ($cond)!=0]] {uplevel 1 $then} {uplevel 1 $else}
}
If {$x>40} {puts Indeed} {puts "Not at all"}

This again passes impromptu tests, and adds the feature that any non-zero value counts as true and returns 1 — if we neglect the other syntactic options of if, especially the elseif chaining. However, this is no fundamental problem — consider that

if A then B elseif C then D else E

can be rewritten as

if A then B else {if C then D else E}

so the two-way If is about as mighty as the real thing, give or take a few braces and redundant keywords (then, else).

Luckily we have an if in Tcl (and it certainly fares better in byte-code compilation), but on leisurely evenings it's not the microseconds that count (for me at least) — it's rather reading on the most surprising (or fundamental) ideas, and demonstrating how easily Tcl can bring them to life...

Brute force meets Goedel

Never afraid of anything (as long as everything is a string), a discussion in the Tcl chatroom brought me to try the following: let the computer write ("discover") its own software, only given specifications of input and output. In truly brute force, up to half a million programs are automatically written and (a suitable subset of them) tested to find the one that passes the tests.

To make things easier, this flavor of "software" is in a very simple RPN language similar to, but much smaller than, the one presented in Playing bytecode: stack-oriented like Forth, each operation being one byte (ASCII char) wide, so we don't even need whitespace in between. Arguments are pushed on the stack, and the result of the "software", the stack at end, is returned. For example, in

ebc ++ 1 2 3

execution of the script "++" should sum its three arguments (1+(2+3)), and return 6.

Here's the "bytecode engine" (ebc: execute byte code), which retrieves the implementations of bytecodes from the global array cmd:

proc ebc {code argl} {
   set ::S $argl
   foreach opcode [split $code ""] {
       eval $::cmd($opcode)
   }
   set ::S
}

Let's now populate the bytecode collection. The set of all defined bytecodes will be the alphabet of this little RPN language. It may be interesting to note that this language has truly minimal syntax — the only rule is: each script ("word") composed of any number of bytecodes is well-formed. It just remains to check whether it does what we want.

Binary expr operators can be treated generically:

foreach op {+ - * /} {
   set cmd($op) [string map "@ $op" {swap; push [expr {[pop] @ [pop]}]}]
}

#-- And here's some more hand-crafted bytecode implementations
set cmd(d) {push [lindex $::S end]} ;# dup
set cmd(q) {push [expr {sqrt([pop])}]}
set cmd(^) {push [swap; expr {pow([pop],[pop])}]}
set cmd(s) swap

#-- The stack routines imply a global stack ::S, for simplicity
interp alias {} push {} lappend ::S
proc pop {}  {K [lindex $::S end] [set ::S [lrange $::S 0 end-1]]}
proc K {a b} {set a}
proc swap {} {push [pop] [pop]}

Instead of enumerating all possible bytecode combinations beforehand (which grows exponentially by alphabet and word length), I use this code from Mapping words to integers to step over their sequence, uniquely indexed by an increasing integer. This is something like the Goedel number of the corresponding code. Note that with this mapping, all valid programs (bytecode sequences) correspond to one unique non-negative integer, and longer programs have higher integers associated:

proc int2word {int alphabet} {
   set word ""
   set la [llength $alphabet]
   while {$int > 0} {
       incr int -1
       set word  [lindex $alphabet [expr {$int % $la}]]$word
       set int   [expr {$int/$la}]
   }
   set word
}

Now out for discovery! The toplevel proc takes a paired list of inputs and expected output. It tries in brute force all programs up to the specified maximum Goedel number and returns the first one that complies with all tests:

proc discover0 args {
   set alphabet [lsort [array names ::cmd]]
   for {set i 1} {$i<10000} {incr i} {
       set code [int2word $i $alphabet]
       set failed 0
       foreach {inputs output} $args {
           catch {ebc $code $inputs} res
           if {$res != $output} {incr failed; break}
       }
       if {!$failed} {return $code}
   }
}

But iterating over many words is still pretty slow, at least on my 200 MHz box, and many useless "programs" are tried. For instance, if the test has two inputs and wants one output, the stack balance is -1 (one less out than in). This is provided e.g. by one the binary operators +-*/. But the program "dd" (which just duplicates the top of stack twice) has a stack balance of +2, and hence can never pass the example test. So, on a morning dogwalk, I thought out this strategy:

measure the stack balance for each bytecode
iterate once over very many possible programs, computing their stack balance
partition them (put into distinct subsets) by stack balance
perform each 'discovery' call only on programs of matching stack balance

Here's this version. Single bytecodes are executed, only to measure their effect on the stack. The balance of longer programs can be computed by just adding the balances of their individual bytecodes:

proc bc'stack'balance bc {
   set stack {1 2} ;# a bytecode will consume at most two elements
   expr {[llength [ebc $bc $stack]]-[llength $stack]}
}
proc stack'balance code {
   set res 0
   foreach bc [split $code ""] {incr res $::balance($bc)}
   set res
}

The partitioning will run for some seconds (depending on nmax — I tried with several ten thousand), but it's needed only once. The size of partitions is further reduced by excluding programs which contain redundant code, that will have no effect, like swapping the stack twice, or swapping before an addition or multiplication. A program without such extravaganzas is shorter and yet does the same job, so it will have been tested earlier anyway.

proc partition'programs nmax {
   global cmd partitions balance
   #-- make a table of bytecode stack balances
   set alphabet [array names cmd]
   foreach bc $alphabet {
       set balance($bc) [bc'stack'balance $bc]
   }
   array unset partitions ;# for repeated sourcing
   for {set i 1} {$i<=$nmax} {incr i} {
       set program [int2word $i $alphabet]
       #-- "peephole optimizer" - suppress code with redundancies
       set ok 1
       foreach sequence {ss s+ s*} {
           if {[string first $sequence $program]>=0} {set ok 0}
       }
       if {$ok} {
           lappend partitions([stack'balance $program]) $program
       }
   }
   set program ;# see how far we got
}

The discoverer, Second Edition, determines the stack balance of the first text, and tests only those programs of the same partition:

proc discover args {
   global partitions
   foreach {in out} $args break
   set balance [expr {[llength $out]-[llength $in]}]
   foreach code $partitions($balance) {
       set failed 0
       foreach {input output} $args {
           catch {ebc $code $input} res
           if {$res != $output} {incr failed; break}
       }
       if {!$failed} {return $code}
   }
}

But now for the trying. The partitioning helps very much in reducing the number of candidates. For the 1000 programs with Goedel numbers 1..1000, it retains only a fraction for each stack balance:

-2: 75 
-1: 155 (this and 0 will be the most frequently used) 
0: 241 
1: 274 
2: 155 
3: 100

Simple starter — discover the successor function (add one):

% discover 5 6  7 8
dd/+

Not bad: duplicate the number twice, divide by itself to get the constant 1, and add that to the original number. However, it fails to work if we add the successor of 0 as another test case:

% discover 5 6  7 8  0 1

Nothing coming — because zero division made the last test fail. If we give only this test, another solution is found:

% discover 0 1
d^

"Take x to the x-th" power" — pow(0,0) gives indeed 1, but that's not the generic successor function.

More experiments to discover the hypot() function:

% discover {4 3} 5
d/+

Hm — the 3 is duplicated, divided by itself (=1), which is added to 4. Try to swap the inputs:

% discover {3 4} 5
q+

Another dirty trick: get square root of 4, add to 3 — presto, 5. The correct hypot() function would be

d*sd*+q

but my program set (nmax=30000) ends at 5-byte codes, so even by giving another test to force discovery of the real thing, it would never reach a 7-byte code. OK, I bite the bullet, set nmax to 500000, wait 5 minutes for the partitioning, and then:

% discover {3 4} 5  {11 60}  61
sd/+

Hm.. cheap trick again — it was discovered that the solution is just the successor of the second argument. Like in real life, test cases have to be carefully chosen. So I tried with another a^2+b^2=c^2 set, and HEUREKA! (after 286 seconds):

% discover {3 4} 5  {8 15} 17
d*sd*+q

After partitioning, 54005 programs had the -1 stack balance, and the correct result was on position 48393 in that list...

And finally, with the half-million set of programs, here's a solution for the successor function too:

% discover  0 1  4711 4712
ddd-^+

"d-" subtracts top of stack from itself, pushing 0; the second duplicate to the 0-th power gives 1, which is added to the original argument. After some head-scratching, I find it plausible, and possibly it is even the simplest possible solution, given the poorness of this RPN language.

Lessons learned:

Brute force is simple, but may demand very much patience (or faster hardware)
The sky, not the skull is the limit what all we can do with Tcl :)

Object orientation

OO (Object Orientation) is a style in programming languages popular since Smalltalk, and especially C++, Java, etc. For Tcl, there have been several OO extensions/frameworks (incr Tcl, XOTcl, stooop, Snit to name a few) in different flavors, but none can be considered as standard followed by a majority of users. However, most of these share the features

classes can be defined, with variables and methods
objects are created as instances of a class
objects are called with messages to perform a method

Of course, there are some who say: "Advocating object-orientated programming is like advocating pants-oriented clothing: it covers your behind, but often doesn't fit best" ...

Bare-bones OO

Quite a bunch of what is called OO can be done in pure Tcl without a "framework", only that the code might look clumsy and distracting. Just choose how to implement instance variables:

in global variables or namespaces
or just as parts of a transparent value, with TOOT

The task of frameworks, be they written in Tcl or C, is just to hide away gorey details of the implementation — in other words, sugar it :) On the other hand, one understands a clockwork best when it's outside the clock, and all parts are visible — so to get a good understanding of OO, it might be most instructive to look at a simple implementation.

As an example, here's a Stack class with push and pop methods, and an instance variable s — a list that holds the stack's contents:

namespace eval Stack {set n 0}

proc Stack::Stack {} { #-- constructor
  variable n
  set instance [namespace current]::[incr n]
  namespace eval $instance {variable s {}}
  interp alias {} $instance {} ::Stack::do $instance
}

The interp alias makes sure that calling the object's name, like

::Stack::1 push hello

is understood and rerouted as a call to the dispatcher below:

::Stack::do ::Stack::1 push hello

The dispatcher imports the object's variables (only s here) into local scope, and then switches on the method name:

proc Stack::do {self method args} { #-- Dispatcher with methods
  upvar #0 ${self}::s s
  switch -- $method {
      push {eval lappend s $args}
      pop  {
          if ![llength $s] {error "stack underflow"}
          K [lindex $s end] [set s [lrange $s 0 end-1]]
      }
      default {error "unknown method $method"}
  }
}
proc K {a b} {set a}

A framework would just have to make sure that the above code is functionally equivalent to, e.g. (in a fantasy OO style):

class Stack {
   variable s {}
   method push args {eval lappend s $args}
   method pop {} {
          if ![llength $s] {error "stack underflow"}
          K [lindex $s end] [set s [lrange $s 0 end-1]]
   }
}

which, I admit, reads definitely better. But bare-bones has its advantages too: in order to see how a clockwork works, you'd better have all parts visible :)

Now testing in an interactive tclsh:

% set s [Stack::Stack] ;#-- constructor
::Stack::1             ;#-- returns the generated instance name

% $s push hello
hello
% $s push world
hello world

% $s pop
world
% $s pop
hello
% $s pop
stack underflow       ;#-- clear enough error message

% namespace delete $s ;#-- "destructor"

TOOT: transparent OO for Tcl

Transparent OO for Tcl, or TOOT for short, is a very amazing combination of Tcl's concept of transparent values, and the power of OO concepts. In TOOT, the values of objects are represented as a list of length 3: the class name (so much for "runtime type information" :-), a "|" as separator and indicator, and the values of the object, e.g.

{class | {values of the object}}

Here's my little take on toot in a nutshell. Classes in C++ started out as structs, so I take a minimal struct as example, with generic get and set methods. We will export the get and set methods:

namespace eval toot {namespace export get set}

proc toot::struct {name members} {
   namespace eval $name {namespace import -force ::toot::*}
   #-- membership information is kept in an alias:
   interp alias {} ${name}::@ {} lsearch $members
}

The two generic accessor functions will be inherited by "struct"s

proc toot::get {class value member} {
   lindex $value [${class}::@ $member]
}

The set method does not change the instance (it couldn't, as it sees it only "by value") — it just returns the new composite toot object, for the caller to do with it what he wants:

proc toot::set {class value member newval} {
   ::set pos [${class}::@ $member]
   list $class | [lreplace $value $pos $pos $newval]
}

For the whole thing to work, here's a simple overloading of unknown — see "Let unknown know". It augments the current unknown code, at the top, with a handler for

{class | values} method args

patterns, which converts it to the form

::toot::(class)::(method) (class) (values) (args)

and returns the result of calling that form:

proc know what {proc unknown args $what\n[info body unknown]}

Now to use it (I admit the code is no easy reading):

know {
   set first [lindex $args 0]
   if {[llength $first]==3 && [lindex $first 1] eq "|"} {
       set class [lindex $first 0]
       return [eval ::toot::${class}::[lindex $args 1] \
           $class [list [lindex $first 2]] [lrange $args 2 end]]
   }
}

Testing: we define a "struct" named foo, with two obvious members:

toot::struct foo {bar grill}

Create an instance as pure string value:

set x {foo | {hello world}}
puts [$x get bar] ;# -> hello (value of the "bar" member)

Modify part of the foo, and assign it to another variale:

set y [$x set grill again]
puts $y ;# -> foo | {hello again}

Struct-specific methods can be just procs in the right namespace. The first and second arguments are the class (disregarded here, as the dash shows) and the value, the rest is up to the coder. This silly example demonstrates member access and some string manipulation:

proc toot::foo::upcase {- values which string} {
   string toupper [lindex $values [@ $which]]$string
}

puts [$y upcase grill !] ;# -> AGAIN!

A little deterministic Turing machine

At university, I never learned much about Turing machines. Only decades later, a hint in the Tcl chatroom pointed me to http://csc.smsu.edu/~shade/333/project.txt , an assignment to implement a Deterministic Turing Machine (i.e. one with at most one rule per state and input character), which gives clear instructions and two test cases for input and output, so I decided to try my hand in Tcl.

Rules in this little challenge are of the form a bcD e, where

a is the state in which they can be applied
b is the character that must be read from tape if this rule is to apply
c is the character to write to the tape
D is the direction to move the tape after writing (R(ight) or L(eft))
e is the state to transition to after the rule was applied

Here's my naive implementation, which takes the tape just as the string it initially is. I only had to take care that when moving beyond its ends, I had to attach a space (written as _) on that end, and adjust the position pointer when at the beginning. Rules are also taken as strings, whose parts can easily be extracted with string index — as it's used so often here, I alias it to @. }

proc dtm {rules tape} {
   set state 1
   set pos 0
   while 1 {
       set char [@ $tape $pos]
       foreach rule $rules {
           if {[@ $rule 0] eq $state && [@ $rule 2] eq $char} {
               #puts rule:$rule,tape:$tape,pos:$pos,char:$char
               #-- Rewrite tape at head position.
               set tape [string replace $tape $pos $pos [@ $rule 3]]
               #-- Move tape Left or Right as specified in rule.
               incr pos [expr {[@ $rule 4] eq "L"? -1: 1}]
               if {$pos == -1} {
                   set pos 0
                   set tape _$tape
               } elseif {$pos == [string length $tape]} {
                   append tape _
               }
               set state [@ $rule 6]
               break
           }
       }
       if {$state == 0} break
   }
   #-- Highlight the head position on the tape.
   string trim [string replace $tape $pos $pos \[[@ $tape $pos]\]] _
}

interp alias {} @ {} string index

Test data from http://csc.smsu.edu/~shade/333/project.txt

set rules {
   {1 00R 1}
   {2 01L 0}
   {1 __L 2}
   {2 10L 2}
   {2 _1L 0}
   {1 11R 1}
}
set tapes {
   0
   10011
   1111
}
set rules2 {
   {3 _1L 2}
   {1 _1R 2}
   {1 11L 3}
   {2 11R 2}
   {3 11R 0}
   {2 _1L 1}
}
set tapes2 _

Testing:

foreach tape $tapes {puts [dtm $rules $tape]}
puts *
puts [dtm $rules2 $tapes2]

reports the results as wanted in the paper, on stdout:

>tclsh turing.tcl
[_]1
1[0]100
[_]10000
*
1111[1]1

Streams

Streams are a powerful concept in (not only functional) programming. In SICP chapter 3.5, streams are introduced as data structures characterized as "delayed lists", whose elements are produced and returned only on demand (deferred evaluation). This way, a stream can promise to be a potentially endless source of data, while taking only finite time to process and deliver what's really wanted. Other streams may provide a finite but very large number of elements, which would be impractical to process in one go. In Tcl, the two ways of reading a file are a good example:

read $fp returns the whole contents, which then can be processed;
while {[gets $fp line]>-1} {...} reads line by line, interleaved with processing

The second construct may be less efficient, but is robust for gigabyte-sized files. A simpler example is pipes in Unix/DOS (use TYPE for cat there):

cat foo.bar | more

where the "cat" delivers lines of the file as long as "more" will take them, and waits otherwise (after all, stdin and stdout are just streams...). Such process chains can be emulated in Tcl with the following rules:

A stream is modelled here as a procedure that returns one stream item on each call. The special item "" (the empty string) indicates that the stream is exhausted. Streams are interesting if they don't deliver the same result on every call, which requires them to maintain state between calls e.g. in static variables (here implemented with the fancy remember proc) — examples are intgen that delivers ever increasing integers, or gets $fp where the file pointer advances at each call, so potentially all lines of the file are returned over time.

A filter takes one or more streams, and possibly other arguments, and reacts like a stream too. Hence, streams can be (and typically are) nested for processing purposes. If a filter meets end-of-stream, it should return that too. Filters may be characterized as "selectors" (who may return only part of their input, like "grep") and/or "appliers" who call a command on their input and return the result. Note that on infinite streams, selectors may never return, e.g. if you want the second even prime... Streams in general should not be written in brackets (then the Tcl parser would eagerly evaluate them before evaluating the command), but braced, and stream consumers eval the stream at their discretion.

Before we start, a word of warning: maintaining state of a procedure is done with default arguments that may be rewritten. To prevent bugs from procedures whose defaults have changed, I've come up with the following simple architecture — procs with static variables are registered as "sproc"s, which remembers the initial defaults, and with a reset command you can restore the initial values for one or all sprocs:

proc sproc {name head body} {
   set ::sproc($name) $head
   proc $name $head $body
}

proc reset { {what *}} {
   foreach name [array names ::sproc $what] {
       proc $name $::sproc($name) [info body $name]
   }
}

Now let's start with a simple stream source, "cat", which as a wrapper for gets returns the lines of a file one by one until exhausted (EOF), in which case an empty string is returned (this requires that empty lines in the files, which would look similarly, are represented as a single blank):

sproc cat {filename {fp {}} } {
   if {$fp==""} {
       remember fp [set fp [open $filename]]
   }
   if {[gets $fp res]<0} {
       remember fp [close $fp] ;# which returns an empty string ;-)
   } elseif {$res==""} {set res " "} ;# not end of stream!
   set res
}

proc remember {argn value} {
   # - rewrite a proc's default arg with given value
   set procn [lindex [info level -1] 0] ;# caller's name
   set argl {}
   foreach arg [info args $procn] {
       if [info default $procn $arg default] {
           if {$arg==$argn} {set default $value}
           lappend argl [list $arg $default]
       } else {
           lappend argl $arg
       }
   }
   proc $procn $argl [info body $procn]
   set value
}
# This simple but infinite stream source produces all positive integers:
sproc intgen { {seed -1}} {remember seed [incr seed]}

# This produces all (well, very many) powers of 2:
sproc powers-of-2 { {x 0.5}} {remember x [expr $x*2]}

# A filter that reads and displays a stream until user stops it:
proc more {stream} {
   while 1 {
       set res [eval $stream]
       if {$res==""} break ;# encountered end of stream
       puts -nonewline $res; flush stdout
       if {[gets stdin]=="q"} break
   }
}

Usage example:

more {cat streams.tcl}

which crudely emulates the Unix/DOS pipe mentioned above (you'll have to hit ↵ Enter after every line, and q↵ Enter to quit..). more is the most important "end-user" of streams, especially if they are infinite. Note however that you need stdin for this implementation, which excludes wishes on Windows (one might easily write a UI-more that reacts on mouse clicks, though).

A more generic filter takes a condition and a stream, and on each call returns an element of the input stream where the condition holds — if ever one comes along:

proc filter {cond stream} {
   while 1 {
       set res [eval $stream]
       if {$res=="" || [$cond $res]} break
   }
   set res
}

# Here is a sample usage with famous name:
proc grep {re stream} {
   filter [lambda [list x [list re $re]] {regexp $re $x}] $stream
}

#.... which uses the (less) famous function maker:
proc lambda {args body} {
   set name [info level 0]
   proc $name $args $body
   set name
}
# Usage example: more {grep this {cat streams.tcl}}

Friends of syntactic sugar might prefer shell style:

$ cat streams.tcl | grep this | more

and guess what, we can have that in Tcl too (and not in Scheme !-), by writing a proc, that also resets all sprocs, with the fancy name "$" (in Unix, this could be the shell prompt that you don't type, but for Tcl we always have to have the command name as first word):

proc $ args {
    reset
    set cmd {}
    foreach arg $args {
       if {$arg != "|"} {
           lappend tmp $arg
       } else {
           set cmd [expr {$cmd==""? $tmp: [lappend tmp $cmd]}]
           set tmp {}
       }
   }
   uplevel 1 [lappend tmp $cmd]
}

To prove that we haven't cheated by using exec, let's introduce a line counter filter:

sproc -n {stream {n 0}} {
   set res [eval $stream]
   if {$res!=""} {set res [remember n [incr n]]:$res}
}

This can be added to filter chains, to count lines in the original file, or only the results from grep:

$ cat streams.tcl | -n | grep this | more
$ cat streams.tcl | grep this | -n | more

We further observe that more has a similar structure to filter, so we could also rewrite it in terms of that:

proc more2 stream {
   filter [lambda x {
       puts -nonewline $x; flush stdout
       expr {[gets stdin]=="q"}
   }] $stream
}

# Here is another stream producer that returns elements from a list:
sproc streamlist {list {todo {}} {firstTime 1} } {
   if $firstTime {set todo $list; remember firstTime 0}
   remember todo [lrange $todo 1 end]
   lindex $todo 0
}

# This one repeats its list endlessly, so better use it with 'more':
sproc infinite-streamlist {list {todo {}} } {
   initially todo $list
   remember  todo [lrange $todo 1 end]
   lindex   $todo 0
}

# This is sugar for first-time assignment of static variables:
proc initially {varName value} {
   upvar 1 $varName var
   if {$var==""} {set var $value}
}

# But for a simple constant stream source, just use [subst]:
# more {subst 1} ;# will produce as many ones as you wish

# This filter collects its input (should be finite ;-) into a list:
proc collect stream {
   set res {}
   while 1 {
       set element [eval $stream]
       if {$element==""} break
       lappend res $element
   }
   set res
}

The sort filter is unusual in that it consumes its whole (finite!) input, sorts it, and acts as a stream source on the output:

sproc sort {stream {todo {}} {firstTime 1}} {
   if $firstTime {
       set todo [lsort [collect $stream]]
       remember firstTime 0
   }
   remember todo [lrange $todo 1 end]
   lindex $todo 0
}
# $ streamlist {foo bar grill a} | sort | collect => a bar foo grill

proc apply {f stream} {$f [eval $stream]}

#... This can be plugged into a filter chain to see what's going on:
proc observe stream {apply [lambda y {puts $y; set y}] $stream}

# ... or, to get a stream of even numbers, starting from 0:
more {apply [lambda x {expr $x*2}] intgen}

Now for the example in SICP: find the second prime in the interval between 10000 and 1000000.

sproc interval {from to {current {}} } {
   initially current $from
   if {$current<=$to} {
       remember current [expr $current+1]
   }
}
proc prime? x {
   if {$x<2} {return 0}
   set max [expr sqrt($x)]
   set try 2
   while {$try<=$max} {
       if {$x%$try == 0} {return 0}
       incr try [expr {2-($try==2)}]
   }
   return 1
}
proc stream-index {stream index} {
   for {set i 0} {$i<=$index} {incr i} {
       set res [eval $stream]
   }
   set res
}
sproc stream-range {stream from to {pos 0}} {
   while {$pos<$from} {
       set res [eval $stream] ;# ignore elements before 'from'
       if {$res==""} return   ;# might be end-of-stream
       incr pos
   }
   if {$to!="end" && $pos > $to} return
   remember pos [incr pos]
   eval $stream
}

stream-index {filter prime? {interval 10000 1000000}} 1 ==> 10009

Another idea from SICP is a "smoothing" function, that averages each pair of values from the input stream. For this we need to introduce a short-term memory also in the filter:

sproc average {stream {previous {}} } {
   if {$previous==""} {set previous [eval $stream]}
   remember previous [set current [eval $stream]]
   if {$current!=""} {expr {($previous+$current)/2.}}
}

which, tested on a n-element stream, returns n-1 averages:

collect {average {streamlist {1 2 3 4 5}}} ==> 1.5 2.5 3.5 4.5

Yet another challenge was to produce an infinite stream of pairs {i j} of positive integers, i <= j, ordered by their sum, so that more pairs produces consecutively

{1 1} {1 2} {1 3} {2 2} {1 4} {2 3} {1 5} {2 4} {3 3} {1 6} ...

Here's my solution which does that:

sproc pairs { {last {}} } {
   if {$last==""} {
       set last [list 1 1] ;# start of iteration
   } else {
       foreach {a b} $last break
       if {$a >= $b-1} {
           set last [list 1 [expr {$a+$b}]] ;# next sum level
       } else {
           set last [list [incr a] [incr b -1]]
       }
   }
   remember last $last
}

Ramanujan numbers: The pairs generator can be used to find Ramanujan numbers, which can be represented as the sum of two integer cubes in more than one way. Here I use a global array for recording results:

sproc Ramanujan {stream {firstTime 1}} {
   if $firstTime {unset ::A; remember firstTime 0}
   while 1 {
       set pair [eval $stream]
       foreach {a b} $pair break
       set n [expr {$a*$a*$a + $b*$b*$b}]
       if [info exists ::A($n)] {
           lappend ::A($n) $pair
           break
       } else {set ::A($n) [list $pair]}
   }
   list $n $::A($n)
}

more {Ramanujan pairs} ;# or: $ pairs | Ramanujan | more

delivers in hardly noticeable time the R. numbers 1729, 4104, 13832... Or, how's this infinite Fibonacchi number generator, which on more fibo produces all the F.numbers (0,1,1,2,3,5,8,13,21...) you might want?

sproc fibo { {a ""} {b ""}} {
   if {$a==""} {
       remember a 0
   } elseif {$b==""} {
       remember b 1
   } else {
       if {$b > 1<<30} {set b [expr double($b)]}
       remember a $b
       remember b [expr $a+$b]
   }
}

Discussion: With the above code, it was possible to reproduce quite some behavior of streams as documented in SICP, not as data structures but with Tcl procs (though procs are data too, in some sense...). What's missing is the capability to randomly address parts of a stream, as is possible in Scheme (and of course their claim to do without assignment, or mutable data...) Tcl lists just don't follow LISP's CAR/CDR model (though KBK demonstrated in Tcl and LISP that this structure can be emulated, also with procs), but rather C's flat *TclObject[] style. The absence of lexical scoping also led to constructs like sproc/reset, which stop a gap but aren't exactly elegant — but Tcl's clear line between either local or global variables allows something like closures only by rewriting default arguments like done in remember (or like in Python).

Don't take this as a fundamental critique of Tcl, though — its underlying model is far more simple and elegant than LISP's (what with "special forms", "reader macros"...), and yet powerful enough to do just about everything possible...

Playing with Laws of Form

After many years, I re-read

G. Spencer-Brown, "Laws of Form". New York: E.P. Dutton 1979

which is sort of a mathematical thriller, if you will. Bertrand Russell commented that the author "has revealed a new calculus, of great power and simplicity" (somehow sounds like Tcl ;^). In a very radical simplification, a whole world is built up by two operators, juxtaposition without visible symbol (which could be likened to or) and a overbar-hook (with the meaning of not) that I can't type here — it's a horizontal stroke over zero or more operands, continued at right by a vertical stroke going down to the baseline. In these Tcl experiments, I use "" for "" and angle-brackets <> for the overbar-hook (with zero or more operands in between).

One point that was new for me is that the distinction between operators and operands is not cast in stone. Especially constants (like "true" and "false" in Boolean algebras) can be equally well expressed as neutral elements of operators, if these are considered variadic, and having zero arguments. This makes sense, even in Tcl, where one might implement them as

proc and args {
   foreach arg $args {if {![uplevel 1 expr $arg]} {return 0}}
   return 1
}

proc or args {
   foreach arg $args {if {[uplevel 1 expr $arg]} {return 1}}
   return 0
}

which, when called with no arguments, return 1 or 0, respectively. So [or] == 0 and [and] == 1. In Spencer-Brown's terms, [] (which is "", the empty string with no arguments) is false ("nil" in LISP), and [<>] is the negation of "", i.e. true. His two axioms are:

<><> == <> "to recall is to call       -- (1 || 1) == 1"
<<>> ==    "to recross is not to cross -- !!0 == 0"

and these can be implemented by a string map that is repeated as long as it makes any difference (sort of a trampoline) to simplify any expression consisting only of operators and constants (which are operators with zero arguments):

proc lf'simplify expression {
   while 1 {
       set res [string map {<><> <> <<>> ""} $expression]
       if {$res eq $expression} {return $res}
       set expression $res
   }
}

Testing:

% lf'simplify <<><>><>
<>

which maps <><> to <>, <<>> to "", and returns <> for "true".

% lf'simplify <a>a
<a>a

In the algebra introduced here, with a variable "a", no further simplification was so far possible. Let's change that — "a" can have only two values, "" or <>, so we might try to solve the expression by assuming all possible values for a, and see if they differ. If they don't, we have found a fact that isn't dependent on the variable's value, and the resulting constant is returned, otherwise the unsolved expression:

proc lf'solve {expression var} {
   set results {}
   foreach value {"" <>} {
       set res [lf'simplify [string map [list $var $value] $expression]]
       if {![in $results $res]} {lappend results $res}
       if {[llength $results] > 1} {return $expression}
   }
   set results
}

with a helper function in that reports containment of an element in a list:

proc in {list element} {expr {[lsearch -exact $list $element] >= 0}}

Testing:

% lf'solve <a>a a
<>

which means, in expr terms, {(!$a || $a) == 1}, for all values of a. In other words, a tautology. All of Boole's algebra can be expressed in this calculus:

* (1) not a       == !$a       == <a>
* (2) a or b      == $a || $b  == ab
* (3) a and b     == $a && $b  == <<a>&lt;b&gt;>
* (4) a implies b == $a <= $b  == <a>b

We can test it with the classic "ex contradictione quodlibet" (ECQ) example — "if p and not p, then q" for any q:

% lf'solve <&lt;p><&lt;p>>>q p
q

So formally, q is true, whatever it is :) If this sounds overly theoretic, here's a tricky practical example in puzzle solving, Lewis Carroll's last sorites (pp. 123f.). The task is to conclude something from the following premises:

The only animals in this house are cats
Every animal is suitable for a pet, that loves to gaze at the moon
When I detest an animal, I avoid it
No animals are carnivorous, unless they prowl at night
No cat fail to kill mice
No animals ever take to me, except what are in this house
Kangaroos are not suitable for pets
None but carnivora kill mice
I detest animals that do not take to me
Animals that prowl at night always love to gaze at the moon

These are encoded to the following one-letter predicates:

a: avoided by me
c: cat
d: detested by me
h: house, in this
k: kill mice
m: moon, love to gaze at
n: night, prowl at
p: pet, suitable for
r: (kanga)roo
t: take to me
v: (carni)vorous

So the problem set can be restated, in Spencer-Brown's terms, as

<h>c <m>p <d>a <v>n <c>k <t>h <r><p> <k>v td <n>m

I first don't understand why all premises can be just written in a row, which amounts to implicit "or", but it seems to work out well. As we've seen that <x>x is true for any x, we can cancel out such tautologies. For this, we reformat the expression to a list of values of type x or !x, that is in turn dumped into a local array for existence checking. And when both x and !x exist, they are removed from the expression:

proc lf'cancel expression {
   set e2 [string map {"< " ! "> " ""} [split $expression ""]]
   foreach term $e2 {if {$term ne ""} {set a($term) ""}}
   foreach var [array names a ?] {
       if [info exists a(!$var)] {
           set expression [string map [list <$var> "" $var ""] $expression]
       }
   }
   set expression
}

puts [lf'cancel {<h>c <m>p <d>a <v>n <c>k <t>h <r>&lt;p> <k>v td <n>m}]

which results in:

a <r>

translated back: "I avoid it, or it's not a kangaroo", or, reordered, "<r> a" which by (4) means, "All kangaroos are avoided by me".

A little IRC chat bot

Here is a simple example of a "chat bot" — a program that listens on an IRC chatroom, and sometimes also says something, according to its programming. The following script

connects to channel #tcl on IRC
listens to what is said
if someone mentions its name (minibot), tries to parse the message and answer.

#!/usr/bin/env tclsh
set ::server irc.freenode.org
set ::chan   #tcl
set ::me     minibot
proc recv {} {
    gets $::fd line
    puts $line
    # handle PING messages from server
    if {[lindex [split $line] 0] eq "PING"} {
       send "PONG [info hostname] [lindex [split $line] 1]"; return
    }
    if {[regexp {:([^!]*)![^ ].* +PRIVMSG ([^ :]+) +(.*[Mm]inibot)(.+)} $line -> \
        nick target msg cmd]} {
           if {$nick eq "ijchain"} {regexp {<([^>]+)>(.+)} $msg -> nick msg}
           set hit 0
           foreach pattern [array names ::patterns] {
               if [string match "*$pattern*" $cmd] {
                   set cmd [string trim $cmd {.,:? }]
                   if [catch {mini eval $::patterns($pattern) $cmd} res] {
                       set res $::errorInfo
                   }
                   foreach line [split $res \n] {
                       send "PRIVMSG $::chan :$line"
                   }
                   incr hit
                   break
               }
           }
           if !$hit {send "PRIVMSG $::chan :Sorry, no idea."}
    }
}

#----------- Patterns for response:

set patterns(time) {clock format [clock sec] ;#}
set patterns(expr) safeexpr
proc safeexpr args {expr [string map {\[ ( \] ) expr ""} $args]}
set patterns(eggdrop) {set _ "Please check http://wiki.tcl.tk/6601" ;#}
set patterns(toupper) string
set patterns(Windows) {set _ "I'd prefer not to discuss Windows..." ;#}
set {patterns(translate "good" to Russian)} {set _ \u0425\u043E\u0440\u043E\u0448\u043E ;#}
set patterns(Beijing) {set _ \u5317\u4EAC ;#}
set patterns(Tokyo) {set _ \u4E1C\u4EAC ;#}
set {patterns(your Wiki page)} {set _ http://wiki.tcl.tk/20205 ;#}
set patterns(zzz) {set _ "zzz well!" ;#}
set patterns(man) safeman
proc safeman args {return http://www.tcl.tk/man/tcl8.4/TclCmd/[lindex $args 1].htm}
set {patterns(where can I read about)} gotowiki
proc gotowiki args {return "Try http://wiki.tcl.tk/[lindex $args end]"}
set patterns(thank) {set _ "You're welcome." ;#}
set patterns(worry) worry
proc worry args {
   return "Why do [string map {I you my your your my you me} $args]?"
}

#-- let the show begin... :^)
interp create -safe mini
foreach i {safeexpr safeman gotowiki worry} {
    interp alias mini $i {} $i
}
proc in {list element} {expr {[lsearch -exact $list $element]>=0}}
proc send str {puts $::fd $str;flush $::fd}

set ::fd [socket $::server 6667]
fconfigure $fd  -encoding utf-8
send "NICK minibot"
send "USER $::me 0 * :PicoIRC user"
send "JOIN $::chan"
fileevent $::fd readable recv

vwait forever

Examples from the chat:

suchenwi  minibot, which is your Wiki page?
<minibot> http://wiki.tcl.tk/20205
suchenwi  ah, thanks
suchenwi  minibot expr 6*7
<minibot> 42
suchenwi  minibot, what's your local time?
<minibot> Sun Oct 21 01:26:59 (MEZ) - Mitteleurop. Sommerzeit 2007