Introduction to newLISP/Working with files

Working with files edit

Functions for working with files can be grouped into two main categories: interacting with the operating system, and reading and writing data to and from files.

Interacting with the file system edit

The concept of a current working directory is maintained in newLISP. When you start newLISP by typing newLISP in a terminal, your current working directory becomes newLISP's current directory.

$ pwd
/Users/me/projects/programming/lisp
$ newlisp
newLISP v.9.3 on OSX UTF-8, execute 'newlisp -h' for more info.

> (env "PWD")
"/Users/me/projects/programming/lisp"
> (exit)

$ pwd
/Users/me/projects/programming/lisp
$


You can also check your current working directory using the real-path function without arguments:

(real-path)
;-> "/Users/me/projects/programming/lisp"

But when you run a newLISP script from elsewhere, such as from inside a text editor, the current working directory and other settings may be different. It's a good idea, therefore, to use change-dir to establish the current working directory, if you're not sure:

(change-dir "/Users/me/Documents")
;-> true

Other environment variables can be accessed with env:

(env "HOME")
;-> "/Users/me"
(env "USER")
;-> "cormullion"

Again, running newLISP from a text editor rather than in an interactive terminal session will affect which environment variables are available and their values.

Once you have the correct working directory, use directory to list its contents:

(directory)
;-> ("." ".." ".bash_history" ".bash_profile" ".inputrc" ".lpoptions"
; ".sqlite_history" ".ssh" ".subversion" "bin" "Desktop" "Desktop Folder"
; "Documents" "Library" ...

and so on. Notice that it gives you the relative filenames, not the absolute pathnames. directory can list the contents of directories other than the current working directory too, if you supply a path. And in this extended form, you can use regular expressions to filter the contents:

(directory "./")                        ; just a pathname
;-> ("." ".." ".bash_history" ".bash_profile" ".inputrc" ".lpoptions"
; ".sqlite_history" ".ssh" ".subversion" "bin" "Desktop" "Desktop Folder"
; "Documents" "Library" ...

(directory "./" {^[^.]})                ; exclude files starting "."
;-> ("bin" "Desktop" "Desktop Folder" "Documents" "Library" ... )

Again, notice that the results are relative to the current working directory. It's often useful to store the path of the directory that you're listing, in case you have to use it later to build full pathnames. real-path returns the full pathname of a file or directory, either in the current working directory:

(real-path ".subversion")
;-> "/Users/me/.subversion"

or specified by another relative pathname:

(real-path "projects/programming/lisp/lex.lsp")
;-> "/Users/me/projects/programming/lisp/lex.lsp"

To find the containing directory of an item on disk, you can just remove the file name from the full pathname:

(set 'f "lex.lsp")
(replace f (real-path f) "")
;-> "/Users/me/projects/programming/lisp/"

This won't always work, by the way, if the file name appears as a directory name as well earlier in the path. A simple solution is to do a regex search for f occurring only at the very end of the pathname, using the $ option:

(replace (string f "\$") (real-path f) "" 0)

To scan a section of your file system recursively, use a function that calls itself recursively. Here, just the full path name is printed:

(define (search-tree dir)
 (dolist (item (directory dir {^[^.]}))
   (if (directory? (append dir item))
    ; search the directory
    (search-tree (append dir item "/"))
    ; or process the file
    (println (append dir item)))))
    
(search-tree {/usr/share/newlisp/})
/usr/share/newlisp/guiserver/allfonts-demo.lsp
/usr/share/newlisp/guiserver/animation-demo.lsp
...
/usr/share/newlisp/util/newlisp.vim
/usr/share/newlisp/util/syntax.cgi

See also Editing text files in folders and hierarchies.

You'll find some testing functions useful:

  • file? does this file or directory exist?
  • directory? is this pathname a directory or a file?

Remember the difference between relative and absolute pathnames:

(file? "System")
;-> nil
(file? "/System")
;-> true

File information edit

You can get information about a file with file-info. This function asks the operating system about the file and returns the information in a series of numbers:

  • 0 is the size
  • 1 is the mode
  • 2 is the device mode
  • 3 is the user id
  • 4 is the group id
  • 5 the access time
  • 6 the modification time
  • 7 the status change time

To find out the size of files, for example, look at the first number returned by file-info. The following code lists the files in a directory and includes their sizes too.

(set 'dir {/usr/share/newlisp/modules})
(dolist (i (directory dir {^[^.]}))
  (set 'item (string dir "/" i))
  (if (not (directory? item))
      (println (format {%7d %-30s} (nth 0 (file-info item)) item))))
  35935 /usr/share/newlisp/modules/canvas.lsp
   6548 /usr/share/newlisp/modules/cgi.lsp
   5460 /usr/share/newlisp/modules/crypto.lsp
   4577 /usr/share/newlisp/modules/ftp.lsp
  16310 /usr/share/newlisp/modules/gmp.lsp
   4273 /usr/share/newlisp/modules/infix.lsp
  12973 /usr/share/newlisp/modules/mysql.lsp
  16606 /usr/share/newlisp/modules/odbc.lsp
   9865 /usr/share/newlisp/modules/pop3.lsp
  12835 /usr/share/newlisp/modules/postgres.lsp
  31416 /usr/share/newlisp/modules/postscript.lsp
   4337 /usr/share/newlisp/modules/smtp.lsp
  10433 /usr/share/newlisp/modules/smtpx.lsp
  16955 /usr/share/newlisp/modules/sqlite3.lsp
  21807 /usr/share/newlisp/modules/stat.lsp
   7898 /usr/share/newlisp/modules/unix.lsp
   6979 /usr/share/newlisp/modules/xmlrpc-client.lsp
   3366 /usr/share/newlisp/modules/zlib.lsp


Notice that we stored the name of the directory in dir. The directory function returns relative file names, but you must pass absolute pathname strings to file-info unless the string refers to a file that's in the current working directory.

You can use implicit addressing to select the item you want. So instead of (nth 0 (file-info item)), you can write (file-info item 0).

MacOS X: resource forks edit

If you tried the previous script on MacOS X, you might notice that some files are 0 bytes in size. This might indicate the presence of the dual-fork system inherited from the days of the classic (ie old) Macintosh. Use the following version to access the resource forks of files. A good hunting ground for the increasingly rare resource fork is the fonts folder:

(set 'dir "/Users/me/Library/Fonts") ; fonts folder
(dolist (i (directory dir "^[^.]"))
 (set 'item (string dir "/" i))
  (and
    (not (directory? item))          ; don't do folders
    (println 
      (format "%9d DF %-30s" (nth 0 (file-info item)) item))
    (file? (format "%s/..namedfork/rsrc" item)) ; there's a resource fork too
    (println (format "%9d RF" 
      (first (file-info (format "%s/..namedfork/rsrc" item)))))))
...
        0 DF /Users/me/Library/Fonts/AvantGarBoo
    26917 RF  
        0 DF /Users/me/Library/Fonts/AvantGarBooObl
    34982 RF  
        0 DF /Users/me/Library/Fonts/AvantGarDem
    27735 RF  
        0 DF /Users/me/Library/Fonts/AvantGarDemObl
    35859 RF  
        0 DF /Users/me/Library/Fonts/ITC Avant Garde Gothic 1
   116262 RF  
...


File management edit

To manage files, you can use the following functions:

  • rename-file renames a file or directory
  • copy-file copies a file
  • delete-file deletes a file
  • make-dir makes a new directory
  • remove-dir removes an empty directory

For example, to renumber all files in the current working directory so that the files sort by modification date, you could write something like this:

(set 'dir {/Users/me/temp/})
(dolist (i (directory dir {^[^.]}))
 (set 'item (string dir "/" i))
 (set 'mod-date (date (file-info item 6) 0 "%Y%m%d-%H%M%S"))
 (rename-file item (string dir "/" mod-date i)))
;-> before
image-001.png
image-002.png
image-003.png
image-004.png

;-> after
20061116-120534image-001.png
20061116-155127image-002.png
20061117-210447image-003.png
20061118-143510image-004.png


The (file-info item 6) extracts the modification time (item 6) of the result returned by file-info.

Always test scripts like this before you use them for real work! A misplaced punctuation character can wreak havoc.

Reading and writing data edit

newLISP has a good selection of input and output functions.

An easy way of writing text to a file is append-file, which adds a string to the end of a file. The file is created if it doesn't exist. It's very useful for creating log files and files that you write to periodically:

(dotimes (x 10) 
 (append-file "/Users/me/Desktop/log.log" 
 (string (date) " logging " x "\n")))

and there's now a file on my desktop with the following contents:

Sat Sep 26 09:06:08 2009 logging 0
Sat Sep 26 09:06:08 2009 logging 1
Sat Sep 26 09:06:08 2009 logging 2
Sat Sep 26 09:06:08 2009 logging 3
Sat Sep 26 09:06:08 2009 logging 4
Sat Sep 26 09:06:08 2009 logging 5
Sat Sep 26 09:06:08 2009 logging 6
Sat Sep 26 09:06:08 2009 logging 7
Sat Sep 26 09:06:08 2009 logging 8
Sat Sep 26 09:06:08 2009 logging 9


You don't have to worry about opening and closing the file.

To load the contents of a file into a symbol in one gulp, use read-file.

(set 'contents (read-file "/usr/share/newlisp/init.lsp.example"))
;-> 
";; init.lsp - newLISP initialization file\n;; gets loaded automatically on 
; ...
(load (append $HOME \"/.init.lsp\")) 'error))\n\n;;;; end of file ;;;;\n\n\n          "

The symbol contents stores the file's contents as a single string.

open returns a value which acts as a reference or 'handle' to a file. You'll probably want to use the file later, so store the reference in a symbol:

(set 'data-file (open "newfile.data" "read"))  ; in current directory

; and later

(close data-file)

Use read-line to read a file one line at a time from a file handle. Each time you use read-line, the next line is stored in a buffer which you can access with the current-line function. The basic approach for reading a file is like this:

(set 'file (open ((main-args) 2) "read")) ; the argument to the script
(while (read-line file) 
   (println (current-line)))               ; just output the line

read-line discards the line feed at the end of each line. println adds one at the end of the text you supply. For more about argument handling and main-args, see STDIO.

For small to medium-size files, reading a source file line by line is much slower than loading the whole file into memory at once. For example, the source document for this book is about 6000 lines of text, or about 350KBytes. It's about 10 times faster to process the file using read-file and parse, like this:

(set 'source-text (read-file "/Users/me/introduction.txt"))
(dolist (ln (parse source-text "\n" 0))
     (process-line ln))

than using read-line, like this:

(set 'source-file (open "/Users/me/introduction.txt" "read"))
(while (read-line source-file)     
     (process-line (current-line)))

The device function is a handy way of switching output between the console and a file:

(set 'output-file (open "/tmp/file.txt" "write"))
(println "1: this goes to the console")
(device output-file)
(println "2: this goes to the temp file")
(device 0)
(println "3: this goes to the console")
(close output-file)

 

Suppose that your script accepts a single argument, and you want to write the output to a file with the same name but with a .out suffix. Try this:

(device (open (string ((main-args) 2) ".out") "write"))

(set 'file-contents (read-file ((main-args) 2)))

Now you can process the file's contents and output any information using println statements.

The load and save functions are used to load newLISP source code from a file, and to save source code into a file.

The read-line and write-line functions can be used to read and write lines to threads as well as files. See Reading and writing to threads.

Standard input and output edit

To read from STDIO (standard input) and write to STDOUT (standard output), use read-line and println. For example, here's a simple filter that converts standard input to lower-case and outputs it to standard output:

#!/usr/bin/newlisp

(while (read-line) 
   (println (lower-case (current-line))))

(exit)

And it can be run in the shell like this:

$ ./lower-case.lsp 
HI
hi
HI THERE
hi there
...


The following short script is a useful newLISP formatter submitted to the user forum by user Echeam:

#!/usr/bin/env newlisp
(set 'indent "    ")
(set 'level 0)

(while (read-line)
    (if (< level 0) (println "ERROR! Too many close-parenthesis. " level))
    (letn ((ln (trim (current-line))) (fc (first ln)))
        (if (!= fc ")") (println (dup indent level) ln))  ; (indent & print
        (if (and (!= fc ";") (!= fc "#"))     ; don't count if line starts with ; or #
            (set 'level 
              (+ level (apply - (count (explode "()") (explode (current-line)))))))
        (if (= fc ")") (println (dup indent level) ln)))  ; (dedent if close-parenthesis
    )

(if (!= level 0) (println "ERROR! Parenthesis not balanced. " level))
(exit)

which you can run from the command-line like this:

$ format.lsp < inputfile.lsp


Command-line arguments edit

To use newLISP programs on the command line, you can access the arguments with the main-args function. For example, if you create this file:

#!/usr/bin/newlisp 
(println (main-args))
(exit)

make it executable, and then run it in the shell, you'll see a list of the arguments that were supplied to the script when you run it:

$ test.lsp 1 2 3
("/usr/bin/newlisp" "/Users/me/bin/test.lsp" "1" "2" "3")
$


main-args returns a list of the arguments that were passed to your program. The first two arguments, which you probably don't want to process, are the path of the newLISP program, and the pathname of the script being executed:

(main-args)
;-> ("/usr/bin/newlisp" "/path/script.lsp" "1" "2" "3")

So you probably want to process the arguments starting with the one at index 2:

((main-args) 2)
;-> 1

(main-args 2)       ; slightly simpler
;-> 1

which is returned as a string. Or, to process all the arguments, starting at index 2, use a slice:

(2 (main-args))
;-> ("1" "2" "3")

and the arguments are returned in a list of strings.

Often, you'll want to work through all the main arguments in your script: a convenient phrase for this is:

(dolist (a (2 (main-args)))
 (println a))

A slightly more readable equivalent is this, which works through the rest of the rest of the arguments:

(dolist (a (rest (rest (main-args))))
 (println a))

Here's a short script that filters out unwanted Unicode characters from a text file, except for a few special ones that are allowed through:

(set 'file (open ((main-args) 2) "read")) ; one file

(define (in-range? n low high)
 ; is n between low and high inclusive?
 (and (<= n high) (>= n low)))
 
(while (read-line file) 
 (dostring (c (current-line))
   (if 
      (or
       (in-range? c 32 127)      ; ascii
       (in-range? c 9 10)        ; tab newline
       (in-range? c 12 13)       ; \f \r
       (= c (int "\0xbb"))       ; right double angle
       (= c (int "\0x25ca"))     ; diamond
       (= c (int "\0x2022"))     ; bullet
       (= c (int "\0x201c"))     ; open double quote
       (= c (int "\0x201d"))     ; close double quote
       )
    (print (char c))))           ; nothing to do
   (println) ; because read-line swallows line endings
)

For some more examples of argument processing, see Simple countdown timer.