Scheme Programming/Input and Output
Files
A file is essentially nothing but a string that is stored on your computer's hard drive (or a USB stick, SD card, or other storage device) with a name. There is one other type of object on a hard drive that has a name, and that is a directory. "Directory" is the word that programmers have always used for what later became known as "folders." They are lists of filenames.
Ports
When you want to work with the contents of a file in Scheme, you use functions such as read-char and write-char, which read or write a file one character at a time. Alternatively, you can use library functions such as read-line, or read and write, all of which handle more than one character at a time and parse the data in different ways.
Before you can read from or write to a file, you must open it. That means getting a file descriptor from the operating system, which is a value that keeps track of your place in the file: which character will be the next one you read, or where the next character you write will end up. On Windows, the file descriptor typically also gives you exclusive write access to the file: other programs can't write to the file you're writing to, or delete it. Unix-like systems (such as Linux and macOS) provide no such guarantee, however.
Ports are Scheme's file-descriptor values. They are passed to input/output procedures to tell them which file to read from or write to. Each port represents one file and one direction. That is, a port can be either an input port or an output port, but never both.
The keyboard and screen or terminal are also files. Typically, they're the same file: /dev/tty on Unix, or CON for a Windows console program. GUI programs such as your web browser typically have no console or TTY (which stands for teletype) associated with them, however.
At the Scheme REPL, the file representing the keyboard and teletype is open by default. It has three ports: (current-input-port) for the keyboard, (current-output-port) for the teletype, and (current-error-port), also for the teletype, for error messages. It is customary to send error messages to (current-error-port) rather than (current-output-port). That's because it is possible to redirect these ports. The current-output-port can be redirected to a file, for example, while error messages still get printed on the screen.
> (current-input-port)
#<input-output-soft 9a8ad18>
> (current-output-port)
#<input-output-soft 9a8ad18>
> (current-error-port)
#<output-port /dev/pts/20>
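The direction of a port can be checked with the standard predicates input-port? and output-port?. The following is a minimal sketch; what an implementation reports for its console ports can differ (the transcript above shows a single input-output port), so only the uncontroversial cases are shown.

```scheme
;; input-port? and output-port? are standard R5RS predicates.
(define console-in  (current-input-port))
(define console-out (current-output-port))

(display (input-port? console-in))     ; the console input is an input port
(newline)
(display (output-port? console-out))   ; the console output is an output port
(newline)

;; A value that is not a port at all is neither kind.
(display (input-port? "not a port"))
(newline)
```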
Displaying human-readable values to a port
> (display "This is a string" (current-output-port))
This is a string#<unspecified>
>
The display function does not print a newline after its output. It strips quotation marks from strings, but otherwise displays Scheme values in the same format in which they're found in Scheme source.
To print a newline, use the newline function:
> (begin
    (display "This is a string" (current-output-port))
    (newline (current-output-port)))
This is a string
#<unspecified>
The port argument is actually optional. If you don't include it, display, newline, and other input and output (I/O) functions will assume you mean (current-output-port) for output or (current-input-port) for input.
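To see exactly what display strips, it helps to compare it with its counterpart write, which is covered later in this chapter. The sketch below captures the output of each in a string. The helper name printed-by is made up for this example, and open-output-string and get-output-string are assumed to be available (they are part of R7RS and SRFI 6, but not R5RS).

```scheme
;; Hypothetical helper: return what PRINTER outputs for VALUE as a string.
;; open-output-string / get-output-string: R7RS and SRFI 6 (an assumption
;; about your implementation).
(define (printed-by printer value)
  (let ((port (open-output-string)))
    (printer value port)
    (get-output-string port)))

(define shown   (printed-by display "a \"quoted\" word"))
(define written (printed-by write   "a \"quoted\" word"))

(display shown)     ; a "quoted" word
(newline)
(display written)   ; "a \"quoted\" word"
(newline)
```

The display form is meant for people; the write form is what a Scheme reader can parse back into the same string.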
Opening and Closing a File
You open a file with either open-input-file to read, or open-output-file to write to a file. The value returned by those functions is a port, which needs to be bound to something.
> (define in (open-input-file "test.c")) ; Just a C source file I have lying around.
#<unspecified>
> (read-line in)
"#include <stdio.h>"
> (close-input-port in)
0
>
It is important to close a port when you're done with it. There are a limited number of files that you can have open at the same time. This limit is imposed by the operating system, not by Scheme.
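One way to avoid forgetting the close is to let the standard procedures call-with-output-file and call-with-input-file manage the port: they open the file, pass the port to your procedure, and close it when the procedure returns. A minimal sketch, assuming a scratch file name port-demo.txt that is made up for this example (on implementations that refuse to overwrite existing files, delete it between runs):

```scheme
;; "port-demo.txt" is a hypothetical scratch file name.
(define demo-file "port-demo.txt")

;; Create the file so that there is something to read.
(call-with-output-file demo-file
  (lambda (out)
    (display "hello" out)
    (newline out)))

;; The port is handed to the procedure and closed when it returns.
(define first-char
  (call-with-input-file demo-file
    (lambda (in)
      (read-char in))))

(write first-char)   ; #\h
(newline)
```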
Writing to a File
Opening a file for output erases its contents.
> (define out (open-output-file "test.c"))
#<unspecified>
> (display "Problem?" out) ; My C source file is wiped out and replaced with this. >:(
#<unspecified>
> (newline out)
#<unspecified>
> (close-output-port out)
0
When you write characters to a file, some Scheme implementations will not actually write them immediately, but will instead store them in an internal buffer until either enough bytes have accumulated or a newline is written. Any buffered characters remaining will be written out when you close the file.
The operating system will automatically close all your files when Scheme exits.
On some Scheme implementations, open-output-file will raise an error if the file already exists. For example, on Racket:
> (define out (open-output-file "test.c"))
open-output-file: file exists
path: /tmp/test.c
context...:
/usr/share/racket/collects/racket/private/misc.rkt:87:7
>
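If you want the same overwrite behaviour everywhere, one option is to remove the old file yourself before opening it. The sketch below assumes file-exists? and delete-file are available (they are in R7RS and in most older implementations); the helper name open-output-file/replace and the file name replace-demo.txt are made up for this example. Racket's open-output-file also accepts an #:exists keyword argument (for example 'truncate) for the same purpose.

```scheme
;; Hypothetical helper: delete FILENAME if it exists, then open it for output.
(define (open-output-file/replace filename)
  (cond ((file-exists? filename)
         (delete-file filename)))
  (open-output-file filename))

;; "replace-demo.txt" is a hypothetical scratch file name.
(define demo-out (open-output-file/replace "replace-demo.txt"))
(display "fresh contents" demo-out)
(newline demo-out)
(close-output-port demo-out)
```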
Reading and Writing Scheme values
Scheme provides the read function, which reads and parses a Scheme value from a port (or (current-input-port) if no port is specified), and a corresponding write function. This is the easiest way to get data into and out of Scheme, and as a result, Scheme programmers prefer to store their data as Scheme code whenever it's feasible. For example, suppose you have the following file:
just-some-raw-data.scm
((0.00036277727 0.00024514514 0.00010899892 -0.00017201288 5.1782848e-05) (0.000252906 0.00015007147 -0.00023179696 -0.00037388649 8.3796775e-05) (-0.00037429505 -0.00020174753 0.00043324157 0.00015203918 0.0003337927) (0.0001250037 5.5220273e-05 -0.00049933029 -0.00010911703 -0.00019316927) (0.00018089121 4.254036e-05 0.00018602787 -2.7271702e-05 -0.00024643468))
You could write a program to read the file, do something to all the numbers in it, and write the result back to the same file:
manipulate-raw-data.scm
(define filename "just-some-raw-data.scm")
(define in (open-input-file filename))
(define raw-data (read in))
(close-input-port in)
(define out (open-output-file filename))
(write (map (lambda (row)
              (map (lambda (num)
                     (* num 100000))
                   row))
            raw-data)
       out)
(close-output-port out)
Then, in the REPL:
> (load "manipulate-raw-data.scm")
; loading manipulate-raw-data.scm
; done loading manipulate-raw-data.scm
#<unspecified>
>
The new contents of the file would be:
just-some-raw-data.scm
((36.27772699999999 24.514513999999998 10.899892 -17.201288 5.1782847999999974) (25.290600000000003 15.007147 -23.179696000000005 -37.388649 8.3796775) (-37.429505000000005 -20.174753 43.324157 15.203918 33.379269999999996) (12.500370000000003 5.5220273000000004 -49.933029 -10.911703 -19.316927) (18.089121000000002 4.254036 18.602787 -2.7271701999999997 -24.643468000000003))
Note that Scheme does not format the output in a way that looks nice or is easy for a human to read. But if you load this file with the same program, it will have no trouble reading the values and changing them again.
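The round trip can be checked without touching a file by writing to a string and reading the text back. This is a minimal sketch; it assumes open-output-string, get-output-string, and open-input-string are available (R7RS and SRFI 6), and the sample data is made up.

```scheme
;; A small nested list standing in for the contents of a data file.
(define data '((1.5 -2.25) (0.5 4.0) ("label" #\x sym)))

;; Write the value to a string port instead of a file...
(define text-port (open-output-string))
(write data text-port)
(define text (get-output-string text-port))

;; ...and parse that text back into a Scheme value.
(define round-tripped (read (open-input-string text)))

(display (equal? data round-tripped))   ; #t
(newline)
```

Using display instead of write here would lose the quotation marks around "label", and the string would be read back as a symbol.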
Reading from the keyboard
The read-line function reads a line of text from a port (or (current-input-port) if none is specified). When used at the REPL, the reading usually begins on the same line that the code is being read from:
> (define (prompt/read prompt)
> (display prompt)
> (read-line))
#<unspecified>
> (prompt/read "Enter your name: ")
Enter your name: ""
As you can see, Scheme didn't even give the user a chance to enter the name. However:
> (prompt/read "Enter your name: ") Johnny Boy
Enter your name: " Johnny Boy"
This happens because Scheme stops reading as soon as it sees the closing parenthesis. read-line sees everything after that. This doesn't affect your program if it's loaded from a file.
Deleting a file
> (delete-file "test.c")
Redirecting Ports and Automatically Closing the File
Scheme provides the with-input-from-file and with-output-to-file functions, which take a filename and a function of no arguments. They redirect (current-input-port) or (current-output-port) to the specified file, call the function that you provide, and then close the file when that function returns. While that function runs, any function in your program that reads from (current-input-port) or writes to (current-output-port) reads from or writes to the file instead.
In some Scheme implementations, the file is closed even if an error occurs. That matters because R5RS Scheme has no way to trap errors, although various implementations provide extensions for doing so. In other implementations, the file is not closed when the function exits through an error. It's also nice not to need to define port variables.
The above file-manipulating program could have been written with with-input-from-file and with-output-to-file. The program would then look like this:
manipulate-raw-data.scm
(define filename "just-some-raw-data.scm")
(define raw-data (with-input-from-file filename read))
(with-output-to-file filename
  (lambda ()
    (write (map (lambda (row)
                  (map (lambda (num)
                         (* num 100000))
                       row))
                raw-data))))
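The redirection also reaches procedures that never mention a port. In the sketch below, print-report only knows about the current output port, yet its output lands in a file. The procedure name and the file name report-demo.txt are made up for this example (on implementations that refuse to overwrite existing files, delete the file between runs).

```scheme
;; A procedure that knows nothing about files: it prints to whatever
;; the current output port happens to be.
(define (print-report)
  (display "total ")
  (write (+ 1 2 3))
  (newline))

;; "report-demo.txt" is a hypothetical scratch file name.
(with-output-to-file "report-demo.txt" print-report)

;; Read the two values back with the input side redirected.
(define report-tokens
  (with-input-from-file "report-demo.txt"
    (lambda ()
      (let* ((label (read))
             (value (read)))
        (list label value)))))

(write report-tokens)   ; (total 6)
(newline)
```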
Reading a String As If It Was a File
Some Scheme implementations provide with-input-from-string, which redirects (current-input-port) just like with-input-from-file. SCM, however, only provides call-with-input-string, which is like with-input-from-string except the procedure you provide must accept the port as an argument.
> (define my-string "the quick brown fox jumps over the lazy dog\n")
#<unspecified>
> (call-with-input-string my-string (lambda (port) (values (read port) (read port))))
the
quick
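Where neither of those procedures is available, open-input-string (R7RS and SRFI 6) returns an input port on a string directly. The following sketch reads every value in a string into a list; the helper name read-all-from-string is made up for this example.

```scheme
;; Hypothetical helper: read every datum in TEXT and return them as a list.
;; open-input-string is assumed to exist (R7RS / SRFI 6).
(define (read-all-from-string text)
  (let ((port (open-input-string text)))
    (let loop ((result '()))
      (let ((datum (read port)))
        (if (eof-object? datum)
            (reverse result)
            (loop (cons datum result)))))))

(write (read-all-from-string "the quick (brown fox) 42"))
(newline)
;; prints: (the quick (brown fox) 42)
```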
It is also possible to write to a string as if it was a port. call-with-output-string is used for this. The string is created from scratch. This is one way you can convert any value to a string:
> (call-with-output-string
    (lambda (out)
      (write (sqrt 2) out)))
"1.4142135623730951"
Handling raw, binary data
Raw, binary data is not 100% portable between different Scheme implementations. Some implementations provide SRFI-56, which defines read-byte, write-byte, peek-byte, and byte-ready?. If your Scheme implementation doesn't provide them, and its characters use a single-byte encoding such as ASCII rather than Unicode, you can define them yourself:
;; Each procedure takes an optional port argument and passes it on.
(define (read-byte . opt)
  (let ((c (apply read-char opt)))
    (if (eof-object? c) c (char->integer c))))
(define (write-byte int . opt)
  (apply write-char (integer->char int) opt))
(define (peek-byte . opt)
  (let ((c (apply peek-char opt)))
    (if (eof-object? c) c (char->integer c))))
(define byte-ready? char-ready?)
Then, bytes can be combined with OR (bitwise-ior), AND (bitwise-and) and bit shifting (arithmetic-shift or ash). Here is a function to convert a list of bytes to an integer, assuming the "big endian" encoding, which is commonly used in network packets:
(define (big-endian->integer list)
  (let loop ((list list)
             (result 0)
             (shift (* 8 (- (length list) 1))))
    (if (null? list)
        result
        (loop (cdr list)
              (bitwise-ior result (arithmetic-shift (car list) shift))
              (- shift 8)))))
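Not every implementation spells the bitwise operations the same way, so it can help to see the same computation with ordinary arithmetic: shifting left by 8 bits is the same as multiplying by 256. The following is a sketch of that variant, not a replacement for the function above; the name bytes->integer/be is made up for this example.

```scheme
;; Variant of big-endian->integer that uses only standard arithmetic.
;; The name bytes->integer/be is hypothetical.
(define (bytes->integer/be bytes)
  (let loop ((bytes bytes)
             (result 0))
    (if (null? bytes)
        result
        ;; Make room for the next byte, then add it in.
        (loop (cdr bytes)
              (+ (* result 256) (car bytes))))))

(write (bytes->integer/be '(1 2)))       ; 258, that is 1*256 + 2
(newline)
(write (bytes->integer/be '(0 0 1 0)))   ; 256
(newline)
(write (bytes->integer/be '(255 255)))   ; 65535
(newline)
```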
You can use it to read an arbitrarily-sized integer from a port:
(define (read-big-endian-integer bytes . port)
  (let loop ((bytes bytes)
             (result '()))
    (if (= bytes 0)
        (big-endian->integer (reverse result))
        (loop (- bytes 1)
              (cons (apply read-byte port) result)))))
To read little-endian, which is the format used natively by Intel CPUs, just don't reverse the result. It might be convenient to have a function that can read both. Then you could define both reading functions in terms of it:
(define (read-binary-integer bytes maybe-reverse . port)
  (let loop ((bytes bytes)
             (result '()))
    (if (= bytes 0)
        (big-endian->integer (maybe-reverse result))
        (loop (- bytes 1)
              (cons (apply read-byte port) result)))))
(define (read-big-endian-integer bytes . port)
  (apply read-binary-integer (append (list bytes reverse) port)))
(define (read-little-endian-integer bytes . port)
  (apply read-binary-integer (append (list bytes identity) port)))
Many Scheme implementations provide the identity function, which simply returns its argument, for cases like the above: we used it because we didn't want to reverse or otherwise change the result when reading little endian. It is not part of the standard, so if your implementation lacks it, (define (identity x) x) is all it takes.
Reading Strings from Binary Files
In binary files, strings are stored either as a binary length followed by the data, or as a null-terminated string. In the case of a binary length, the length field itself can vary in size, and it can be in either little- or big-endian byte order. The first function below takes arguments that account for all of that; the second reads a null-terminated string:
(define (read-counted-string count-size-in-bytes byte-order . port)
  (let ((string-size (case byte-order
                       ((big-endian) (apply read-big-endian-integer (cons count-size-in-bytes port)))
                       ((little-endian) (apply read-little-endian-integer (cons count-size-in-bytes port))))))
    (let loop ((result '())
               (remaining-bytes string-size))
      (if (= remaining-bytes 0)
          (list->string (reverse result))
          (loop (cons (apply read-char port) result)
                (- remaining-bytes 1))))))
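(define (read-null-terminated-string . port)
  (let loop ((result '()))
    (let ((next-byte (apply read-byte port)))
      ;; Stop at the terminating zero byte, or at end of file.
      (if (or (eof-object? next-byte) (= next-byte 0))
          (list->string (reverse result))
          (loop (cons (integer->char next-byte) result))))))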
Finally, it might be convenient to have a function that can read whole structures from a file. You could specify the format of a structure as a list that includes the sizes of integers to be read, and also specifies when to expect a string. For example, you could call (read-binary '(big-endian 2 4 (counted 1))) to read a 16-bit big-endian integer, followed by a 32-bit one, followed by a counted string whose length is represented by an 8-bit integer.
(define (read-binary spec . port)
  (define (read-integer byte-order size)
    (case byte-order
      ((big-endian) (apply read-big-endian-integer (cons size port)))
      ((little-endian) (apply read-little-endian-integer (cons size port)))))
  (let loop ((spec spec)
             (endian 'big-endian)
             (result '()))
    (cond ((null? spec)
           (reverse result))
          ;; A byte-order symbol changes how later integers are read.
          ((eq? (car spec) 'big-endian)
           (loop (cdr spec) 'big-endian result))
          ((eq? (car spec) 'little-endian)
           (loop (cdr spec) 'little-endian result))
          ;; A list describes a string: (counted SIZE) or (null-term).
          ((list? (car spec))
           (case (caar spec)
             ((counted) (loop (cdr spec)
                              endian
                              (cons (apply read-counted-string
                                           (append (list (cadr (car spec)) endian) port))
                                    result)))
             ((null-term) (loop (cdr spec)
                                endian
                                (cons (apply read-null-terminated-string port) result)))))
          ;; A number is the size in bytes of an integer to read.
          ((number? (car spec))
           (loop (cdr spec) endian (cons (read-integer endian (car spec)) result)))
          (else
           (error "Invalid token:" (car spec))))))
The reader functions above assume that all integers are unsigned, meaning there's no way to represent a negative number. But some of the integers you may find in a binary file are meant to be interpreted as "signed". Suppose you have a signed byte that you read as an unsigned byte. An unsigned byte, being 8 bits, can have a value ranging from 0 to 255. Anything bigger than that and you need more than 8 bits to store it. A signed byte can represent values from 0 to 127 in exactly the same format as an unsigned byte, but the value that is interpreted as 128 in an unsigned byte is -128 in a signed byte. Unsigned 129 maps to -127, and so on, until you get to unsigned 255, which maps to -1.
The following function converts an unsigned integer of any size (in bytes) into a signed integer:
(define (unsigned->signed number orig-size)
  (let ((max (inexact->exact (- (floor (/ (expt 2 (* orig-size 8)) 2)) 1))))
    (if (> number max)
        (- number (* 2 (+ 1 max)))
        number)))
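The mapping described above can be checked with a few values. The sketch below restates the conversion with exact integer arithmetic only, so it stands alone; the name to-signed is made up for this example and is not a replacement for unsigned->signed.

```scheme
;; Same idea as unsigned->signed, using exact integer arithmetic only.
;; The name to-signed is hypothetical.
(define (to-signed number size-in-bytes)
  (let* ((range (expt 2 (* 8 size-in-bytes)))   ; 256 for one byte
         (half  (quotient range 2)))            ; 128 for one byte
    (if (>= number half)
        (- number range)
        number)))

(write (map (lambda (n) (to-signed n 1)) '(0 127 128 129 255)))
(newline)
;; prints: (0 127 -128 -127 -1)

(write (to-signed 65535 2))   ; -1
(newline)
```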