PostScript FAQ/Programming PostScript

Programming PostScript

How to edit a PS file?

Few people want to edit HPGL or PCL files, and PostScript is just another control language used by some laser printers. In most cases a PostScript file is generated algorithmically from some human-readable document, so get the original document if possible.

Visual editing requires interpreting the PS file, editing the objects, and generating a new PostScript file. Few applications support all features of the PostScript graphics model; unsupported features will be lost or approximated. Because PostScript is a programming language, a program can take different branches depending on the execution environment. The interpreter discards the branches that are not taken, leaving no trace of them in the editor. When a printer driver generates a PS file, it does not care about preserving the high-level structure of the document. As a result, it can be very difficult to do anything beyond fixing a spelling error or changing a page number.

Many mainstream graphics editors can import PS files. The quality of their built-in interpreters varies and is often limited to Level 1. PDF import filters tend to work better, so try converting the PS file to PDF first.

If the PS file contains too many primitives that the visual editor does not support, the pstoedit utility (which uses Ghostscript) can reduce the PostScript file to a sequence of path operations.

Tailor, from EnFocus Software, seems to have been the only proprietary PostScript editor, and it was discontinued long ago. CGS offers PDF Tuner, which can edit both PostScript and PDF files and is actively developed.

Manual editing requires a good knowledge of the PostScript language and a robust text editor. Users occasionally produce PS files larger than 100 MB. The files often contain binary data, and sometimes they depend on the exact size of that data. For this reason the text editor must not expand tabs or normalize line endings. A hexadecimal view of the file can be handy.

Redefining operators and procedures is a powerful technique for altering the behaviour of a PS file. The new code can be included in the RIP startup script, copied by the driver from the PPD, or added directly to the file by an input filter. When operator redefinition doesn't work, the token operator can take control away from the PostScript interpreter. The program then reads the rest of the file token by token and decides for itself what to do with each token.
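
The token technique can be sketched as follows. This minimal example assumes a made-up goal: ignore every top-level setpagedevice request in the rest of the job. The loop reads each remaining token itself. Literals and procedure bodies are pushed, as the scanner would push them, and executable names are executed. setpagedevice is dropped together with its operand. Calls made from inside procedures, and rare constructs such as immediately evaluated names, are not handled.

```postscript
%!
% Minimal sketch: interpret the rest of this file token by token,
% dropping every top-level setpagedevice together with its operand.
{
  currentfile token not { exit } if   % read the next token; stop at EOF
  dup xcheck                          % literals stay on the stack
  { dup type /nametype eq             % executable name?
    { dup /setpagedevice eq
      { pop pop }                     % discard the name and its dictionary
      { exec }                        % otherwise execute it as usual
      ifelse
    } if                              % procedure bodies stay on the stack
  } if
} loop
% Everything below is read and executed by the loop above.
<< /PageSize [100 100] >> setpagedevice   % silently ignored
```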

Idiom recognition is an automated technique for patching a PS procedure when it is bound. Although it was designed to add smooth shading to legacy PostScript programs, it can also be used to fix bugs.

How to create a PS form and populate it from a database?

Fixed format forms can be created in any graphic editor and contain any presentation graphics. The variable content fields can be reserved using placeholder EPS files.

The placeholder EPS files can be replaced with variable content using standard Unix text tools. A similar technique has been used in OPI workflows for many years.

Alternatively, the placeholder EPS files can be made smart and print themselves. Variable content can be retrieved from PostScript VM or external files.
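
A smart placeholder might look like the following sketch. It assumes the merge job defines each field as a string in userdict before the EPS is placed. /CustomerName is a hypothetical field name, and the font and bounding box are arbitrary. If the field is missing, a visible marker is printed instead.

```postscript
%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 216 18
%%EndComments
% Smart placeholder: prints the job's value for the field, if any.
% /CustomerName is a hypothetical key that the merge job would define
% in userdict before placing this EPS.
/Helvetica 12 selectfont
0 4 moveto
userdict /CustomerName known
  { userdict /CustomerName get }      % variable content from VM
  { (<CustomerName>) }                % visible marker when the field is missing
ifelse
show
%%EOF
```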

Variable format documents need text reflow and re-pagination. You have the option of using available resources for direct PostScript text formatting, or creating the report in a higher level page layout language (TeX, groff) and converting it to PostScript or PDF.

In most cases it is impossible to replace words in a PostScript document produced by a standard PostScript driver. Drivers can split words into fragments for kerning instead of using xshow, use hexadecimal strings, or convert a line of text into a sampled image. The latter is a common way to print Asian fonts on Roman printers.

How to concatenate several PS files?

Contrary to popular belief, PostScript files cannot simply be concatenated to get the combined result.

 # won't work
 cat page1.ps page2.ps page3.ps > threepages.ps

Passing the files to Ghostscript as separate arguments won't help either.

 # won't work
 gs -sDEVICE=pswrite -sOutputFile=threepages.ps page1.ps page2.ps page3.ps

PostScript programs change the state of the PostScript interpreter and these changes must be undone before the next program runs. First, write the following header.

 %!
 /begin_file
  { /save_state save def  % save state of PS interpreter
    currentfile
     0 (% $$$ EOF Mark $$$) /SubFileDecode filter cvx exec % safeguard against flushfile, etc.
  } bind def
 /end_file
  { clear cleardictstack  % clear after the file
    save_state restore    % restore the state
  } bind def

Enclose each PostScript file in the following wrapper and append the wrapped files to the header.

 begin_file
 % Include your file here.
 % $$$ EOF Mark $$$       % don't delete this line
 end_file

The result won't be DSC-conforming even if the original files are.

Composition of EPS files is done by placement of the files into a container EPS file.

PostScript interpreters that work in job server mode (most printers, and Ghostscript 8.50 or later with -dJOBSERVER) treat multiple jobs that are delimited by the ^D character and piped to the input stream as independent jobs.

In general it is impossible to concatenate several PS programs to get the combined result. The following program will defeat all attempts to append something to it. Most commonly such techniques can be found in OCF font installers.

 systemdict begin (%stdin) (r) file flushfile

How to print accented characters?

Single-byte PS fonts can include many glyphs, but only 256 of them can be accessed at the same time. The 256-element array that maps character codes to glyph names is known as the encoding vector. The PostScript language has several built-in encoding vectors. You can (and always should) re-encode your fonts; ISOLatin1Encoding is probably a good start. To re-encode a font, do the following:

 /Courier findfont               % load the font, for instance, Courier
 0 dict copy begin               % copy it to a new dictionary
 /Encoding ISOLatin1Encoding def % replace encoding vector
 /FontName /MyCourier def        % replace font name
 currentdict end
 dup /FID undef                  % remove internal data
 /MyCourier exch definefont pop  % define the new font

Re-encoded fonts can be used in the same way as any other fonts.
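
The same steps can be wrapped in a reusable procedure. In this sketch, reencodefont and Courier-Latin1 are hypothetical names. The octal escapes \351 and \350 select é and è through ISOLatin1Encoding.

```postscript
%!
% Hypothetical helper:  /NewName /BaseFont encoding  reencodefont  font
/reencodefont
{ exch findfont                % /New enc font
  dup length dict copy         % /New enc copy   (writable copy of the font)
  dup /FID undef               % definefont assigns a new FID
  dup /Encoding 4 -1 roll put  % /New copy       (install the new encoding)
  dup /FontName 3 index put    % /New copy       (keep FontName consistent)
  definefont                   % font
} bind def

/Courier-Latin1 /Courier ISOLatin1Encoding reencodefont pop
/Courier-Latin1 12 selectfont
72 720 moveto
(Caf\351 cr\350me) show        % \351 = eacute, \350 = egrave in ISOLatin1Encoding
showpage
```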

There are several ways to print more than 256 glyphs at a time.

  • Create a few re-encoded fonts and select them when needed.
  • Use glyphshow for rare characters
  • Make OCF fonts from several re-encoded fonts. OCF supports every imaginable encoding scheme.
  • Make a composite font from CMap and Font or CIDFont resources.
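
The composite-font approach can be sketched with a Type 0 font that uses 8/8 mapping (FMapType 2). In this scheme the first byte of each two-byte code selects a descendant font, and the second byte selects a character in that font. Here the standard Courier and Symbol fonts stand in for re-encoded fonts, and MixedFont is a hypothetical name.

```postscript
%!
% Type 0 font with 8/8 mapping: byte 1 picks a font, byte 2 a character.
/MixedFont
<< /FontType 0
   /FMapType 2
   /FontMatrix [1 0 0 1 0 0]
   /Encoding [0 1]                                  % font number -> FDepVector index
   /FDepVector [ /Courier findfont /Symbol findfont ]
>> definefont pop

/MixedFont 12 selectfont
72 700 moveto
<00 41 01 61 00 42 01 62> show   % Courier A, Symbol alpha, Courier B, Symbol beta
showpage
```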

How to justify text in PostScript?

The common approach is to do text formatting in an application. The application emits PostScript code reflecting the final format - now a simple question of dumping things at x, y coordinates. If an application overrides the default interword/interletter spacing (say, for justification or kerning), it should use PostScript's ashow, widthshow, awidthshow, kshow, xshow, yshow, or xyshow operators, which take an intact text string and a separate specification for positioning the elements. The sloppy alternative, just using show on individual letters and small word fragments with movetos in between, is a hindrance for text extraction programs and distillers.
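
For example, a single line can be justified with one widthshow call that widens every space by the same amount, which keeps the string intact for text extraction. justifyshow is a hypothetical name. The sketch assumes that the space has character code 32 in the current font and that the text is narrower than the target width.

```postscript
%!
% Hypothetical helper:  string width  justifyshow  -
% Shows the string at the current point, widening each space (code 32)
% so that the line is exactly `width` units wide.
/justifyshow
{ 1 index stringwidth pop sub               % str slack
  0 2 index { 32 eq { 1 add } if } forall   % str slack nspaces
  dup 0 eq
  { pop pop show }                          % no spaces: nothing to stretch
  { div                                     % str extra-per-space
    0 32 4 -1 roll widthshow                % cx 0 32 str widthshow
  } ifelse
} bind def

/Times-Roman 12 selectfont
72 700 moveto (The quick brown fox jumps over the lazy dog) 300 justifyshow
showpage
```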

To perform common functions of text formatters, such as hyphenation, multi-column text, footnotes, bibliography, glossary, etc., in PostScript itself requires implementing suitable algorithms in PostScript and supplying them along with the file to be rendered, either included in the file or downloaded in advance. Existing implementations include David Byram-Wigfield's TinyDict, Graham Freeman's Quikscript, and Don Lancaster's Gonzo Utilities. All three are quite full-featured, with justification, columns, pagination, and many other features. Simpler building blocks include Chapman Flack's Markup, which can be used as a front end to those, or to do simple left/center/right unfilled/unjustified text setting on its own, and Hyphenate, an implementation of the TeX hyphenation algorithm, for which many language-specific patterns have been compiled. Collectively, these approaches that do the bulk of the computation in PostScript itself can be called (Byram-Wigfield's term) direct PostScript, and the Anastigmatix direct PostScript page further describes and compares them.

Automatic kerning requires information that is in the Adobe Font Metrics (AFM) files that come with most fonts, so to do it in PostScript would require downloading also the appropriate AFM files and some PostScript code to parse them (the syntax is simple). No direct PostScript resource that does this seems currently available. All of the existing systems do permit manual kerning, which is usually adequate as kerning is most important for large display type, accounting for a small fraction of a typical document.

Designs are not limited to the two extremes: doing all formatting in the application and emitting plain PostScript, or full-fledged direct PostScript. The first approach tends to save compute time on the printer at the expense of large, uneditable PostScript files and longer transmission times to the printer. Direct approaches produce compact, editable files (with fixed prologs of implementation code that can be downloaded once in advance) and do more computation on the printer. Most applications choose a compromise: they do some of the work in the application and include prologs in the emitted PostScript that provide procedures to finish the work on the printer itself. It would even be possible to design an application that used one of the existing direct PostScript resources as its prolog and emitted the document in that form.

How to place several copies of a picture?

The picture can be a sampled image or EPS file.

PostScript always reads the standard input stream forward. There is no way to reposition the file or push characters back, so each placement of the picture needs a new copy of the data. To reuse the picture, the data must be copied from the input stream into a reusable object.

Short data can be stored in a PostScript string, but string length is limited to 65535 (64K-1) bytes. More data can be stored as an array of strings. Array length is also limited to 65535 elements, but this is enough to exhaust the memory of most printers. Low-end printers can have as little as 256K of free memory.
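
Here is a minimal sketch of the string approach, assuming the picture is small enough to fit in one string. The sample data is copied once from the input stream with readhexstring and then reused by three image calls. The name picdata and the 4x4 test image are made up for illustration. Note that readhexstring must be the last token on its line: it skips non-hex characters, so code following it on the same line would be swallowed as data.

```postscript
%!
% 4x4 image, 8 bits per sample, copied once from the input stream.
% readhexstring must end its line; the hex data follows on the next line.
/picdata 16 string def
currentfile picdata readhexstring
00 55 AA FF 55 AA FF 00 AA FF 00 55 FF 00 55 AA
pop pop
% The data source procedure returns the same string on every call,
% so the picture can be painted as often as needed.
gsave
72 600 translate
3 {
  gsave
  100 100 scale
  4 4 8 [4 0 0 -4 0 4] { picdata } image
  grestore
  120 0 translate
} repeat
grestore
showpage
```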

Encoding filters can be used to compress picture data. In level 3 PostScript most encoding filters are optional.

PostScript Level 3 introduces reusable streams. They cache the input data and can be repositioned after reading to the end of the file. On diskless printers, reusable streams are implemented in a memory file system.

Some printers and most host-based RIP's have writeable file systems.

To avoid rendering the same picture repeatedly, PostScript Level 2 introduced forms. A form is a wrapper around a procedure that tells the PostScript interpreter it may cache the rendering results. The form procedure can be executed many times and needs a reusable data source. Forms are rarely used, and many Adobe OEMs choose a very small default size for the form cache.
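
A minimal form might look like the sketch below; TriangleForm is a hypothetical name. Its PaintProc draws inline for brevity. A form that paints a sampled image would instead read the samples from a reusable source, such as a string or a reusable stream.

```postscript
%!
% Level 2 form: the interpreter may cache the result of PaintProc
% and reuse it for later execform calls.
/TriangleForm
<< /FormType 1
   /BBox [0 0 100 100]
   /Matrix [1 0 0 1 0 0]
   /PaintProc
   { pop                                  % PaintProc receives the form dictionary
     newpath 0 0 moveto 100 0 lineto 50 100 lineto closepath fill
   }
>> def

gsave
72 600 translate
3 { TriangleForm execform 120 0 translate } repeat
grestore
showpage
```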

How to redefine an operator?

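The simplest approach relies on the bind operator. bind replaces the name foo inside the new procedure with the original foo operator, so the new definition can still call the old one.

 /foo { foo bar } bind def

This approach is not recommended in a production environment, where foo may already have been redefined as a procedure. bind replaces only operators, so the inner foo would refer to the new definition and recurse until the execution stack overflows. The following approach works in all cases.

 /baz /foo load def
 /foo { baz bar } bind def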
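
As a concrete instance of the second pattern, this sketch counts pages by wrapping showpage; pagecount and showpage_orig are hypothetical names. The counter is kept explicitly in userdict, because another dictionary may be on top of the dictionary stack when showpage runs.

```postscript
%!
% Count pages by wrapping showpage with the load-and-call pattern.
userdict /pagecount 0 put
/showpage_orig /showpage load def
/showpage
{ userdict /pagecount userdict /pagecount get 1 add put   % bump the counter
  showpage_orig                                           % then do the real work
} bind def
```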

Ghostscript also provides implementation-specific ways of operator redefinition. See the comments to odef operator in Ghostscript sources.

Good redefinition should preserve as many features of the original operator as possible. The following simple redefinition of halftone operators can cause several unexpected problems.

 /sethalftone { pop } bind def
 /setscreen { pop pop pop } bind def
 /setcolorscreen { 12 { pop } repeat } bind def

First, a halftone can read data from the current file. Second, differences in error handling can be significant. The stock installation of an early version of Mac OS X generated the following code:

 featurebegin { 30 60 setscreen % no function!
 } featurecleanup

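The redefinitions above would corrupt the operand stack and cause a PostScript error regardless of featurecleanup. The following redefinitions are better. They call the original operators, so operand handling and error reporting are preserved, while gsave and grestore discard the new halftone settings: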

 /sethalftone_orig /sethalftone load def
 /setscreen_orig /setscreen load def
 /setcolorscreen_orig /setcolorscreen load def
 /sethalftone { gsave sethalftone_orig grestore } bind def
 /setscreen { gsave setscreen_orig grestore } bind def
 /setcolorscreen { gsave setcolorscreen_orig grestore } bind def

How to concatenate strings?

PostScript doesn't have a built-in string concatenation operator. The following procedure, copied from GNU Ghostscript, seems to be the shortest.

 /concatstrings % (a) (b) -> (ab)  
   { exch dup length    
     2 index length add string    
     dup dup 4 2 roll copy length
     4 -1 roll putinterval
   } bind def  

Multiple strings can be concatenated more efficiently than by calling concatstrings repeatedly. The strings are stored in an array for convenient iteration.

 /concatstringarray  %  [(a) (b) ... (z)] --> (ab...z)  
    { 0 1 index { length add } forall string     
      0 3 2 roll      
        { 3 copy putinterval
          length add 
        }
      forall pop  
    } bind def

Where are min and max functions?

PostScript doesn't have built-in min and max operators but they can be easily coded as procedures.

 /min { 2 copy gt { exch } if pop } bind def
 /max { 2 copy lt { exch } if pop } bind def

Ghostscript has .min and .max operators in systemdict. For backward compatibility, Ghostscript also provides min and max procedures defined as:

 /max { .max } bind def
 /min { .min } bind def

How to sort an array?

Ghostscript has a simple sort procedure (a selection sort) in gs_init.ps.

 % <array> <lt-proc> .sort <array>
 /.sort 
   { 1 index length 1 sub -1 1 
       { 2 index exch 2 copy get 3 copy	% arr proc arr i arr[i] arr i arr[i]    
         0 1 3 index 1 sub 
           { 3 index 1 index get	        % arr proc arr i arr[i] arr imax amax j arr[j]      
             2 index 1 index 10 index exec
               {                               % ... amax < arr[j]	
                 4 2 roll      
               } 
             if pop pop    
           } 
         for             			% arr proc arr i arr[i] arr imax amax    
         4 -1 roll exch 4 1 roll put put  
       } 
     for
     pop
   } bind def

net.anastigmatix.Order provides insertion sort, quicksort, heapsort, array min, max, and simultaneous minmax, and arbitrary order statistics:

 /net.anastigmatix.Order /ProcSet findresource begin
 [7 49 73 58 30 72 44 78 23 9 40 65 92 42 87] //PolyCmp QuickSort

This Shell sort procedure sorts anything whose elements can be compared and exchanged by a procedure.

 %!
 % Shell sort in PostScript
 % Copyright (c) 2002 by Alex Cherepanov.  All rights reserved.
 % Distributed under GPL, http://www.gnu.org/licenses/gpl.txt
 
 %
 % Shell sort procedure based on compare and exchange operation
 % <i> <j> exch_less <bool>  compares keys at positions i and j,
 %                           exchanges them if not in order, returns *i<=*j
 % <len> is the number of elements
 %
 /shellsort                       % {} len -> -
   { dup dup 2 idiv               % {} len m p
       { dup 0 eq { exit } if
         exch                     % {} len p m
         15 le { dup 2 idiv } { dup } ifelse
         16#fffffffe and 1 add    % {} len p m'
         1 1 4 index 3 index sub  % {} len p m' 1 1 len-m'
           { 1 index neg 1        % {} len p m' is -m 1
               { 1 sub            % {} len p m' ii-1
                 2 copy add       % {} len p m' ii-1 m+ii-1
                 5 index exec
                   { exit
                   }
                 if
               }
             for
           }
         for
         exch 2 idiv              % {} len m' p'
       }
     loop
     pop pop pop pop              % -
   } bind def
 
 %
 % Sample unsorted array
 %
 /sample_array [ 1 8 76 3 7 0 7 8 55 86 5 58 57 55 3 6 9 6 66 4 3 4 8 65 8 8 55
                 4 5 88 55 3 6 7 44 5 7 4 7 43 3 ] def
 %
 % Sample array compare and exchange
 % The elements at positions i and j are compared and exchanged
 %
 % if (*i<=*j)
 %   return true
 % (*i,*j) = (*j,*i)
 % return false
 %
 /array_exch_less        % i j -> bool
   { 6 index 3 1 roll    % [] i j   (fetch the array stored below the procedure)
     2 index exch        % [] i [] j
     4 copy              % [] i [] j [] i [] j
     get                 % [] i [] j [] i *j
     3 1 roll get        % [] i [] j *j *i
     2 copy ge           % [] i [] j *j *i *i<=*j
       { pop pop pop pop pop pop //true
       }
       { exch            % [] i [] j *i *j
         4 1 roll        % [] i *j [] j *i
         put put //false
       }
     ifelse
   } bind def
 
 %
 % Sort the array
 %
 /array_sort  % [unsorted] -> [sorted]
   { //array_exch_less 1 index length shellsort
   } bind def
 
 %
 % Print the sorted array
 %
 sample_array array_sort ==

How to show a glyph by name?

Since Level 2, the glyphshow operator can display a glyph by name, regardless of the current font's encoding:

 /Times-Roman 12 selectfont
 72 720 moveto
 /registered glyphshow   % display a "registered trademark" symbol

Where can I get more examples of PostScript code?

Books about PostScript include many simple PS programs.

The Ghostscript distribution has a few sample PostScript files in the examples directory. A large part of Ghostscript is implemented in the PostScript language, including a PDF 1.4 interpreter.

Adobe has published a few sample programs for Level 3 features.

Free sample library contains many good (and bad) PostScript files. The library collects interesting examples of PostScript, PDF, PCL, and TIFF files coded manually or harvested from applications.

For an understanding of typesetting procedures, see 'Practical PostScript: A Beginner's Guide to Digital Typesetting' (90-page e-book, 416 KB PDF).

How to debug a PostScript program?

The first skill you should learn as soon as you can is how to interpret PostScript error messages. They are not always easy. Error reporting varies by product. Some products will not give PostScript error details (e.g. EPS import to Photoshop), and are best avoided for any learning purposes.

The standard PostScript error format is something like this:

 PS>0 1 2 3 4 5 put
 %%[ Error: typecheck; OffendingCommand: put ]%%
 PS><< >> image
 %%[ Error: undefined; OffendingCommand: image ]%%

You will see this includes the error name, and a command causing the error. Please do not describe this as an "offending command" error, since all messages say that, and remember that both error name and offending command are important. The latter message means that a required key was not found in the image dictionary.

If you happen to be using Ghostscript, you may get more details. Observe that the error message includes the stacks, and the operator name.

 GS>0 1 2 3 4 5 put
 Error: /typecheck in --put--
 Operand stack:   0   1   2   3   4   5
 Execution stack:   %interp_exit   .runexec2   --nostringval--
 --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
 --nostringval--   %loop_continue   2   3   %oparray_pop   --nostringval--
 --nostringval--   false   1   %stopped_push   .runexec2   --nostringval--
 --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
 --nostringval--   --nostringval--
 Dictionary stack:   --dict:1044/1123(ro)(G)--   --dict:0/20(G)--   --dict:69/200(L)--
 Current allocation mode is local
 Current file position is 16

You can look at the description of the operator, and the stack to see your operands, and often see why it is complaining. Then you can (sometimes) work out where in your original code the problem lies. Remember that a failed PostScript operator doesn't change the operand stack.
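
A related technique is to run a suspect fragment under stopped. The session then continues, and the standard $error dictionary can be inspected afterwards. The sketch reproduces the typecheck above; the exact stack contents printed by pstack may differ between interpreters.

```postscript
%!
% Run a failing fragment under stopped and inspect the error afterwards.
{ 0 1 2 3 4 5 put } stopped
{ (Error: ) print $error /errorname get ==            % prints /typecheck
  (Operand stack after the failed put:) = pstack      % operands are still there
} if
clear
```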

Ghostscript exits when an error happens in a file passed as a command-line parameter. Use the run operator instead to keep the session open and examine the operand stack interactively.

 $ gs
 GS> (foo.ps) run

Use the = and == commands to print out information about objects on the operand stack. To see the contents of a dictionary, push the dictionary on the stack and then use {== ==} forall. For example, to see the contents of the user dictionary, use userdict {== ==} forall.
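
Note that {== ==} forall prints each value before its key. A small helper can print each pair on one line instead. dumpdict is a hypothetical name, and the sketch assumes that the text of every key fits in 128 characters.

```postscript
%!
% Hypothetical helper:  dict dumpdict -
% Prints "key value" for every entry, one entry per line.
/dumpdict
{ { exch                            % value key
    128 string cvs print ( ) print  % key as text (keys up to 128 characters)
    ==                              % value, in == syntax
  } forall
} bind def

<< /a 1 /b (two) >> dumpdict        % order of the entries is unspecified
```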

Try running the program on different interpreters; different error messages may reveal more about the problem. Ghostscript can print detailed dumps of the operand and execution stacks when it runs with the -dOSTACKPRINT and -dESTACKPRINT options respectively. The top element of the stack is printed last.

The error handler is a PostScript procedure that runs when the PostScript interpreter encounters an error. It is initially installed by the interpreter but can be replaced by the job. The error handler provided by a PostScript driver is very basic, so delete it. On Ghostscript, you will get better error messages from the default error handler. On other interpreters, you can use ehandler.ps provided by Adobe.

If the problem is not yet apparent try to make the smallest possible program that still has the problem by removing or commenting out pieces of code. Using { ... } pop construct is a convenient way to disable parts of PostScript code. Save every version of the file under a different name to have multi-level undo. When the file is reduced to a single procedure, try to replace it with inline code. When the procedure is buried in other code or redefined several times, try to load it at the execution spot and print. For instance: /foo load ==.

You can trace a PostScript program by adding print operators into procedures or redefining operators. The Ghostscript package includes a script traceop.ps which can help in putting a trace on operators. After including the script near the front of your postscript program, you put /foo traceop at the point where you want to start tracing all calls to operator foo. You then modify procedures /tracebefore and /traceafter to print some relevant diagnostics. For example, /tracebefore { count traceprint } def would print the whole stack just before entry into /foo. (/traceprint basically does a {print ==only } for as described above.) Operator /foo is then executed as normal. If a program doesn't render correctly on a raster device, chances are the same problem will occur during PS to PDF conversion. The PDF document preserves more information and can be interpreted as a trace of graphic operations.
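
Without traceop.ps, the same idea can be sketched by redefining an operator directly. This example reports every lineto call with its operands and then calls the original operator; lineto_orig is a hypothetical name.

```postscript
%!
% Minimal tracing sketch: report every lineto with its operands,
% then call the original operator.
/lineto_orig /lineto load def
/lineto
{ (lineto ) print
  2 copy exch                        % x y y x
  20 string cvs print ( ) print      % print x
  20 string cvs print (\n) print     % print y
  lineto_orig
} bind def

newpath 0 0 moveto 10 20 lineto 30 40 lineto   % prints "lineto 10 20", "lineto 30 40"
```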

LaserTalk was once part of the Adobe developer's kit. It was a visual front end to the PostScript printer. LaserTalk only traced the top level of execution.

PSAlter from Quite Software is the only visual PostScript debugger on the market.

How to include an EPS file?

The EPS format is documented in Adobe Technical Note 5002, Encapsulated PostScript File Format Specification (PDF, 0.2 MB), and in the 2nd edition of the PostScript Language Reference Manual. The specification includes guidelines for creating EPS files (they are simply PostScript programs that contain a couple of required header comments and promise not to do certain things) and for importing EPS files. The EPS file text can simply be pasted into the enclosing PostScript program between a %%BeginDocument: and an %%EndDocument comment. (The %%BeginDocument: keyword can be followed by descriptive text such as the EPS file name.) It is the enclosing program's responsibility to establish a suitable state just before the inserted EPS and to restore the prior state afterward. The program should:

  • take a state snapshot with save
  • redefine showpage as no-op
  • set the color space, color, line cap, line width, line join, miter limit, dash pattern, current path, overprint, and stroke adjust modes to specified defaults
  • present an empty operand stack and the default dictionary stack (but containing the no-op showpage)
  • compute and set a coordinate transformation that puts the EPS figure where it is wanted
  • set a clipping path matching the figure's bounding box

After the inserted EPS code, the enclosing program should discard anything the EPS code may have left on the operand and dictionary stacks, restore the prior contents, and then execute restore on the earlier save object, which will undo all of the other preparations.
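
For hand-written PostScript without a procset, most of these preparations can be packaged as a pair of procedures. The sketch below is modeled on the BeginEPSF/EndEPSF procedures suggested in the EPSF specification. It does not set a clipping path or a color space other than DeviceGray, so the placing code should add those where needed. The file name example.eps is a placeholder.

```postscript
%!
% Sketch modeled on BeginEPSF/EndEPSF from the EPSF specification.
/BeginEPSF
{ /b4_Inc_state save def              % snapshot of the enclosing state
  /dict_count countdictstack def      % dictionary stack depth
  /op_count count 1 sub def           % operand stack depth, excluding /op_count
  userdict begin
  /showpage { } def                   % make showpage a no-op for the EPS
  0 setgray 0 setlinecap 1 setlinewidth 0 setlinejoin
  10 setmiterlimit [ ] 0 setdash newpath
  /languagelevel where                % Level 2 and later only
  { pop languagelevel 1 ne
    { false setstrokeadjust false setoverprint } if
  } if
} bind def

/EndEPSF
{ count op_count sub { pop } repeat            % drop operands left by the EPS
  countdictstack dict_count sub { end } repeat % pop dictionaries it left
  b4_Inc_state restore                         % undo everything else
} bind def

% Usage: position the figure, then include the EPS text.
BeginEPSF
100 550 translate          % a real figure also needs -llx -lly, scaling, clipping
%%BeginDocument: example.eps
4 5 (junk) showpage        % stands in for EPS code that leaves operands behind
%%EndDocument
EndEPSF
```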

Applications that output PostScript typically do not emit code for all of these steps at every place an EPS figure is inserted; they often cut corners by making simplifying assumptions and omitting code. Especially when writing PostScript by hand, it can be more convenient to use, for example, the Anastigmatix StatEPSF, ReadyEPSF and ExecEPSF procedures, which automate the necessary steps. The following code is then all that's needed to place an EPS at (100,550), rotated 0 degrees and scaled by 1 in both axes:

 % in the document setup:
 /net.anastigmatix.Import /ProcSet findresource begin
 % at the point of import:
 { StatEPSF 100 550  0  1 1 ReadyEPSF ExecEPSF }
 currentfile exch exec
 %%BeginDocument: helloworld.eps
 ... the EPS file contents go here
 %%EndDocument

Of course, nothing stops an application from generating PostScript that way too.

Some EPS files contain preview images. In the DOS/Windows format, they are binary data that must be stripped out when pasting the EPS text into another program. The EPSI preview format is in the form of PostScript comments and can be removed to save space, but will not cause problems if left in place.

Does PostScript support unicode for CJK fonts?

PostScript does not specifically support Unicode. It includes general support (from level 2) for multi-byte (2, 3 or 4 byte) fonts. Unicode is just a special case of multi-byte encoding. There is no Unicode font as such, but you can use multi-byte CID fonts that provide support for ranges of Unicode.

There are no freely redistributable Unicode CMaps. They can be derived from MS Windows code pages and corresponding Adobe CMaps.

Developers often look for a "simple" way to just include Unicode strings with no checking for ranges, language etc. but this is a pipe dream.

Some PostScript engines also support TrueType fonts that have no internal CID mapping. For these fonts no CMap is needed. The engine only needs a way to decode the Unicode-encoded strings (for example UTF-8 or UTF-16) into the integers that the font uses to look up its glyphs, and some engines provide decoding filters for this purpose.

Filters, which can be chained and read like other file objects, are supported from PostScript Level 2. In standard PostScript, however, the read operator always returns a single byte (an integer from 0 to 255), and most PostScript programs expect bytesavailable to count bytes. A multi-byte code unit, code point or glyph ID therefore has to be assembled from several bytes, either by an engine-specific decoding filter or by PostScript code. The standard filters handle data decompression and the decoding of hex or ASCII85 sequences from any source, but they do not decode Unicode transformation formats.

So instead of enumerating the bytes of a string with the forall operator, pass the string to a Unicode decoding filter (where the engine provides one) or decode it with a PostScript procedure. Then use a loop to enumerate the resulting characters or glyph indexes, and render each character by using its index to look up the glyph in a composite font. If the PostScript engine does not natively support Unicode or TrueType fonts, you'll have to convert the TrueType font into a CID-keyed font and produce CID values that are usable with it. That font must be loaded in the document prolog or transferred as a resource before the document is printed.
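
Because read and forall deliver single bytes, UTF-8 text has to be combined into code points by PostScript code before it can be mapped to glyphs. The sketch below decodes a whole string into an array of code points; utf8decode is a hypothetical name. It assumes well-formed input and does no validation. Mapping each code point to a glyph name or CID is a separate, font-dependent step.

```postscript
%!
% Hypothetical helper:  (utf8 bytes) utf8decode [code points]
% Assumes well-formed UTF-8; invalid sequences are not detected.
/utf8decode
{ 3 dict begin
  /need 0 def                        % continuation bytes still expected
  /cp 0 def                          % code point being assembled
  [ exch
    { /b exch def                    % next byte of the string
      need 0 gt
      { /cp cp 6 bitshift b 16#3F and or def     % append 6 payload bits
        /need need 1 sub def
        need 0 eq { cp } if                      % code point complete
      }
      { b 16#80 lt
        { b }                                    % ASCII: one byte
        { b 16#E0 and 16#C0 eq
          { /cp b 16#1F and def /need 1 def }    % 110xxxxx: 2-byte sequence
          { b 16#F0 and 16#E0 eq
            { /cp b 16#0F and def /need 2 def }  % 1110xxxx: 3-byte sequence
            { /cp b 16#07 and def /need 3 def }  % 11110xxx: 4-byte sequence
            ifelse
          }
          ifelse
        }
        ifelse
      }
      ifelse
    }
    forall
  ]
  end
} bind def

(caf\303\251) utf8decode ==         % prints [99 97 102 233]
```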

You'll probably also need a layout engine (such as Pango) to turn the Unicode code points into a series of glyph IDs and to precompute their positions in the PostScript-rendered document. The glyph IDs are then encoded in the document and decoded by the PostScript engine with the associated filter. Such a text layout engine is needed anyway to support the BiDi algorithm and to render complex scripts, ligatures, contextual forms, or variants described in the fonts used in your document design.