CAT-Tools/OmegaT/User manual/File Filters
File Filters
User-configurable file (conversion) filters are a new addition starting with OmegaT version 1.4.5.
Overview
File filters are responsible for:
- reading a source document in from a file in a specific format (e.g. different filters exist for handling plaintext and OpenDocument/OpenOffice files);
- extracting translatable content from a file;
- writing the target document out to file (replacing translatable content with its translation in the process).
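The three responsibilities above can be pictured with a toy example. This is only an illustrative sketch and not OmegaT's implementation: the function name `translate_plaintext`, the line-per-segment rule, and the `memory` dictionary are all assumptions made for the illustration.

```python
def translate_plaintext(source_text, translations):
    """Toy plaintext filter: each non-empty line is one translatable segment."""
    target_lines = []
    for line in source_text.splitlines():       # 1. read the source document
        segment = line.strip()                  # 2. extract translatable content
        if segment:
            # 3. write the translation, or the source text if none exists
            target_lines.append(translations.get(segment, line))
        else:
            target_lines.append(line)           # keep blank lines as they are
    return "\n".join(target_lines)


source = "Hello\n\nGoodbye"
memory = {"Hello": "Bonjour", "Goodbye": "Au revoir"}  # hypothetical translations
result = translate_plaintext(source, memory)
```

A real filter for a structured format such as HTML or OpenDocument would also have to preserve the markup around each segment, which this sketch ignores.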
Since the filters are a new addition in this release, there may still be unknown bugs, user interface quirks, and unclear procedures. Hopefully this documentation will help clear up a few things in that respect. At present the user interface text for the file filters is only available in English. This is a limitation of the current version that will be remedied in a newer version of OmegaT.
Please use realistic settings when configuring filters, or else unexpected results may occur. Read the instructions carefully and, if the results of manipulating the filters are not understood, stick with the default settings or work with a test project before using them on serious work. Ask questions on the OmegaT users group to get more help.
Note that file filters can only be configured when no project is open. To change how files are handled for a particular project, configure the filters before opening the project.
Filter Dialogs
The details of each dialog and of filter usage are given here.
File Filters
Access to this dialog is gained by selecting File Filters... from the Settings menu.
Within the dialog a table of the available file formats that can be handled is presented. The left column shows the available filter types. The right column shows a checkbox next to each filter type that indicates whether it is enabled (i.e. with a check mark) or not.
Currently available filter types are: text, XHTML, HTML, OpenDocument/OpenOffice.org, and Java Resource Bundles.
To edit a file filter, highlight it in the list of available types and select the Edit... button to open the Edit File Filter configuration dialog for it.
To reset the file filters after changes have been made select the Defaults button. Select OK for the reset to defaults to take effect or Cancel to back out of the reset. Be sure that resetting the filters to defaults is what is wanted before accepting the changes.
Edit File Filters
Access this from the File Filters dialog via the Edit... button.
The user is presented here with a four column table for the file filter type selected. The table contains the headings: Source Filename Pattern, Source File Encoding, Target File Encoding, and Target Filename Pattern for the configured settings of a particular filter. These are used for the following purposes:
- Source Filename Pattern - determines which files in the /source directory will have the particular filter applied to them. Patterns can be customized according to the users' preferences.
- Source File Encoding - the character encoding in effect for the source files. Used to read in the source files. Has a drop down list from which to choose encodings.
- Target File Encoding - the character encoding in effect for the target files. Used to write out the target files. Has a drop down list from which to choose encodings.
- Target Filename Pattern - determines the pattern used to assign filenames to the target documents created in /target. Patterns can be customized according to the users' preferences.
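The two encoding settings in the list above amount to a decode step when reading and an encode step when writing. The following is a minimal sketch of that idea, assuming a hypothetical helper named `transcode`; it does not show how OmegaT itself reads or writes files.

```python
def transcode(source_bytes, source_encoding, target_encoding):
    """Decode with the source encoding, then encode with the target encoding."""
    text = source_bytes.decode(source_encoding)   # Source File Encoding: read
    return text.encode(target_encoding)           # Target File Encoding: write


# A Latin-1 source file written out as UTF-8.
latin1_bytes = "café".encode("latin-1")
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")
```

If the source encoding is set incorrectly, the decode step either fails or produces the wrong characters, which is why the encoding of the source files should be known before the filter is configured.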
On the right side of the dialog are the buttons Add..., Edit..., and Remove... that are used to open the Add Filter, Edit Filter, and Remove Filter dialogs, respectively.
There is another Defaults button in this dialog. It resets the selected filter type back to its original number of filters and their default settings. Again, be sure this is exactly what is wanted before using it.
Add Filter & Edit Filter
Access these from the Edit File Filters dialog via the Add... and Edit... buttons, respectively.
These dialogs and their instructions are the same, but their use is slightly different. User defined filters are created in the Add Filter dialog. A newly created filter is added to the list of available filters for the particular type of filter through this dialog. Default and user defined filters are edited in the Edit Filter dialog. A chosen filter is updated when changes are accepted there.
Each dialog contains a number of entries that must be filled in to properly configure the settings of a filter. These correspond to the settings seen in the Edit File Filters dialog. Select OK to add a new filter or accept changes to an existing filter, or Cancel to back out of the particular operation.
Source Filename Pattern
Specify the pattern used to determine which files in the /source directory will have the particular filter applied to them. Patterns can be customized according to the users' preferences, within the limits of normal shell (glob) patterns.
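As a rough illustration of glob-style matching, the sketch below picks a filter for a filename by testing it against a list of patterns. The filter names, the pattern list, and the "first match wins" rule are assumptions made for the example, and Python's `fnmatch` module may differ in detail from the matching OmegaT performs.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, filter name) pairs, checked in order.
FILTERS = [
    ("*.txt1", "Text (Latin-1)"),
    ("*.txt", "Text"),
    ("*.html", "HTML"),
]


def pick_filter(filename):
    """Return the name of the first filter whose pattern matches, else None."""
    for pattern, name in FILTERS:
        if fnmatchcase(filename, pattern):
            return name
    return None


chosen = pick_filter("test.txt")
```

A file whose name matches no pattern, such as `image.png` here, is simply not handled by any filter in this sketch.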
Enter the appropriate source file pattern in the text area. There is a default pattern set here which depends on the particular filter under consideration. Use the default or set a custom one according to the specific source files that are to be used with the filter.
Source File Encoding
Specify the source file character encoding. This can be determined externally and entered (e.g. known ahead of time, from OpenOffice.org, etc.), indicated by the file extension for some preset file formats (e.g. plaintext in Latin-1 uses *.txt1), or, if applicable, left to the <auto> setting. For some formats the ability to change the encoding will be disabled. See the Encodings section on this page for details on the <auto> and disabled settings.
Select the appropriate encoding from the drop down list, or use <auto> to have the encoding for reading the source files selected automatically.
Target File Encoding
Specify the target file character encoding. The target documents will be created with the selected encoding. It may be a good idea to know the encoding required ahead of time for the particular target locale. If applicable, the <auto> setting can be used to attempt to automatically set the encoding. Again, for some formats the ability to change the encoding will be disabled. See the Encodings section on this page for details on the <auto> and disabled settings.
Target Filename Pattern
Specify the pattern used to assign filenames to the target documents created in /target. Patterns can be customized according to the users' preferences, within the limits of normal shell (glob) patterns.
Enter the appropriate target file pattern in the text area. The default pattern set here depends on the particular filter under consideration. Use the default or set up a custom pattern to alter the filenames of the target files for this filter. There are a number of preset filename variables available for this purpose.
Filename Variables
For the Target Filename Pattern there are a few preset filename variables that can be used to help create the filenames of the target documents. The syntax of these variables is ${VARIABLE}, which follows the variable syntax familiar from shell (command line) usage. A variable is placed inline in the Target Filename Pattern text area, where it is replaced by a value determined from the files and settings in a project.
Select a filename variable from the drop down list. Use the Insert button to put a selected variable at the cursor location in the Target Filename Pattern text area.
Filename Variables
- ${filename} - filename of a source file (name and extension)
- ${nameOnly} - source file name only, without the extension
- ${extension} - source file extension only
- ${sourceLanguage} - project's source language (locale) (xx-YY or xx_YY)
- ${targetLocale} - project's target locale (xx_YY)
- ${targetLanguage} - project's target language (xx-YY)
- ${targetLanguageCode} - project's target language code (xx)
- ${targetCountryCode} - project's target country code (YY)
${filename} is the default configuration for the Target Filename Pattern in most cases, which means that the source file name and target (translated) file names will be the same.
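To make the substitution concrete, here is a small sketch that expands a Target Filename Pattern. It relies on Python's `string.Template`, which happens to use the same ${name} syntax; the function `target_filename` and the way the values are derived from the filename and language are assumptions for illustration, not OmegaT's code. Only a subset of the variables is covered.

```python
import os
from string import Template


def target_filename(pattern, source_filename, target_language):
    """Expand ${...} filename variables for one source file (illustrative)."""
    name_only, ext = os.path.splitext(source_filename)
    # Assumes target_language has the form xx-YY, e.g. "fr-CA".
    language_code, _, country_code = target_language.partition("-")
    values = {
        "filename": source_filename,
        "nameOnly": name_only,
        "extension": ext.lstrip("."),   # the "." separator is not included
        "targetLanguage": target_language,
        "targetLocale": target_language.replace("-", "_"),
        "targetLanguageCode": language_code,
        "targetCountryCode": country_code,
    }
    return Template(pattern).substitute(values)


same_name = target_filename("${filename}", "test.txt", "fr-CA")
with_code = target_filename(
    "${nameOnly}-${targetLanguageCode}.${extension}", "test.txt", "fr-CA"
)
```

Because ${extension} holds the extension without its leading dot, the pattern itself has to supply the "." between the name and the extension.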
See Sample Project for information on language (locale) and codes.
Remove Filter
This is a confirmation dialog only. Select OK to permanently remove a filter. User defined filters will not be available after confirming their removal. The default filters for a particular file format will also not be available if removed, but these can be restored from the Edit File Filters dialog by selecting Defaults.
Encodings
Encodings for source and target are selectable from the drop down lists in the Edit File Filters, Add Filter, and Edit Filter dialogs. The encodings available are limited to those supported by the Java Runtime Environment in use. In addition, an <auto> setting can be used to attempt to detect the required source file encoding automatically.
<auto> Setting
Depending upon the particular filter in use, one of three actions will be taken when an <auto> setting is encountered:
1. OmegaT will attempt to determine the encoding automatically; applies to source files only.
2. The default encoding of the operating system will be used.
3. No action will occur. This happens when the filter only supports a single source and target encoding.
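The three cases can be sketched as a single decision function. Everything here is hypothetical: the function name, its parameters, the byte-order-mark check used as a stand-in for detection, and the choice to fall back to the system default when detection finds nothing are assumptions, since the text does not describe how a given filter actually behaves.

```python
import codecs
import locale


def resolve_auto_encoding(raw_bytes, fixed_encoding=None, can_detect=False):
    """Pick an encoding for <auto>, following the three cases (illustrative)."""
    # Case 3: the filter supports a single encoding, so <auto> changes nothing.
    if fixed_encoding is not None:
        return fixed_encoding
    # Case 1: try to detect the encoding from the content (BOM check only).
    if can_detect:
        if raw_bytes.startswith(codecs.BOM_UTF8):
            return "utf-8-sig"
        if raw_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            return "utf-16"
    # Case 2: use the default encoding of the operating system.
    return locale.getpreferredencoding(False)


detected = resolve_auto_encoding(b"\xef\xbb\xbfHello", can_detect=True)
```

Real detection is format specific (for example, an encoding declared inside an HTML or XML file), so the BOM check above should be read only as a placeholder.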
Disabled (Grayed Out) Setting
A filter whose encoding setting is disabled either does not support multiple file encodings or handles a file format that is encoding-neutral. In either case there is no way to specify another encoding.
Example: Target Filename Pattern
Perhaps it would be nice to change the target filenames to reflect the particular target language or locale being translated to. One possibility is to add a suffix to the file names (before the extension). For instance, if the target language is French, then the addition of the language code might be good.
In this case the pattern could be
${nameOnly}-fr.${extension},
but there is already a variable for the target language code so insert that variable instead to give
${nameOnly}-${targetLanguageCode}.${extension}.
Now, if a source file is named test.txt (e.g. handled by the *.txt text filter), its target filename becomes test-fr.txt.
Note that the filename separator '.' is not inserted by ${extension}; it must be included manually in the pattern if it is wanted.
Continuing, perhaps the country code would be nice also. It could be added with
${nameOnly}-${targetLanguageCode}-CA.${extension}
for the case of French as the target language and Canada as the country. In this case test.txt would become test-fr-CA.txt.
Again, there is already a variable for the target country code, so insert that variable to give
${nameOnly}-${targetLanguageCode}-${targetCountryCode}.${extension}
More simply still for the case at hand, insert the target language variable in place of the target language code and target country code variables:
${nameOnly}-${targetLanguage}.${extension}
The last two patterns give the same result as before, test-fr-CA.txt. In this example every target filename now has the language (language code plus country code) added as a suffix before the extension, and the extension is unaltered.
Questions & Answers
How is the filter that works with a file determined?
This is determined by the file's name pattern. Each filter lists the source filename patterns of the files it can handle, and these can be set by adding or editing a file filter. Each filter may have a distinct pattern associated with it. For example, if the plaintext filter is to be used to handle all files without an extension, add or edit a filter to have a *. source filename pattern. There are also a few preset filters that work with a specific file type but use different encodings. For example, a Latin-1 encoded plaintext file can be recognized by the .txt1 extension. Files can be renamed to fit a particular pattern before they are used in a project so that they will be recognized automatically. In these examples, the extension is the determining factor for associating files with a filter.
What if a file encoding seems incorrect?
Set up another encoding instead by changing the Source File Encoding of the appropriate file filter.
What can be done if a single encoding does not work well for the source and target languages?
In practice, the UTF-8, UTF-16, and UTF-32 encodings will work in almost all cases, and one of them can be used for source and target together. To use two different encodings for source and target, ensure that an appropriate encoding is chosen for each character set and that the source document files are saved in the proper encoding before opening the project.