Working with formatted texts

Formatting information present in the source file usually needs to be reproduced in the target file. OmegaT displays the in-line formatting information of supported formats (at present DocBook, HTML, XHTML, OpenDocument, and OpenOffice.org) as tags. Tags are usually not taken into account when considering string similarity for matching purposes. Tags that you have reproduced in the translated segment will be present in the translated document.


Formatting tags

Tag naming

The tags consist of one to three characters and a number. The unique number makes it possible to group the tags that correspond to each other, and to differentiate tags that share the same shortcut character but are in fact different. The characters may or may not reflect the value of the formatting the tag represents (e.g. bold, italics, etc.).

Tag numbering

Tags are numbered incrementally by tag group. What we call a "tag group" here is either a single tag on its own (like <br1>) or two tags forming a pair (like <f3> and </f3>). Within a segment, the first group (pair or singleton) gets the number 0, the second the number 1, and so on. The first example below has three tag groups (a pair, a singleton, and then another pair); the second example has only one group (a pair).

Pairs and singletons

Tags always come either in singletons or in pairs. Single tags indicate formatting information that does not affect the surrounding text (extra space or line break for example).

	<segment 2132><b0><Ctr+N></b0>, <br1><b2><Enter></b2><end segment>

<br1> is a single tag and does not affect any surrounding text. Paired tags usually indicate style information that applies to the text between the opening tag and the closing tag of a pair. Whatever happens to a pair, the opening tag should always come before the closing tag.

	<segment 3167>Log file (<f1>log.txt</f1>) for tracking operations and errors.<end segment>

<f1> and </f1> are paired and affect log.txt.
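
The naming and grouping rules above can be illustrated with a small Python sketch. It is not part of OmegaT: the regular expression and the function name tag_groups are assumptions made for illustration, based only on the tag shape described in this chapter (one to three letters followed by a number). Note that, looking at a single segment, a singleton cannot be told apart from an opening tag whose closing tag is missing.

```python
import re

# Assumed tag shape: optional "/", one to three letters, then a number.
# Text such as <Ctr+N> or <Enter> does not match and is treated as content.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3})(\d+)>")

def tag_groups(segment):
    """Map each tag group (e.g. 'b0') to 'pair', 'singleton' or 'closing only'."""
    counts = {}
    for closing, name, number in TAG_RE.findall(segment):
        key = name + number
        opens, closes = counts.get(key, (0, 0))
        if closing:
            closes += 1
        else:
            opens += 1
        counts[key] = (opens, closes)

    kinds = {}
    for key, (opens, closes) in counts.items():
        if opens and closes:
            kinds[key] = "pair"
        elif opens:
            kinds[key] = "singleton"
        else:
            kinds[key] = "closing only"
    return kinds

# The two example segments from this section, without the segment markers.
first = "<b0><Ctr+N></b0>, <br1><b2><Enter></b2>"
second = "Log file (<f1>log.txt</f1>) for tracking operations and errors."
print(tag_groups(first))
print(tag_groups(second))
```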


Tags and sentence segmenting

OmegaT creates its tags before sentence segmentation is applied. Depending on the segmenting rules, tags may look as if they do not respect the above rules of numbering and grouping.

	<segment 1554> <f0>Before: \. After: \s</f0><end segment>

There seems to be no problem here. Now, if we apply the default segmenting rules to this segment (the default OmegaT behaviour), we get the following result:

	<segment 1990> <f0>Before: \. <end segment>
	<segment 1991> After: \s</f0><end segment>
	

The segmenting rule that applies here is: segment after a period followed by a space. In the above example, the segments taken one by one do not respect the pairing and numbering rules: <f0> and </f0> should be paired in the same segment, but are separated by the segmentation rules. In some cases this may even cause problems in the translation, when tags must be moved in the target language to reflect the word order of that language (see Tag operations below).
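
The effect described above can be reproduced with a short Python sketch. It is only an approximation under stated assumptions: the split pattern is a simplified stand-in for the rule "segment after a period followed by a space", not OmegaT's actual segmentation engine, and the names split_after_period_space and separated_pairs are hypothetical.

```python
import re

# Assumed tag shape: optional "/", one to three letters, then a number.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3}\d+)>")

def split_after_period_space(text):
    """Split right after every period followed by a space (simplified rule)."""
    return [part for part in re.split(r"(?<=\. )", text) if part]

def separated_pairs(text):
    """Return the segments and the tag pairs split across different segments."""
    segments = split_after_period_space(text)
    opened_in = {}   # tag name -> index of the segment holding the opening tag
    separated = []
    for index, segment in enumerate(segments):
        for closing, name in TAG_RE.findall(segment):
            if not closing:
                opened_in[name] = index
            elif name in opened_in and opened_in[name] != index:
                separated.append(name)
    return segments, separated

# The example from this section, tags created before segmentation.
segments, separated = separated_pairs(r"<f0>Before: \. After: \s</f0>")
print(segments)
print(separated)
```

Because the tags are created first and the text is split afterwards, the opening tag ends up in the first segment and the closing tag in the second.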


Tag operations

Care must be exercised with tags. If they are accidentally changed, the formatting of the final file may be corrupted. The sound approach is "Don't fix what's not broken". However, it is still good to know what is possible and how to do it.

Tag group duplication

To duplicate tag groups, just copy them to the position of your choice. Keep in mind that in a pair group the opening tag must come before the closing tag. The formatting represented by the group you duplicated will be applied to the section where you duplicated it.

Example:

	<segment 0001><f0>This formatting</f0> is going to be duplicated here.<end segment>

After duplication:

	<segment 0001><f0>This formatting</f0> has been <f0>duplicated here</f0>.<end segment>

Tag group deletion

To delete tag groups, just remove them from the segment. Keep in mind that a pair group must have both its opening and its closing tag deleted to ensure that all traces of the formatting are properly erased. Otherwise the translated file might be corrupted and may not open. By deleting a tag group you remove the related formatting from the translated file.

Example:

	<segment 0001><f0>This formatting</f0> is going to be deleted.<end segment>

After deletion:

	<segment 0001>This formatting has been deleted.<end segment>

Tag group order modification

To change the order of a tag group to reflect a different language structure in the translation, simply put the tag group where it should be in the translation. The formatting will follow the part it is applied to.

Example:

	<segment 0001><f0>Formatting zero</f0> and <f1>formatting one</f1> are going to be inverted around.<end segment>

After order modification:

	<segment 0001><f1>Formatting one</f1> and <f0>formatting zero</f0> have been inverted.<end segment>

Tag group nesting

Modifying the order of a tag group may result in nesting one tag group within another. This is possible as long as the enclosing group totally encloses the enclosed group. Extra care must be taken with nesting, especially with tag pairs that are not fully moved inside an enclosing group. Otherwise the translated file might be corrupted and may not open. Both formats will apply to the nested part.

Example:

	<segment 0001><f0>Formatting</f0> <f1>one</f1> is going to be nested inside formatting zero.<end segment>

After nesting:

	<segment 0001><f0>Formatting <f1>one</f1></f0> has been nested inside formatting zero.<end segment>

Tag group overlapping

Overlapping is the result of bad manipulation of tag pairs. It will certainly result in formatting corruption, and sometimes in the translated file not opening at all.

Example:

	<segment 0001><f0>Formatting</f0> <f1>one</f1> is going to be messed up.<end segment>

After bad manipulation:

	<segment 0001><f0>Formatting <f1>one</f0> </f1>is very messed up now.<end segment>
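
The difference between nesting and overlapping can be checked mechanically with a stack. The following Python sketch is an illustration only, not OmegaT's own code; the regular expression and the function name is_well_nested are assumptions based on the tag shape described in this chapter.

```python
import re

# Assumed tag shape: optional "/", one to three letters, then a number.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]{1,3}\d+)>")

def is_well_nested(segment):
    """Return True if every tag pair closes in reverse order of opening."""
    tags = TAG_RE.findall(segment)
    # Names that have a closing tag are treated as pairs, the rest as singletons.
    paired = {name for closing, name in tags if closing}
    stack = []
    for closing, name in tags:
        if name not in paired:
            continue  # singletons do not take part in nesting
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # closing tag does not match the innermost open pair
    return not stack  # a pair left open is also an error

nested = "<f0>Formatting <f1>one</f1></f0> has been nested inside formatting zero."
overlapping = "<f0>Formatting <f1>one</f0> </f1>is very messed up now."
print(is_well_nested(nested))
print(is_well_nested(overlapping))
```

In the overlapping segment, </f0> arrives while <f1> is still the innermost open pair, which is exactly the situation that corrupts the translated file.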

Tag group validation

The validate tags function detects changed tags (whether changed deliberately or by accident) and indicates the affected segments. Using this function opens a dialog listing any suspected broken or bad tags in the document.

This function can be useful for tracking down bugs in a translated tagged text. This is often a problem with OpenDocument or OpenOffice.org files that will not open due to tag problems created in the process of translation. Fixing the tags and recreating the target documents can often remedy the problem.

To open the window, use Ctrl+T.

The window features a three-column table with a link to the segment, the original segment, and the translated segment.

The tags are highlighted in bold blue for easy comparison between the original and the translated contents.

Click on the link to activate the segment in the Editor.

Correct the error if necessary and press Ctrl+T to return to the tag validation window to correct other errors.

Tag errors are tag manipulations in the translation that do not reproduce the same tag order and number as in the original segment. Some tag manipulations are necessary and are benign, some will cause problems when the translated document is created.
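
The definition above (same tag order and number as in the original segment) can be expressed as a simple comparison. This Python sketch is a minimal illustration under assumptions: it is not how OmegaT implements its validation, the names tag_sequence and tags_changed are hypothetical, and it only reports that the tags differ. It cannot tell a benign change from a harmful one.

```python
import re

# Assumed tag shape: optional "/", one to three letters, then a number.
TAG_RE = re.compile(r"</?[a-zA-Z]{1,3}\d+>")

def tag_sequence(segment):
    """Return the tags of a segment in the order in which they appear."""
    return TAG_RE.findall(segment)

def tags_changed(source, target):
    """Return True if the target does not reproduce the source tag sequence."""
    return tag_sequence(source) != tag_sequence(target)

source = "<f0>Formatting zero</f0> and <f1>formatting one</f1> are going to be inverted around."
target = "<f1>Formatting one</f1> and <f0>formatting zero</f0> have been inverted."
print(tag_sequence(source))
print(tag_sequence(target))
print(tags_changed(source, target))
```

Here the translator reordered the groups on purpose, so the change would be reported even though it is harmless; deciding whether a reported segment needs fixing remains the translator's job.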


Hints for tags management

Tags generally represent some kind of formatting in the original text. Simplifying the original text formatting greatly contributes to reducing the number of tags. Where possible, consider unifying the fonts, font sizes, colors, etc. used in the source, to simplify the translation and reduce the number of possible tag errors. Take a look at the Tag operations section to see what can be done with tags. Remember that if tags bother you and formatting is not extremely relevant for the job at hand, removing most formatting from the source document will greatly ease the translation.

If you need to see the tags in OmegaT but do not need to retain most of the formatting in the translated document, you are free not to add the tags to the translation. In this case pay extra attention to tag pairs, since forgetting to delete one part of a pair will corrupt your document's formatting, which can prevent a translated OpenOffice.org document from opening. Since tags are included in the text itself, it is possible to use segmentation rules to create segments with fewer tags. This is an advanced feature, and experience is required to apply it properly.

Important: OmegaT is not yet able to detect mistakes in formatting fully automatically, so it will not prompt you if you make an error or change formatting to fit your target language better. Sometimes, however, your translated file will look distorted and, in the case of OpenDocument / OpenOffice.org files, may even refuse to open.

