NLA - Implementation considerations

Tagging Data
Data Type Key
Sorting Text
Properties of an Environment
Hierarchy of Defaults
Adaptability
Modular Keyboard
Typewriter Area of Keyboard

This section contains examples of suggested implementations for various requirements.

This paper calls for the rich functionality of a full implementation of the required NLA. Neither software developers nor end-users can wait until this final implementation is available, so a staged approach is needed. As an example, the following steps for implementing multilingual support can be envisioned:

all national languages enabled
different national languages on different systems in the network
different national languages on different work stations in the network
different national languages in different windows of same work station
more than one national language within one window of work station.

Similar remarks apply to the coding issue. For this area of the NLA, too, a stepwise implementation must be considered:

Concentration on one single-byte code page in the EBCDIC and the ASCII area.
2-byte coding scheme incorporating the already existing DBCS (Double Byte Character Sets).
Ultimate multi-byte coding scheme.

Tagging Data

Textual data need a multitude of attributes (tags):

coding scheme
code page
character set
language
country
sorting scheme

Most of these may be specified globally (for an entire installation), but all of them except the sorting scheme must also be applicable to specific data items. This is especially relevant in data base applications: it may, for example, be necessary to change the code page for personal names. Multilingual operation must always be assumed:

A data base may be tagged «Swiss German»
The system may be tagged «French»
An element (e.g. name) may be tagged «Turkish»

As stated elsewhere in this paper, an implementation of the NLA should not introduce new coding schemes. This also applies to the tags of data: existing language and country codes (e.g. ISO) must be used where possible.

Joining data base tables, especially for distributed, multilingual data bases, requires that all elements be tagged, at least temporarily. It is desirable that

the whole data base has default tags
tables may have tags
columns may have tags
elements may have some exceptional tags

Hence each level must indicate whether or not tags exist on any of its children.
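The tag hierarchy described above can be sketched as follows. This is a minimal illustration, not part of the NLA proposal: the class `Node`, its field `has_tagged_children`, and the sample names are all hypothetical. The language and country values are ISO codes, in line with the recommendation to reuse existing codes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy: data base, table, column, or element."""
    name: str
    tags: dict = field(default_factory=dict)   # tags set at this level only
    parent: Optional["Node"] = None
    has_tagged_children: bool = False          # indicator for tags further down

    def child(self, name, **tags):
        node = Node(name, dict(tags), parent=self)
        if tags:
            # Flag every ancestor so that it knows a descendant carries
            # its own tags. An ancestor that is already flagged implies
            # that all levels above it are flagged as well.
            ancestor = self
            while ancestor is not None and not ancestor.has_tagged_children:
                ancestor.has_tagged_children = True
                ancestor = ancestor.parent
        return node

    def resolve(self, key):
        """Return the nearest value for `key`, walking up towards the data base."""
        node = self
        while node is not None:
            if key in node.tags:
                return node.tags[key]
            node = node.parent
        return None


database = Node("customers_db", {"language": "de", "country": "CH"})
table = database.child("customers")
column = table.child("surname")
element = column.child("row 17", language="tr")   # exceptional tag
plain = column.child("row 18")                    # inherits all defaults

print(element.resolve("language"), element.resolve("country"))
print(plain.resolve("language"), database.has_tagged_children)
```

A process that joins tables could consult `has_tagged_children` first and skip the per-element lookup wherever the flag is not set.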

There is also a need to identify undisplayable information like

numeric data (integer, real, packed decimal)
program code
noncoded information (pictures)

Such data items must be prevented from any translation.

Data Type Key

For some text-processing functions (like search and sort), detailed information about the data (such as accents, case, special symbols) is needed at different stages of the process. All this information is present in the fully formed characters, like an upper case A with grave accent (À), and can be associated with the code of that character. However, for processes like sort it is more convenient to have the information separated into «keys».

Let us call this the «processable» form of the data, whereas the fully formed characters represent the «external» form of the file. For some applications, like data base applications, it may be desirable to have the data already in processable form. For other applications it may be more economic to «convert on the fly». Whether the translation between these forms is applied to data files directly (as in this description) or exists only «on the fly» during further processing of the text is left to the implementor.

Many functions in an NLA, like sort or search, can only be performed correctly if text is stored in its «richest possible form». [LaBonté-3] proposes sort keys for text data which define information about:

lower case form of text
diacritics (position, nature)
case of each character
special symbols (position, nature)

These keys separate the information spread across the characters of the original string. The created strings (keys) contain only one type of information. These keys are independent of the code page used to encode text data.

This method compares readily to numeric data types like Floating Point Real, which consist of:

mantissa
exponent
sign

Assume for example the French word Côté. Then the elements of the proposed data type become:

Element	Description	Example
`PK:` primary key	base characters	`cote`
`SK:` secondary key	accents	`«17»«16»«19»«16»`
`TK:` tertiary key	case	`«09»«08»«08»«08»`
`QK:` quaternary key	non alphabetic symbols	`«00»`

The coding of these data elements is demonstrated here with numeric values in angle brackets. The values are chosen not to conflict among themselves.

Depending on the language, the secondary key (SK) may run in a different direction: its values correspond to the base characters in PK either left to right or right to left. In the French example above the accent values are recorded right to left, starting with the acute accent of the final é. The accents are assigned numeric values below those used for characters. The values depend on the language. For French appropriate values are:

`«16»`	unaccented
`«17»`	acute accent
`«18»`	grave accent
`«19»`	circumflex accent
`«20»`	diaeresis
`«21»`	cedilla

Values for the tertiary order key (TK) are

`«08»`	lower case
`«09»`	upper case

The quaternary key (QK) starts with a delimiter «00», to distinguish it from the preceding values, followed by a sequence of «position»«value» pairs. This is best demonstrated with another example, the English word vice-president:

`PK`	`vicepresident`
`SK`	`13 * «16»`
`TK`	`13 * «08»`
`QK`	`«00»«04»-`
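The decomposition into the four keys can be sketched with Python's standard `unicodedata` module. This is only an illustration under stated assumptions: the function name `make_keys` and the list representation of the keys are hypothetical, and «19» is taken to denote the circumflex, as the Côté example implies. Any accent not listed is treated as unaccented.

```python
import unicodedata

# Accent values for French, keyed by Unicode combining mark.
ACCENT_VALUES = {
    "\u0301": 17,  # acute
    "\u0300": 18,  # grave
    "\u0302": 19,  # circumflex (assumed, see lead-in)
    "\u0308": 20,  # diaeresis
    "\u0327": 21,  # cedilla
}
UNACCENTED, LOWER, UPPER, DELIMITER = 16, 8, 9, 0


def make_keys(text, accents_right_to_left=False):
    """Split `text` into the keys (PK, SK, TK, QK)."""
    pk, sk, tk, qk = [], [], [], [DELIMITER]
    # Work on precomposed characters so each letter keeps its accent.
    for char in unicodedata.normalize("NFC", text):
        base, *marks = unicodedata.normalize("NFD", char)
        if not base.isalpha():
            # Non-alphabetic symbol: store the number of base characters
            # that precede it, followed by the symbol itself.
            qk.extend([len(pk), char])
            continue
        pk.append(base.lower())
        sk.append(ACCENT_VALUES.get(marks[0], UNACCENTED) if marks else UNACCENTED)
        tk.append(UPPER if base.isupper() else LOWER)
    if accents_right_to_left:
        sk.reverse()
    return "".join(pk), sk, tk, qk


print(make_keys("Côté", accents_right_to_left=True))
print(make_keys("vice-president"))
```

With `accents_right_to_left=True` the call for Côté yields the PK `cote` and the SK sequence 17, 16, 19, 16 shown in the table; the second call yields the QK sequence 0, 4, `-`.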

Sorting Text

Current sort methods in both the ASCII and the EBCDIC environment do not create the sequences users expect, because they compare the code points (binary values) of the characters. However, many applications depend on this behaviour. So when sort methods are implemented according to NLA rules, the default processes must retain the old (wrong) results.
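The difference between the two orders, and a default that preserves the old behaviour, can be illustrated as follows. This is a simplified sketch: the names `nla_sort_key`, `sort_text` and the parameter `use_nla_rules` are hypothetical, and the key compares accents left to right only, so it does not implement language-specific rules.

```python
import unicodedata


def nla_sort_key(word):
    """Simplified multi-level key: base letters, then accents, then case."""
    letters, accents, cases = [], [], []
    for char in unicodedata.normalize("NFC", word):
        base, *marks = unicodedata.normalize("NFD", char)
        letters.append(base.lower())
        accents.append("".join(marks))   # "" (no accent) sorts first
        cases.append(base.isupper())     # False (lower case) sorts first
    return "".join(letters), accents, cases


def sort_text(words, use_nla_rules=False):
    """Sort by code point unless the caller opts in to the key-based order."""
    return sorted(words, key=nla_sort_key if use_nla_rules else None)


words = ["zebra", "Zürich", "côte", "Cote", "apple", "cote"]
print(sort_text(words))                      # old behaviour, kept as default
print(sort_text(words, use_nla_rules=True))  # key-based order
```

Existing applications that call `sort_text(words)` keep receiving the code-point sequence, in which all upper case words precede the lower case ones; only callers that request the new rules see the words grouped by their base letters.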

Properties of an Environment

The following properties must be independent from each other. However they may be grouped (for example by language and country):

Language

character set
character attributes
collating sequence (sort)

Country, culture

numeric punctuation
monetary punctuation
date representation
time representation

Other

coding
messages

The list of properties forming an environment must be extensible, for example for user defined categories.

These properties must be kept in a form suitable for modification with any common text editor, although this information is intended to be modified only by system administrators or the like. The model given by the utility setlocale() and its input-definitions [POSIX-2, IBM SC09-1264] demonstrates the needed functionality.
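As an illustration of an editable, extensible property definition, the sketch below reads a plain text description with Python's standard `configparser` module. The file layout, the section and property names, and all values are hypothetical; this is not the input format of setlocale() or of any POSIX utility.

```python
import configparser

# Hypothetical definition text, as a system administrator might edit it.
DEFINITION = """
[language]
character_set = ISO 8859-1
collating_sequence = fr

[country]
numeric_punctuation = 1'234.56
date_representation = DD.MM.YYYY

[other]
coding = single-byte
messages = fr

[user_defined]
paper_size = A4
"""


def load_environment(text):
    """Parse the definition into {group: {property: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


environment = load_environment(DEFINITION)
print(environment["country"]["date_representation"])
print(sorted(environment))
```

Because the groups are ordinary sections, a user-defined category such as `user_defined` needs no change to the reading code, and each property can be altered without touching the others.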

[Properties of the user environment]

Hierarchy of Defaults

It is obvious that within a particular application or even installation most of the attributes of the text-data can be considered constant. Nevertheless the National Language Architecture must provide full flexibility in the various stages of execution and user interaction.

There need to be «nested» environments, with their own sets of attributes. This is to transfer the defaults from global to local scope. The layers needed are:

Installation, system
Session, batch-job
Step, split screen, window, task
Application (e.g. special sort order)
File (included files may use different coding schemes)
User (help desk may use different language from end-user)

Implementation of an NLA assumes that all data items (at least textual data) are tagged. Where no specific tags exist for an item, the global definition is assumed.

Hence with the total absence of any specific tags only defaults are active. This works also in non-NLA-installations, where no tags exist, but some new applications may use NLA functions.
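The nesting of defaults can be sketched with `collections.ChainMap` from the Python standard library, which looks a key up in the first mapping that contains it. The helper `build_environment`, the layer contents and the attribute names are hypothetical and serve only to illustrate the lookup order.

```python
from collections import ChainMap


def build_environment(*layers):
    """Combine layers given from global to local; local settings win."""
    return ChainMap(*reversed(layers))


# Layers in the order listed above, from installation down to user.
installation = {"language": "de", "country": "CH",
                "keyboard": "de-CH", "coding": "EBCDIC"}
session = {"language": "fr"}
window = {}                                  # nothing overridden here
application = {"sorting_scheme": "telephone-book"}
file_tags = {"coding": "ASCII"}
user = {"keyboard": "de-DE"}

environment = build_environment(installation, session, window,
                                application, file_tags, user)

print(environment["language"])   # taken from the session layer
print(environment["keyboard"])   # taken from the user layer
print(environment["country"])    # falls through to the installation

# With no specific tags at all, only the installation defaults are active.
untagged = build_environment(installation)
print(untagged["language"])
```

Exporting data would then mean writing out the fully resolved attributes, for example `dict(environment)`, so that the receiving system does not depend on its own defaults.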

Because the set of defaults may not be consistent among installations or systems, exported data must be tagged.

Some scenarios illustrate the needed flexibility of independent attribute settings:

The mother tongue of a person at a help desk in Switzerland is French, but having been trained in Germany he uses a German keyboard. He receives a call in Italian from an end-user. To complicate the situation further, the end-user has a Swiss-German keyboard.
Someone is working on a German text, but prefers English help texts. He needs a German dictionary and spell checker. But he uses a French keyboard for arbitrary reasons.
A telephone operator switches his environment according to customer language.

The sophisticated set-up needed for these scenarios cannot be expected to be implemented in the NLA from the beginning. However, any announcement must clearly show the direction in which the NLA will expand over time.

Adaptability

Where flexibility and adaptability are recommended (e.g. in translation processes), a layered exit approach should be developed. That is, IBM should provide a standard exit where a user program can execute prior to calling an IBM option or function.

[Adapt the standard process - variant one]

This method creates n (in the example above: 3) pieces of special code for the user.

The following method is the better approach, but calls for a certain «granularity» of functions. It needs only one piece of user code, in which all the special cases are combined.

[Adapt the standard process - variant 2]
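One possible reading of the second variant is sketched below: a single user exit is called before each granular standard function and either handles the step itself or declines, in which case the standard function runs. The step names, the functions `standard_process` and `my_exit`, and the convention that returning `None` means «not handled» are assumptions made for this illustration; they are not taken from the figures.

```python
def standard_process(text, user_exit=None):
    """Run the standard steps, offering each one to the user exit first."""
    steps = [
        ("normalize", str.strip),
        ("translate", str.upper),
        ("format", lambda s: f"[{s}]"),
    ]
    for name, standard_function in steps:
        # The exit executes prior to the standard function.
        handled = user_exit(name, text) if user_exit else None
        text = handled if handled is not None else standard_function(text)
    return text


def my_exit(step, text):
    """Single piece of user code combining all special cases."""
    if step == "format":
        return f"«{text}»"
    return None   # not handled: let the standard function run


print(standard_process("  hello "))
print(standard_process("  hello ", user_exit=my_exit))
```

All user-specific behaviour lives in `my_exit`, so adding a further special case means extending that one function rather than writing another exit program.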

Tags to identify objects and their properties (attributes) are needed at various levels:

environments (asynchronous: batch, online or conversational)
objects
applications
devices

Modular Keyboard

Many of the keyboard requirements presented in earlier sections could be achieved by a modular keyboard. Such a modular design would reduce the number of keyboard variants, because the customer could choose the components he needs from a set like the following:

A numeric cluster could be changed to a block of Program Function Keys
Cursor control could be replaced by a track-ball
Terminal function keys (for emulations) could be added on demand

A modular design could specify the following elements (see also the figures later in this section):

Basic «typewriter area» with sufficient number of keys to serve all national variants. In particular an «international version» of such a keyboard must exist.
Alternatives for numeric input:
- Numeric cluster with calculator functions
- Numeric cluster with some calculator keys
Alternatives for program function keys:
- 3 x 4 Block of Programmed Function Keys (PFK) with some keys for «host functions». A shift mechanism provides access to 12, 24 or even 48 PFK's.
- Row of Programmed Function Keys, grouped in 3 x 4 keys. A shift mechanism provides access to 12, 24 or even 48 PFK's.
- Row of Function Keys, grouped in 3 x 4 keys, for local functions like set up, colour selection, word wrap on input.
- Block of function keys arranged as on the PC's (2 x 5) which also could be used in emulations for local terminal functions (for example clear, erase eof).
- Block of function keys arranged as on some PC's (3 x 4 in a row) which also could be used in emulations for local terminal functions (for example clear, erase eof).
Alternatives for cursor control:
- Cursor control keys arranged as a cross (+) (home in the centre) plus some local editing keys (character delete, character insert etc.). For both the editing and the cursor functions a shift mechanism would be helpful, giving access to functions like word delete or sentence delete as well as top or bottom.
- Track ball (inverted mouse) with two keys to «click» the cursor and hold it for «dragging»
- A mini tablet as pointing device
A keyboard help function must graphically show the key assignments with shifts of each key on the keyboard.

Keyboard labelling in general must use internationally defined symbols (see ISO 9995-6) rather than cryptic abbreviations like STRNG for the German word Steuerung (English equivalent: control).

Basic Keyboard building Blocks

Main areas of a keyboard

Alternatives of Program Function Keys

[Arrangements for function keys]

Alternatives for Cursor Control

[Arrangement for cursor and other keys; location of a track ball]

It is by no means sensible to preserve the apparent problems of the original PC keyboard. The dual use of the right-hand key-cluster for numeric input and cursor control should be abandoned in favour of separate keys for these functions.

Typewriter Area of Keyboard

The «typewriter area» of keyboards must become more ergonomic. This is especially relevant to NLA issues, because national keyboard layouts tend to enlarge the number of keys. This will happen in particular, if more characters are to be supported on one keyboard. Despite the fact that all keyboard layouts based on the «current» ones are not ergonomic in the pure sense, the following rules must be observed:

Normal sized hands of typists must be assumed. «Power-typers» do not have the hands of butchers. Hence only seldom used rows (row E - top row) should be extended with arbitrary keys.
International standards specify a vertical and horizontal distance between keys of 19mm +/- 1mm. That is, the distance may vary between 18 and 20mm! This tolerance is too large.
Shift-keys and carriage-return keys must be larger than symbol-keys. They also must be accessible from the home-row (row C).
Shift-lock belongs to the class of «alternate» keys and should not be as easily accessible as «caps-lock»
Since many traditional habits must be covered by a keyboard layout, a rich design («superfluous» keys for some countries) is desirable [Apple].
International layouts need 48 keys, so this number should be used also in the national variants of keyboards. There the unused keys may be «unused», but accessible by the keyboard-drivers in general.
ISO 9995 should be used for labelling function keys.

An installation should be able to use one keyboard that will satisfy a secretary's requirement for a national keyboard and a programmer's requirement for a US-English keyboard.

URL:	Created: 1996-12-28	Updated:
© Docu+Design Daube, Zürich