NLA - Implementation considerations
Tagging Data
Data Type Key
Sorting Text
Properties of an Environment
Hierarchy of Defaults
Adaptability
Modular Keyboard
Typewriter Area of Keyboard
This section contains examples of suggested implementations for various requirements.
This paper calls for the rich functionality of a full implementation of the required NLA. Neither software developers nor end-users can wait until this final implementation is available. Hence there is a need for a staged approach. As an example, the following steps for implementing multilingual support can be envisioned:
- all national languages enabled
- different national languages on different systems in the network
- different national languages on different work stations in the network
- different national languages in different windows of same work station
- more than one national language within one window of work station.
Similar remarks apply to the coding issue. Also for this area of the NLA a stepwise implementation must be considered:
- Concentration on one single-byte code page in the EBCDIC and the ASCII area.
- 2-byte coding scheme incorporating the already existing DBCS (Double Byte Character Sets).
- Ultimate multi-byte coding scheme.
Tagging Data
Textual data need a multitude of attributes (tags):
- coding scheme
- code page
- character set
- language
- country
- sorting scheme
Most of these may be specified globally (for an entire installation), but all of them except the sorting scheme must also be applicable to specific data items. This is especially relevant in data base applications. It may be necessary, for example, to change the code page for personal names. Multilingual operation must always be assumed:
- A data base may be tagged «Swiss German»
- The system may be tagged «French»
- An element (e.g. name) may be tagged «Turkish»
As stated also elsewhere in this paper an implementation of the NLA should not introduce new coding schemes. This is also relevant for the tags of data. Existing (e.g. ISO) language- and country codes must be used where possible.
Joining data base tables, especially for distributed, multilingual data bases requires that all elements be tagged, at least temporarily. It is desirable that
- The whole database has (defaulting) tags
- tables may have tags
- columns may have tags
- elements may have some exceptional tags
Hence it is necessary to indicate at each level whether child tags exist.
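The tag hierarchy described above can be illustrated with a minimal sketch. The class `Node`, its methods and the tag values are hypothetical names chosen for this illustration and are not part of any NLA definition. Each level inherits tags from its parent unless it carries an exceptional tag of its own, and a flag records whether child tags exist.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level of the hierarchy: data base, table, column or element (hypothetical)."""
    name: str
    tags: dict = field(default_factory=dict)
    parent: Optional["Node"] = None
    has_child_tags: bool = False  # indicator: some descendant carries its own tags

    def child(self, name, **tags):
        """Create a child level; propagate the 'child tags exist' indicator upwards."""
        node = Node(name, dict(tags), parent=self)
        if tags:
            ancestor = self
            while ancestor is not None:
                ancestor.has_child_tags = True
                ancestor = ancestor.parent
        return node

    def resolve(self, key):
        """Return the nearest tag value, walking from this level up to the data base."""
        node = self
        while node is not None:
            if key in node.tags:
                return node.tags[key]
            node = node.parent
        return None


# Data base tagged «Swiss German» (ISO language and country codes), one element tagged «Turkish».
db = Node("customers", {"language": "de", "country": "CH"})
table = db.child("persons")
column = table.child("name")
element = column.child("row 17", language="tr")

print(element.resolve("language"))  # exceptional tag on the element itself
print(element.resolve("country"))   # inherited from the data base default
print(db.has_child_tags)            # children with own tags exist
```

The indicator allows a join to skip the per-element lookup entirely for tables in which no child tags exist.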
There is also a need to identify undisplayable information like
- numeric data (integer, real, packed decimal)
- program code
- noncoded information (pictures)
Such data items must be prevented from any translation.
Data Type Key
For some text operating functions (like search, sort) the detailed information about the data (such as accents, case, special symbols) is needed in different stages of the process. All this information is there in the fully formed characters like an upper case A with grave accent (À) and can be associated with the code of that character. However, for processes like sort it is more convenient to have the information separated into «keys».
Let us call this the «processable» form of the data, whereas the fully formed characters represent the «external» form of the file. For some applications, like data base applications, it may be desirable to have the data already in processable form. For other applications it may be more economic to «convert on the fly». Whether the translation between these forms is applied to data files directly (as in this description) or exists only «on the fly» during further processing of the text is left to the implementor.
Many functions like sort or search in an NLA can only be performed correctly, if text is stored in its «richest possible form». [LaBonté-3] proposes sort keys for text data which define information about:
- lower case form of text
- diacritics (position, nature)
- case of each character
- special symbols (position, nature)
These keys separate the information spread across the characters of the original string. The created strings (keys) contain only one type of information. These keys are independent of the code page used to encode text data.
This method easily compares to numeric data types like Floating Point Real, which consists of:
- mantissa
- exponent
- sign
Assume for example the French word Côté. Then the elements of the proposed data type become:
Element | Description | Example |
---|---|---|
PK: primary key | base characters | cote |
SK: secondary key | accents | «17»«16»«19»«16» |
TK: tertiary key | case | «09»«08»«08»«08» |
QK: quaternary key | non alphabetic symbols | «00» |
The coding of these data elements is demonstrated here with numeric values in angle brackets. The values are chosen not to conflict among themselves.
The secondary key (SK) may use a different sequence with respect to the language (left-to-right or right-to-left correspondence to the base characters in PK). In the French example above the accent values are listed from right to left (é, t, ô, c). For this key, numeric values below those for characters are assigned to the accents. The values depend on the language. For French appropriate values are:
Value | Accent |
---|---|
«16» | unaccented |
«17» | acute accent |
«18» | grave accent |
«19» | circumflex accent |
«20» | diaeresis |
«21» | cedilla |
Values for the tertiary order key (TK) are:

Value | Case |
---|---|
«08» | lower case |
«09» | upper case |
The quaternary key (QK) starts with a delimiter «00», to be distinguished from the values before, followed by a sequence of «position»«value» pairs. This can best be demonstrated with another example, the English word vice-president:
Element | Example |
---|---|
PK | vicepresident |
SK | 13 * «16» |
TK | 13 * «08» |
QK | «00»«04»- |
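The decomposition into four keys can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of [LaBonté-3]: the function name `make_keys` and its parameter `reverse_accents` are hypothetical, Unicode decomposition stands in for a code-page-specific table, the value «19» is assumed to denote the circumflex (as the Côté example implies), and the «00» delimiter is left out because the keys are returned as separate values.

```python
import unicodedata

# Accent values for French as listed above; «19» = circumflex is an assumption.
ACCENT_VALUES = {
    "\u0301": 17,  # combining acute accent
    "\u0300": 18,  # combining grave accent
    "\u0302": 19,  # combining circumflex accent (assumed)
    "\u0308": 20,  # combining diaeresis
    "\u0327": 21,  # combining cedilla
}
UNACCENTED = 16
LOWER, UPPER = 8, 9


def make_keys(text, reverse_accents=False):
    """Split text into (PK, SK, TK, QK) as described in this section."""
    pk, sk, tk, qk = [], [], [], []
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if base.isalpha():
            pk.append(base.lower())
            sk.append(next((ACCENT_VALUES[m] for m in marks if m in ACCENT_VALUES),
                           UNACCENTED))
            tk.append(UPPER if base.isupper() else LOWER)
        else:
            # Position = number of base characters preceding the symbol.
            qk.append((len(pk), ch))
    if reverse_accents:
        sk.reverse()  # right-to-left correspondence, as in the French example
    return "".join(pk), tuple(sk), tuple(tk), tuple(qk)


print(make_keys("Côté", reverse_accents=True))
print(make_keys("vice-president"))
```

Because the keys are derived from character properties rather than from code points, the same keys result whatever code page the external form was stored in.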
Sorting Text
Current sort methods both in the ASCII and EBCDIC environment do not create sequences as expected by the user. They use code points (binary values) of the characters. However, many applications are based on this behaviour. So when implementing sort methods according to NLA rules the default processes must retain the old (wrong) results.
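The requirement that the default must retain the old results can be illustrated as follows. This is a hedged sketch: the function names `sort_text` and `nla_key` and the parameter `scheme` are hypothetical, and the key is a simplified three-level key (base characters, accents, case) built with Unicode decomposition, without the right-to-left accent rule some languages need.

```python
import unicodedata


def nla_key(text):
    """Simplified multi-level key: base characters, then accents, then case."""
    chars = unicodedata.normalize("NFC", text)
    decomposed = [unicodedata.normalize("NFD", c) for c in chars]
    primary = "".join(d[0].lower() for d in decomposed)
    secondary = tuple(d[1:] for d in decomposed)          # "" means unaccented
    tertiary = tuple(d[0].isupper() for d in decomposed)  # lower case sorts first
    return primary, secondary, tertiary


def sort_text(items, scheme="codepoint"):
    """Sort strings; the default keeps the traditional code point order."""
    if scheme == "codepoint":
        return sorted(items)
    if scheme == "nla":
        return sorted(items, key=nla_key)
    raise ValueError(f"unknown sorting scheme: {scheme}")


words = ["côte", "Zulu", "cow", "abc"]
print(sort_text(words))                # old behaviour, unchanged for existing applications
print(sort_text(words, scheme="nla"))  # sequence as expected by the user
```

Existing applications that call the sort without naming a scheme see no change; only applications that ask for the new scheme get the new sequence.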
Properties of an Environment
The following properties must be independent from each other. However they may be grouped (for example by language and country):
- Language
  - character set
  - character attributes
  - collating sequence (sort)
- Country, culture
  - numeric punctuation
  - monetary punctuation
  - date representation
  - time representation
- Other
  - coding
  - messages
The list of properties forming an environment must be extensible, for example for user defined categories.
These properties must be kept in a form suitable for modification with any common text editor, although this information is intended to be modified only by system administrators or the like. The model given by the function setlocale() and its input definitions [POSIX-2, IBM SC09-1264] demonstrates the needed functionality.
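As a small illustration of independently settable properties, the following sketch uses Python's standard `locale` module, which wraps setlocale(). The mapping `CATEGORIES` and the two helper functions are hypothetical names for this example, and the mapping of the properties listed above onto locale categories is an assumption. Only the portable "C" locale is used, because it is the only one guaranteed to be installed.

```python
import locale

# Assumed mapping of the properties listed above onto setlocale() categories.
CATEGORIES = {
    "character attributes": locale.LC_CTYPE,
    "collating sequence": locale.LC_COLLATE,
    "numeric punctuation": locale.LC_NUMERIC,
    "monetary punctuation": locale.LC_MONETARY,
    "date and time representation": locale.LC_TIME,
}


def set_environment(settings):
    """Set each named property on its own; return the locale names now in effect."""
    return {prop: locale.setlocale(CATEGORIES[prop], name)
            for prop, name in settings.items()}


def current_environment():
    """Query every property without changing it."""
    return {prop: locale.setlocale(cat) for prop, cat in CATEGORIES.items()}


# Change two properties; the others keep whatever value they had before.
active = set_environment({"numeric punctuation": "C", "collating sequence": "C"})
print(active)
print(locale.localeconv()["decimal_point"])
print(current_environment())
```

On a system with further locales installed, a different name could be passed for each property, which is the grouping by language and country mentioned above.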
Hierarchy of Defaults
It is obvious that within a particular application or even installation most of the attributes of the text-data can be considered constant. Nevertheless the National Language Architecture must provide full flexibility in the various stages of execution and user interaction.
There need to be «nested» environments, with their own sets of attributes. This is to transfer the defaults from global to local scope. The layers needed are:
- Installation, system
- Session, batch-job
- Step, split screen, window, task
- Application (e.g. special sort order)
- File (included files may use different coding schemes)
- User (help desk may use different language from end-user)
Implementation of an NLA assumes that all data items (at least textual data) are tagged. Where no specific tags exist for an item, the global definition is assumed.
Hence with the total absence of any specific tags only defaults are active. This works also in non-NLA-installations, where no tags exist, but some new applications may use NLA functions.
Because the set of defaults may not be consistent among installations or systems, exported data must be tagged.
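The nesting of environments can be sketched with a chain of mappings in which the most local layer is consulted first. The layer contents below are invented for illustration only; the names and values are hypothetical and do not come from any existing system.

```python
from collections import ChainMap

# One mapping per layer named above; all values are hypothetical examples.
installation = {"language": "de", "country": "CH", "coding": "EBCDIC", "keyboard": "Swiss German"}
session = {"language": "fr"}              # session or batch job
window = {}                               # step, split screen, window, task: nothing special
application = {"sort": "telephone book"}  # e.g. special sort order
file_tags = {"coding": "ASCII"}           # an included file in another coding scheme
user = {"keyboard": "German"}             # personal preference

# Most local layer first: a lookup falls through to the global defaults.
env = ChainMap(user, file_tags, application, window, session, installation)

print(env["language"])  # from the session layer
print(env["country"])   # from the installation default
print(env["coding"])    # from the file layer

# Exported data must carry explicit tags, so the effective values are flattened.
export_tags = dict(env)
print(export_tags)
```

With all local layers empty, only the installation defaults are active, which corresponds to the non-NLA installation described above.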
Some scenarios illustrate the needed flexibility of independent attribute settings:
- The mother tongue of a person at a help desk in Switzerland is French, but as a person trained in Germany he uses a German keyboard. He receives a call from an end-user in Italian. The end-user has a Swiss-German Keyboard to complicate the situation further.
- Someone is working on a German text, but prefers English help texts. He needs a German dictionary and spell checker. But he uses a French keyboard for arbitrary reasons.
- A telephone operator switches his environment according to customer language.
The sophisticated set-up needed for these scenarios cannot be expected to be implemented in the NLA from the beginning. However, any announcement must clearly show the direction in which the NLA will expand over time.
Adaptability
Where flexibility and adaptability are recommended (e.g. in translation processes), a layered exit approach should be developed. That is, IBM should provide a standard exit where a user program can execute prior to calling an IBM option or function.
This method creates n (in the example above: 3) pieces of special code for the user.
The following method is the better approach, but calls for a certain «granularity» of functions. It needs only one piece of user code, wherein all the specials are combined.
Tags to identify objects and their properties (attributes) are needed at various levels:
- environments (asynchronous: batch, online or conversational)
- objects
- applications
- devices
Modular Keyboard
Many of the keyboard requirements presented in earlier sections could be achieved by a modular keyboard. Such a modular design would reduce the number of keyboard variants, because the customer could choose the components he needs from a set like the following:
- A numeric cluster could be changed to a block of Program Function Keys
- Cursor control could be replaced by a track-ball
- Terminal function keys (for emulations) could be added on demand
A modular design could specify the following elements (see also the figures later in this section):
- Basic «typewriter area» with sufficient number of keys to serve all national variants. In particular an «international version» of such a keyboard must exist.
- Alternatives for numeric input:
  - Numeric cluster with calculator functions
  - Numeric cluster with some calculator keys
- Alternatives for program function keys:
  - 3 x 4 block of Programmed Function Keys (PFK) with some keys for «host functions». A shift mechanism provides access to 12, 24 or even 48 PFKs.
  - Row of Programmed Function Keys, grouped in 3 x 4 keys. A shift mechanism provides access to 12, 24 or even 48 PFKs.
  - Row of function keys, grouped in 3 x 4 keys, for local functions like set up, colour selection, word wrap on input.
  - Block of function keys arranged as on the PCs (2 x 5), which could also be used in emulations for local terminal functions (for example clear, erase eof).
  - Block of function keys arranged as on some PCs (3 x 4 in a row), which could also be used in emulations for local terminal functions (for example clear, erase eof).
- Alternatives for cursor control:
  - Cursor control keys arranged as a cross (+) (home in the centre) plus some local editing keys (character delete, character insert etc.). Both for these functions and for the cursor functions a shift mechanism would be helpful, giving access to functions like word delete or sentence delete as well as top or bottom.
  - Track ball (inverted mouse) with two keys to «click» the cursor and hold it for «dragging»
  - A mini tablet as pointing device
- A keyboard help function must graphically show the key assignments with shifts of each key on the keyboard.
Keyboard labelling in general must use internationally defined symbols (see ISO 9995-6) rather than cryptic abbreviations like STRNG for the German word Steuerung (English equivalent: control).
Figure: Basic keyboard building blocks
Figure: Alternatives of program function keys
Figure: Alternatives for cursor control
It is by no means sensible to preserve the apparent problems of the original PC keyboard. The dual use of the right-hand key cluster for numeric input and cursor control should be abandoned in favor of separate keys for these functions.
Typewriter Area of Keyboard
The «typewriter area» of keyboards must become more ergonomic. This is especially relevant to NLA issues, because national keyboard layouts tend to enlarge the number of keys. This will happen in particular, if more characters are to be supported on one keyboard. Despite the fact that all keyboard layouts based on the «current» ones are not ergonomic in the pure sense, the following rules must be observed:
- Normal sized hands of typists must be assumed. «Power-typers» do not have the hands of butchers. Hence only seldom used rows (row E - top row) should be extended with arbitrary keys.
- International standards specify a vertical and horizontal distance between keys to be 19mm +/- 1mm. That is the distance may vary between 18 and 20mm! This tolerance is too large.
- Shift-keys and carriage-return keys must be larger than symbol-keys. They also must be accessible from the home-row (row C).
- Shift-lock belongs to the class of «alternate» keys and should not be as easily accessible as «caps-lock».
- Since many traditional habits must be covered by a keyboard layout, a rich design («superfluous» keys for some countries) is desirable [Apple].
- International layouts need 48 keys, so this number should be used also in the national variants of keyboards. There the unused keys may be «unused», but accessible by the keyboard-drivers in general.
- ISO 9995 should be used for labelling function keys.
An installation should be able to use one keyboard that satisfies both a secretary's requirement for a national keyboard and a programmer's requirement for a US-English keyboard.