NLA - Background
The SEAS White Paper [Gardner-1] stated a number of problems with IBM's software and hardware products when they were used in countries outside the US. These were grouped into three categories:
National character problems
National language problems
Keyboard related problems
The value of this 1985 paper is that it recognized the problems not as isolated national problems but as a set of international problems. Because these problems still exist, they are summarized here to aid understanding of the proposed architecture.
National Character Problems
Many of the requirements for character data encoding are also described in a White Paper published by SHARE Inc. in North America [Hart]. SHARE Europe has endorsed this SHARE Inc. White Paper, which focuses on specialized NLA requirements.
- Dual 3270 character sets:
- Several countries have had their 3270 national character sets split into two almost identical versions. The ultimate in confusion arises when «alternate» and «new» networks are tied together in an SNA/MSNF configuration.
- Character sets are incomplete:
- Multinational companies and agencies find themselves unable to enter and display correspondence which is written in different languages.
- Diacritical marks in Latin alphabet based languages:
- IBM products are available only with the national-use characters of selected European countries. It is not possible to write the names of all employees correctly. There is no support for Slavic languages, or even for Turkish (the language of a NATO member).
- Inconsistent code points across DP/WP/PC:
- It is not possible to transfer documents from a word processing environment (for example the IBM 5520 Display Writer) to a data-processing environment without losing information.
- Code points for same graphic differ internationally:
- If letters are sent from one country to another, for example from Germany to Denmark, they may be written in a language not spoken in either of these countries (for example, French). The text of the letter poses no problem. However, signing the document requires the national characters of the sending country, and addressing it requires those of the receiving country.
- Graphic values for same code point differ internationally:
- This is the reverse situation to the one mentioned before. A letter sent from France to Germany by electronic mail or tape cannot be correctly read, because the French national characters are displayed as German national characters.
- Hardware limitations:
- The requirement to be able to use a specific national character set may impose hardware limitations on the customer which seriously reduce the value of a piece of equipment.
- Neighbouring countries:
- Any country having a neighbouring country with a different written language (and this means most countries) has a requirement to be able to process some or all of the national-use characters of its neighbours. At present there is either no support, or very rudimentary support.
- Entry and editing of «scientific» text:
- Greek characters and mathematical symbols are not part of the normal IBM character set. Today it is impossible to type and edit text containing Greek characters and mathematical symbols and have these characters visible on the terminal screen.
- Entering and editing text using a foreign alphabet, or special diacritical marks used in transliterations:
- End users demand the flexibility to mix foreign alphabets (such as Cyrillic) and special diacritical marks with Latin text. Many hardcopy devices for document processing meet these demands. However, it is not possible to enter and edit such texts and have the characters visible on the terminal used.
- User defined code points:
- No space remains in the code pages for user assigned graphics. Some installations require graphics like «diameter» or «horizontal».
- System use of code points:
- Besides the controls with codes X'00' to X'3F', some products (the FORTRAN compilers, for example) require additional code points for internal use.
- ASCII to EBCDIC Translation:
- For reading and writing (see ...) ASCII tapes on an MVS system, only one translate table is in the system, and it is based on the «old» (see ...) EBCDIC assignment or «character set 103 of code page 256». An installation has no choice of installing alternate tables to support various EBCDIC codes.
- Error graphic:
- There is a strong need for an «error graphic». Currently, undefined code points (those with no graphic assigned) are presented on most devices as a blank or a minus «-». This is completely misleading.
- Currency Symbols:
- The code points for $, ¢ and £ lead an interesting life outside the U.S.A. Their code points vary from country to country, causing great havoc for the unwary.
- Text and code:
- Text is a flow of characters. Their meaning is highly dependent on the context. The graphics used have different meanings in different environments. Text containing national-use characters becomes meaningless if it was entered using different devices. To edit the White Paper we had to set up conventions on «national characters». A tape written in Denmark and read into a system in Switzerland cannot be interpreted without an accompanying manual description!
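The two mirror-image items above, «Code points for same graphic differ internationally» and «Graphic values for same code point differ internationally», can be illustrated with a minimal Python sketch. It is not part of the White Paper. The codec names cp273 (a German EBCDIC code page) and cp037 (a US EBCDIC code page) are simply the EBCDIC codecs bundled with Python and are assumed here as stand-ins for two national character sets; the function names are hypothetical.

```python
# Sketch only: cp273 and cp037 are assumed stand-ins for two national
# EBCDIC character sets; transfer() and differing_code_points() are
# hypothetical helper names, not taken from the White Paper.
SENDER_CODEC = "cp273"    # assumed: sending installation (German EBCDIC)
RECEIVER_CODEC = "cp037"  # assumed: receiving installation (US EBCDIC)


def transfer(text, sender=SENDER_CODEC, receiver=RECEIVER_CODEC):
    """Encode text at the sender, then decode the same bytes at the receiver.

    No translation happens in between, which is the situation of a tape or
    an electronic letter exchanged without an agreed code page.
    """
    raw = text.encode(sender)
    return raw.decode(receiver, errors="replace")


def differing_code_points(codec_a, codec_b):
    """List (code point, graphic in a, graphic in b) where the graphics differ."""
    diffs = []
    for value in range(0x40, 0x100):  # skip the control codes X'00' to X'3F'
        byte = bytes([value])
        graphic_a = byte.decode(codec_a, errors="replace")
        graphic_b = byte.decode(codec_b, errors="replace")
        if graphic_a != graphic_b:
            diffs.append((value, graphic_a, graphic_b))
    return diffs


if __name__ == "__main__":
    # Plain Latin letters survive; the national character does not.
    print(transfer("Hello"))
    print(transfer("Müller"))
    for value, a, b in differing_code_points(SENDER_CODEC, RECEIVER_CODEC):
        print(f"X'{value:02X}': {a!r} versus {b!r}")
```

Running the sketch shows that the unaccented Latin letters arrive intact while the national-use characters are displayed as whatever graphic the receiving code page assigns to the same code point. The `errors="replace"` argument substitutes a visible replacement character for any undefined code point, which is close in spirit to the «error graphic» requested above.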
National Language Problems
Once entered into the system, the various national use characters combine with the Latin alphabet to form text in a national language. This section describes the problems encountered in processing national language text. Also included are problems encountered in having software products «converse» in the desired national language. These problems are of special concern for products that reach the end-user directly - for example «information centre» products.
- Messages:
- Messages are not generally available in the national language of the country in which a given software product is being used. In most cases there is no easy, well-documented method for the customer to translate these messages. In some cases the text length available for a possible translation is too small.
- Reserved words:
- It is not generally possible for the customer to specify alternative reserved words or keywords in his own national language. This is of special concern in products that reach the end-user directly, for example Information Centre type products.
- Fixed text:
- It is not generally possible for the customer to specify fixed text (for instance the names of the months or colours) in his own national language - and it is mandatory that all user-directed output from an execution be entirely in the desired national language.
- Menus and Prompts:
- It is not always possible for the customer to specify the fixed text fields of a menu or a prompt in his own national language. Even when this can be done, there may be problems with fields which require fixed input in English.
- Dates:
- It is usually impossible to specify the form of the date which is to appear automatically on user-directed output. It is mandatory that dates supplied automatically by a product be in a nationally or locally understood format.
- Thousands and decimals:
- It is generally not possible to specify the delimiting character for editing thousands and decimals when the output is produced under the control of a standard product, or when input is interpreted under the control of a standard product.
- User-specified names:
- National characters are not always eligible for use in user-specified names, e.g. for file names.
- Lengths of names:
- The 8-character limit on the lengths of some variable names is often too short to allow reasonable mnemonic use in some countries.
- Delimiters:
- The graphics used as syntax elements or delimiters in application input do not use the same code point in most countries. Very often they are taken from the set of «national use code points» (with US graphics $, #, @, [, ], { and }).
- Case conversion:
- Some products may still employ «folding» to convert lower case entered at terminals to upper case for further processing. This has obvious repercussions for national use characters which have code points between X'3F' and X'80' and are therefore unchanged by folding (OR-ing with X'40').
- Translating:
- Incorrect results are often obtained via the use of translate tables incorporated into products. They are invariably set up for US standard usage and do not reflect national use characters. In most products they are difficult to find and may not be documented.
- Double characters:
- There are several problems associated with the existence of «double characters» in written languages (for example, Å is equivalent to AA in some countries, and ß is uppercased to SS). IBM has neither a hardware nor a software solution for any of these problems.
- Sorting single case text:
- The order of items sorted alphabetically on one or more character fields is incorrect.
- Sorting mixed case text:
- It is impossible to sort mixed case text correctly according to national rules unless some type of «ALTSEQ» mechanism is used. The code points for the lower and upper case Latin alphabet in EBCDIC, ASCII, and PC-ASCII cause the difficulty. The situation is further complicated by the presence of national characters.
- Multiple languages:
- «Language» and «Country» issues are intermixed in most applications. This creates great problems in multilingual countries like Switzerland, Belgium and Canada. It must be possible to operate software in the language of one's choice.
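The «Case conversion» and «Sorting mixed case text» items above lend themselves to a short illustration. The following Python sketch is not from the White Paper. It assumes Python's bundled cp273 (German EBCDIC) and cp037 (US EBCDIC) codecs as example code pages, and all function names are hypothetical.

```python
# Sketch only: cp273 and cp037 are assumed example code pages; fold(),
# fold_text() and sort_by_code_points() are hypothetical helper names.


def fold(raw):
    """Fold to upper case by OR-ing every byte with X'40', as the text describes."""
    return bytes(b | 0x40 for b in raw)


def fold_text(text, codec="cp273"):
    """Encode text in an EBCDIC code page, fold the bytes, and decode again."""
    return fold(text.encode(codec)).decode(codec, errors="replace")


def sort_by_code_points(words, codec):
    """Sort words by their raw encoded bytes, with no collation table."""
    return sorted(words, key=lambda word: word.encode(codec))


if __name__ == "__main__":
    # Unaccented letters fold correctly; the national character is left alone.
    print(fold_text("abc"))
    print(fold_text("köln"))

    # Neither raw EBCDIC order nor raw ASCII order is dictionary order.
    words = ["apple", "Banana", "cherry"]
    print(sort_by_code_points(words, "cp037"))
    print(sort_by_code_points(words, "ascii"))
```

Folding works for a to z because their EBCDIC code points lack the X'40' bit that the corresponding capitals have. A national lower-case letter whose code point already has that bit set passes through unchanged, so the result is not the expected upper-case form. In the sorting half, EBCDIC places lower case before upper case and ASCII does the opposite, so neither raw order matches an alphabetical listing; this is why the text calls for some kind of «ALTSEQ» mechanism.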
Keyboard Related Problems
- Inconsistent character sets across products:
- Special characters are not placed at the same location on different keyboards in the same country. For example in France the number row is sometimes upper case, sometimes lower case.
- National and international keyboards:
- We know of no keyboard on which the whole set of characters needed for current national or international work can be entered easily.
- Lack of grouping of functions:
- Existing keyboards show a variety of layouts for the special functions (such as cursor control and program function keys). In many cases one person is responsible for several jobs and thus has to become familiar with all of these layouts. Using so many different keyboards is error-prone and decreases productivity.