NLA - Annotated Bibliography
Not all of the literature mentioned hereafter is explicitly cited in the text of this paper. However, all of these publications contributed to the discussion of the National Language Architecture.
- Abderhalden Ulrich, National Language Support
Cookbooks. IBM Switzerland - Field Support centre,
Hints and tips for conversion of applications and data to the international code page 500. 327x terminal equipment, GDDM, ISPF/PDF, compilers, QMF, SDSF, DFSORT, TSO/E. List of machine readable material.
- 'The Script Manager'. Various sources including: Inside
Macintosh, Volume V; Apple Developers CD, HyperCard Stack
'The International Utilities Package'. The Apple
Programmer's and Developer's Association, Renton, WA,
USA, 1987 - 1989.
The Script Manager is a set of general text manipulation routines that let Macintosh applications function correctly with Roman and non-Roman writing systems. Correct handling of dates and number punctuation is also included. This set of tools eases the process of adapting an application to different languages (localization). The Script Interface System provides fonts for the target language, keyboard mapping tables, and special routines to perform character input, conversion, sorting and text manipulation. Transliteration between character sets is also supported.
- d'Arielli, L., A message management system for
personal computers. IBM Systems Technical Journal,
vol 28, no 3, 1989.
A method using message skeletons is described.
- Bouch Debb, 'Adding National Language Support to
Applications'. Proceedings of SEAS Anniversary Meeting
1987, vol II, p 995-1015. Geneva, SHARE European
Summary of issues involved in NLS explained by the work done on a text processor running under VM/SP5.
- Canadian Standards Association, Canadian Alphanumeric
Ordering Standard for Character Sets of CSA Z243.4
Standard. Edited by Alain LaBonté. Canadian
Standards Association 1990.
This standard specifies and explains in detail the method of the four sorting keys (see also [LaBonté-3]).
- Daube, Klaus, Implementation of Swiss Character Set.
OBRZ AG, DTA-report 31. Zürich, 1989.
Proposal for change of hardware, software and data from DP94 coding to code page 500.
- Daube, Klaus, 'Text and Code - A Dragons Pond.'
Proceedings of the 30. G.U.I.D.E. conference in Basel. G.U.I.D.E.
Comparing the human habits of gesture and miming with codes in data processing shows a severe gap in understanding coding mechanisms. This is the source of many problems related to national language support in data processing applications.
This text is also available in French, kindly translated by Ministère des Communications du Québec, St. Foy, Canada,
- Daube, Klaus, National Language Support in SUSI.
OBRZ AG, Handbuch 410 (Dokumentverarbeitung mit SUSI),
Kapitel 410.20.35. Zürich 1990.
The text formatter SUSI supports code page switching (also within a file). A piece of text may also have the attribute language, to which hyphenation is bound. National keyboards can be used. The code page used internally is based on code page 037; the character set supports most Western and Eastern European languages.
- Daube, Klaus, Aufbau des OBRZ Runtime Systems.
OBRZ AG, Handbuch 400 (Technisch-wissenschaftliches
Rechnen am MVS System), Kapitel 400.50.10. Zürich 1989.
The OBRZ runtime system is layered above the runtime system of the programming language FORTRAN. Among other functions it supports editing and output of messages. Message skeletons are kept in files (PDS members) and hold various information (error messages, trace messages, full description of routine). For output, a message is customized with actual data and its presentation is adapted to the available line size and the nesting level of the routine.
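The skeleton mechanism described in this annotation can be illustrated with a minimal Python sketch. It is not the OBRZ implementation: the skeleton table, the message identifier E001, the placeholder names and the two-space indent per nesting level are all assumptions made for illustration.

```python
import textwrap

# Hypothetical skeleton store. The system described above keeps its
# skeletons in PDS members; a dictionary stands in for that here.
SKELETONS = {
    "E001": "Routine {routine}: argument {name} has invalid value {value}.",
}

def format_message(msg_id, linesize=72, nesting=0, **data):
    """Fill a skeleton with actual data, then fit it to the line size.

    Each nesting level indents the message by two blanks (an assumption).
    """
    text = SKELETONS[msg_id].format(**data)
    indent = "  " * nesting
    return textwrap.fill(
        text,
        width=linesize,
        initial_indent=indent,
        subsequent_indent=indent,
    )

message = format_message(
    "E001", linesize=40, nesting=1, routine="SOLVE", name="N", value=3
)
lines = message.splitlines()
print(message)
```

Because the skeleton text is separated from the code that fills it, a translated skeleton could replace the English one without changes to the calling routine.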
- Daube, Klaus, 'VS Pascal - Promises and Pitfalls'. Proceedings
of SEAS Anniversary Meeting 1989, vol II, p 1511 - 1518.
Geneva, SHARE Europe (SEAS), 1989.
Section 'Generalized Requirements for Compilers and Runtime Systems' details the need for National Language Support in the programming environment, as well as Error Handling and Reporting.
- Gardner Peter (ed.), SEAS National Character Task
Force: White Paper on national character, language and
keyboard problems. Geneva, SHARE European
Association, September 1985.
Collection of the problems encountered in Europe with IBM hardware and software for processing textual data.
- Gardner Peter (ed.), 'A Character Set Conversion in
Practice'. Proceedings of SEAS Anniversary Meeting
1988, vol II, p 1181-1192. Geneva, SHARE European
Case study about the character set conversion at Kommunedata, a service bureau for the Danish hospital authorities. The data accumulated were a mixture of the «old» and «new» IBM 3270 character set. Both the data and the customer's terminals had to be «converted» to the new character set.
- Hart, Edwin (ed.), ASCII and EBCDIC Character Set and
Code Issues in Systems Application Architecture,
Chicago, SHARE Inc., SSD #366, 1989.
This paper states that IBM cannot create a consistent SAA on top of the existing character set and code inconsistencies. Although the paper describes the North American symptoms of the problem, the requirements also address the international problems.
- Bale, E. Jonathan and Kellogg E. Harry, Native
Language Support for Computer Systems. Hewlett
Packard Journal. June (1985), p 27-32.
NLS for the HP 3000 and other HP computers: Roman-8 character set; application message catalog; collating sequences; multilingual applications.
- HP 3000 Computer Systems - Native Language Support Reference Manual. Form number 32-414-90001.
- Device I/O and User Interfacing - HP-UX Concepts and Tutorial. Form number 97-089-90052.
- IBM GG24-3516
- National Language Technical Centre: Keys to Sort and
Search for Cultural Expected Results. First edition
February 1990. Form number GG24-3516.
This document is intended for anyone who needs or wants to understand how to perform sorts and searches on sorted data in a manner that matches the expectations of an ordinary end-user.
- IBM SC09-1264
- IBM C/370 User's Guide, Release 2. Form number
SC09-1264-03. IBM Canada Ltd. Department 849, Toronto,
Chapter 40 in this manual describes usage of the
setlocale library routine, which does not cope with all requirements of the desired NLA. This routine provides less function than specified by [UniForum-2].
Appendix E specifies Code Point Mappings which do not conform to any CECP due to duplicate definition of braces.
- IBM SC26-4351
- Common User Access: Panel Design and User Interaction.
Form number SC26-4351.
Besides the rules to describe the IBM user interface this book contains assignments of function keys to functions, keyboard layouts, assignments of emulation keys and an appendix about Supporting National Languages. An additional booklet with the same form number holds Translated Terms for some 15 languages.
- IBM SE09-8001
- National Language Information and Design Guide. Vol 1:
Designing enabled products, rules and guidelines.
Form number SE09-8001. IBM National Language Technical
Centre, Toronto, 1987.
Set of rules which must be followed in product development and guidelines to get maximum value from the products.
- IBM SE09-8002
- National Language Information and Design Guide. Vol 2:
National Language Support Reference Manual. Form
number SE09-8002. IBM National Language Technical Centre,
Toronto, January 10, 1990.
This manual is directed at designers, planners, and vendors of computer products intended for international markets. The term National Language Support is concerned with more than languages and translation. It also demands attention to sorting sequences, to date and time, to currency, and to many other national and cultural factors. A broad sample of such data is presented here.
- ISO 646-1983
- ISO 7-bit coded character set for information interchange.
International Reference Version of 'ASCII'.
- ISO 6937/1
- ISO 6937/1 Information Processing - Coded character
sets for text communication - Part 1: General
Introduction. First Edition 1983-11-01. ISO, Geneva,
Nomenclature, Method of identification of graphics.
- ISO 6937/2
- ISO 6937/2 Information Processing - Coded character
sets for text communication - Part 2: Latin alphabetic
and non-alphabetic graphic characters. First Edition
1983-12-15. ISO, Geneva, 1983.
Character repertoire and coded representations, subrepertoires. Use of non-spacing diacritical marks, use of Latin alphabetic characters. This character set is adopted by CCITT for telematic services (e.g. Teletex, Videotex).
- ISO 6937/x
- The two parts of this standard are now under revision and to be combined into one (Committee Draft).
- ISO 8601
- Data Elements and Interchange Formats - Information Interchange - Representation of Dates and Times. 1988-06-15. ISO, Geneva, 1988.
- ISO 8859-1
- International Standard ISO 8859-1: Information
Processing - 8-bit coded graphic character sets - Part 1:
Latin alphabet No. 1, First Edition, ISO, Geneva,
February 15, 1987.
This defines the same graphics as the IBM character set used in code pages 037-1, 500-1 and other country extended code pages.
- ISO 8859-2
- International Standard ISO 8859-2: Information
Processing - 8-bit coded graphic character sets - Part 2:
Latin alphabet No. 2, First Edition, ISO, Geneva,
February 15, 1987.
Characters for Eastern European languages.
- ISO 8859-5
- International Standard ISO 8859-5: Information Processing - 8-bit coded graphic character sets - Part 5: Latin/Cyrillic alphabet. First Edition, ISO, Geneva, June 1, 1988.
- ISO 8859-6
- International Standard ISO 8859-6: Information Processing - 8-bit coded graphic character sets - Part 6: Latin/Arabic alphabet. First Edition, ISO, Geneva, June 1, 1988.
- ISO 8859-7
- International Standard ISO 8859-7: Information Processing - 8-bit coded graphic character sets - Part 7: Latin/Greek alphabet. First Edition, ISO, Geneva, June 1, 1988.
- ISO 8859-8
- International Standard ISO 8859-8: Information Processing - 8-bit coded graphic character sets - Part 8: Latin/Hebrew alphabet. First Edition, ISO, Geneva, June 1, 1988.
- ISO 8859-9
- International Standard ISO 8859-9: Information
Processing - 8-bit coded graphic character sets - Part 9:
Latin alphabet No 5. First Edition, ISO, Geneva, June
Characters for Turkish are included here, replacing those for Icelandic.
- ISO 8884
- Keyboard layout for multiple Latin-alphabet languages
With additional shift functions a large set of characters can be accessed.
- ISO 9995
- Keyboard layout for Text and Office Systems. Some
parts of this document are at the stage of Committee
Draft, others are Draft International Standards.
The various parts cover general principles for keyboard layouts, alphanumeric section, numeric section, editing section, function section. Symbols to be used to represent functions and the allocation of the letters on the keys are also presented.
- ISO 10367
- DIS Repertoire of standardized coded graphic character sets for use in 8-bit codes
- ISO 10646
- CD Multiple Octet coded character set. ISO,
Geneva, 1989. At the stage of Committee Draft.
This coding scheme uses 4 bytes for all characters to define symbols used in China, Japan, Korea and Taiwan as well as all known alphabetic scripts and a large number of mathematical and other symbols. Methods are described to «compress» the data to 2 bytes or 1 byte if only a limited number of graphics is used.
In the canonical form, four octets are used to represent each character defining the group, plane, row and cell, respectively. Graphic characters are restricted to four quadrants within any plane. This restriction provides for the use of this coded set with character coded control functions. The coding scheme preserves the established ISO philosophy on using code points (C0, G0, C1, G1 space for controls and graphics).
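The canonical four-octet form can be sketched in Python as follows. The sketch assumes that the octets appear in the order group, plane, row, cell with the group octet most significant, and that a character can be treated as a single 32-bit integer. The function names and the sample value are hypothetical and are not taken from the draft.

```python
def to_octets(code):
    """Split a 32-bit code value into its (group, plane, row, cell) octets.

    Assumption: group is the most significant octet, cell the least.
    """
    if not 0 <= code <= 0xFFFFFFFF:
        raise ValueError("code must fit in four octets")
    return tuple(code.to_bytes(4, "big"))

def from_octets(group, plane, row, cell):
    """Rebuild the 32-bit code value from its four octets."""
    return int.from_bytes(bytes((group, plane, row, cell)), "big")

# Arbitrary sample value, chosen only to show the split.
sample = 0x20204142
octets = to_octets(sample)
print(octets)
```

The «compressed» 2-byte and 1-byte forms mentioned in the annotation would correspond to omitting leading octets when group and plane (or group, plane and row) are fixed by context. That reading is an interpretation and is not shown in the sketch.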
- Knuth, Donald E., The Art of Computer Programming, Volume 3 / Sorting and Searching. Addison-Wesley Publishing Company, Reading, Mass. 1973. ISBN 0-201-03803-X.
- LaBonté, Alain, 'Full National Language Support: A SHARE Europe Top Concern and the Key to Success'. Proceedings of SEAS Spring Meeting 1988, vol I, p 241-269. Geneva, SHARE European Association, 1988.
- LaBonté, Alain, 'National Language Activities in SEAS -
A Status Report'. Proceedings of SEAS Anniversary
Meeting 1989, vol I, p 204 - 220. Geneva, SHARE
Europe (SEAS), 1989.
This report reviews the work done at SEAS in this area, starting in 1985 with the submission of the White Paper [Gardner-1].
- LaBonté, Alain, 'A New Data Type for National
Language?'. Proceedings of SEAS Anniversary Meeting
1989, vol II, p 1519 - 1523. Geneva, SHARE Europe
When comparing textual data with numeric data similarities can be seen in the necessary structure of the data. Sign, exponent and mantissa can be mirrored to base characters, accents, case and special symbols. A reduction technique is presented to lower storage requirements of this new structuring without the need to expand data for processing.
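The idea of structuring text into base characters, accents and case can be illustrated with a simplified Python sketch. It is not the method of the paper: the function collation_key is hypothetical, special symbols are ignored, accents are compared from left to right, and no storage reduction is attempted.

```python
import unicodedata

def collation_key(word):
    """Build a three-level key: base letters, then accents, then case."""
    base, accents, case = [], [], []
    for ch in word:
        decomposed = unicodedata.normalize("NFD", ch)
        letter = decomposed[0]
        base.append(letter.lower())      # level 1: base character
        accents.append(decomposed[1:])   # level 2: combining marks, "" if none
        case.append(letter.isupper())    # level 3: lower case before upper case
    return ("".join(base), tuple(accents), tuple(case))

# Escapes are used so that the accented letters are single precomposed characters.
words = ["Cote", "c\u00f4te", "cote", "cot\u00e9"]
ordered = sorted(words, key=collation_key)
print(ordered)
```

All four words share the same base string, so accents decide the order first and case is consulted only when both earlier levels are equal.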
- Internationalisation, White Paper of the
/usr/group Technical Subcommittee of POSIX.
This paper describes the role of the IEEE P100x standards committees. It also summarizes the handling of 'local conventions' like time and date formats and proposes a flexible solution with the setlocale() function. It touches on all NLA aspects described in this SHARE Europe White Paper.
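The setlocale() approach can be tried out with Python's standard locale module, which wraps the C library function of the same name. The sketch below selects the portable "C" locale, because the names of other locales vary between systems, and then queries the numeric conventions in effect. It illustrates the general mechanism only and does not reproduce anything specific to the white paper.

```python
import locale

# Select the portable "C" locale for all categories. Other locale names
# are system dependent, so none is assumed here.
locale.setlocale(locale.LC_ALL, "C")

# Query the local conventions currently in effect.
conventions = locale.localeconv()
decimal_point = conventions["decimal_point"]
thousands_sep = conventions["thousands_sep"]

# A single category can also be queried without changing it.
current_time_locale = locale.setlocale(locale.LC_TIME)

print(repr(decimal_point), repr(thousands_sep), current_time_locale)
```

An application written this way reads its conventions from the locale at run time, so switching the locale changes the formatting without changes to the program.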
- localedef - Define locale environment. Unapproved
draft of POSIX 1003.2, September 1989.
This is an extensive definition of the localedef utility, which converts source definitions for locale categories into a format usable by the functions and utilities whose operational behaviour is determined by the setting of the locale environment variables (for example collating rules, character classification).
- Reinsch, Roger A., Is there still an Alphabet
in 2002? SEAS Anniversary meeting 1989, session 3.7O
This discussion assumes distributed data bases to be the norm very soon. Children of every culture learn their 'A B C's well and carry that fundamental understanding of the alphabet into everything they do. What happens when the rules break down? Do they resist the changes? And other questions were raised in this discussion.
- Joe Becker, Xerox Corporation and Lee Collins, Apple
Computer: Unicode Draft Design for a pure 16-bit
Character Code. September 1989.
"Unicode" is a proposal for a fixed-width 16-bit multilingual character encoding. The time has come to recognize that 16 bits are necessary and sufficient to represent all of the world's normal text characters, so there needs to be a single "wide" character type representing a clear assignment of 16-bit character codes. "Unicode" is proposed to be that encoding.
Unicode presumes predefined character properties (for example classes such as letter vs. symbol vs. punctuation vs. diacritical mark). Since text is viewed as passive, Unicode does not endorse the concept that text can cause or prescribe actions by means of «control codes»; only the 32-odd traditional «C0»-control codes of ISO 646 are included. An «escape mechanism» consists of sequences of 16-bit codes, interpreted as multi-character constructs. This mechanism is proposed to be used for defining rare ideographic characters.
- Standards Update: Internationalization (Draft 1).
The UniForum Technical Committee, December 1989.
An Update on the UniForum Technical Committee Working Group on Internationalization. Basically it covers the same aspects as [POSIX-1].
- An Architecture for POSIX Internationalization.
The UniForum Technical Committee, September 1989.
This document defines nine requirements for UNIX in international environments and develops the concept of «locale» categories: character set attributes, collating sequence, numeric and monetary editing, date and time editing.
- Wingen, Johan W. van, Coded Character Sets and
Programming Languages. ISO/IEC JTC1/SC2 N1961R and
ISO/IEC JTC1/SC22 N587R, Revised April, 1989.
Summarizing the requirements for compilers for support of national language: character set, symbol classes, comments, identifiers, string-constants.
- Wingen, Johan W. van, Sort Order Schemes in different
Languages. ISO/IEC JTC1/SC2 N211, January 1989.
Testing the sort algorithm from [LaBonté-3] and generalizing it to languages other than French.
- 'Native Language Support'. X/Open Portability Guide,
Chapter 4, December 1988.
X/Open proposes a Native Language Support with definitions kept in an NLS database about configuration data, collating sequence, character classes, language information and message catalogues. Details are equivalent to [POSIX-2].