Telugu Communication Software with Java™: Present and Future
Prasad A. Chodavarapu
Abstract
This article looks at the present and future possibilities for developing powerful Telugu communication software with Java™. It takes a detailed look at raMgavalli, a transliteration-based plain-text editor and web browser, and mEghasaMdESaM, an e-mail tool, both developed by the author as proofs of concept. It also attempts to peek into the future, based on the known directions of Java™ with the Swing and Java-2D APIs and of font technologies.
1.0 Introduction
The advent of the internet and e-mail has dramatically heightened our expectations of what a Telugu software application should provide. We are no longer satisfied with stand-alone text processing capabilities, not that we have fully realised even this modest dream. The need to communicate easily and effectively over multiple platforms with diverse fonts is the order of the day. Java™ provides us with a few tools that make this task a little easier to accomplish. It promises to deliver the cherished goal of platform independence along with easy integration into web pages. Better still, it makes the original idea of a WYSIWYG editor easier to accomplish. It also lends itself easily to both keyboard-based and transliteration-based approaches. This article discusses the present and future opportunities that Java™ provides for developers of Telugu communication software. raMgavalli, a plain-text editor and web browser, and mEghasaMdESaM, an e-mail tool, both developed by the author as proofs of concept, are described in detail. Future plans for improvement, based on the known directions of the Swing and Java-2D APIs, are also discussed. For the sake of completeness, existing standards, fonts, previous and ongoing efforts in Telugu software development, and opportunities provided by other new technologies such as DHTML are briefly covered as well.
1.1 Character and Font Standards
The question of standards figures prominently in any discussion of multi-lingual software development. The lack of effective character set and font standards for non-Latin scripts has long been the stumbling block. Though the emphasis on internationalization of software has picked up in recent years, the infrastructure necessary for rapid development has yet to take firm root.
The Indian Standard Code for Information Interchange (ISCII) [1] and Unicode [2] are the two character set standards relevant to Telugu software development. Both ISCII and Unicode strive to exploit the structural similarities of Indic scripts by using a parallel code layout. Though the former has existed for a long time, its effectiveness continues to suffer because developers do not have easy access to the standard documents. Only a few leading firms and governmental bodies have utilized the power of this standard. Unicode is a more recent standard based on ISCII; being an international standard, it is expected to finally bring order to the chaotic world of multi-lingual software development.
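The parallel code layout can be illustrated with Unicode itself: the Devanagari block starts at U+0900 and the Telugu block at U+0C00, and corresponding letters sit at the same offset within each block (for example, KA is U+0915 in Devanagari and U+0C15 in Telugu). The following is only a minimal sketch. The class and method names are hypothetical, and the naive offset shift ignores code points that are assigned in one script but not in the other.

```java
// Illustrative sketch only: the class and method names are hypothetical.
final class ParallelLayoutDemo {
    static final int DEVANAGARI_BASE = 0x0900; // start of the Unicode Devanagari block
    static final int TELUGU_BASE = 0x0C00;     // start of the Unicode Telugu block
    static final int BLOCK_SIZE = 0x80;        // each block spans 128 code points

    /**
     * Moves every Devanagari code point to the same offset in the Telugu block.
     * Characters outside the Devanagari block are copied unchanged.
     * Caveat: some offsets are unassigned in Telugu, so this is not a real converter.
     */
    static String devanagariToTelugu(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= DEVANAGARI_BASE && c < DEVANAGARI_BASE + BLOCK_SIZE) {
                out.append((char) (c - DEVANAGARI_BASE + TELUGU_BASE));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the layouts are parallel, the conversion is a constant shift rather than a per-character lookup table, which is the structural similarity the standards exploit.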
Font technologies that are built primarily for rendering Latin scripts form yet another stumbling block for Telugu software development. As with other world scripts that employ context-based glyphs for rendering characters, a character set standard for Telugu needs to be augmented by a font standard. The author knew of no such standard until recently. Word is that the Indian standards bodies have recently adopted a font standard called ISFOC, though few details are known. The long absence of any font standard has meant that all existing fonts are incompatible with each other, which poses a big problem for software developers.
1.2 Fonts and Application Software
In spite of the absence of effective standards, this decade has seen a few impressive efforts in both the commercial and public domains. These can be broadly classified into keyboard-based and transliteration-based approaches. C-DAC, Pune [3] has an impressive array of products that can be used for most Indian languages including Telugu. They conform to the ISCII standard and employ a standard keyboard layout called INSCRIPT [1]. Some of their products are:
racana [4] and Anu Graphics [5] are other commercial packages available for high-quality word processing in Telugu. The public-domain pOtana fonts developed by Sri T.K. Desikachary [6] also come with a keyboard driver.
Public-domain efforts have mostly employed the transliteration-based approach. The Rice Transliteration Scheme (RTS) [7], a popular transliteration standard, provides a many-to-one mapping of Roman strings to Telugu characters. Rice Inverse Transliterator-2.0 (RIT) [8], the effort of Sri Ananda Kishore and Sri Rama Rao Kanneganti, is an RTS-compatible, easy-to-use front-end for the TeluguTeX [9] package of Smt. Lakshmi Mukkavilli. TeluguTeX is a LaTeX-based transliteration package with a great many PostScript fonts built using METAFONT. RIT-3.0 [10] is a complete rewrite by Sri Juvvadi Ramana that uses pOtana fonts. telugu lipi [11] of Sri Srinivas Sirigina is an RTS-based WYSIWYG editor written in Visual Basic for use with a font of the same name. It can also be used to prepare HTML files in Telugu and to send e-mail. However, it is available only for Windows. lEkha [12] is an ongoing project under the stewardship of Sri Juvvadi Ramana that aims to deliver integrated solutions for all Indian languages. It is based on a new standard called the Indian Standard Code (ISC) [13].
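The many-to-one nature of such a transliteration scheme can be sketched with a longest-match lookup. The tiny table below is an assumption made for illustration (it treats both "aa" and "A" as spellings of the same long vowel) and is not taken from the RTS specification. The class name is hypothetical, and only a few independent vowels are handled; consonants, vowel signs and conjuncts are left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical class, and a tiny assumed table, not the RTS table.
final class RtsSketch {
    private static final Map<String, String> TABLE = new HashMap<>();
    private static final int MAX_KEY_LENGTH = 2; // longest Roman key in TABLE

    static {
        TABLE.put("a", "\u0C05");  // TELUGU LETTER A
        TABLE.put("aa", "\u0C06"); // TELUGU LETTER AA
        TABLE.put("A", "\u0C06");  // second Roman spelling of the same letter (many-to-one)
        TABLE.put("i", "\u0C07");  // TELUGU LETTER I
        TABLE.put("I", "\u0C08");  // TELUGU LETTER II
    }

    /** Greedy longest-match transliteration; unknown characters are copied through. */
    static String transliterate(String roman) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < roman.length()) {
            boolean matched = false;
            int longest = Math.min(MAX_KEY_LENGTH, roman.length() - i);
            for (int len = longest; len >= 1 && !matched; len--) {
                String hit = TABLE.get(roman.substring(i, i + len));
                if (hit != null) {
                    out.append(hit);
                    i += len;
                    matched = true;
                }
            }
            if (!matched) {
                out.append(roman.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

Trying the longest key first is what lets "aa" be read as one long vowel rather than as two short ones.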
2.0 Java™ as a Resource
The "Write Once, Run Anywhere" promise of Java™ [14] naturally makes it very attractive for developers of Telugu communication software, but there is more to Java™ that makes it a valuable resource. Its use of the Unicode standard for characters and strings, an API geared to the needs of internationalization, the capability to implement complicated GUI designs and the ease of integration into the web are just a few of its attractions. The addition of Swing [15] and Java-2D [16] to the core API in 1.2 will further enhance its capabilities. It is in this context that raMgavalli and mEghasaMdESaM were built by the author as proofs of concept. What follows is a detailed discussion of the two.
2.1 raMgavalli and mEghasaMdESaM
raMgavalli [17] is primarily intended as a stand-alone text editor that can parse RTS strings, either from a local or remote (HTTP) file or as the user types them, and produce visual Telugu text output in a font of the user's choice. mEghasaMdESaM [18] is an e-mail application that uses the functionality of raMgavalli to send and receive mail in RTS. What follows is a discussion of some of the design issues involved in building these projects.
The absence of an effective font standard poses quite a few problems. Ideally, the user should be able to use any font of choice with these applications. This requirement can be partially met because Java™ is extensible at run time. RTS text input by the user is first converted to a standard character set such as Unicode. Conversion from this standard to the font-specific format is done with the help of a map provided by the user. In raMgavalli, this mapping is defined by an implementation of the TeluguFontMap [19] interface, which defines just one abstract method, getFontMap(String). That is, for every font that the user wishes to use, a corresponding class file that implements the TeluguFontMap interface must be provided. This requirement is obviously too stringent for most users, and hence class files for a few public-domain fonts such as telugu lipi, tikkana and pOtana have been provided.
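A font map of this kind might look like the sketch below. Only the interface name TeluguFontMap and the method name getFontMap(String) come from the text above. The return type, the meaning of the argument (assumed here to be Unicode text), the DemoFontMap class with its invented glyph positions, and the FontMapLoader helper are all assumptions made for illustration and do not describe raMgavalli's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Name and method name are from the article; the signature details are assumptions.
interface TeluguFontMap {
    /** Assumed semantics: Unicode Telugu text in, font-specific character codes out. */
    String getFontMap(String unicodeText);
}

// Hypothetical font whose glyph positions are invented for this example.
final class DemoFontMap implements TeluguFontMap {
    private final Map<Character, Character> glyphs = new HashMap<>();

    DemoFontMap() {
        glyphs.put('\u0C05', 'A'); // pretend the font draws TELUGU LETTER A at code 'A'
        glyphs.put('\u0C06', 'B'); // pretend the font draws TELUGU LETTER AA at code 'B'
    }

    @Override
    public String getFontMap(String unicodeText) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < unicodeText.length(); i++) {
            char c = unicodeText.charAt(i);
            Character glyph = glyphs.get(c);
            out.append(glyph != null ? glyph.charValue() : c); // unmapped characters pass through
        }
        return out.toString();
    }
}

// Hypothetical helper showing run-time extension: the map class is chosen by name.
final class FontMapLoader {
    static TeluguFontMap load(String className) throws ReflectiveOperationException {
        return (TeluguFontMap) Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```

Loading the class by name is one way an editor could pick up a new font map without being recompiled. A real map would also need context-dependent glyph selection, which a one-to-one character table cannot express.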
Parsing RTS text is computationally intensive, and the primitive definition of TextEvents in Java™ 1.1 compounds the problem. Two design decisions were made to tackle this. 1) The smallest unit of text is a word rather than a character. This makes parsing text modifications much easier, as only the words that were modified need to be re-parsed. 2) Every document is divided into several pages, each of which has no more than 'n' words. When a large document is loaded, the user can edit the first few pages, which have already been loaded, while the rest are parsed in a background thread. This complicates the user interface, as the user can deal with only a single page at any instant, but it means that the user experiences no long delays however large the document may be.
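The paging decision can be sketched as follows. This is a minimal illustration under stated assumptions, not raMgavalli's code: the Paginator class and its method are hypothetical, words are taken to be whitespace-separated, and the background parsing thread is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical class; words are assumed to be whitespace-separated.
final class Paginator {
    /** Splits text into pages holding at most maxWordsPerPage words each. */
    static List<List<String>> paginate(String text, int maxWordsPerPage) {
        if (maxWordsPerPage < 1) {
            throw new IllegalArgumentException("maxWordsPerPage must be at least 1");
        }
        List<List<String>> pages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input has no words at all
            }
            current.add(word);
            if (current.size() == maxWordsPerPage) {
                pages.add(current); // page is full, start a new one
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            pages.add(current); // last, partially filled page
        }
        return pages;
    }
}
```

With pages available as independent lists of words, the first page can be parsed and shown immediately while later pages are handed to a worker thread, and an edit only requires re-parsing the affected words on the active page.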
As mentioned above, TextAreas in Java™ 1.1 are primitive and are slated for a major overhaul in the Swing APIs of 1.2. raMgavalli adopts the well-known Model-View-Controller (MVC) architecture on which Swing components are based. This not only provides upward compatibility with Swing but, given that raMgavalli is only a proof of concept, also provides an excellent framework for future development. The Model encapsulates all the text and language data as a collection of sub-models, with one sub-model per page. All data changes are relayed as ModelEvents to registered ModelListener objects. The View is a ModelListener that responds to ModelEvents by refreshing itself appropriately. The View also provides user interface elements such as the TextArea and the MenuBar. The Controller receives user events from these UI elements and relays them to the active sub-model. The Controller also implements the ModelMultiplexer interface, which maintains the integrity of the Model by handling text underflows and overflows in the different sub-models. Using the MVC architecture in this fashion greatly simplifies the complex set of functions that a text editor has to perform.
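The event flow described above can be sketched as follows. The names ModelEvent and ModelListener come from the text, but their fields and method signatures are assumptions. PageModel, RecordingView and PageController are hypothetical stand-ins for a sub-model, the View and the Controller, and the ModelMultiplexer role (overflow and underflow handling) is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Name from the article; the fields are assumptions.
final class ModelEvent {
    final int pageIndex;
    final String newText;

    ModelEvent(int pageIndex, String newText) {
        this.pageIndex = pageIndex;
        this.newText = newText;
    }
}

// Name from the article; the method signature is an assumption.
interface ModelListener {
    void modelChanged(ModelEvent event);
}

// Hypothetical sub-model holding the text of one page.
final class PageModel {
    private final int pageIndex;
    private final List<ModelListener> listeners = new ArrayList<>();
    private String text = "";

    PageModel(int pageIndex) {
        this.pageIndex = pageIndex;
    }

    void addModelListener(ModelListener listener) {
        listeners.add(listener);
    }

    String getText() {
        return text;
    }

    /** Stores the new text and relays the change to every registered listener. */
    void setText(String newText) {
        text = newText;
        ModelEvent event = new ModelEvent(pageIndex, newText);
        for (ModelListener listener : listeners) {
            listener.modelChanged(event);
        }
    }
}

// Hypothetical view that just records what it would display.
final class RecordingView implements ModelListener {
    String displayed = "";

    @Override
    public void modelChanged(ModelEvent event) {
        displayed = event.newText; // a real view would repaint its text area here
    }
}

// Hypothetical controller that relays user input to the active sub-model.
final class PageController {
    private final List<PageModel> pages;
    private int activePage = 0;

    PageController(List<PageModel> pages) {
        this.pages = pages;
    }

    void setActivePage(int index) {
        activePage = index;
    }

    void userTyped(String text) {
        pages.get(activePage).setText(text);
    }
}
```

The view never reads user input directly and the model never touches UI elements, so either side could later be replaced by a Swing component without changing the other.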
mEghasaMdESaM consists of classes designed to read e-mail messages in RTS. It also implements SMTP to send e-mail messages. The user is given the flexibility to convert parts of received messages freely between RTS and Telugu. This is required because RTS is used loosely in e-mail, with not much attention given to language switches and the like. The use of RTS in e-mail messages does not exclude recipients who lack reading tools such as mEghasaMdESaM. This is important, as these tools are not available to a majority of users.
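The client side of a minimal SMTP exchange can be sketched as below. This is an illustration of the basic command sequence only, not mEghasaMdESaM's implementation: the class and method names are hypothetical, no socket is opened, server replies are not read, and message headers are assumed to be part of the body supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: builds the lines a client would send, without any networking.
final class SmtpDialogueSketch {
    /** Returns the client lines for one message, each to be terminated by CRLF on the wire. */
    static List<String> clientLines(String clientHost, String from, String to, String body) {
        List<String> lines = new ArrayList<>();
        lines.add("HELO " + clientHost);
        lines.add("MAIL FROM:<" + from + ">");
        lines.add("RCPT TO:<" + to + ">");
        lines.add("DATA");
        for (String line : body.split("\r?\n", -1)) {
            // Dot-stuffing: a body line starting with '.' gets an extra leading '.'
            lines.add(line.startsWith(".") ? "." + line : line);
        }
        lines.add("."); // a lone dot ends the message data
        lines.add("QUIT");
        return lines;
    }
}
```

A real client would wait for the server's reply code after each command before sending the next line. Since RTS text consists of Roman characters, it can travel in the message body unchanged.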
2.2 Future Directions
The addition of Swing components and Java-2D to the core API in 1.2 will further enhance the capability of tools such as raMgavalli. The Swing API will help in developing better user interfaces that can handle multiple fonts and styled strings. Java-2D will provide better access to font glyphs and help in producing complex renderings. Better integration with the web should also be possible with the help of HTML renderers, either in the form of Swing components or beans [20]. At present, raMgavalli provides an applet called akshara darpaNaM [21] that can be used for displaying Telugu files in 1.1-compatible web browsers. One disappointing aspect, however, is that both Netscape and MSIE lag in implementing the newer elements of Java™.
Java™ by itself is not enough to completely realise the dream of seamless integration of Telugu into modern computers. The critical link of a suitable font standard continues to be missing. Unfortunately, the industry is moving towards OpenType™ [22], a standard that does not seem to go far enough to accommodate Indic scripts. That most other scripts are accommodated by OpenType™ does not bode well for Telugu and other Indic scripts. This may mean that applications have to continue to go to great lengths to provide multiple font support. Once again, Java™ may come to the rescue through the capabilities of Java-2D, though it is not yet known to what extent.
One good sign, though, is the rapid development of Dynamic HTML (DHTML), which will allow web designers to embed fonts into web pages. The Web Embedding Fonts Tool (WEFT) [23] is already becoming available. Both Netscape Communicator and MSIE support portable font resources (PFR), and this should bode well for the future of Telugu on the WWW.
3.0 Conclusion
Java™ is a significant resource that can be taken advantage of in developing versatile communication tools for Telugu users. However, the continuing non-conformance of fonts to any standard will significantly hinder growth. New font technologies such as OpenType™ are also inadequate for the needs of Telugu. At the same time, the advent of DHTML may ease integration into the web. Let us hope that the efforts of developers, with the help of these technologies, will ensure a prosperous future for Telugu in the 21st century.
References