NTI Buddhist Text Reader

Using the NTI Reader as a Translation Aid

One of the main purposes of the NTI Reader is to help translators. This page describes how the web site can be used in different translation workflows. You can use the canonical texts included on the site as an aid to translation, or use the site to help translate other Buddhist texts. Answers to more general questions are on the Frequently Asked Questions page.


Getting Started

Several of the most immediate benefits of using the NTI Reader for translation are:

  1. It can save you time by looking up many words at the same time.
  2. You can look at the dictionary definitions for whole phrases or sentences at one time.
  3. The NTI Reader contains a large number of Buddhist terms, literary Chinese words, and modern Chinese words, which saves you from having to buy and consult a collection of different specialist dictionaries. If you do need more information on a term, the NTI Reader helps with references using a system of abbreviations in the dictionary entry notes.

Translation involves more than just looking up words. Translation needs to consider the lexicon, the syntactic structure of each language, and the message that the source text describes (Vinay & Darbelnet 1995, pp. 11-12). The NTI Reader aims to help translators by allowing them to look at the English equivalents of many Chinese words at one time and to drill down on individual Chinese words to relate these back to their use in canonical texts.

Translation can be difficult for any document because of a number of challenges, including the different structures, conventions, and cultural contexts of the two languages and their environments (Vinay & Darbelnet 1995, p. 4). Translation of a canonical text is especially difficult because it starts with the challenging task of understanding a source text written in a very different time and culture. This will most likely start with a document analysis that may include:

  1. Reading the source document, historic or modern, and using the web site to understand more about certain words or phrases.
  2. Getting a sense of an untranslated historic document. A full translation may not be feasible initially, but you may want an overall sense of the content and nature of the document, say by examining the keywords.
  3. Consulting a commentary or summary in a modern language, say from a secondary source such as an encyclopedia.

Translation Terminology

In the process of translation, text in a source language is translated to text in a target language (Munday 2012, p. 28). In the context of translating works from the Chinese Buddhist canon the source language is Chinese and the target language is English.

A major theme that dominated translation theory before the twentieth century was the tension between word-for-word translation, which is a more literal translation, and sense-for-sense translation, which is a looser translation focused on the needs of the readers in the target language (Munday 2012, p. 47). Arguments for literal or word-for-word translation include keeping as true as possible to the source text (Munday 2012, pp. 57-59). Malmkjær gives more information on this topic with an overview of translation methodologies based on modern linguistic approaches (Malmkjær 2011, pp. 57-70).

Casual Use for a Private Translation

In this workflow a user does not join the GitHub project or have contact with the NTI Reader site owner but simply wants to use the web site as a tool for a private translation. The user has their own private copy of the document and does not wish to share it. The user wants to use the site to look up word meanings and for translation memory.

  1. Phrases or words are copied from the private document and pasted into the word or phrase search page to find the word meanings.
  2. If a word is found that is not in the dictionary, the user can email the site owner at alex@ntireader.org with the additional vocabulary to add.
  3. The user can copy material from the web site under the terms of the Creative Commons Attribution-Share Alike 3.0 License.

Collaborative Translation

In this workflow a user will either join the GitHub project or send linguistic artifacts to the site owner by email. Typically, the linguistic artifacts will be either entries for the words file containing new or changed vocabulary, or corrections to the files containing the canonical text. The artifacts can be added to the site and the dictionary and/or corpus rebuilt.

Getting the Sense of a Canonical Text

In this workflow the user wants to get the overall sense of a document in the Taishō.

  1. Find the document in the Table of Contents and browse the text. Mouse over the Chinese text to get a quick sense of the meaning and click on any word to get a full definition that also relates the word to how it is used in the canon.
  2. Look at the content analysis for the text. A link to the content analysis is given at the bottom of the colophon page for each document. The content analysis includes a list of proper nouns, frequencies of lexical words, frequencies of all words, and bigrams ordered by frequency. For example, the Treatise on the Awakening of Faith in the Mahāyāna 《大乘起信論》 (T 1666) has both a colophon page and a content analysis. This may be useful for terminology extraction.


Language tone relates to the context it is used in and is indicated by word choice and other stylistic choices. For example, the modern English word 'deceased' indicates an administrative tone, while 'dead' indicates a conversational tone (Vinay & Darbelnet 1995, pp. 17-18). In translation of text from a modern source language to a modern target language the tone is usually preserved in the translation process. However, this is most often not the case for translation of historic texts, including canonical Buddhist texts.

The tone and style of translated Buddhist texts will usually vary with the intended audience. A target audience of lay Buddhist readers will probably appreciate a more digestible form, whereas a target audience of Buddhist scholars will appreciate greater precision and will already be familiar with basic terminology. For example, consider translation of the term 公案 gōng'àn “kōan” or “gong'an”. A lay readership will probably more easily understand “koan”, without the diacritics, which is a word now included in general English dictionaries, such as the Oxford Living Dictionary. A translation for a scholarly audience will probably prefer to use the term “gong'an” for a Chinese Buddhist text or “kōan” with diacritics for a Japanese canonical text.

Sanskrit and Pali words are particularly numerous in canonical texts. Take the word 大悲 dàbēi “mahākaruṇā” or “great compassion”. A lay audience will probably prefer the English equivalent “great compassion”. The problem with this is that back-translating from that English equivalent to the source word would be a guess. A scholarly audience might regard this as an imprecise translation and may prefer “mahākaruṇā”, with the diacritics. This is one reason that the NTI Reader includes several English equivalents and an indication in the notes of which are Sanskrit, Pali, and Japanese. The NTI Reader uses the International Alphabet of Sanskrit Transliteration for diacritics of Sanskrit words, as described in the NTI Reader Style Guide.

Larger translation projects will generally use a style guide. If your project does not have one, you may consider consulting the Wisdom Publications’ Style Guide for books on Indian and Tibetan Buddhism, although it may not be sufficient for Chinese texts.

There are many more nuances to style in translation than this. For example, if the Chinese is itself a transliteration of a Sanskrit word then it may be best to keep the Sanskrit form in the target text.

If something about the NTI Reader makes it difficult for you to translate according to your style guide, please send an email to alex@ntireader.org.


Translation methodologies are often guided by linguistics. In the 1950s, Vinay and Darbelnet developed a translation approach based on early modern linguistics (Malmkjær 2011, pp. 58-60). Their approach views the translation process at the three levels of lexis, syntax, and message. One of the important aspects of their method is consideration of cultural context for both the source and target languages. In Catford's approach, lexis is dealt with using collocations and lexical sets (Malmkjær 2011, pp. 60-62). Nida's approach emphasizes grammatical structure using concepts from Chomsky's generative grammar (Malmkjær 2011, pp. 62-64). Bell's and Halverson's approaches emphasize psycholinguistics and cognitive linguistics respectively (Malmkjær 2011, pp. 64-67).

According to the classic translation methodology text by Vinay and Darbelnet, a translation unit is “the smallest segment of the utterance whose signs are linked in such a way that they should not be translated individually” (Vinay & Darbelnet 1995, p. 21). They recognise several types of translation units:

  1. Functional units - forming a syntactic group, for example, “at a location”
  2. Semantic units - having a unit of meaning, for example, “main feature”
  3. Dialectic units - expressing a unit of reasoning, for example, “on the other hand”
  4. Prosodic units - with the same intonation, for example, “You there!”

The NTI Reader can help you identify translation units by (1) giving a holistic view of a sentence and (2) finding collocations for specific words. Consider the word qiú 'to seek'. Looking at the detail page for this word you will find the collocation qiú lì “to seek profit”, which may help you identify the translation unit and choose a combination of English equivalents that fit well together.

Nida is a modern Bible translation scholar who continued a tradition of religious translation. Nida distinguished two types of translation based on syntactic structure: ‘formal equivalence’ and ‘dynamic equivalence.’ Formal equivalence preserves the grammatical structure of the source language. Dynamic equivalence reorganizes the syntactic structure to read more naturally in the target language (Barnes 2011, pp. 44-46). Although the early version of generative grammar that Nida's theory is based on is now somewhat out of date, the contrast between formal equivalence and dynamic equivalence remains an informative idea.

Terminology Extraction

Terminology is vocabulary for a specialized domain. Buddhism has a very large amount of terminology, including subdomains and genres with their own terminology. Terminology extraction tools are software aids that help compile glossaries of terminology (Kenny 2011, pp. 462-463). Although the NTI Reader was not specifically designed as a terminology extraction tool, it has general corpus analysis features that can help in terminology extraction. The Corpus Analysis of the Taishō version of the Chinese Buddhist canon may be used to help extract terminology for the canon as a whole. The corpus analysis includes frequencies of lexical words, frequencies of all words, and frequencies of bigrams. Frequency lists of lexical words exclude stop words, which are mainly function words that occur very frequently. Individual word entries relate the words back to the corpus with listings by frequency of occurrence, collocations, and usage examples.

Corpus Analysis includes Frequencies of Lexical Words by Genre, for genres such as āgama, jātaka, and avadāna. Individual texts include content analysis for the text, including proper nouns, frequencies of lexical words, frequencies of all words, and bigrams.
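
The difference between a frequency list of all words and one of lexical words can be illustrated with a minimal Python sketch. It is not the NTI Reader's own code: the stop word list, the sample tokens, and the function name word_frequencies are assumptions made for illustration, and the text is assumed to be already segmented into words.

```python
from collections import Counter

# Hypothetical stop word list of function words (not the NTI Reader's actual list).
STOP_WORDS = frozenset({"之", "也"})


def word_frequencies(tokens, stop_words=frozenset()):
    """Count how often each token occurs, skipping any token in stop_words."""
    return Counter(t for t in tokens if t not in stop_words)


# A small invented sample that is already segmented into words.
tokens = ["大悲", "之", "法", "也", "大悲", "煩惱", "之", "法", "大悲"]

all_words = word_frequencies(tokens)                 # frequencies of all words
lexical_words = word_frequencies(tokens, STOP_WORDS)  # stop words excluded

for word, count in lexical_words.most_common():
    print(word, count)
```

Because the function words are filtered out before counting, the candidates for a glossary rise to the top of the lexical word list instead of being crowded out by very frequent particles.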

Terminology Management

A termbank or terminology management system is like an electronic dictionary but narrower in focus, often for a specific domain or built for a specific organization. One of the functions of a terminology management system is extraction of terminology and the other is retrieval as an aid to translators (Bowker 2002, pp. 77-78). If a terminology management system is intended primarily for one organization then it can be more prescriptive for encoding in the target language. That is, rather than giving several target language equivalents for a given word sense, the terminology management system may specify the exact equivalent that the term in the source language should be translated to.

The NTI Reader does terminology management through a system of labels for domains and subdomains. For example, consider the 真善美新聞獎 'Truthful, Virtuous, and Beautiful Media Award' entry. This is a term specific to Fo Guang Shan. The term is labeled Domain: Buddhism, Subdomain: Fo Guang Shan. The source of this and similar terms is the Fo Guang Shan Terminology 佛光山詞彙 database (Fo Guang Shan 2015). The entries imported with permission from the Fo Guang Shan Terminology are stored in the file fgs_mwe.txt.
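
As a rough illustration of domain labels and of prescriptive lookup, here is a minimal Python sketch. The Term class, its field names, and the helper functions are hypothetical and do not reflect the actual layout of the words file or of fgs_mwe.txt. The two sample entries reuse terms mentioned on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A hypothetical termbank entry; the field names are illustrative only."""
    source: str
    equivalents: tuple  # one or more target language equivalents
    domain: str
    subdomain: str = ""


TERMS = [
    Term("真善美新聞獎", ("Truthful, Virtuous, and Beautiful Media Award",),
         "Buddhism", "Fo Guang Shan"),
    Term("大悲", ("great compassion", "mahākaruṇā"), "Buddhism"),
]


def terms_in(terms, domain, subdomain=None):
    """Select entries by domain label and, optionally, by subdomain label."""
    return [t for t in terms
            if t.domain == domain
            and (subdomain is None or t.subdomain == subdomain)]


def prescribed_equivalent(term):
    """Prescriptive lookup: return one equivalent (here simply the first listed)."""
    return term.equivalents[0]


for term in terms_in(TERMS, "Buddhism", "Fo Guang Shan"):
    print(term.source, "->", prescribed_equivalent(term))
```

Treating the first equivalent as the prescribed one is only a convention chosen for this sketch; an organization's termbank could mark the preferred equivalent in some other way.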


A collocation is a combination of words occurring together by convention rather than by free selection (Svensén 2009, pp. 158-159). For example, the word combinations ‘arouse suspicion’ and ‘notorious thief’ are collocations. It is important for L2 speakers to be made aware of the convention formed by the collocation, which would not otherwise be obvious (Svensén 2009, pp. 166-167). Collocations are also a good source of examples and translators may treat them as translation units.

How can we find the words, word senses, and names of people and places in a corpus? Vocabulary acquisition can be done by:

  1. Reading texts manually.
  2. Scanning texts automatically, looking for characters that do not appear in the dictionary.
  3. Examining collocations.
  4. Examining words by frequency, since the most frequent words tend to have the greatest polysemy.
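
The automatic scanning approach can be sketched in a few lines of Python. This is a minimal illustration and not the NTI Reader's own code: the function name unknown_characters, the tiny headword set, and the sample text are assumptions, and only the basic CJK Unified Ideographs block is checked.

```python
def unknown_characters(text, headwords):
    """Return CJK characters in text that occur in no headword, in order of first appearance."""
    known = {ch for word in headwords for ch in word}
    found = []
    for ch in text:
        is_cjk = "\u4e00" <= ch <= "\u9fff"  # basic CJK Unified Ideographs block only
        if is_cjk and ch not in known and ch not in found:
            found.append(ch)
    return found


headwords = {"大悲", "公案"}  # hypothetical, very small dictionary
text = "染法，大悲。"          # invented sample text

print(unknown_characters(text, headwords))
```

Characters flagged this way are only candidates. A person still needs to decide whether each one belongs to a new word, a new word sense, or a proper name.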

Lexicographers may use corpora to perform word sense disambiguation in dictionary compilation (Atkins and Rundell 2008, loc. 3316-3324). Collocations and concordances are the primary tools for this. For example, consider the word fǎ, which may mean ‘law,’ ‘method,’ or one of several senses of ‘dharma’. In this example the collocation

        fǎ diàn
        Dharma + hall

could only mean Dharma in the sense of the teachings of the Buddha, since the following word is the hall in which the Dharma is studied. The example sentence confirms this:

        zàolì fǎ diàn
        Build + dharma + hall -> “build a Dharma hall”

Collocations may be dictionary entries themselves. For example, the collocation 染法 rǎn fǎ ‘polluting dharma’ appears twelve times in the Awakening of Faith and is listed as an entry in the FGDB, which explains that it is a synonym of 煩惱 fánnǎo kleśa ‘mental affliction’ (FGDB, s.v. ‘染法’).

Svensén mentions that inclusion of collocations is a measure of dictionary quality and discusses how they should be included at length (Svensén 2009, pp. 169-183). While some collocations are included in the NTI Reader dictionary, most are included automatically from corpus analysis. Collocations are treated in the NTI Reader as two adjacent words. However, this is a gross simplification. For example, the English collocation “strong tea” can be modified to a number of combinations, such as a “strong cup of tea” or a “strong pot of tea.” This highlights a grammatical dimension that is not accounted for by any of the capabilities of the NTI Reader.
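
The simplification of treating a collocation as two adjacent words can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the NTI Reader's implementation: the function name adjacent_bigrams and the sample sentence are invented here, and the text is assumed to be already split into words.

```python
from collections import Counter


def adjacent_bigrams(tokens):
    """Count each pair of directly adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


# Invented sample: "strong tea" occurs twice as adjacent words and once split
# apart as "strong cup of tea".
tokens = ["strong", "tea", "and", "a", "strong", "cup", "of", "tea",
          "with", "strong", "tea"]

bigrams = adjacent_bigrams(tokens)
print(bigrams[("strong", "tea")])  # counts only the adjacent occurrences
print(bigrams[("strong", "cup")])
```

The occurrence in “strong cup of tea” contributes to the pair (strong, cup) and not to (strong, tea), which is exactly the limitation described above.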

Translation Memory

A translation memory system stores source text segments and the matching translations in the target language (Bowker 2002, pp. 92-93). Bowker describes translation memory systems for long passages. Translation memory is a small scale, experimental feature in the NTI Reader. The NTI Reader takes a different approach to that described by Bowker in storing only very short 2-3 word phrases, of the size of the translation units of Vinay and Darbelnet, described above.

Only a very small number of phrases are stored for each published source, to avoid copyright problems. However, the phrases stored have a high frequency of occurrence in the corpus. The translation memory entries are marked with a TM label in the Notes section of each entry.

The translation memory entries are stored in GitHub in the file translation_memory_buddhist.txt.
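
A lookup against a translation memory of very short phrases might look like the following Python sketch. It is hypothetical throughout: the in-memory dictionary, the function name find_tm_matches, and the use of pinyin tokens in place of Chinese source text are assumptions for illustration, and the format of translation_memory_buddhist.txt is not shown here. The two sample phrases and their translations are the ones used as examples on this page.

```python
# Hypothetical translation memory: short source phrases, as tuples of tokens,
# mapped to a stored translation. Pinyin stands in for the Chinese source text.
TRANSLATION_MEMORY = {
    ("qiú", "lì"): "to seek profit",
    ("zàolì", "fǎ", "diàn"): "build a Dharma hall",
}


def find_tm_matches(tokens, memory, sizes=(3, 2)):
    """Return (start_index, phrase, translation) for each 2-3 token window found in memory."""
    matches = []
    for i in range(len(tokens)):
        for n in sizes:
            phrase = tuple(tokens[i:i + n])
            if len(phrase) == n and phrase in memory:
                matches.append((i, phrase, memory[phrase]))
    return matches


tokens = ["zàolì", "fǎ", "diàn", "qiú", "lì"]
for start, phrase, translation in find_tm_matches(tokens, TRANSLATION_MEMORY):
    print(start, " ".join(phrase), "->", translation)
```

Because the stored phrases are only two or three words long, a simple sliding window is enough to find them, and no fuzzy matching of long passages is needed.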


References

  1. Atkins, BTS & Rundell, M 2008, The Oxford Guide to Practical Lexicography, Oxford University Press, Oxford.
  2. Barnes, R 2011, “Translating the Sacred”, in Kirsten Malmkjær and Kevin Windle (eds), The Oxford Handbook of Translation Studies, Oxford University Press, Oxford, pp. 37–54.
  3. Bowker, L 2002, Computer-Aided Translation Technology: a Practical Introduction, University of Ottawa Press, Ottawa.
  4. Fo Guang Shan 2015, “Fo Guang Shan Terminology”, FoGuangPedia, accessed 5 May 2017, https://sites.google.com/site/foguangpedia/foguangpedia-collection/b02_fgs-translation/fo-guang-shan-terminology.
  5. Kenny, D 2011, “Electronic Tools and Resources for Translators”, in Kirsten Malmkjær and Kevin Windle (eds), The Oxford Handbook of Translation Studies, Oxford University Press, Oxford.
  6. Malmkjær, K 2011, “Linguistic Approaches to Translation”, in Kirsten Malmkjær and Kevin Windle (eds), The Oxford Handbook of Translation Studies, Oxford University Press, Oxford.
  7. Svensén, B 2009, A Handbook of Lexicography: the Theory and Practice of Dictionary-Making, Cambridge University Press, New York.
  8. Vinay, J-P & Darbelnet, J 1995, Comparative Stylistics of French and English: A Methodology for Translation, John Benjamins Publishing, Amsterdam and Philadelphia.