NTI Reader

What’s New

A log of recent changes to this site.

2024-01-26:
New page on Tibetan Language Resources to help with working between Chinese, Tibetan, and English.

2022-06-02:
Improved reverse index for looking up Chinese words from English, pinyin, and Sanskrit.
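As a rough illustration of what a reverse index does, the sketch below maps lower-cased English words and tone-stripped pinyin to Chinese headwords. The entry format and the names `strip_tones`, `build_reverse_index`, and `lookup` are hypothetical. This is not the NTI Reader's own code, and Sanskrit equivalents could be indexed the same way as an extra field.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical entry format: (headword, pinyin with tone marks, English gloss)
Entry = Tuple[str, str, str]

def strip_tones(pinyin: str) -> str:
    """Remove tone marks so that 'lóng' and 'long' give the same key."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def build_reverse_index(entries: List[Entry]) -> Dict[str, Set[str]]:
    """Map English words and toneless pinyin keys to Chinese headwords."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for headword, pinyin, english in entries:
        # Index each lower-cased English word in the gloss.
        for word in re.findall(r"[a-z]+", english.lower()):
            index[word].add(headword)
        # Index the pinyin with tones stripped and spaces removed.
        index[strip_tones(pinyin).replace(" ", "")].add(headword)
    return index

def lookup(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Normalize the query the same way as the keys, then return sorted hits."""
    key = strip_tones(query).replace(" ", "")
    return sorted(index.get(key, set()))

if __name__ == "__main__":
    entries = [
        ("龍頭蛇尾", "lóng tóu shé wěi", "a dragon's head but a snake's tail"),
        ("龍", "lóng", "dragon"),
    ]
    index = build_reverse_index(entries)
    print(lookup(index, "dragon"))            # ['龍', '龍頭蛇尾']
    print(lookup(index, "long tou she wei"))  # ['龍頭蛇尾']
```

Normalizing the query with the same function used for the keys is what lets a user type pinyin with or without tone marks.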

2022-05-14:
Improved performance by compressing the largest and most commonly used files and caching them on a CDN at edge locations closer to users. This reduces page load times.

2021-12-12:
Added bibliographic notes and data for texts in the Esoteric Section of the Taisho Canon. Mobile users no longer download the dictionary file to the browser, which minimizes bandwidth charges and improves performance.

2021-05-21:
The site now uses a new icon with stylized characters 南天 Nan Tien, in honor of Nan Tien Institute, associated with Fo Guang Shan and located under the southern heaven in Wollongong, Australia.

The NTI Reader Buddhist Dictionary Chrome Extension is now available at the Chrome Store. The extension adds a context menu when text on a web page is selected, so that the selected term can be looked up in the Chinese dictionary. You can look up terms in simplified or traditional Chinese or pinyin, or do a reverse lookup with an English equivalent. See the demo on YouTube and the extension support page for more details.

The Mahāvyutpatti Sanskrit-Tibetan-Chinese Buddhist Dictionary Chrome Extension is also now available at the Chrome Store. The Mahāvyutpatti is a historic dictionary compiled for the translation of Buddhist texts into Tibetan. See the demo on YouTube and the extension support page for more details.

2021-04-10:
Published an experimental page for Multilingual Lookup.

2021-03-06:
Published Tour of the Chinese Buddhist Canon, a demo of using the NTI Reader and related digital resources to explore the Chinese Buddhist Canon.

2021-01-31:
Published a new guide on YouTube, Working with Sanskrit from Chinese with the NTI Reader, with a demo of new features. One new feature is red highlighting in the reader of Chinese terms that have Sanskrit equivalents. This is designed to help with the alignment of Chinese and Sanskrit texts. More details are in the online help for Working with Sanskrit.

2021-01-28:
Published a new guide on YouTube, Translating from Chinese with the NTI Reader (January 2021 edition), with a demo of new features. These new features include:

Quotation database: find quotations in the reader and align them with published English translations.
Dictionary organization: modern named entities (people, places, companies, etc.) are now excluded from the NTI Reader to avoid confusion. Visit hbreader.org or chinesenotes.com if you need these.
Improved word definitions and coverage.

2020-09-05:
Moved the compute platform to Cloud Run. This will provide a more reliable backend to power the web site.

2020-08-04:
Added a new Translation Memory feature to enable discovery of phrases and names that are close but not exact matches.
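The entry above does not say how closeness is measured. As a stand-in, the sketch below ranks stored phrases by Levenshtein edit distance (the number of single-character insertions, deletions, and substitutions). The metric, the `max_dist` threshold, and the function names are assumptions for illustration only, not a description of the Translation Memory's actual scoring.

```python
from typing import List, Tuple

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def fuzzy_matches(query: str, memory: List[str],
                  max_dist: int = 1) -> List[Tuple[int, str]]:
    """Return (distance, phrase) pairs within max_dist edits, closest first."""
    scored = [(edit_distance(query, phrase), phrase) for phrase in memory]
    return sorted((d, p) for d, p in scored if d <= max_dist)

if __name__ == "__main__":
    memory = ["阿彌陀佛", "阿彌陀經", "金剛經"]
    # The simplified 弥 differs from the traditional 彌 by one substitution.
    print(fuzzy_matches("阿弥陀佛", memory))  # [(1, '阿彌陀佛')]
```

A character-level metric like this also surfaces variant-character spellings, which is one reason close-but-not-exact matching is useful for names.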

2020-05-31:
Created a new file, buddhist_named_entities.txt, for Buddhist named entities (people, organizations, temples, etc.) because there could potentially be many thousands of these that are not included in dictionaries. Keeping them in a separate list allows them to be managed separately. For the moment it has few entries and mainly serves as a placeholder.

2020-05-29:
A large update to vocabulary, now with a total of over 140,000 entries. Most of the new vocabulary is from CC-CEDICT with some cross-checking in various sources. Modern named entities (people, products, organizations, etc.) are steadily being excluded so that they are not confused with terms in Buddhist texts.

2020-04-10:
A large update to vocabulary, now with a total of over 110,000 entries.

2020-02-15:
A new text tokenizer has been introduced that scans both left to right and right to left, then compares the two results. The scan with the fewest terms is selected. There is rarely a difference, but sometimes the difference can be important. For example, consider this phrase from Koan 10 of the Blue Cliff Record:
只恐龍頭蛇尾
I’m afraid he has a dragon’s head but a snake’s tail (tr. Cleary 1998, p. 63).
Left to right tokenization would lead to
只、恐龍、頭、蛇、尾
Right to left tokenization results in
只、恐、龍頭蛇尾
Clearly, the right-to-left tokenization is better in this case, and it is selected because it has fewer tokens: 3 for right to left compared to 5 for left to right. The left-to-right scan misses the idiom 龍頭蛇尾 dragon’s head but a snake’s tail, which is skipped over because the greedy tokenization method is not always globally optimal.
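The bidirectional comparison can be sketched as follows. This is a minimal illustration assuming a plain set of words as the lexicon and greedy longest match in each direction, with unknown characters kept as single-character tokens. The function names are hypothetical, and the tie-breaking rule (keep the left-to-right result when both scans give the same number of tokens) is an assumption, since the description above does not say how ties are resolved.

```python
from typing import List, Set

def greedy_ltr(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy longest match, scanning from the start of the text."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:
                tokens.append(cand)
                i += n
                break
    return tokens

def greedy_rtl(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy longest match, scanning back from the end of the text."""
    tokens, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            cand = text[j - n:j]
            if n == 1 or cand in lexicon:
                tokens.append(cand)
                j -= n
                break
    tokens.reverse()  # restore reading order
    return tokens

def tokenize(text: str, lexicon: Set[str]) -> List[str]:
    """Run both scans and keep the one with fewer tokens (assumed: LTR on ties)."""
    max_len = max((len(w) for w in lexicon), default=1)
    ltr = greedy_ltr(text, lexicon, max_len)
    rtl = greedy_rtl(text, lexicon, max_len)
    return rtl if len(rtl) < len(ltr) else ltr

if __name__ == "__main__":
    lexicon = {"只", "恐", "恐龍", "龍", "頭", "龍頭", "龍頭蛇尾", "蛇", "尾"}
    print(greedy_ltr("只恐龍頭蛇尾", lexicon, 4))  # ['只', '恐龍', '頭', '蛇', '尾']
    print(tokenize("只恐龍頭蛇尾", lexicon))       # ['只', '恐', '龍頭蛇尾']
```

With this toy lexicon the two scans reproduce the Blue Cliff Record example: left to right yields 5 tokens, right to left yields 3.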

A second change is that modern named entities have been moved to a separate file that is excluded from the NTI Reader. These modern named entities are occasionally conflated with terms in the Buddhist canon by random inclusion, for example company names and names of modern countries, especially two-character names. Such errors seem obvious and silly to a human reader. The modern named entities are still included in the Humanistic Buddhism Reader for Venerable Master Hsing Yun's collected works and in Chinese Notes for other Chinese literature. It will take some time to fully separate these named entities from the general dictionary.

Other ways of improving the accuracy of the tokenizer are being evaluated.

The vocabulary has been updated with new words, corrections, and references. Please let me know if you see any problems by emailing alex@ntireader.org.

2020-01-10:
Summary of major recent updates:

Vocabulary
Checked and expanded the vocabulary, with continued focus on literary Chinese. Current stats are: 93,821 headwords with 14,920 Buddhist terms.

Usability
Better offline access has been added by writing a version of the dictionary that runs in JavaScript in the browser. Even if you are totally disconnected from the Internet, the dictionary and reader can still be used. This requires the dictionary to have been downloaded to the browser in JavaScript form in the first place. If you are on a low-bandwidth connection and have never had an opportunity to download it, offline use may still not work.

When text is segmented, multi-character terms are also broken down into their parts. In the reader vocabulary dialog box, just click on any word. In dictionary lookup mode, click on the 'split' link. If the text segmentation is not what you expect, check the individual parts. This requires the dictionary to be downloaded to the browser in JavaScript form, as for offline access. There may be some performance issues on long pages. Please email alex@ntireader.org if you notice problems.

English translations for multi-word expressions are shown under a 'Contained In' header in the headword pages. This makes it easier to scan example uses for different word senses.
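One simple way to produce a 'Contained In' list is a substring scan over the dictionary, as sketched below. The `contained_in` name and the plain dict of English glosses are hypothetical. For a dictionary with over 100,000 entries, precomputing these lists when pages are generated would likely be cheaper than scanning on every lookup.

```python
from typing import Dict, List, Tuple

def contained_in(headword: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """List longer entries that contain headword, paired with their English."""
    return sorted(
        (term, english)
        for term, english in dictionary.items()
        if headword in term and term != headword
    )

if __name__ == "__main__":
    dictionary = {
        "龍": "dragon",
        "龍頭": "dragon's head",
        "龍頭蛇尾": "a dragon's head but a snake's tail",
        "蛇": "snake",
    }
    for term, english in contained_in("龍", dictionary):
        print(term, english)
```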

Research
The JavaScript component is available at
https://www.npmjs.com/package/@alexamies/chinesedict-js

A publication describing the research on word networks with co-author Karen Deng is here:
Amies, A. and Deng, Y. 2019, “Identifying Keywords in the Buddhist Canon,” in 2019 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC), IEEE, https://ieeexplore.ieee.org/document/8939631, free download at academia.edu.

2020-01-04: Added 'Contained In' with translations to headword pages.

2019-12-25: Added term splitting. Updated vocabulary.

2019-08-10:

Added new page Search for Buddhist Terminology supporting searching based on any part of a term.
Vocabulary update

2019-07-01: Major update:

Migrated user interface from Material Lite to Material Design Web components
Added HTTPS. Now it is better to use https://ntireader.org
Vocabulary update

2019-06-22: Vocabulary update

2019-06-09: Fixed a problem with the word frequency analysis for the corpus: analysis/corpus_analysis.html. The summary word frequency analysis had been broken for some time and is now restored. Updated the vocabulary for the embedded Chinese-English dictionary. New numbers are:
Headwords: 90,300
Buddhist word senses: 14,503

2019-03-11: Created the group ntireader-announce, a low-volume group for announcements.

2018-09-03: Added highlighted snippets to full text search. Renamed 'Advanced Search' to 'Full Text Search.'

2018-08-11: Added a new feature to search text in document bodies within collections.

2018-08-05: Added a new feature to search text in document bodies, labelled as “Advanced Search.”

2018/7/15 Updated to a newer HTML interface that allows users to view basic word information within documents via a dialog box in addition to mouse over and clicking a hyperlink to go to a new page. Added new functionality to search by title.

2018/2/13 Recent concentration has been on further increasing vocabulary and improving references for individual terms. The dictionary now contains over 81,000 headwords, including over 12,000 Buddhist terms.

2017/12/25 Recent concentration has been on increasing vocabulary and improving references for individual terms. The dictionary now contains over 69,000 headwords, including over 12,000 Buddhist terms.

2017/11/11 Recorded a presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.

2017/8/20 Recorded a second presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.

2017/8/19 Recorded a presentation on Basic Use of the NTI Reader - includes an introduction, basic use of the Chinese-English dictionary, and navigation of the Chinese Buddhist canon.

2017/5/5 Translation Memory: see Using the NTI Reader as a Translation Aid. Adds experimental support for a translation aid with a small number of 2-3 word translation units.

2017/3/6 Updated the page Translation of Chinese Buddhist Texts with the NTI Reader to be more current, based on recent experience with translators and some study of the subject.

2017/2/11 Volumes 1-55 of the Taishō Shinshū Daizōkyō version of the Chinese Buddhist Canon have been added. See Taishō shinshū daizōkyō.

2016/5/22: Overhaul including:

Word detail pages are now arranged around a headword with word senses enumerated underneath. Occurrences of the words within larger words or phrases are listed. Frequent collocations and concordances ('Usage' section) of the words within the corpus are given. The word detail pages are now HTML with no PHP dependency, for better performance.
More texts have been added to the corpus, including volumes 1-2 and 5-8 of the Taishō.
More word entries are now referenced. There is a new page for the Abbreviations used in the word detail page and an expanded list of References.
New corpus management system with no PHP dependency, also for better performance.
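Concordances and collocations of the kind listed in the 'Usage' section can be illustrated with a keyword-in-context (KWIC) function and a count of neighboring tokens. This is a minimal sketch assuming a plain string corpus for the concordance and a pre-tokenized list for the collocations; the function names and the fixed character window are illustrative assumptions, not how the corpus management system is built.

```python
from collections import Counter
from typing import List

def concordance(corpus: str, term: str, width: int = 3) -> List[str]:
    """Keyword-in-context lines with up to `width` characters on each side."""
    lines = []
    start = corpus.find(term)
    while start != -1:
        left = corpus[max(0, start - width):start]
        end = start + len(term)
        right = corpus[end:end + width]
        lines.append(f"{left}[{term}]{right}")
        start = corpus.find(term, start + 1)  # look for the next occurrence
    return lines

def collocations(tokens: List[str], term: str) -> Counter:
    """Count the tokens that appear immediately before or after `term`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        if i > 0:
            counts[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            counts[tokens[i + 1]] += 1
    return counts

if __name__ == "__main__":
    print(concordance("如是我聞一時佛在舍衛國", "佛", width=2))  # ['一時[佛]在舍']
    tokens = ["如是", "我", "聞", "一時", "佛", "在", "舍衛國"]
    print(collocations(tokens, "佛").most_common())
```

Over a whole corpus, `most_common()` on the collocation counts gives the frequent neighbors of a word, while the concordance lines show each occurrence in context.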

2015/12/27: Added forms: Tell us about a problem. Tell us about your experience on this web site using this form. Add a new word or suggest a change to a dictionary entry definition with this form.

2015/2/23: Added the record of Yi Jing's travels A Record of the Buddhist Religion: As Practised in India and the Malay Archipelago 南海寄歸內法傳.

2015/2/8: NTI Reader used to help analyze and translate historic Chinese text for the Electronic Cultural Atlas Initiative Atlas of Maritime Buddhism.

2015/1/31: Update to sister site Chinese Notes to have a more modern look and be more compatible with the NTI Reader site.

2015/1/31: Update to word frequencies, incorporating over 10,000 words from the text of Record of Buddhistic Kingdoms 佛國記.

2014/12/31: Added a Chinese-English bilingual version of Record of Buddhistic Kingdoms 佛國記.

2014/12/31: Reorganized the Diamond Sūtra 金剛般若波羅蜜經 and related commentaries.

2014/12/24: Reorganized the Amitābha Sūtra 佛說阿彌陀經.

2014/12/24: Created a new text entry for the Sumati Sūtra Chinese-English Text 妙慧童女經.

2014/12/14: Split the NTI Buddhist Text Reader (this site) off from chinesenotes.com.