What's New

A log of recent changes to this site.

2020-02-15:
A new text tokenizer has been introduced that scans both left to right and right to left, then compares the two results. The scan that produces the fewest terms is selected. There is rarely a difference, but sometimes the difference can be important. For example, for the phrase from the Blue Cliff Record Koan 10:
只恐龍頭蛇尾
I’m afraid he has a dragon’s head but a snake’s tail (tr. Cleary 1998, p. 63).
Left to right tokenization would lead to
只、恐龍、頭、蛇、尾
Right to left tokenization results in
只、恐、龍頭蛇尾
Clearly, the right to left tokenization is better in this case, and it is selected because it has fewer tokens: 3 for right to left compared to 5 for left to right. The left to right scan misses the idiom 龍頭蛇尾 dragon’s head but a snake’s tail because the greedy tokenization method is not always globally optimal: it matches 恐龍 first, which consumes the 龍 that the idiom needs.
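The comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used on this site: the function names, the tiny sample dictionary, the maximum term length of 4, the fallback to single characters, and the choice of the left to right result on a tie are all hypothetical.

```python
# Hypothetical sample dictionary; the real dictionary is far larger.
SAMPLE_DICT = {"恐龍", "龍頭蛇尾"}


def tokenize_forward(text, dictionary, max_len=4):
    """Greedy longest match, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def tokenize_backward(text, dictionary, max_len=4):
    """Greedy longest match, scanning right to left."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def tokenize(text, dictionary, max_len=4):
    """Run both scans and keep the one with fewer tokens.

    Assumption: on a tie the left to right result is kept.
    """
    forward = tokenize_forward(text, dictionary, max_len)
    backward = tokenize_backward(text, dictionary, max_len)
    return backward if len(backward) < len(forward) else forward


if __name__ == "__main__":
    print("、".join(tokenize("只恐龍頭蛇尾", SAMPLE_DICT)))
```

With this sample dictionary the left to right scan gives 5 tokens and the right to left scan gives 3, so the right to left result is returned, matching the example above.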

A second change separates modern named entities into their own file, which is excluded from the NTI Reader. These modern named entities, for example company names and names of modern countries, especially two character names, are occasionally conflated with terms in the Buddhist canon by coincidence. Such matches seem obvious and silly to a human reader. The modern named entities are still included in the Humanistic Buddhism Reader for Venerable Master Hsing Yun's collected works and in Chinese Notes for other Chinese literature. It will take some time to fully separate out these named entities from the general dictionary.

Other ways of improving the accuracy of the tokenizer are being evaluated.

The vocabulary has been updated with new words, corrections, and references. Please let me know if you see any problems by emailing alex@ntireader.org.

2020-01-10:
Summary of major recent updates:

Vocabulary
Checking and expanding of the vocabulary with continued focus on literary Chinese. Current stats are: 93,821 headwords with 14,920 Buddhist terms.

Usability
Better offline access has been added by writing a version of the dictionary that runs in JavaScript in the browser. Even if you are totally disconnected from the Internet, the dictionary and reader can still be used. This requires that the dictionary has already been downloaded to the browser in JavaScript form. If you are on a low bandwidth connection and have never had an opportunity to do that, then it still may not work offline.

When text is segmented, the multi-character terms identified are also broken down into their parts. In the reader vocabulary dialog box, just click on any word. In dictionary lookup mode, click on the 'split' link. If the text segmentation is not what you expect, then check the individual parts. This requires the dictionary to be downloaded to the browser in JavaScript form, as for offline access. There may be some performance issues on long pages. Please email alex@ntireader.org if you notice problems.

English translations for multi-word expressions are shown under a 'Contained In' header in the headword pages. This makes it easier to scan example uses for different word senses.

Research
The JavaScript component is available at
https://www.npmjs.com/package/@alexamies/chinesedict-js

A publication describing the research on word networks with co-author Karen Deng is here:
Amies, A. and Deng, Y. 2019, “Identifying Keywords in the Buddhist Canon,” in 2019 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC), IEEE, https://ieeexplore.ieee.org/document/8939631, free download at academia.edu.

2020-01-04: Added a 'Contained In' section with translations to headword pages.

2019-12-25: Added term splitting. Updated vocabulary.

2019-08-10:

  1. Added new page Search for Buddhist Terminology supporting searching based on any part of a term.
  2. Vocabulary update

2019-07-01: Major update:

  1. Migrated user interface from Material Lite to Material Design Web components
  2. Added HTTPS. Now it is better to use https://ntireader.org
  3. Vocabulary update

2019-06-22: Vocabulary update

2019-06-09: Fixed a problem with the word frequency analysis for the corpus: analysis/corpus_analysis.html. The summary word frequency analysis was broken for some time and is now restored. Updated the vocabulary for the embedded Chinese-English dictionary. New numbers are:
Headwords: 90,300
Buddhist word senses: 14,503

2019-03-11: Created the group ntireader-announce, a low volume group for announcements.

2018-09-03: Added highlighted snippets to full text search. Renamed 'Advanced Search' to 'Full Text Search.'

2018-08-11: Added a new feature to search text in document bodies, within collections.

2018-08-05: Added a new feature to search text in document bodies, labelled as “Advanced Search.”

2018/7/15 Updated to a newer HTML interface that allows users to view basic word information within documents via a dialog box in addition to mouse over and clicking a hyperlink to go to a new page. Added new functionality to search by title.

2018/2/13 Recent concentration has been on further increasing vocabulary and improving references for individual terms. The dictionary now contains over 81,000 headwords, including over 12,000 Buddhist terms.

2017/12/25 Recent concentration has been on increasing vocabulary and improving references for individual terms. The dictionary now contains over 69,000 headwords, including over 12,000 Buddhist terms.

2017/11/11 Recorded a presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.

2017/8/20 Recorded a second presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.

2017/8/19 Recorded a presentation on Basic Use of the NTI Reader - includes an introduction, basic use of the Chinese-English dictionary, and navigation of the Chinese Buddhist canon.

2017/5/5 Translation Memory: see Using the NTI Reader as a Translation Aid. Adds experimental support for a translation aid with a small number of 2-3 word translation units.

2017/3/6 Updated the page Translation of Chinese Buddhist Texts with the NTI Reader to be more current, based on recent experiences with translators and some study on the subject.

2017/2/11 Volumes 1-55 of the Taishō Shinshū Daizōkyō version of the Chinese Buddhist Canon have been added. See Taishō shinshū daizōkyō.

2016/5/22: Overhaul including:

  • Word detail pages are now arranged around a headword with word senses enumerated underneath. Occurrences of the words within larger words or phrases are listed. Frequent collocations and concordances ('Usage' section) of the words within the corpus are given. The word detail pages are now HTML with no PHP dependency, for better performance.
  • More texts have been added to the corpus, including volumes 1-2 and 5-8 of the Taishō.
  • More word entries are now referenced. There is a new page for the Abbreviations used in the word detail page and an expanded list of References.
  • New corpus management system with no PHP dependency, also for better performance.

2015/12/27: Added forms: Tell us about a problem. Tell us about your experience on this web site using this form. Add a new word or suggest a change to a dictionary entry definition with this form.

2015/2/23: Added the record of Yi Jing's travels A Record of the Buddhist Religion: As Practised in India and the Malay Archipelago 南海寄歸內法傳.

2015/2/8: NTI Reader used to help analyze and translate historic Chinese text for the Electronic Cultural Atlas Initiative Atlas of Maritime Buddhism.

2015/1/31: Update to sister site Chinese Notes to have a more modern look and be more compatible with the NTI Reader site.

2015/1/31: Update to word frequencies, incorporating over 10,000 words from the text of Record of Buddhistic Kingdoms 佛國記.

2014/12/31: Added a Chinese-English bilingual version of Record of Buddhistic Kingdoms 佛國記.

2014/12/31: Reorganized the Diamond Sūtra 金剛般若波羅蜜經 and related commentaries.

2014/12/24: Reorganized the Amitābha Sūtra 佛說阿彌陀經.

2014/12/24: Created a new text entry for the Sumati Sūtra Chinese-English Text 妙慧童女經.

2014/12/14: Split the NTI Buddhist Text Reader (this site) off from chinesenotes.com.