An update to recent changes on this site.
Moved the compute platform to Cloud Run. This will provide a more reliable backend to power the web site.
Added new Translation Memory feature to enable discovery of phrases and names that are close but not exact matches.
Created new file buddhist_named_entities.txt for Buddhist named entities (people, organizations, temples, etc.) because there could potentially be many thousands of these that are not included in dictionaries. Keeping them in a separate list allows them to be managed separately. For the moment it has few entries and mainly serves as a placeholder.
A large update to vocabulary, now with a total of over 140,000 entries. Most of the new vocabulary is from CC-CEDICT with some cross checking in various sources. Modern named entities (people, products, organizations, etc) are steadily being excluded so that they are not confused in Buddhist texts.
A large update to vocabulary, now with a total of over 110,000 entries.
A new text tokenizer has been introduced that scans both left to right and right to left, then compares the two. The scan that yields fewer terms is selected. There is rarely a difference, but sometimes the difference can be important. For example, for the phrase from the Blue Cliff Record Koan 10:
I’m afraid he has a dragon’s head but a snake’s tail (tr. Cleary 1998, p. 63).
Left to right tokenization would lead to
Right to left tokenization results in
Clearly, the right to left tokenization is better in this case and is selected because there are fewer tokens: 3 for right to left compared to 5 for left to right. The left to right scanning method misses the idiom 龍頭蛇尾 dragon’s head but a snake’s tail. The idiom is skipped over because the greedy tokenization method used is not always globally optimal.
A second change was to move modern named entities into a separate file that is excluded from the NTI Reader. These modern named entities are occasionally conflated with terms in the Buddhist canon when their characters happen to occur together in a text. Examples are company names and names of modern countries, especially two character names. Such matches seem obviously wrong and silly to a human reader. The modern named entities are still included in the Humanistic Buddhism Reader for Venerable Master Hsing Yun's collected works and in Chinese Notes for other Chinese literature. It will take some time to fully separate out these named entities from the general dictionary.
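The comparison of the two scans can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NTI Reader's actual code: the function names, the toy dictionary, the sample input string, and the rule of preferring the left to right result on a tie are all hypothetical. Each scan is a greedy longest-match pass over a set of dictionary words, and unknown characters fall back to single-character tokens.

```python
# Toy dictionary, assumed for illustration only (not the NTI Reader's data).
TOY_DICTIONARY = {"只", "恐", "恐龍", "龍", "頭", "蛇", "尾", "龍頭", "龍頭蛇尾"}


def tokenize_ltr(text, dictionary, max_len):
    """Greedy longest match, scanning from the start of the text."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size == 1 or chunk in dictionary:
                tokens.append(chunk)
                i += size
                break
    return tokens


def tokenize_rtl(text, dictionary, max_len):
    """Greedy longest match, scanning from the end of the text."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            chunk = text[j - size:j]
            if size == 1 or chunk in dictionary:
                tokens.append(chunk)
                j -= size
                break
    tokens.reverse()  # tokens were collected back to front
    return tokens


def tokenize(text, dictionary):
    """Run both scans and keep the one with fewer tokens.

    Ties go to the left to right scan; that tie rule is an assumption.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    ltr = tokenize_ltr(text, dictionary, max_len)
    rtl = tokenize_rtl(text, dictionary, max_len)
    return rtl if len(rtl) < len(ltr) else ltr


# Sample input, assumed for illustration; it contains the idiom 龍頭蛇尾.
SAMPLE = "只恐龍頭蛇尾"
print(tokenize_ltr(SAMPLE, TOY_DICTIONARY, 4))
print(tokenize_rtl(SAMPLE, TOY_DICTIONARY, 4))
print(tokenize(SAMPLE, TOY_DICTIONARY))
```

With this toy dictionary the left to right scan greedily takes 恐龍 and can then no longer match the four character idiom, while the right to left scan finds the idiom first. Counting tokens is a cheap heuristic for choosing between the two greedy results; it does not guarantee a globally optimal segmentation.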
Other ways of improving the accuracy of the tokenizer are being evaluated.
The vocabulary has been updated with new words, corrections, and references. Please let me know if you see any problems by emailing email@example.com.
Summary of major recent updates: Vocabulary
Checking and expanding the vocabulary with continued focus on literary Chinese. Current stats are: 93,821 headwords with 14,920 Buddhist terms.
English translations for multi-word expressions are shown under a 'Contained In' header in the headword pages. This makes it easier to scan example uses for different word senses.
A publication describing the research on word networks with co-author Karen Deng is here:
Amies, A. and Deng, Y. 2019, “Identifying Keywords in the Buddhist Canon,” in 2019 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC), IEEE, https://ieeexplore.ieee.org/document/8939631, free download at academia.edu.
2020-01-04: Added contained in with translation to headword pages.
2019-12-25: Added term splitting. Updated vocabulary.
- Added new page Search for Buddhist Terminology supporting searching based on any part of a term.
- Vocabulary update
2019-07-01: Major update:
- Migrated user interface from Material Lite to Material Design Web components
- Added HTTPS. Now it is better to use https://ntireader.org
- Vocabulary update
2019-06-22: Vocabulary update
Fixed a problem with the word frequency analysis for the corpus:
The summary word frequency analysis was broken for some time and is now
restored. Updated the vocabulary for the embedded
Chinese-English dictionary. New numbers are:
Buddhist word senses: 14,503
2019-03-11: Created the group ntireader-announce, a low volume group for announcements.
2018-09-03: Added highlighted snippets to full text search. Renamed 'Advanced Search' to 'Full Text Search.'
2018-08-11: Added a new feature to search text in document bodies, within collections.
2018-08-05: Added a new feature to search text in document bodies, labelled as “Advanced Search.”
2018/7/15 Updated to a newer HTML interface that allows users to view basic word information within documents via a dialog box in addition to mouse over and clicking a hyperlink to go to a new page. Added new functionality to search by title.
2018/2/13 Recent concentration has been on further increasing vocabulary and improving references for individual terms. The dictionary now contains over 81,000 headwords, including over 12,000 Buddhist terms.
2017/12/25 Recent concentration has been on increasing vocabulary and improving references for individual terms. The dictionary now contains over 69,000 headwords, including over 12,000 Buddhist terms.
2017/11/11 Recorded a presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.
2017/8/20 Recorded a second presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.
2017/8/19 Recorded a presentation on Basic Use of the NTI Reader - includes an introduction, basic use of the Chinese-English dictionary, and navigation of the Chinese Buddhist canon.
2017/5/5 Translation Memory: see Using the NTI Reader as a Translation Aid. Adds experimental support for a translation aid with a small number of 2-3 word translation units.
2017/3/6 Updated the page Translation of Chinese Buddhist Texts with the NTI Reader to be more current, based on recent experiences with translators and some study on the subject.
2017/2/11 Volumes 1-55 of the Taishō Shinshū Daizōkyō version of the Chinese Buddhist Canon have been added. See Taishō shinshū daizōkyō.
2016/5/22: Overhaul including:
- Word detail pages are now arranged around a headword with word senses enumerated underneath. Occurrences of the words within larger words or phrases are listed. Frequent collocations and concordances ('Usage' section) of the words within the corpus are given. The word detail pages are now HTML with no PHP dependency, for better performance.
- More texts have been added to the corpus, including volumes 1-2 and 5-8 of the Taishō.
- More word entries are now referenced. There is a new page for the Abbreviations used in the word detail page and an expanded list of References.
- New corpus management system with no PHP dependency, also for better performance.
2015/2/23: Added the record of Yi Jing's travels A Record of the Buddhist Religion: As Practised in India and the Malay Archipelago 南海寄歸內法傳.
2015/2/8: NTI Reader used to help analyze and translate historic Chinese text for the Electronic Cultural Atlas Initiative Atlas of Maritime Buddhism.
2015/1/31: Update to sister site Chinese Notes to have a more modern look and be more compatible with the NTI Reader site.
2015/1/31: Update to word frequencies, incorporating over 10,000 words from the text of Record of Buddhistic Kingdoms 佛國記.
2014/12/31: Added a Chinese-English bilingual version of Record of Buddhistic Kingdoms 佛國記.
2014/12/31: Reorganized the Diamond Sūtra 金剛般若波羅蜜經 and related commentaries.
2014/12/24: Reorganized the Amitābha Sūtra 佛說阿彌陀經.
2014/12/24: Created a new text entry for the Sumati Sūtra Chinese-English Text 妙慧童女經.
2014/12/14: Split the NTI Buddhist Text Reader (this site) off from chinesenotes.com.