What’s New
An update on recent changes to this site.
2024-01-26:
New page on Tibetan Language Resources
to help with working between Chinese, Tibetan, and English.
2022-06-02:
Improved reverse index for looking up Chinese words from English,
pinyin, and Sanskrit.
2022-05-14:
Improved performance by compressing the larger and most commonly used files
and caching them at edge locations closer to users' geographic
locations using a CDN. This reduces page load times.
2021-12-12:
Added bibliographic notes and data for texts in the Esoteric Section
of the Taisho Canon. Added an optimization for mobile users that avoids
downloading the dictionary file to the browser client, to minimize
bandwidth charges and improve performance.
2021-05-21:
The site now uses a new icon with stylized characters 南天 Nan Tien,
in honor of Nan Tien Institute,
associated with Fo Guang Shan and located under the southern heaven
in Wollongong, Australia.
The NTI Reader Buddhist Dictionary Chrome Extension is now available at the Chrome Store. The extension adds a context menu when text on a web page is selected so that a term can be used to look up an entry in the Chinese dictionary. You can look up terms with simplified or traditional Chinese or pinyin, or reverse lookup with an English equivalent. See the demo on YouTube and extension support page for more details.
The Mahāvyutpatti Sanskrit-Tibetan-Chinese Buddhist Dictionary Chrome Extension is also now available at the Chrome Store. Mahāvyutpatti is a historic dictionary compiled for translation of Buddhist texts to Tibetan. See the demo on YouTube and extension support page for more details.
2021-04-10:
Published an experimental page for Multilingual Lookup.
2021-03-06:
Published Tour of the Chinese Buddhist Canon, a demo of using the
NTI Reader and related digital resources to explore the Chinese
Buddhist Canon.
2021-01-31:
New guide published on YouTube with a demo of new features:
Working with Sanskrit from Chinese with the NTI Reader. A new
feature is red highlighting in the reader of Chinese terms with
Sanskrit equivalents. This is designed to help with alignment of
Chinese and Sanskrit texts. More details are in the online help for
Working with Sanskrit.
2021-01-28:
New guide published on YouTube with a demo of new features:
Translating from Chinese with the NTI Reader (January 2021
edition). These new features include:
- Quotation database: Find quotations in the reader and align them with published English translations.
- Organization of the dictionary: named modern entities (people, places, companies, etc.) are now excluded from the NTI Reader to avoid confusion. Visit hbreader.org or chinesenotes.com if you need these.
- Improved word definitions and coverage.
- Added new page Search for Buddhist Terminology supporting searching based on any part of a term.
- Vocabulary update
- Migrated user interface from Material Lite to Material Design Web components
- Added HTTPS. Now it is better to use https://ntireader.org
- Vocabulary update
- Word detail pages are now arranged around a headword with word senses enumerated underneath. Occurrences of the words within larger words or phrases are listed. Frequent collocations and concordances ('Usage' section) of the words within the corpus are given. The word detail pages are now HTML with no PHP dependency, for better performance.
- More texts have been added to the corpus, including volumes 1-2 and 5-8 of the Taishō.
- More word entries are now referenced. There is a new page for the Abbreviations used in the word detail page and an expanded list of References.
- New corpus management system with no PHP dependency, also for better performance.
2020-09-05:
Moved the compute platform to Cloud Run. This will provide a more reliable backend to power
the web site.
2020-08-04:
Added new Translation Memory
feature to enable discovery of phrases and names that are close but not
exact matches.
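As a rough illustration of what "close but not exact" matching can look like, the sketch below ranks stored phrases by string similarity to a query. It is not the site's actual Translation Memory implementation: the function name close_phrases, the sample phrases, and the 0.6 threshold are all assumptions made for this example, and the similarity measure is simply Python's standard difflib ratio.

```python
from difflib import SequenceMatcher


def close_phrases(query, phrases, threshold=0.6, limit=5):
    """Return (phrase, score) pairs similar to query, best match first.

    Hypothetical helper for illustration only. The score is difflib's
    similarity ratio, which ranges from 0.0 to 1.0.
    """
    scored = []
    for phrase in phrases:
        score = SequenceMatcher(None, query, phrase).ratio()
        if score >= threshold:
            scored.append((phrase, score))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Sample data chosen for the example, not taken from the site's database
SAMPLE_PHRASES = ["觀世音菩薩", "觀自在菩薩", "阿彌陀佛"]

for phrase, score in close_phrases("觀音菩薩", SAMPLE_PHRASES):
    print(phrase, round(score, 2))
```

With this sample data the query 觀音菩薩 is not in the list, yet the two related names are returned and the unrelated one is filtered out by the threshold.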
2020-05-31:
Created new file buddhist_named_entities.txt for Buddhist
named entities (people, organizations, temples, etc.) because
there could potentially be many thousands of these that are
not included in dictionaries. Keeping them in a separate list
allows them to be managed separately. For the moment it has
few entries and mainly serves as a placeholder.
2020-05-29:
A large update to vocabulary, now with a total of over 140,000 entries.
Most of the new vocabulary is from CC-CEDICT with some cross checking
in various sources.
Modern named entities (people, products, organizations, etc.) are
steadily being excluded so that they are not confused with terms in
Buddhist texts.
2020-04-10:
A large update to vocabulary, now with a total of over 110,000 entries.
2020-02-15:
A new text tokenizer has been introduced that scans both left to right
and right to left then compares the two. The scan with the least number
of terms will be selected. There is rarely a difference but sometimes
the difference can be important. For example, for the phrase from the
Blue Cliff Record Koan 10:
只恐龍頭蛇尾
I’m afraid he has a dragon’s head but a snake’s tail
(tr. Cleary 1998, p. 63).
Left to right tokenization would lead to
只、恐龍、頭、蛇、尾
Right to left tokenization results in
只、恐、龍頭蛇尾
Clearly, the right to left tokenization is better in this case and is
selected because there are fewer tokens: 3 for right to left compared
to 5 for left to right. The left to right scanning method misses the
idiom 龍頭蛇尾 dragon’s head but a snake’s tail, which is skipped over
because the greedy tokenization method used is not always globally
optimal.
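The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tokenizer used by the site: the function names and the tiny vocabulary are invented for the example, characters not found in the vocabulary are treated as single-character tokens, and a tie in token count falls back to the left to right result, which the description above does not specify.

```python
def forward_tokens(text, vocab, max_len):
    """Greedy longest match, scanning left to right."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character if nothing longer matches
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_tokens(text, vocab, max_len):
    """Greedy longest match, scanning right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # Tokens were collected from the end of the text
    return tokens


def tokenize(text, vocab):
    """Run both scans and keep the one with fewer tokens."""
    max_len = max((len(word) for word in vocab), default=1)
    ltr = forward_tokens(text, vocab, max_len)
    rtl = backward_tokens(text, vocab, max_len)
    # Assumption: on a tie, keep the left to right result
    return rtl if len(rtl) < len(ltr) else ltr


# Toy vocabulary for the example; the real dictionary is far larger
VOCAB = {"恐龍", "龍頭蛇尾"}
PHRASE = "只恐龍頭蛇尾"

print(forward_tokens(PHRASE, VOCAB, 4))   # ['只', '恐龍', '頭', '蛇', '尾']
print(backward_tokens(PHRASE, VOCAB, 4))  # ['只', '恐', '龍頭蛇尾']
print(tokenize(PHRASE, VOCAB))            # ['只', '恐', '龍頭蛇尾']
```

Each scan is greedy on its own, so neither is guaranteed to find the best segmentation. Running both and comparing token counts is an inexpensive way to catch cases like this one.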
A second change was to move modern named entities into a separate file that is excluded from the NTI Reader. These modern named entities are occasionally conflated with terms in the Buddhist canon by random inclusion, for example company names and names of modern countries, especially two-character names. Such matches seem obviously wrong and silly to a human reader. The modern named entities are still included in the Humanistic Buddhism Reader for Venerable Master Hsing Yun's collected works and in Chinese Notes for other Chinese literature. It will take some time to fully separate out these named entities from the general dictionary.
Other ways of improving the accuracy of the tokenizer are being evaluated.
The vocabulary has been updated with new words, corrections, and references. Please let me know if you see any problems by emailing alex@ntireader.org.
2020-01-10:
Summary of major recent updates:
Vocabulary
Checking and expanding of the vocabulary with continued focus on
literary Chinese. Current stats are: 93,821 headwords with 14,920
Buddhist terms
Usability
Better offline access has been added by writing a version of the
dictionary that runs in JavaScript on the browser. So, even if you are
totally disconnected from the Internet the dictionary and reader can
still be used. This requires the dictionary to be downloaded to the
browser in JavaScript form in the first place. If you are on a low
bandwidth connection and never had an opportunity to do that then it
still may not work offline.
When text is segmented, multi-character terms in the identified
segments are also broken down into their parts. In the reader vocabulary
dialog box, just click on any word. In dictionary lookup mode click on
the 'split' link. If the text segmentation is not what you expect then
check the individual parts. This requires the dictionary to be
downloaded to the browser in JavaScript form, as for offline access.
There may be some performance issues on long pages. Please email
alex@ntireader.org if you notice problems.
English translations for multi-word expressions are shown under a
'Contained In' header in the headword pages. This makes it easier to
scan example uses for different word senses.
Research
The JavaScript component is available at
https://www.npmjs.com/package/@alexamies/chinesedict-js
A publication describing the research on word networks with co-author
Karen Deng is here:
Amies, A. and Deng, Y. 2019, “Identifying Keywords in the Buddhist
Canon,” in 2019 Pacific Neighborhood Consortium Annual Conference and
Joint Meetings (PNC), IEEE, https://ieeexplore.ieee.org/document/8939631, free download at
academia.edu.
2020-01-04: Added contained in with translation to headword pages.
2019-12-25: Added term splitting. Updated vocabulary.
2019-08-10:
2019-07-01: Major update:
2019-06-22: Vocabulary update
2019-06-09:
Fixed a problem with the word frequency analysis for the corpus:
analysis/corpus_analysis.html
The summary word frequency analysis was broken for some time and is now
restored. Updated the vocabulary for the embedded
Chinese-English dictionary. New numbers are:
Headwords: 90,300
Buddhist word senses: 14,503
2019-03-11: Created the group ntireader-announce, a low volume group for announcements.
2018-09-03: Added highlighted snippets to full text search. Renamed 'Advanced Search' to 'Full Text Search.'
2018-08-11: Added a new feature to search text in document bodies, within collections.
2018-08-05: Added a new feature to search text in document bodies, labelled as “Advanced Search.”
2018/7/15 Updated to a newer HTML interface that allows users to view basic word information within documents via a dialog box in addition to mouse over and clicking a hyperlink to go to a new page. Added new functionality to search by title.
2018/2/13 Recent concentration has been on further increasing vocabulary and improving references for individual terms. The dictionary now contains over 81,000 headwords, including over 12,000 Buddhist terms.
2017/12/25 Recent concentration has been on increasing vocabulary and improving references for individual terms. The dictionary now contains over 69,000 headwords, including over 12,000 Buddhist terms.
2017/11/11 Recorded a presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.
2017/8/20 Recorded a second presentation on Using the NTI Reader Chinese-English Dictionary - includes more detailed use of the dictionary.
2017/8/19 Recorded a presentation on Basic Use of the NTI Reader - includes an introduction, basic use of the Chinese-English dictionary, and navigation of the Chinese Buddhist canon.
2017/5/5 Translation Memory: see Using the NTI Reader as a Translation Aid - adds experimental support for a translation aid with a small number of 2-3 word translation units.
2017/3/6 Updated the page Translation of Chinese Buddhist Texts with the NTI Reader to be more current based on recent experiences with translators and some study on the subject.
2017/2/11 Volumes 1-55 of the Taishō Shinshū Daizōkyō version of the Chinese Buddhist Canon have been added. See Taishō shinshū daizōkyō.
2016/5/22: Overhaul including:
2015/12/27: Added forms: Tell us about a problem. Tell us about your experience on this web site using this form. Add a new word or suggest a change to a dictionary entry definition with this form.
2015/2/23: Added the record of Yi Jing's travels A Record of the Buddhist Religion: As Practised in India and the Malay Archipelago 南海寄歸內法傳.
2015/2/8: NTI Reader used to help analyze and translate historic Chinese text for the Electronic Cultural Atlas Initiative Atlas of Maritime Buddhism.
2015/1/31: Update to sister site Chinese Notes to have a more modern look and be more compatible with the NTI Reader site.
2015/1/31: Update to word frequencies, incorporating over 10,000 words from the text of Record of Buddhistic Kingdoms 佛國記.
2014/12/31: Added a Chinese-English bilingual version of Record of Buddhistic Kingdoms 佛國記.
2014/12/31: Reorganized the Diamond Sūtra 金剛般若波羅蜜經 and related commentaries.
2014/12/24: Reorganized the Amitābha Sūtra 佛說阿彌陀經.
2014/12/24: Created a new text entry for the Sumati Sūtra Chinese-English Text 妙慧童女經.
2014/12/14: Split the NTI Buddhist Text Reader (this site) off from chinesenotes.com.