NTI Reader

Dictionary Help

There are a number of features of the NTI Reader dictionary intended to help non-native readers of Chinese read Chinese text, including literary Chinese and Buddhist texts. These are explained here, along with some of the more advanced uses. Also see the chinesenotes.com Help page for help with basic features.

Introduction

The NTI Reader dictionary has a number of features intended to improve usability. You can find definitions of modern Chinese, literary Chinese (文言文), and Buddhist words. You do not have to look words up one at a time: you can simply paste an entire block of text into the input field. Even though Chinese text does not have spaces separating the words, you should be able to use the dictionary without knowing where words start or end in a text. In addition, the dictionary will help you find the best word sense.

If you prefer watching how to use the site to reading about it, please see these videos:

Basic Use

For basic use of the dictionary, type or paste the Chinese or English for the word that you are looking for into the input field and click the Search button. You can search with either simplified or traditional Chinese. If you don't get what you are looking for or don't understand the results, keep reading or look at the list of Help topics above.

Each word entry includes a list of word senses. Different word senses may also include different pronunciations and parts of speech (grammar). If you don't see the word sense that you are looking for in the summary page, follow the link to the detailed page for all word senses.

No matches found

If you do not get any matches for your search, try searching the whole block of text that you are studying. This mode will nearly always find a result if the input is one or more Chinese characters because it breaks the string down into individual characters, and the individual characters will usually have entries. See the description in the section Breaking Chinese text into words for more details on this mode.

If the individual characters do not have entries, or if you have a word that you feel should be in the dictionary, please let me know by sending an email to alex@chinesenotes.com.

Breaking Chinese text into words

The dictionary can help you look up individual words or all the words in a block of text. Because Chinese sentences do not have spaces between words, it can help to look up all the words in the text at once. It can often be hard to figure out the boundaries between words, especially if you are a beginner or are trying to make the jump from study to using Chinese in real life. This is where pasting a whole block of text into the dictionary search input field can help.
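As a rough illustration of how a block of text can be split without spaces, here is a minimal sketch of greedy longest matching against a word list. This is an assumption made for illustration, not a description of how the NTI Reader actually segments text. The names DICTIONARY and segment and the tiny word list are hypothetical.

```python
# Hypothetical mini word list; a real dictionary is far larger.
DICTIONARY = {"阿彌陀", "阿彌陀佛", "佛", "說", "經"}


def segment(text, dictionary=DICTIONARY, max_len=4):
    """Split text into tokens by greedy longest match, left to right.

    Any character that does not start a known word becomes a
    one-character token, so every input produces some result.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


print(segment("佛說阿彌陀經"))  # ['佛', '說', '阿彌陀', '經']
```

Falling back to single characters mirrors the behavior described under No matches found: even when no multi-character word is recognized, each character can still be looked up on its own.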

Disambiguating different senses of a Chinese word

Chinese is a rich language with a long history, and this leads to words that have many different meanings. That is, the same character or combination of characters has different meanings in different contexts. There are two different cases of this: (a) homonymy, where the words are truly different, and (b) polysemy, where there are multiple related senses of the word. Chinese is different from languages like English, where more-or-less phonetic spellings are used, because the same character may have multiple different pronunciations. In fact, 'spelling' does not really apply to Chinese. Determining the correct meaning for a word in a particular sentence or other context is called word-sense disambiguation.

The English equivalents in the dictionary are often separated by / characters. It is useful to give more than one English word describing a single Chinese word because the English words themselves can have multiple meanings. In addition, words usually have a range of meanings and can be used in a range of contexts, so giving several English words can convey the range of uses of a Chinese word. A translator will also need to choose the most appropriate word to use in a translation, and it is useful to have several equivalents in the dictionary to choose from.
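If you are working with dictionary entries in a script, the slash-separated equivalents can be split into a list. The following is a minimal sketch; the function name split_equivalents is hypothetical, and it assumes that a / never appears inside a single equivalent.

```python
def split_equivalents(english):
    """Split a slash-delimited gloss into a list of trimmed equivalents."""
    return [part.strip() for part in english.split("/") if part.strip()]


print(split_equivalents("other / another"))  # ['other', 'another']
```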

Let's look at an example: the character 是 (pinyin: shì), one of the most common words in Chinese, which most often means 'is.' There are at least eight different meanings found for this character. The most common word sense in modern Chinese is the verb 'is.' However, the most common sense in literary Chinese is the pronoun 'this.' Think of the word returned in the summary table as the word sense with the most likely best fit.

To disambiguate between modern and literary Chinese, click the use Options link on the dictionary page and select the type radio button. Because there are so many different senses of words in literary Chinese, it can be hard to pick the most appropriate one. The other word senses can be seen by clicking on the word senses link.

When reading an article or a book, you are aware of the sense of words based on the words around them. The NTI Reader dictionary helps you in the same way, although it is only a statistical selection because computers cannot really think like people. If you paste a block of text into the input field in literary Chinese mode (the default), it will select word senses based on both the preceding words and the overall most frequently used words.

Consider another example: the word 他 (tā), which usually means 'he,' the singular third person masculine pronoun in modern Chinese. That is what you find in the dictionary if you do a lookup in modern Chinese mode. However, if you look up the word in literary Chinese mode, you will find that the first definition listed is the pronoun 'other / another.' If you paste in a block of text, you may get a better suggestion as the first entry. Try pasting the text block 哆他伽哆夜 from the dhāranī ending the Amitabha Sutra. You will find that the word sense listed first for 他 is 'ta,' with the note 'used for the sound in mantras.'

The dictionary does all this by choosing the suggested word in the table of returned results through statistical selection, using frequencies compiled from a treebank. A treebank, also known as a tagged corpus, is a body of text tagged for part of speech (noun, verb, adjective, etc.). The NTI Reader dictionary treebank is based on literary Chinese Buddhist texts. It is tagged for both part of speech and word sense. There are two frequency tables: a unigram table for the overall frequency of words and a bigram table that records how common a word is given the word that came before it.
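To make the idea of unigram and bigram tables concrete, here is a minimal sketch that builds both tables from a tiny tagged corpus and uses them to pick a sense for 他. Everything in it is an assumption for illustration: the corpus and its counts are invented toy data, the names TAGGED and best_sense are hypothetical, and the rule of preferring bigram evidence and falling back to unigram counts is one simple way to combine the tables, not necessarily the one the NTI Reader uses.

```python
from collections import Counter

# Invented toy corpus: each token is (word, sense label).
# Only the senses of 他 matter here; other words carry a placeholder.
TAGGED = [
    [("哆", "-"), ("他", "ta (sound in mantras)"), ("伽", "-")],
    [("哆", "-"), ("他", "ta (sound in mantras)")],
    [("利", "-"), ("他", "other / another")],
    [("自", "-"), ("他", "other / another")],
    [("於", "-"), ("他", "other / another")],
]

unigram = Counter()  # (word, sense) -> count
bigram = Counter()   # (previous word, word, sense) -> count
for sentence in TAGGED:
    prev = None
    for word, sense in sentence:
        unigram[(word, sense)] += 1
        if prev is not None:
            bigram[(prev, word, sense)] += 1
        prev = word


def best_sense(word, prev=None):
    """Prefer bigram evidence for (prev, word); else use unigram counts."""
    if prev is not None:
        scores = {s: n for (p, w, s), n in bigram.items()
                  if p == prev and w == word}
        if scores:
            return max(scores, key=scores.get)
    scores = {s: n for (w, s), n in unigram.items() if w == word}
    return max(scores, key=scores.get) if scores else None


print(best_sense("他"))            # other / another
print(best_sense("他", prev="哆"))  # ta (sound in mantras)
```

With this toy data, looking up 他 alone returns the overall most frequent sense, while supplying the preceding word 哆 switches the result to the mantra sound, which matches the behavior described for the text block 哆他伽哆夜.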

See Abbreviations for a list of abbreviations used.

See the References page for a full list of references.

Similar phrases

The website can also help to find similar words and phrases if the term that you are searching for is not in the dictionary. The screenshot below shows the results for looking up a phrase. The phrase is not in the dictionary, but a close match is found and displayed.

Similar Terms Screenshot
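The help text does not say how similar terms are found. As one possible illustration only, the sketch below uses the Python standard library function difflib.get_close_matches to rank dictionary terms by character overlap with the query. The term list TERMS and the function name similar_terms are hypothetical.

```python
import difflib

# Hypothetical list of dictionary headwords.
TERMS = ["阿彌陀佛", "阿彌陀經", "釋迦牟尼佛", "如來"]


def similar_terms(query, terms=TERMS, limit=3, cutoff=0.6):
    """Return up to `limit` terms whose similarity ratio is at least `cutoff`."""
    return difflib.get_close_matches(query, terms, n=limit, cutoff=cutoff)


# The query is not in TERMS, but two headwords share the prefix 阿彌陀.
print(similar_terms("阿彌陀如來"))
```

Raising the cutoff returns fewer, closer matches; lowering it returns more candidates at the cost of weaker matches.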