Metadata Used in the NTI Reader

This page describes the structure and use of the metadata included in the NTI Reader. For a description of how the metadata is created and styled, see the Style Guide.

Metadata is data that describes data. Importantly, metadata enables resource discovery. Two basic kinds of metadata are used in the NTI Reader: corpus metadata and dictionary metadata. The corpus metadata is described on this page. The dictionary metadata is described on the page Development of the NTI Reader Dictionary. The NTI Reader corpus metadata describes the texts in the Chinese Buddhist canon with fields such as title, translator, and Taishō number. Dictionary metadata is data about words.

The corpus metadata is described in the table below.

Table: NTI Reader Corpus Metadata
Taishō number: For example, T 1666 for the Treatise on the Awakening of Faith in the Mahāyāna.
Taishō volume: The volume is not strictly needed, since the number uniquely identifies a text. However, it may help users navigate the canon.
Title: Text titles in the Taishō in traditional Chinese and in English or Sanskrit, where available. The title is given in pinyin if no English or Sanskrit title is available. For example, Treatise on the Awakening of Faith in the Mahāyāna 《大乘起信論》. There may be several versions of the title, which is helpful if the text is known by more than one name. The Taishō number can still be used to identify the text in this case.
Structure: The links to the files and HTML pages containing the content of the text. These are typically labeled 'Scroll 1', 'Scroll 2', etc.
Section names: For example, Āgamas, Jātaka, and Prajñāpāramitā.
Attribution: Translator, author, editor, or other person(s) that the text is attributed to.
Method produced: For example, translated, compiled, composed, or spoken.
Date range: The date or range of dates when the text is believed to have originated or been translated.
Genre: These correspond roughly with the section names but are not exactly the same. For example, the Jātaka section contains avadāna in addition to Jātaka texts. The 九部經 jiǔbùjīng navāṅga-śāsana 'nine kinds of teaching' are also encompassed in genre.
Notes: Freeform notes that may include English translations, if they exist; mappings to other canons, such as the Pali or Korean; and the internal structure of large texts, such as the Āgamas.
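
As a rough illustration of how the fields in the table could be held in a program, the sketch below models one corpus entry as a Python dataclass. The class name, attribute names, and types are assumptions made for this example and are not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CorpusEntry:
    """One text in the corpus. All names here are illustrative only."""
    taisho_number: int                  # e.g. 1666
    taisho_volume: Optional[int]        # not needed for identity; a navigation aid
    titles: List[str]                   # one or more titles for the same text
    scrolls: List[str] = field(default_factory=list)  # links to files or HTML pages
    section: str = ""
    attribution: str = ""
    method_produced: str = ""           # translated, compiled, composed, or spoken
    date_range: str = ""
    genre: str = ""
    notes: str = ""

    def label(self) -> str:
        """Return a label built from the Taishō number, such as 'T 1666'."""
        return f"T {self.taisho_number}"


entry = CorpusEntry(
    taisho_number=1666,
    taisho_volume=None,  # left unset here; the number alone identifies the text
    titles=["Treatise on the Awakening of Faith in the Mahāyāna", "大乘起信論"],
    scrolls=["Scroll 1"],
)
print(entry.label())
```

Keeping the titles in a list reflects the point above that a text may be known by more than one name, while the Taishō number remains the single identifier.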

The metadata and text for the corpus could have been entered by hand, i.e. by typing or by cut-and-paste. However, this would have taken too long, so the project used scripting with Jupyter notebooks and Python. The Jupyter notebooks are an intermediate artifact used only by the creator of the metadata and text files. The notebooks are saved to GitHub as part of the project for version tracking and future reuse.
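
The kind of scripting described above might look something like the following minimal sketch, which writes metadata records to tab-separated text using Python's standard csv module. The column names, the tab-separated layout, and the write_metadata helper are hypothetical; the project's real notebooks and file formats may differ.

```python
import csv
import io

# Hypothetical column names; the project's real file layout may differ.
FIELDS = ["taisho_number", "title_zh", "title_en"]


def write_metadata(rows, out):
    """Write metadata rows (dicts keyed by FIELDS) as tab-separated text."""
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


rows = [
    {
        "taisho_number": "T 1666",
        "title_zh": "大乘起信論",
        "title_en": "Treatise on the Awakening of Faith in the Mahāyāna",
    },
]

# An in-memory buffer stands in for a file opened with
# open(path, "w", encoding="utf-8", newline="").
buffer = io.StringIO()
write_metadata(rows, buffer)
print(buffer.getvalue())
```

Generating the file from a script rather than typing it means the same step can be rerun when the source data changes, which fits the goal of keeping the notebooks for future reuse.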

