NTI Reader

Metadata Used in the NTI Reader

This page describes the structure and use of the metadata included in the NTI Reader. For a description of how the metadata is created and of its style, see the Style Guide.

Metadata is data that describes data. Importantly, metadata enables resource discovery. Two basic kinds of metadata are used in the NTI Reader: corpus metadata and dictionary metadata. Corpus metadata describes the texts in the Chinese Buddhist canon, with attributes such as title, translator, and Taishō number, and is described on this page. Dictionary metadata is data about words; it is described on the page Development of the NTI Reader Dictionary.

The corpus metadata is described in the table below.

Table: NTI Reader Corpus Metadata
Item	Description
Taishō number	The number of the text in the Taishō canon. For example, T 1666 for the Treatise on the Awakening of Faith in the Mahāyāna.
Taishō volume	The volume is not strictly needed, since the number uniquely identifies a text. However, it may help users navigate the canon.
Title	Titles of texts in the Taishō, in traditional Chinese and, where available, English or Sanskrit. If neither an English nor a Sanskrit title is available, the title is given in Chinese pinyin. For example, Treatise on the Awakening of Faith in the Mahāyāna 《大乘起信論》. There may be several versions of a title, which may be helpful if the text is known by more than one name. In this case, the Taishō number can still be used to identify the text.
Structure	Links to the files and HTML pages containing the content of the text. These are typically labeled 'Scroll 1', 'Scroll 2', etc.
Section names	The name of the Taishō section that the text belongs to, for example Āgamas, Jātaka, and Prajñāpāramitā.
Attribution	Translator, author, editor, or other person(s) to whom the text is attributed.
Method produced	For example, translated, compiled, composed, or spoken.
Date range	The range of dates during which the text is believed to have originated or been translated.
Genre	Genres correspond roughly to the section names but are not exactly the same. For example, the Jātaka section contains avadāna texts in addition to Jātaka texts. Genre also encompasses the 九部經 jiǔbùjīng navāṅga-śāsana 'nine kinds of teaching'.
Notes	Free-form notes, which may include English translations, if they exist; mappings to other canons, such as the Pali or Korean; and the internal structure of large texts, such as the Āgamas.
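The fields in the table could be modeled as one record per Taishō text. The following is a minimal Python sketch under that assumption; the class name CorpusEntry, its field names, and the helper functions are hypothetical and are not taken from the NTI Reader source. It shows why the Taishō number, rather than the title, works as the lookup key: a text may carry several title variants, but only one number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CorpusEntry:
    """One text in the corpus. All names here are hypothetical."""
    taisho_number: int                                   # uniquely identifies the text
    titles: list[str]                                    # one or more title variants
    taisho_volume: Optional[int] = None                  # optional navigation aid
    structure: dict[str, str] = field(default_factory=dict)  # e.g. 'Scroll 1' -> file path
    section_names: list[str] = field(default_factory=list)
    attribution: list[str] = field(default_factory=list)
    method_produced: Optional[str] = None                # e.g. 'translated', 'compiled'
    date_range: Optional[tuple[int, int]] = None         # (earliest, latest) year CE
    genre: list[str] = field(default_factory=list)
    notes: str = ""

    def label(self) -> str:
        """Return the conventional short reference, e.g. 'T 1666'."""
        return f"T {self.taisho_number}"


def index_by_number(entries: list[CorpusEntry]) -> dict[int, CorpusEntry]:
    """Index entries by Taishō number, which is unique even when titles are not."""
    return {e.taisho_number: e for e in entries}


def find_by_title(entries: list[CorpusEntry], title: str) -> list[int]:
    """Return the Taishō numbers of all entries listing `title` among their variants."""
    return [e.taisho_number for e in entries if title in e.titles]


# Usage: only the number and titles from the example in the table are filled in.
entry = CorpusEntry(
    taisho_number=1666,
    titles=["Treatise on the Awakening of Faith in the Mahāyāna", "大乘起信論"],
)
index = index_by_number([entry])
print(index[1666].label())                   # T 1666
print(find_by_title([entry], "大乘起信論"))  # [1666]
```

Leaving optional fields such as the volume unset mirrors the table's note that the volume is not strictly needed to identify a text.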

The metadata and text for the corpus could have been entered by hand, that is, by typing or by cutting and pasting. However, this would have taken too long, so the project instead used Python scripts run in Jupyter notebooks. The notebooks are an intermediate artifact, used only by the person who creates the metadata and text files. They are saved to GitHub as part of the project for version tracking and future reuse.
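As an illustration of the kind of scripting that replaces hand entry, the sketch below generates the repetitive 'Scroll n' structure links and writes metadata rows to a tab-separated file. It is a minimal sketch only: the NTI Reader's actual notebooks and file formats are not described here, so the column names, the tab-separated output, and the path pattern taisho/t1666_01.html are all hypothetical.

```python
import csv
import io


def scroll_structure(taisho_number: int, num_scrolls: int) -> list[tuple[str, str]]:
    """Generate ('Scroll n', file path) pairs. The path pattern is hypothetical."""
    return [
        (f"Scroll {n}", f"taisho/t{taisho_number:04d}_{n:02d}.html")
        for n in range(1, num_scrolls + 1)
    ]


def write_metadata(rows: list[dict[str, str]], out) -> None:
    """Write corpus metadata rows as tab-separated values to a file-like object.

    The column names are hypothetical placeholders for fields in the table above.
    """
    fieldnames = ["taisho_number", "taisho_volume", "title"]
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage: write to an in-memory buffer instead of a real file.
buf = io.StringIO()
write_metadata([{"taisho_number": "1666", "taisho_volume": "", "title": "大乘起信論"}], buf)
print(buf.getvalue())
print(scroll_structure(1666, 2))
# [('Scroll 1', 'taisho/t1666_01.html'), ('Scroll 2', 'taisho/t1666_02.html')]
```

Generating links from a scroll count, rather than typing each one, is where scripting saves the most effort for long texts with many scrolls.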