Overview

The Chinese Text Project (CTP) offers a digital library and research tool focused on early Chinese texts, providing access to a corpus of classical Chinese literature, philosophy, and historical documents. Established in 2006, the platform digitizes and organizes texts, along with accompanying dictionaries and translation aids, to support academic study and computational analysis. Its API enables developers and researchers to programmatically query and retrieve data from this extensive collection, which encompasses a wide range of texts from the pre-Qin period through the Qing dynasty.

The CTP API is designed for building specialized research applications, integrating classical Chinese text data into digital humanities projects, running large-scale computational linguistics studies, and developing educational tools. Users can access text content, metadata, and dictionary definitions. For example, researchers can retrieve specific passages for semantic analysis, extract character definitions for lexical studies, or compile data sets for machine learning models focused on ancient Chinese. The API also covers searching across the corpus, retrieving full text, and accessing linguistic annotations. Developers can use it to automate tasks that would otherwise require manual text processing.

The project emphasizes accuracy and comprehensive coverage, which makes it useful to scholars of sinology, history, philosophy, and linguistics. The API’s design supports several data retrieval methods, giving callers granular control over queries and returned data formats. This flexibility matters for interdisciplinary projects that combine traditional textual scholarship with computational methods. The CTP aligns with the broader digital humanities practice of applying computational tools to humanistic inquiry. By providing structured access to unstructured text data, it supports quantitative analysis of historical languages.

The platform also includes translation tools and textual criticism features, which can be integrated into custom applications via the API. This allows developers to build systems that not only retrieve texts but also assist in their interpretation and comparison. The API has a free tier, subject to defined rate limits, for individual researchers and small projects. Institutions or projects that need higher volumes of data or specialized services can arrange custom access. This tiered approach keeps the API accessible to a broad user base while accommodating intensive research needs.

Key features

  • Extensive Text Database: Access to a corpus of classical Chinese texts, including philosophical works, historical records, and literary pieces from various dynastic periods.
  • Dictionary Integration: Programmatic access to dictionary definitions and lexical information for individual characters and compounds within the classical texts.
  • Text Search Capabilities: Advanced search functionality to locate specific phrases, characters, or concepts across the entire text collection.
  • Metadata Retrieval: Access to metadata associated with each text, such as author, date, and textual variants.
  • Translation Tools: Features to aid in the translation of classical Chinese, which can be integrated into custom applications.
  • Text Analysis Tools: Support for extracting linguistic features and performing various forms of computational text analysis.
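
As an illustration of the kind of computational text analysis mentioned above, the following minimal sketch counts character n-grams in paragraphs that have already been retrieved. It runs entirely on local strings and makes no API calls. The function name `ngram_counts` and the punctuation set are assumptions made for this example, not part of the CTP API.

```python
from collections import Counter

# Punctuation commonly found in punctuated editions of classical texts.
# This set is illustrative, not exhaustive.
PUNCTUATION = set("，。、；：？！「」『』《》（）")


def ngram_counts(paragraphs, n=1):
    """Count character n-grams across paragraphs, never spanning punctuation."""
    counts = Counter()
    for paragraph in paragraphs:
        run = []
        # The trailing sentinel flushes the final run of characters.
        for ch in paragraph + "。":
            if ch in PUNCTUATION or ch.isspace():
                counts.update(
                    "".join(run[i:i + n]) for i in range(len(run) - n + 1)
                )
                run = []
            else:
                run.append(ch)
    return counts


# Opening lines of the Analects, used here as local sample data.
sample = ["學而時習之，不亦說乎？", "有朋自遠方來，不亦樂乎？"]
print(ngram_counts(sample, n=1).most_common(3))
print(ngram_counts(sample, n=2)["不亦"])
```

Splitting on punctuation before counting keeps bigrams from joining the last character of one clause to the first character of the next, which would otherwise inflate counts with pairs that never occur as units.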

Pricing

The Chinese Text Project API offers a tiered access model:

Plan | Description | Key Features
Free Tier | Generous usage with rate limits. | Access to the full text database and dictionary; suitable for individual researchers and small projects. Rate limits are applied per IP address and request frequency.
Custom Access | Tailored plans for higher volume or specialized needs. | Increased rate limits, dedicated support, and potential for custom data exports or integrations. Pricing determined on a case-by-case basis.

Pricing as of 2026-05-28. For detailed rate limits and custom plan inquiries, refer to the Chinese Text Project API documentation.
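
Because free-tier limits depend on request frequency, a client can pace its own requests and back off after failures. The sketch below shows one way to do that. The class `Throttle`, the function `call_with_backoff`, and every interval and delay value are assumptions for illustration, since the actual limits are not stated here.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_backoff(func, throttle, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep):
    """Call func() under the throttle, retrying with exponential backoff."""
    for attempt in range(max_attempts):
        throttle.wait()
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage: wrap any request function. A trivial stand-in is used here so the
# example runs without network access.
throttle = Throttle(min_interval=0.01)
print(call_with_backoff(lambda: "ok", throttle))
```

The clock and sleep functions are injectable so the pacing logic can be tested without real delays. In practice `func` would be a closure around an HTTP request.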

Common integrations

  • Digital Humanities Platforms: Integration with platforms like Omeka or Scalar for annotating and presenting classical Chinese texts within digital exhibitions.
  • Computational Linguistics Frameworks: Use with Python libraries such as NLTK or spaCy for advanced text processing, part-of-speech tagging, and semantic analysis of classical Chinese.
  • Academic Research Tools: Embedding CTP data into custom research applications developed in Python, R, or other languages for specialized studies.
  • Educational Software: Development of language learning apps or historical research tools that leverage CTP's dictionaries and text corpus.
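
Most of these integrations start by turning retrieved text into a neutral interchange format. The sketch below writes one JSON object per paragraph (JSON Lines), which Python, R, and most NLP toolkits can read. The record fields (`text_id`, `title`, `paragraph`, `text`) and the function name are assumptions chosen for this example, not a format defined by CTP.

```python
import json


def to_jsonl_records(text_id, title, paragraphs):
    """Yield one JSON line per paragraph, numbered from 1."""
    for index, paragraph in enumerate(paragraphs, start=1):
        record = {
            "text_id": text_id,
            "title": title,
            "paragraph": index,
            "text": paragraph,
        }
        # ensure_ascii=False keeps Chinese characters readable in the output.
        yield json.dumps(record, ensure_ascii=False)


lines = list(
    to_jsonl_records("analects/xue-er", "學而", ["學而時習之，不亦說乎？"])
)
print("\n".join(lines))
```

Numbering paragraphs at export time gives downstream tools a stable reference back to the source passage.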

Alternatives

  • CBETA Chinese Electronic Tripitaka API: Focuses specifically on the Buddhist canon, offering a different specialized corpus.
  • Scripta Sinica: An Academia Sinica project providing access to historical Chinese texts, often with different editions and focuses.
  • Wenlin Institute: Offers character dictionaries and text analysis tools with a different approach to data organization and API access.
  • DPLA (Digital Public Library of America): While broader in scope, some of its collections may include digitized classical Chinese texts, though not with the specialized focus of CTP.

Getting started

To begin using the Chinese Text Project API, you can make HTTP requests to its endpoints. The API reference outlines the structure of these requests for accessing texts and dictionary entries. Below is an example in Python demonstrating how to retrieve a text and a dictionary entry.


import json

import requests

# Base URL of the Chinese Text Project JSON API, which is served from its own
# subdomain. Confirm endpoint and parameter names against the official API reference.
base_url = "https://api.ctext.org/"
REQUEST_TIMEOUT = 10  # seconds; keeps a stalled connection from hanging the script

# --- Example 1: Retrieve a classical Chinese text ---
# Texts are addressed by URN; "ctp:analects/xue-er" is the first chapter
# (Xue Er) of the Analects (Lunyu).
text_urn = "ctp:analects/xue-er"
text_endpoint = f"{base_url}gettext"

try:
    print(f"\nAttempting to retrieve '{text_urn}' from: {text_endpoint}")
    text_response = requests.get(
        text_endpoint, params={"urn": text_urn}, timeout=REQUEST_TIMEOUT
    )
    text_response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
    text_data = text_response.json()
    print("Text Retrieval (partial output):")
    # Assumption to verify in the API reference: a chapter-level response holds
    # its paragraphs in a 'fulltext' list, while a whole-work URN returns a
    # 'subsections' list of child URNs instead.
    paragraphs = text_data.get("fulltext") if isinstance(text_data, dict) else None
    if paragraphs:
        sample_text = paragraphs[0]
        print(sample_text[:200] + "..." if len(sample_text) > 200 else sample_text)
    else:
        print("Could not parse text content from response structure.")
        # Print partial raw JSON for inspection
        print(json.dumps(text_data, indent=2, ensure_ascii=False)[:500] + "...")
except requests.exceptions.HTTPError as err:
    print(f"HTTP error occurred: {err}")
except json.JSONDecodeError:
    # Listed before RequestException: recent versions of requests raise a
    # decode error that subclasses both, so the order decides which handler runs.
    print(f"Failed to decode JSON from text response: {text_response.text[:500]}...")
except requests.exceptions.RequestException as err:
    print(f"Other request error occurred: {err}")


# --- Example 2: Retrieve a dictionary entry for a character ---
# The 'getcharacter' endpoint, its 'char' parameter, and the field names read
# below are assumptions to confirm against the API reference.
character_to_lookup = "仁"  # 'Ren' - benevolence
dictionary_endpoint = f"{base_url}getcharacter"

try:
    print(f"\nAttempting to retrieve dictionary entry for '{character_to_lookup}' from: {dictionary_endpoint}")
    dict_response = requests.get(
        dictionary_endpoint, params={"char": character_to_lookup}, timeout=REQUEST_TIMEOUT
    )
    dict_response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
    dict_data = dict_response.json()
    print(f"Dictionary entry for '{character_to_lookup}':")
    if isinstance(dict_data, dict) and dict_data and "error" not in dict_data:
        # 'readings' and 'mandarinpinyin' are assumed field names; listing the
        # keys actually returned makes any mismatch easy to spot.
        readings = dict_data.get("readings") or {}
        print(f"  Pinyin: {readings.get('mandarinpinyin', 'N/A')}")
        print(f"  Available fields: {', '.join(sorted(dict_data))}")
    else:
        print("No dictionary entry found for this character.")
        # Print partial raw JSON for inspection
        print(json.dumps(dict_data, indent=2, ensure_ascii=False)[:500] + "...")
except requests.exceptions.HTTPError as err:
    print(f"HTTP error occurred: {err}")
except json.JSONDecodeError:
    print(f"Failed to decode JSON from dictionary response: {dict_response.text[:500]}...")
except requests.exceptions.RequestException as err:
    print(f"Other request error occurred: {err}")

Before executing, ensure you have the requests library installed (pip install requests). The specific structure of API responses, especially for text content, can vary. Always consult the Chinese Text Project API reference for the most up-to-date endpoints and response formats.
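
Since response structure can vary, one defensive pattern is to walk a text recursively and collect paragraphs wherever they appear. The sketch below assumes, without confirmation from this page, that a decoded response carries either a `fulltext` list of paragraphs, a `subsections` list of child identifiers, or an `error` object. The `fetch` callable and the `ctp:demo` identifiers are hypothetical stand-ins so that the example runs offline.

```python
def collect_paragraphs(urn, fetch, max_depth=5):
    """Gather paragraphs for urn, descending into subsections if needed.

    fetch(urn) must return the decoded JSON payload for that identifier.
    """
    payload = fetch(urn)
    if "error" in payload:
        raise RuntimeError(f"API error for {urn}: {payload['error']}")
    if "fulltext" in payload:
        return list(payload["fulltext"])
    if max_depth == 0:
        return []  # stop descending rather than recurse without bound
    paragraphs = []
    for child in payload.get("subsections", []):
        paragraphs.extend(collect_paragraphs(child, fetch, max_depth - 1))
    return paragraphs


# Hypothetical canned responses standing in for real API payloads.
FAKE_RESPONSES = {
    "ctp:demo": {"title": "Demo", "subsections": ["ctp:demo/a", "ctp:demo/b"]},
    "ctp:demo/a": {"title": "A", "fulltext": ["a1", "a2"]},
    "ctp:demo/b": {"title": "B", "fulltext": ["b1"]},
    "ctp:demo/bad": {"error": {"code": "EXAMPLE", "description": "demo error"}},
}


def fake_fetch(urn):
    return FAKE_RESPONSES[urn]


print(collect_paragraphs("ctp:demo", fake_fetch))
```

Passing `fetch` as a parameter keeps the traversal separate from the HTTP layer, so a real implementation can add pacing or caching inside `fetch` without changing the walk itself.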