Project Introduction

An overview of the Chinese Vocal Synthesis Archive (Project CVSA)


The Chinese Vocal Synthesis Archive (Project CVSA) is a dedicated platform for collecting, documenting, and preserving information about Chinese singing voice synthesis (SVS).

While several platforms systematically organize data within the Chinese virtual singer community, each serves a distinct niche:

  • Moegirlpedia (萌娘百科): A comprehensive wiki-style encyclopedia (MediaWiki) containing extensive records of Chinese virtual singer songs and voicebanks.
  • VCPedia: Established by former Moegirlpedia editors, this site is a specialized information aggregator in a traditional wiki format, focused exclusively on Chinese SVS content.
  • VocaDB: A global collaborative database for Vocaloid, UTAU, and other synthesizers. While it hosts the vast majority of Chinese SVS works, it focuses primarily on structured metadata (artists, discography, and PVs). [1]
  • TDD (天钿Daily): A data-driven discussion site that periodically crawls and analyzes VC-related statistics to highlight industry trends across several dimensions.

Identifying the Gaps

Despite their strengths, existing platforms face specific limitations:

  • Manual Overhead: Moegirlpedia, VCPedia, and VocaDB rely almost entirely on manual entry and human editing for song inclusion and updates.
  • Content Depth: VocaDB excels at metadata but often lacks descriptive context, such as background stories or detailed producer insights.
  • Scope: TDD focuses strictly on statistical trends and lacks qualitative or descriptive information about the works themselves.

Our Mission: Project CVSA

Project CVSA integrates the strengths of its predecessors while addressing these functional gaps. Our goal is to create a more efficient and descriptive archive by implementing:

  • Fully Automated Discovery: Programmatic identification and creation of new song entries.
  • Automated Metadata Extraction: High-efficiency harvesting of technical song data.
  • Dynamic Statistics: Automated collection and tracking of song performance metrics.
  • Hybrid Collaboration: While leveraging automation for data, we actively encourage community contributors to provide descriptive content and perform quality control.
  • Resource Integration: Under appropriate licensing, we cite and aggregate data from existing reputable sources to ensure a comprehensive knowledge base.

This document is provided under the CC BY-NC-SA 4.0 license.

  1. Excerpted from VocaDB, originally licensed under CC BY 4.0.
