Collection Software, E-Resource OpenSpeaks Data Pages Open Speaks Data Pages
About this Item
Title
- OpenSpeaks Data Pages
Other Title
- Open Speaks Data Pages
Names
- Panigrahi, Subhashish, researcher, filmmaker
Created / Published
- [India] : OpenSpeaks
Headings
- - Odia language--Dialects
- - Endangered languages--India--Odisha
Genre
- Data sets
Notes
- - Website for dataset created 2018.
- - OpenSpeaks is a project to document underrepresented languages. It was founded by Subhashish Panigrahi, a documentary filmmaker and open culture advocate who divides his time between India and Canada. The present dataset on the Odia language includes nearly 70,000 audio recordings of Odia-language words, phrases and sentences in the Waveform Audio File Format (WAVE) as part of the ongoing speech data project OpenSpeaks Voice: Odia. The dataset includes speech recordings of two speakers: nearly 600 recordings by the late Musamoni Panigrahi in the Balesoria/Baleswari (Northern) dialect of Odia, extracted from the archival footage used for the 2022 documentary Nani Ma, and the remainder by Subhashish Panigrahi in both the Balesoria and Mugalbandi (Central) dialects of Odia, made using the web-based open-source tool Lingua Libre. The subfolder for each speaker includes sub-subfolders for words or phrases beginning with, or consisting of, vowels, consonants and numerals. Within the vowel and consonant folders, there is a separate folder for each character, holding recordings of that character or of words beginning with it. Each such folder containing recordings by Subhashish Panigrahi has a .csv metadata file with track-level information (title in Odia in Unicode, transliterated title in ALA-LC Romanization, filename in HEX NCR, and duration in seconds), whereas a single .csv file contains the metadata for all the recordings by Musamoni Panigrahi. A guide book is also included in the dataset, detailing the purpose and process of data collection and drawing on a paper and talk by Subhashish Panigrahi at the Web Conference (WWW) 2022 titled "Building a Public Domain Voice Database for Odia". Three open-source text replacement converters, available online on the Odia Wikipedia, were built and used to handle the filename and title conversions.
This dataset includes data from four periodic releases, OpenSpeaks Voice: Odia Volume I and II, and Balesoria-Odia Volume I and II, issued as DVD-ROMs in February-March 2023. At the time of reporting, this is the largest speech data repository in the Odia language released into the Public Domain, with over 25 hours of recordings.
- - Nearly 66,000 words (approximately 22 hours) in Odia under a CC0 1.0 (Public Domain) License on Wikimedia Commons (August 2022); another 8 hours of recorded sentences under CC0 1.0 on Mozilla Common Voice (March 2022).
- - English and Odia.
- - Title from OpenSpeaks website (viewed April 29, 2024). Title devised by the cataloger.
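The notes above describe per-folder .csv metadata files with a title in Odia, an ALA-LC romanized title, a filename in HEX NCR, and a duration in seconds. The following is a minimal sketch of how such a file might be read, not the project's own tooling. It assumes that HEX NCR filenames use the HTML form `&#xHHHH;`, and the column names (`title`, `romanized`, `filename`, `duration`) and the two sample rows are hypothetical, chosen only for illustration; the actual files may be laid out differently.

```python
import csv
import html
import io


def decode_hex_ncr(name: str) -> str:
    """Convert hexadecimal numeric character references to Unicode text.

    Assumes the HTML form '&#xHHHH;'; html.unescape handles that form.
    """
    return html.unescape(name)


def read_tracks(csv_text: str) -> list[dict]:
    """Parse metadata rows and add a decoded, human-readable filename.

    The column names used here are hypothetical, not taken from the dataset.
    """
    tracks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["decoded_filename"] = decode_hex_ncr(row["filename"])
        row["duration"] = float(row["duration"])
        tracks.append(row)
    return tracks


def total_duration_seconds(tracks: list[dict]) -> float:
    """Sum the per-track durations, in seconds."""
    return sum(track["duration"] for track in tracks)


# Hypothetical sample rows, for illustration only.
SAMPLE_CSV = (
    "title,romanized,filename,duration\n"
    "\u0b13\u0b21\u0b3c\u0b3f\u0b06,O\u1e5bi\u0101,"
    "&#x0B13;&#x0B21;&#x0B3C;&#x0B3F;&#x0B06;.wav,1.2\n"
    "\u0b0f\u0b15,eka,&#x0B0F;&#x0B15;.wav,0.8\n"
)

if __name__ == "__main__":
    sample_tracks = read_tracks(SAMPLE_CSV)
    for track in sample_tracks:
        print(track["decoded_filename"], track["duration"])
    print("total seconds:", total_duration_seconds(sample_tracks))
```

Decoding the filename back to Unicode makes it possible to match each recording to its Odia title without relying on the file system to store non-ASCII names.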
Medium
- Online resource (datasets)
Call Number/Physical Location
- PK2569.4
Repository
- s-Online Electronic Resource
Digital Id
Library of Congress Control Number
- 2024307217
Rights Advisory
- The data in this dataset was captured and curated by Subhashish Panigrahi and made available with his permission for research and scholarly purposes. The Library of Congress does not warrant the accuracy of this data. All downstream uses are the sole responsibility of the user; the data may be subject to copyright, publicity rights, privacy rights, or other legal interests.
- Creative Commons Attribution 4.0 International license.