Software, E-Resource: OpenSpeaks Data Pages (Open Speaks Data Pages)

About this Item

Title

  • OpenSpeaks Data Pages

Other Title

  • Open Speaks Data Pages

Names

  • Panigrahi, Subhashish, researcher, filmmaker

Created / Published

  • [India] : OpenSpeaks

Headings

  • -  Odia language--Dialects
  • -  Endangered languages--India--Odisha

Genre

  • Data sets

Notes

  • -  Website for dataset created 2018.
  • -  OpenSpeaks is a project to document underrepresented languages. It was founded by Subhashish Panigrahi, a documentary filmmaker and open culture advocate who divides his time between India and Canada. The present dataset on the Odia language includes nearly 70,000 audio recordings of Odia-language words, phrases and sentences in the Waveform Audio File Format (WAVE) as part of the ongoing speech data project OpenSpeaks Voice: Odia. The dataset includes speech recordings by two speakers: nearly 600 recordings by the late Musamoni Panigrahi in the Balesoria/Baleswari (Northern) dialect of Odia, extracted from the archival footage used for the 2022 documentary Nani Ma, and the remainder by Subhashish Panigrahi in both the Balesoria and Mugalbandi (Central) dialects of Odia, made using the web-based open-source tool Lingua Libre. The subfolder for each speaker includes sub-subfolders for words or phrases that begin with, or consist of, vowels, consonants and numerals. Within the vowel and consonant folders, there is a separate folder for each character, holding the recordings of that character or of words beginning with it. Each such folder containing recordings by Subhashish Panigrahi has a .csv metadata file with track-level information (title in Odia in Unicode, transliterated title in ALA-LC Romanization, filename in HEX NCR, and duration in seconds), whereas a single .csv file contains the metadata for all the recordings by Musamoni Panigrahi. A guide book is also included in the dataset, detailing the purpose and process of data collection, drawing from a paper and talk by Subhashish Panigrahi at the Web Conference (WWW) 2022 titled "Building a Public Domain Voice Database for Odia". Three open-source text replacement converters, available online on the Odia Wikipedia, were built and used to handle the filename and title conversions.
This dataset includes data from four periodic releases, OpenSpeaks Voice: Odia Volume I and II and Balesoria-Odia Volume I and II, issued as DVD-ROMs in February-March 2023. This is the largest speech data repository in the Odia language under a Public Domain release at the time of reporting, with over 25 hours of recording.
  • -  Nearly 66,000 words (approximately 22 hours) in Odia under a CC0 1.0 (Public Domain) License on Wikimedia Commons (August 2022); another 8 hours of recorded sentences under CC0 1.0 on Mozilla Common Voice (March 2022).
  • -  English and Odia.
  • -  Title from OpenSpeaks website (viewed April 29, 2024). Title devised by the cataloger.
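
The notes above say that the per-folder .csv metadata stores each filename in HEX NCR (hexadecimal numeric character references) alongside the Odia title in Unicode. The following Python sketch illustrates one way such a conversion could work. It is not the project's own converter: the function names are hypothetical, and the exact reference form used in the dataset (assumed here to be `&#x…;` with unpadded uppercase hex digits) is an assumption that should be checked against the actual files.

```python
import html


def to_hex_ncr(text: str) -> str:
    """Encode every character as a hexadecimal numeric character reference.

    Assumption: references look like '&#xB05;' (uppercase hex, no padding).
    """
    return "".join(f"&#x{ord(ch):X};" for ch in text)


def from_hex_ncr(encoded: str) -> str:
    """Decode numeric character references back to Unicode text."""
    # html.unescape resolves decimal and hexadecimal references alike.
    return html.unescape(encoded)


if __name__ == "__main__":
    title = "\u0b05"  # ODIA LETTER A, written as an escape for clarity
    encoded = to_hex_ncr(title)
    print(encoded)
    print(from_hex_ncr(encoded) == title)
```

A round trip through both functions should return the original title, which makes it possible to match a HEX NCR filename back to the Unicode title column of the same metadata row.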

Medium

  • Online resource (datasets)

Call Number/Physical Location

  • PK2569.4

Repository

  • s-Online Electronic Resource

Library of Congress Control Number

  • 2024307217

Rights Advisory

  • The data in this dataset was captured and curated by Subhashish Panigrahi and made available with his permission for research and scholarly purposes. The Library of Congress does not warrant the accuracy of this data. All downstream uses are the sole responsibility of the user; the data may be subject to copyright, publicity rights, privacy rights, or other legal interests.
  • Creative Commons Attribution 4.0 International license.

Rights & Access

The Library of Congress is providing access to The Selected Datasets Collection for educational and research purposes. The Library has obtained permission for the use of many materials in the Collection, and presents additional materials for educational and research purposes in accordance with fair use under United States copyright law. Researchers should watch for modern documents that may be copyrighted (for example, published in the United States less than 95 years ago, or unpublished and the author died less than 70 years ago).

You are responsible for deciding whether your use of the items in this collection is legal. You are also responsible for securing any permissions needed to use the items. You will need written permission from the copyright owners of materials not in the public domain for distribution, reproduction, or other use of protected items beyond that allowed by fair use or other statutory exemptions. Some content may be protected under international law. You may also need permission from holders of other rights, such as publicity and/or privacy rights.

Credit Line: Library of Congress, Digital Collections Management and Services Division

Cite This Item

Citations are generated automatically from bibliographic data as a convenience, and may not be complete or accurate.

Chicago citation style:

Panigrahi, Subhashish, Researcher, Filmmaker. OpenSpeaks Data Pages. [India: OpenSpeaks, 2018] Software, E-Resource. https://www.loc.gov/item/2024307217/.

APA citation style:

Panigrahi, S. (2018) OpenSpeaks Data Pages. [India: OpenSpeaks] [Software, E-Resource] Retrieved from the Library of Congress, https://www.loc.gov/item/2024307217/.

MLA citation style:

Panigrahi, Subhashish, Researcher, Filmmaker. OpenSpeaks Data Pages. [India: OpenSpeaks, 2018] Software, E-Resource. Retrieved from the Library of Congress, <www.loc.gov/item/2024307217/>.