why crowdsourced transcription?
July 23, 2013
I signed up to try out FromThePage as “ultrasaurus” a couple of weeks ago. Its creator, Ben Brumfield (@benwbrum), recognized me from RailsBridge and the NPR story and reached out via Twitter. Ben’s blog, Collaborative Manuscript Transcription, is a wealth of information about crowd-sourced transcription.
Jason Shen and I were able to connect with Ben on the phone last week while he was in Austin, TX at the Social Digital Scholarly Editing conference. Ben kindly gave us an overview of the landscape of crowd-sourced transcription projects and the open source software that is behind a few of them. We also got a glimpse of how he got started in this fascinating corner of next-generation web tech and why he quit his day job to work on crowd-sourced transcription solutions full-time.
FromThePage started as a family history hobby. As a software developer, Ben was able to create a web site, originally based on MediaWiki and later moved to Ruby on Rails, to allow other people to help with transcription. Working with his great-great-grandmother’s diaries, he saw how a bunch of people could really do research on a topic together. He was inspired by Wikipedia, by the idea of getting a community together not just to comment but to actually edit, going beyond the abilities of a single person. With a wiki format, someone who was a good typist could type it all up, another with special knowledge could make corrections, etc.
Wikipedia used to feature “what links here” more prominently; in effect, it is an index. He wanted to use this to link portions of text with the other places where those subjects are mentioned. He very quickly found that MediaWiki was not the right tool: the difference between the text and articles about the text was not clear.
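To make the “what links here” idea concrete, here is a rough sketch of how a reverse index over transcribed pages could work. This is not FromThePage’s code; the `[[Subject]]` link syntax, the `build_backlinks` function, and the sample pages are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumed wiki-style link syntax: [[Subject]] or [[Subject|text as written]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_backlinks(pages):
    """Map each linked subject to the sorted list of page titles that mention it."""
    index = defaultdict(set)
    for title, text in pages.items():
        for match in LINK.finditer(text):
            index[match.group(1).strip()].add(title)
    return {subject: sorted(titles) for subject, titles in index.items()}


# Invented sample transcriptions, not taken from any real diary.
pages = {
    "page 1": "Visited [[Aunt Mary]] and bought [[tobacco]] seed.",
    "page 2": "Planted [[tobacco]] all day.",
    "page 3": "[[Aunt Mary|Mary]] came to dinner.",
}

backlinks = build_backlinks(pages)
print(backlinks["Aunt Mary"])  # every page that mentions this subject
```

Looking up a subject in `backlinks` gives the same kind of answer as “what links here”: every page of the text where that person or thing turns up, regardless of how it was written on the page.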
One of the challenges is deciding how to handle the material. What are the guidelines for transcription, such as how to encode abbreviations, incorrect spellings, etc.? There are very detailed, technical solutions like TEI, an XML format, and less formal text markup. For the last two years, he has been reviewing other systems, posting on his blog, and speaking on the topic. A number of organizations were interested in using FromThePage. Some people pay, some can’t; it’s open source. After 15 months of doing this full-time, he has worked on enhancements to FromThePage, a new transcription tool for structured data, and other sites built with different tech, based on the needs of the content and the community.
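As a small illustration of the two ends of that spectrum, the sketch below encodes an abbreviation and a misspelling first with TEI’s `choice`, `abbr`/`expan`, and `sic`/`corr` elements, and then with an informal bracket convention. The helper names, the bracket style, and the sample words are my own assumptions, not a guideline Ben described or a format any particular tool uses.

```python
import xml.etree.ElementTree as ET


def tei_choice(original_tag, as_written, resolved_tag, resolved):
    """Build a TEI <choice> pairing the text as written with an editorial reading."""
    choice = ET.Element("choice")
    ET.SubElement(choice, original_tag).text = as_written
    ET.SubElement(choice, resolved_tag).text = resolved
    return ET.tostring(choice, encoding="unicode")


def tei_abbreviation(as_written, expanded):
    # <abbr> keeps the abbreviation as written, <expan> holds the expansion.
    return tei_choice("abbr", as_written, "expan", expanded)


def tei_misspelling(as_written, corrected):
    # <sic> keeps the spelling as written, <corr> holds the correction.
    return tei_choice("sic", as_written, "corr", corrected)


def informal(as_written, resolved):
    # Hypothetical plain-text convention: editorial reading in square brackets.
    return f"{as_written} [{resolved}]"


print(tei_abbreviation("wk", "week"))
# <choice><abbr>wk</abbr><expan>week</expan></choice>
print(tei_misspelling("recieved", "received"))
# <choice><sic>recieved</sic><corr>received</corr></choice>
print(informal("wk", "week"))
# wk [week]
```

The TEI version is easier for software to process later, while the bracket version is easier for a volunteer to type, which is roughly the trade-off a transcription project has to settle in its guidelines.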
Ben’s personal mission is to transform the landscape of what amateurs do with their own material. Right now, if you are someone who has a lot of old historic papers or diaries, what you do is sit down and write a book about them, and you probably write a crappy book. Ben would like these folks to provide their materials in a way that other people can use them.