In 2010, author and professor Clay Shirky delivered a rousing TED talk in which he used the phrase ‘cognitive surplus’ to describe the one trillion hours of leisure time humans collectively have each year (a great deal of which is spent watching television), time that could be harnessed to advance human knowledge through civic engagement. He concluded: ‘free cultures get what they celebrate. [...If we] celebrate and support and reward the people trying to use cognitive surplus to create civic value [...] we’ll be able to change society’. One way that Galleries, Libraries, Archives and Museums (GLAMs) can harness cognitive surplus is through web-based crowdsourcing.
What can crowdsourcing offer GLAMs?
Academic crowdsourcing invites members of the public to work with specialists to conduct research: for example, to transcribe documents or to add metadata to a collection of images, video or audio clips. The resulting data is used in real science, social science or humanities investigations and should, ideally, lead to publication. Crowdsourcing within GLAMs may not be oriented around a specific research question or publication, but rather around making collections more accessible for future research. GLAM crowdsourcing can be the seedbed of future scholarly research.
Zooniverse is the world’s leading research crowdsourcing platform. It hosts over sixty bespoke projects in diverse subject areas and provides a free project builder with which anyone can create their own project. The non-profit platform was founded at the University of Oxford in 2007 with a single crowdsourcing project called Galaxy Zoo. We now also have branches at the Adler Planetarium in Chicago and the University of Minnesota, Minneapolis.
Zooniverse has since partnered with academics and GLAM professionals around the world to produce projects in a range of STEM subjects (from astrophysics to ecology to climatology) and, more recently, in GLAM and humanities subjects (from papyrology to art history to Renaissance English literature). Initiatives in the library sector include the Biodiversity Heritage Library’s project to classify illustrations from early periodicals, and a joint initiative with the New York Public Library to transcribe historic mortgage and bond ledgers relating to immigrants to the city.
GLAMs have been engaging volunteers with their collections for well over a century, usually by inviting selected individuals into an institution and training them to do work that staff cannot do because of time or money constraints. On-site volunteers often build up valuable knowledge and skills and contribute a great deal to their chosen institutions, but training and supervising them also poses challenges: there is a limit to how many volunteers can be attracted, trained, supported on site and retained. Online volunteering through crowdsourcing platforms such as Zooniverse.org offers an alternative or complementary form of engagement with many benefits. Online projects can reach a wider range of individuals, including people with limited mobility or who are unable to travel. Such projects require less training and a smaller time commitment from volunteers, and they typically attract more participants than on-site programs. They also enable GLAMs to open up rare collections to the public without concern for their material safety and security.
How does Zooniverse work?
All bespoke Zooniverse projects, including those built with the free project builder, share a few core components that differentiate them, and the platform as a whole, from other crowdsourcing projects and platforms. Each image, audio or video file (data point) is independently assessed by multiple individuals, whose responses are then aggregated using a variety of algorithms to determine what the file contains. For relatively quick tasks, such as animal identification in Snapshot Serengeti, upwards of 70 people will see each image, whereas for more complex text transcription tasks three to ten people will transcribe each line or page (depending on the project). Studies have found that non-expert classifiers are nearly as accurate as experts, and Zooniverse data has been used in well over 100 publications.[i]
All projects also have an object-oriented discussion forum called Talk. Here volunteers can ask questions, interact with researchers and fellow volunteers, create their own ‘collections’, and use hashtags to group together areas of interest. The hashtag #female in the Science Gossip project, for example, marks mentions of female proto-citizen scientists of the nineteenth century.
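The aggregation step described above can be illustrated with its simplest form, a plurality vote. The sketch below is a hypothetical helper, not Zooniverse’s own aggregation code: it assumes each volunteer’s classification of a file is a single label (such as a species name) and returns the winning label together with the share of volunteers who chose it. Real projects use a variety of algorithms, often more sophisticated than this.

```python
from collections import Counter


def aggregate_labels(labels: list[str]) -> tuple[str, float]:
    """Plurality vote over independent volunteer classifications of one file.

    Returns the most common label and the share of volunteers who chose it.
    Hypothetical helper for illustration; not Zooniverse's own aggregation code.
    """
    if not labels:
        raise ValueError("no classifications to aggregate")
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the top label; on a tie,
    # the label seen first wins (Counter preserves insertion order).
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


# Hypothetical example: 70 volunteers classify one camera-trap image.
votes = ["zebra"] * 52 + ["wildebeest"] * 15 + ["nothing here"] * 3
label, share = aggregate_labels(votes)
print(label, round(share, 2))
```

Returning the share alongside the label is a deliberate choice in this sketch: a low share signals a contested image that a project team might route to experts rather than accept automatically.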
Another important feature is the ability to tackle datasets that are too large to be processed in house. The Zooniverse platform offers the chance to engage people with large-scale datasets that might otherwise be neglected. In the GLAM context, this means profiling collections that are unlikely ever to be transcribed or analysed in detail by existing staff.
Zooniverse in Action
The first Zooniverse humanities project was Ancient Lives, which invited volunteers to transcribe ancient papyri one letter at a time using a clickable keyboard on their screen. Volunteers only needed to match characters and did not have to be fluent in ancient Greek. Real-time translation of transcriptions gave volunteers the chance to understand the texts they were working on, and to flag up discoveries of unknown works by ancient writers. A blog kept volunteers up to date with the research team’s work and explained how the crowdsourced data was being used. The team of 250,000+ volunteers made more than 1.5 million transcriptions between 2011 and 2014.
The next Zooniverse humanities project was Operation War Diary, launched in 2014 to commemorate the outbreak of the First World War. This partnership between The National Archives (UK), the Imperial War Museum and Zooniverse invites users to tag and transcribe dates, times, places and names found in British WWI field diaries. Volunteers are also asked to interpret the diaries by adding tags from a drop-down list, such as coming under fire, recuperation, eating, sport and so on. Historian Richard Grayson has used the data to penetrate more deeply than ever before into records of soldiers’ daily lives at the front.[ii]
All of the project metadata will eventually be integrated into The National Archives’ catalogue, although not as soon as hoped, because of character-length limits in the free-text field originally designated to hold the material. This limitation raises an important question for any GLAM specialist seeking to harness crowdsourcing at their institution: does your content management system (CMS) support the storage of extra metadata, and if not, where can you make the results of a crowdsourcing project available? GitHub is a viable option, as is a bespoke website maintained by your institution, but be sure to investigate this before embarking on a crowdsourcing project.
In the case of AnnoTate, a partnership between Zooniverse and Tate’s archives and museums, a new CMS is able to accommodate large amounts of free text and extra metadata. Tate wanted to create a crowdsourcing project as a means of democratizing access to its collections of artists’ sketchbooks, letters, diaries and other papers: documents that are not typically accessible to non-specialists. As of November 2016, the project had attracted 1,323 registered volunteers and many unregistered visitors, who had retired 10,464 pages. Retirement in this case means that each line has been transcribed by at least three people whose transcriptions concur on 95% of the characters in the line, with an average agreement of 97%.
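A line-retirement rule of this kind can be sketched as follows. This is an illustration under stated assumptions, not AnnoTate’s actual code: the two thresholds (at least three transcribers, 95% agreement) come from the paragraph above, but the post does not say how character agreement is measured, so the sketch substitutes the mean pairwise similarity from Python’s `difflib.SequenceMatcher.ratio()`. The names `line_agreement` and `is_retired` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean

MIN_TRANSCRIBERS = 3        # from the post: at least three people per line
AGREEMENT_THRESHOLD = 0.95  # from the post: transcriptions concur on 95% of characters


def line_agreement(transcriptions: list[str]) -> float:
    """Mean pairwise character similarity (0.0 to 1.0) between transcriptions.

    SequenceMatcher.ratio() is a stand-in metric (an assumption); the post
    does not say how AnnoTate actually measures agreement.
    """
    if len(transcriptions) < 2:
        raise ValueError("need at least two transcriptions to compare")
    return mean(
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(transcriptions, 2)
    )


def is_retired(transcriptions: list[str]) -> bool:
    """Hypothetical retirement rule: enough transcribers who largely agree."""
    if len(transcriptions) < MIN_TRANSCRIBERS:
        return False
    return line_agreement(transcriptions) >= AGREEMENT_THRESHOLD


# Two volunteers agree exactly; a third reads 'haue' as 'have'.
line = ["Sir, I haue receiued your letter",
        "Sir, I haue receiued your letter",
        "Sir, I have receiued your letter"]
print(is_retired(line))  # True
```

In this hypothetical example the single disputed character leaves the mean agreement at roughly 98%, so the line would retire; a line with only two transcriptions would stay open however closely they matched.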
In developing AnnoTate and Shakespeare’s World, my driving research question was: how could we simplify the rather difficult task of full-text transcription? The challenge was greatest for the early modern English manuscripts from the Folger Shakespeare Library (our Shakespeare’s World partner). I drew inspiration from Ancient Lives and Operation War Diary, as well as from a less likely muse: Penguin Watch.
Ancient Lives taught me that it’s possible to make hard tasks accessible through the right interface design. What could have been a project targeted at specialists became, instead, a project for anyone capable of pattern recognition. In Shakespeare’s World, volunteers transcribe using their keyboard, but can also click buttons in the on-screen transcription box to capture frequently occurring brevigraphs (handwritten abbreviations), such as ‘ye’ for ‘the’ or ‘wch’ for ‘which’. This approach blends the Ancient Lives functionality with regular typing.
Operation War Diary, by contrast, taught me a cautionary lesson: its workflow is too complex and asks too much time of users. My advice is to keep your project workflow concise and compelling, and to focus on the aspect of the data that matters most to your institution or researchers. A successful crowdsourcing project does not attempt to gather all of the available information, but rather a comprehensive and meaningful strand of information about a given collection. The free project builder means you can readily launch another project to tackle different aspects of the same dataset.
Penguin Watch, which simply asks you to click on individual penguins, taught me the beauty (and addictiveness) of simplicity. I wanted to bring simplicity and brevity to transcription, so both AnnoTate and Shakespeare’s World let users transcribe a single line or short phrase at a time, by clicking at either end of it and typing the text into a pop-up box. Although a page might be complex, long or hard to read, most people can spare a minute or less to transcribe a line. Lines that our algorithms deem complete are marked with grey dots, which redirect people’s efforts to unfinished lines. To date, Shakespeare’s World has 2,479 registered volunteers, who, together with unregistered participants, have submitted 87,433 lines of transcription and retired 2,819 pages.
The tools underlying projects such as Penguin Watch and Galaxy Zoo are already available in the free project builder, while full-text and audio transcription tools are under development with the generous support of an Institute of Museum and Library Services (IMLS) grant. Text transcription tools will probably be available in 2017, followed by audio transcription tools in 2018 and 2019.
GLAM organizations keen to develop their own crowdsourcing projects should explore the available documentation on how to build a project and on best practices, which guides you through the design, launch and long-term phases of a project. While building a project is easy and requires relatively little technical support from Zooniverse or your institution, make sure you have the time to support your volunteers. Zooniverse’s Talk, blogs, social media such as Twitter and Instagram, and indeed in-person or on-site events all provide important channels for engaging volunteers with your collections. Crowdsourcing projects, like puppies and kittens, are not just for Christmas.
[i] Alexandra Swanson et al., ‘Snapshot Serengeti, High-Frequency Annotated Camera Trap Images of 40 Mammalian Species in an African Savanna’, Scientific Data, 2 (2015) <https://doi.org/10.1038/sdata.... and others.
[ii] Richard Grayson, ‘A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front’, British Journal for Military History, 2.2 (2016), 160–85.
Banner image credit: compilation of Heath Van Singel designed Zooniverse logos via Wikipedia, original resized and cropped (CC BY-SA 4.0).
Screenshots of Ancient Lives and Shakespeare’s World transcription keyboards provided by Dr Victoria Van Hyning.