Video Labeling Game Waisda?: Preliminary results and ongoing research

Ten months ago the entry “The Wisdom of the Crowds in the Audiovisual Archive Domain” was posted on this research blog. It discussed the interest of the Images for the Future consortium in creating social and open archives. Due to the large-scale digitisation of archival materials, the opportunities for offering public access to these materials have increased dramatically. One of the ways in which archives can provide access is by enabling social tagging, which allows people to annotate archival materials with their own key terms (tags). This benefits not only the original taggers, who can use their own tags to find these materials more easily in the future, but also other users who search the user-annotated collection with similar search terms. The tags that users add might overlap with the metadata produced by experts, or introduce new terms and consequently new ways of looking at and finding archival materials. This might bridge the semantic gap[1] between the vocabulary that annotation experts use and the ways in which the general public refers to and interprets (audiovisual) information.

The debate on whether tagging and other forms of crowdsourcing will actually contribute to the accessibility of archives, or whether they will just cause chaos and make finding materials more complicated and murky, is still in full swing.[2] Some pilot projects on tagging and other crowdsourced metadata have generated interesting and encouraging data. Notably, partners of the Flickr: The Commons project used this popular photo-sharing website to make their collections more accessible and to collect annotations from the public. The Nationaal Archief (the Dutch national archive, and one of the Images for the Future partners) and Spaarnestad Photo were part of this project. The results were promising[3], but more hard data is needed to show in what way tagging can benefit archives. The Images for the Future consortium’s interest in user-generated metadata therefore resulted in the development of a tagging game through which the public can annotate moving images. Through this game, a dataset will be gathered that can be used to answer some of the questions raised in this ongoing and topical debate.

Waisda? What’s that?

The online video labeling game Waisda? (which translates to What’s that?) is an initiative managed by the Netherlands Institute for Sound and Vision in close collaboration with the Dutch public broadcaster KRO (Catholic Radio Broadcasting). The game was developed by internet agency Q42. When the six-month pilot project ends in December, the VU University Amsterdam will research various possibilities for implementing these user-generated tags, and will develop new versions of the game with improved and extended game design and interface options.

Waisda? was launched in May and allows players to annotate Polygoon newsreel journals and KRO programmes such as Boer zoekt Vrouw (Farmer Wants a Wife), Spoorloos (Find my Family) and Memories. Recently the archive of Barend en Van Dorp, a popular Dutch talk show (broadcast from 1990 to 2006), was added to Waisda?

The basis of the game is simple. Players go to the Waisda? website and are presented with a selection of four different episodes of the programmes mentioned above, any of which they can choose to start tagging. The programmes do not start from the beginning for each player; instead, they are streamed continuously on the website, so players drop in at whatever point the video has reached. This means they never have to wait for a game to start, but can start tagging whenever they want. Players are asked to tag what they see and hear, and they receive points for a tag if it matches a tag that their opponent has typed in. The reasoning behind this is that a tag is probably valid if at least two people agree on it. This is the same assumption that underlies the Games with a Purpose developed by Luis von Ahn, now professor of Computer Science at Carnegie Mellon University.[4] His ESP game demonstrator, in which two players add tags to a picture, was so successful that Google licensed it under the name ‘Google Image Labeler’[5].

The Netherlands Institute for Sound and Vision is currently working on a preliminary evaluation of Waisda? This involves both researching how the game itself can be improved and analysing the crowdsourced tags that players have added over the past months. Since Waisda? was launched in May 2009, well over five hundred videos have been tagged by a total of almost 2,300 unique players. There are 150 registered players, most of whom return frequently to play the game. So far, over 14,000 unique tags have been added via Waisda?, and this number is still rising every day. Ultimately, the dataset generated by the people who play Waisda? will be used for in-depth research that will result in recommendations for improving the accessibility of, and search functionality for, audiovisual archives with crowdsourced metadata.

Tag analysis and research

The tags added so far were compared to the terms in the GTAA thesaurus, which the Netherlands Institute for Sound and Vision uses to classify the audiovisual materials in its archive, and almost 15 % of the tags were a perfect match. This may not seem like a big number at first glance, but the GTAA contains only very specific terms such as person names, genres and topics, so few tags were expected to match this professional thesaurus. The tags were also compared to another database, Cornetto, which contains the bulk of all official Dutch words, and another 45 % of the tags matched. There is some overlap between the Sound and Vision thesaurus and Cornetto, but based on this first, simple quantitative analysis, well over half of the tags added via Waisda? are clearly usable.
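Reading ‘another 45 %’ as tags that did not already match the GTAA (which is how the two figures add up to ‘well over half’), the counting can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the procedure used in the evaluation: it assumes exact, case-insensitive matching and represents both vocabularies as plain sets. The function `classify_tags` and the toy vocabularies are hypothetical.

```python
from collections import Counter


def classify_tags(tags, gtaa_terms, cornetto_words):
    """Return the percentage of tags matching the GTAA, only Cornetto, or neither.

    Hypothetical helper: exact, case-insensitive matching is an assumption.
    """
    counts = Counter()
    for tag in tags:
        term = tag.strip().lower()
        if term in gtaa_terms:          # checked first, so GTAA matches are never double-counted
            counts["gtaa"] += 1
        elif term in cornetto_words:    # only tags that did not match the GTAA get here
            counts["cornetto"] += 1
        else:
            counts["unmatched"] += 1
    total = len(tags) or 1  # avoid division by zero for an empty tag list
    return {key: 100.0 * counts[key] / total for key in ("gtaa", "cornetto", "unmatched")}


gtaa = {"amsterdam"}                                          # toy stand-in for the GTAA
cornetto = {"amsterdam", "fiets", "trein", "gitaar", "spelen"}  # toy stand-in for Cornetto
print(classify_tags(["Amsterdam", "fiets", "trein", "gitaar spelen", "blabla"], gtaa, cornetto))
# {'gtaa': 20.0, 'cornetto': 40.0, 'unmatched': 40.0}
```

Checking the GTAA first is the design choice that keeps the overlap between the two vocabularies from being counted twice.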

This does not mean that the remaining tags are unusable. Some tags, for instance, contain spelling or typing errors but still point to relevant terms. An analysis of a random, representative sample of the tags showed that a little under 10 % of them contained an error. This percentage is expected to be lower in the end, since some players re-enter a misspelled tag correctly after noticing their mistake, in order to still receive points.
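Tags with small spelling errors can often be mapped back to the term the player intended. The sketch below uses the similarity ratio from Python’s standard `difflib` module as a stand-in for whatever method a full analysis would use; the article does not say how erroneous tags were identified. The function name `suggest_correction`, the cutoff of 0.8 and the example vocabulary are assumptions for illustration.

```python
import difflib


def suggest_correction(tag, vocabulary, cutoff=0.8):
    """Return the closest vocabulary term for a probably misspelled tag, or None.

    Hypothetical helper; difflib's similarity ratio is used as a simple stand-in.
    """
    term = tag.strip().lower()
    if term in vocabulary:
        return None  # the tag is already a known term, so there is nothing to correct
    matches = difflib.get_close_matches(term, sorted(vocabulary), n=1, cutoff=cutoff)
    return matches[0] if matches else None


vocabulary = {"boerderij", "koe", "trein"}  # toy stand-in for Cornetto
print(suggest_correction("boerdeij", vocabulary))  # boerderij
```

A higher cutoff suggests fewer, safer corrections; a lower one catches more typos at the risk of mapping a deliberate, unknown word onto the wrong term.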

There are also tags that consist of more than one word and that are not recognised as correct terms in the Sound and Vision thesaurus or in Cornetto. For example, the tag ‘illegitimate children’ does not appear in the GTAA thesaurus or in the Cornetto vocabulary, but the individual words do appear in Cornetto. Thus, multi-word tags that match neither vocabulary can still prove very useful once they are split into their separate terms.
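Splitting such tags could look like the sketch below. It assumes that a multi-word tag is only split when the full tag matches neither vocabulary, that words are separated by whitespace, and that only words recognised by Cornetto are kept. `split_unmatched_tag` is a hypothetical helper, and the English words stand in for the Dutch originals.

```python
def split_unmatched_tag(tag, gtaa_terms, cornetto_words):
    """Return usable terms from a tag, splitting multi-word tags that match neither vocabulary.

    Hypothetical helper; splitting on whitespace is an assumption.
    """
    term = tag.strip().lower()
    if term in gtaa_terms or term in cornetto_words:
        return [term]  # the whole tag is already a known term
    # Keep only the individual words that Cornetto recognises.
    return [word for word in term.split() if word in cornetto_words]


cornetto = {"illegitimate", "children"}  # English stand-ins for the Dutch words
print(split_unmatched_tag("illegitimate children", set(), cornetto))
# ['illegitimate', 'children']
```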

Another area that requires additional research concerns tags that appear in multiple categories of the thesaurus and Cornetto. The Dutch tag ‘link’, for instance, can mean ‘dangerous’ or ‘connection’, among other things, and is therefore ambiguous. One way to find out which meaning the tagger intended would be to analyse the tags that were added to the same video close to ‘link’. If ‘scary’ and ‘exciting’ were added alongside ‘link’, it is possible to infer that in this case ‘dangerous’ is the most plausible meaning. Ideally, semantic software can make these determinations automatically.
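A minimal sketch of this idea, assuming each candidate meaning comes with a set of related words (which could in principle be derived from semantic relations in a resource such as Cornetto) and that ‘close to’ means within 30 seconds of the ambiguous tag. The window, the word sets and the function `disambiguate` are hypothetical; this is a simple overlap count, not the semantic software the paragraph refers to.

```python
def disambiguate(target_time, tag_events, senses, window=30.0):
    """Pick the sense of an ambiguous tag whose related words occur most often nearby.

    tag_events: (seconds into the video, tag) pairs for one video.
    senses: sense label -> set of related words (hypothetical; must be non-empty).
    Returns None when no nearby tag supports any sense.
    """
    nearby = {tag.strip().lower() for t, tag in tag_events if abs(t - target_time) <= window}
    scores = {sense: len(nearby & related) for sense, related in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


senses = {
    "dangerous": {"scary", "exciting", "risky"},
    "connection": {"internet", "website", "network"},
}
events = [(100.0, "link"), (104.0, "scary"), (110.0, "exciting"), (300.0, "website")]
print(disambiguate(100.0, events, senses))  # dangerous
```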

These and other topics will be analysed in a follow-up research project carried out in close collaboration with the VU University Amsterdam. This project is part of the European PrestoPRIME programme, in which various partners are collaborating to “research and develop practical solutions for the long-term preservation of digital media objects, programmes and collections.”[6] The university’s research will take three years and will result in in-depth advice on how to process and implement user-generated content such as the Waisda? tags, as well as new implementations of game and interface design, some of which are discussed later in this article.

Ideal case study

It will therefore take a while before the definitive analysis and results of the tags are published. For this reason, Sound and Vision is also working on a smaller project that will provide information on the usefulness of the tags added so far. For this project, the collection of Barend en Van Dorp (mentioned above) has been chosen as a case study, because so far this collection has only very basic metadata descriptions. The aim is to use the tags added by Waisda? players directly on the website that hosts the Barend en Van Dorp archive. This will enhance the current metadata and significantly improve the navigation and search options. The tags also make it possible to navigate within a video. When someone adds a tag to a programme while playing Waisda?, the tag is automatically linked to that specific moment in the programme. For example, when a programme contains several musical performances and players add the tag ‘guitar’ at multiple moments throughout the video, it will be possible to click on that tag and see at which points in the video it occurs. Consequently, the tags can be used to navigate through a video and jump to the moments at which a tag was added. The first tests look very promising.
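Because every tag is stored with the moment in the programme at which it was added, in-video navigation essentially needs an index from tags to timestamps. The sketch below assumes tags arrive as (seconds, tag) pairs and merges tags added close together, for instance when several players type ‘guitar’ during the same performance, into a single jump point. The 60-second merging gap and the names `build_tag_index` and `jump_points` are assumptions for illustration.

```python
from collections import defaultdict


def build_tag_index(tag_events):
    """Map each tag to the sorted list of moments (in seconds) at which it was added."""
    index = defaultdict(list)
    for seconds, tag in tag_events:
        index[tag.strip().lower()].append(seconds)
    return {tag: sorted(times) for tag, times in index.items()}


def jump_points(index, tag, min_gap=60.0):
    """Return moments to jump to for a tag, skipping those within min_gap seconds of the previous jump point."""
    points = []
    for seconds in index.get(tag.strip().lower(), []):
        if not points or seconds - points[-1] >= min_gap:
            points.append(seconds)
    return points


events = [(125.0, "guitar"), (61.0, "Guitar"), (63.0, "guitar"), (400.0, "guitar"), (70.0, "drums")]
index = build_tag_index(events)
print(jump_points(index, "guitar"))  # [61.0, 125.0, 400.0]
```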

Motivations for tagging / motivating tagging

Part of the current research at the Netherlands Institute for Sound and Vision concerns how to further improve the game itself and its appeal to the general audience. In order to gather as many high-quality tags as possible, it is vital that players are motivated to play the game often, and to play it seriously. According to standard motivation theories, there are two basic kinds of motivation. These have been described at length by Edward Deci and Richard Ryan, professors in the Department of Clinical and Social Sciences in Psychology at the University of Rochester. They distinguish between intrinsic and extrinsic motivation, where the former “refers to doing something because it is inherently interesting” and the latter “refers to doing something because it leads to a separable outcome.”[7]

In the case of a game such as Waisda?, both intrinsic and extrinsic factors can play a role in players’ motivation. People can take part in the game because it is fun and interesting (intrinsic), but also because adding tags to the programmes helps Sound and Vision to improve the accessibility of its collections (extrinsic). These motivations can occur simultaneously, even though some people might be more inclined to play for fun and others to participate in a collaborative project. Thus, both types of motivation are important to keep in mind when trying to get people to play the game.

More specifically, Lex van Velsen, PhD student at the Department of Technical and Professional Communication of the University of Twente, and Mark Melenhorst, researcher at the technological institute Novay, have investigated the motivations of people who tag moving image materials. They distinguish three main categories:

1. Motivations related to indexing
2. Motivations related to socializing
3. Motivations related to communicating.[8]

They conducted interviews with various groups of internet users about tagging moving image materials, and concluded that the main motivation for tagging is to make it easier for others to find materials, along with motivations related to indexing in general.[9] This suggests that extrinsic motivations are the most important when it comes to tagging moving image materials, and that altruism plays an important role. Therefore, Waisda? does not just focus on motivating people who like to play online games, but also aims to engage people who are interested in crowdsourcing and social tagging projects on the web, and in giving archives a helping hand in improving their collections.

Van Velsen and Melenhorst’s findings are corroborated by the research done for the steve.museum project, a collaborative project of a consortium of museums based on the premise that social tagging can improve the accessibility of museum collections. During this project, the Metropolitan Museum of Art conducted a test in which people were specifically invited to help tag objects, as opposed to the general steve website, which people had to visit on their own initiative. The people approached by the Metropolitan Museum of Art added four times as many tags to the objects as the taggers who went to the steve.museum website of their own accord.[10] This implies that people who are personally asked by a museum to help improve access to its collections through tagging are much more motivated. In a steve.museum survey of a group of active taggers, the main motivations to tag were “to help museums document art work”, “for fun” and “to improve search for others”[11]. This supports Van Velsen and Melenhorst’s finding that extrinsic indexing motivations are the most important reason for people to tag. However, since the steve research shows that the intrinsic motivation of fun is also important, it is vital to optimise the game design and interface of Waisda?.

Game design and interface

It is often said that the basis of a good game is that it has to be fun. As game designer David Ethan Kennerly says: “When discussing the art of game design, fun is the yardstick.”[12] Since fun is also one of people’s motivations to tag, the optimal strategy for a ‘serious game’ such as Waisda? is to combine a feeling of enjoyment with the sense of helping the archive improve the accessibility of its collections. However, fun is a multi-faceted and complicated notion: what is fun for one person is not necessarily compelling to another. This is especially true of a ‘serious game’, which “aim[s] to be both fun and playable [...] but at the same time be useful for a non-entertainment purpose.”[13] Combining the traditional ‘fun’ elements of a game with persuading people to help the archive by playing it posed interesting challenges for the game design process.

Game design is described on Wikipedia as “the process of designing the content and rules of a game.”[14] Thomas W. Malone, professor at MIT, wrote a paper in 1982 in which he outlined the most important game design and interface principles that motivate players. Even though the text is over 25 years old, the parameters for successful game design that Malone gives[15] still hold true today and are still often cited and used.[16] One example is Luis von Ahn, who, as mentioned earlier, developed various ‘Games with a Purpose’. In one of his papers on developing these games, he cites Malone’s game design principles as vital elements in creating enjoyable games.[17] The most important of Malone’s motivational elements for a game such as Waisda? are discussed below.

Goal

Malone states that every good game needs a clear goal, but that players also need room to set their own goals within the game. The main goal of players of Waisda? is to gather as many points as possible. As noted earlier, points are awarded when a player enters a tag that another player also submits at the same point in the video, within a ten-second window. However, players can gather points in additional ways, such as adding new and original tags or achieving a successful sequence of matching tags. Thus, a player can choose from various strategies to reach the goal of gathering as many points as possible.
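The matching rule can be sketched in a few lines of Python. This sketch assumes tags are compared case-insensitively, that a player’s own earlier tags never count, and that a tag exactly ten seconds away still falls inside the window; the article does not specify these details, and `TagEntry` and `find_matches` are hypothetical names.

```python
from dataclasses import dataclass

MATCH_WINDOW = 10.0  # seconds, from the ten-second window described above


@dataclass
class TagEntry:
    player: str
    tag: str
    video_time: float  # seconds into the programme at which the tag was entered


def find_matches(entry, other_entries, window=MATCH_WINDOW):
    """Return entries by *other* players with the same tag within the time window."""
    return [
        other
        for other in other_entries
        if other.player != entry.player
        and other.tag.strip().lower() == entry.tag.strip().lower()
        and abs(other.video_time - entry.video_time) <= window
    ]


earlier = [
    TagEntry("ann", "trein", 100.0),
    TagEntry("bob", "trein", 101.0),  # same player: never counts as a match
    TagEntry("cas", "trein", 120.0),  # outside the ten-second window
]
matches = find_matches(TagEntry("bob", "Trein", 105.0), earlier)
print([m.player for m in matches])  # ['ann']
```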

Score-keeping and performance feedback

The closer together the matching tags are submitted within the ten-second window, the more points players receive. These and other scoring mechanisms (which Malone calls performance feedback) are incorporated in the game design so that players “know how well they are achieving their goals”[18]. This challenges players to do as well as possible, and consequently the quality of the tags is relatively high, since a player only receives points for a tag that matches that of another player. Players get performance feedback during the game through various interface elements. When a player receives points, this is shown by colouring the tag green. Also, when a player matches multiple tags in succession, a multiplier comes into effect, which is indicated by exclamation marks next to the tags. Waisda? also provides high score lists, a game design element that Malone does not mention, but that Von Ahn rightly points out as being important in motivating players.[19] Waisda? has various high score lists based on different parameters, such as the top 5 highest-scoring players of the day and the top 5 players who have added the most tags. Finally, users can see their personal statistics on their profile page. Right now, only a basic overview of statistics is offered. This could be expanded, for instance by showing how many minutes of video a player has tagged so far and which programmes the player has contributed to, and by comparing a player’s score to those of others, or to the scores of registered friends.
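The combination of time-based scoring and a multiplier can be illustrated with a small sketch. The exact point values used in Waisda? are not given here, so the linear decay from 100 to 10 points across the ten-second window, the multiplier cap of 3 and the function names are all hypothetical; the sketch only shows the shape of such a mechanism.

```python
def match_points(time_gap, window=10.0, max_points=100, min_points=10):
    """Points for a match: max_points for simultaneous tags, falling linearly to min_points at the window edge.

    The linear formula and the point values are hypothetical.
    """
    gap = abs(time_gap)
    if gap > window:
        return 0  # outside the window: not a match
    return round(min_points + (max_points - min_points) * (1 - gap / window))


def score_sequence(gaps, max_multiplier=3):
    """Total score for successive tag entries.

    gaps: for each entry, the time gap to the matching tag, or None if it did not match.
    Each consecutive match raises the multiplier by one, up to max_multiplier;
    a non-matching entry resets it.
    """
    total, streak = 0, 0
    for gap in gaps:
        points = 0 if gap is None else match_points(gap)
        if points > 0:
            streak += 1
            total += points * min(streak, max_multiplier)
        else:
            streak = 0
    return total


print(score_sequence([0.0, 5.0, None, 0.0]))  # 100 + 2 * 55 + 100 = 310
```

Capping the multiplier keeps a single long streak from dominating the high score lists.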

Audio and visual effects

This game design feature is related to performance feedback. Malone suggests using audio and visual effects to indicate a correct or incorrect entry. As mentioned above, a player gets visual feedback when a tag matches another player’s entry. No audio effects are used to support these visual effects, since players are already watching, and listening to, a programme. The VU University Amsterdam will experiment further with incorporating icons in the interface that indicate how a player is doing, or what a player’s status is.

Timed response

Right now, players cannot choose to start at a specific point in the game, because the programmes used for the game are streamed in a constant, sequential manner. Since many programmes are over thirty minutes long, and people usually do not play that long, players have to decide for themselves when to stop the game. Ideally, we would like to work with clearly separated items or logical breaks within programmes, so that it becomes clearer to the player when a game will end. Malone calls this the importance of timed responses. During the development of the game, it was decided to consider implementing this element at a later stage, since segmenting the various programmes is complicated and time-consuming.

Levels

Another game design element is a variable difficulty level. Right now, players cannot be given more complicated tagging challenges, and incorporating them would improve the game design. The possibility of climbing to ever higher and more difficult levels based on their point totals would also give players more motivation. One way of increasing difficulty is by introducing taboo tags. For instance, if the tag ‘train’ has already been added many times to a video, or to part of a video, it could appear in the interface as a taboo word that the player can no longer use. This would encourage players to be more creative and consequently result in a more complex and original range of tags.
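Taboo tags could be derived from simple tag counts per segment. In the sketch below, the threshold of five occurrences, the idea of counting per segment, and the function names `taboo_tags` and `accept_tag` are assumptions; the paragraph above only states that frequently added tags could become taboo.

```python
from collections import Counter


def taboo_tags(segment_tags, threshold=5):
    """Return tags added at least `threshold` times to a segment; these become taboo."""
    counts = Counter(tag.strip().lower() for tag in segment_tags)
    return {tag for tag, n in counts.items() if n >= threshold}


def accept_tag(tag, taboo):
    """A tag is accepted only if it is not on the segment's taboo list."""
    return tag.strip().lower() not in taboo


segment = ["train"] * 5 + ["station", "Train", "station"]
taboo = taboo_tags(segment)
print(taboo)  # {'train'}
```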

The above discussion of game design and interface elements makes clear that they are intertwined, and that there are many options to choose from. Further experiments will be done to determine which elements work best in stimulating users to add more and better tags. In these experiments, new game design and interface elements will be added one at a time, in isolation, in order to see which ones players respond to most. This will be done for various scoring mechanisms, forms of performance feedback, different levels and other functionality such as taboo words. Another interface possibility is to provide players with auto-completion options. For instance, when a famous person appears in a programme and a player starts typing the name, the system could suggest a list of names for the player to choose from. This helps the player spell the name correctly, and can increase the pace of the game. Research on data from the steve.museum project has shown that the type of interface can influence tagging behaviour,[20] so further investigation of interface options is vital.
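A simple way to offer such suggestions is prefix search over a sorted list of names. The sketch below is an assumption for illustration: it presumes a lowercase, sorted list of person names (in practice these might come from the person names in the GTAA, but the article does not specify a source), and `complete` and the names in the example are hypothetical.

```python
import bisect


def complete(prefix, sorted_names, limit=5):
    """Return up to `limit` names starting with `prefix`.

    sorted_names must be lowercase and sorted, so all names sharing a prefix
    form one contiguous run that bisect can locate.
    """
    p = prefix.strip().lower()
    start = bisect.bisect_left(sorted_names, p)
    results = []
    for name in sorted_names[start:]:
        if len(results) == limit or not name.startswith(p):
            break
        results.append(name)
    return results


names = sorted(["jan jansen", "jan de vries", "piet pieters", "joost visser"])
print(complete("Jan", names))  # ['jan de vries', 'jan jansen']
```

Binary search keeps each lookup fast even for a large list of names, which matters when suggestions must update with every keystroke.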

Developments in the (near) future

So far, a preliminary study of the tags has shown that players need clearer instructions to help them optimise their game play. Many tags consist of more than one word, and such tags often do not contribute to the player’s score, since it is unlikely that another player will enter exactly the same terms. Even though these tags are still useful for the research, it is vital that players feel motivated to play the game, and thus that they get the best rewards for their effort. These and other tips will be published in the near future in the form of renewed information pages.

Another element that will be implemented soon is the use of Twitter and e-mail notifications to motivate existing players and to attract new ones. If someone is playing Waisda? alone, the system will automatically detect this and send out a tweet asking others to join in. Players will also be able to invite friends to the game more easily via Twitter or e-mail.

The VU University Amsterdam will continue to work on Waisda? in the coming years. The university will investigate how best to implement the tags in the future, partly based on the semantic links between the tags and the synsets (sets of synonyms) in Cornetto. It will also keep working on improving the game design and interface of Waisda?, as well as researching ways to further strengthen players’ motivation to tag.

First of all, though, the first evaluation of Waisda? will be completed in December in the form of a report. In it, the tags, player motivation, and the game design and interface aspects mentioned above will be analysed in more depth. So far, more than enough tags have been added to Waisda? to draw informed conclusions about this unique pilot project, and to give recommendations for the future development of the game. However, more data means more knowledge, so if you would like to help us, you are most welcome to come and play Waisda? and experience this project for yourself.


[1] The term ‘semantic gap’ does not only refer to the discrepancy between the terms and systems that professional information specialists use to add metadata to materials and the terms the general public uses to find these same materials. Another meaning of the term is the ‘gap’ between automatic annotations and interpretations by computers and (semantically more complex) user queries. For more information, read Smeulders et al. (2000): p. 1349-1380, or, for a more recent take on the issue, Jörgensen (2007).

[2] There is a very nice debate panel that took place at the 2007 Supernova conference between crowdsourcing proponent David Weinberger and Web 2.0 antagonist Andrew Keen, a video of which you can find here: http://supernovahub.com/2007/07/video-andrew-keen-and-david-weinberger/

[3] Moortgat (2009).

[4] Von Ahn and Dabbish (2008): p. 62.

[5] Robertson et al. (2009): p. 3937.

[6] http://www.prestoprime.eu/project/index.en.html. Accessed 12 October 2009.

[7] Ryan and Deci (2000): p. 55.

[8] Van Velsen and Melenhorst (2009): p. 224.

[9] Van Velsen and Melenhorst (2009): p. 229.

[10] Trant (2009): p. 94.

[11] Leason (2009): http://www.archimuse.com/mw2009/papers/leason/leason.html.

[12] Kennerly (2003): http://finegamedesign.com/fun_is_fine.html.

[13] Frank (2007): p. 1.

[14] Wikipedia, “Game Design” (2009): http://en.wikipedia.org/wiki/Game_Design

[15] Malone (1982): p. 65.

[16] For instance, looking up the original article in Google Scholar shows that it has been cited 230 times: http://scholar.google.nl/scholar?cites=3349127790165661160&hl=nl.

[17] Von Ahn and Dabbish (2008): p. 63.

[18] Malone (1982): p. 65.

[19] Von Ahn and Dabbish (2008): p. 63-64.

[20] Trant (2009): p. 90.

 

Literature

Ahn, Luis von and Laura Dabbish. “Designing games with a purpose.” Communications of the ACM, vol. 51, no. 8 (August 2008): p. 58–67. http://www.cs.cmu.edu/~biglou/GWAP_CACM.pdf. Accessed 12 October 2009.

Frank, Anders. “Balancing Three Different Foci in the Design of Serious Games: Engagement, Training Objective and Context.” In Conference Proceedings DiGRA 2007. Tokyo: DiGRA, 2007. http://www.digra.org/dl/db/07312.29037.pdf. Accessed 22 September 2009.

Jörgensen, Corinne. “Image access, the semantic gap, and social tagging as a paradigm shift”. In Proceedings 18th Workshop of the American Society for Information Science and Technology Special Interest Group in Classification Research. Milwaukee, Wisconsin, 20 October 2007. http://dlist.sir.arizona.edu/2064/. Accessed 17 October 2009.

Keen, Andrew and David Weinberger. “Debate Panel.” Lecture and debate, Supernova 2007 Conference, San Francisco, 22 June 2007. Full video on: http://supernovahub.com/2007/07/video-andrew-keen-and-david-weinberger/. Accessed 14 October 2009.

Kennerly, David Ethan. “Fun is Fine. Toward a Philosophy of Game Design.” David Ethan Kennerly. http://finegamedesign.com/fun_is_fine.html (published 22 June 2003). Accessed 2 October 2009.

Leason, Tiffany. “Steve: The Art Museum Social Tagging Project: A Report on the Tag Contributor Experience.” Conference paper, Museums and the Web 2009 Conference, Indianapolis, Indiana, USA, April 15-18, 2009. http://www.archimuse.com/mw2009/papers/leason/leason.html#. Accessed 7 October 2009.

Malone, Thomas W. “Heuristics for designing enjoyable user interfaces: Lessons from computer games.” In Proceedings of the Conference on Human Factors in Computing Systems (Gaithersburg, MD, 15-17 March 1982). New York: ACM Press, 1982, p. 63–68.

Moortgat, Judith. “Taking Pictures to the Public. Evaluatieverslag Nationaal Archief & Spaarnestad Photo op Flickr The Commons.” Nationaal Archief, 8 June 2009, version 1.0. http://www.nationaalarchief.nl/images/3_16370.pdf. Accessed 15 September 2009.

Robertson, S., M. Vojnovic, and I. Weber. “Rethinking the ESP Game.” In CHI EA ’09: Proceedings of the 27th international conference extended abstracts on Human factors in computing systems. New York: ACM, 2009, p. 3937–3942.

Ryan, R. M. and E. L. Deci. “Intrinsic and extrinsic motivations: Classic definitions and new directions.” Contemporary Educational Psychology, vol. 25, issue 1 (January 2000): p. 54–67. http://www.psych.rochester.edu/SDT/documents/2000_RyanDeci_IntExtDefs.pdf. Accessed 12 October 2009.

Smeulders, A., M. Worring, S. Santini, A. Gupta and R. Jain. “Content Based Image Retrieval at the End of the Early Years.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12 (2000): p. 1349-1380. http://www.science.uva.nl/research/publications/2000/SmeuldersTPAMI2000. Accessed 15 October 2009.

Trant, Jennifer. “Tagging, Folksonomy and Art Museums: Results of steve.museum’s research.” Available at conference.archimuse.com (version of 3 February 2009): http://conference.archimuse.com/files/trantSteveResearchReport2008.pdf. Accessed 10 October 2009.

Velsen, Lex van and Mark Melenhorst. “Incorporating user motivations to design for video tagging.” Interacting with Computers, vol. 21, no. 3 (July 2009): p. 221–232.
