Waisda? Video Labeling Game: Evaluation Report

The Waisda? video labeling game (the name translates as “What’s that?”) was launched in May 2009. It invites users to tag what they see and hear, and awards them points for a tag when it matches a tag entered by their opponent. Waisda? is the world’s first operational video labelling game. The underlying assumption is that tags are most likely valid when players agree on them. Over 2,000 people played the game, and within six months over 340k tags had been added to over 600 items from the archive. Initial findings were published earlier, while the pilot was still running. This evaluation report (PDF, in Dutch) includes a quantitative and qualitative analysis of the tags, a usability study of the game environment, and a study of the incentives that motivate people to play the game. The evaluation report was written by Lotte Belice Baltussen, in collaboration with Maarten Brinkerink and Johan Oomen of the R&D Department of the Netherlands Institute for Sound and Vision. Researchers at the VU University Amsterdam (Business Web & Media Section) also provided crucial input. The VU University Amsterdam carries out this research as part of its involvement in the PrestoPRIME European research project.

The evaluation report provides evidence that crowdsourcing video annotation in a serious, social game setting can indeed enhance the retrieval of video in archives. It identifies success factors that organizations need to take into account when setting up services that aim to actively engage their audiences online. The main conclusions are listed below:


Waisda? managed to attract a large audience. Since its launch on 19 May 2009, Waisda? has been visited by 9,198 unique visitors, and 2,296 players have added a total of 340,551 tags describing 604 items. The website was visited 12,297 times, with an average of 3.61 pages per visit. Only 38% of visitors did not go beyond the home page. The average game session lasted 6 minutes and 45 seconds. As of 3 November 2009, 42,068 unique tags had been added. Of the 340,551 tags added by players, 40.3% (137,421 tags) are matching tags (tags added by two or more players within a time frame of 10 seconds).
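The matching rule (the same tag entered by two or more players within 10 seconds) can be illustrated with a small Python sketch. It is not the Waisda? implementation: the `TagEntry` record, the case-insensitive comparison, measuring the window in video time, and treating exactly 10 seconds as a match are all assumptions; only the 10-second window itself comes from the report.

```python
from dataclasses import dataclass

MATCH_WINDOW_SECONDS = 10.0  # window stated in the report


@dataclass(frozen=True)
class TagEntry:
    player: str        # hypothetical player identifier
    text: str          # the tag as typed
    video_time: float  # assumption: seconds into the video when the tag was entered


def normalize(text: str) -> str:
    # Assumption: case and surrounding whitespace are ignored when comparing tags.
    return text.strip().lower()


def matching_entries(entries: list[TagEntry],
                     window: float = MATCH_WINDOW_SECONDS) -> set[TagEntry]:
    """Return entries whose tag was also entered by a different player within `window` seconds."""
    by_tag: dict[str, list[TagEntry]] = {}
    for entry in entries:
        by_tag.setdefault(normalize(entry.text), []).append(entry)

    matched: set[TagEntry] = set()
    for group in by_tag.values():
        group.sort(key=lambda e: e.video_time)
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b.video_time - a.video_time > window:
                    break  # group is sorted by time, so later entries are even further away
                if b.player != a.player:  # a player cannot match their own tag
                    matched.add(a)
                    matched.add(b)
    return matched
```

With entries such as `TagEntry("anna", "Fiets", 12.0)`, `TagEntry("bram", "fiets ", 18.5)`, `TagEntry("anna", "boom", 30.0)` and `TagEntry("bram", "boom", 45.0)`, only the two “fiets” entries match, because the “boom” entries are 15 seconds apart. Dividing the number of matched entries by the total gives the kind of share the report quotes (137,421 of 340,551, about 40.3%).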

The largest group of players (1,051, or 45.8%) added between one and ten tags. A smaller number (810, or 35.3%) added between ten and a hundred tags, and less than half that number (372, or 16.2%) added between a hundred and a thousand tags. Only a few players added more than a thousand tags (63, or 2.7%), but together they were responsible for the largest share of all contributed tags. The longest session lasted about three hours, during which one player added 3,329 tags. This indicates that a project like Waisda? should not only aim for a wide audience, but should also find ways to specifically target these ‘super taggers’.
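The player distribution can be checked with a few lines of Python. The counts come from the report; the band boundaries in `band_for` are an assumption, since the report’s bands share their end points (10, 100 and 1,000), and here each end point is assigned to the lower band.

```python
from collections import Counter

# Number of players per contribution band, as reported in the evaluation.
REPORTED = {"1-10": 1051, "11-100": 810, "101-1000": 372, ">1000": 63}


def band_for(tag_count: int) -> str:
    """Assign a player to a band (assumed boundaries, end points go to the lower band)."""
    if tag_count <= 10:
        return "1-10"
    if tag_count <= 100:
        return "11-100"
    if tag_count <= 1000:
        return "101-1000"
    return ">1000"


def band_counts(tags_per_player: dict[str, int]) -> Counter:
    """Count players per band, ignoring registered players who added no tags."""
    return Counter(band_for(n) for n in tags_per_player.values() if n > 0)


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of all players in each band, rounded to one decimal."""
    total = sum(counts.values())
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


print(sum(REPORTED.values()))  # 2296
print(shares(REPORTED))        # {'1-10': 45.8, '11-100': 35.3, '101-1000': 16.2, '>1000': 2.7}
```

The bands add up to the reported 2,296 players, and the recomputed shares agree with the percentages in the text.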

More than 70% of the traffic to the website came through referrals from external websites. The three main referring websites also produced the lowest bounce rates, suggesting that visitors who arrive through an external link are more specifically interested in the content and the project than direct visitors. Increases in the number of registered players are also strongly related to promotional activity. Lastly, there is a clear relation between the most popular and most heavily tagged content and the efforts of the Dutch public broadcaster and project partner KRO to promote Waisda? with this content on its very popular programme website. These results show the importance of extensive external promotion aimed at relevant target groups. Waisda? should therefore reach these audiences through existing, popular channels related to the content available in the game, or through communities interested in tagging and in innovative projects in the cultural heritage sector. To continue the project and establish video labelling as a standard service, it is important to collaborate actively with the websites of the broadcasters that deliver content for Waisda?, for example by posting an article to notify potential players, an explicit call to action or a contest.

Earlier research (see “Literature” below) has shown that altruism is an important motivation for playing Waisda?. The ‘about’ section of the website should therefore emphasize the benefits of player activity for the (public) accessibility of the content and for further research on tagging. The current research on tagging shows that taggers who are explicitly invited to help an institution are notably more active. A strategy that targets these altruistic players should be developed to further promote Waisda?. In addition, players should be given a sense of the impact of their activity, for example by experimenting with ways to demonstrate how useful the tags are for searching the content.

Apart from altruism, the evaluation showed that the video content itself motivates players. The most popular channel on Waisda? features a popular Dutch reality show with a weekly audience of millions. To attract a broad constituency of users, it is important to diversify the content available on Waisda? and to experiment with different types of content. Users showed particular interest in popular talk shows reflecting on recent events, programmes aimed at children, and historical footage. Keeping the content fresh is also important: 29 items already contain over 2,000 tags each.

Although Waisda? can be played in solitude (against so-called bots), user research has shown that the vast majority of players prefer playing against others. This underlines the importance of a substantial and active community of players. Besides the promotion on external websites mentioned above, a contest held on Waisda? showed that prizes can motivate players, and further experiments with this are certainly worthwhile. Social media can also play an important role in positioning Waisda? as a serious social game, for example by linking to existing social networks such as Facebook, Twitter and Hyves. Integration with Twitter was already implemented during the pilot.

Game and Interface Design

Studying the latest literature on social tagging has shown that most of the decisions made during the design of Waisda? correspond with recommendations from renowned experts and with results from related studies. There are, however, also some suggestions for improvement. User research has shown that people experience the website as clear and well organized, and the majority are positive about its layout. Still, a few things need more explicit clarification. For instance, a small portion of visitors do not immediately grasp the goal of the website. To further reduce the bounce rate and motivate visitors to actually play Waisda?, the homepage needs to be adjusted to communicate the goal of the game at a single glance.

Most taggers have little experience with tagging the temporal dimension of moving images, as Waisda? requires. Players should be made aware that they can add multiple tags throughout the whole video. It is also important to tell players that they can still add a tag after the subject has disappeared from the screen because they were not done typing (the system keeps track of this ‘delay’ during tag entry). Some players felt that the scoring mechanism lacked transparency. It should be explained more clearly, for example with a short instruction video or an example game session. The textual pages on the website answer most of the questions players might initially have about Waisda?, but the navigation structure could be more straightforward.

User research (an extensive questionnaire and focus groups) has also shown that the meaning of the high scores on the homepage is unclear. This should be clarified, for example by renaming the high scores. Players would also like to see their actual score next to their ranking in the high scores. Usability testing showed that the tag input field needs more emphasis, to prevent players from overlooking it. Finally, the user research shows that players appreciate the overlay tips that appear during the game, although they sometimes overlook them and would like them to be more prominent.

Literature studies, user research and practical experience with Waisda? have shown that both intrinsic and extrinsic factors play a role in the motivation of players. The recent literature also supports the initial concept of the Waisda? project, which assumes that a game setting is a good way to motivate people to tag (audiovisual) archive material. It is therefore important that the game design also motivates players who are not particularly interested in tagging per se, or who feel that tagging in general takes too much effort. At the same time, good game design and gameplay are crucial so that altruistic players also enjoy Waisda?. Ideally, the intrinsic and extrinsic factors come together in the game and interface design.

User research provided concrete recommendations to improve the game design. Players appreciated the statistics on their personal profiles and would like to see additions. For instance, users said it would motivate them to see the high scores of the players directly above and below them, the material they contributed tags to, and more detailed statistics on their scoring history. Players also voiced a need for feedback on their opponents’ activity during a game, and would like different levels of difficulty to be introduced. When playing in solitude, players should still be able to receive points retroactively when another player matches their tags in a later session. Furthermore, players need constant encouragement to keep adding tags rather than only watching the video. Because most players mentioned that they want to work towards a goal, introducing time limits is also worth considering. The next version of the game should take all of these improvements into account.

Tag Analysis and Further Research

Analysis of the most recent database dump of tags shows that 5.8% of the tags match terms in the GTAA thesaurus, which the Netherlands Institute for Sound and Vision uses to classify its collection. In addition, 23.6% correspond with Cornetto, a linguistic database that contains the bulk of all official Dutch words. Since only a small number of tags appear in both databases (1,135, or 2.7%), the two together cover about 26.7% of the tags (5.8% + 23.6% − 2.7%), so more than a quarter of the tags can already be assumed to be existing and correct words.
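Because a tag can occur in both vocabularies, the combined coverage follows from inclusion-exclusion rather than simple addition. The Python sketch below assumes that GTAA and Cornetto can be reduced to plain sets of lowercase terms, which ignores the real thesaurus structure; the toy term sets in the usage are hypothetical. The percentages appear to be relative to the 42,068 unique tags, since 1,135 / 42,068 is about 2.7%, but that is an inference rather than something the report states.

```python
def vocabulary_coverage(tags: set[str], gtaa: set[str], cornetto: set[str]) -> dict[str, float]:
    """Percentage of unique tags found in GTAA, in Cornetto, in both, and in either."""
    n = len(tags)
    in_gtaa = tags & gtaa
    in_cornetto = tags & cornetto
    in_both = in_gtaa & in_cornetto
    # Inclusion-exclusion: |G ∪ C| = |G| + |C| - |G ∩ C|
    in_either = in_gtaa | in_cornetto
    return {
        "gtaa": 100 * len(in_gtaa) / n,
        "cornetto": 100 * len(in_cornetto) / n,
        "both": 100 * len(in_both) / n,
        "either": 100 * len(in_either) / n,
    }


def union_share(p_gtaa: float, p_cornetto: float, p_both: float) -> float:
    """Combined coverage from reported percentages, counting the overlap only once."""
    return p_gtaa + p_cornetto - p_both


print(round(union_share(5.8, 23.6, 2.7), 1))  # 26.7
print(round(100 * 1135 / 42068, 1))           # 2.7 (overlap relative to the unique tags)
```

For example, with the hypothetical sets `tags = {"fiets", "amsterdam", "boom", "xyz"}`, `gtaa = {"amsterdam", "utrecht"}` and `cornetto = {"fiets", "boom", "amsterdam"}`, coverage is 25% for GTAA, 75% for Cornetto, 25% for both and 75% for either.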

A senior professional cataloguer (an employee of the Netherlands Institute for Sound and Vision) assessed the usefulness of the tags added to two episodes: the best-tagged episode (from the popular Dutch reality show mentioned above, with 19,322 tags) and an episode with an average number of tags (from a documentary series about a former U.S.-based news correspondent returning to the Netherlands, with 738 tags).

  • For the best-tagged episode, 45% of the tags were deemed useful, with 27.45% having low and 11.76% having high accuracy.
  • For the averagely tagged episode, 72.69% of the tags were deemed useful, with 26.39% having low and 19.44% having high accuracy.

The senior cataloguer noted that, in general, the useful tags describe the material differently from the keywords added by cataloguers. First, the tags describe what is seen and heard within a programme, whereas professional metadata for audiovisual content focuses on the subjects a programme refers to. Second, the tags describe individual moments in a programme rather than a logical segment or an entire episode. That crowdsourced tags for audiovisual material differ from professional metadata is no surprise, and may even indicate that the tags help bridge the semantic gap (the gap between professional descriptions, based on a closed vocabulary, and the free search terms potential audiences use to find the material). However, further research on the usefulness of the tags is needed, for example search experiments with different types of end users.

Of the 20 most frequently added tags for the averagely tagged episode, only two proved useful for describing the episode as a whole; for the best-tagged episode, none of the top 20 tags did. Tags added to the documentary episode were notably more often useful than tags added to the reality show: they were more distinctive and specific, whereas the reality show received more general tags that lacked specificity. These findings contradict the assumption that the more often a tag is added to an episode, the more useful it is to the audiovisual archive. The content seems to influence the specificity of the tags that are entered. It is also striking that more of the reality show’s tags correspond with GTAA or Cornetto terms, yet the senior cataloguer judged the tags added to the documentary episode more useful. This suggests that a programme containing many specific items or topics may yield more specific and useful tags. How content influences the way it is tagged requires further research. Since metadata for audiovisual collections mostly describe items as a whole, time-based metadata such as tags could be an important step forward in serving media professionals who look for specific fragments. It is therefore important to develop this research further, to discover how and to what degree the tags can be used within the professional metadata in the catalogue of the Netherlands Institute for Sound and Vision.

Within the PrestoPRIME European research project, the Web & Media Section of the VU University Amsterdam will investigate how additional technology (such as vocabularies and auto-completion) can help increase the quality of the contributions. This research will address topics such as ambiguity, synonyms, narrower and broader terms, typos, tag overflow and profanity, and will also develop more advanced methods to analyse the tags and their usefulness. Further research will examine the possible implications of the tags for the professional metadata in the catalogue of the Netherlands Institute for Sound and Vision. The VU will also look at possible improvements to the Waisda? gameplay and interface design. The Netherlands Institute for Sound and Vision also participates in PrestoPRIME and will therefore be closely involved. The future of Waisda? is currently being examined. Later this year, an improved version (taking on board the results of the evaluation as well as additional functionality proposed by the VU) will be launched.
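The report names vocabulary-backed auto-completion only as a research direction. As a rough illustration of one simple form it could take, the Python sketch below suggests terms from a controlled vocabulary that start with what the player has typed so far. The class name and the term list are hypothetical, and a real system would likely load GTAA or Cornetto data and rank suggestions rather than list them alphabetically.

```python
import bisect


class VocabularyCompleter:
    """Prefix completion over a controlled vocabulary (hypothetical helper)."""

    def __init__(self, terms: list[str]) -> None:
        # Keep unique lowercase terms sorted, so all terms sharing a prefix are contiguous.
        self._terms = sorted({t.lower() for t in terms})

    def complete(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` vocabulary terms that start with `prefix`, alphabetically."""
        p = prefix.lower()
        results: list[str] = []
        # Binary search finds the first term that is >= the prefix.
        i = bisect.bisect_left(self._terms, p)
        while i < len(self._terms) and len(results) < limit and self._terms[i].startswith(p):
            results.append(self._terms[i])
            i += 1
        return results
```

For instance, `VocabularyCompleter(["Amsterdam", "Amstel", "ambulance", "boom"]).complete("ams")` returns `["amstel", "amsterdam"]`. Suggesting existing terms while the player types is one way such a feature could reduce typos and steer tags towards vocabulary terms, which is the kind of quality gain the planned research aims to measure.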

For more information on Waisda?, please contact Maarten Brinkerink.


