Roeland Ordelman


AV Collections in a Research Infrastructure: Three Caveats

To foster scholarly research using large data collections in the arts and humanities, the CLARIAH project is developing a research infrastructure that aims to streamline access to large audiovisual collections and related context collections held at different locations in the Netherlands. It also provides scholars with robust and sustainable tools to work with these collections. The gateway to the data and tools in the infrastructure is the Media Suite, a portal that helps scholars explore, select, analyze, and annotate data collections. Many practical issues arise when data collections from various institutions are made available within the infrastructure in a way that effectively supports scholarly use. Identifying such issues and developing strategies to address them is pivotal to the success of a research infrastructure.

Research Pilots

To test the emerging infrastructure, CLARIAH awarded a number of ‘Research Pilots’, six of which focus on the audiovisual domain. In each pilot, scholars defined a research question and proposed the data collections and Media Suite tools they would need to address it. Recently, we organized a workshop with scholars, content owners, and CLARIAH developers to discuss the scholars' data requirements in detail and to investigate how well these align with the current status of the CLARIAH infrastructure. The workshop improved our mutual understanding of large institutional data collections in a research infrastructure. It also made clear that there are barriers to overcome before we can serve scholars' needs with respect to collection access. We identified three caveats with respect to using these collections effectively in practice.

Assumptions

The first caveat is that scholars make assumptions about the data collections that may not always be valid. As explained by NISV’s media history expert Bas Agterberg, audiovisual archiving has been influenced over the years by many practical issues. These range from the incorporation of collections assembled for purposes other than archiving, to mergers with other institutes, to institutional selection policies that changed over time for various reasons. So when a scholar is interested in a specific type of programming in a specific period, it is important to understand that the archive may contain gaps that could, for instance, affect the representativeness of the data for research. From a research infrastructure perspective, the lesson learned is that we should invest effort in documenting data collections, for example by providing pointers to the existing documentation held by the collection owners.

Metadata archaeology

The second caveat is that it is often far from obvious how to trace specific programs or genres in the metadata. For scholars, a question like “give me all autobiographical documentaries between 1965 and 1975” makes perfect sense. However, it may require some ‘metadata archaeology’ to discover which metadata fields to query, and how to query them, in order to select the desired items from a collection. Like the collections themselves, the metadata have a history with respect to their origin, their metadata models, and the protocols for filling in the fields. The Media Suite provides a “Collection Inspector” that can help here by providing statistics on the completeness of individual metadata fields in a collection and on their distribution over the years. However, the ‘raw’ field names may not always make sense to scholars who lack background knowledge of the metadata model of a specific collection. To make the Collection Inspector more useful for scholars, its metadata fields may need to be mapped to a more comprehensible format. A minimum requirement is that, for each collection in the infrastructure, we can provide documentation on its metadata model so that the rationale behind the naming of fields can be traced.
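The following minimal Python sketch makes field-completeness statistics and readable field labels more concrete. It assumes each record is a plain dictionary with a `year` key. The raw field names (`gen`, `subj`, `descr`) and their labels in `FIELD_LABELS` are hypothetical and invented for illustration. They do not reflect the metadata model of any NISV collection, nor the way the Collection Inspector is actually implemented.

```python
from collections import defaultdict

# Hypothetical mapping from cryptic raw field names to labels a scholar can read.
FIELD_LABELS = {
    "gen": "Genre",
    "subj": "Subject terms",
    "descr": "Description",
}


def field_completeness(records, fields, year_key="year"):
    """Return {year: {label: fraction of that year's records with a non-empty value}}."""
    totals = defaultdict(int)                       # records per year
    filled = defaultdict(lambda: defaultdict(int))  # non-empty values per year and field
    for rec in records:
        year = rec.get(year_key)
        if year is None:
            continue  # undated records cannot be placed on the timeline
        totals[year] += 1
        for field in fields:
            # Missing keys, empty strings and empty lists all count as "not filled in".
            if rec.get(field) not in (None, "", []):
                filled[year][field] += 1
    return {
        year: {FIELD_LABELS.get(f, f): filled[year][f] / totals[year] for f in fields}
        for year in sorted(totals)
    }


# Hypothetical example records.
records = [
    {"year": 1965, "gen": "documentary", "descr": ""},
    {"year": 1965, "gen": "", "descr": "Interview with a filmmaker"},
    {"year": 1970, "gen": "documentary", "subj": ["biography"]},
]

for year, stats in field_completeness(records, ["gen", "subj", "descr"]).items():
    print(year, stats)
```

A low completeness score for a field in a given period suggests that queries on that field will miss items from that period. This is exactly the kind of knowledge that metadata archaeology has to uncover.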

Search granularity

The third caveat concerns the availability of transcripts that can be used to search for relevant clips in large amounts of data, such as subtitles or manually or automatically generated speech transcripts. Such transcripts are typically sparse. For instance, synchronized subtitles for the broadcast data in the NISV collections are only available from 2006 onwards. To improve search granularity for collections without subtitles, CLARIAH is setting up an automatic speech recognition (ASR) service that is embedded in the infrastructure and capable of processing very large data collections. One envisioned usage model is that the service can be called upon on request when scholars require speech transcripts for specific collections or date ranges.
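The request-based model can be illustrated with a small Python sketch. For a requested date range, it lists the items that lack subtitles and would therefore be candidates for ASR processing. The `Item` record and the `asr_candidates` function are hypothetical. They do not describe the API or internals of the CLARIAH ASR service.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    item_id: str         # hypothetical identifier of an archive item
    broadcast: date      # broadcast date
    has_subtitles: bool  # whether synchronized subtitles exist for this item


def asr_candidates(items, start, end):
    """Return ids of items broadcast in [start, end] that have no subtitles.

    These are the items for which a scholar might request ASR transcripts.
    """
    return [
        it.item_id
        for it in items
        if start <= it.broadcast <= end and not it.has_subtitles
    ]


# Hypothetical example data.
items = [
    Item("a", date(1999, 5, 1), False),
    Item("b", date(2007, 3, 2), True),
    Item("c", date(2004, 1, 1), False),
]
print(asr_candidates(items, date(2000, 1, 1), date(2010, 12, 31)))  # → ['c']
```

Restricting a request to a date range keeps it small enough to process on demand, rather than transcribing an entire collection up front.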

Integration

In close collaboration with the content owners, the Media Suite development team is working on (strategies for) integrating multimedia data collections from DANS (oral history), EYE (film), KB (newspapers, for comparative search) and Beeld en Geluid (program guides). The goal is to enable scholars to analyze these data collections in the Media Suite and to access the source data (e.g., view content) via platforms available from the content owners (e.g., Delpher). Where necessary, the goal is also to address the metadata archaeology and search granularity issues discussed above.

Jun 30, 2017
Roeland Ordelman
Blogs
CLARIAH, collections, infrastructure, media suite, metadata


ORCID:

orcid.org/0000-0001-9229-0006
