September 26, 2023

The significance of metadata

Metadata, defined as “data about data,” plays a crucial role in scholarly research by describing resources in a structured manner. In the context of journal articles, metadata encompasses unique, persistent identifiers (PIDs), such as the International Standard Serial Number (ISSN), the Digital Object Identifier (DOI), and the Open Researcher and Contributor ID (ORCID), as well as journal information such as volume, issue, and page numbers. Similarly, book metadata encompasses elements such as the International Standard Book Number (ISBN), title, page numbers, and author information.

When metadata is standardized and machine-readable, it presents numerous, significant benefits. It helps acknowledge research funders appropriately, facilitates searchability and discoverability of the content, enables reproducibility, and more. In essence, metadata serves as the linchpin in the seamless exchange, discovery, and recognition of scholarly contributions.
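To make "standardized and machine-readable" concrete, here is a minimal sketch of an article metadata record that checks its identifiers before being serialized to JSON. The class name `ArticleMetadata`, its fields, and the helper functions are hypothetical and chosen only for illustration; they do not reflect any system discussed at Publisherspeak. The check-digit rules are the published ones for ISSN (weighted modulus 11) and ORCID iDs (ISO 7064 MOD 11-2), and the DOI check is only a loose prefix test, not a full validation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


def issn_is_valid(issn: str) -> bool:
    """Check the ISSN check digit (weights 8..2 over the first seven digits, modulus 11)."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not (chars[:7].isascii() and chars[:7].isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD check character (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not (chars[:15].isascii() and chars[:15].isdigit()):
        return False
    total = 0
    for d in chars[:15]:
        total = (total + int(d)) * 2
    check = (12 - total % 11) % 11
    return chars[15] == ("X" if check == 10 else str(check))


@dataclass
class ArticleMetadata:
    """Hypothetical record holding the journal-article fields named in the text."""
    title: str
    doi: str
    issn: str
    volume: str
    issue: str
    pages: str
    author_orcids: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means no issue was found."""
        found = []
        if not self.doi.startswith("10.") or "/" not in self.doi:
            found.append("DOI does not look like '10.<prefix>/<suffix>'")
        if not issn_is_valid(self.issn):
            found.append("ISSN check digit does not match")
        for orcid in self.author_orcids:
            if not orcid_is_valid(orcid):
                found.append("ORCID iD check character does not match: " + orcid)
        return found

    def to_json(self) -> str:
        """Serialize the record so that other systems can read it."""
        return json.dumps(asdict(self), indent=2)


# Example values: the DOI is a made-up placeholder; the ISSN and ORCID iD
# are well-known sample identifiers with valid check digits.
record = ArticleMetadata(
    title="An example article",
    doi="10.1234/example.doi",
    issn="0378-5955",
    volume="12",
    issue="3",
    pages="101-110",
    author_orcids=["0000-0002-1825-0097"],
)
```

Calling `record.problems()` before `record.to_json()` lets a submission system flag a mistyped identifier while the author is still present to correct it, rather than after publication.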

However, the potential of metadata to yield positive impact hinges on its accuracy, consistency, and robustness. Currently, the scholarly publishing industry grapples with challenges stemming from a lack of synchronization across publication phases, leading to the loss of crucial metadata. Additionally, the management of metadata requires a substantial amount of manual effort. Together, these shortcomings have far-reaching consequences, reducing discoverability and interoperability with other systems.

The case for rich metadata is evident. However, integrating it seamlessly into existing processes in the scholarly publishing landscape remains a significant challenge.

The Publisherspeak 2023 breakout session for this theme was chaired by David Haber (American Society for Microbiology). Contributors to this breakout session include Jerry Orvedahl (Sage Publishing), Rob O'Donnell (Rockefeller University Press), Natalie Ngo (American Society of Nephrology), Audra Cox (American Physiological Society), Andrew Harmon (Endocrine Society), Paul Gee (The JAMA Network), Margaret Donnelly (Wiley), James Brandt (Ingenta), Tim Marney (American Psychiatric Association), and Chris Reid (American Association for the Advancement of Science).

Challenges identified

At Publisherspeak 2023, the group handling this theme identified 3 core challenges pertaining to metadata infrastructure:

  1. Getting different systems to use data the same way
  2. Incentivising/Enabling authors to provide the correct metadata up front as they are the source of truth
  3. Communicating the value added by early metadata capture to all parties interacting with the publishing workflow (this includes funding agencies, authors, research institutions, libraries, aggregators, and more)

Out of these 3 challenges, “Incentivising/Enabling authors to provide the correct metadata up front as they are the source of truth” was selected as the top challenge to focus on, and the group built their Solution Canvas to address this challenge.

Strategic solutions to enable and ensure accurate metadata

Ensuring accurate and comprehensive metadata from the outset is paramount as authors serve as the primary source of truth in scholarly research. Recognizing this, the group at Publisherspeak 2023 collaboratively brainstormed strategies to enable the collection of precise metadata directly from authors. 

The group put forth a strategy that involves mandating authors to provide metadata, supported by a centralized app featuring an author-centric metadata collection hub managed by institutions. Additionally, the group suggested a research project-centric data collection hub, aiming to streamline and simplify the author submission process.

These strategies not only enhance the submission experience for authors but also benefit publishers by ensuring clean, easily manageable data. The result is more discoverable content driven by accurate metadata, ultimately leading to fewer post-publication corrections.

It was recognized that this is a twofold challenge:

  1. The technical problem of centralized metadata capture
  2. The behavioral change required if metadata capture and maintenance became a point of emphasis in the research lifecycle

Of the two challenges, it is the second that is the more difficult. Despite our new world of interoperability and digital transformation, the black box of print workflows still dominates and informs so many of our systems. Because of that, any new process or technical solution is engineered around these frequently outdated assumptions about what is being produced and distributed as research outputs. It must be remembered that one of the most vital outputs of research (in addition to the discovery itself) is clear, precise information about said discovery. It is not necessarily a research article in the form of a pretty PDF file or elaborate webpage. Those are just byproducts.

With the article being just a single planet in the solar system of a given research endeavor, it is vital that metadata about the science becomes central and top of mind for the researcher. Our workflows and processes need to emphasize that point at each step, making it simple for authors to confirm, change, and update the key girders that connect their science to the world.

A challenge of this nature calls for collaboration from various stakeholders, including corresponding and contributing authors, funders, research institutions, indexers, and publishers as ancillary stakeholders. A holistic approach to this challenge, as outlined by the Solution Canvas, can help bridge the gap between the potential benefits of rich metadata and the current challenges faced in its implementation within the scholarly publishing ecosystem.
