When scientific citations become illegal: Revealing ‘secretly cited references’

A researcher working alone – disconnected from the world and the wider scientific community – is a classic but misleading image. In reality, research is based on continuous exchange within the scientific community: first you understand the work of others, then you share your findings.

Reading and writing articles published in scientific journals and presented at conferences is a central part of being a researcher. When writing a scientific article, researchers should cite the work of colleagues to provide context, describe sources of inspiration, and explain differences in approaches and results. Positive citation by other researchers is an important measure of the visibility of a researcher’s own work.

But what happens when this citation system is manipulated? A recent paper in the Journal of the Association for Information Science and Technology by our team of academic sleuths—including information scientists, a computer scientist, and a mathematician—has revealed a covert method for artificially inflating citation counts through metadata manipulation: sneaky references.

Hidden manipulation

People are becoming increasingly aware of scientific publications and how they work, including their potential flaws. Last year alone, over 10,000 scientific papers were retracted. The problems surrounding citation gaming, and the damage it does to the scientific community and its credibility, are well documented.

Citations of scholarly work adhere to a standardized referencing system: each reference explicitly mentions at least the title, author names, year of publication, journal or conference name, and page numbers of the cited publication. These details are also stored as metadata: not directly visible in the text of the article, but attached to its digital object identifier, or DOI – a unique identifier for each scholarly publication.
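To make this concrete, the reference list deposited with a DOI can be pictured as a structured record that lives alongside, but separate from, the article's visible bibliography. The sketch below is illustrative only: the field names loosely follow the JSON layout Crossref uses for deposited works, and every DOI and value in it is invented.

```python
# Illustrative sketch: the references attached to a DOI are deposited
# metadata, separate from the bibliography printed in the article.
# Field names loosely follow Crossref's JSON layout; all values are invented.
article_metadata = {
    "DOI": "10.1234/example.5678",          # hypothetical DOI
    "title": ["An Example Article"],
    "reference": [
        {"key": "ref1", "DOI": "10.1234/cited.0001", "year": "2019"},
        {"key": "ref2", "DOI": "10.1234/cited.0002", "year": "2021"},
    ],
}

# Databases that index this metadata count every entry in "reference"
# as a citation, whether or not it appears in the article's text.
cited_dois = [ref["DOI"] for ref in article_metadata["reference"] if "DOI" in ref]
print(cited_dois)
```

The key point is that citation-counting platforms read the `reference` list above, not the PDF, which is what makes the metadata a target for manipulation.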

References in a scientific publication allow authors to justify methodological choices or to present the results of previous studies. This emphasizes the iterative and collaborative nature of science.

However, we discovered through a chance encounter that some unscrupulous actors have added extra references, invisible in the text but present in the metadata of the articles, when they submitted the articles to scientific databases. The result? The number of citations for certain researchers or journals has skyrocketed, even though these references were not cited by the authors in their articles.

Accidental discovery

The investigation began when Guillaume Cabanac, a professor at the University of Toulouse, wrote a post on PubPeer, a website dedicated to post-publication peer review, where scientists discuss and analyze publications. In the post, he described how he had noticed an inconsistency: a Hindawi journal article that he suspected of being fraudulent because it contained awkward sentences had far more citations than downloads, which is highly unusual.

The post caught the attention of several sleuths who are now the authors of the JASIST article. We used a scholarly search engine to look for articles that cited the original article. Google Scholar found none, but Crossref and Dimensions did find references. The difference? Google Scholar likely relies primarily on the body of the article to extract the references that appear in the bibliography section, while Crossref and Dimensions use metadata provided by publishers.

A new type of fraud

To understand the extent of the manipulation, we examined three scientific journals published by the Technoscience Academy, the publisher responsible for the articles with questionable citations.

Our research consisted of three steps:

  1. We listed the references that appear explicitly in the HTML or PDF versions of an article.

  2. We compared these lists with Crossref’s metadata and found that additional references had been added to the metadata, but were not included in the articles.

  3. We checked Dimensions, a bibliometric platform that uses Crossref as a metadata source, and found even more inconsistencies.
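The comparison in step 2 boils down to a set difference between the references visible in the article and those deposited in the metadata. Here is a minimal sketch of that idea, not the authors' actual pipeline, with invented DOIs throughout:

```python
# Sketch of the step-2 comparison: references present in the deposited
# metadata but absent from the article's visible bibliography are the
# suspect "hidden" ones; visible references missing from the metadata
# are "lost". All DOIs below are invented.
def compare_references(visible, metadata):
    visible_set, metadata_set = set(visible), set(metadata)
    hidden = metadata_set - visible_set   # only in the metadata
    lost = visible_set - metadata_set     # only in the article text
    return hidden, lost

visible_refs = ["10.1/a", "10.1/b", "10.1/c"]
metadata_refs = ["10.1/a", "10.1/b", "10.1/x", "10.1/y"]

hidden, lost = compare_references(visible_refs, metadata_refs)
print(sorted(hidden))  # ['10.1/x', '10.1/y']
print(sorted(lost))    # ['10.1/c']
```

In practice the hard part is step 1, reliably extracting the visible bibliography from HTML or PDF; once both lists exist as DOIs, the comparison itself is this simple.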

In the journals published by Technoscience Academy, at least 9% of the recorded references were “hidden references.” These extra references were only in the metadata, distorting citation counts and giving certain authors an unfair advantage. Some legitimate references were also lost, meaning they were not in the metadata.

Furthermore, when analyzing these surreptitious references, we discovered that some researchers benefited from them disproportionately. For example, a single researcher affiliated with the Technoscience Academy gained more than 3,000 additional illegitimate citations. Some journals from the same publisher gained a few hundred surreptitious citations each.

We wanted our results to be externally validated, so we posted our study as a preprint, informed Crossref and Dimensions of our findings, and provided them with a link to the preprint. Dimensions acknowledged the citation violations and confirmed that its database mirrors Crossref’s data. Crossref also acknowledged the extra references in Retraction Watch and stressed that this was the first time it had been made aware of such an issue in its database. The publisher has taken action to resolve the issue based on Crossref’s investigation.

Implications and possible solutions

Why is this discovery important? Citation counts have a major impact on research funding, academic promotions, and institutional rankings. Manipulation of citations can lead to unfair decisions based on false data. More worryingly, this discovery raises questions about the integrity of systems for measuring scientific impact, a concern that has been highlighted by researchers for years. These systems can be manipulated to promote unhealthy competition among researchers, tempting them to take shortcuts to publish faster or earn more citations.

To combat this practice we propose several measures:

  • Strict metadata verification by publishers and agencies such as Crossref.

  • Independent audits to ensure data reliability.

  • Greater transparency in the management of references and citations.

This study is, to our knowledge, the first to report this kind of metadata manipulation. It also discusses the impact it can have on the evaluation of researchers. The study highlights, again, that overreliance on statistics to evaluate researchers, their work, and their impact can be inherently flawed and misguided.

Such overreliance is likely to encourage questionable research practices, including the formulation of hypotheses after results are known, or HARKing; the splitting of a single data set into multiple papers, known as salami slicing; data manipulation; and plagiarism. It also hampers the transparency that is essential for more robust and efficient research. While the problematic citation metadata and sneaky references have now apparently been resolved, the corrections may, as is often the case with scientific corrections, have been made too late.

This article was published in partnership with Binaire, a blog for understanding digital issues.

This article is republished from The Conversation, an independent nonprofit organization that brings you facts and analysis to help you understand our complex world.

It was written by: Lonni Besançon, Linköping University and Guillaume Cabanac, Toulouse Institute for Computer Science Research.


Lonni Besançon receives funding from the Marcus And Amalia Wallenberg Foundation.

Guillaume Cabanac receives funding from the European Research Council (ERC) and the Institut Universitaire de France (IUF). He is the administrator of the Problematic Paper Screener, a public platform using metadata from Digital Science and PubPeer via no-cost agreements.

Thierry Viéville does not work for, advise, own shares in, or receive funding from any company or organization that would benefit from this article. Furthermore, he has disclosed no relevant affiliations beyond his academic appointment.
