As you know, here at the DataLab we are working on the RetractoBot project to reduce citations of retracted papers.

Citations of retracted RCTs are particularly dangerous because such trials are treated as strong, unbiased evidence of a treatment’s safety and efficacy (they sit near the top of the hierarchy of evidence). Moreover, the results of RCTs are often pooled in systematic reviews and meta-analyses, which are used to synthesise the available evidence on a given subject or to justify clinical guidelines.

Recently the subject of retracted RCTs has been in the news (the Guardian and the BMJ): it turned out that a strong recommendation in the WHO guidelines to use high concentrations of oxygen to prevent infection after surgery was based on a meta-analysis of 15 RCTs, including two by Mario Schietroma, an author five of whose papers have been retracted and several more of whose papers have been suggested to be unreliable.

A recently published re-analysis of 40 papers authored by Schietroma and his group found problems with data integrity in 38 of the 40 publications. According to the first author of this study, Paul Myles, the re-analysis was prompted “because the WHO guidance ran counter to a longstanding view that high concentrations of oxygen can be toxic” (see the Guardian article). In the conclusions of their paper they wrote: “We found extensive evidence to support an investigation into work by Mario Schietroma’s group. The evidence challenges the veracity of much, if not all, of the published work from this group.” This makes the inclusion of Schietroma’s work in the WHO guidelines look particularly bad, because the retractions appear to be the result of deliberate fraud, not an unfortunate mistake. So far, five of Schietroma’s 115 papers have been retracted (1, 2, 3, 4, 5) for reasons such as plagiarism and problems with data integrity and statistical analysis.

If Myles and colleagues are right, and if most of Schietroma’s work is questionable, this would put him in the same category as Boldt, Fujii, and other notorious names in the world of fabricated trials. But Myles’ results may be exaggerated, and perhaps not all 38 of the 40 analysed trials are fraudulent. Myles and colleagues used a method proposed by Carlisle in 2017 to detect fabricated data using trial-level p-values: p-values for the reported baseline variables are combined into a single p-value per trial, and trials whose combined p-value is implausibly extreme are flagged. This method has to be interpreted with caution, as extreme p-values can arise in a genuine trial for benign reasons and do not by themselves mean that the study is fraudulent. On the other hand, the method may miss less extreme manipulations (see the re-analysis of Carlisle’s paper by Scott Piraino).
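To give a flavour of how a single trial-level p-value can be built from many baseline comparisons, here is a minimal sketch using Fisher’s method, a classic way of combining independent p-values. This is purely an illustration of the general idea: Carlisle’s actual procedure is more elaborate (and differs in detail, for example in how it handles rounding of reported summary statistics), so the function below should not be read as his implementation.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values into one trial-level p-value
    using Fisher's method: X = -2 * sum(ln p) follows a chi-squared
    distribution with 2k degrees of freedom under the null
    hypothesis that all p-values are uniform."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # The survival function of a chi-squared distribution with an
    # even number of degrees of freedom (2k) has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# A set of baseline p-values that are all suspiciously small
# combines into a tiny trial-level p-value, flagging the trial
# for closer scrutiny; unremarkable p-values do not.
suspicious = fisher_combined_p([0.02, 0.01, 0.03, 0.005])
unremarkable = fisher_combined_p([0.5, 0.4, 0.6, 0.7])
```

The point of the caveats in the text is visible here too: a small combined p-value only says the reported baseline data look unlikely under chance, which can also happen for benign reasons in a genuine trial.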

A recent independent review of the effectiveness of hyperoxia in patients undergoing surgery by de Jonge and colleagues excluded not only the three retracted papers by Schietroma, but also three other trials by the same group that were not retracted but “contained discrepancies that require further investigation”. This new review did not show a definite beneficial effect of hyperoxia on the incidence of infections – so without Schietroma’s trials the evidence has become weaker.

But what about older reviews, published before Schietroma’s trials were retracted? In theory, these reviews should be clearly flagged and corrected, especially when a retracted RCT was used in the main analysis on which the conclusions are based. In real life, old reviews and meta-analyses are rarely corrected [our paper on this subject is currently under review].