Confidence Levels in Scientific Writing: Automated Mining of Primary Literature and Press Releases

Abstract

Scientific communication includes both primary scientific literature, written by and for scientists, and press releases about these articles, which are used to inform the popular press. By the time new scientific findings are reported by the press, the coverage can often reflect 'spin': reporting that minimizes uncertainties and exaggerates impact relative to the original study. In this work, we examine the role that press releases may play in this communicative change, in particular with respect to differences in portrayed confidence. We analyze a large corpus of over 15,000 documents collected from online databases covering a range of scientific topics, using automated natural language processing tools to examine how readability, sentiment, subjectivity, and portrayed confidence vary between the two document types. We find that press releases are often easier to read, portray more positive sentiment, use language that implies greater objectivity, and demonstrate higher confidence in the findings.

