Large language models such as ChatGPT can produce increasingly realistic text, but the accuracy and integrity of using these models in scientific writing remain unknown. To test whether artificially generated texts can be detected, the authors gathered ten research abstracts from each of five high-impact-factor medical journals (n=50) and asked ChatGPT to generate new abstracts based on their titles and journals. They then evaluated the generated abstracts using an artificial intelligence (AI) output detector and a plagiarism detector, and asked blinded human reviewers to distinguish original abstracts from generated ones.
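
The detection step can be approximated with the publicly available RoBERTa-based GPT-2 output detector on Hugging Face, which is similar in spirit to the detector used in the study. The sketch below is a minimal, assumption-laden example: the model identifier and the "Real"/"Fake" label names are properties of that public checkpoint, not necessarily the authors' exact setup.

```python
from transformers import pipeline

# Publicly available GPT-2 output detector (a RoBERTa classifier);
# the model id and labels are assumptions about this Hugging Face
# checkpoint, not details taken from the paper.
detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

abstract_text = "Background: ..."  # paste a full abstract here

# Long abstracts must be truncated to the model's 512-token limit.
result = detector(abstract_text, truncation=True)[0]

# Convert the winning label's score into P(AI-generated).
p_fake = result["score"] if result["label"] == "Fake" else 1.0 - result["score"]
print(f"Probability AI-generated: {p_fake:.2%}")
```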

All ChatGPT-generated abstracts were written clearly, but only 8% correctly followed the specific journal’s formatting requirements. Most generated abstracts were flagged by the AI output detector, with a median score of 99.98% (higher scores indicating a greater likelihood of AI generation), compared with a median of 0.02% for the original abstracts. The AUROC of the AI output detector was 0.94. Generated abstracts scored as completely original on the plagiarism detector (100% originality). Generated abstracts reported patient cohort sizes similar to those in the original abstracts, though the exact numbers were fabricated. When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly flagged 14% of original abstracts as generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, but that the generated abstracts were vaguer and had a formulaic feel to the writing.1
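
For context, AUROC summarizes how well the detector's scores separate generated from original abstracts across all possible decision thresholds. Below is a minimal scikit-learn sketch of that calculation; the labels and scores are hypothetical placeholders, not the study's data.

```python
from sklearn.metrics import roc_auc_score

# Ground truth: 1 = ChatGPT-generated, 0 = original abstract.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical detector outputs, i.e., P(generated) for each abstract.
scores = [0.9998, 0.97, 0.85, 0.40, 0.0002, 0.01, 0.05, 0.55]

# AUROC is the probability that a randomly chosen generated abstract
# outscores a randomly chosen original one; the study reports 0.94
# over its 100 abstracts.
print(f"AUROC: {roc_auc_score(labels, scores):.2f}")
```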

Read the full text here: https://doi.org/10.1101/2022.12.23.521610

References

  1. Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Wang, F., & Pearson, A. T. (2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv. https://doi.org/10.1101/2022.12.23.521610

Disclaimers

  • The material in these reviews is drawn from public, open-access sources and is meant for educational and informational purposes only
  • Any personal opinions expressed are solely those of the author(s) and are not intended to represent the position of any organization(s)
  • No official support by any organization(s) has been provided or should be inferred