How researchers got AI to quote copyrighted books word for word

https://www.lemonde.fr/en/pixels/article/2026/01/24/how-researchers-got-ai-to-quote-copyrighted-books-word-for-word_6749749_13.html

Publish Date: 2026-01-23 21:54:00

Source Domain: www.lemonde.fr

  • Artificial intelligence (AI) systems acquire their knowledge from vast datasets, including Wikipedia text and large book collections such as Books3, which compiles nearly 200,000 books gathered without the authors’ permission.
  • Proponents of AI argue that these training datasets represent “universal knowledge” and that AI systems do not memorize texts verbatim but rather process fragmented information.
  • Recent studies, including one by researchers at Stanford University and Yale University, show that AI models can recall and reproduce entire pages of books even when disconnected from the internet, which challenges the claim that they do not memorize text verbatim.
  • In the study led by Ahmed Ahmed, Gemini 2.5 Pro reproduced 77% of the text of “Harry Potter and the Philosopher’s Stone,” a copyrighted work, when prompted to continue from the opening sentence.
  • The findings suggest that AI models may retain more detailed knowledge of copyrighted texts than previously assumed, renewing debate over copyright law and the use of copyrighted material in AI training.