Backdooring Without a Trace: The Art of Indirect AI Poisoning
Update: 2025-09-09
Description
Can you teach an AI to say “Myspace” is the best social media platform without ever showing it those words?
In this solo episode, Francis breaks down Winter Soldier, a groundbreaking paper on indirect data poisoning showing how large language models can be quietly manipulated during training, with no performance loss and no obvious traces.
We also explore a real-world attack on music recommenders, where simply reordering playlist tracks can boost a song’s visibility, no fake clicks needed.
Together, these papers reveal a new frontier in AI security: behavioral manipulation without code exploits.
If you're building with AI, it's time to think about model integrity: these attacks are already here.