In-Ear Insights: Reviewing AI Data Privacy Basics
Description
In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris discuss AI data privacy and how AI companies use your data, especially with free versions. You will learn how to approach terms of service agreements. You will understand the real risks to your privacy when inputting sensitive information. You will discover how AI models train on your data and what true data privacy solutions exist. Watch this episode to protect your information!
Watch the video here:
Can’t see anything? Watch it on YouTube here.
Listen to the audio here:
https://traffic.libsyn.com/inearinsights/tipodcast-ai-data-privacy-review.mp3
- Need help with your company’s data and analytics? Let us know!
- Join our free Slack group for marketers interested in analytics!
Machine-Generated Transcript
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.
Christopher S. Penn – 00:00
In this week’s In-Ear Insights, let’s address a question and give as close to a definitive answer as we can—one of the most common questions asked during our keynotes, our workshops, in our Slack group, on LinkedIn, everywhere: how do AI companies use your data, particularly if you’re using the free version of a product? A lot of people say, “Be careful what you put into AI. It can learn from your data. You could be leaking confidential data. What’s going on?” So, Katie, before I launch into a tirade that could take hours, let me ask you, as the less technical of the two of us: what do you think happens when AI companies are using your data?
Katie Robbert – 00:43
Well, here’s the bottom line for me: AI is like any other piece of software where you have to read the terms of use and sign their agreement. Great examples are all the different social media platforms. And we’ve talked about this before, I often get a chuckle—probably in a more sinister way than it should be—out of people who will copy and paste that post that says something along the lines of, “I do not give Facebook permission to use my data. I do not give Facebook permission to use my images.”
And it goes on and on, and it says copy and paste so that Facebook can’t use your information. And bless their hearts, the fact that you’re on the platform means that you have agreed to let them do so.
Katie Robbert – 01:37
If not, then you need to have read the terms of use that explicitly say, “By signing up for this platform, you agree to let us use your information.” Then it sort of lists out what it’s going to use and how it’s going to use it, because legally they have to do that. When I was a product manager and we were converting our clinical trial outputs into commercial products, we had to spend a lot of time with the legal teams writing up those terms of use: “This is how we’re going to use only your marketing data. This is how we’re going to use only your registration form data.” When I hear people getting nervous about, “Is AI using my data?” my first thought is, “Yeah, no kidding.”
Katie Robbert – 02:27
It’s a piece of software that you’re putting information into, and if you didn’t want that to happen, don’t use it. This is literally why people build these pieces of software and then give them away to the public for free, hoping that people will put information into them. In the case of AI, it’s to train the models or whatever the situation is. At the end of the day, there is someone at that company sitting at a desk hoping you’re going to give them information they can do data mining on. That is the bottom line. I hate to be the one to break it to you. We at Trust Insights are very transparent. We have forms; we collect your data, and it goes into our CRM.
Katie Robbert – 03:15
Unless you opt out, you’re going to get an email from us. That is how business works. So I guess it was my turn to go on a very long rant about this. At the end of the day, yes, the answer is yes, period. These companies are using your data. It is on you to read the terms of use to see how. So, Chris, my friend, what do we actually—what’s useful? What do we need to know about how these models are using data in the publicly available versions?
Christopher S. Penn – 03:51
I feel like we should have busted out this animation.
Katie Robbert – 03:56
Oh. I don’t know why it yells at the end like that, but yes, that was a “Ranty Pants” rant. I don’t know. I guess it’s just that I get frustrated. I get that there’s an education component. I do. I totally understand that with new technology, there needs to be education.
At the end of the day, it’s no different from any other piece of software that has terms of use. If you sign up with an email address, you’re likely going to get all of their promotional emails. If you have to put in a password, that means you’re probably creating some kind of profile, and they’re going to use that information to create personas and different segments. If you are then putting information into their system, guess what?
Katie Robbert – 04:44
They have to store that somewhere so that they can give it back to you. It’s likely in a database on their servers. And guess who owns those servers? They do. Therefore, they own that data.
So unless they’re doing something that allows you to build a local model—which Chris has covered in previous podcasts and livestreams; you can go to Trust Insights.AI YouTube, go to our “So What” playlist, and find how to build a local model—that is one of the only ways you can fully protect your data from going into their models, because it’s all hosted locally. But it’s not easy to do. So needless to say, Ranty Pants engaged. Use your brains, people.
Christopher S. Penn – 05:29
Use your brains. We have a GPT. In fact, let’s put it in this week’s Trust Insights newsletter. If you’re not subscribed to it, just go to Trust Insights.AI/newsletter. We have a GPT—just copy the terms of service, the whole page, paste it into the GPT, and it will tell you how likely it is that you have given the company permission to train on your data.
With that, there are two different vulnerabilities when you’re using any AI tool. First, the prerequisite golden rule: if you ain’t paying, you’re the product. We warn people about this all the time. Second, the prompts that you give and their responses are the things that AI companies are going to use to train on.
Christopher S. Penn – 06:21
This has different implications for privacy depending on who you are. The prompts themselves, including all the files and things you upload, are stored verbatim in every AI system, no matter what it is, for the average user. So when you go to ChatGPT or Gemini or Claude, they will store what you’ve prompted, documents you’ve uploaded, and that can be seen by another human.
Depending on the terms of service, every platform has a carve-out saying, “Hey, if you ask it to do something stupid, like ‘How do I build this very dangerous thing?’ and it triggers a warning, that prompt is now eligible for human review.” That’s just basic common sense. That’s one side.
Christopher S. Penn – 07:08
So if you’re putting something in there so sensitive that you cannot risk having another human being look at it, you can’t use any AI system other than one that’s running on your own hardware. The second side, which applies to the general public, is what happens with that data once it’s been incorporated into model training. If you’re using a tool that allows model training—and here’s what this means—the verbatim documents and the verbatim prompts are not going to appear in a GPT-5. What a company like OpenAI or Google or whoever will do is add those documents to their library and then train a model on the prompt and the response to say, “Did this user, when they prompted this thing, get a good response?”
Christopher S. Penn – 07:52
If so, good. Let’s then take that document, digest it down into the statistics that it makes up, and that gets incorporated into the rest of the model. The way I explain it to people in a non-technical fashion is: imagine you had a glass full of colored sand—it’s a little rainbow glass of colored sand. And you went out to the desert, like the main desert or whatever, and you just poured the glass out on the ground.
That’s the equivalent of putting a prompt into someone’s training data set. Can you go and scoop up some of the colored sand from the desert that was your sand from the glass? Yes, you can. Is it in the order that it was in when you first had it in the glass? It is not.
Christopher S. Penn – 08:35
So the ability for someone to reconstruct your original prompts and the original data you uploaded from a public model like GPT-5 is extremely low. Extremely low. They would need to know what the original prompt was, effectively, to do that, and if they know that, then you’ve got different privacy problems. But is your data in there? Yes. Can it be used against you by the general public? Almost certainly not. Can the originals be seen by an employee of OpenAI? Yes.
Katie Robbert – 09:08
And