An AI system needs the data relevant to the task you are giving it, and no more. For answering questions it needs your documents and policies; for automating a workflow it needs the records that workflow touches. The rule is to give it the least it needs to do the job, never your whole business by default.
Information current as at 5 July 2026
It is tempting to think an AI system needs everything (all your files, your customers, your history) to be useful. It does not, and giving it everything is both unnecessary and risky. What a system genuinely needs depends entirely on the job you are asking it to do, and the sensible starting point is always the minimum.
There is no single answer to what data AI needs, because it depends entirely on the task. A tool that answers customer questions needs your policies, product details and common answers. A tool that drafts replies needs examples of how you write. A tool that automates an approval needs the records that approval touches. The mistake is treating AI as one hungry thing that wants all your data. It is a set of specific tasks, and each task needs a specific, and usually small, slice of information to do its job well.
Most business data falls into two shapes. There is your knowledge: documents, policies, guides, past answers, the written record of how your business works and what it knows. A tool that answers questions or drafts content draws on this. And there is your structured data: the rows and fields in your systems, such as customers, orders, bookings and inventory. A tool that automates a workflow acts on this. Knowing which kind a task needs tells you what to connect and, just as importantly, what to leave well alone. Most first projects need one kind, not both.
If you have made something and it needs to become real, send it over. We will tell you honestly what it needs to be live, safe and yours, whether that is a quick fix you can do or a proper build. No obligation.
The safe default is to give a system the least it needs to do the job, not the most it could theoretically use. If a tool answers questions about your returns policy, it needs your returns policy, not your entire customer database. Scoping data tightly is safer if something goes wrong, and it usually makes the tool better too, because a focused system with relevant information gives sharper answers than one drowning in everything. Broad access is a liability that rarely improves the result and always increases the risk.
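For anyone who wants to see the idea in concrete terms, here is a minimal sketch of scoping data per task, written in Python. It is an illustration under stated assumptions, not a finished implementation: the task names, the data source names and the functions `sources_for` and `can_read` are all hypothetical, and a real system would enforce the same rule through the access controls of whatever tools you actually use.

```python
# Hypothetical allowlist: each task is mapped to the only data sources it may read.
# Task and source names are illustrative assumptions, not taken from any real system.
TASK_SCOPES: dict[str, frozenset[str]] = {
    "answer_returns_questions": frozenset({"returns_policy"}),
    "draft_customer_replies": frozenset({"example_replies"}),
}


def sources_for(task: str) -> frozenset[str]:
    """Return the sources a task is allowed to read.

    A task that has not been scoped gets nothing, so the default is
    the minimum rather than the whole business.
    """
    return TASK_SCOPES.get(task, frozenset())


def can_read(task: str, source: str) -> bool:
    """True only if the source is explicitly on the task's allowlist."""
    return source in sources_for(task)


if __name__ == "__main__":
    # The returns tool can see the returns policy...
    print(can_read("answer_returns_questions", "returns_policy"))
    # ...but not the customer database, which it was never given.
    print(can_read("answer_returns_questions", "customer_database"))
```

The design choice worth noticing is the default: a task that nobody has scoped can read nothing, so access has to be granted deliberately rather than taken away after the fact.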
Some data deserves particular caution before it goes anywhere near an AI system: personal information about customers or staff, payment details, health or financial records, anything covered by a confidentiality obligation, and anything you would be alarmed to see leak. For these, ask three questions before connecting anything. Does the task genuinely need this specific data, or a smaller subset? Where does the data go once the tool has it, and who can see it? And can you remove access cleanly later? If you cannot answer these confidently, that is a sign to slow down and get the data handling looked at properly before you proceed.
Whether you can name exactly what you want built, or you just know something is leaking, the next step is the same conversation.