Hacker News

Are they even sure the AI accessed the content that second time? LLMs are really good at making up shit. I have tested this by asking various LLMs to scrape data from my websites while watching the access logs. Many times they don't, and just rely on some sort of existing data or spout a bunch of BS. Gemini is especially bad about this. I have not used Copilot myself, but my experience with other AI makes me curious about this.


This is it. M365 uses RAG on your enterprise data that you allow it to access. It's not actually accessing the files directly in the cases he provided. It's working as intended.


If this is indeed how Copilot is architected, then it needs clear documentation -- that it is a non-audited data store.

But how then did MS "fix" this bug? Did they stop pre-ingesting, indexing, and caching the content? I doubt that.

Pushing (defaulting) organizations to feed all their data to Copilot and then not providing an audit trail of data access on that replica data store -- feels like a fundamental gap that should be caught by a security 101 checklist.


How would you audit that?


If that's the case, then as noted in the article, the 'as intended' is probably violating liability requirements around various things.


Correct. It is precisely the fact that a user can ask about someone's medical history (or whatever else) without that access being reported that would put any heavily audited system in violation. LLM summaries break the compliance model.


You control what it can and can't see. If you include PII and medical records, that's your fault, not MS's.


That’s fair - unless they’re marketing the bot as compliant.



