GPT Lab has a robust privacy architecture that ensures sensitive and personal information is not sent to third-party LLM providers. Here are the key points:
GPT Lab is designed to be self-hosted, meaning all data processing and storage happens within our own infrastructure. This keeps both our data and yours completely isolated and prevents any leakage to external parties.
GPT Lab supports running open-source LLMs like Llama, Mixtral and Falcon locally on our own hardware (GPU or CPU). This enables a truly air-gapped setup in which no data ever leaves our premises. Local inference is supported through Ollama, GPT4All and FastChat servers.
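To make the air-gapped setup concrete, here is a minimal sketch of querying a locally hosted model through an Ollama server using only the Python standard library. The endpoint shown is Ollama's default; the model name `llama3` is an assumption for illustration, not a GPT Lab default.

```python
# Minimal sketch: send a prompt to a locally running Ollama server, so the
# prompt data never leaves the machine. Host and model name are assumptions.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_request(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """POST the prompt to the local Ollama server and return the completion."""
    payload = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the request goes to `localhost`, no third-party network hop is involved; the same pattern applies to GPT4All and FastChat, which expose comparable local HTTP APIs.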
GPT Lab can also integrate with cloud LLM providers such as Replicate, TogetherAI and HuggingFace. Under their data-processing terms, these providers do not retain prompt data, which helps keep your information private.
In our self-hosted community version, personally identifiable information (PII) is detected and anonymized before ingestion into the on-premise vector database. We also incorporate an "Anonymize Scanner" (based on LLM Guard and/or Microsoft Presidio) that acts as a digital guardian: it screens user prompts and removes sensitive data before any prompt is sent to a third-party cloud LLM provider. This step is not necessary when the LLM is self-hosted.
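The anonymization step can be pictured as follows. In production this is handled by Microsoft Presidio or LLM Guard; the regex patterns below are a deliberately simplified stand-in to show the idea of replacing detected PII with typed placeholders before the prompt leaves the premises.

```python
# Simplified illustration of prompt anonymization. Real deployments use
# Microsoft Presidio or LLM Guard; these regexes are a toy stand-in, not
# the production scanner.
import re

# Hypothetical detection patterns for two common PII types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}


def anonymize_prompt(prompt: str) -> str:
    """Replace detected PII with typed placeholders so the prompt can be
    forwarded to a third-party cloud LLM provider without exposing it."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt
```

For example, `anonymize_prompt("Mail me at jane.doe@example.com")` yields `"Mail me at <EMAIL>"`. Presidio works the same way in principle, but with NLP-based recognizers for many more entity types and configurable anonymization operators.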
In summary, GPT Lab's self-hosted nature, local LLM support, cloud integration options without data retention, and anonymization of personal information provide a robust privacy architecture to prevent sensitive data exposure. This aligns with the "Privacy by Design" principle of EUDPR/GDPR and IT security best practices.
If you want to learn more about the privacy architecture, please reach out to Antonio De Marinis (DIS1 - GPT-Lab Solution Architect).