

  • Privacy and Security when using LLMs

Security and Privacy in the Use of LLMs

With the rise of Large Language Models (LLMs) such as ChatGPT, the need for clear policies around security and privacy is growing. These powerful tools can offer enormous benefits, but they also present risks that must not be ignored. In this article, I discuss key considerations and concrete measures to remain secure and compliant.

Privacy: How to Stay Compliant?

Privacy is a fundamental right, and legislation like the GDPR (General Data Protection Regulation) imposes strict requirements on the processing and storage of personal data. Under the GDPR, personal data of Europeans may only be processed outside the EU if appropriate protection mechanisms are in place. This means organizations must know exactly where their AI provider's servers are located.

Practical Example: Suppose an employee summarizes meeting minutes using an LLM like ChatGPT. If these minutes include names and contact information, they cannot simply be shared with a standard AI tool hosted on servers outside Europe. The solution? Anonymize the data, or use a service hosted in Europe, such as an EU-hosted Azure OpenAI or Google Gemini deployment.

Measures to Stay Compliant:
• Use AI tools with servers within the EU.
• Verify that providers comply with GDPR guidelines.
• Share no personal data unless strictly necessary.

By following these steps, you remain within legal boundaries and protect sensitive data.

Security: Protect Your Data

In addition to privacy, security plays an equally important role in using LLMs. Unintentional data leaks or improper usage can have serious consequences.

Preventing Data from Being Used for Training

Many AI providers use input data for model training. Organizations must ensure their data is not unintentionally stored or reused.
• Disable options that allow data use for training.
• Carefully review the AI provider's terms and conditions.

Use of Open-Source Models

For maximum control, you can opt for open-source LLMs hosted locally on your own servers. This ensures that data never leaves your organization. Models like LLaMA 3 or other open-source alternatives can be installed locally, allowing confidential documents to be processed even if the computer is offline (a minimal local setup is sketched below, after the synthetic-data example).

Data Anonymization and Synthetic Data

You can also avoid sending "real" data to an LLM by using anonymized or synthetic data:
• Anonymization: Remove all personal identifiers before sharing data with an LLM.
• Synthetic Data: Use fictitious data that has the same structure as the real data.

Example: A data analyst wants to use an LLM to generate a chart for an internal presentation. Instead of real client data, the analyst creates a synthetic dataset with the same structure. The LLM generates the code for the chart, and the analyst then runs that code on the real data. This eliminates the risk of exposing sensitive information; the first sketch below walks through this workflow.
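The sketch below makes the synthetic-data workflow concrete. Everything in it is fictitious: the column names, value ranges, and ten synthetic rows are invented for illustration, and only the resulting prompt would ever be shared with an LLM.

```python
# A minimal sketch of the synthetic-data workflow described above.
# The schema and values are made up for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Fictitious dataset with the same structure as the real client data
synthetic = pd.DataFrame({
    "client_id": [f"C{i:04d}" for i in range(1, 11)],
    "region": rng.choice(["North", "South", "East", "West"], size=10),
    "revenue": rng.integers(10_000, 250_000, size=10),
})

# Only this synthetic sample is shared with the LLM, never the real data
prompt = (
    "Write Python (matplotlib) code that plots total revenue per region "
    f"for a DataFrame with columns {list(synthetic.columns)}. "
    f"Example rows:\n{synthetic.head(3).to_csv(index=False)}"
)
print(prompt)  # paste into the LLM; run the returned code on the real data
```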
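And for the local, offline option discussed in the open-source section above, here is a minimal sketch. It assumes an Ollama installation with a Llama 3 model already pulled (ollama pull llama3); the confidential document is of course a placeholder.

```python
# A minimal sketch of keeping data in-house with a locally hosted model.
# Assumes Ollama is running locally and "llama3" has been pulled.
import ollama

confidential_text = "Meeting notes: (internal document, never sent off-site)"

response = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"Summarize the following document:\n\n{confidential_text}",
    }],
)
print(response["message"]["content"])  # runs entirely on local hardware
```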
Additional Measures:
• Ensure encryption during data transfer.
• Limit access to LLM tools to authorized users.
• Conduct regular audits to assess compliance and security.

Transparency from AI Providers: Building Trust Through Clarity

Trust in AI starts with transparency. A reliable provider openly communicates about:
• Data Processing: How and why data is used.
• Storage Locations: Where your data is physically stored.
• Security: Measures taken, such as encryption and access control.

Practical Tips for Choosing a Provider:
1. Select AI providers that comply with local laws and regulations, such as the GDPR.
2. Verify that the provider performs independent security audits and shares the results transparently.
3. Explicitly ask whether your data is used for model training and how to disable this.

User Education: Awareness of Data Sharing

No security policy is effective without well-informed users. User education plays a crucial role in preventing unintended risks.

What Users Need to Know:
1. What to Share and Not Share: Users must understand that sensitive data, such as names or financial information, should never be shared with an LLM without precautions.
2. Using Synthetic Data: Train employees to use synthetic data for testing or tasks involving LLMs.
3. Recognizing Risks: Teach users to identify unsafe situations, such as tools requesting more data than necessary.

Example of User Awareness: An employee summarizing client contracts with an LLM must first anonymize the documents and ensure no confidential information is shared.

Conclusion

LLMs offer significant opportunities to increase productivity and efficiency, but they require careful handling. By implementing privacy and security measures, collaborating with transparent providers, and investing in user education, organizations can fully benefit from LLMs without compromising security. Focusing on the right models, data anonymization, synthetic data, and aware users ensures that LLMs can be used securely.

  • AI Applications You Can Build Today

With the rise of LLMs (Large Language Models), AI has become accessible to everyone. You no longer need vast amounts of data or computing power to train an AI; you can now leverage pre-trained LLMs to build your own AI applications.

In practice, I've noticed that while many organizations are interested, they're often hesitant to take the first step. This hesitation is usually tied to perceived risks associated with AI. For instance, media attention has focused heavily on hallucinating LLMs: models that generate perfect-looking text but produce complete nonsense. Other concerns include fears that data fed into LLMs might be used for unintended purposes, or that personal data might leave Europe and violate the GDPR. These fears are partially valid: there isn't much experience with LLMs yet, and improper implementation can indeed lead to such issues. However, there are already AI applications we can build today using LLMs that are both reliable and trustworthy.

To understand how to use LLMs effectively, we need to revisit their purpose. This begins with the concept of AI, or Artificial Intelligence. In computer science, we consider a system intelligent when it can perform tasks that humans do naturally or learn from a young age, such as recognizing objects in images or understanding and generating human language. While these tasks are simple for humans, they were nearly impossible for computers until a few years ago.

To train computers in language skills, models were developed and trained on vast amounts of text from the internet. Using neural networks (a topic for another article), these models eventually became capable of understanding and generating human language. We experience this progress daily, whether it's talking to a navigation system in the car or giving commands to an assistant like Siri.

However, as these models gained language skills, they also acquired "knowledge" from the texts they processed. For example, they likely know Amsterdam is the capital of the Netherlands and can answer that question correctly. But accuracy diminishes for queries where the sources disagree, such as the capital of Germany: the LLM may have read texts stating Berlin is the capital and others claiming Bonn is, so its answer may not always be reliable.

For this reason, when building LLM applications today, I recommend using only their language processing capabilities and avoiding reliance on their factual knowledge. By adhering to this limitation, we can already create many reliable applications.

For instance, we can build chatbots that use LLMs to answer questions based on documents linked to the chatbot. The LLM can be instructed to refrain from answering if the information isn't in the documents (see the prompt sketch below). Compared to traditional chatbots, LLM-based chatbots excel because they understand the intent behind a question and provide relevant answers, whereas traditional chatbots only work when a question matches predefined question-answer pairs exactly.

An excellent example of success with an AI-powered chatbot is Klarna. Their chatbot not only improves customer satisfaction by providing better answers faster (reducing response time from 11 minutes to just 2), it also saves Klarna €40 million annually by performing the work of 700 employees.

Other use cases for LLMs that leverage only their language capabilities include summarizing documents and extracting structured data from them.
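As an illustration, here is a minimal sketch of such a document-grounded chatbot prompt. It assumes the OpenAI Python client and an illustrative model name; any chat-completion API, including an EU-hosted deployment, would work the same way.

```python
# A minimal sketch of a document-grounded chatbot: the model may only
# answer from the supplied context, and must refuse otherwise.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_from_documents(question: str, context: str) -> str:
    """Ask the LLM to answer ONLY from the supplied documents."""
    system = (
        "Answer the user's question using only the context below. "
        "If the answer is not in the context, reply exactly: "
        "'I cannot answer this based on the available documents.'\n\n"
        f"Context:\n{context}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        temperature=0,  # keep answers deterministic and grounded
    )
    return response.choices[0].message.content
```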
At EPSA, for instance, we've developed an application that analyzes invoices and assigns them to the correct expense categories, as well as a tool for contract analysis; a generic sketch of this kind of structured extraction follows below. By using LLMs in this way, we can create trustworthy AI applications today that deliver real value.
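For illustration, here is a hedged sketch of the extraction pattern. The categories, prompt, and model name are invented for this example; it is a generic sketch, not the actual EPSA tool.

```python
# A generic sketch of extracting structured data from an invoice with an
# LLM; the expense categories and model name are assumptions.
import json
from openai import OpenAI

client = OpenAI()

CATEGORIES = ["travel", "software", "office supplies", "consulting"]

def categorize_invoice(invoice_text: str) -> dict:
    prompt = (
        "Extract the supplier, total amount, currency, and the best "
        f"matching expense category from this list: {CATEGORIES}. "
        "Respond with JSON only, using the keys: "
        "supplier, amount, currency, category.\n\n" + invoice_text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request valid JSON back
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```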

  • AllEyesOnAutomation

The AllEyesOnAutomation seminar, organised by hyperautomation consultancy Roboyo, took place in Berlin, featuring presentations from companies that use AI applications in their daily operations. Several interesting examples were shared:

• Permira Investment Fund uses AI to analyze potential investments. They have implemented a training program in which all employees learn how to craft effective prompts to obtain better answers, and they have developed a set of example prompts for common tasks that employees can use as a starting point. This approach not only improves their analysis of potential investments and portfolio companies but also helps minimize bias: by leveraging LLMs' ability to detect sentiment in texts, sentiment analysis is integrated into their decision-making process.

• Portuguese bank Millennium BCP presented a case on end-to-end process automation. Previously reliant on OCR and RPA, they have now replaced these tools with LLMs and APIs, providing greater flexibility and efficiency.

• Software company Rainbird has developed explainable AI that uses graph databases to store knowledge. This enables a chatbot that not only provides accurate answers but also explains why those answers are correct (a toy sketch of the idea appears at the end of this article). The technology is particularly useful for complex policy questions, such as government regulations and insurance policies, where straightforward answers may not be readily available, and it ensures that responses comply with established policies.

A key takeaway from these examples is that the frontrunners in generative AI (genAI) did not gain their advantage with the recent rise of genAI alone. They began much earlier, organizing their data and automating processes as much as possible. This strong foundation enables them to achieve significant improvements with AI in a relatively short timeframe.
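To make the graph-based idea tangible, here is a toy sketch of question answering over a handful of stored facts, where every answer carries the chain of facts that supports it. This is a generic illustration of the concept, not Rainbird's actual technology, and the facts are invented.

```python
# Toy illustration: knowledge stored as (subject, relation) -> value
# triples, and answers returned together with their supporting facts.
FACTS = {
    ("policy_A", "covers"): "water damage",
    ("water damage", "requires"): "a report within 14 days",
}

def answer(subject: str, relation: str):
    """Look up a fact and keep the supporting edge for explanation."""
    value = FACTS.get((subject, relation))
    if value is None:
        return None, []
    return value, [f"{subject} --{relation}--> {value}"]

# Chain two lookups and surface the full reasoning trace
coverage, t1 = answer("policy_A", "covers")
requirement, t2 = answer(coverage, "requires")
print(f"policy_A covers {coverage}; this requires {requirement}.")
print("Because:", " | ".join(t1 + t2))
```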
