
Large language models are a technological marvel: they can generate human-like text, answer queries, and perform a myriad of text-based tasks, making them immensely valuable for businesses, researchers, and everyday users.
However, one of the foremost concerns is that data sent to an LLM might end up in its training set. These models could “remember” or “regurgitate” that sensitive data in future outputs. This is particularly problematic for PII and PHI. Personally Identifiable Information (PII) and Protected Health Information (PHI) refer to any data that can be used to identify a specific individual or their medical history. This might include names, addresses, Social Security numbers, phone numbers, and other sensitive details about a person’s identity or health.
This kind of exposure can inadvertently leak sensitive data into public datasets or privately hosted models, a scenario that is quickly becoming a significant concern for cybersecurity experts and compliance teams worldwide. Addressing these data vulnerabilities becomes even more pressing as AI’s footprint expands.
While providers of large models, such as OpenAI, offer assurances that their systems do not retain specific inputs, the potential leakage of PII/PHI data into these models remains a critical concern.
Positioned strategically on the network edge, Javelin acts as a protective intermediary between applications and the models they interact with. This unique positioning gives Javelin a vantage point from which to scrutinize, filter, and manage the data flowing between the two.
As data travels from applications to the various models they call, Javelin can be configured to analyze and filter out any potential PII or other sensitive data. This ensures that the models never receive data they shouldn’t, safeguarding against inadvertent data exposure.
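To make this concrete, here is a minimal sketch of how an application might route its LLM traffic through the gateway instead of calling the model provider directly. The gateway base URL and authentication header below are placeholders for illustration, not Javelin’s documented endpoint; refer to the Javelin documentation for the exact values.

```python
from openai import OpenAI

# Instead of calling the model provider directly, point the client at the
# gateway sitting between the application and the model.
# NOTE: the base_url and extra header below are illustrative placeholders,
# not Javelin's documented endpoint or auth scheme.
client = OpenAI(
    api_key="YOUR_OPENAI_API_KEY",
    base_url="https://your-javelin-gateway.example.com/v1",   # hypothetical gateway endpoint
    default_headers={"x-api-key": "YOUR_JAVELIN_API_KEY"},    # hypothetical gateway auth header
)

# The request passes through the gateway, where it can be inspected and
# filtered before it ever reaches the model.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this customer note ..."}],
)
print(response.choices[0].message.content)
```

Because the application only changes where it sends requests, the gateway can apply filtering without any changes to the prompt-building code itself.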
Javelin empowers enterprises to customize their data protection measures within its feature suite. Javelin’s Data Loss Prevention (DLP) setting can be toggled per route, so protections can be applied to routes that handle particularly sensitive data. For instance, enabling PII detection on a route named myusers lets you specify strategies to obscure sensitive fields in LLM requests.
These strategies offer varying degrees of concealment.
Now, let’s take a look at this in action.
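Here is a minimal sketch of what a route-level DLP configuration could look like. The schema below (the dlp block, the strategy value, and the admin endpoint) is an illustrative assumption rather than Javelin’s documented API; consult the Javelin docs for the exact route format.

```python
import requests

# Hypothetical route definition for the "myusers" route with PII detection
# enabled. The field names ("dlp", "enabled", "strategy") are illustrative
# assumptions, not Javelin's documented schema.
route_config = {
    "name": "myusers",
    "model": {"provider": "openai", "name": "gpt-4o-mini"},
    "dlp": {
        "enabled": True,      # turn PII/PHI detection on for this route
        "strategy": "mask",   # assumption: obscure detected fields before forwarding
    },
}

# Hypothetical admin endpoint for registering or updating the route.
resp = requests.put(
    "https://your-javelin-gateway.example.com/admin/routes/myusers",
    json=route_config,
    headers={"x-api-key": "YOUR_JAVELIN_ADMIN_KEY"},
    timeout=10,
)
resp.raise_for_status()
```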
Combined with these strategies, you can configure Javelin to enforce restrictions.
For example, you might want to simply inspect LLM requests and reject any calls suspected of containing sensitive information:
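Continuing the hypothetical schema from the sketch above, an “inspect and reject” policy might look like the following; the strategy value and the error shape shown are assumptions for illustration, not Javelin’s documented behavior.

```python
# Illustrative assumption: a "reject" strategy blocks suspect requests
# outright instead of transforming them.
route_config = {
    "name": "myusers",
    "model": {"provider": "openai", "name": "gpt-4o-mini"},
    "dlp": {
        "enabled": True,
        "strategy": "reject",   # assumption: block rather than mask/redact
    },
}

# With this policy, a prompt containing e.g. a Social Security number would
# never be forwarded to the model; the calling application would instead
# receive an error from the gateway (status and message are illustrative):
#   403 {"error": "request rejected: sensitive data detected on route 'myusers'"}
```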
Another powerful feature for leak detection is notifying your security team whenever sensitive data is flagged:
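One way this could be wired up, purely as an assumption-laden sketch: the route carries a notification webhook, and an alert is posted (here, to a Slack incoming webhook) whenever DLP inspection flags a request. The notify field is hypothetical; the Slack webhook call itself uses Slack’s standard payload.

```python
import requests

# Hypothetical route-level notification setting: alert the security team
# whenever DLP flags a request on this route. The "notify" field is an
# illustrative assumption, not Javelin's documented schema.
route_config = {
    "name": "myusers",
    "dlp": {
        "enabled": True,
        "strategy": "inspect",
        "notify": {"webhook": "https://hooks.slack.com/services/T000/B000/XXXX"},
    },
}

# The same alert an operator might send from their own tooling when the
# gateway reports a detection (Slack incoming webhooks accept a simple
# {"text": ...} payload):
def alert_security_team(route: str, finding: str) -> None:
    requests.post(
        "https://hooks.slack.com/services/T000/B000/XXXX",  # placeholder webhook URL
        json={"text": f"PII detected on route '{route}': {finding}"},
        timeout=10,
    )

alert_security_team("myusers", "possible SSN in request payload")
```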
Ready to move your LLM applications to production? Make sure your data is safe.
At its core, Javelin’s architecture embraces a zero-trust security philosophy, helping enterprises move their LLM applications from prototype to production with robust policy and security guardrails around model use. It can operate as a security firewall at the network edge, protecting against data leaks. We are also working on advanced algorithms and real-time monitoring capabilities to detect and block suspicious data transmissions, further bolstering this protective shield.
Learn more today!