AI & ML Companies · Cybersecurity Training

Samsung. OpenAI. DeepSeek. NVIDIA. $1.5B+ in Breach Costs — None Started With a Hack.

AI companies hold some of the most valuable intellectual property in the world (model weights, training data, proprietary architectures), and employees are the attack surface. A single engineer pasting code into an unsanctioned AI tool can expose IP worth hundreds of millions. Our live training closes the human gap before the incident and supports NIST AI RMF, EU AI Act Art. 4 and Art. 26(2), SOC 2, and your enterprise customers' security questionnaires.

Train Your ML Team → Executive Briefing →

$1.5B+

Combined anchor breach costs

4

Named breach case studies

3

Training drills per session

5

Compliance frameworks covered

The Incidents That Define the AI/ML Threat Landscape

Each breach below exposes a distinct human-layer failure — and a training gap your team almost certainly shares.

⚠ CANONICAL SHADOW AI INCIDENT

Samsung — March 2023, Confidential Semiconductor Source Code

Shadow AI · No Policy · Internal Disclosure

Three separate incidents within weeks of Samsung lifting its internal ban on generative AI: engineers pasted confidential semiconductor fab source code into ChatGPT for debugging, uploaded meeting notes from restricted design reviews, and shared proprietary chip yield-optimization logic. Each employee acted independently. None believed they were violating policy — because no AI-specific policy existed. Samsung responded by building internal LLMs, but the IP had already left the building. The root cause was the absence of data classification guidance for AI tool inputs.

Key lesson: Consumer AI tiers often use inputs for model training by default unless the user opts out. Without an AI Acceptable Use Policy and employee training, every engineer is one paste event away from an IP breach.

⚠ INSIDER & EXTERNAL THREAT

OpenAI — 2023 Internal Forum Breach & 2024 Credential Attacks

Internal Forum Exfiltration · Credential Phishing · Nation-State Concern

In early 2023, a threat actor gained access to an internal OpenAI discussion forum used by employees and researchers and exfiltrated details about the company's AI research approaches and internal technical discussions. OpenAI did not disclose the incident publicly; it was first reported by The New York Times in July 2024. According to that reporting, an employee raised concerns that the attacker might be linked to a foreign adversary seeking to steal AI technology, and the company did not notify federal law enforcement. Separately, OpenAI reportedly faced multiple credential-based attacks through 2024, aimed at employees with access to model weights, research pipelines, and training infrastructure.

Key lesson: AI model companies are explicit nation-state targets. Internal communication platforms are as valuable as source code repositories — they describe what you're building, how, and why. Access controls and insider-threat training must extend to research tooling, not just production systems.

⚠ EXPOSED PRODUCTION DATABASE

DeepSeek — January 2025, Publicly Accessible ClickHouse Database

Misconfigured Database · Chat Logs · API Keys · PII Exposure

Security researchers at Wiz discovered a publicly accessible ClickHouse database belonging to DeepSeek, the Chinese AI lab, containing over one million log entries. The exposed data included plaintext chat history from user interactions, API keys and authentication tokens, backend service metadata, and system prompt configurations. The database required no authentication to query. The exposure was discovered within days of DeepSeek's R1 model release — meaning the company was operating a production AI service with an unauthenticated database accessible from the public internet. DeepSeek secured the database after disclosure. The incident prompted multiple governments to restrict DeepSeek deployments on government devices.

Key lesson: AI infrastructure moves fast. Speed-to-market culture creates the same misconfiguration risks that plagued early cloud adoption. Every engineer deploying AI services needs explicit training on database authentication hygiene, API key rotation, and infrastructure security review before any production launch.
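As a rough illustration of the pre-launch review this lesson calls for, the sketch below checks whether a ClickHouse HTTP endpoint will run a trivial query with no credentials supplied. It assumes the database's HTTP interface is enabled (commonly on port 8123) and that you only point it at hosts you own. The function names, the `fetch` hook, and the example hostname are hypothetical, and this is not a reconstruction of how Wiz performed its research.

```python
import urllib.error
import urllib.parse
import urllib.request


def http_get(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (status_code, body) for a GET request; HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def answers_without_auth(base_url: str, fetch=http_get) -> bool:
    """True if the endpoint executes `SELECT 1` when no credentials are sent."""
    url = f"{base_url.rstrip('/')}/?query={urllib.parse.quote('SELECT 1')}"
    try:
        status, body = fetch(url)
    except OSError:
        # Connection refused, DNS failure, or timeout: nothing answered.
        return False
    return status == 200 and body.strip() == "1"


if __name__ == "__main__":
    # Hypothetical internal hostname; replace with a host you are authorized to test.
    host = "http://clickhouse.staging.example.internal:8123"
    print("unauthenticated query accepted:", answers_without_auth(host))
```

A check like this fits naturally into a launch checklist or CI job; a `True` result means the endpoint should not ship until authentication and network restrictions are in place.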

⚠ AI INFRASTRUCTURE SUPPLY CHAIN

NVIDIA — February 2022, 1TB Exfiltrated by Lapsus$

Credential Compromise · GPU Source Code · 71,000 Employee Records · Extortion

Lapsus$, an extortion group with a track record of social engineering and credential theft, compromised NVIDIA's network via stolen VPN credentials. Approximately 1TB of data was exfiltrated including GPU driver source code, proprietary DLSS (Deep Learning Super Sampling) AI model architecture code, hardware schematics, and credentials for over 71,000 NVIDIA employees. Lapsus$ demanded NVIDIA remove hash-rate limiting from consumer GPUs. NVIDIA refused. The attackers subsequently leaked the stolen code publicly. The breach exposed that foundational AI infrastructure vendors — GPU manufacturers, cloud ML platform providers, model hosting services — are premium targets whose compromise cascades across the entire AI supply chain.

Key lesson: If your AI stack depends on any vendor's hardware, drivers, or pre-trained components, a compromise at that vendor affects you. AI supply chain risk requires the same third-party security scrutiny applied to SaaS vendors — including vetting credentials hygiene, incident notification obligations, and access scope.

Four Threat Vectors Unique to AI & ML Organizations

Traditional security awareness training wasn't built for teams that deploy language models, maintain training pipelines, and handle novel data types. These four vectors require AI-specific training.

🧠

Model IP Theft & Weight Exfiltration

Model weights trained on proprietary datasets represent years of compute investment. Adversaries — including nation-state actors — target engineers with access to model repositories, MLflow registries, and Hugging Face private spaces. A single compromised developer credential can expose the entire model lineage. Training covers: privileged access hygiene for model repositories, recognition of model-exfiltration phishing lures, and secure handoff protocols for model weights.

☁️

Training-Data Poisoning & Pipeline Compromise

Data poisoning attacks inject adversarial samples into training pipelines to corrupt model outputs — shifting classification boundaries, embedding backdoors, or reducing safety filter efficacy. These attacks are often silent: the poisoned model passes all standard evaluations until the trigger condition is met. Training covers: data provenance verification, third-party dataset vetting, pipeline access controls, and recognition of anomalous evaluation divergence that may signal poisoning.
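One simple form of data provenance verification is comparing dataset files against a manifest of known-good hashes before each training run. The sketch below is a minimal version of that idea; it assumes the manifest is a mapping of relative path to SHA-256 digest that your team produced when the dataset was approved. The function names and report layout are illustrative, and a hash check only detects changes after approval, not samples that were already poisoned at the source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` with a {relative_path: sha256} manifest."""
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    report = {
        "missing": [],      # listed in the manifest but absent on disk
        "modified": [],     # present, but the hash differs from the approved one
        "unexpected": sorted(on_disk - set(manifest)),  # on disk but never approved
    }
    for rel_path, expected in sorted(manifest.items()):
        if rel_path not in on_disk:
            report["missing"].append(rel_path)
        elif sha256_of(root / rel_path) != expected:
            report["modified"].append(rel_path)
    return report
```

A pipeline could refuse to start training whenever any of the three lists is non-empty, which turns silent tampering into a visible failure.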

🔑

API Key Leakage & Credential Hygiene

Engineers routinely handle API keys for OpenAI, Anthropic, Cohere, AWS Bedrock, Azure OpenAI, and Hugging Face. Keys accidentally committed to public repositories, embedded in Jupyter notebooks, or shared in Slack messages create immediate exposure and are often exploited within minutes of public indexing. Training covers: pre-commit secret scanning, environment variable hygiene, key rotation cadence, blast-radius containment when a key is confirmed leaked, and key scope minimization for CI/CD pipelines.
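To make pre-commit secret scanning concrete, here is a minimal sketch of the core idea: match each line of a file against a few suspicious patterns before it is committed. The two patterns are illustrative assumptions only; real key formats vary by provider and change over time, so purpose-built scanners remain the better choice in practice. The function and pattern names are hypothetical.

```python
import re

# Illustrative patterns only; not an authoritative list of provider key formats.
PATTERNS = {
    # AWS access key IDs are commonly documented as "AKIA" plus 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Any name ending in api_key/secret/token assigned a long literal value.
    "generic_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{20,}"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    sample = (
        "import os\n"
        '# OPENAI_API_KEY = "abcdefghijklmnopqrstuvwxyz123456"\n'
        'key = os.environ["OPENAI_API_KEY"]\n'
    )
    for lineno, name in find_secrets(sample):
        print(f"line {lineno}: possible secret ({name})")
```

Note that the commented-out assignment is flagged while the `os.environ` lookup is not: reading a key from the environment is the hygiene pattern, and a literal value left in a comment is the leak.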

👤

Shadow AI & Unsanctioned Tool Adoption

Research from Cyberhaven (2024) found that 85% of employees use at least one AI tool not approved by IT. In AI companies, where engineers and researchers have higher tool autonomy and faster adoption cycles, the shadow AI problem is acute. Consumer AI tiers default to training on user inputs unless explicitly disabled — meaning confidential code, internal architectures, and customer data shared with unsanctioned tools may enter third-party model training pipelines. Training covers: approved AI tool taxonomy, data classification framework for AI inputs, and AUP enforcement without blocking legitimate productivity workflows.

Compliance Frameworks That Apply to AI & ML Companies

Your enterprise customers' security questionnaires are getting harder. Here's how SecurEveryone's AI/ML training maps to the frameworks they're asking about.

| Framework | Key Requirement | Training Coverage |
|---|---|---|
| NIST AI RMF 1.0 (GV-1.1, GV-2.2, MAP-1.6) | Workforce AI risk awareness, organizational AI risk policies, and ongoing measurement of human-AI interaction risks | Full coverage: documented evidence package included |
| EU AI Act (Art. 4, Art. 26(2), Annex III) | AI literacy for staff (Art. 4, applicable since February 2025) and trained, competent human oversight of high-risk AI systems (Art. 26(2), applicable from August 2026) | Full coverage: Art. 4 and Art. 26(2) training documentation |
| SOC 2 Type II (CC6.1, CC6.2, CC7.3, CC9.2) | Logical access controls, vendor risk management, change management, and security awareness training documentation | Full coverage: SOC 2 CC6/CC7 evidence ready |
| ISO 27001:2022 (A.6.3, A.8.7, A.8.24) | Information security awareness training, malware protection, and use of cryptography policies, extended to AI systems and tools | Full coverage: ISO 27001 A.6.3 documented |
| GDPR / CCPA (Art. 32, Art. 25; Cal. Civ. Code §1798.150) | Training on data minimization, lawful basis for AI processing of personal data, and data protection by design for AI systems handling PII | Full coverage: privacy-by-design for AI pipelines |

Three Training Drills — Built for AI & ML Teams

Each drill reconstructs a real incident, walks participants through the decision points, and ends with a documented takeaway your security team can cite in audits.

DRILL 1

Samsung Shadow AI Tabletop — "What Can I Paste Here?"

Participants are given a realistic engineering scenario: a production bug at 11 PM, ChatGPT available, the internal LLM too slow. They work through the decision tree in real time — what data classification level does this code carry? Does the tool have a DPA? Is there a training exclusion in place? What's the blast radius if this gets indexed?

Takeaways:

  • Data classification matrix for AI tool inputs (4 tiers: Public / Internal / Confidential / Restricted)
  • Approved AI tool registry with DPA and training-exclusion status for each tool
  • Written Acceptable Use Policy template ready for internal adoption
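
The first two takeaways can be combined into a single paste-or-not check. The sketch below is one possible encoding, assuming the four tiers named above and a registry that records DPA and training-exclusion status per tool. The `max_tier` ceiling, the tool names, and the decision order are hypothetical choices for illustration, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Tool:
    name: str
    has_dpa: bool
    training_excluded: bool
    max_tier: Tier  # hypothetical: highest tier the security team allows for this tool


def may_paste(tool: Optional[Tool], data_tier: Tier) -> tuple[bool, str]:
    """Decide whether data of `data_tier` may be entered into `tool` (None = unregistered)."""
    if data_tier == Tier.PUBLIC:
        return True, "public data"
    if tool is None:
        return False, "tool not in approved registry"
    if not (tool.has_dpa and tool.training_excluded):
        return False, "missing DPA or training exclusion"
    if data_tier > tool.max_tier:
        return False, f"{data_tier.name} exceeds tool ceiling {tool.max_tier.name}"
    return True, "within approved tier"


# Hypothetical registry entry for an internal assistant.
REGISTRY = {"internal-llm": Tool("internal-llm", True, True, Tier.CONFIDENTIAL)}

if __name__ == "__main__":
    # The 11 PM scenario: confidential code, consumer chatbot not in the registry.
    print(may_paste(REGISTRY.get("consumer-chatbot"), Tier.CONFIDENTIAL))
```

Returning a reason alongside the decision matters in practice: an engineer who is told why a paste is blocked is more likely to reach for the approved tool than to work around the control.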
DRILL 2

API Key Exfiltration Simulation — "The Commit That Didn't Look Dangerous"

This drill reconstructs a real-world incident pattern: a developer commits a Jupyter notebook to a public GitHub repo with an API key left in a commented-out environment variable assignment. A scanner picks it up within 14 minutes. By the time the developer reverts the commit, the key has been scraped and used for 47 API calls against the associated cloud account. Reverting alone does not help, because the key remains in the repository history. Participants walk through the detection, containment, rotation, and blast-radius assessment workflow.

Takeaways:

  • Pre-commit hook configuration for secret detection (git-secrets, detect-secrets, TruffleHog)
  • Key rotation runbook with 15-minute containment SLA
  • Least-privilege scoping checklist for AI platform API keys in CI/CD pipelines
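
The containment and blast-radius steps reduce to simple arithmetic over timestamps. The sketch below assumes you can export the leaked key's usage log as (timestamp, source IP) pairs and that the 15-minute SLA is measured from detection to revocation; both are assumptions for illustration, and the function and field names are hypothetical.

```python
from datetime import datetime, timedelta

CONTAINMENT_SLA = timedelta(minutes=15)  # the runbook target named in the takeaways


def assess_leak(leaked_at, detected_at, revoked_at, api_calls):
    """Summarize exposure for one leaked key.

    api_calls: iterable of (timestamp, source_ip) pairs made with the key.
    """
    # Only calls between the leak and the revocation count toward the blast radius.
    exposure = [(ts, ip) for ts, ip in api_calls if leaked_at <= ts < revoked_at]
    containment = revoked_at - detected_at
    return {
        "containment": containment,
        "sla_met": containment <= CONTAINMENT_SLA,
        "calls_in_window": len(exposure),
        "distinct_sources": sorted({ip for _, ip in exposure}),
    }


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 23, 0)  # hypothetical commit time
    calls = [
        (t0 + timedelta(minutes=14), "203.0.113.7"),
        (t0 + timedelta(minutes=20), "203.0.113.7"),
    ]
    print(assess_leak(t0, t0 + timedelta(minutes=16), t0 + timedelta(minutes=28), calls))
```

Every source address in `distinct_sources` that is not one of your own systems is a lead for the follow-up investigation of what the key was used to access.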
DRILL 3

AI Vendor Vetting Exercise — "Before You Deploy, Who Has Your Data?"

Using the 20-question AI vendor checklist from SecurEveryone's AI Tool Security Vetting Kit, participants evaluate a realistic AI vendor scenario (new productivity tool, no DPA on file, consumer-tier pricing). They identify the data-handling risks, negotiate the required DPA clauses, flag the sub-processor chain, and produce a three-way approve / approve-with-controls / reject decision for the security review record.

Takeaways:

  • Completed AI vendor security review template (SOC 2 CC9.2 evidence)
  • DPA clause checklist covering training exclusion, data retention, sub-processors, and breach notification
  • Decision tree for shadow AI discovery — what to do when you find an unsanctioned tool in use

Book a Session for Your AI Team

Live, expert-led, built around your stack: MLflow, Hugging Face, AWS Bedrock, Azure OpenAI, or your own infrastructure.

Personal — $150 → Executive — $390 → Business — $900 flat →
📋

Free Resource

AI Tool Security Vetting Kit — 17-Page PDF

20-question AI vendor checklist · Data classification matrix for ChatGPT/Copilot/Claude/Gemini · Sample DPA + AI addendum clauses · NIST AI RMF GOVERN/MAP/MEASURE/MANAGE control mapping · SOC 2 + EU AI Act crosswalk · Shadow AI discovery guide · AUP template · Approve/approve-with-controls/reject decision tree.

Download Free — No Credit Card →

Common questions from AI and ML security teams.

What are the biggest cybersecurity risks for AI and ML companies?

AI and ML companies face five primary threat vectors: (1) Model IP theft: adversaries exfiltrate model weights and training data via compromised credentials or insider access; (2) Training-data poisoning: attackers inject adversarial samples into pipelines to corrupt model behavior; (3) API key and token leakage: engineers accidentally commit keys to public repositories, exposing infrastructure; (4) Prompt injection: attackers manipulate deployed models to bypass safety guardrails or exfiltrate system prompts; (5) Shadow AI adoption: employees using unsanctioned consumer AI tools expose confidential code and customer PII to third-party training pipelines. The Samsung incident (confidential semiconductor source code pasted into ChatGPT) remains the canonical example.

What did the Samsung ChatGPT breach reveal about AI tool security?

In March 2023, Samsung engineers pasted confidential source code, meeting notes, and chip design documentation into ChatGPT. Three separate incidents occurred within weeks — each involving different employees, none of whom believed they were violating policy because no AI-specific policy existed. Consumer AI tiers at the time defaulted to training on user inputs unless explicitly disabled. The root cause was the absence of data classification guidance for AI tool inputs and lack of employee training on what 'confidential' means in the context of consumer AI platforms.

How does NIST AI RMF 1.0 apply to AI company security training?

NIST AI RMF 1.0 organizes AI risk management across four functions: GOVERN, MAP, MEASURE, and MANAGE. For workforce training, the GOVERN function (GV-1.1 through GV-6.2) requires organizations to establish policies, accountability structures, and training programs. MEASURE (MS-2.5, MS-2.6) requires ongoing monitoring of human-AI interaction risks. SecurEveryone's AI/ML training maps to GV-1.1 (organizational risk policies), GV-2.2 (workforce awareness), and MAP-1.6 (AI risk identification at the team level), providing documented evidence for SOC 2 and ISO 27001 assessments.

What does the EU AI Act require for AI companies regarding employee training?

The EU AI Act imposes two workforce obligations. Article 4, applicable since February 2025, requires providers and deployers to ensure a sufficient level of AI literacy among staff who operate or use AI systems. Article 26(2), applicable to high-risk AI systems from August 2026, requires deployers to assign human oversight to people with the necessary competence, training, and authority. For AI companies building systems in employment, credit, infrastructure, or biometric categories, this creates a legal obligation to train staff, and documentation is how you demonstrate it. SecurEveryone's AI/ML training delivers documented workforce training evidence for both articles and covers the eight high-risk areas listed in Annex III.

How did the NVIDIA Lapsus$ breach expose AI company supply chain vulnerabilities?

In February 2022, Lapsus$ compromised NVIDIA via stolen VPN credentials and exfiltrated approximately 1TB of data including GPU driver source code, DLSS AI model architecture details, hardware schematics, and credentials for over 71,000 employees. The breach exposed that foundational AI infrastructure vendors — GPU manufacturers, cloud ML platforms, model hosting services — are premium targets whose compromise cascades across the entire AI supply chain. AI companies must apply the same third-party security scrutiny to their infrastructure vendors as they do to SaaS vendors.

Related Training Programs

💻 SaaS & Technology Training → 🏦 Financial Services Training → 🏭 All 33 Industry Programs →

Your engineers are your biggest IP risk. Train them before the incident.

Book a live session today. Each session is 60–120 minutes, held over Zoom, and built around your team's actual AI stack: MLflow, Hugging Face, AWS Bedrock, Azure OpenAI, or your own infrastructure. Walk away with a completed NIST AI RMF evidence record, an AI Acceptable Use Policy, and a team that knows what a Samsung-style shadow AI incident looks like before it happens.

Train Your ML Team → Executive Briefing → Book Business (Unlimited) →

Sessions from $150 · Unlimited users on Business tier · 24/7 emergency access available

SecurEveryone · NIST AI RMF 1.0 / EU AI Act Art. 4 and Art. 26(2) / SOC 2 CC6/CC7 / ISO 27001 A.6.3 · $150–$900 · Live expert coaching