I scanned 2,500 Hugging Face models for malware/issues. Here is the data

Original link: https://github.com/ArseniiBrazhnyk/Veritensor

## Veritensor: Zero-Trust Security for the AI Supply Chain

Veritensor is a security tool designed to protect the AI supply chain by verifying the **safety, authenticity, and compliance** of AI models. Unlike traditional antivirus software, it performs deep analysis of AI-specific formats such as Pickle, PyTorch, Keras, and GGUF, identifying malicious code (RCE, injection) and tampering through cryptographic verification against registries such as Hugging Face. Key features include **deep static analysis** (decompiling bytecode to find hidden attacks), a **license firewall** (blocking models with restrictive licenses), and **supply-chain security** through Sigstore Cosign integration for signing Docker containers. Veritensor integrates seamlessly into CI/CD pipelines (GitHub Actions, GitLab, pre-commit) and offers flexible deployment via PyPI and Docker. It produces detailed reports in formats such as SARIF and SBOM, and security policies can be customized through a `veritensor.yaml` configuration file. Regular signature updates keep threat detection current.

## Summary: Hugging Face Model Security Scan

A developer built **Veritensor**, a CLI tool for scanning AI models for potential security and licensing issues, motivated by the risk of remote code execution (RCE) when loading model weights. They scanned 2,500 Hugging Face models and found **86 with issues**, categorized as follows:

* **Broken files (16):** Git LFS pointers mistaken for binaries.
* **Hidden license agreements (5):** non-commercial licenses embedded in model headers.
* **Shadow dependencies (49):** models attempting to import libraries that are not installed.
* **Suspicious code (11):** techniques such as `STACK_GLOBAL` that can indicate hidden malware (mostly in older numpy files).
* **Scan errors (5):** files that could not be loaded due to missing local dependencies.

Unlike simple regex scanners, Veritensor *emulates* data loading (without executing it) and verifies file integrity by comparing hashes against Hugging Face's versions. It also checks metadata for license restrictions and can sign containers for added security. The tool supports PyTorch, Keras, and GGUF, is available on PyPI (`pip install veritensor`), and the scan data is publicly available on GitHub ([https://github.com/ArseniiBrazhnyk/Veritensor](https://github.com/ArseniiBrazhnyk/Veritensor)). The author is seeking feedback; the discussion covers existing solutions such as SafeTensors and the need for finer-grained control over scan flags.

Original text


Veritensor is the Zero-Trust security tool for the AI Supply Chain. It replaces naive model scanning with deep AST analysis and cryptographic verification.

Unlike standard antivirus tools, Veritensor understands AI formats (Pickle, PyTorch, Keras, GGUF, Wheels) and ensures that your models:

  1. Are Safe: Do not contain malicious code (RCE, Reverse Shells, Lambda injections).
  2. Are Authentic: Have not been tampered with (Hash-to-API verification against Hugging Face).
  3. Are Compliant: Do not violate commercial license terms (e.g., CC-BY-NC, AGPL).
  4. Are Trusted: Can be cryptographically signed before deployment.

  • Deep Static Analysis: Decompiles Pickle bytecode and Keras Lambda layers to find obfuscated attacks (e.g., STACK_GLOBAL exploits). Now supports deep scanning of Zip archives (PyTorch) and Python Wheels. (A general sketch of this opcode-level idea appears after this list.)
  • Identity Verification: Automatically verifies model hashes against the official Hugging Face registry to detect Man-in-the-Middle attacks.
  • License Firewall: Blocks models with restrictive licenses (e.g., Non-Commercial, AGPL). Veritensor performs a hybrid check: it inspects embedded file metadata first, and automatically falls back to the Hugging Face API if metadata is missing (requires --repo).
  • Supply Chain Security: Integrates with Sigstore Cosign to sign Docker containers. Includes timestamps to prevent replay attacks.
  • CI/CD Native: Ready for GitHub Actions, GitLab, and Pre-commit pipelines.
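For intuition, the opcode-level idea behind this kind of analysis can be shown with the standard library's pickletools: walk the pickle's opcode stream, record which modules and attributes it would import, and never execute anything. The sketch below is only an illustration of that general technique, not Veritensor's engine, and the "suspicious" module set is a hypothetical placeholder rather than a real signature database:

```python
# Minimal sketch of opcode-level pickle inspection (NOT Veritensor's engine).
# It walks the opcode stream with pickletools and reports which modules and
# attributes the pickle would import, without ever executing it.
import sys
import pickletools

SUSPICIOUS = {"os", "subprocess", "builtins", "posix", "nt"}  # illustrative only

def list_imports(path: str):
    with open(path, "rb") as f:
        data = f.read()
    recent_strings = []  # STACK_GLOBAL takes its module/name from earlier string pushes
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE",
                           "STRING", "SHORT_BINSTRING"):
            recent_strings.append(str(arg))
        elif opcode.name == "GLOBAL":  # classic "module name" form
            module, _, name = str(arg).partition(" ")
            yield pos, module, name
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            yield pos, recent_strings[-2], recent_strings[-1]

if __name__ == "__main__":
    for pos, module, name in list_imports(sys.argv[1]):
        flag = "SUSPICIOUS" if module.split(".")[0] in SUSPICIOUS else "info"
        print(f"[{flag}] byte {pos}: pickle imports {module}.{name}")
```

A real scanner (Veritensor included) has to go further, e.g. emulating the pickle stack and handling memoized strings, but the point is the same: the file is never deserialized.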

Via PyPI (Recommended for local use)

Lightweight installation (no heavy ML libraries required):

pip install veritensor

Via Docker (Recommended for CI/CD)

docker pull arseniibrazhnyk/veritensor:latest

1. Scan for Malware

Check a file or directory for malware:

veritensor scan ./models/bert-base.pt

Example Output:

╭────────────────────────────────╮
│ 🛡️  Veritensor Security Scanner │
╰────────────────────────────────╯
                                    Scan Results
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ File         ┃ Status ┃ Threats / Details                    ┃ SHA256 (Short) ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ model.pt     │  FAIL  │ CRITICAL: os.system (RCE Detected)   │ a1b2c3d4...    │
└──────────────┴────────┴──────────────────────────────────────┴────────────────┘
❌ BLOCKING DEPLOYMENT

2. Verify against Hugging Face

Ensure the file on your disk matches the official version from the registry (detects tampering):

veritensor scan ./pytorch_model.bin --repo meta-llama/Llama-2-7b
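Conceptually, this check boils down to hashing the local file and comparing it with the SHA-256 the Hub publishes for the same LFS file. The snippet below is a rough sketch of that idea using the huggingface_hub client, not Veritensor's implementation; the file name and repo are only examples:

```python
# Rough sketch of hash-to-registry verification (not Veritensor's code).
# Compares the SHA-256 of a local file with the LFS hash the Hugging Face Hub
# reports for the same filename in the given repo.
import hashlib
from huggingface_hub import HfApi

def local_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_hub(path: str, repo_id: str, filename: str) -> bool:
    info = HfApi().model_info(repo_id, files_metadata=True)
    for sibling in info.siblings:
        if sibling.rfilename == filename and sibling.lfs is not None:
            lfs = sibling.lfs
            # older huggingface_hub versions expose lfs as a dict
            hub_sha = lfs.sha256 if hasattr(lfs, "sha256") else lfs.get("sha256")
            return hub_sha == local_sha256(path)
    return False  # file not on the Hub, or not stored via LFS

# Hypothetical usage:
# matches_hub("./pytorch_model.bin", "meta-llama/Llama-2-7b", "pytorch_model.bin")
```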

3. License Compliance Check

Veritensor automatically reads metadata from safetensors and GGUF files. If a model has a Non-Commercial license (e.g., cc-by-nc-4.0), it will raise a HIGH severity alert.
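This metadata check is possible because a safetensors file starts with a JSON header (an 8-byte little-endian length followed by the JSON itself), which may carry free-form keys under `__metadata__`. Below is a minimal reader that assumes the author stored the license under a `license` key there; it is an illustration of the idea, not Veritensor's code:

```python
# Minimal sketch: read the safetensors JSON header and look for license hints
# in its free-form "__metadata__" block. Not Veritensor's implementation; the
# restricted-keyword list and the "license" key are illustrative assumptions.
import json
import struct
import sys

RESTRICTED = ("cc-by-nc", "agpl", "research-only")  # example keywords

def read_metadata(path: str) -> dict:
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {}) or {}

if __name__ == "__main__":
    meta = read_metadata(sys.argv[1])
    license_value = str(meta.get("license", "")).lower()
    if any(keyword in license_value for keyword in RESTRICTED):
        print(f"HIGH: restrictive license in metadata: {license_value}")
    else:
        print(f"license metadata: {license_value or '(none)'}")
```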

To override this (Break-glass mode), use:

veritensor scan ./model.safetensors --force

📊 Reporting & Compliance

Veritensor supports industry-standard formats for integration with security dashboards and audit tools.

1. GitHub Security (SARIF)

Generate a report compatible with GitHub Code Scanning:

veritensor scan ./models --sarif > veritensor-report.sarif

2. Software Bill of Materials (SBOM)

Generate a CycloneDX v1.5 SBOM to inventory your AI assets:

veritensor scan ./models --sbom > sbom.json
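Because CycloneDX is plain JSON, downstream inventory tooling can consume the SBOM directly. A quick sketch of reading it back; the field names follow the CycloneDX 1.5 JSON schema, and whether Veritensor populates every field (e.g., licenses) is an assumption here:

```python
# Quick sketch: list components from a CycloneDX SBOM (e.g., the sbom.json above).
# Field names come from the CycloneDX 1.5 JSON schema, not a Veritensor-specific format.
import json
import sys

with open(sys.argv[1]) as f:
    bom = json.load(f)

for component in bom.get("components", []):
    licenses = [
        entry.get("license", {}).get("id") or entry.get("license", {}).get("name")
        for entry in component.get("licenses", [])
    ]
    print(component.get("name"), component.get("version", ""), licenses)
```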

3. JSON Output

For custom parsers and SOAR automation:

veritensor scan ./models --json

🔐 Supply Chain Security (Container Signing)

Veritensor integrates with Sigstore Cosign to cryptographically sign your Docker images only if they pass the security scan.

1. Generate a key pair for signing:

veritensor keygen
# Output: veritensor.key (Private) and veritensor.pub (Public)

2. Pass the --image flag and the path to your private key (via an environment variable):

# Set path to your private key
export VERITENSOR_PRIVATE_KEY_PATH=veritensor.key

# If scan passes -> Sign the image
veritensor scan ./models/my_model.pkl --image my-org/my-app:v1.0.0

3. Verify (In Kubernetes / Production)

Before deploying, verify the signature to ensure the model was scanned:

cosign verify --key veritensor.pub my-org/my-app:v1.0.0

Add this to your .github/workflows/security.yml to block malicious models in Pull Requests:

name: AI Security Scan
on: [pull_request]

jobs:
  veritensor-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Scan Models
        uses: ArseniiBrazhnyk/[email protected]
        with:
          path: './models'
          repo: 'meta-llama/Llama-2-7b' # Optional: Verify integrity
          force: 'false' # Set to 'true' to avoid failing the build on threats

Prevent committing malicious models to your repository. Add this to .pre-commit-config.yaml:

repos:
  - repo: https://github.com/ArseniiBrazhnyk/Veritensor
    rev: v1.3.1
    hooks:
      - id: veritensor-scan

| Format | Extension | Analysis Method |
| --- | --- | --- |
| PyTorch | .pt, .pth, .bin | Zip extraction + Pickle VM Bytecode Analysis |
| Pickle | .pkl, .joblib | Deep AST Analysis (Stack Emulation) |
| Keras | .h5, .keras | Lambda Layer Detection & Config Analysis |
| Safetensors | .safetensors | Header Parsing & Metadata Validation |
| GGUF | .gguf | Binary Parsing & Metadata Validation |
| Python Wheel | .whl | Archive Inspection & Heuristic Analysis |
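To give a sense of what the binary parsing for GGUF involves: the format opens with a fixed header (the magic bytes GGUF, a version, a tensor count, and a metadata key-value count), so a scanner can validate the file and count its metadata entries without loading any tensor data. A minimal header reader, covering the GGUF v2/v3 layout and not Veritensor's implementation:

```python
# Minimal sketch: validate a GGUF file's fixed header (magic, version, counts)
# without loading tensor data. Not Veritensor's code; assumes the little-endian
# GGUF v2/v3 header layout.
import struct
import sys

def read_gguf_header(path: str) -> dict:
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return {"version": version, "tensors": tensor_count, "metadata_kvs": metadata_kv_count}

if __name__ == "__main__":
    print(read_gguf_header(sys.argv[1]))
```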

You can customize security policies by creating a veritensor.yaml file in your project root. Pro tip: you can use the regex: prefix for flexible matching.

# veritensor.yaml

# 1. Security Threshold
# Fail the build if threats of this severity (or higher) are found.
# Options: CRITICAL, HIGH, MEDIUM, LOW.
fail_on_severity: CRITICAL

# 2. License Firewall Policy
# If true, blocks models that have no license metadata.
fail_on_missing_license: false

# List of license keywords to block (case-insensitive).
custom_restricted_licenses:
  - "cc-by-nc"       # Non-Commercial
  - "agpl"           # Viral licenses
  - "research-only"

# 3. Static Analysis Exceptions (Pickle)
# Allow specific Python modules that are usually blocked by the strict scanner.
allowed_modules:
  - "my_company.internal_layer"
  - "sklearn.tree"

# 4. Model Whitelist (License Bypass)
# List of Repo IDs that are trusted. Veritensor will SKIP license checks for these.
# Supports Regex!
allowed_models:
  - "meta-llama/Meta-Llama-3-70B-Instruct"  # Exact match
  - "regex:^google-bert/.*"                 # Allow all BERT models from Google
  - "internal/my-private-model"

To generate a default configuration file, run: veritensor init


🧠 Threat Intelligence (Signatures)

Veritensor uses a decoupled signature database (signatures.yaml) to detect malicious patterns. This ensures that detection logic is separated from the core engine.

  • Automatic Updates: To get the latest threat definitions, simply upgrade the package:
    pip install --upgrade veritensor
  • Transparent Rules: You can inspect the default signatures in src/veritensor/engines/static/signatures.yaml.
  • Custom Policies: If the default rules are too strict for your use case (false positives), use veritensor.yaml to whitelist specific modules or models.

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
