Show HN: Linnix – eBPF observability that predicts failures before they happen

Original link: https://github.com/linnix-os/linnix

## Linnix: AI-Powered Linux Observability

Linnix is an open-source, eBPF-based Linux observability tool that provides deep insight into system behavior, with optional AI-powered incident detection. It captures every process event (fork, exec, exit) with minimal overhead (<1% CPU), far below traditional agents. Linnix runs standalone with a built-in rules engine that detects problems such as fork storms and CPU spikes, *and* it can be augmented with AI to explain incidents in natural language, for example "Fork storm in a cron job. Add a rate limit…". A 2.1GB quantized `linnix-3b` model is now available for download.

**Key features:**

* **Low overhead:** eBPF minimizes performance impact.
* **AI-powered (optional):** Natural-language insights speed up root-cause analysis.
* **Cost-effective:** 60-80% cheaper than alternatives like Datadog.
* **Open source:** Apache-2.0 licensed, bring your own LLM.
* **Production-ready:** Tested on multi-node clusters (kernel 5.8+).
* **Fast setup:** One-command install, ready in 5 minutes.

Linnix provides a web dashboard, API access, and live metrics. It is essentially "Prometheus for process lifecycle, plus an AI reasoning layer," and is designed to be flexible, letting users choose between a rules-based approach and deeper AI-driven analysis.

## Linnix: Predictive Observability with eBPF and LLMs

Linnix is a new, locally run observability tool built in Rust on eBPF, designed to *predict* Linux system failures before outages occur. Unlike traditional monitoring that reacts to problems after the fact, Linnix uses a local large language model to analyze kernel-level data, specifically process behavior, and identify anomalous patterns such as unusual memory allocation.

The developer created Linnix out of frustration with incident alerts that arrive too late, aiming to catch issues like memory leaks *before* a process crashes. It currently supports Linux 5.8+ and Docker/Kubernetes, and exports data to Prometheus.

While promising, early feedback highlighted a lack of polish and testing, with one commenter questioning how much scrutiny the documentation's misaligned diagrams and bold claims had received. The project is open source (Apache 2.0), and the author is seeking feedback on useful failure scenarios to target for improvement.
## Original README


[Screenshot: Linnix detecting and analyzing a fork storm in real-time]

eBPF-powered Linux observability with AI incident detection

Linnix captures every process fork, exec, and exit with lightweight CPU/memory telemetry using eBPF. It works standalone with a built-in rules engine, or you can add AI for natural-language insights.

✨ NEW: linnix-3b model now available! Download the 2.1GB quantized model from Releases or use the automated setup script.

Traditional monitoring tells you "CPU is high". Linnix tells you WHY and WHAT TO DO.

  • ⚡ Near-Zero Overhead: <1% CPU usage with eBPF probes (vs 5-15% for traditional agents)
  • 🧠 AI-Powered (Optional): Natural language insights - "Fork storm in cron job. Add rate limit to /etc/cron.d/backup"
  • 🎯 Works Without AI: Built-in rules engine detects incidents out-of-the-box
  • 💰 Cost-Effective: 60-80% cheaper than Datadog or Dynatrace, runs on your infrastructure
  • 🔓 Open Source: Apache-2.0 license, no vendor lock-in, BYO LLM (or none)
  • 🚀 Production-Ready: Battle-tested on multi-node clusters, kernel 5.8+
| Feature | Linnix (OSS) | Prometheus + Grafana | Datadog | Elastic APM |
|---|---|---|---|---|
| Setup Time | 5 minutes | 2-3 hours | 30 minutes | 1-2 hours |
| CPU Overhead | <1% (eBPF) | 2-5% (exporters) | 5-15% (agent) | 10-20% (APM) |
| Instrumentation | Zero | Manual exporters | Agent install | Code changes |
| AI Insights | ✅ Built-in | ❌ No | ⚠️ Paid add-on | ❌ No |
| Incident Detection | ✅ Auto | ⚠️ Manual rules | ✅ ML (paid) | ⚠️ Manual alerts |
| Cost (10 nodes) | $0 | ~$50/mo hosting | ~$1,500/mo | ~$1,000/mo |
| Data Privacy | ✅ Your infra | ✅ Your infra | ❌ Vendor cloud | ⚠️ Self-host option |
| BYO LLM | ✅ Any model | N/A | ❌ No | ❌ No |

Bottom line: We're Prometheus for process lifecycle + AI reasoning layer. Use both!

💡 Note: AI is optional! Linnix works out-of-the-box with its built-in rules engine for detecting fork storms, CPU spikes, and runaway processes. Add AI later for natural language explanations.

🎯 One-Command Setup (New!)

# Complete eBPF monitoring with AI - ready in 5 minutes
git clone https://github.com/linnix-os/linnix.git && cd linnix
./setup-llm.sh

# Then open: http://localhost:8080 (Web Dashboard)

What you get instantly:

  • Web Dashboard: Real-time visualization at http://localhost:8080
  • eBPF Monitoring: Every process event captured with <1% overhead
  • AI Insights: 3B model analyzes incidents every 30 seconds
  • Live Metrics: Process tree, CPU usage, system overview
  • Zero Config: Works out of the box, all data local

After running ./setup-llm.sh, you'll have:

  1. Web Dashboard (http://localhost:8080) - Beautiful real-time UI
  2. API Access (http://localhost:3000) - REST endpoints for integration
  3. AI Analysis - Automatic incident detection with explanations
  4. Live Events - Real-time process monitoring stream

Quick Health Check:

curl http://localhost:3000/healthz  # eBPF daemon
curl http://localhost:8090/health   # AI model  
curl http://localhost:3000/insights | jq  # Get AI insights

What it does:

  1. Downloads TinyLlama model (800MB) or linnix-3b (2.1GB)
  2. Starts cognitod (eBPF daemon) + llama-server (AI inference)
  3. Runs health checks
  4. Ready for AI insights in < 5 minutes!

🐳 Docker without AI (Rules Engine Only)

git clone https://github.com/linnix-os/linnix.git && cd linnix
docker-compose up -d

# Stream live process events
curl -N http://localhost:3000/stream

# Get incident alerts from rules engine
curl http://localhost:3000/insights | jq

✅ No AI/LLM required | ✅ No Rust toolchain required | ✅ Any Linux with kernel 5.8+ | ✅ <1% CPU overhead

# 1. Install cognitod
curl -sfL https://raw.githubusercontent.com/linnix-os/linnix/main/scripts/install.sh | sh

# 2. Start monitoring
sudo systemctl start cognitod

# 3. Stream live events
linnix-cli stream

# 4. Get AI insights
export LLM_ENDPOINT="http://localhost:8090/v1/chat/completions"
export LLM_MODEL="linnix-3b-distilled"
linnix-reasoner --insights

Architecture:

┌──────────────────────────────────────────────────────────────┐
│                    Kernel Space (eBPF)                       │
├──────────────────────────────────────────────────────────────┤
│  fork hook  →  exec hook  →  exit hook  →  CPU/mem sampling │
└────────────────────────┬─────────────────────────────────────┘
                         │ Perf buffers
                         ▼
┌──────────────────────────────────────────────────────────────┐
│                   User Space (cognitod)                      │
├──────────────────────────────────────────────────────────────┤
│  • Event processing    • Process tree tracking               │
│  • State management    • Rules engine                        │
│  • HTTP/SSE API        • Prometheus metrics                  │
└────────────────────────┬─────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
         ▼               ▼               ▼
   ┌─────────┐    ┌──────────┐   ┌─────────────┐
   │ CLI     │    │ Reasoner │   │ Prometheus  │
   │ Stream  │    │ AI       │   │ Grafana     │
   └─────────┘    └──────────┘   └─────────────┘

Linnix provides comprehensive eBPF-based monitoring with optional AI-powered incident detection:

  • eBPF monitoring - Kernel-level process lifecycle tracking
  • Real-time event streaming - SSE endpoints for live data
  • Process tree tracking - Full ancestry and lineage graphs
  • CPU/memory telemetry - Lightweight resource monitoring
  • Local rules engine - Detects fork storms, CPU spikes, runaway processes (no AI needed)
  • Prometheus integration - Standard metrics export (see the quick check after this list)
  • LLM inference (optional) - Bring your own model for natural language insights (OpenAI, local, etc.)
  • Training examples - 50+ curated incident samples included
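
A quick way to see the SSE streaming and Prometheus export in action, using the endpoints documented in the API section below:

# Scrape the Prometheus metrics endpoint exposed by cognitod
curl -s http://localhost:3000/metrics | head -n 20

# Follow the live process-event stream over SSE (Ctrl-C to stop)
curl -N http://localhost:3000/stream
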
# Run the cognitod daemon via Docker (privileged, with host PID/network
# namespaces and read-only BTF/debugfs mounts, which the eBPF probes need)
docker run -d \
  --name cognitod \
  --privileged \
  --pid=host \
  --network=host \
  -v /sys/kernel/btf:/sys/kernel/btf:ro \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  linnixos/cognitod:latest
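
Once the container is running, a quick sanity check (assuming the default API port 3000 used throughout this README):

# Tail the daemon logs
docker logs --tail 20 cognitod

# Confirm the API is serving Prometheus metrics
curl -s http://localhost:3000/metrics | head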

Ubuntu/Debian:

wget https://github.com/linnix-os/linnix/releases/latest/download/cognitod_amd64.deb
sudo dpkg -i cognitod_amd64.deb
sudo systemctl start cognitod

RHEL/CentOS:

wget https://github.com/linnix-os/linnix/releases/latest/download/cognitod.rpm
sudo rpm -i cognitod.rpm
sudo systemctl start cognitod
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone repository
git clone https://github.com/linnix-os/linnix.git
cd linnix

# Build eBPF programs
cargo xtask build-ebpf

# Build and install
cargo build --release
sudo cp target/release/cognitod /usr/local/bin/
sudo cp target/release/linnix-cli /usr/local/bin/
sudo cp target/release/linnix-reasoner /usr/local/bin/

Full documentation: GitHub docs/

Cognitod exposes a REST API on port 3000:

  • GET /health - Health check
  • GET /metrics - Prometheus metrics
  • GET /processes - All live processes
  • GET /graph/:pid - Process ancestry graph
  • GET /stream - Server-sent events (real-time)
  • GET /insights - AI-generated insights
  • GET /alerts - Active alerts from rules engine

For API examples, see cognitod/examples/.
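
For a continuous view of what the rules engine is flagging, you can poll the alerts endpoint from the list above:

# Re-fetch active alerts every 5 seconds
watch -n 5 'curl -s http://localhost:3000/alerts | jq .'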

Linnix works with any OpenAI-compatible LLM endpoint:

🎁 Demo Model (Included)

We provide a distilled 3B model optimized for CPU inference:

# Download demo model (2.1GB)
wget https://github.com/linnix-os/linnix/releases/download/v0.1.0/linnix-3b-distilled-q5_k_m.gguf

# Serve with llama.cpp
./serve_distilled_model.sh  # Starts on port 8090

# Or manually:
llama-server -m linnix-3b-distilled-q5_k_m.gguf \
  --port 8090 --ctx-size 4096 -t 8

# Test the model
export LLM_ENDPOINT="http://localhost:8090/v1/chat/completions"
export LLM_MODEL="linnix-3b-distilled"
linnix-reasoner --insights

Performance: 12.78 tok/s on CPU (no GPU required!)

# Option 1: Local model with llama.cpp
./llama-server -m qwen2.5-7b-instruct-q5_k_m.gguf --port 8090

# Option 2: vLLM
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8090

# Option 3: Ollama (ollama serve takes no model name; pull the model
# separately; Ollama's OpenAI-compatible API listens on port 11434 by default)
ollama pull qwen2.5:7b
ollama serve

# Configure endpoint (port 8090 matches options 1-2; for Ollama use 11434)
export LLM_ENDPOINT="http://localhost:8090/v1/chat/completions"
export LLM_MODEL="qwen2.5-7b"

# Get insights
linnix-reasoner --insights

You can also use commercial APIs (OpenAI, Anthropic, etc.) by pointing to their endpoints.
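
For example, pointing linnix-reasoner at OpenAI's hosted API might look like the sketch below. The README does not document how the reasoner passes API keys, so the LLM_API_KEY variable here is an assumption; check the project docs for the actual mechanism.

# Hypothetical sketch: use OpenAI's hosted chat-completions endpoint
export LLM_ENDPOINT="https://api.openai.com/v1/chat/completions"
export LLM_MODEL="gpt-4o-mini"
export LLM_API_KEY="sk-..."  # assumption: key handling is not documented in this README
linnix-reasoner --insights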

Create /etc/linnix/linnix.toml:

[runtime]
offline = false  # Set true to disable external HTTP calls

[telemetry]
sample_interval_ms = 1000  # CPU/memory sampling frequency

[rules]
enabled = true
config_path = "/etc/linnix/rules.yaml"

[api]
bind_address = "127.0.0.1:3000"

[llm]
endpoint = "http://localhost:8090/v1/chat/completions"
model = "qwen2.5-7b"
timeout_secs = 120
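
After editing the config, restart the daemon so the new settings take effect (assuming the systemd unit installed by the package or install script above):

sudo systemctl restart cognitod
sudo systemctl status cognitod --no-pager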

Stream events in real-time

# CLI streaming
linnix-cli stream

# Or use curl with SSE
curl -N http://localhost:3000/stream

# For a specific PID
curl http://localhost:3000/graph/1234 | jq .

# All processes
curl http://localhost:3000/processes | jq .

# Get AI-generated insights
linnix-reasoner --insights

# Output:
# {
#   "summary": "System experiencing high CPU due to fork storm...",
#   "risks": ["cpu_spin", "fork_storm"]
# }

Edit /etc/linnix/rules.yaml:

rules:
  - name: fork_storm
    condition: "forks_per_sec > 100"
    severity: critical
    actions:
      - alert
      - log

  - name: cpu_spike
    condition: "process.cpu_percent > 95 AND duration > 60"
    severity: warning
    actions:
      - alert
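
To sanity-check a rule like fork_storm, generate a burst of short-lived processes and watch the alerts endpoint. A rough sketch; tune the loop count to your forks_per_sec threshold:

# Spawn ~500 short-lived processes to exercise the fork_storm rule
for i in $(seq 1 500); do /bin/true & done; wait

# Check whether the rules engine raised an alert
curl -s http://localhost:3000/alerts | jq .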

We love contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Make your changes
  4. Run tests (cargo test --workspace)
  5. Commit (git commit -m 'Add amazing feature')
  6. Push (git push origin feat/amazing-feature)
  7. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

# Clone repo
git clone https://github.com/linnix-os/linnix.git
cd linnix

# Install dependencies
cargo build --workspace

# Build eBPF programs
cargo xtask build-ebpf

# Run tests
cargo test --workspace

# Run clippy
cargo clippy --all-targets -- -D warnings

Found a bug? Please open an issue with:

  • Your OS and kernel version
  • Cognitod version (cognitod --version)
  • Steps to reproduce
  • Expected vs actual behavior
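
A one-liner to gather the environment details requested above:

# Collect kernel and daemon versions for a bug report
uname -sr && cognitod --version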

Linnix is licensed under the Apache License 2.0.

See LICENSE for details.

Linnix uses several open source libraries. See THIRD_PARTY_LICENSES for details.

The eBPF programs in linnix-ai-ebpf/linnix-ai-ebpf-ebpf/ are dual-licensed under GPL-2.0 OR MIT (eBPF programs must be GPL-compatible).

If you find Linnix useful, please star the repo! It helps us grow the community.

[Star History Chart]

If Linnix helps you catch production incidents, add this badge to your README:

[![Powered by Linnix](https://img.shields.io/badge/Powered%20by-Linnix-00C9A7?style=flat&logo=linux&logoColor=white)](https://github.com/linnix-os/linnix)


Linnix is built on the shoulders of giants:

  • Aya - Rust eBPF framework
  • Tokio - Async runtime
  • Axum - Web framework
  • BTF - BPF Type Format

Special thanks to the eBPF community for making kernel observability accessible!

If you use Linnix in research, please cite:

@software{linnix2025,
  author = {Shah, Parth},
  title = {Linnix: eBPF-powered Linux observability with AI},
  year = {2025},
  url = {https://github.com/linnix-os/linnix}
}

Made with ❤️ by the Linnix team
