Run Python like a local function from Go — no CGO, no microservices.
Go excels at building high-performance web services, but sometimes you need Python:
- Machine Learning Models: Your models are trained in PyTorch/TensorFlow
- Data Science Libraries: You need pandas, numpy, scikit-learn
- Legacy Code: Existing Python code that's too costly to rewrite
- Python-Only Libraries: Some libraries only exist in the Python ecosystem
Traditional solutions all have major drawbacks:
Solution | Problems |
---|---|
CGO + Python C API | Complex setup, crashes can take down entire Go service, GIL still limits performance |
REST/gRPC Microservice | Network latency, deployment complexity, service discovery, more infrastructure |
Shell exec | High startup cost (100ms+), no connection pooling, process management nightmare |
Embedded Python | GIL bottleneck, memory leaks, difficult debugging |
pyproc lets you call Python functions from Go as if they were local functions, with:
- Zero network overhead - Uses Unix Domain Sockets for IPC
- Process isolation - Python crashes don't affect your Go service
- True parallelism - Multiple Python processes bypass the GIL
- Simple deployment - Just your Go binary + Python scripts
- Connection pooling - Reuse connections for high throughput
Perfect for teams who need to:
- Integrate existing Python ML models (PyTorch, TensorFlow, scikit-learn) into Go services
- Process data with Python libraries (pandas, numpy) from Go applications
- Handle 1-5k RPS with JSON payloads under 100KB
- Deploy on the same host/pod without network complexity
- Migrate gradually from Python microservices to Go while preserving Python logic
Ideal deployment scenarios:
- Kubernetes same-pod deployments with shared volume for UDS
- Docker containers with shared socket volumes
- Traditional server deployments on Linux/macOS
pyproc is NOT designed for:
- Cross-host communication - Use gRPC/REST APIs for distributed systems
- Windows - pyproc relies on Unix Domain Sockets; Windows named pipes are not supported
- GPU management - Use dedicated ML serving frameworks (TensorRT, Triton)
- Large-scale ML serving - Consider Ray Serve, MLflow, or KServe for enterprise ML
- Real-time streaming - Use Apache Kafka or similar for high-throughput streams
- Database operations - Use native Go database drivers directly
Component | Requirements |
---|---|
Operating System | Linux, macOS (Unix Domain Sockets required) |
Go Version | 1.22+ |
Python Version | 3.9+ (3.12 recommended) |
Deployment | Same host/pod only |
Container Runtime | Docker, containerd, any OCI-compatible |
Orchestration | Kubernetes (same-pod), Docker Compose, systemd |
Architecture | amd64, arm64 |
- No CGO Required - Pure Go implementation using Unix Domain Sockets
- Bypass Python GIL - Run multiple Python processes in parallel
- Function-like API - Call Python functions as easily as `pool.Call(ctx, "predict", input, &output)`
- Minimal Overhead - 45μs p50 latency, 200,000+ req/s with 8 workers
- Production Ready - Health checks, graceful shutdown, automatic restarts
- Easy Deployment - Single binary + Python scripts, no service mesh needed
Go side:
go get github.com/YuminosukeSato/pyproc@latest
Python side:
pip install pyproc-worker
# worker.py
from pyproc_worker import expose, run_worker

@expose
def predict(req):
    """Your ML model or Python logic here"""
    return {"result": req["value"] * 2}

if __name__ == "__main__":
    run_worker()
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/YuminosukeSato/pyproc/pkg/pyproc"
)

func main() {
    // Create a pool of Python workers
    pool, err := pyproc.NewPool(pyproc.PoolOptions{
        Config: pyproc.PoolConfig{
            Workers:     4,  // Run 4 Python processes
            MaxInFlight: 10, // Max concurrent requests per worker
        },
        WorkerConfig: pyproc.WorkerConfig{
            SocketPath:   "/tmp/pyproc.sock",
            PythonExec:   "python3",
            WorkerScript: "worker.py",
        },
    }, nil)
    if err != nil {
        log.Fatal(err)
    }

    // Start all workers
    ctx := context.Background()
    if err := pool.Start(ctx); err != nil {
        log.Fatal(err)
    }
    defer pool.Shutdown(ctx)

    // Call Python function (automatically load-balanced)
    input := map[string]interface{}{"value": 42}
    var output map[string]interface{}
    if err := pool.Call(ctx, "predict", input, &output); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Result: %v\n", output["result"]) // Result: 84
}
That's it! You're now calling Python from Go without CGO or microservices.
If you cloned this repository, you can run a working end-to-end example without installing a Python package by using the bundled worker module.
This starts a Python worker from `examples/basic/worker.py` and calls it from Go. The example adjusts `PYTHONPATH` to import the local `worker/python/pyproc_worker` package.
go get github.com/YuminosukeSato/pyproc@latest
# Install from PyPI
pip install pyproc-worker
# Or install from source
cd worker/python
pip install -e .
cfg := pyproc.WorkerConfig{
    ID:           "worker-1",
    SocketPath:   "/tmp/pyproc.sock",
    PythonExec:   "python3", // or path to virtual env
    WorkerScript: "path/to/worker.py",
    StartTimeout: 30 * time.Second,
    Env: map[string]string{
        "PYTHONUNBUFFERED": "1",
        "MODEL_PATH":       "/models/latest",
    },
}
poolCfg := pyproc.PoolConfig{
    Workers:        4,                // Number of Python processes
    MaxInFlight:    10,               // Max concurrent requests per worker
    HealthInterval: 30 * time.Second, // Health check frequency
}
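These two structs plug into the same `PoolOptions` used in the quick start. A minimal sketch wiring them together (same `NewPool` signature as shown above):

// Combine the worker and pool configs into a single pool.
pool, err := pyproc.NewPool(pyproc.PoolOptions{
    Config:       poolCfg,
    WorkerConfig: cfg,
}, nil)
if err != nil {
    log.Fatal(err)
}

ctx := context.Background()
if err := pool.Start(ctx); err != nil {
    log.Fatal(err)
}
defer pool.Shutdown(ctx)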
from pyproc_worker import expose, run_worker

@expose
def add(req):
    """Simple addition function"""
    return {"result": req["a"] + req["b"]}

@expose
def multiply(req):
    """Simple multiplication"""
    return {"result": req["x"] * req["y"]}

if __name__ == "__main__":
    run_worker()
import pickle

from pyproc_worker import expose, run_worker

# Load model once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@expose
def predict(req):
    """Run inference on the model"""
    features = req["features"]
    prediction = model.predict([features])[0]
    confidence = model.predict_proba([features])[0].max()
    return {
        "prediction": int(prediction),
        "confidence": float(confidence)
    }

@expose
def batch_predict(req):
    """Batch prediction for efficiency"""
    features_list = req["batch"]
    predictions = model.predict(features_list)
    return {
        "predictions": predictions.tolist()
    }

if __name__ == "__main__":
    run_worker()
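On the Go side, the worker's response can be decoded directly into a typed struct. A small sketch against the `predict` function above (the struct fields mirror the JSON keys the worker returns):

// Call the "predict" function exposed by the ML worker above.
func predictOne(ctx context.Context, pool *pyproc.Pool, features []float64) error {
    input := map[string]interface{}{"features": features}

    // Output struct mirrors the keys returned by the Python worker.
    var output struct {
        Prediction int     `json:"prediction"`
        Confidence float64 `json:"confidence"`
    }
    if err := pool.Call(ctx, "predict", input, &output); err != nil {
        return fmt.Errorf("predict failed: %w", err)
    }
    fmt.Printf("prediction=%d confidence=%.2f\n", output.Prediction, output.Confidence)
    return nil
}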
import pandas as pd

from pyproc_worker import expose, run_worker

@expose
def analyze_csv(req):
    """Analyze CSV data using pandas"""
    df = pd.DataFrame(req["data"])
    return {
        "mean": df.mean().to_dict(),
        "std": df.std().to_dict(),
        "correlation": df.corr().to_dict(),
        "summary": df.describe().to_dict()
    }

@expose
def aggregate_timeseries(req):
    """Aggregate time series data"""
    df = pd.DataFrame(req["data"])
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df.set_index('timestamp', inplace=True)

    # Resample to hourly
    hourly = df.resample('H').agg({
        'value': ['mean', 'max', 'min'],
        'count': 'sum'
    })
    return hourly.to_dict()

if __name__ == "__main__":
    run_worker()
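Calling `analyze_csv` from Go just means sending the rows as a slice of maps. A minimal sketch (the column names are illustrative):

// Send tabular data to the pandas worker above and read back the statistics.
func analyzeRows(ctx context.Context, pool *pyproc.Pool) error {
    rows := []map[string]interface{}{
        {"price": 10.5, "qty": 3},
        {"price": 12.0, "qty": 1},
        {"price": 9.75, "qty": 7},
    }
    input := map[string]interface{}{"data": rows}

    var output map[string]interface{}
    if err := pool.Call(ctx, "analyze_csv", input, &output); err != nil {
        return fmt.Errorf("analyze_csv failed: %w", err)
    }
    fmt.Printf("mean: %v\n", output["mean"])
    return nil
}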
func callPythonFunction(pool *pyproc.Pool) error {
    input := map[string]interface{}{
        "a": 10,
        "b": 20,
    }

    var output map[string]interface{}
    if err := pool.Call(context.Background(), "add", input, &output); err != nil {
        return fmt.Errorf("failed to call Python: %w", err)
    }

    fmt.Printf("Result: %v\n", output["result"])
    return nil
}
func callWithTimeout(pool *pyproc.Pool) error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    input := map[string]interface{}{"value": 42}
    var output map[string]interface{}
    if err := pool.Call(ctx, "slow_process", input, &output); err != nil {
        if errors.Is(err, context.DeadlineExceeded) {
            return fmt.Errorf("python function timed out: %w", err)
        }
        return err
    }
    return nil
}
func processBatch(pool *pyproc.Pool, items []Item) ([]Result, error) {
    input := map[string]interface{}{
        "batch": items,
    }

    var output struct {
        Predictions []float64 `json:"predictions"`
    }
    if err := pool.Call(context.Background(), "batch_predict", input, &output); err != nil {
        return nil, err
    }

    results := make([]Result, len(output.Predictions))
    for i, pred := range output.Predictions {
        results[i] = Result{Value: pred}
    }
    return results, nil
}
func robustCall(pool *pyproc.Pool, input map[string]interface{}) {
    for retries := 0; retries < 3; retries++ {
        var output map[string]interface{}
        err := pool.Call(context.Background(), "predict", input, &output)
        if err == nil {
            // Success
            return
        }

        // Check if it's a Python error
        if strings.Contains(err.Error(), "ValueError") {
            // Invalid input, don't retry
            log.Printf("Invalid input: %v", err)
            return
        }

        // Transient error, retry with backoff
        time.Sleep(time.Duration(retries+1) * time.Second)
    }
}
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp .

FROM python:3.11-slim
RUN pip install pyproc-worker numpy pandas scikit-learn
COPY --from=builder /app/myapp /app/myapp
COPY worker.py /app/
WORKDIR /app
CMD ["./myapp"]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        env:
        - name: PYPROC_POOL_WORKERS
          value: "4"
        - name: PYPROC_SOCKET_DIR
          value: "/var/run/pyproc"
        volumeMounts:
        - name: sockets
          mountPath: /var/run/pyproc
      volumes:
      - name: sockets
        emptyDir: {}
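pyproc is not shown reading these variables automatically, so a conservative approach is to read them in your own code and pass them into the pool configuration. A sketch using only the standard library (`PYPROC_POOL_WORKERS` and `PYPROC_SOCKET_DIR` are the names from the manifest above):

// Build the pool config from the deployment's environment variables.
// This assumes the application wires the values itself rather than relying
// on any automatic env parsing inside pyproc.
workers := 4
if v := os.Getenv("PYPROC_POOL_WORKERS"); v != "" {
    if n, err := strconv.Atoi(v); err == nil {
        workers = n
    }
}
socketDir := os.Getenv("PYPROC_SOCKET_DIR")
if socketDir == "" {
    socketDir = "/tmp"
}

pool, err := pyproc.NewPool(pyproc.PoolOptions{
    Config: pyproc.PoolConfig{Workers: workers, MaxInFlight: 10},
    WorkerConfig: pyproc.WorkerConfig{
        SocketPath:   filepath.Join(socketDir, "pyproc.sock"),
        PythonExec:   "python3",
        WorkerScript: "worker.py",
    },
}, nil)
if err != nil {
    log.Fatal(err)
}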
logger := pyproc.NewLogger(pyproc.LoggingConfig{
    Level:  "debug",
    Format: "json",
})
pool, _ := pyproc.NewPool(opts, logger)
health := pool.Health()
fmt.Printf("Workers: %d healthy, %d total\n",
health.HealthyWorkers, health.TotalWorkers)
// Expose Prometheus metrics
http.Handle("/metrics", promhttp.Handler())
http.ListenAndServe(":9090", nil)
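The health snapshot can also be exported as Prometheus gauges alongside the handler above. A hedged sketch (the metric names are illustrative, not built into pyproc):

// Export worker health as Prometheus gauges (metric names are illustrative).
healthyWorkers := prometheus.NewGauge(prometheus.GaugeOpts{Name: "pyproc_healthy_workers"})
totalWorkers := prometheus.NewGauge(prometheus.GaugeOpts{Name: "pyproc_total_workers"})
prometheus.MustRegister(healthyWorkers, totalWorkers)

go func() {
    for range time.Tick(15 * time.Second) {
        health := pool.Health()
        healthyWorkers.Set(float64(health.HealthyWorkers))
        totalWorkers.Set(float64(health.TotalWorkers))
    }
}()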
# Check Python dependencies
python3 -c "from pyproc_worker import run_worker"
# Check socket permissions
ls -la /tmp/pyproc.sock
# Enable debug logging
export PYPROC_LOG_LEVEL=debug
// Increase worker count
poolCfg.Workers = runtime.NumCPU() * 2
// Pre-warm connections
pool.Start(ctx)
time.Sleep(1 * time.Second) // Let workers stabilize
# Add memory profiling to worker
import tracemalloc

tracemalloc.start()

@expose
def get_memory_usage(req):
    current, peak = tracemalloc.get_traced_memory()
    return {
        "current_mb": current / 1024 / 1024,
        "peak_mb": peak / 1024 / 1024
    }
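From Go, `get_memory_usage` is called like any other exposed function, so worker memory can be polled periodically. A small sketch:

// Periodically ask the worker for its memory usage (assumes the
// get_memory_usage function from the snippet above is exposed).
go func() {
    for range time.Tick(time.Minute) {
        var mem struct {
            CurrentMB float64 `json:"current_mb"`
            PeakMB    float64 `json:"peak_mb"`
        }
        input := map[string]interface{}{}
        if err := pool.Call(context.Background(), "get_memory_usage", input, &mem); err != nil {
            log.Printf("memory check failed: %v", err)
            continue
        }
        log.Printf("worker memory: current=%.1fMB peak=%.1fMB", mem.CurrentMB, mem.PeakMB)
    }
}()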
@expose
def predict(req):
    model = load_model()  # Cached after first load
    features = req["features"]
    return {"prediction": model.predict(features)}

@expose
def process_dataframe(req):
    import pandas as pd
    df = pd.DataFrame(req["data"])
    result = df.groupby("category").sum()
    return result.to_dict()

@expose
def extract_pdf_text(req):
    import PyPDF2
    # Process PDF and return text
    return {"text": extracted_text}
┌─────────────┐           UDS            ┌──────────────┐
│   Go App    │ ◄──────────────────────► │ Python Worker│
│             │     Low-latency IPC      │              │
│ - HTTP API  │                          │ - Models     │
│ - Business  │                          │ - Libraries  │
│ - Logic     │                          │ - Data Proc  │
└─────────────┘                          └──────────────┘
       ▲                                        ▲
       │                                        │
       └───────────── Same Host/Pod ────────────┘
Run benchmarks locally:
# Quick benchmark
make bench
# Full benchmark suite with memory profiling
make bench-full
Example results on M1 MacBook Pro:
BenchmarkPool/workers=1-10 10 235µs/op 4255 req/s
BenchmarkPool/workers=2-10 10 124µs/op 8065 req/s
BenchmarkPool/workers=4-10 10 68µs/op 14706 req/s
BenchmarkPool/workers=8-10 10 45µs/op 22222 req/s
BenchmarkPoolParallel/workers=2-10 100 18µs/op 55556 req/s
BenchmarkPoolParallel/workers=4-10 100 9µs/op 111111 req/s
BenchmarkPoolParallel/workers=8-10 100 5µs/op 200000 req/s
BenchmarkPoolLatency-10 100 p50: 45µs p95: 89µs p99: 125µs
The benchmarks show near-linear scaling with worker count, demonstrating the effectiveness of bypassing Python's GIL through process-based parallelism.
pool, _ := pyproc.NewPool(pyproc.PoolOptions{
    Config: pyproc.PoolConfig{
        Workers:     4,
        MaxInFlight: 10,
    },
    WorkerConfig: pyproc.WorkerConfig{
        SocketPath:   "/tmp/pyproc.sock",
        PythonExec:   "python3",
        WorkerScript: "worker.py",
    },
}, nil)

ctx := context.Background()
pool.Start(ctx)
defer pool.Shutdown(ctx)

var result map[string]interface{}
pool.Call(ctx, "predict", input, &result)
pool, _ := pyproc.NewPool(ctx, pyproc.PoolOptions{
    Protocol: pyproc.ProtocolGRPC(),
    // Unix domain socket with gRPC
})

pool, _ := pyproc.NewPool(ctx, pyproc.PoolOptions{
    Protocol: pyproc.ProtocolArrow(),
    // Zero-copy data transfer
})
Metric | Target | Notes |
---|---|---|
Latency (p50) | < 100μs | Simple function calls |
Latency (p99) | < 500μs | Including GC and process overhead |
Throughput | 1-5k RPS | Per service instance |
Payload Size | < 100KB | JSON request/response |
Worker Count | 2-8 per CPU core | Based on workload type |
Required Metrics:
- Request latency (p50, p95, p99)
- Request rate and error rate
- Worker health status
- Connection pool utilization
- Python process memory usage
Health Check Endpoints:
// Built-in health check
health := pool.Health()
if health.HealthyWorkers < health.TotalWorkers/2 {
    log.Warn("majority of workers unhealthy")
}
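The same snapshot can back a liveness/readiness endpoint. A minimal sketch (the `/healthz` path and threshold are examples, not part of pyproc):

// Expose pool health for Kubernetes probes (path and threshold are illustrative).
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    health := pool.Health()
    if health.HealthyWorkers == 0 {
        http.Error(w, "no healthy pyproc workers", http.StatusServiceUnavailable)
        return
    }
    fmt.Fprintf(w, "ok: %d/%d workers healthy\n", health.HealthyWorkers, health.TotalWorkers)
})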
Alerting Thresholds:
- Worker failure rate > 5%
- p99 latency > 1s
- Memory growth > 500MB/hour
- Connection pool exhaustion
Resource Limits:
resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
Restart Policies:
- Python worker restart on OOM or crash
- Exponential backoff for failed restarts
- Maximum 3 restart attempts per minute
- Circuit breaker after 10 consecutive failures
Socket Management:
- Use `/tmp/sockets/` or a shared volume in K8s
- Set socket permissions to 0660
- Clean up sockets on graceful shutdown (see the sketch below)
- Monitor socket file descriptors
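A sketch of the permission and cleanup steps using only the standard library (pyproc may already handle part of this internally):

// Tighten permissions on the worker socket and remove it on shutdown.
socketPath := "/tmp/sockets/pyproc.sock"

if err := os.Chmod(socketPath, 0660); err != nil {
    log.Printf("chmod socket: %v", err)
}

defer func() {
    pool.Shutdown(context.Background())
    if err := os.Remove(socketPath); err != nil && !os.IsNotExist(err) {
        log.Printf("remove socket: %v", err)
    }
}()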
We welcome contributions! Check out our "help wanted" issues to get started.
Apache 2.0 - See LICENSE for details.