Show HN: Use Third Party LLM API in JetBrains AI Assistant

Original link: https://github.com/Stream29/ProxyAsLocalModel

This Kotlin application acts as a proxy that lets JetBrains AI Assistant use third-party LLM APIs (OpenAI, Claude, Qwen, Gemini, and others) by emulating the LM Studio and Ollama APIs. It works around the limits of the free AI Assistant plan and the difficulty of using the official Java SDKs in a GraalVM native image, relying on Ktor and kotlinx.serialization for a lightweight, reflection-free implementation. The proxy supports the streaming chat completion API and is distributed both as a runnable fat JAR and as a GraalVM native image for cross-platform compatibility and fast startup. A YAML config file, annotated with a schema for editor support, lets users easily configure API keys, models, and server settings. The config supports hot reloading, updating the servers on the fly. The project highlights the advantages of Kotlin's functional programming style for GraalVM native images: faster startup and a smaller memory footprint.

This Hacker News thread discusses "Show HN: Use Third Party LLM API in JetBrains AI Assistant", a project that lets users integrate external large language models (LLMs) with JetBrains AI Assistant. Users share their experience with the AI Assistant, pointing out the shortcomings of earlier versions while acknowledging improvements in newer releases and the ability to use mainstream LLMs. The discussion turns to JetBrains' new agent, Junie, and how it performs compared with Claude. Some find it useful for code review, generating REST endpoints, and exploring libraries, noting that it improves productivity despite skepticism about AI hype. Alternatives such as OpenRouter come up, along with their cost structure, and other projects such as LiteLLM Gateway and enchanted-ollama-openrouter-proxy are suggested. The thread also touches on non-compete clauses in AI terms of service and the potential legal risks they pose for companies, as well as whether tools like Cursor and Copilot are becoming must-haves at some companies and whether AI tooling should be a core component of the IDE itself.

Original text

Proxy a remote LLM API as a local model. Especially useful for using a custom LLM in the JetBrains AI Assistant.

Powered by Ktor and kotlinx.serialization, thanks to their reflection-free design.

Currently, JetBrains AI Assistant provides a free plan with a very limited quota. I tried it out, and my quota ran out quickly.

I had already bought API credits for other LLMs, such as Gemini and Qwen, so I started to think about using them in the AI Assistant. Unfortunately, only local models served through LM Studio and Ollama are supported. So I started working on this proxy application, which exposes third-party LLM APIs as LM Studio and Ollama APIs so that I can use them in my JetBrains IDEs.

This seemed like a simple task, so I started by using the official SDKs as clients and writing a simple Ktor server that exposes LM Studio- and Ollama-style endpoints. The problem appeared when I tried to distribute it as a GraalVM native image: the official Java SDKs rely on too many dynamic features, making them hard to compile into a native image even with a tracing agent. So I decided to implement a simple streaming chat completion client myself with Ktor and kotlinx.serialization, both of which are reflection-free, functional, and DSL-styled.
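As a rough illustration of what such a hand-rolled client can look like, here is a minimal Kotlin sketch of a streaming chat completion call with Ktor and kotlinx.serialization. It assumes an OpenAI-compatible SSE endpoint; the type and function names are illustrative and not taken from this project.

// Minimal sketch, not the project's actual code. Assumes an OpenAI-compatible
// streaming endpoint that emits SSE frames of the form "data: {json}".
import io.ktor.client.*
import io.ktor.client.engine.cio.*
import io.ktor.client.request.*
import io.ktor.client.statement.*
import io.ktor.http.*
import io.ktor.utils.io.*
import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

@Serializable
data class ChatMessage(val role: String, val content: String)

@Serializable
data class ChatRequest(val model: String, val messages: List<ChatMessage>, val stream: Boolean = true)

private val json = Json { ignoreUnknownKeys = true }

suspend fun streamChat(baseUrl: String, apiKey: String, request: ChatRequest, onChunk: (String) -> Unit) {
    HttpClient(CIO).use { client ->
        client.preparePost("$baseUrl/chat/completions") {
            header(HttpHeaders.Authorization, "Bearer $apiKey")
            contentType(ContentType.Application.Json)
            setBody(json.encodeToString(request))       // explicit serialization, no reflection
        }.execute { response ->
            val channel = response.bodyAsChannel()
            while (!channel.isClosedForRead) {
                val line = channel.readUTF8Line() ?: break
                // SSE frames look like "data: {...}"; "data: [DONE]" ends the stream.
                if (line.startsWith("data: ") && !line.endsWith("[DONE]")) {
                    onChunk(line.removePrefix("data: "))
                }
            }
        }
    }
}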

As you can see, this application is distributed both as a runnable fat JAR and as a GraalVM native image, which makes it cross-platform and fast to start.
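For readers unfamiliar with that setup, the following build.gradle.kts sketch shows one way such artifacts can be produced with the Shadow and GraalVM Native Build Tools plugins. The plugin versions, main class name, and image name are assumptions for illustration, not this project's actual build script.

// Hypothetical Gradle Kotlin DSL sketch: fat jar via Shadow, native image via
// GraalVM Native Build Tools. Versions and names are illustrative.
plugins {
    kotlin("jvm") version "2.0.0"
    application
    id("com.github.johnrengelman.shadow") version "8.1.1"   // produces the runnable fat jar
    id("org.graalvm.buildtools.native") version "0.10.2"    // produces the native image
}

application {
    mainClass.set("MainKt")   // hypothetical main class
}

graalvmNative {
    binaries {
        named("main") {
            imageName.set("ProxyAsLocalModel")
            buildArgs.add("--no-fallback")
        }
    }
}

// Then: ./gradlew shadowJar for the fat jar, ./gradlew nativeCompile for the native image.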

Developing this application gave me confidence in Kotlin, Ktor, and kotlinx.serialization. The Kotlin ecosystem leans on functional programming and uses less reflection, which makes it a better fit for GraalVM native images, with faster startup and lower memory usage.

Proxy from: OpenAI, Claude, DashScope (Alibaba Qwen), Gemini, DeepSeek, Mistral, SiliconFlow.

Proxy as: LM Studio, Ollama.

Streaming chat completion API only.
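To make the "proxy as" idea concrete, here is a minimal Ktor sketch that exposes LM Studio-style and Ollama-style endpoints on their conventional ports. The endpoint paths and placeholder responses are assumptions based on the public APIs of those tools, not this project's actual routing code.

// Illustrative sketch only: two Ktor servers emulating LM Studio and Ollama.
import io.ktor.server.application.*
import io.ktor.server.cio.*
import io.ktor.server.engine.*
import io.ktor.server.response.*
import io.ktor.server.routing.*

fun main() {
    // LM Studio speaks an OpenAI-compatible API, conventionally on port 1234.
    embeddedServer(CIO, port = 1234) {
        routing {
            get("/v1/models") { call.respondText("""{"data":[]}""") }
            post("/v1/chat/completions") { call.respondText("forward to the upstream provider here") }
        }
    }.start(wait = false)

    // Ollama's native API, conventionally on port 11434.
    embeddedServer(CIO, port = 11434) {
        routing {
            get("/api/tags") { call.respondText("""{"models":[]}""") }
            post("/api/chat") { call.respondText("forward to the upstream provider here") }
        }
    }.start(wait = true)
}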

This application is a proxy server, distributed as a runnable fat JAR and as a GraalVM native image (Windows x64).

Run the application, and you will see a help message:

2025-05-02 10:43:53 INFO  Help - It looks that you are starting the program for the first time here.
2025-05-02 10:43:53 INFO  Help - A default config file is created at your_path\config.yml with schema annotation.
2025-05-02 10:43:53 INFO  Config - Config file watcher started at your_path\config.yml
2025-05-02 10:43:53 INFO  LM Studio Server - LM Studio Server started at 1234
2025-05-02 10:43:53 INFO  Ollama Server - Ollama Server started at 11434
2025-05-02 10:43:53 INFO  Model List - Model list loaded with: []

Then you can edit the config file to set up your proxy server.

The config file is automatically hot-reloaded when you change it; only the affected parts of the server are updated.
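As an illustration of how such hot reloading can be wired up on the JVM, here is a small Kotlin sketch based on java.nio's WatchService. The function and callback names are hypothetical; the project's actual reload mechanism may differ.

// Illustrative only: watch the config file's directory and re-apply the config on change.
import java.nio.file.*

fun watchConfig(configPath: Path, onChange: (Path) -> Unit) {
    val watchService = FileSystems.getDefault().newWatchService()
    configPath.parent.register(watchService, StandardWatchEventKinds.ENTRY_MODIFY)
    Thread {
        while (true) {
            val key = watchService.take()            // blocks until an event arrives
            for (event in key.pollEvents()) {
                val changed = event.context() as Path
                if (changed.fileName == configPath.fileName) {
                    onChange(configPath)             // re-parse the config, rebuild affected servers
                }
            }
            if (!key.reset()) break                  // directory no longer watchable
        }
    }.apply { isDaemon = true }.start()
}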

When the config file is first generated, it is created with a schema annotation, which enables completion and validation in your editor.

# $schema: https://github.com/Stream29/ProxyAsLocalModel/raw/master/config_v2.schema.json
lmStudio:
  port: 1234 # This is default value
  enabled: true # This is default value
ollama:
  port: 11434 # This is default value
  enabled: true # This is default value
client:
  socketTimeout: 1919810 # Long.MAX_VALUE is default value, in milliseconds
  connectionTimeout: 1919810 # Long.MAX_VALUE is default value, in milliseconds
  requestTimeout: 1919810 # Long.MAX_VALUE is default value, in milliseconds
  retry: 3 # This is default value
  delayBeforeRetry: 1000 # This is default value, in milliseconds

apiProviders:
  OpenAI:
    type: OpenAi
    baseUrl: https://api.openai.com/v1
    apiKey: <your_api_key>
    modelList:
      - gpt-4o
  Claude:
    type: Claude
    apiKey: <your_api_key>
    modelList:
      - claude-3-7-sonnet
  Qwen:
    type: DashScope
    apiKey: <your_api_key>
    modelList: # This is default value
      - qwen-max
      - qwen-plus
      - qwen-turbo
      - qwen-long
  DeepSeek:
    type: DeepSeek
    apiKey: <your_api_key>
    modelList: # This is default value
      - deepseek-chat
      - deepseek-reasoner
  Mistral:
    type: Mistral
    apiKey: <your_api_key>
    modelList: # This is default value
      - codestral-latest
      - mistral-large
  SiliconFlow:
    type: SiliconFlow
    apiKey: <your_api_key>
    modelList:
      - Qwen/Qwen3-235B-A22B
      - Pro/deepseek-ai/DeepSeek-V3
      - THUDM/GLM-4-32B-0414
  Gemini:
    type: Gemini
    apiKey: <your_api_key>
    modelList:
      - gemini-2.5-flash-preview-04-17