SymbolicAI: A neuro-symbolic perspective on LLMs

原始链接: https://github.com/ExtensityAI/symbolicai

SymbolicAI is a Python framework that combines classical programming with the power of LLMs through a neuro-symbolic approach. It is built around "Symbol" objects that come in syntactic (literal-value) and semantic (context-aware) flavours, with easy switching between the two. Primitives and contracts are the core concepts: they enable composable operations and, via Design by Contract principles, make LLM outputs more reliable. Configuration management is priority-based, using `symai.config.json`, `symsh.config.json`, and `symserver.config.json` in debug, environment, and global locations. API keys for engines such as OpenAI, Anthropic, and WolframAlpha can be set via environment variables or configuration files. The framework supports community-driven model training through opt-in data collection. The project is BSD-3-Clause licensed and welcomes contributions, donations, and investors.

This Hacker News discussion covers SymbolicAI, a neuro-symbolic LLM framework created by futurisold ([email protected]). The tool allows semantic manipulation of symbols via LLMs, enabling tasks such as contextual comparison and logical reasoning. Users found the interpret() function powerful. Futurisold described its use in building a deep research agent and an end-to-end document generation system with chained contracts, in which LLMs are functionally equivalent if they fulfill the same contract. He emphasized the importance of functional programming principles in the framework's design. ExtensityAI team member xpitfire shared benchmark results, the plugin system, extension methods, symbolic-to-subsymbolic conversion, and "RickShell", a customizable shell. He also linked examples of AI-generated research papers, including one on the three-body problem, and provided a link to an open-source, MCP-like API endpoint service. Other users discussed potential integrations with evolutionary and affective AI, comparisons with Wolfram Mathematica, and similar projects such as genaiscript and SynesthesiaLisp.


SymbolicAI is a neuro-symbolic framework, combining classical Python programming with the differentiable, programmable nature of LLMs in a way that actually feels natural in Python. It's built to not stand in the way of your ambitions, and it's easily extensible and customizable to your needs by virtue of its modular design. It's quite easy to write your own engine, host an engine of your choice locally, or interface with tools like web search or image generation. To keep things concise in this README, we'll introduce two key concepts that define SymbolicAI: primitives and contracts.

At the core of SymbolicAI are Symbol objects—each one comes with a set of tiny, composable operations that feel like native Python.

Symbol comes in two flavours:

  1. Syntactic – behaves like a normal Python value (string, list, int – whatever you passed in).
  2. Semantic – is wired to the neuro-symbolic engine and therefore understands meaning and context.

Why is syntactic the default? Because Python operators (==, ~, &, …) are overloaded in symai. If we immediately fired the engine for every bitshift or comparison, code would be slow and could produce surprising side-effects. Starting syntactic keeps things safe and fast; you opt in to semantics only where you need them.

How to switch to the semantic view

  1. At creation time

    S = Symbol("Cats are adorable", semantic=True) # already semantic
    print("feline" in S) # => True
  2. On demand with the .sem projection – the twin .syn flips you back:

    S = Symbol("Cats are adorable") # default = syntactic
    print("feline" in S.sem) # => True
    print("feline" in S)     # => False
  3. Invoking dot-notation operations—such as .map() or any other semantic function—automatically switches the symbol to semantic mode:

     S = Symbol(['apple', 'banana', 'cherry', 'cat', 'dog'])
     print(S.map('convert all fruits to vegetables'))
     # => ['carrot', 'broccoli', 'spinach', 'cat', 'dog']

Because the projections return the same underlying object with just a different behavioural coat, you can weave complex chains of syntactic and semantic operations on a single symbol. Think of them as your building blocks for semantic reasoning. Right now, we support a wide range of primitives; check out the docs here, but here's a quick snack:

| Primitive/Operator | Category | Description |
|---|---|---|
| `==` | Comparison | Tests for equality. Syntactic: literal match. Semantic: fuzzy/conceptual equivalence (e.g. `'Hi' == 'Hello'`). |
| `+` | Arithmetic | Syntactic: numeric/string/list addition. Semantic: meaningful composition, blending, or conceptual merge. |
| `&` | Logical/Bitwise | Syntactic: bitwise/logical AND. Semantic: logical conjunction, inference, e.g. context merge. |
| `symbol[index] = value` | Iteration | Set an item or slice. |
| `.startswith(prefix)` | String Helper | Check whether a string starts with the given prefix (in both modes). |
| `.choice(cases, default)` | Pattern Matching | Select the best match from the provided cases. |
| `.foreach(condition, apply)` | Execution Control | Apply an action to each element. |
| `.cluster(**clustering_kwargs?)` | Data Clustering | Cluster data into groups semantically (uses sklearn's DBSCAN). |
| `.similarity(other, metric?, normalize?)` | Embedding | Compute similarity between embeddings. |
| … | … | … |
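The syntactic/semantic split can be pictured with a tiny toy class. This is purely illustrative and NOT symai's actual implementation: in the real framework the semantic view is answered by the configured LLM engine, which is faked here with a small synonym table.

```python
class ToySymbol:
    """Toy sketch of the syntactic/semantic projection idea (not symai's real code)."""

    # Fake "engine": in real symai, semantic checks are answered by an LLM.
    _FAKE_ENGINE = {"feline": ["cat", "cats"], "canine": ["dog", "dogs"]}

    def __init__(self, value, semantic=False):
        self.value = value
        self.semantic = semantic

    @property
    def sem(self):  # project into the semantic view
        return ToySymbol(self.value, semantic=True)

    @property
    def syn(self):  # project back into the syntactic view
        return ToySymbol(self.value, semantic=False)

    def __contains__(self, item):
        if self.semantic:
            # Meaning-aware lookup via the fake engine's synonym table.
            related = self._FAKE_ENGINE.get(item, [item])
            return any(word in self.value.lower() for word in related)
        return item in self.value  # plain, fast substring check


S = ToySymbol("Cats are adorable")
print("feline" in S)          # => False (literal substring check)
print("feline" in S.sem)      # => True  (meaning-aware lookup)
print("feline" in S.sem.syn)  # => False (projected back to syntactic)
```

Note how both projections return a view over the same underlying value, which is what makes mixed syntactic/semantic chains cheap.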

They say LLMs hallucinate—but your code can't afford to. That's why SymbolicAI brings Design by Contract principles into the world of LLMs. Instead of relying solely on post-hoc testing, contracts help build correctness directly into your design; everything is packed into a decorator that operates on your defined data models and validation constraints:

from symai import Expression
from symai.strategy import contract
from symai.models import LLMDataModel # Compatible with Pydantic's BaseModel
from pydantic import Field, field_validator

# ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#  Data models                                              ▬
#  – clear structure + rich Field descriptions power        ▬
#    validation, automatic prompt templating & remedies     ▬
# ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
class DataModel(LLMDataModel):
    some_field: some_type = Field(description="very descriptive field", and_other_supported_options_here="...")

    @field_validator('some_field')
    def validate_some_field(cls, v):
        # Custom basic validation logic can be added here too besides pre/post
        valid_opts = ['A', 'B', 'C']
        if v not in valid_opts:
            raise ValueError(f'Must be one of {valid_opts}, got "{v}".')
        return v

# ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#  The contracted expression class                          ▬
# ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
@contract(
    # ── Remedies ─────────────────────────────────────────── #
    pre_remedy=True,        # Try to fix bad inputs automatically
    post_remedy=True,       # Try to fix bad LLM outputs automatically
    accumulate_errors=True, # Feed history of errors to each retry
    verbose=True,           # Nicely displays progress in terminal
    remedy_retry_params=dict(tries=3, delay=0.4, max_delay=4.0,
                             jitter=0.15, backoff=1.8, graceful=False),
)
class Agent(Expression):
    #
    # High-level behaviour:
    #  0. `prompt` – a *static* description of what the LLM must do (mandatory)
    #  1. `pre`    – sanity-check inputs (optional)
    #  2. `act`    – mutate state (optional)
    #  3. LLM      – generate expected answer (handled by SymbolicAI engine)
    #  4. `post`   – ensure answer meets semantic rules (optional)
    #  5. `forward` (mandatory)
    #     • if contract succeeded → return type validated LLM object
    #     • else                  → graceful fallback answer
    # ...

Because we don't want to bloat this README file with long Python snippets, learn more about contracts here and here.
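The contract lifecycle sketched in the comments above (pre-check, act, generate, post-check, with retries and a graceful fallback) can be approximated in a few lines of plain Python. This is a hypothetical illustration of the Design by Contract pattern, not symai's `@contract` implementation, and `toy_contract`/`isqrt` are made-up names:

```python
from functools import wraps


def toy_contract(pre=None, post=None, tries=3, fallback=None):
    """Minimal design-by-contract decorator (illustrative, not symai's @contract)."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Precondition: reject bad inputs up front.
            if pre is not None and not pre(*args, **kwargs):
                raise ValueError("precondition failed")
            # Retry loop stands in for symai's remedy mechanism.
            for _ in range(tries):
                result = fn(*args, **kwargs)
                if post is None or post(result):
                    return result  # contract fulfilled
            return fallback        # graceful fallback answer
        return wrapper
    return decorator


@toy_contract(pre=lambda x: x >= 0, post=lambda r: r * r <= 81, fallback=-1)
def isqrt(x):
    return int(x ** 0.5)


print(isqrt(81))  # => 9
```

In symai, the `fn` in the middle is the LLM call and the post-condition is semantic validation, but the control flow is the same.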

To get started with SymbolicAI, you can install it using pip:

pip install symbolicai

Setting up a neurosymbolic API Key

Before using SymbolicAI, you need to set up API keys for the various engines. Currently, SymbolicAI supports the following neurosymbolic engines through API: OpenAI and Anthropic. We also support local neurosymbolic engines, such as llama.cpp and Hugging Face.

# Linux / MacOS
export NEUROSYMBOLIC_ENGINE_API_KEY=""
export NEUROSYMBOLIC_ENGINE_MODEL=""
# Windows (PowerShell)
$Env:NEUROSYMBOLIC_ENGINE_API_KEY=""
$Env:NEUROSYMBOLIC_ENGINE_MODEL=""
# Jupyter Notebooks
%env NEUROSYMBOLIC_ENGINE_API_KEY=…
%env NEUROSYMBOLIC_ENGINE_MODEL=…
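As a quick sanity check that the exports above actually reached your Python process, you can inspect `os.environ` (the variable names are the ones from the commands above):

```python
import os

# Report whether each engine variable is visible to this Python process.
for var in ("NEUROSYMBOLIC_ENGINE_API_KEY", "NEUROSYMBOLIC_ENGINE_MODEL"):
    status = "set" if os.environ.get(var) else "NOT set"
    print(f"{var}: {status}")
```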

SymbolicAI uses multiple engines to process text, speech and images. We also include search engine access to retrieve information from the web. To use all of them, you will also need to install the following dependencies and assign the API keys to the respective engines.

pip install "symbolicai[wolframalpha]"
pip install "symbolicai[whisper]"
pip install "symbolicai[selenium]"
pip install "symbolicai[serpapi]"
pip install "symbolicai[pinecone]"

Or, install all optional dependencies at once:

pip install "symbolicai[all]"

And export the API keys, for example:

# Linux / MacOS
export SYMBOLIC_ENGINE_API_KEY="<WOLFRAMALPHA_API_KEY>"
export SEARCH_ENGINE_API_KEY="<SERP_API_KEY>"
export OCR_ENGINE_API_KEY="<APILAYER_API_KEY>"
export INDEXING_ENGINE_API_KEY="<PINECONE_API_KEY>"

# Windows (PowerShell)
$Env:SYMBOLIC_ENGINE_API_KEY="<WOLFRAMALPHA_API_KEY>"
$Env:SEARCH_ENGINE_API_KEY="<SERP_API_KEY>"
$Env:OCR_ENGINE_API_KEY="<APILAYER_API_KEY>"
$Env:INDEXING_ENGINE_API_KEY="<PINECONE_API_KEY>"

See below for the entire list of keys that can be set via environment variables or a configuration file.

SpeechToText Engine: Install ffmpeg for audio processing (based on OpenAI's whisper)

# Linux
sudo apt update && sudo apt install ffmpeg

# MacOS
brew install ffmpeg

# Windows
choco install ffmpeg

WebCrawler Engine: For selenium, we automatically install the driver with chromedriver-autoinstaller. Currently we only support Chrome as the default browser.

SymbolicAI now features a configuration management system with priority-based loading. The configuration system looks for settings in three different locations, in order of priority:

  1. Debug Mode (Current Working Directory)

    • Highest priority
    • Only applies to symai.config.json
    • Useful for development and testing
  2. Environment-Specific Config (Python Environment)

    • Second priority
    • Located in {python_env}/.symai/
    • Ideal for project-specific settings
  3. Global Config (Home Directory)

    • Lowest priority
    • Located in ~/.symai/
    • Default fallback for all settings

The system manages three main configuration files:

  • symai.config.json: Main SymbolicAI configuration
  • symsh.config.json: Shell configuration
  • symserver.config.json: Server configuration

Viewing Your Configuration

Before using the package, we recommend inspecting your current configuration setup with the symconfig command; running it will also create all the necessary configuration files:

symconfig

This command will show:

  • All configuration locations
  • Active configuration paths
  • Current settings (with sensitive data truncated)

Configuration Priority Example

my_project/              # Debug mode (highest priority)
└── symai.config.json    # Only this file is checked in debug mode

{python_env}/.symai/     # Environment config (second priority)
├── symai.config.json
├── symsh.config.json
└── symserver.config.json

~/.symai/                # Global config (lowest priority)
├── symai.config.json
├── symsh.config.json
└── symserver.config.json

If a configuration file exists in multiple locations, the system will use the highest-priority version. If the environment-specific configuration is missing or invalid, the system will automatically fall back to the global configuration in the home directory.
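The three-tier lookup and fallback described above can be sketched as follows. This is a hypothetical illustration of the priority order; the real loader lives inside symai and may differ in detail:

```python
import json
import sys
from pathlib import Path


def load_config(name="symai.config.json"):
    """Return the first readable config found, highest priority first (sketch)."""
    candidates = []
    if name == "symai.config.json":  # debug mode applies only to this file
        candidates.append(Path.cwd() / name)
    candidates.append(Path(sys.prefix) / ".symai" / name)  # environment-specific
    candidates.append(Path.home() / ".symai" / name)       # global fallback
    for path in candidates:
        if path.is_file():
            try:
                return json.loads(path.read_text())
            except json.JSONDecodeError:
                continue  # invalid file falls through to the next priority
    return {}
```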

  • Use the global config (~/.symai/) for your default settings
  • Use environment-specific configs for project-specific settings
  • Use debug mode (current directory) for development and testing
  • Run symconfig to inspect your current configuration setup


You can specify engine properties in a symai.config.json file in your project path. This will override the environment variables. Example of a configuration file with all engines enabled:

{
    "NEUROSYMBOLIC_ENGINE_API_KEY": "<OPENAI_API_KEY>",
    "NEUROSYMBOLIC_ENGINE_MODEL": "gpt-4o",
    "SYMBOLIC_ENGINE_API_KEY": "<WOLFRAMALPHA_API_KEY>",
    "SYMBOLIC_ENGINE": "wolframalpha",
    "EMBEDDING_ENGINE_API_KEY": "<OPENAI_API_KEY>",
    "EMBEDDING_ENGINE_MODEL": "text-embedding-3-small",
    "SEARCH_ENGINE_API_KEY": "<PERPLEXITY_API_KEY>",
    "SEARCH_ENGINE_MODEL": "sonar",
    "TEXT_TO_SPEECH_ENGINE_API_KEY": "<OPENAI_API_KEY>",
    "TEXT_TO_SPEECH_ENGINE_MODEL": "tts-1",
    "INDEXING_ENGINE_API_KEY": "<PINECONE_API_KEY>",
    "INDEXING_ENGINE_ENVIRONMENT": "us-west1-gcp",
    "DRAWING_ENGINE_API_KEY": "<OPENAI_API_KEY>",
    "DRAWING_ENGINE_MODEL": "dall-e-3",
    "VISION_ENGINE_MODEL": "openai/clip-vit-base-patch32",
    "OCR_ENGINE_API_KEY": "<APILAYER_API_KEY>",
    "SPEECH_TO_TEXT_ENGINE_MODEL": "turbo",
    "SUPPORT_COMMUNITY": true
}

With these steps completed, you should be ready to start using SymbolicAI in your projects.

❗️NOTE❗️Our framework allows you to support us in training models for local usage by enabling the data collection feature. On application startup we show the terms of service, and you can activate or disable this community feature. We do not share or sell your data to third parties; we only use it for research purposes and to improve your user experience. To change this setting, set the SUPPORT_COMMUNITY property to true/false in symai.config.json or via the respective environment variable.
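For example, the flag could be flipped programmatically like this (a small helper sketch; `set_support_community` is a made-up name, not part of symai's API):

```python
import json
from pathlib import Path


def set_support_community(cfg_path, enabled):
    """Rewrite SUPPORT_COMMUNITY in a symai.config.json-style file (helper sketch)."""
    path = Path(cfg_path)
    cfg = json.loads(path.read_text())
    cfg["SUPPORT_COMMUNITY"] = bool(enabled)
    path.write_text(json.dumps(cfg, indent=4))
    return cfg


# Usage against the global config, for instance:
# set_support_community(Path.home() / ".symai" / "symai.config.json", False)
```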

❗️NOTE❗️By default, the user warnings are enabled. To disable them, export SYMAI_WARNINGS=0 in your environment variables.

Some examples of running tests locally:

# Run all tests
pytest tests
# Run mandatory tests
pytest -m mandatory

Be sure to have your configuration set up correctly before running the tests. You can also run the tests with coverage to see how much of the code is covered by tests:

pytest --cov=symbolicai tests

Now, there are tools like DeepWiki that provide better documentation than we could ever write, and we don’t want to compete with that; we'll correct it where it's plain wrong. Please go read SymbolicAI's DeepWiki page. There's a lot of interesting stuff in there. Last but not least, check out our paper that describes the framework in detail. If you like watching videos, we have a series of tutorials that you can find here.

@software{Dinu_SymbolicAI_2022,
  author = {Dinu, Marius-Constantin},
  editor = {Leoveanu-Condrei, Claudiu},
  title = {{SymbolicAI: A Neuro-Symbolic Perspective on Large Language Models (LLMs)}},
  url = {https://github.com/ExtensityAI/symbolicai},
  month = {11},
  year = {2022}
}

This project is licensed under the BSD-3-Clause License - refer to the docs.

If you appreciate this project, please leave a star ⭐️ and share it with friends and colleagues. To support the ongoing development of this project even further, consider donating. Thank you!

Donate

We are also seeking contributors or investors to help grow and support this project. If you are interested, please reach out to us.

Feel free to contact us with any questions about this project via email, through our website, or find us on Discord: Discord
