(comments)

Original link: https://news.ycombinator.com/item?id=40345696

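Pipecat is an open-source framework for building conversational AI applications driven by voice input. The author spent more than a year exploring conversational AI development: building prototypes, starting a meetup group, and co-hosting a hackathon. Along the way, the same low-level problems kept coming up: low-latency media transport, echo cancellation, voice activity detection, phrase endpointing, pipelining data between models/services, handling voice interruptions, and swapping models/services. In response, a few people began building Pipecat, a Python library for voice (and multimodal) AI assistants and agents. Use cases range from personal coaches, meeting assistants, and customer support bots to storytelling toys for kids, virtual friends, and snarky social bots. Contributions are welcome, and the goal is for Pipecat to support any model, service, transport layer, or infrastructure tooling. To get involved, check out the project on GitHub, submit pull requests, join the Discord server, and share projects.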


Original post
I've been obsessed for the past ~year with the possibilities of talking to LLMs. I built a bunch of one-off prototypes, shared code on X, started a Meetup group in SF, and co-hosted a big hackathon. It turns out that there are a few low-level problems that everybody building conversational/real-time AI needs to solve on the way to building/shipping something that works well: low-latency media transport, echo cancellation, voice activity detection, phrase endpointing, pipelining data between models/services, handling voice interruptions, swapping out different models/services.
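Two items in that list, voice activity detection (VAD) and phrase endpointing, are easy to name but less obvious in practice: VAD decides whether a given slice of audio contains speech, and endpointing decides when the user has finished a turn. Below is a minimal sketch, not Pipecat's implementation. It assumes 20 ms frames of float samples in [-1.0, 1.0] and uses a crude RMS-energy threshold as the VAD; the function names and threshold values are hypothetical, and production systems typically use a trained VAD model instead of raw energy.

```python
import math
from typing import Iterable, List, Optional, Tuple


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def endpoint_utterances(
    frames: Iterable[List[float]],
    energy_threshold: float = 0.02,  # hypothetical value; tune per microphone
    min_silence_frames: int = 25,    # 25 x 20 ms frames = 500 ms of silence
) -> List[Tuple[int, int]]:
    """Return (first_speech_frame, last_speech_frame) index pairs, one per utterance.

    VAD: a frame counts as speech when its RMS energy exceeds energy_threshold.
    Endpointing: an utterance is closed once min_silence_frames consecutive
    non-speech frames follow it, so short pauses stay inside one utterance.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_speech = -1
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) > energy_threshold:
            if start is None:
                start = i  # speech begins
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                segments.append((start, last_speech))  # end of turn detected
                start = None
                silent_run = 0
    if start is not None:  # stream ended mid-utterance
        segments.append((start, last_speech))
    return segments


if __name__ == "__main__":
    loud, quiet = [0.5] * 160, [0.0] * 160
    # A short 2-frame pause stays inside the first utterance; a 30-frame pause ends it.
    frames = [quiet] * 3 + [loud] * 5 + [quiet] * 2 + [loud] * 3 + [quiet] * 30 + [loud] * 4
    print(endpoint_utterances(frames))  # [(3, 12), (43, 46)]
```

The silence window is the key tradeoff: too short and the bot interrupts users who pause mid-sentence, too long and every response waits that much longer, which adds directly to perceived latency.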

On the theory that something like a LlamaIndex or LangChain for real-time/conversational AI would be useful, a few of us started working on a Python library for voice (and multimodal) AI assistants/agents.
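To make "pipelining data between models/services" and "swapping out different models/services" concrete, here is a generic asyncio sketch of the idea behind such a framework. It is not Pipecat's API: the stage functions, the string "frames", and the `None` end-of-stream marker are all hypothetical stand-ins for speech-to-text, LLM, and text-to-speech services.

```python
import asyncio
from typing import Awaitable, Callable, List

# A "stage" is any async function that turns one frame into another.
# Frames are plain strings here; a real framework would carry audio, text, and control frames.
Stage = Callable[[str], Awaitable[str]]


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull frames from inbox, process them, and push results downstream.
    A None frame marks end of stream and is forwarded unchanged."""
    while True:
        frame = await inbox.get()
        if frame is None:
            await outbox.put(None)
            return
        await outbox.put(await stage(frame))


async def run_pipeline(stages: List[Stage], inputs: List[str]) -> List[str]:
    # One queue between each pair of adjacent stages, plus one at each end.
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for item in inputs:
        await queues[0].put(item)
    await queues[0].put(None)
    results = []
    while (frame := await queues[-1].get()) is not None:
        results.append(frame)
    await asyncio.gather(*tasks)
    return results


# Hypothetical stand-ins for speech-to-text, LLM, and text-to-speech services.
async def fake_stt(audio: str) -> str:
    return audio.removeprefix("audio:")


async def fake_llm(text: str) -> str:
    return f"reply to {text}"


async def fake_tts(text: str) -> str:
    return f"speech({text})"


if __name__ == "__main__":
    out = asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts], ["audio:hello"]))
    print(out)  # ['speech(reply to hello)']
```

Because each stage only sees a queue on either side, swapping one service for another means replacing one entry in the `stages` list. Running each stage as its own task also lets downstream stages work on early frames while upstream stages are still busy, which is where pipelining helps latency. The sketch needs Python 3.9+ for `str.removeprefix`.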

So ... Pipecat: a framework for building things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, virtual friends, and snarky social bots.

Most of the core contributors to Pipecat so far work together at our day jobs. This has been a kind of "20% time" thing at our company. But we're serious about welcoming all contributions. We want Pipecat to support any and all models, services, transport layers, and infrastructure tooling. If you're interested in this stuff, please check it out and let us know what you think. Submit PRs. Become a maintainer. Join the Discord. Post cool stuff. Post funny stuff when your voice agent goes completely off the rails (as mine sometimes do).

Contact us: contact @ memedata.com