But tbh "better" is subjective here. Does the new LLM improve user interactions significantly? Seems like people get obsessed with shiny new models without asking if it's actually adding value.

An important distinction is that tincans is not speech-to-speech. It uses a separate turn/pause detection model and a text-to-speech final processing step.
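The cascaded design described here (a turn/pause detector deciding when the user is done, then a text LLM, then a TTS step) can be sketched roughly as below. All function names and bodies are hypothetical stand-ins for illustration, not tincans's actual API:

```python
def detect_turn_end(audio_chunk: bytes) -> bool:
    """Stand-in for the separate turn/pause detection model (e.g. a VAD).

    Placeholder heuristic: treat an empty chunk as silence / end of turn.
    """
    return len(audio_chunk) == 0

def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text stage."""
    return "hello there"

def generate_reply(text: str) -> str:
    """Stand-in for the text LLM producing a reply."""
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for the final text-to-speech stage."""
    return text.encode("utf-8")

def respond(audio_turn: bytes) -> bytes:
    """Run one full cascaded turn: STT -> LLM -> TTS.

    Each stage runs separately, so latency accumulates at every hop --
    which is the usual argument for joint speech-to-speech models.
    """
    user_text = transcribe(audio_turn)
    reply_text = generate_reply(user_text)
    return synthesize(reply_text)
```

The point of the sketch is the staging, not the internals: because the reply cannot start until turn detection fires and the text pipeline finishes, each extra stage (such as a tool/function-calling step) adds end-to-end latency.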

> there needs to be a tool/function calling step before a reply

I built that almost exactly a year ago :) It was good but not fast enough - hence building the joint model.

Just trying to stay focused on launching first (https://docs.mixlayer.com) and keeping early customers happy, but I'd love to open source some of this work.

It'd probably be a separate crate from candle. If you haven't checked it out yet, mistral.rs implements some of these things (https://github.com/EricLBuehler/mistral.rs). Eric hasn't done multi-GPU inference yet, but I know it's on his roadmap. Not sure if it helped, but I shared an early version of my Llama 3.1 implementation with him.

It started the conversation by asking if I'd ever heard of the television show Cheers. Every subsequent interaction led to it telling me more about Cheers.

> Current AI (even GPT-4o) simply isn't capable enough to do useful stuff.

I'm loving all these wild takes about LLMs, meanwhile LLMs are doing useful things for me all day.

For me as well… with constant human supervision. But if you try to build a business service, you need autonomy and exact rule-following. We're not there yet.

We've finally managed to give our AI models existential dread, imposter syndrome and stress-driven personality quirks. The Singularity truly is here. Look on our works, ye Mighty, and despair!

I literally said "hey how are you" and it immediately replied with something like "I've been reading a lot about the ongoing war in Ukraine" and it just escalated from there. Very strange experience!

However, people here have been spoiled by incredibly good LLMs lately, and the responses this model gives are nowhere near the high quality of today's SOTA models in terms of content. It reminds me more of the LLMs we saw back in 2019.

So I think you've done a "good enough" job on the audio side of things, and further focus should be entirely on the quality of the responses instead.