(comments)

Original link: https://news.ycombinator.com/item?id=40916326

This post discusses a developer's project to make low-latency screen sharing and co-streaming easier using WebRTC (Web Real-Time Communication). The developer hopes to encourage popular platforms such as Discord, Google Meet, and Microsoft Teams to adopt WHIP (WebRTC-HTTP Ingestion Protocol) and WHEP (WebRTC-HTTP Egress Protocol), which would give users far more flexibility. The author also wants to highlight the existing ecosystem of WebRTC implementations across many programming languages. Part of the motivation was frustration with a company that claimed only its proprietary protocol could achieve such low streaming latency, so the developer built an open-source alternative, BitWHIP, alongside the earlier Broadcast Box server. The project aims to simplify setting up low-latency sharing and co-streaming, particularly for gamers. A key goal is easier co-streaming: users should be able to connect a room to their OBS (Open Broadcaster Software) instance and have every participant's video show up automatically, without heavy configuration. The developer acknowledges remaining challenges around Wayland compatibility and around integrating L4S (Low Latency, Low Loss, Scalable Throughput) congestion signaling with existing solutions. Finally, the author invites comments, questions, and suggestions on the GitHub page or in the discussion.


Original text


I wrote this to solve a few things I cared about.

* I want to show people that native WebRTC players can be a thing. I hope this encourages hangouts/discord/$x to implement WHIP and WHEP; it would let people do so much more (a minimal WHIP signaling sketch follows after this list)

* I wanted to make low latency sharing easier. I saw the need for this working on adding WebRTC to OBS and Broadcast Box[0]

* I wanted to show devs what a great ecosystem exists for WebRTC. Lots of great implementations in different languages.

* Was a bit of a ‘frustration project’. I saw a company claiming only their proprietary protocol can do latency this low. So I thought ‘screw you I will make an open source version!’

[0] https://github.com/glimesh/broadcast-box
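
WHIP itself is only a thin HTTP layer over standard WebRTC signaling, which is why a browser can speak it with nothing but fetch and RTCPeerConnection. Below is a minimal, hypothetical sketch of publishing a screen capture to a WHIP endpoint; the endpoint URL and bearer token are placeholders rather than values any particular service uses, and a production client would normally wait for ICE gathering to finish before POSTing the offer.

    // TypeScript (browser): hypothetical minimal WHIP publish.
    async function publishScreen(endpoint: string, token: string): Promise<void> {
      const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
      const pc = new RTCPeerConnection();
      for (const track of stream.getTracks()) {
        pc.addTrack(track, stream);
      }

      // WHIP is a single HTTP POST: the body is the SDP offer,
      // the response body is the SDP answer.
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);

      const resp = await fetch(endpoint, {
        method: "POST",
        headers: {
          "Content-Type": "application/sdp",
          Authorization: `Bearer ${token}`,
        },
        body: offer.sdp,
      });
      const answer = await resp.text();
      await pc.setRemoteDescription({ type: "answer", sdp: answer });
      // The Location header on the response names the session resource;
      // an HTTP DELETE to it ends the broadcast.
    }

The same exchange works from native code, which is the point above: any language with a WebRTC stack and an HTTP client can ingest or play these streams.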



Hey Sean, we both worked at Twitch Video but I left just as you were joining. I currently work on the Discord video stack and am somewhat curious about how you imagine Discord leveraging WHIP/WHEP. Do you see it as a way for these clients to broadcast outwards to services like Twitch or more as an interoperability tool?



Users want to send WHIP into Discord. The lack of control over screen sharing today is frustrating. Users want to capture via another tool and control bitrate/resolution.

Most Broadcast Box users tell me that’s their reason for switching off discord.
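
For context, a hedged sketch of the kind of sender-side control being asked for here: once media goes out over a WebRTC sender (for example after a WHIP publish), bitrate and resolution can be capped per sender without renegotiation. The numbers below are arbitrary illustrations, not anything Discord or Broadcast Box prescribes.

    // TypeScript (browser): cap an outgoing video sender's rate/resolution.
    async function capVideoSender(sender: RTCRtpSender): Promise<void> {
      const params = sender.getParameters();
      if (!params.encodings.length) return;          // nothing negotiated yet
      params.encodings[0].maxBitrate = 6_000_000;    // ~6 Mbps cap, arbitrary
      params.encodings[0].scaleResolutionDownBy = 1; // keep full resolution
      params.encodings[0].maxFramerate = 60;
      await sender.setParameters(params);
    }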

———

With WHEP I want to see easier co-streaming. I should be able to connect a room to my OBS instance and everyone’s video auto show up.

I don’t have this figured out yet. Would love your opinion and feedback. Wanna comment on the doc or would love to talk 1:1 ! siobud.com/meeting



What's the plan for Wayland compatibility? For a little while I was able to share a single app - but not the full desktop. Now I can't share anything from Ubuntu 24.04 when using Wayland :(



This would be fabulous, thank you so much for working on that. What kind of latency does dual encoding (on the client, then on the receiver again) add? Are there codecs that can have multiple streams on the same image (as in zones of independent streams on the video surface)?



It definitely adds latency, not enough to be a bad experience.

We have vdo.ninja today and Twitch's Stream Together. Those both do the 'dual encoding' and it is a good enough experience that users are doing it!



Question: 30ms latency sounds amazing, but how does it actually compare to "the standard" desktop sharing tools? Do you know what the latency of, say, MSRDP or VNC is, for comparison?



I doubt the protocol itself makes a big difference. I bet you can get 30ms with VNC. The differences with BitWHIP:

* Can play WebRTC in browser. That makes things easier to use.

* Simpler/hackable software. BitWHIP is simple and uses nvenc etc.; if you use nvenc with VNC I bet you can get the same experience
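
To back up the "play WebRTC in browser" point: WHEP playback is the mirror image of WHIP, so a bare-bones player is only a fetch and a peer connection. The endpoint below is a placeholder, not a documented BitWHIP URL.

    // TypeScript (browser): hypothetical minimal WHEP player.
    async function playWhep(endpoint: string, video: HTMLVideoElement): Promise<void> {
      const pc = new RTCPeerConnection();
      pc.addTransceiver("video", { direction: "recvonly" });
      pc.addTransceiver("audio", { direction: "recvonly" });
      pc.ontrack = (event) => {
        // Attach the incoming remote stream to a <video> element.
        video.srcObject = event.streams[0];
      };

      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);

      // WHEP mirrors WHIP: POST the SDP offer, read the SDP answer back.
      const resp = await fetch(endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/sdp" },
        body: offer.sdp,
      });
      await pc.setRemoteDescription({ type: "answer", sdp: await resp.text() });
    }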



Any plans on integrating L4S with e.g. Tetrys-based FEC, in a way where the congestion feedback from L4S acts on the quantizer/rate factor instead of directly on the bitrate?

It's much more appropriate to do perceptual fairness than strict bitrate fairness.

Happy to have a chat on this btw; you can best catch me on discord.



E.g. "Low Latency DOCSIS"[0] and related, WiFi[1], support it and with the former it's about non-exclusive scarce uplink capacity where cross-customer capacity sharing may rely on post-hoc analysis of flow behavior to check for abuse, switching to forced fairness if caught by such heuristics. For downstream it's even more natural to have shared capacity with enough congestion to matter, but often only the WiFi side would have a large discretionary range for bandwidth scheduling/allocation to matter much.

Apple already opportunistically uses L4S with TCP-Prague and there are real-world deployments/experiments [2] with end-to-end L4S.

Fixed-

[0]: https://github.com/cablelabs/lld

[1]: relevant excerpt from [0]: Applications that send large volumes of traffic that need low latency, but that are responsive to congestion in the network. These applications can benefit from using a technology known as "Low Latency, Low Loss, Scalable Throughput (L4S)". Support for this technology is included in the LLD feature set, but is beyond the scope of what we have in this repository. Information on L4S can be found in this IETF draft architecture.

[2]: https://www.vodafone.com/news/technology/no-lag-gaming-vodaf...



This is awesome. I would love it if you had some examples of how to use AntMedia as a source. I am mostly in video engineering, so reading the source comes more slowly to me. This would be really handy in many cases.



I remember around two years ago, we got in touch with a company (without mentioning the name, but it has "ripple" in it) and after an hour-long seminar, an NDA, password-protected binaries, and other BS, they barely delivered ~150ms latency.



As someone who set up a Discord-streaming-like service to use alongside Mumble, this is very exciting. I couldn't get anything involving WebRTC working reliably, but the only broadcasting clients I found were web browsers and OBS, so I am interested to see how this compares!

What I eventually settled on was https://github.com/Edward-Wu/srt-live-server with OBS and VLC player, which gives robust streaming at high bitrate 4k60, but latency is only 1-2 seconds



Couldn't get it to work on Windows 11. Was able to run the just install script only after editing it to use the full path to the 7zip binary. It said it installed correctly, but then when I tried `just run play whip` I got this:
  cargo:rustc-cfg=feature="ffmpeg_7_0"
  cargo:ffmpeg_7_0=true

  --- stderr
  cl : Command line warning D9035 : option 'o' has been deprecated and will be removed in a future release
  thread 'main' panicked at C:\Users\jeffr\.cargo\registry\src\index.crates.io-6f17d22bba15001f\bindgen-0.69.4\lib.rs:622:31:
  Unable to find libclang: "couldn't find any valid shared libraries matching: ['clang.dll', 'libclang.dll'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


What is the reason for using "just" here?

I understand people have their tooling preferences, but this looks like something that build.rs or a plain makefile could have handled?



I was also wondering if anyone could chime in on advantages of using just.

I'm familiar with makefiles, is there a particular advantage to using just over makefiles or is it personal preference? (which is a totally valid answer! I'm just wondering if I'm missing something)



I think that the appeal of just is that it is simpler than make. It does not check file timestamps; it just executes a DAG of tasks unconditionally.



My first thought was that that was dropping one of the main features of make.

On reflection though, the timestamp-dependent part isn't really something used much nowadays apart from compiling C.

It'd be cool if it was an opt-in feature for just files so that it could actually function as a replacement for make in all cases.

I went looking in the docs and found this[0] which I'd missed last time I looked into justfiles.

[0] https://github.com/casey/just?tab=readme-ov-file#what-are-th...



I don't really buy his justification that ".PHONY: xxx" is hard to remember so we should have a completely new tool instead.

Make has its issues, but it also has two big advantages: it's simple and everyone already has it.



Everyone already has it... on Linux and Mac. It's pretty rare for it to be available on Windows.

That said I kind of agree. I like the idea of `just` but it does seem like they have just created a complicated DSL.

I think it is better to just write your infra scripting in a real language. I generally use Deno or Rust itself and a thin wrapper that `cargo run`'s it. Using Rust eliminates a dependency.



Anyone who's halfway serious about software development on Windows surely has make there too, and it's not like non-developers are the target audience for 'just' scripts



> Anyone who's halfway serious about software development on Windows surely has make there too

Not even remotely. I know it might be hard to imagine if you only program on Linux/Mac but there's a whole world out there that isn't built on janky shell scripts and Makefiles. If you use C# or Java or Visual C++ or Qt on Windows it's pretty unlikely that you'd have Make. It's kind of a pain to install and you don't need it.



Literally zero of the hundreds of devs I know that do software development on windows have make installed. Why would they? It's not usual in the space at all, that's msbuild



I agree, and even more strongly: you don't even need to remember .PHONY as long as your target names don't overlap with actual filenames, which is usually easy.

In fact, I didn't even know about .PHONY and have used make for a long time. That's what's great about it, even if you stick to the most basic features make is incredibly easy and straightforward. Dare I say, it "just" works lol.

I hate the proliferation of new tools that are the same as a tool that's been around for 20 years and is no different in any significant way except being trendy. Just unnecessary entropy. Our job is to manage and reduce, not maximize entropy.



> it's simple and everyone already have it.

Not always, Go programmers for example often forget that they need C build-tools for their platform to get Make.

It's also just about the furthest thing from simple; the language is nasty, so people just use it as a task executor, which is a lot of tooling for such a simple use case.



Also this:

>The explicit list of phony targets, written separately from the recipe definitions, also introduces the risk of accidentally defining a new non-phony target.

... seems to think the only way to define phony targets is:

    .PHONY: foo bar
    foo:
       ...
    bar:
       ...
... which has the problem that bar's definition is distant from its declaration as a phony target. But this form is equivalent and doesn't have that problem:
    .PHONY: foo
    foo:
       ...
    .PHONY: bar
    bar:
       ...
This ability to declare dependencies of a target over multiple definitions isn't even unique to `.PHONY`.


I recently switched my (small) company over to using just files within our codebases and it's been going over very well thus far.

We're building a set of apps that need to run on Linux, macOS, and Windows, so having a consistent solution for each is better than shell scripting, and I personally have never felt great about make and its weirdness.

It also helps that we have a pretty big monorepo so that anyone can bounce from one app to another and `just run` to use any of them, no matter the platform.

Either way the justification for me came from COSMIC[0].

[0] https://github.com/pop-os/cosmic-epoch/blob/master/justfile



John did all the work on this.

Just is nice as a Windows user. When I started committing, everything worked really well already. Editing the just stuff is also really easy. Much nicer to read than scripts, I think



Ooh, I’ve been looking for a good solution for this for years. Currently I use Parsec, but it’s closed source and not compatible with direct streaming from OBS etc. I’ll definitely check this out.



Always a bit sceptical when it comes to latency claims, especially in the sub-100ms space, but screen sharing 1-1 or video ingest should be a great use case for WebRTC

WebRTC is a great technology, but it still suffers from a scaling problem that is harder to resolve. On top of that, the protocol itself does not define things like adaptive bitrate switching or stalling recovery

Curious to hear what you think of some (proprietary) options for low latency playback like LLHLS, LLDASH, WebRTC, or HESP



WebRTC has congestion control and Simulcast/SVC; what is missing for adaptive bitrate switching? What is stalling recovery? I believe NACK/PLI handle this?

WebRTC doesn’t have a scaling problem. I think it was a software problem! Twitch, Tencent, Agora, Phenix all do 100k+ these days

I like WebRTC because of the open-ness of it. I also like that I only need one system for ingest and playback. I am HEAVILY biased though, way over invested in WebRTC :) I tend to care about greenfield/unique problems and not enough about scaling and making money
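
As a concrete illustration of the Simulcast point above, this is roughly how a browser sender offers multiple encodings so an SFU can switch each viewer between layers as bandwidth changes; the rid names and bitrates are arbitrary examples, not values any of the services mentioned require.

    // TypeScript (browser): offer three simulcast layers on one video track.
    async function addSimulcastVideo(pc: RTCPeerConnection, track: MediaStreamTrack): Promise<void> {
      pc.addTransceiver(track, {
        direction: "sendonly",
        sendEncodings: [
          { rid: "f", maxBitrate: 2_500_000 },                          // full resolution
          { rid: "h", maxBitrate: 800_000, scaleResolutionDownBy: 2 },  // half resolution
          { rid: "q", maxBitrate: 250_000, scaleResolutionDownBy: 4 },  // quarter resolution
        ],
      });
      // Congestion control (e.g. transport-cc feedback) throttles each layer,
      // and the SFU forwards whichever layer fits a given viewer.
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
    }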



Amazing work! The best I could achieve was ~40ms for video streams, although that was over a cellular network from a drone. But 30ms is a new milestone! I will see if I can repurpose this and test out a real-time video stream from a robot if I get some spare time.



I don't get what it does, exactly? This doesn't seem to be an OBS alternative (judging by the description), but… I mean, isn't it exactly the same as just running OBS directly?



Looks like a LAN tele…er, screen sharing server/client. Presumably you could serve over the internet but it will not get the 30ms latency. Aside from the streaming (I only spent a few minutes reviewing the source) it’s a live jpeg kind of thing. I built something similar to screen share with my kids when we played Minecraft together. It was really for me because once we got in game they would take off and in 5 minutes be screaming for help 10 chunks away in some zombie skeleton infested cave at or near bedrock. Being kids, I never got good enough directions to help them in time. Anyway, it was a fun project. I used CUDA and could get 60fps per client on CAT5 and 45-ish over WiFi, dropping to 10-15fps when I walked in and out of rooms with the laptop. 60fps is about 16ms per frame, so 20ms per frame is 50fps.



>Presumably you could serve over the internet but it will not get the 30ms latency.

Indeed, you'll have to live with something like 80ms to 100ms latency over the internet and a horrifying 160 ms if you want to have things respond to keyboard and mouse inputs.



Then how does something like Moonlight, Parsec, or GeForce Now work? Sub-10ms latency, sometimes even sub-5 depending on time of day and network congestion.



Ever heard of the Akamai network? Netflix might be a good example. Trace routes show latency between network hops. To reduce latency you either buy better network hardware, buy better cabling, or reduce hops in the network. Since the first two are more expensive than the third, if your service must have very fast response between server and client, move the server closer to the client. Large corporations run cache servers in multiple data centers everywhere geographically so the response time for clients is better than their competition's. Part of why new video services struggle to compete with YouTube is that YouTube can afford this kind of architecture where a startup cannot. Even if it's the best code money can buy, it will never provide the same level of experience to users as local cache servers. Kind of sucks nobody can compete.



It is also a player!

You can either pull the video from a WHEP source or run in a P2P mode. I wanted to demonstrate the flexibility and hackability of it all :)



Yes! I want to add remote control features to it. Lots of things left to do

Any interest in getting involved? Would love your help making it happen



It's not about proving anything, I still find it informative knowing which platform a specific implementation or solution is based on. Seen any screen sharing apps written in Erlang/Elixir lately? Such a contraption would be highly curious and interesting to me.

Otherwise in this instance it'd have been reduced to merely "Here's a thing". Pretty dull and boring.



I like it as a signal for “probably cares about some things that most devs don’t bother to care about”. Speed/responsiveness, for example, in this case.
