Show HN:Slipstream——一个用于有状态流处理的Python库
Show HN: Slipstream – A Python library for stateful stream processing

原始链接: https://slipstream.readthedocs.io/en/1.0.1/

Slipstream 使用数据流模型简化了有状态流式应用程序的开发。它通过并行处理和灵活的源/宿映射提供简洁性,同时允许自由使用任意 Python 代码而无需限制性抽象。通过优化的默认设置,快速启动即可实现高速运行。 Slipstream 消费任何 Async Iterable 源(例如,Kafka,流式 API)并将数据输出到任何 Callable(例如,Kafka,RocksDB,API)。用户可以使用标准 Python 代码执行有状态操作,例如连接、聚合和过滤。还提供依赖流宕机检测以及暂停/纠正功能。 利用基本的 Python 构建块,Slipstream 能够轻松创建类似框架的功能,例如定时器。示例演示了一个定时器触发一个处理程序,该处理程序向打印宿输出 "🐟 - blub",说明了轻松创建具有源、处理程序和宿的流。

Slipstream是一个新的Python库,用于有状态的流处理,旨在并行化数据工作流程。它利用`AsyncIterables`作为数据源,并允许任何`Callable`作为接收器。其关键特性是使用RocksDB保存状态。该库提供检查点功能以处理流中断,并在上游依赖项恢复并到达数据的当前点之前暂停下游流。源代码可在GitHub上找到:[https://github.com/Menziess/slipstream-async](https://github.com/Menziess/slipstream-async)。更多文档可在[slipstream.readthedocs.io](slipstream.readthedocs.io)找到。
相关文章

原文
_images/logo.png

Slipstream provides a data-flow model to simplify development of stateful streaming applications.

  • Simplicity: parallelizing processing, mapping sources to sinks

  • Freedom: allowing arbitrary code free of limiting abstractions

  • Speed: optimized and configurable defaults to get started quickly

Consume any source that can be turned into an Async Iterable; Kafka, streaming API’s, et cetera. Sink or cache data to any Callable; Kafka, RocksDB, API’s. Perform any arbitrary stateful operation – joining, aggregating, filtering – using regular Python code. Detect dependency stream downtimes, pause dependent streams, or send out corrections.

Because everything is built with basic python building blocks, framework-like features can be crafted with ease. For instance, while timers aren’t included, you can whip one up effortlessly:

from asyncio import run, sleep

async def timer(interval=1.0):
    while True:
        await sleep(interval)
        yield

We’ll use print as our sink:

Let’s send our mascot 🐟 – blub downstream on a regular 1 second interval:

from slipstream import handle, stream

@handle(timer(), sink=[print])
def handler():
    yield '🐟 - blub'

run(stream())


# 🐟 - blub
# 🐟 - blub
# 🐟 - blub
# ...

Some things that stand out:

  • We’ve created an Async Iterable source timer() (not generating data, just triggering the handler)

  • We used slipstream.handle to bind the sources and sinks to the handler function

  • We yielded 🐟 - blub, which is sent to all the Callable sinks (just print in this case)

  • Running slipstream.stream starts the flow from sources via handlers into the sinks

The data-flow model that simplifies development of stateful streaming applications!

Contents

Proceed by interacting with Kafka and caching application state in: getting started.

联系我们 contact @ memedata.com