
Original link: https://news.ycombinator.com/item?id=40913736

Hey all,

This is Jan, the founder of Apify (https://apify.com/) — a full-stack web scraping platform. After the success of Crawlee for JavaScript (https://github.com/apify/crawlee/) and the demand from the Python community, we're launching Crawlee for Python today!

The main features are:

- A unified programming interface for both HTTP (HTTPX with BeautifulSoup) & headless browser crawling (Playwright)

- Automatic parallel crawling based on available system resources

- Written in Python with type hints for enhanced developer experience

- Automatic retries on errors or when you’re getting blocked

- Integrated proxy rotation and session management

- Configurable request routing - direct URLs to the appropriate handlers

- Persistent queue for URLs to crawl

- Pluggable storage for both tabular data and files
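To make the "configurable request routing" feature concrete, here is a minimal, self-contained sketch of the idea: URL patterns mapped to handler functions with a default fallback. This is an illustration of the concept only, not Crawlee's actual API; the `Router` class and handler names are hypothetical.

```python
import re
from typing import Callable

class Router:
    """Toy router: dispatch URLs to the first handler whose pattern matches."""

    def __init__(self) -> None:
        self._routes: list[tuple[re.Pattern[str], Callable[[str], str]]] = []
        self._default: Callable[[str], str] = lambda url: f"default:{url}"

    def route(self, pattern: str):
        """Decorator registering a handler for URLs matching the regex."""
        def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
            self._routes.append((re.compile(pattern), func))
            return func
        return decorator

    def dispatch(self, url: str) -> str:
        for pattern, handler in self._routes:
            if pattern.search(url):
                return handler(url)
        return self._default(url)

router = Router()

@router.route(r"/product/")
def product_handler(url: str) -> str:
    # In a real crawler this would parse the page and push data to storage.
    return f"product:{url}"

print(router.dispatch("https://example.com/product/42"))  # product:https://example.com/product/42
print(router.dispatch("https://example.com/about"))       # default:https://example.com/about
```

The same decorator-based shape lets a crawler keep per-page-type parsing logic separate while sharing one queue and one set of retry/session settings.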

For details, you can read the announcement blog post: https://crawlee.dev/blog/launching-crawlee-python
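The "persistent queue for URLs to crawl" feature can likewise be sketched in a few lines: a deduplicating queue whose state is flushed to disk after every mutation, so a restarted crawl resumes where it left off. Again, this is a conceptual illustration under assumed names, not Crawlee's implementation.

```python
import json
import tempfile
from collections import deque
from pathlib import Path

class PersistentQueue:
    """Toy persistent URL queue: dedupes URLs and survives restarts via JSON."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._pending: deque = deque()
        self._seen: set = set()
        if path.exists():
            state = json.loads(path.read_text())
            self._pending = deque(state["pending"])
            self._seen = set(state["seen"])

    def _flush(self) -> None:
        self._path.write_text(json.dumps(
            {"pending": list(self._pending), "seen": sorted(self._seen)}))

    def add(self, url: str) -> None:
        if url not in self._seen:  # skip URLs we have already enqueued
            self._seen.add(url)
            self._pending.append(url)
            self._flush()

    def pop(self):
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._flush()
        return url

path = Path(tempfile.mkdtemp()) / "queue.json"
q = PersistentQueue(path)
q.add("https://example.com/")
q.add("https://example.com/")   # duplicate, ignored
q2 = PersistentQueue(path)      # simulated restart: state is reloaded
print(q2.pop())                 # https://example.com/
```

A production queue would batch writes and handle concurrent workers, but the restart-safety and dedup guarantees are the essence of the feature.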

Our team and I will be happy to answer any questions you might have here.
