This is Jan, the founder of Apify (https://apify.com/) — a full-stack web scraping platform. After the success of Crawlee for JavaScript (https://github.com/apify/crawlee/) and the demand from the Python community, we're launching Crawlee for Python today!
The main features are:
- A unified programming interface for both HTTP (HTTPX with BeautifulSoup) & headless browser crawling (Playwright)
- Automatic parallel crawling based on available system resources
- Written in Python with type hints for a better developer experience
- Automatic retries on errors or when you’re getting blocked
- Integrated proxy rotation and session management
- Configurable request routing - direct URLs to the appropriate handlers
- Persistent queue for URLs to crawl
- Pluggable storage for both tabular data and files
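To make the routing, queueing, and retry ideas in the list above a bit more concrete, here is a rough plain-Python sketch. It is not Crawlee's API: `Request`, `Router`, `crawl`, and `fake_fetch` are hypothetical names invented for illustration, the queue lives in memory rather than on disk, and the "site" is a dictionary so the example runs offline.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Request:
    url: str
    label: Optional[str] = None  # hypothetical tag used to pick a handler


Handler = Callable[[Request, str], List[Request]]


class Router:
    """Maps request labels to handler functions (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[Optional[str], Handler] = {}

    def handler(self, label: Optional[str] = None):
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func
        return register

    def dispatch(self, request: Request, body: str) -> List[Request]:
        # Unknown labels fall back to the default (label=None) handler.
        func = self._handlers.get(request.label) or self._handlers[None]
        return func(request, body)


def crawl(start: List[Request], router: Router,
          fetch: Callable[[str], str], max_retries: int = 3) -> List[str]:
    """Process a queue of requests, retrying failed fetches; returns failed URLs."""
    queue = deque(start)
    seen = {r.url for r in start}      # avoid enqueueing the same URL twice
    failures: Dict[str, int] = {}
    failed: List[str] = []
    while queue:
        request = queue.popleft()
        try:
            body = fetch(request.url)
        except Exception:
            failures[request.url] = failures.get(request.url, 0) + 1
            if failures[request.url] <= max_retries:
                queue.append(request)  # retry later
            else:
                failed.append(request.url)
            continue
        for new in router.dispatch(request, body):
            if new.url not in seen:
                seen.add(new.url)
                queue.append(new)
    return failed


# Demo against an in-memory "site"; /b fails once to exercise the retry path.
PAGES = {
    "https://example.com/": "https://example.com/a https://example.com/b",
    "https://example.com/a": "Page A",
    "https://example.com/b": "Page B",
}
calls: Dict[str, int] = {}
titles: List[str] = []
router = Router()


@router.handler()  # default handler: the start page links to detail pages
def handle_listing(request: Request, body: str) -> List[Request]:
    return [Request(url, label="detail") for url in body.split()]


@router.handler("detail")
def handle_detail(request: Request, body: str) -> List[Request]:
    titles.append(body)
    return []


def fake_fetch(url: str) -> str:
    calls[url] = calls.get(url, 0) + 1
    if url.endswith("/b") and calls[url] == 1:
        raise ConnectionError("simulated transient failure")
    return PAGES[url]


failed = crawl([Request("https://example.com/")], router, fake_fetch)
print(titles, failed)
```

The sketch is sequential and keeps everything in memory; the parallelism, proxy rotation, and persistent storage listed above are exactly the parts a real crawler library takes care of for you.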
For details, you can read the announcement blog post: https://crawlee.dev/blog/launching-crawlee-python
Our team and I will be happy to answer any questions you might have here.