(评论)
(comments)

原始链接: https://news.ycombinator.com/item?id=43425655

这篇 Hacker News 帖子讨论了 Anthropic 公司宣布 Claude 现在可以搜索网络的新闻。用户们既兴奋又有所怀疑,他们指出,Perplexity、Gemini 甚至 ChatGPT 等竞争对手已经提供了类似的功能。 一个主要的争论点在于 Claude 应该如何处理 `robots.txt` 文件,意见分歧在于它是否应该遵守为自动化爬虫设计的规则,或者在响应直接用户请求时忽略这些规则。一些用户观察到 AI 机器人在他们的网站上忽略了 `robots.txt`。 一些评论重点介绍了变通方法,例如使用 VPN 访问国外新闻,或将 Claude 与 MCP 等外部工具集成以进行网络搜索。一位用户甚至吹嘘说,他用极少的代码在其自己的“Claudine”代理中构建了类似的功能。 该帖子还涉及区域限制(最初仅限美国)、对谷歌搜索霸主地位的潜在影响,以及大型语言模型直接与网络交互与用户预处理信息给他们之间的更广泛趋势。


原文
Hacker News new | past | comments | ask | show | jobs | submit login
Claude can now search the web (anthropic.com)
93 points by meetpateltech 38 minutes ago | hide | past | favorite | 49 comments










Aside, does anyone know of an app like Perplexity for surfing the news in a foreign language (language practice)?

Perplexity's "Explore" tab translates its news to your local language, and its curated news items are all pretty interesting, but the problem is that there are so few of them. I seem to get maybe a dozen stories in a day. I paid their subscription for a month just to listen to the news on my walk, but didn't renew because of this.

A foreign news site like BBC Mundo (Spanish) on the other hand barely has any stories outside of a few niches. Its tech section only has a few stories per week.

Hmm, maybe I want a sort of RSS reader that AI-translates stories for me. But I don't really want to maintain a feed myself either.

Apple News would probably do it since they also have good curation, but afaict they still don't support foreign news sources (why???).



Use a VPN to appear like being in the target country. Use a browser profile where you set the language preference to the one you target.


I wonder if it will actually respect the robots.txt this time.


I don't think it should. If a user asks the AI to read the web for them, it should read the web for them. This isn't a vacuum charged with crawling the web, it's an adhoc GET request.


robots.txt is meant for automated crawlers, not human-driven actions.


Every automated crawler follows human-driven actions.


Welcome to "Context".


It must form the search index somehow. That is prior the human action. Simply it would not find the page at all if it respects.


I remember in late 90s/early 2000 as a teen going to robots.txt to specifically see what they were trying to hide and exploring those urls.

What is the difference if I use a browser or a LLM tool (or curl, or wget, etc) to make those requests?



careful, some of those are honey pots



almost no one does, robots.txt is practically a joke at this point — right up there with autocomplete=off


In what circles is it a joke? Google bots seem to respect it on my sites according to logs.


It's in a small circle of those that do. Blame the internet archive for starting this trend: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


I know an artist that had noindex turned on by mistake in robots.txt for the last 5 years - google, kagi and duckduckgo find tons of links relevant to the artist and the artwork but not a single one from the website.

so not seem to or apparently but matter of fact like. robots.txt works for the intended audience



Apparently, the regular search crawler does it, but the ai thingie doesn't.


Can confirm. My website is flooded with AI bots despite attempts to block crawlers to certain parts of it.


Huh? You can add Google-Extended[1] to opt out from Generative AI summaries.

[1] https://blog.google/technology/ai/an-update-on-web-publisher...



I have replaced all robots.txt rules with simple WAF rules, which are cheaper to maintain than dealing with offending bots.


> in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon

US only



Good news. I integrated Claude with a scrapper to get info from pages and it was not giving hallucinations 99% of the time. Hope this works out of the box now.


What's up with the geoblocking of Claude features? Not the first time it happens


Different geographies have different legal requirements.


Does not really say /how/ it's performing a web search... Is it tapping into it's "own" corpus of material or calling out to some other web search engine?


Excited to see this. I've really been enjoying Claude. It feels like a different, more creative flavor of experience than GPT. I use Claude a lot for dialogues and exploring ideas, like a conversational partner. Having web access will add an interesting dimension to this.


I haven't used Claude yet, but heard many good things. So I'm surprised to see that they're so far behind on this feature.


Has anyone else noticed a significant reduction in 3.7's (via API) abilities over the last 24ish hours?

I wonder if this is related?



Funny thing is that I have the obsidian-mcp-tools installed and today claude-desktop just starting fetching stuff from the web through that because it exposes a fetch tool to claude.

So this limitation is a bit arbitrary anyway.



I added this functionality already some time ago in my Claudine agent:

https://github.com/xemantic/claudine/

It costed roughly 30 lines of code: https://github.com/xemantic/claudine/blob/main/src/commonMai...



Kind of I guess? You should be more explicit that you're funnelling everything to through jina.ai


These are interesting times.

It wasn't long ago that a uni senior who worked for a decade+ on Google Search told me that it was hopeless anyone tries to compete with Google not because it sees a tonne of signals that helps with IR but because of its in-house AI/ML.

It turns out that the org that built the ultimate AI/ML that runs rings around anything that came before it for NLP (and thus IR) was a sister team at Google Translate.

It isn't inconceivable that a kid might be able to build a Google-quality web search, scalability aside, on CommonsCrawls data in a weekend. As someone who built re-ranking algorithms for a search engine built atop Yahoo! and Wikipedia (REST/SOAP) APIs back in the late 2000s as a side project, the current capabilities (of even the open weight models) seem too good to be true.

Google itself though is saved by its enormous distribution advantages afforded to it by Chrome (3B to 5B users) and Android (3B+), aside from its search deals with Apple and other browser vendors.



Any information on what search engine is powering it?


There is already a 100 ways of doing it using MCP

https://glama.ai/mcp/servers?searchTerm=search

What's the benefit of bringing native integration?



The native app that allows for MCP is only available officially on Mac's and the web interface is generally more convenient for non-technical users. Searching and interacting with the web has become a table-stakes feature and was a glaring gap in Claude.


Let me rephrase it.

MCP has the capability to add this functionality.

It would be nice to see MCP getting adoption in their web UI, as well easier UX, rather than more ad hoc features being added natively.



Couldn't a lot of front-ends using Claude API do this already? What's new?


If that's true, they are using a separate search API to get search results and feed it into a regular Claude API call. The difference here is that Anthropic is integrating it directly, like OpenAI and Google have. It doesn't look like it's in the API yet, but presumably that's coming. Then, as with gpt-4o and the Gemini models, you can make a single API call and it will do the searching for you and incorporate the results.


Now I understand why Gitlab was (is?) attacked[0] by those hideous bots.

[0] https://news.ycombinator.com/item?id=43422413



Excited to see how this compares to Perplexity or Gemini. I remember that ChatGPT used to be able to search the web, but last I checked it it couldn't. I wonder why they removed that feature


ChatGPT can search the web (I just checked in my account). It appears to be available for all users (not just paid): https://help.openai.com/en/articles/9237897-chatgpt-search


I definitely tried to web search with ChatGPT a few weeks ago and it couldn't. I don't think I'm making this up. Unless I suffered a TBI.


It told me it can't search the web and then proceeded to search the web


Search was not removed from ChatGPT, although it can be glitchy at times.


Also not all models support it. I think only gpt-4o and gpt-4o-mini support it, although I haven't triple checked that.


As sibling said, search was nog removed. But not all models can use the web search, maybe that is causing your perception?


Awesome, but I also do want to say it’s pretty sad it took this long straight up. Literally no excuse. But I’m glad they finally got to a feature that was launched more than a year ago on competitors.


kagi already lets me use claude to search the web. how is this different?


kagi is searching the web for you, and then injecting the results into the context of the prompt.


Are there any downsides to that approach? It seems like we're moving towards empowering llm's to interact with stuff as if that's better than us doing it for them - is it really?

Eg say I want to build an agent to make decisions, shall I write some code to insert the data that informs the decision into the prompt, return structured data, and then write code to implement the decision?

Or should I empower the llm do those things with function calls?



Mazel Tov!






Join us for AI Startup School this June 16-17 in San Francisco!


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact



Search:
联系我们 contact @ memedata.com