(comments)

Original link: https://news.ycombinator.com/item?id=39948044

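This person describes privately hosting their brain online inside their own body to ensure maximum confidentiality. In the earlier internet era, before Google became dominant, users relied on metasearch engines to pick the less-than-ideal results out of the spam returned by the various search providers. These metasearch engines could reach specialist databases such as phone directories and FTP sites, which made it possible to find details that standard search engines did not offer. Worried that Google's tracking of IP addresses violates their privacy, users look for more control over their searches. The author uses software to reroute requests anonymously while keeping access to valuable resources such as Canva and Reddit. Although these services claim not to store user data, people still find it hard to fully trust their assurances. Long-term use involves securely configuring a personal instance and adopting Cloudflare Zero Trust. Although the quality of web content is perceived to have declined, the author maintains that genuine, high-value material still exists and can be dug up with effective search methods.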

Original text


If anyone is interested in searches applied to the full text of every page in your browser history, or to only select pages that you bookmark, check out our project DownloadNet (formerly, and possibly, futurely: "DiskerNet").

It hooks into your browser to give you an augmented experience. The UI is pretty simple (think 1997 era google but without CSS haha), and we don't do anything super complex with search (but could in future), but it works not bad. Check it out!!!

https://github.com/dosyago/DownloadNet

Oh, it also makes your content (again either everything you browsed or only what you bookmarked) available offline. So if you work on an oil rig, or shipping, or long haul freight, can be a good way to browse as normal but save yer satellite bandwidth!!!



Me neither. It really would benefit from better documentation, since I like the idea a lot.

I just tried it out and it seems to be tied to Chrome. Since I use Firefox and Chromium as my daily drivers, this does not work for my case. I understand that they probably rely on some Chrome internals to dig through the content, but a SOCKS proxy approach would have worked better and would have no need to switch between a "save" and "serve" mode. But then again, I was only scratching the surface because of the lack of browser support. Will keep an eye on this one though!



it's unclear to me why anyone, particularly anyone with even a passing interest in what the topic of this submission has to offer, would be even remotely interested in being the "master archivist of your own internet browsing."

i don't need anything else archiving anything related to my internet browsing except for my human brain. and yes, that's just me...

but how is the shameless plug of this not just therefore off-topic but diametrically-opposed-to-total-personal-privacy tool appropriate here?



Is funny because this totally offline and locally hosted search engine in DownloadNet is potentially the most private of all.

I get if you’re not interested, but I imagine people interested in locally hosted search-related solutions, may be.

Your view is probably more personal and hard to support in general given this, and given the comment’s position and votes indicating at least some people are interested.

I totally understand why you wouldn’t want your browsing history archived anywhere. But that is what search engines do somewhat. It’s okay, everyone’s different.



i self-host my human brain online in my own skull. i feed it and nurture it so that it can continue to perform and offer me the highest level of privacy i could possibly maintain.


mental privacy, huh, nickburns? That’s an interesting concept.

there is no man in the desert. And no man needs nothing.

Tho I prefer the west coast of Zaire or Suid-Afrika myself.



> mental privacy, huh, nickburns? That’s an interesting concept.

you made your plug under the guise of asking if anyone had interest, i offered mine, and now i think we're done here, keepamovin.


before you go... i apologize for being a dick about it. i'd have to really reflect some more on why it felt necessary to go about it in this way, which is inevitably a deeply personal reflection.

but if i may just say, privacy as a concept for a truly egalitarian society is something very near and critical in my opinion. marketing, on the other hand, is not.

good day to you, sir.



This takes me back. Before Google, meta search tools increased your odds of finding a decent answer between the spammy results from Alta Vista, Hotbot, Lycos, etc.


If you host your own instance:

> SearXNG protects the privacy of its users in multiple ways regardless of the type of the instance (private, public). Removal of private data from search requests comes in three forms:

> 1. removal of private data from requests going to search services

> 2. not forwarding anything from a third party services through search services (e.g. advertisement)

> 3. removal of private data from requests going to the result pages

From: https://docs.searxng.org/own-instance.html#how-does-searxng-...

The docs mention a caveat below at "What are the consequences of using public instances?":

> If someone uses a public instance, they have to trust the administrator of that instance. This means that the user of the public instance does not know whether their requests are logged, aggregated and sent or sold to a third party.
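
As a loose illustration of the third item in the list quoted above (keeping private data out of requests to result pages), one thing a self-hosted proxy could do is drop well-known tracking parameters from result links before showing them. This is only a sketch: the parameter list and the function name are assumptions, and nothing here is taken from SearXNG's code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; not a list taken from SearXNG.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES) and key not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that carry real meaning for the page (an article id, for example) pass through untouched, so the link still resolves to the same content.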



All of that is fine but by simply having your IP, Google can continue to profile you in countless ways with data they collect in other ways and it wouldn't be expensive for them at all.


i think since 'IP address' has become something of a baseline non-technical understanding of one of the critical components of networking, it becomes increasingly difficult for non-netpeeps to fully grasp the many uses and non-uses of addressing.

a proxy (or proxies) and how they can shield but one or many of 'your' IP addresses throughout an egress packet's many hops (and from whom or what destination it or those addresses can be shielded) is a pretty advanced concept when you think about it.

not to mention that, at this point, bare source IP address is a pretty dilute tracker compared to other current methods of identity profiling or traffic fingerprinting.

nice succinct correction on your part regardless.



I would assume that the relaying can strip the request of identifying information such as IP, cookies and other tracking mechanisms that you get when visiting e.g. google.com.
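
A minimal sketch of what such stripping could look like on the relay side, assuming the relay receives the browser's headers as a plain dictionary. The header list and the generic User-Agent string are assumptions chosen for illustration, not SearXNG's actual behaviour. The client IP needs no special handling in this sketch, because the relay opens its own connection and the upstream engine only sees the relay's address.

```python
# Headers that commonly identify a browser or a user; an assumed list.
IDENTIFYING_HEADERS = {
    "cookie",
    "referer",
    "user-agent",
    "x-forwarded-for",
    "authorization",
}


def scrub_headers(headers: dict[str, str], generic_user_agent: str = "Mozilla/5.0") -> dict[str, str]:
    """Drop identifying headers and substitute a generic User-Agent."""
    clean = {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
    # Every relayed request carries the same User-Agent, so users blend together.
    clean["User-Agent"] = generic_user_agent
    return clean
```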


privacy is achieved through the proxy and therefore aggregation of disparate requests/queries. some anonymity is therefore achieved, at least from the perspective of source search engine operators, by blending into 'the crowd.'

but the idea is not necessarily anonymity so much as privacy by foiling the creation of any even somewhat accurate marketing/data profile derived from 'your search.'



Not just avoiding spam but some meta search engines (Dogpile IIRC) could also search specialist search engines like White Pages and Yellow Pages (long before Yelp etc existed). You'd be able to find business listings and contact info that wasn't normally found on web search engines. They could also include FTP search results which was useful as public anonymous FTPs had yet to fall from use.


Try this guy. It's not Kagi, but the search results are pretty good. Host it yourself on Docker.
    https://felladrin-minisearch.hf.space/


Disclaimer: I am one of the maintainers.

The intent of SearXNG is to be stateless (with no sessions on the server) and to work without JavaScript.

However, this approach limits certain features because of the restricted size of cookies (and other forms of browser storage require JavaScript).
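
To make the cookie-size constraint concrete, here is a rough sketch of keeping preferences entirely in a cookie. The encoding (JSON, zlib, URL-safe base64) and the function names are assumptions for illustration and not necessarily how SearXNG stores preferences; the 4096-byte figure is the per-cookie size that browsers can generally be expected to accept.

```python
import base64
import json
import zlib

COOKIE_LIMIT = 4096  # approximate per-cookie budget (name plus value), in bytes


def encode_prefs(prefs: dict) -> str:
    """Serialize preferences into a compact, cookie-safe string."""
    raw = json.dumps(prefs, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_prefs(value: str) -> dict:
    """Reverse encode_prefs."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(value)))


def fits_in_cookie(name: str, value: str) -> bool:
    """Check whether the cookie stays within the size budget."""
    return len(name) + len(value) <= COOKIE_LIMIT
```

A handful of settings fits easily, but something like a long personal domain blocklist can exceed the budget, which is the kind of feature limit described above.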



Thank you, that makes a lot of sense. Stateless is very good for privacy and I agree with that approach for a multi-user instance (which I suppose is the most common use-case).

I'm picturing more of an instance-wide configuration of domain blocks for a private, single-user, self-hosted instance. But I understand this may not be the intended use of the project.



Google used to do that, but then stopped. You can still do it manually by excluding them in (every) search you do, but the list can get long and it is far from a good user experience.

Kagi has this feature built in and it is a good user experience.

You can also use the uBlacklist browser plugin. My problem with that is that it slows everything down. I am not certain, but I think all the work is done after the search is complete, that is, it filters the actual results. The two above keep the sites from ever being part of the results.
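
The two approaches contrasted above can be sketched side by side: excluding domains in the query itself versus filtering the results after they come back. The function names are hypothetical, and the post-filter is only a guess at the general idea, not uBlacklist's implementation.

```python
from urllib.parse import urlsplit


def exclude_in_query(query: str, blocked: list[str]) -> str:
    """Query-time exclusion: append one -site: operator per blocked domain."""
    return " ".join([query] + [f"-site:{domain}" for domain in blocked])


def filter_results(urls: list[str], blocked: list[str]) -> list[str]:
    """Post-filtering: drop results hosted on a blocked domain or its subdomains."""

    def is_blocked(url: str) -> bool:
        host = urlsplit(url).hostname or ""
        return any(host == domain or host.endswith("." + domain) for domain in blocked)

    return [url for url in urls if not is_blocked(url)]
```

With query-time exclusion the query grows with every blocked domain, which is why the list "can get long"; with post-filtering the query stays short, but a page of results may come back partly empty.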



Just installed it to try. For people who also want to give it a try: I noticed that several of the public lists contain legitimate websites such as Canva or Reddit.


I trust myself a bit more than I trust someone else to run my queries sadly. I understand that they claim to store no user data or associations etc, but honestly, it's just their word.


> I understand that they claim to store no user data or associations etc, but honestly, it's just their word.

My guess is that if they are found to do so, then they open themselves up to lawsuits. Not collecting data isn't merely a perk - it's practically the reason Kagi exists.



Another big reason not to keep this stuff is just the cost of dealing with requests from law enforcement. At some point you start getting them.

If you don't have any logs you can just always say the princess is in another castle, since you can't provide data that doesn't exist.

If on the other hand you do have the requested information, you need to determine the validity of the request, and then extract the data; or refuse to comply and possibly put yourself at legal risk. For a smaller business that's probably a can of worms you'd rather avoid opening.



Kagi requires an account to use, which is not great for privacy.

I understand Kagi is generally reputable, but I like the idea of a self-hosted alternative where you're in full control.



Only if you expose it publicly without auth while routing queries through your residential connection, which is not an advised configuration.

For personal use, you can run it directly on your machine or access over VPN. Queries to upstream search engines can be forwarded over proxies or VPNs as you see fit. Some work fine over tor and some can go over commercial or DIY tunnels.



To add, I have been running an instance for years for family and friends. I run it behind nginx basic auth with a config that sets a forever cookie the first time you log in. Really simple. Another good option is Cloudflare Zero Trust.


A ~dozen. Several are technical and use it because it includes several private and paid engines on request.

Config is in a git repo I give access to if requested. One of the technical users modified it to keep pretty minimal logs. I guess they are trusting me to actually use that config but trust is pretty high in the group so not really an issue.



I think this needs a lot more clarification than is provided in this thread.

If you run it locally, and only you use it, then you won't get blocked - a given search engine will see about the same number of requests as if you used it directly.

Add a few house members and you'll still be fine.

(I ran the original searx for a year or two locally - no issues at all).



Virtually all public search engine endpoints see an insane amount of bot activity, often several queries per second.

If you delegate queries to e.g. google or bing at that rate, you'll be ip blocked in a heartbeat.
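
One common way to keep forwarded traffic below the rate that gets an IP blocked is a token bucket in front of the upstream calls. This is a generic sketch, not something the thread says any particular search engine or SearXNG instance uses; the class name, the rate, and the burst size are assumptions.

```python
import time


class TokenBucket:
    """Allow at most `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of stored tokens
        self.clock = clock        # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may be forwarded now, consuming one token."""
        now = self.clock()
        # Refill according to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A single household barely touches a bucket like this, while bot traffic at several queries per second would be refused locally instead of being passed on to Google or Bing.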



Ah duh, for some reason my mind didn't go to hosting the search instance locally and I misunderstood.

btw thank you for Marginalia! The spirit of the small web is very important to me.



Difference is a crawler paces the requests, respects robots.txt and rate limits, and doesn't typically invoke 50-100MB disk I/O per request.

Like I don't mind automated access to my search engine, I even offer a public API to the effect, that you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for himself via a sybil attack.
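
For contrast, the well-behaved crawler described above can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `examplebot` agent name, and the helper names are made up for illustration; a real client would fetch the site's own robots.txt and sleep for the returned delay between requests.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content, invented for this sketch.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def make_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(policy: RobotFileParser, agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay, or a default."""
    delay = policy.crawl_delay(agent)
    return float(delay) if delay is not None else default


policy = make_policy(ROBOTS_TXT)
```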



It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. do not know who's using their service. I.e., it is a glorified proxy.

Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.
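
As a rough picture of the "uses other search engines" part, a metasearch engine has to merge several ranked result lists into one. The sketch below uses a simple reciprocal-rank score; that scoring rule and the function name are assumptions, and SearXNG's real ranking is not described in this thread.

```python
from collections import defaultdict


def merge_results(per_engine: dict[str, list[str]]) -> list[str]:
    """Merge ranked URL lists from several engines into one deduplicated ranking."""
    score: dict[str, float] = defaultdict(float)
    for urls in per_engine.values():
        for rank, url in enumerate(urls):
            # A URL gains more for ranking high, and gains again for each engine returning it.
            score[url] += 1.0 / (rank + 1)
    return sorted(score, key=lambda url: -score[url])
```

A URL that two engines agree on tends to float above one that a single engine ranked first, which is one reason merged results can feel less spammy than any single source.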



that does not negate what OP said. your IP will still get blocked very quickly.

although existing searx instances have been run for years and they don't seem to be dropping like flies...



Lots of people publicly host searx instances. There's a list of publicly available instances online, but if you are looking for a tool that randomly redirects you to an instance for every search you do on your browser's bar, you can use neocities: https://searx.neocities.org/changelog

I use this all the time. A downside is that sometimes you land on an instance that doesn't provide any results or gives you really poor ones. This has been happening less frequently recently.
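
The random-redirect idea, including a fallback for instances that return nothing, can be sketched as follows. The instance URLs are placeholders rather than real servers, and the `/search?q=` path is assumed to be the usual SearXNG query endpoint.

```python
import random
from urllib.parse import urlencode

# Placeholder instance URLs; a real list would come from a public instance directory.
INSTANCES = [
    "https://searx-one.example",
    "https://searx-two.example",
    "https://searx-three.example",
]


def search_urls(query: str, instances=INSTANCES, rng=random) -> list[str]:
    """Return one search URL per instance, in random order, for try-then-fall-back use."""
    order = list(instances)
    rng.shuffle(order)
    query_string = urlencode({"q": query})
    return [f"{base}/search?{query_string}" for base in order]
```

A caller would open the first URL and move on to the next one if that instance returns no results or only poor ones.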



People always sell SearX, but myself, I'm a fan of presearch.com. I have no affiliation with them whatsoever, nor any financial interest. I have no interest in their crypto-based business model. In fact I think their lack of Google- or Bing-style search result filtering is entirely due to lack of funding and/or prioritizing other things more important to success, not due to taking a stand on free speech or anything like this. And that's perhaps how it was in the early days of the internet, when Maslow's hierarchy of corporate needs focused on trying to make the thing work versus public relations goodfeels and presenting only rightspeech.

Anyway, if I'm looking for some topic I believe google would be known to filter heavily, or something esoteric, I take a look at presearch to get a second opinion. I'd also love to see archive.org do something similar, archive.org has an amazing collection of data, poorly indexed and poorly searchable.



First few instances I tried are either returning no results, or only DDG results.

Error! Engines cannot retrieve results:

brave ( Suspended: too many requests )

google ( Suspended: too many requests )

qwant ( server API error )



A few years ago, I remember someone conducted a study on the quality of SearX(NG) results using different Internet providers: mobile, fiber, and VPN.

I'm not sure if this person is still active on HN, but I'm really curious about the results.



Been running this as the default on all my devices for the past year and I couldn't be happier. Have only had it choke twice and it just needed to be updated to be back in business.


That's clever. X-ING (like those 'crossing' roadsigns), so it's like Search-ching.

There's quite some similarity between the CH and the X sound in English.

But, as this is HN probably someone with a PhD in comparative phonetics will explain why this is a common and infuriating misunderstanding of layfolken.



Web content itself has gone to shit these days, in order to win Google’s SEO game to win Google’s AdSense game. “Google going to shit” is just a second-order effect (or third/fourth, depending on how you look at it).


The good content has not disappeared. So it is still Google going to shit if it can't make out what is good and what isn't, which was the reason people started using it in the first place 25 years ago.


Google search has gone to shit since Google+ .... or more precisely, when they removed the plus operator in Google search around 2011. And no, the quotes aren't as good.

My bet is that Google will become "Google TV" and search won't be possible. They will just show you what they want. They'll probably frame it as "AI knows what you want to see".

Maybe they should ban Google instead of TikTok (I don't use either though).



It does, I run it that way with an optional fan-out to my personal YaCy instance. Here's the relevant part of settings.yml:
   - name: yacy
     engine: yacy
     categories: general
     search_type: text
     base_url: https://yacy.searchlab.eu
     shortcut: ya
     disabled: true
     # required if you aren't using HTTPS for your local yacy instance
     # https://docs.searxng.org/dev/engines/online/yacy.html
     # enable_http: true
     # timeout: 3.0
     # search_mode: 'global'
Change 'disabled' to 'false' and point it at whatever YaCy instance you want to use. It can use the 'general' and 'images' categories.