Algolia Hacker News Search GitHub Project Archived

Original link: https://github.com/algolia/hn-search

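## HN Search Summary

HN Search is a Rails 5 application with a React frontend. It uses AlgoliaSearch to deliver fast, relevant Hacker News search results, and `wkhtmltoimage` to crawl and render content into thumbnails.

**Development:** Contributions via pull request are welcome. Setup consists of cloning the repository, installing dependencies with `bundle install`, configuring the database and application credentials, migrating the database (`bundle exec rake db:migrate`), and starting the development server with `bundle exec guard`. UI contributions are concentrated in the `app/assets` directory.

**Deployment:** Deployment uses Capistrano and requires SSH access. The deployment notes (as of December 2018) describe workarounds for Bluepill and Thin server issues. Orphaned processes may need to be killed manually to prevent errors.

**Algolia indexing:** The application uses AlgoliaSearch with a custom configuration. It defines the indexed attributes, highlighting, filter tags, ranking criteria (points, comments), and sorting preferences tuned for Hacker News search relevance.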

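## Hacker News Search Updates and Discussion

The Hacker News discussion centered on the Algolia-powered search. An Algolia developer said the search engine was completely rewritten about a year ago. The new code lives in a private repository and may be released publicly in the future.

Users reported that HN clients were failing to load data, possibly because of problems with the Algolia API. Some users also ran into login difficulties and API failures. The GitHub project was archived on February 10, 2026, but it remains the place to report issues; a brief data-ingestion problem occurred in August 2025.

The conversation also touched on access to historical HN data. Two resources were mentioned:

- HackerBook, a static archive that runs in the browser.
- Google Cloud's BigQuery dataset, which contains more than 47 million rows of HN data and was last updated on February 21, 2026.

Users also noted that the HN API can be used to build custom datasets, but scraping flagged posts can lead to a ban.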

Original text

This is the Rails 5 application that powers HN Search. It uses React on the frontend, algoliasearch-rails for search, and wkhtmltoimage to crawl and render thumbnails.

Development/Contributions

We love pull-requests :)

```sh
# clone the repository
git clone https://github.com/algolia/hn-search
cd hn-search

# install dependencies
bundle install

# set up credentials (feel free to edit; the default configuration is OK for search-only)
cp config/database.example.yml config/database.yml
cp config/application.example.yml config/application.yml

# set up your (SQLite3) database
bundle exec rake db:migrate

# start contributing with Guard (watchers, livereload, notifications, ...)
bundle exec guard

# done!
open http://localhost:3000
```

If you want to contribute to the UI, the only directory you need to look at is app/assets. This directory contains all the JS, HTML & CSS code.

To deploy, we use Capistrano. You therefore need SSH access to the underlying machines, and you run the deployment from your own computer.

As of December 2018, a bug in Bluepill stops the deployment. To work around it, force a restart with the following command instead of a regular deploy:

```sh
bundle exec cap deploy:restart
```

There also seems to be an issue with the Thin server: orphaned thin processes are not killed after a deployment. The server then keeps serving the previous version of the app. This causes ChunkLoadErrors, because the manifest points to files that no longer exist. To fix these intermittent errors, SSH into both servers, check for orphaned thin processes, and kill them manually:

```sh
ps aux | grep thin
kill <PIDs of the old thin processes>
```
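If this cleanup has to be repeated after every deploy, the check can be scripted. The following is a minimal sketch and is not part of hn-search. The helper name `orphaned_thin_pids` and the `PS_SAMPLE` data are hypothetical. The sketch assumes you already know the PIDs of the thin processes that belong to the current release (for example, read from their pid files). It only parses `ps aux` output and prints a `kill` command for you to review; it does not kill anything itself.

```ruby
# Hypothetical helper (not part of hn-search): return the PIDs of thin
# processes that are NOT in `current_pids`, i.e. likely orphans left over
# from a previous release. Assumes `current_pids` is known (e.g. pid files).
def orphaned_thin_pids(ps_output, current_pids)
  ps_output.each_line.filter_map do |line|
    # `ps aux` prints 11 columns; COMMAND is the last one and may contain spaces
    fields = line.split(" ", 11)
    next if fields.size < 11

    pid = Integer(fields[1], 10, exception: false)
    command = fields[10]
    next if pid.nil?                      # skips the header row ("PID")
    next unless command.match?(/\bthin\b/)
    next if command.start_with?("grep")   # skips the `grep thin` process itself

    pid unless current_pids.include?(pid)
  end
end

# Illustrative `ps aux` output; on a server you would pass the real output instead.
PS_SAMPLE = <<~PS
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  deploy    4101  0.5  2.1 512000 86000 ?        Sl   09:00   0:10 thin server (0.0.0.0:3000)
  deploy    4102  0.4  2.0 510000 85000 ?        Sl   09:00   0:09 thin server (0.0.0.0:3001)
  deploy    5201  0.6  2.2 515000 87000 ?        Sl   10:30   0:02 thin server (0.0.0.0:3000)
  deploy    5202  0.6  2.2 515000 87000 ?        Sl   10:30   0:02 thin server (0.0.0.0:3001)
  deploy    5300  0.0  0.0  14000  1000 pts/0    S+   10:31   0:00 grep thin
PS

stale = orphaned_thin_pids(PS_SAMPLE, [5201, 5202])
puts "kill #{stale.join(' ')}" unless stale.empty?  # prints "kill 4101 4102"
```

The sketch prints the command instead of running it on purpose. A wrong PID list would take down the live release, so a human should still confirm the result before running `kill`.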

The indexing is configured using the following algoliasearch block:

```ruby
class Item < ActiveRecord::Base
  include AlgoliaSearch

  algoliasearch per_environment: true do
    # the list of attributes sent to Algolia's API
    attribute :created_at, :title, :url, :author, :points, :story_text, :comment_text, :num_comments, :story_id, :story_title
    attribute :created_at_i do
      created_at.to_i
    end

    # `title` is more important than `{story,comment}_text`, `{story,comment}_text` more than `url`, `url` more than `author`
    # btw, ignore the word position inside most fields to avoid boosting first-word matches
    attributesToIndex ['unordered(title)', 'unordered(story_text)', 'unordered(comment_text)', 'unordered(url)', 'author', 'created_at_i']

    # list of attributes to highlight
    attributesToHighlight ['title', 'story_text', 'comment_text', 'url', 'story_url', 'author', 'story_title']

    # tags used for filtering
    tags do
      [item_type, "author_#{author}", "story_#{story_id}"]
    end

    # sort results by the item's HN points, then by its number of comments (last sort criterion)
    customRanking ['desc(points)', 'desc(num_comments)']

    # controls how results are sorted: the following 4 criteria are applied one after another
    # I removed the 'exact' match criterion: it improves relevance for 1-word queries but doesn't fit HN Search's needs
    ranking ['typo', 'proximity', 'attribute', 'custom']

    # google+, $1.5M raises, C#: we love you
    separatorsToIndex '+#$'
  end

  # stories (and other non-comment items) expose their text as `story_text`
  def story_text
    comment? ? nil : text
  end

  # comments carry the title and URL of their parent story, when it exists
  def story_title
    comment? && story ? story.title : nil
  end

  def story_url
    comment? && story ? story.url : nil
  end

  # comments expose their text as `comment_text`
  def comment_text
    comment? ? text : nil
  end

  def comment?
    item_type_cd == Item.comment
  end
end
```
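The comments in this block describe how the settings interact: attribute order, `unordered(...)`, the `ranking` list, `customRanking`, and tags. The following is a simplified, hypothetical in-memory model of that interaction. It is not Algolia's engine and not hn-search code. It assumes exact one-word queries, so the `typo` and `proximity` criteria always tie and are left out. It also omits `created_at_i`, models a tag filter as "must carry every requested tag", and assumes `points` and `num_comments` are integers. The names `Record`, `SEARCHABLE`, `attribute_rank`, `search`, and `SAMPLE` are all illustrative.

```ruby
# Simplified, hypothetical model of the ranking settings above (NOT Algolia's
# engine): one-word exact queries only, so 'typo' and 'proximity' always tie.
Record = Struct.new(:id, :title, :story_text, :comment_text, :url, :author,
                    :points, :num_comments, :tags, keyword_init: true)

# Same order as attributesToIndex (created_at_i omitted): earlier ranks higher.
# Matching ignores where the word appears in a field, mimicking 'unordered(...)'.
SEARCHABLE = %w[title story_text comment_text url author].freeze

# 'attribute' criterion: index of the first searchable attribute containing the word
def attribute_rank(record, word)
  SEARCHABLE.index do |attr|
    record[attr].to_s.downcase.split(/[^a-z0-9]+/).include?(word.downcase)
  end
end

# Keep records carrying every requested tag, drop non-matches, sort by the
# 'attribute' criterion, then break ties with customRanking:
# desc(points), then desc(num_comments).
def search(records, word, tags: [])
  records
    .select { |r| (tags - r.tags).empty? }
    .filter_map do |r|
      rank = attribute_rank(r, word)
      [r, rank] if rank
    end
    .sort_by { |r, rank| [rank, -r.points, -r.num_comments] }
    .map { |r, _rank| r.id }
end

SAMPLE = [
  Record.new(id: 1, title: "Show HN: A Rails search app", author: "alice",
             points: 10, num_comments: 2, tags: %w[story author_alice story_1]),
  Record.new(id: 2, title: "Ask HN: Which database?", story_text: "And which search engine?",
             author: "bob", points: 50, num_comments: 30, tags: %w[story author_bob story_2]),
  Record.new(id: 3, comment_text: "Algolia search is fast", author: "carol",
             points: 5, num_comments: 0, tags: %w[comment author_carol story_2]),
  Record.new(id: 4, title: "Search engines compared", author: "dave",
             points: 100, num_comments: 40, tags: %w[story author_dave story_4])
].freeze

p search(SAMPLE, "search")                      # => [4, 1, 2, 3]
p search(SAMPLE, "search", tags: ["story_2"])   # => [2, 3]
```

Record 2 has more points than record 1 but still ranks below it, because `attribute` comes before `custom` in the `ranking` list. In this model a title match beats a `story_text` match whatever the points, and points only decide between records that tie on the earlier criteria.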