An attempt at creating AI images worth showing to end users.

How to turn 'sfo-jfk' into a suitable photo

Original link: https://www.approachwithalacrity.com/how-to-turn-sfo-jfk-into-a-beautiful-photo/

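## Building AI experiences: a human-centered approach to travel imagery

The challenge: how do you turn a user's freeform travel query (such as "sfo-jfk") into a beautiful, relevant image in a travel planning app (Stardrift)? Generating images purely with AI proved low-quality and expensive, while Google search carries copyright risk. The solution combines large language models, traditional software engineering and, crucially, *human curation*.

The system has three steps. First, an LLM identifies the "places" in a query and assigns each one a name and a type (city, region or country). Second, a database maps these places to curated photos sourced from Unsplash. Finally, software retrieves the appropriate image; for unrecognized places, it uses geolocation to find the nearest mapped place. Populating the database is a manual but enjoyable process.

The system is not perfect: there are gaps, and the image choices reflect personal taste. But it demonstrates a powerful principle: use AI for what it is good at, and supplement it with human expertise. This approach can produce more polished, more "tasteful" AI experiences and avoids the pitfalls of relying entirely on automation. The system also includes alerts for missing places, so coverage can keep improving through manual additions.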

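## Hacker News discussion summary: AI and travel imagery

A recent Hacker News discussion covered an article describing a system that turns travel queries (such as "sfo-jfk") into relevant photos. The author uses an LLM (specifically Haiku) to interpret queries and select images from a curated database. **These images are *not* AI-generated**, which was the focus of debate in the comments. Users debated the practicality of building custom local models versus using cloud APIs, and many agreed that cloud services are currently more efficient despite the cost. Some raised concerns about the potential impact on graphic designers, but most clarified that the system is closer to a specialized image search engine. A large part of the discussion centered on the misleading title used on Hacker News, which initially framed the project as AI image *generation*. Several users pointed out that the title did not accurately reflect the article. Others questioned the value of depicting travel destinations with potentially inauthentic images and advocated for real photography. Finally, some criticized the idea of creating idealized, potentially deceptive visual experiences.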

or, designing AI apps with a sense of taste

How do you turn a freeform query like ‘sfo-jfk’ into a beautiful image?

This was a real problem I had to solve recently. Whenever our users create a trip, we find a beautiful photo of their destination and present it to them. To do this, we needed a system that could understand anything, and respond with a hand-curated photo.

To solve this, I used LLMs for understanding, traditional software engineering in the middle, and human curation (by me) of photos by excellent (human) photographers. By walking through this, I hope to provide some inspiration for how to use LLMs in ways that feel crafted and not like slop!

The problem

I work on Stardrift, an AI travel planning app, and we let people type whatever they want into a chatbox. But then we need to turn that into a beautiful homepage of images for them:

Here are some silly ideas for how you could solve this. You could AI-generate an image for each conversation. But AI-generated images suck, and it’s expensive. You could Google search the destination – but that has copyright issues, and some risks:

Ultimately, what I wanted to do was hand-curate a mapping from places to beautiful photos. And I wanted to match a query that could be about literally anything to this.

If you break this problem down, there are effectively three problems here:

Take a freeform query (like ‘sfo->jfk’) and turn it into a ‘place’
Build a database of ‘places’ -> pictures
Build a software system that can take a ‘place’, look it up in a database and spit out the right picture – even if that ‘place’ isn’t in the database
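As a rough illustration of the third problem (returning a picture even when the place isn't in the database), here is a minimal sketch. It is not Stardrift's actual code: it assumes each curated place is stored with a latitude and longitude, and that an unknown place falls back to the nearest curated one by great-circle distance. The names `CURATED`, `haversine_km` and `lookup_photo`, the coordinates, and the photo filenames are all hypothetical.

```python
import math

# Hypothetical curated entries: (name, type) -> (latitude, longitude, photo).
CURATED = {
    ("New York", "city"): (40.71, -74.01, "nyc.jpg"),
    ("Isle of Skye", "region"): (57.27, -6.22, "skye.jpg"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def lookup_photo(name, place_type, lat, lon):
    """Return the curated photo for a place, or the nearest curated one."""
    exact = CURATED.get((name, place_type))
    if exact is not None:
        return exact[2]
    # Fallback: pick the curated place closest to the query coordinates.
    nearest = min(
        CURATED.values(),
        key=lambda entry: haversine_km(lat, lon, entry[0], entry[1]),
    )
    return nearest[2]
```

With this toy data, a town on Skye that has no entry of its own would resolve to the Isle of Skye photo rather than failing outright.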

To explain how I built this, I’ll run through each step-by-step.

1. What is a place?

This was the simplest part of the project, from a technical perspective – and also the trickiest to design.

Nowadays, you can easily run a query like ‘SFO-JFK tomorrow’ through an LLM like Haiku and ask it to tell you where they’re going. Which is really cool – 5 years ago, this would have been impossible!

However, I had to be careful about what exactly I asked it for. If you think about it, a query like “SFO-JFK” should return “New York.” But “plan me a road-trip around the Isle of Skye” should return a picture of the “Isle of Skye”, not a generic picture of Scotland.

To add to the complication, very often our users mention multiple places in one chat — they might be going on a honeymoon trip through France, Germany and Belgium. We need to capture all of this.

After thinking a bit, I decided we’d build the system around this idea of a ‘place’. I decided that every query could give us a list of places, and every place would be a combination of a ‘name’ (e.g. New York), and a type of place (e.g. city/region/country).

So when the LLM is given “sfo-jfk,” it returns “the city, New York”. “Plan me a 3-day road trip around the Isle of Skye” will return “Isle of Skye, the region”. And if you ask about multiple places, we return a list. This sounds a bit dry and technical — and it is — but it was important to make the rest work.

For the programmers, here’s the base datatype I ended up with:

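from dataclasses import dataclass
from typing import Literal

@dataclass
class Place:
  # How specific the place is, e.g. "Isle of Skye" is a region, not a country
  type: Literal["region", "city", "country"]
  name: str

# Turns a freeform query into every place it mentions (body omitted here)
def query_to_place(query: str) -> list[Place]:
  ...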
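For readers who want to see how the body of such a function might look, here is a minimal sketch under stated assumptions. The post doesn't show the real prompt or model call, so `PROMPT` is a made-up prompt, and the extra `call_llm` parameter (any function that takes a prompt string and returns the model's text) is my own addition so the example runs without a network call. The model is assumed to reply with a JSON array; anything malformed is dropped.

```python
import json
from dataclasses import dataclass
from typing import Callable, Literal, get_args

PlaceType = Literal["region", "city", "country"]


@dataclass
class Place:
    type: PlaceType
    name: str


# Hypothetical prompt; the real wording is not given in the post.
PROMPT = (
    "List the destinations in this travel query as a JSON array of objects "
    'with keys "name" and "type" (one of "city", "region", "country"). '
    "Query: {query}"
)


def query_to_place(query: str, call_llm: Callable[[str], str]) -> list[Place]:
    """Ask the model for places and keep only well-formed entries."""
    raw = call_llm(PROMPT.format(query=query))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    places = []
    for item in items:
        if (
            isinstance(item, dict)
            and isinstance(item.get("name"), str)
            and item.get("type") in get_args(PlaceType)
        ):
            places.append(Place(type=item["type"], name=item["name"]))
    return places


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only for this example."""
    return '[{"name": "New York", "type": "city"}]'


print(query_to_place("sfo-jfk", fake_llm))
```

Validating the `type` field against the allowed values matters here because everything downstream keys off the name and type pair; a stray "planet" or "airport" from the model is safer to drop than to store.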

2. Build a mapping of ‘places’ -> pictures

Now that I had a definition of a ‘place’, I could start creating my database of pictures.

This part was the most fun. I ran this new function I’d written to map queries to places on a sample of real Stardrift queries, which gave me a list of places ranked by popularity.
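Ranking places by popularity is a small counting job. The sketch below is one way to do it, assuming the places for each query have already been extracted and reduced to (name, type) pairs. The sample data is made up, and counting a place at most once per query is my own choice, not something the post specifies.

```python
from collections import Counter


def rank_places(places_per_query):
    """Count how often each (name, type) pair appears, most common first."""
    counts = Counter()
    for places in places_per_query:
        # Count a place once per query, even if it is mentioned twice.
        counts.update(set(places))
    return counts.most_common()


# Made-up extraction results for four queries.
sample = [
    [("New York", "city")],
    [("Paris", "city"), ("France", "country")],
    [("New York", "city"), ("Paris", "city")],
    [("New York", "city")],
]
ranked = rank_places(sample)
```

Working down a list like `ranked` from the top means the most requested destinations get a hand-picked photo first.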

Then I wrote a little game.

We source pictures from Unsplash, a fantastic photography bank. Unsplash has an API, so I wrote an internal tool that went through this ranked list of locations and pulled the top 5 images from Unsplash. This let me pick the best one, and it would be saved into a database:
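A curation tool along these lines could be sketched as follows. This is not the internal tool itself: the function names, the `place_photos` table and the choice of SQLite are all hypothetical. The request shape (the `/search/photos` endpoint, the `query` and `per_page` parameters, the `Client-ID` authorization header) and the response fields (`results`, `id`, `urls.regular`, `user.name`) reflect my understanding of the public Unsplash API and should be checked against its current documentation.

```python
import sqlite3
import urllib.parse
import urllib.request


def build_search_request(place_name, access_key, per_page=5):
    """Build (but do not send) an Unsplash photo search request."""
    params = urllib.parse.urlencode({"query": place_name, "per_page": per_page})
    return urllib.request.Request(
        "https://api.unsplash.com/search/photos?" + params,
        headers={"Authorization": "Client-ID " + access_key},
    )


def parse_candidates(payload):
    """Reduce a decoded search response to the fields a curator needs."""
    return [
        {
            "id": photo["id"],
            "url": photo["urls"]["regular"],
            "photographer": photo["user"]["name"],
        }
        for photo in payload.get("results", [])
    ]


def save_choice(conn, place_name, place_type, photo):
    """Store the hand-picked photo, replacing any earlier pick for the place."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS place_photos ("
        "name TEXT, type TEXT, photo_id TEXT, url TEXT, photographer TEXT, "
        "PRIMARY KEY (name, type))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO place_photos VALUES (?, ?, ?, ?, ?)",
        (place_name, place_type, photo["id"], photo["url"], photo["photographer"]),
    )
    conn.commit()
```

To fetch real results, you would pass the request to `urllib.request.urlopen`, decode the JSON body, hand it to `parse_candidates`, show the candidates to a human, and call `save_choice` with whichever one they pick. Keeping the photographer's name alongside the photo makes attribution straightforward later.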