Trying to create AI images worth showing to end users.
How to turn 'sfo-jfk' into a suitable photo

Original link: https://www.approachwithalacrity.com/how-to-turn-sfo-jfk-into-a-beautiful-photo/

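## Building AI experiences: a human-centered approach to travel imagery

The challenge: how do you turn a user's freeform travel query (for example, "sfo-jfk") into a beautiful, relevant image in a travel planning app (Stardrift)? Generating images with AI alone turned out to be low quality and expensive, while Google search carries a risk of copyright problems. The solution combines large language models, traditional software engineering, and, crucially, *human curation*.

The system works in three steps. First, an LLM identifies the "places" in a query and gives each one a name and a type (city, region, or country). Second, a database maps those "places" to curated photos sourced from Unsplash. Finally, software retrieves the right image; even for unrecognized places, it uses geolocation to find the nearest mapped place.

Populating the database was a manual but enjoyable process. The system is not perfect (there are gaps, and the image selection reflects personal taste), but it demonstrates a powerful principle: use AI for its strengths and complement it with human expertise. This approach yields a more polished, more "tasteful" AI experience and avoids the pitfalls of relying entirely on automation. The system also includes alerts for missing places, so it can keep improving through manual backfilling.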

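## Hacker News discussion summary: AI and travel imagery

A recent Hacker News discussion covered an article describing a system that turns travel queries (for example, "sfo-jfk") into relevant photos. The author uses an LLM (specifically Haiku) to interpret the query and select images from a curated database. **The images are *not* AI-generated**, which was a focal point of debate in the comments.

Users debated the practicality of building a custom local model versus using a cloud API; many agreed that, despite the cost, cloud services are currently more efficient. Some raised concerns about the potential impact on graphic designers, but most clarified that the system is closer to a specialized image search engine.

A large part of the discussion centered on the misleading title used on Hacker News, which initially framed the project as AI image *generation*. Several users pointed out that the title did not accurately reflect the article. Others questioned the value of depicting travel destinations with images that may not be authentic, and advocated for real photography. Finally, some criticized the idea of creating idealized, potentially deceptive visual experiences.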

Original text

or, designing AI apps with a sense of taste

How do you turn a freeform query like ‘sfo-jfk’ into a beautiful image?

This was a real problem I had to solve recently. Whenever our users create a trip, we find a beautiful photo of their destination and present it to them. To do this, we needed a system that could understand anything and respond with a hand-curated photo.

To solve this, I used LLMs for understanding, traditional software engineering in the middle, and human curation (by me) of photos by excellent (human) photographers. By walking through this, I hope to provide some inspiration for how to use LLMs in ways that feel crafted and not like slop!

The problem

I work on Stardrift, an AI travel planning app, and we let people type whatever they want into a chatbox. But then we need to turn that into a beautiful homepage of images for them:

Our homepage.

Here are some silly ideas for how you could solve this. You could AI-generate an image for each conversation. But AI-generated images suck, and it’s expensive. You could Google search the destination – but that has copyright issues, and some risks:

Is this a photo of Buffalo, NY?

Ultimately, what I wanted to do was hand-curate a beautiful mapping from ‘places’ to photos. And I wanted to match a query that could be about literally anything to this.

If you break this problem down, there are effectively three problems here:

  1. Take a freeform query (like ‘sfo->jfk’) and turn it into a ‘place’
  2. Build a database of ‘places’ -> pictures
  3. Build a software system that can take a ‘place’, look it up in a database and spit out the right picture – even if that ‘place’ isn’t in the database

To explain how I built this, I’ll run through each step-by-step.

1. What is a place?

This was the simplest part of the project, from a technical perspective – and also the trickiest to design.

Nowadays, you can easily run a query like ‘SFO-JFK tomorrow’ through an LLM like Haiku and ask it to tell you where the user is going. Which is really cool – 5 years ago, this would have been impossible!

However, I had to be careful about what exactly I asked it for. If you think about it, a query like “SFO-JFK” should return “New York.” But “plan me a road-trip around the Isle of Skye” should return a picture of the “Isle of Skye”, not a generic picture of Scotland.

To add to the complication, very often our users mention multiple places in one chat — they might be going on a honeymoon trip through France, Germany and Belgium. We need to capture all of this.

After thinking a bit, I decided we’d build the system around this idea of a ‘place’. I decided that every query could give us a list of places, and every place would be a combination of a ‘name’ (e.g. New York), and a type of place (e.g. city/region/country).

So when the LLM is given “sfo-jfk,” it returns “the city, New York”. “Plan me a 3-day road trip around the Isle of Skye” will return “Isle of Skye, the region”. And if you ask about multiple places, we return a list. This sounds a bit dry and technical — and it is — but it was important to make the rest work.

For the programmers, here’s the base datatype I ended up with:

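from dataclasses import dataclass
from typing import Literal

@dataclass
class Place:
  # How specific the place is: a city, a wider region, or a whole country
  type: Literal["region", "city", "country"]
  name: str

def query_to_place(query: str) -> list[Place]:
  # Ask the LLM for every place mentioned in the query
  ...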
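
For illustration, here is one way the body of `query_to_place` could be filled in. This is a sketch under assumptions, not Stardrift's actual code: the prompt wording, the JSON reply shape, and the `call_llm` parameter (any function that takes a prompt string and returns the model's text reply) are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Literal, get_args

PlaceType = Literal["region", "city", "country"]

@dataclass(frozen=True)
class Place:
    type: PlaceType
    name: str

# Hypothetical prompt; the post does not give the real wording.
PROMPT = (
    "List every destination the traveller mentions. Reply with JSON only: "
    '[{"type": "city" | "region" | "country", "name": "..."}]. '
    "For a route like 'sfo-jfk', return only the destination.\n\nQuery: "
)

def parse_places(raw: str) -> list[Place]:
    """Turn the model's JSON reply into Place objects, dropping malformed entries."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    places = []
    for item in items:
        if not isinstance(item, dict):
            continue
        kind, name = item.get("type"), item.get("name")
        # Keep only entries with a known type and a non-empty name
        if kind in get_args(PlaceType) and isinstance(name, str) and name.strip():
            places.append(Place(type=kind, name=name.strip()))
    return places

def query_to_place(query: str, call_llm: Callable[[str], str]) -> list[Place]:
    # call_llm is an assumed stand-in for whatever client talks to the model
    return parse_places(call_llm(PROMPT + query))
```

Validating the reply before building `Place` objects means a malformed or unexpected answer from the model yields an empty list rather than an exception further down the pipeline.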

2. Build a mapping of ‘places’ -> pictures

Now that I had a definition of a ‘place’, I could start creating my database of pictures.

This part was the most fun. I ran this new function I’d written to map queries to places on a sample of real Stardrift queries, which gave me a list of places ranked by popularity.
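
A ranking like this takes only a few lines. The sketch below assumes each query has already been turned into a list of `(type, name)` tuples; the function name `rank_places` and the choice to count a place once per query are mine, not from the post.

```python
from collections import Counter

def rank_places(places_per_query):
    """Count how many queries mention each (type, name) pair, most common first."""
    counts = Counter()
    for places in places_per_query:
        # Count each place once per query so one chatty query can't dominate
        counts.update(set(places))
    return counts.most_common()
```

For example, if three sample queries mention New York and two mention France, New York comes first in the returned list and is the first place to get a photo.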

Then I wrote a little game.

We source pictures from Unsplash, a fantastic photography bank. Unsplash has an API, so I wrote an internal tool that went through this ranked list of locations and pulled the top 5 images from Unsplash. This let me pick the best one, which was then saved into a database.

Because of some API restrictions, I could only do about 20 places per hour. So while it only took me an afternoon to code the rest of the system, it took about 3 days to populate the database with the first 500 places, done mostly in 5-minute chunks every hour. But it was worth it – the game was very fun, and the mapping we produced is beautiful!
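
As a rough idea of what the fetching half of such a tool might look like, here is a sketch using only the standard library. It assumes the tool used Unsplash's photo search endpoint (`/search/photos`) and treats the first five results as the "top 5"; the post does not say which endpoint was used, and the function names and returned fields are my own choices.

```python
import json
import urllib.parse
import urllib.request

def build_search_request(place_name, access_key, per_page=5):
    """Build a request for the first `per_page` Unsplash search results."""
    params = urllib.parse.urlencode({"query": place_name, "per_page": per_page})
    return urllib.request.Request(
        f"https://api.unsplash.com/search/photos?{params}",
        headers={"Authorization": f"Client-ID {access_key}"},
    )

def top_candidates(place_name, access_key, per_page=5):
    """Fetch candidate photos for a human to choose from (needs a real API key)."""
    request = build_search_request(place_name, access_key, per_page)
    with urllib.request.urlopen(request) as response:
        results = json.load(response)["results"]
    # Keep just enough to show a preview and credit the photographer
    return [
        {"id": r["id"], "url": r["urls"]["regular"], "photographer": r["user"]["name"]}
        for r in results
    ]
```

The human step stays outside the code: the tool shows the candidates, and whichever one is picked gets saved against the place.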

3. Putting it together

Now I could turn a conversation into a ‘place’, and had a database mapping ‘places’ to beautiful photos.

But our users could ask about anything, and our LLM would simply pass on whatever they said. To add insult to injury, our LLM wasn’t very precise: sometimes it would tell us the user was going to ‘NYC’, sometimes ‘New York’, sometimes ‘New York City’. So how was I meant to handle places that didn’t have an entry in the database?

Well, thanks to Google Maps, it’s very easy to turn any place name into latitude/longitude coordinates. So whenever we got weird input, we could turn it into a coordinate. And then we could ask: what’s the closest place that we do have in the database?

For instance, we might get a place like ‘Deadvlei.’ I have no idea where that is. But our geolocation API tells us it’s at (24.7° S, 15.2° E), so we can look up the closest coordinate that’s in the database, which is (24.8° S, 15.3° E). Then we can return the nearest one: in this case, Namibia:
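
The nearest-place lookup can be sketched as a linear scan using great-circle (haversine) distance. The post does not say how distance is actually computed or how the database is indexed, so the distance formula, the function names, and the `curated` dictionary shape below are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Clamp to 1 to guard against floating-point overshoot near antipodes
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def nearest_place(coord, curated):
    """Return (name, distance_km) of the closest curated place.

    `curated` maps place name -> (lat, lon) and is assumed to be non-empty.
    """
    name = min(curated, key=lambda n: haversine_km(coord, curated[n]))
    return name, haversine_km(coord, curated[name])
```

With the coordinates from the example above, `nearest_place((-24.7, 15.2), {"Namibia": (-24.8, 15.3), "New York": (40.7, -74.0)})` picks Namibia, roughly 15 km away. A linear scan is plenty for a few hundred curated places; a spatial index would only matter at a much larger scale.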

The beautiful Namibian sea.

This mostly works, but it does mean there are a few places that don’t properly show up! One of the last tools I built was a little map that showed us where we have photos, so I could see where we had gaps. It turned out we were missing a lot of Africa and South America — they are not common travel destinations. So I manually filled them in.

Our internal database of photos, geolocated on a map!

We have missed places since — recently someone tried to search ‘Mongolia,’ which we didn’t have. When this happens, it sets off an alert to us, and then I manually go in and backfill it. It works well enough.
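
One possible shape for that lookup-plus-alert flow is sketched below. The post does not describe what exactly counts as a miss or whether a fallback photo is still shown, so treating "no exact entry" as the trigger, and the `fallback` and `alert` callables themselves, are assumptions.

```python
def find_photo(name, photos, fallback, alert):
    """Return the curated photo for `name`, flagging misses for manual backfill.

    photos:   dict mapping place name -> photo (the curated database)
    fallback: function(name) -> photo, e.g. a nearest-place lookup (assumed)
    alert:    function(name) -> None, e.g. posts to a team channel (assumed)
    """
    photo = photos.get(name)
    if photo is not None:
        return photo
    alert(name)  # someone should curate a photo for this place
    return fallback(name)
```

Passing `alert` in as a function keeps the lookup easy to test: in a test it can simply append to a list.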

Final thoughts

There are some flaws to this system: for instance, we have a lot of cities mapped but not a lot of regions, so you might search for ‘The Gold Coast’ and get a photo of Brisbane instead. And everything is subject to my taste—my team recently made fun of me for always picking golden hour photos. (Wait, maybe that’s a feature...)

But I like it as a small but tasteful AI project. The best AI products don’t use AI for everything; they use it for what it’s good at. And that’s exactly what we did here, mixing software engineering, AI engineering and a bit of good old human curation.

This is a cross-post from the Stardrift technical blog! Shoutout to Sarah Chieng and swyx for hosting the Write & Learn meetup where I wrote this.
