StarVector: Generating Scalable Vector Graphics Code from Images and Text

Original link: https://starvector.github.io/

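StarVector is a pretrained model that generates SVG code from images and is particularly good at vectorizing icons, logos, technical diagrams, graphs, and charts. It builds on the Transformers library and can be loaded with `AutoModelForCausalLM`. StarVector-8B achieves state-of-the-art performance on SVG-Bench, a benchmark that includes several SVG datasets such as SVG-Stack, SVG-Fonts, and SVG-Diagrams. It outperforms traditional methods such as AutoTrace and Potrace, showing a stronger ability to capture fine details and structural elements. The model is trained on a comprehensive dataset and, compared with baseline methods, produces more accurate and visually appealing SVG representations, especially for complex diagrams and detailed icons. StarVector's semantic understanding lets it make intelligent decisions about which details to preserve and how to structure the SVG code.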

Hacker News discussion (starvector.github.io): 47 points, posted by lnyan 10 hours ago, 2 comments.

peter_d_sherman, 9 hours ago: This would be absolutely great for generating application icons! (It could also make a great SaaS... $X per month ($9.95, $19.95, ??) for unlimited icon generation...) Congratulations to the team on their pioneering efforts in this emerging area of LLM/Transformer research! Well done!

iFire, 9 hours ago: Excellent. The code has been released.

Original text

from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
from starvector.data.util import process_and_rasterize_svg
import torch

# Load the model
model_name = "starvector/starvector-8b-im2svg"

starvector = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, trust_remote_code=True)
processor = starvector.model.processor
tokenizer = starvector.model.svg_transformer.tokenizer

# Move model to GPU and set to evaluation mode
starvector.cuda()
starvector.eval()

# Load and process the input image
image_pil = Image.open('assets/examples/sample-18.png')

image = processor(image_pil, return_tensors="pt")['pixel_values'].cuda()
if image.shape[0] != 1:
    image = image.squeeze(0)
batch = {"image": image}

# Generate SVG from the image
raw_svg = starvector.generate_im2svg(batch, max_length=4000)[0]
svg, raster_image = process_and_rasterize_svg(raw_svg)
            

The code above demonstrates how to load a pre-trained StarVector model using the Transformers library, process an input image, and generate SVG code. The model handles all the complexity of understanding the visual elements and translating them into structured vector graphics code.
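Before saving the generated output, it can help to confirm that it is well-formed. The following is a minimal sketch using only the Python standard library. It assumes the generated SVG is available as a plain string (as `svg` appears to be in the snippet above); the helper name `save_svg_if_valid`, the sample SVG, and the output path are hypothetical and not part of the StarVector API.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_svg_if_valid(svg_text: str, path: str) -> bool:
    """Write svg_text to path only if it parses as XML with an <svg> root.

    Hypothetical helper for illustration; not part of the starvector library.
    """
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False  # malformed XML, for example a truncated output
    # The tag may carry a namespace, e.g. "{http://www.w3.org/2000/svg}svg".
    if root.tag.split("}")[-1] != "svg":
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(svg_text)
    return True


# Stand-in for the string produced by the model.
sample_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
    '<circle cx="5" cy="5" r="4" fill="black"/></svg>'
)
out_path = os.path.join(tempfile.gettempdir(), "starvector_example.svg")
saved = save_svg_if_valid(sample_svg, out_path)
```

A check like this only confirms that the text is parseable XML with an `svg` root; it says nothing about how closely the drawing matches the input image.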

Note: To use image rasterization features, you need to install the starvector library. Visit the StarVector repository for installation instructions and to ensure all dependencies are properly installed.

We provide Hugging Face 🤗 model checkpoints for image2SVG vectorization, for 💫 StarVector-8B and 💫 StarVector-1B. These are the results on SVG-Bench, using the DinoScore metric.

| Method | SVG-Stack | SVG-Fonts | SVG-Icons | SVG-Emoji | SVG-Diagrams |
|---|---|---|---|---|---|
| AutoTrace | 0.942 | 0.954 | 0.946 | 0.975 | 0.874 |
| Potrace | 0.898 | 0.967 | 0.972 | 0.882 | 0.875 |
| VTracer | 0.954 | 0.964 | 0.940 | 0.981 | 0.882 |
| Im2Vec | 0.692 | 0.733 | 0.754 | 0.732 | - |
| LIVE | 0.934 | 0.956 | 0.959 | 0.969 | 0.870 |
| DiffVG | 0.810 | 0.821 | 0.952 | 0.814 | 0.822 |
| GPT-4-V | 0.852 | 0.842 | 0.848 | 0.850 | - |
| 💫 StarVector-1B | 0.926 | 0.978 | 0.975 | 0.929 | 0.943 |
| 💫 StarVector-8B | 0.966 | 0.982 | 0.984 | 0.981 | 0.959 |

Note: StarVector models will not work for natural images or illustrations, as they have not been trained on those images. They excel in vectorizing icons, logotypes, technical diagrams, graphs, and charts.

As shown in the table above, StarVector-8B achieves the highest DinoScore on every benchmark dataset, tying VTracer on SVG-Emoji (0.981), which demonstrates its effectiveness in generating high-quality SVG code from images. The model's ability to understand and reproduce complex vector graphics makes it particularly valuable for applications requiring precise vectorization of icons, logos, and technical diagrams.
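The ranking can be checked directly against the reported numbers. The sketch below copies the DinoScore values from the results table and finds the top-scoring method per dataset, listing ties explicitly. The function name `best_methods` is illustrative, and `None` marks a score the table does not report.

```python
# DinoScore values copied from the results table (None = not reported).
DATASETS = ["SVG-Stack", "SVG-Fonts", "SVG-Icons", "SVG-Emoji", "SVG-Diagrams"]
SCORES = {
    "AutoTrace": [0.942, 0.954, 0.946, 0.975, 0.874],
    "Potrace": [0.898, 0.967, 0.972, 0.882, 0.875],
    "VTracer": [0.954, 0.964, 0.940, 0.981, 0.882],
    "Im2Vec": [0.692, 0.733, 0.754, 0.732, None],
    "LIVE": [0.934, 0.956, 0.959, 0.969, 0.870],
    "DiffVG": [0.810, 0.821, 0.952, 0.814, 0.822],
    "GPT-4-V": [0.852, 0.842, 0.848, 0.850, None],
    "StarVector-1B": [0.926, 0.978, 0.975, 0.929, 0.943],
    "StarVector-8B": [0.966, 0.982, 0.984, 0.981, 0.959],
}


def best_methods(scores, datasets):
    """Return {dataset: sorted list of methods sharing the top score}."""
    best = {}
    for i, name in enumerate(datasets):
        reported = {m: v[i] for m, v in scores.items() if v[i] is not None}
        top = max(reported.values())
        best[name] = sorted(m for m, s in reported.items() if s == top)
    return best


best = best_methods(SCORES, DATASETS)
for name, methods in best.items():
    print(f"{name}: {', '.join(methods)}")
```

Listing ties rather than a single winner matters here, because StarVector-8B and VTracer report the same score on SVG-Emoji.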

SVG-Bench is a benchmark for evaluating SVG generation models. It contains 10 datasets and 3 tasks: Image-to-SVG, Text-to-SVG, and Diagram-to-SVG. The benchmark provides a standardized way to assess the performance of different approaches to SVG generation, enabling fair comparisons and driving progress in the field.

See our Hugging Face 🤗 Dataset Collection

| Dataset | Train | Val | Test | Token Length | SVG Primitives | Annotation |
|---|---|---|---|---|---|---|
| SVG-Stack 🤗 | 2.1M | 108k | 5.7k | 1,822 ± 1,808 | All | Captions |
| SVG-Stack_sim 🤗 | 601k | 30.1k | 1.5k | 2k ± 918 | Vector path | - |
| SVG-Diagrams 🤗 | - | - | 472 | 3,486 ± 1,918 | All | - |
| SVG-Fonts 🤗 | 1.8M | 91.5k | 4.8k | 2,121 ± 1,868 | Vector path | Font letter |
| SVG-Fonts_sim 🤗 | 1.4M | 71.7k | 3.7k | 1,722 ± 723 | Vector path | Font letter |
| SVG-Emoji 🤗 | 8.7k | 667 | 668 | 2,551 ± 1,805 | All | - |
| SVG-Emoji_sim 🤗 | 580 | 57 | 96 | 2,448 ± 1,026 | Vector path | - |
| SVG-Icons 🤗 | 80.4k | 6.2k | 2.4k | 2,449 ± 1,543 | Vector path | - |
| SVG-Icons_sim 🤗 | 80,435 | 2,836 | 1,277 | 2,005 ± 824 | Vector path | - |
| SVG-FIGR 🤗 | 270k | 27k | 3k | 5,342 ± 2,345 | Vector path | Class, Caption |

We offer a summary of statistics about the datasets used in our training and evaluation experiments. These datasets are included in SVG-Bench. The subscript _sim_ stands for the simplified version of the dataset, as required by some baselines.
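The split sizes in the table mix abbreviated counts (`2.1M`, `108k`) with exact ones (`80,435`). As a rough sketch, the helper below normalizes them to integers and computes the approximate train/val/test shares for a few of the datasets. The names `parse_count` and `split_shares` are illustrative, and because most counts in the table are rounded, the resulting shares are approximate.

```python
def parse_count(text):
    """Convert counts such as '2.1M', '108k' or '80,435' to an int."""
    text = text.replace(",", "").strip()
    multipliers = {"k": 1_000, "M": 1_000_000}
    if text[-1] in multipliers:
        # round() absorbs floating-point noise from e.g. 2.1 * 1_000_000.
        return round(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def split_shares(train, val, test):
    """Return the fraction of examples in each of the three splits."""
    counts = [parse_count(x) for x in (train, val, test)]
    total = sum(counts)
    return [c / total for c in counts]


# Train / Val / Test sizes copied from the dataset table.
SPLITS = {
    "SVG-Stack": ("2.1M", "108k", "5.7k"),
    "SVG-Fonts": ("1.8M", "91.5k", "4.8k"),
    "SVG-Emoji": ("8.7k", "667", "668"),
    "SVG-Icons": ("80.4k", "6.2k", "2.4k"),
}

for name, sizes in SPLITS.items():
    train, val, test = split_shares(*sizes)
    print(f"{name}: train {train:.1%}, val {val:.1%}, test {test:.1%}")
```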

Datasets Examples

Figure 5: Examples from our diverse SVG-Bench datasets. The benchmark includes a wide range of vector graphics styles, from simple icons to complex colored illustrations, enabling comprehensive evaluation of SVG generation models.

The diversity and scale of these datasets enable StarVector to learn a wide range of SVG generation capabilities, from simple icons to complex diagrams. By training on this comprehensive collection, the model develops a robust understanding of vector graphics principles and can generalize to new, unseen examples.

The following examples demonstrate StarVector's superior performance in generating high-quality SVG code from various input images. These comparisons highlight the model's ability to capture fine details and structural elements that other methods often miss.

Image-to-SVG Comparison
Figure 3: Comparison of StarVector with baseline methods on various image-to-SVG tasks. Note how StarVector preserves fine details and structural elements while producing clean vector graphics. Traditional methods often struggle with complex shapes and details.
MSE Comparison
Figure 4: Limitations of pixel-based metrics like MSE for evaluating SVG quality. Two visually different outputs can have similar MSE scores, highlighting the need for perceptual metrics that better align with human judgment.
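The point about MSE can be reproduced with a toy example. The sketch below is not the data behind the figure; it builds a small synthetic grayscale target and two hypothetical candidate outputs, one with the right shape at reduced contrast and one with exact pixel values but a missing corner. Both candidates end up with the same MSE even though they differ structurally.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D grayscale images."""
    diffs = [
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


SIZE = 8
# Target: a white 4x4 square (1.0) centered on a black 8x8 background (0.0).
target = [
    [1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(SIZE)]
    for r in range(SIZE)
]

# Candidate A: correct shape at reduced contrast (every pixel is off by 0.25).
low_contrast = [[0.25 + 0.5 * p for p in row] for row in target]

# Candidate B: exact pixel values, but a 2x2 corner of the square is missing.
missing_corner = [row[:] for row in target]
for r in range(2, 4):
    for c in range(2, 4):
        missing_corner[r][c] = 0.0

mse_a = mse(target, low_contrast)
mse_b = mse(target, missing_corner)
print(mse_a, mse_b)  # both are 0.0625
```

Candidate A keeps the full square while candidate B loses part of it, yet a pixel-wise average cannot tell the two apart, which is the motivation for perceptual metrics such as DinoScore.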
SVG Diagrams Comparison
Figure 5: Comparison on technical diagrams. StarVector excels at vectorizing complex diagrams with multiple elements, preserving both structure and details. Note how our model correctly handles text elements, connections, and geometric shapes that are crucial for diagram comprehension.

Key observations: StarVector consistently produces cleaner, more accurate SVG representations compared to traditional vectorization methods. The model's ability to understand semantic content enables it to make intelligent decisions about which details to preserve and how to structure the resulting SVG code.

These qualitative results demonstrate that StarVector not only achieves higher numerical scores on benchmark metrics but also produces visually superior results that better capture the intent and structure of the original images. This is particularly evident in complex cases like technical diagrams and detailed icons, where traditional methods often struggle to maintain coherence and accuracy.
