HTML as an Accessible Format for Papers

Original link: https://info.arxiv.org/about/accessible_HTML.html

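arXiv is rolling out experimental HTML versions of papers alongside the existing PDFs, to substantially improve accessibility for researchers who use screen readers and other assistive technologies. Responding to the urgency expressed by the community, this beta release tackles the challenge of converting the vast majority of arXiv submissions, which are written in complex LaTeX, into accessible HTML. Not all of the more than 2 million papers will convert perfectly at first, but arXiv is actively backfilling its archive and improving the conversion process. Authors will preview the HTML version at submission time, and users can report issues directly from the HTML page with built-in tools. The focus is on *function*, not on an identical visual copy of the PDF. Layout differences are expected, but reports of rendering errors and illegible output are essential for improvement. Authors can also contribute by following LaTeX best practices, and developers are encouraged to help improve the conversion. The project relies on community feedback to identify problematic LaTeX packages and improve accessibility for everyone.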

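arXiv, a major repository of scientific papers, now offers an "experimental" HTML format alongside the traditional PDF version. This addresses a significant accessibility problem: 90% of arXiv submissions use TeX/LaTeX, which is not easily read by screen readers or accessed on mobile devices. The goal is to give researchers who use assistive technologies a more accessible version. Converting from TeX to HTML is complex because authors use the language in so many different ways, and the process needs to be fast and automated. Discussion on Hacker News suggests the move may also benefit the growing use of large language models (LLMs) in research, since PDFs are hard for LLMs to process effectively. Users expressed appreciation for the improved accessibility and asked about potential community contributions to the project.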
Original text

Accessibility barriers in research are not new, but they are urgent. The message we have heard from our community is that arXiv can have the most impact in the shortest time by offering HTML papers alongside the existing PDF.

arXiv has successfully launched papers in HTML format. We are gradually backfilling HTML for arXiv's corpus of over 2 million papers. Not every paper can be successfully converted, so a small percentage of papers will not have an HTML version. We will work to improve conversion over time.

The link to the HTML format will appear on abstract pages below the existing PDF download link. Authors will have the opportunity to preview their paper’s HTML as a part of the submission process.

The beta rollout is just the beginning. We have a long way to go to improve HTML papers and will continue to solicit feedback from authors, readers, and the entire arXiv community to improve conversions from LaTeX.

Why "experimental" HTML?

Did you know that 90% of submissions to arXiv are in TeX format, mostly LaTeX? That poses a unique accessibility challenge: to accurately convert from TeX—a very extensible language used in myriad unique ways by authors—to HTML, a language that is much more accessible to screen readers and text-to-speech software, screen magnifiers, and mobile devices. In addition to the technical challenges, the conversion must be both rapid and automated in order to maintain arXiv’s core service of free and fast dissemination.

Because of these challenges we know there will be some conversion and rendering issues. We have decided to launch in beta with “experimental” HTML because:

  1. Accessible papers are needed now. We have talked to the arXiv community, especially researchers with accessibility needs, and they overwhelmingly asked us not to wait.
  2. We need your help. The obvious work is done. Reports from the community will help us identify issues we can track back to specific LaTeX packages that are not converting correctly.

Error messages you may see in HTML papers

HTML papers on arXiv.org are a work in progress and will sometimes display errors. As we work to improve accessibility, we share with you the causes of these errors and what authors can do to help minimize them. Learn more about error messages you may see in HTML papers.

Ways to help

1) Read HTML papers and report issues

We encourage the community to try out HTML papers in your field:

Report an issue

  • Go to the abstract page for a paper you are interested in reading.
  • Look in the section where you find the link to the PDF download, and click the new link for HTML.
  • Report issues by a) clicking on the Open Issue button, b) selecting text and clicking on the Open Issue for Selection button, or c) using Ctrl+? on your keyboard. If you are using a screen reader, use Alt+y to toggle accessible reporting buttons per paragraph.

Please do not create reports that the HTML paper doesn't look exactly like the PDF paper

Our primary goal for this project is to make papers more accessible, so during the beta phase we will value function over form. HTML layouts that are incorrect or illegible are important to report. But we do expect the HTML papers to present differently than the same paper rendered in PDF. Line breaks will occur in different places and there is likely to be more white space. In general, the HTML paper won't present as compactly. Intricate typographic layouts will not be rendered so intricately. This is by design.

HTML is a different medium and brings its own advantages versus PDF. In addition to being much more compatible with assistive technologies, HTML does a far better job adapting to the characteristics of the device you are reading on, including mobile devices.

2) Help improve the conversion from LaTeX

If you are an author you can help us improve conversions to HTML by following our guide to LaTeX Markup Best Practices for Successful HTML Papers.

If you are a developer and have free development cycles, help us improve conversions! Our collaborators at LaTeXML maintain a list of issues and welcome feedback and developer contributions.

If you are a publisher, member of a society, or conference organizer you can help us improve conversions to HTML by reviewing the .cls files your organization recommends to authors for unsupported packages. Providing .cls files that use supported packages is an easy way to support and sow accessibility in the scientific community.
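As a starting point for that kind of review, the packages a class file loads can be listed mechanically. The following is a minimal Python sketch, not an arXiv or LaTeXML tool: it only looks for `\RequirePackage` and `\usepackage` lines, the class name `exampleconf` and package name `fancylayout` are made up for illustration, and the `known_ok` set is a hypothetical placeholder that you would fill in from the actual list of supported packages.

```python
import re

# Matches \RequirePackage[opts]{a,b} and \usepackage[opts]{a,b}.
# Variants such as \RequirePackageWithOptions are deliberately not handled.
PACKAGE_RE = re.compile(
    r"\\(?:RequirePackage|usepackage)\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)


def strip_comment(line):
    """Drop a TeX comment: everything from the first unescaped % onward."""
    return re.sub(r"(?<!\\)%.*", "", line)


def packages_loaded(cls_source):
    """Return the package names a class file loads, in order, without repeats."""
    body = "\n".join(strip_comment(line) for line in cls_source.splitlines())
    names = []
    for match in PACKAGE_RE.finditer(body):
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names


def flag_for_review(cls_source, known_ok):
    """Return loaded packages that are not in the caller-supplied known_ok set."""
    return [name for name in packages_loaded(cls_source) if name not in known_ok]


# Hypothetical class file used only to exercise the functions above.
SAMPLE_CLS = r"""
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{exampleconf}
\LoadClass{article}
\RequirePackage{amsmath, graphicx}
\RequirePackage[utf8]{inputenc} % input encoding
% \RequirePackage{oldpackage}
\usepackage{fancylayout}
"""

if __name__ == "__main__":
    # known_ok is an assumption for the demo, not a real support list.
    known_ok = {"amsmath", "graphicx", "inputenc"}
    for name in flag_for_review(SAMPLE_CLS, known_ok):
        print("review:", name)
```

A package flagged this way is only a candidate for a closer look. Whether it actually converts correctly still has to be checked against the converter's own documentation or by previewing a paper that uses the class.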

Thank you to our collaborators

First, we want to share a special thank you to all the scientists with disabilities who have generously shared their insights, expertise, and guidance throughout this project.

We want to thank two organizations without which HTML papers on arXiv would not be possible: The LaTeX Project, and the LaTeXML team from NIST. We deeply thank each member of these teams for their knowledge, incredible work, and commitment to accessibility.
