Show HN: A modern spreadsheet with Python integration

Original link: https://citadel5.com/gs-calc.htm

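GS-Calc is a powerful modern spreadsheet designed for processing very large data sets, overcoming the limits of traditional software. It supports up to 32 million rows × 16,384 columns with no file size limit, and automatically splits and merges large CSV/XLSX files. Its main advantages include optimized data loading, copying and pasting; faster VLOOKUP and MATCH functions; and a smaller binary file format that loads faster. It has 25 pivot table functions that support 32 million rows, and it supports Monte Carlo simulations and mass modification of function parameters. Fast binary search powers functions such as VLOOKUP and FILTER, giving near-instant updates even across millions of rows. GS-Calc offers unlimited references to external workbooks, regular expression support and flexible charting. It includes Python integration, allows custom scripts that merge data from various sources, and provides specialized numerical functions. It also has file format verification and comparison tools. The software is fully offline and portable, making it a versatile tool for data analysis and processing.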

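GS-Calc, from Citadel5.com, is a new spreadsheet application with Python integration that can handle very large data sets (4GB and up). According to the Hacker News post, it can load and edit huge CSV/text files (up to 32 million rows and 1 million columns) without crashing, using up to 500GB of RAM. Users can integrate Python functions as UDF formulas that return images and CSV data, and use statistical pivot table functions and solvers with a nearly unlimited number of variables. It can also instantly create charts with millions of data points. Commenters expressed interest, particularly in its ability to handle large files that Excel struggles with. One user wondered whether the $39 license fee could be justified by the time and effort saved on data processing and error correction. Another asked which features Excel users might miss, while others pointed out potential drawbacks such as the inability to use legacy spreadsheets and macros. The developer responded that the focus is on predictable performance and on handling large data sets more efficiently than Excel.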

Original text

GS-Calc is a modern spreadsheet with unique capabilities for processing big data sets.
You can edit large CSV files with millions of rows and XLSX files that are automatically split and merged; build complex data models, e.g. with millions of look-up formulas that are opened and updated almost instantly; and clean, transform and publish multi-gigabyte data sets.
  • 32 million rows x 16,384 columns.
    The number of worksheets and subfolders is unlimited. No limit for the file size.
    Large text and CSV files exceeding the 32M/16K limits are automatically split into multiple sheets and saved together in one zip file.
  • Organizing worksheets in a tree form and creating 3D array cell references for worksheets in a given subfolder (by specifying only the folder path in the reference).
  • Optimized to process large files. GS-Calc easily outperforms other existing spreadsheet solutions and redefines what the term "large data set" means for desktop software.

    → Performance examples.

    GS-Calc vs other spreadsheets (for consistency, using the "industry standard" 1M row limit):

    • Loading text files is several times faster in GS-Calc.
    • Copying/pasting/filling large data ranges should be on average several times faster (and more, especially for blocks containing formulas) in GS-Calc.
    • Performing VLOOKUP and MATCH for newly pasted/entered data in large data sets can be executed at least several times faster in GS-Calc.
    • GS-Calc workbooks saved in its binary file format are a few times smaller (and more for specific data sets) and are accordingly faster to load/save.
  • Pivot table data functions in GS-Calc: 25 functions (vs 11 functions in Excel), chi2 tests, 32 million rows, full regex filtering.
  • Fast pivot tables with up to 32 million rows, built-in reports, many functions and filtering options.
  • Monte Carlo Simulations to easily estimate risk in business, costs, future pensions or simply to reverse-calculate formulas etc.
  • Mass modifying/adding/removing parameters/arguments used by functions in large workbooks without the risk of making mistakes in long formulas.
  • VLOOKUP, HLOOKUP, MATCH, UNIQUE, FILTER functions using fast binary searching to ensure the best performance for worksheets with millions of rows.
    For example, one million vlookup() functions in a table with a few million rows can be updated almost instantly even on a slow, old pc with 8GB RAM.
  • No limitations for using cell references to external closed workbooks in all functions including the INDIRECT() function and functions written and added by users.
    Closed workbooks are automatically opened in the background when updating formulas and can either remain open in the background for best speed or can be automatically closed to save memory so that only one is loaded at a time.
  • Use of regular expressions in the look-up and text functions and in the Find and Replace function.
  • The FILTER() function to quickly filter millions of rows (and/or to perform multi-key sorting and searches for duplicates).
  • Charts easily handling millions of data points with custom vertical and horizontal (grid-)lines.
  • Fully configurable cross-highlighting to ensure the best screen readability.
  • No limitations for the number of hyperlinks and their usage. Workbook hyperlinks created instantly by the Copy/Paste commands.
    No limitations for conditional formatting - you can instantly create and efficiently use millions of rules.
  • Users can specify from 1 to 64 processor cores to be used by GS-Calc for calculations.
  • Around 450 built-in functions. Users can add their own DLL libraries with new functions or functions replacing the default ones, with the same performance, dynamic array and multi-core calculations.
    (You can also additionally order such custom functions.)
  • Python integration with UDF() functions. Create your own functions in Python that can return numbers, matrices, strings, CSV data blocks for parsing and images to be displayed in sheets.
  • Each single worksheet window can be split into up to 100 panes (with optionally synchronized rows and/or columns scrolling) to display various regions of that worksheet or other worksheets.
  • JScript and VBScript scripts that make it easy to mass-merge tables and records from CSV/text/xlsx/xBase/MySQL/SQLite files.
  • Two optional native file formats: Open Document *.ods spreadsheet format and exceptionally fast and compact binary format which - thanks to various data patterns analysis - enables you to generate even tens of times smaller files than those saved by other popular spreadsheets.
  • Loading/editing/saving tables with millions of rows saved as Excel *.xlsx and *.xls workbooks; tables with rows exceeding the *.xlsx and *.xls row limits are automatically split and saved (or loaded and merged) either as multiple xlsx/xls files or as multiple worksheets in one workbook.
  • Loading/editing/saving CSV, text, xls, xlsx, dBase, Clipper, FoxPro, MySQL, SQLite files with up to 32 million rows (and up to 1 million columns for text files);
    all existing files are edited in-place, without changing their format or parameters/structure.
  • Verifying data integrity with SHA256 checksums when switching between file formats.
    Compare files, workbooks, sheets, ranges and generate reports with differences.
  • Support for JScripts & VBScripts. Organizing scripts hierarchically in trees.
  • 2D/XY and 3D charts handling millions of data points instantly.
  • Saving workbooks to PDF: saving entire workbooks, single worksheets, ranges or single charts to compact PDF files.
  • Specialized numerical functions: matrix decompositions; linear equation sets with improving iterations; least squares (weighted, constrained), regression with orthogonal polynomials; time series analysis; minimization; linear programming, integer programming and quadratic programming.
  • GS-Calc can be installed on any portable storage device and used without performing any registry modifications. Fully offline - doesn't need an internet connection.
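
The feature list mentions that text/CSV files exceeding the row limit are split automatically and saved together in one zip. The sketch below shows roughly what such a split could look like in plain Python. It is not GS-Calc's implementation: the function name `split_csv_to_zip`, the `part_N.csv` naming and the choice to repeat the header row in every part are all assumptions made for illustration.

```python
import csv
import io
import zipfile


def _to_csv(header, rows):
    """Serialize a header plus data rows back to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def split_csv_to_zip(csv_text, max_rows, zip_target):
    """Split CSV text into parts of at most max_rows data rows.

    Each part is written as part_N.csv (hypothetical naming) into one zip
    archive; the header row is repeated in every part (an assumption).
    Returns the number of parts written.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    parts = 0
    chunk = []
    with zipfile.ZipFile(zip_target, "w", zipfile.ZIP_DEFLATED) as zf:
        for row in reader:
            chunk.append(row)
            if len(chunk) == max_rows:
                parts += 1
                zf.writestr(f"part_{parts}.csv", _to_csv(header, chunk))
                chunk = []
        if chunk:  # write the final, shorter part
            parts += 1
            zf.writestr(f"part_{parts}.csv", _to_csv(header, chunk))
    return parts


# Tiny demo: 7 data rows with a limit of 3 rows per part.
demo_csv = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(7))
archive = io.BytesIO()  # in-memory zip so the demo touches no files
part_count = split_csv_to_zip(demo_csv, max_rows=3, zip_target=archive)

with zipfile.ZipFile(archive) as zf:
    part_names = zf.namelist()
    last_part = zf.read("part_3.csv").decode()
```

In real use `max_rows` would be the sheet limit (32 million in GS-Calc's case) and `zip_target` a file path rather than an in-memory buffer.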

Version: 22  64-bit
File size: 7.12 MB
System: Windows 7/8.x/10/11 64-bit

Videos created on Intel Core i5-7500 @3.40GHz / 16GB RAM; CPU benchmark 8K [ vs 60K for Intel Core i9-13900K ]

Using the FILTER() function to filter 3.3 million rows.
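
The page does not show the FILTER() syntax, so the following is only a plain-Python sketch of the three operations the feature list attributes to it: regex filtering, multi-key sorting and a search for duplicates. The sample rows and column layout are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical rows: (name, city, age)
rows = [
    ("alice", "NY", 30),
    ("bob", "LA", 25),
    ("anna", "NY", 41),
    ("bob", "LA", 25),
    ("carl", "SF", 30),
]

# Regex filter: keep rows whose name starts with "a".
pattern = re.compile(r"^a")
filtered = [r for r in rows if pattern.search(r[0])]

# Multi-key sort: city ascending, then age descending.
by_city_then_age = sorted(rows, key=lambda r: (r[1], -r[2]))

# Duplicate search: rows that occur more than once.
duplicates = [r for r, n in Counter(rows).items() if n > 1]
```

On millions of rows the same logic works but runs in interpreted Python, so it says nothing about the speed shown in the video.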

Using a 2.37GB workbook file with i5, 16GB RAM & HDD:
32 million formulas and 300 million random number/text cells.

Creating a script in GS-Calc to mass-import tables from CSV/text/xBase/xlsx files

Using Monte Carlo simulations to find sets of 100 numbers that add up to a given value.
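
The idea behind this kind of search can be sketched in a few lines of Python: draw random sets of 100 numbers, keep the one whose sum is closest to the target, and stop early on an exact hit. This is a generic Monte Carlo sketch, not the method GS-Calc uses; the pool of numbers, the target, the trial count and the function name `monte_carlo_subset` are assumptions chosen for the example.

```python
import random


def monte_carlo_subset(pool, k, target, trials=20_000, seed=0):
    """Randomly sample k-element subsets and keep the one whose sum is
    closest to target. Returns (best_subset, absolute_error)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best, best_err = None, float("inf")
    for _ in range(trials):
        candidate = rng.sample(pool, k)  # k distinct elements from the pool
        err = abs(sum(candidate) - target)
        if err < best_err:
            best, best_err = candidate, err
            if err == 0:  # exact match found, no need to continue
                break
    return best, best_err


# Hypothetical input: pick 100 of the integers 1..1000 summing to 50,000.
pool = list(range(1, 1001))
subset, error = monte_carlo_subset(pool, k=100, target=50_000)
print(len(subset), sum(subset), error)
```

Pure random sampling gives no guarantee of an exact match; the returned error tells you how close the best set came, and more trials or a local swap step would tighten it.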

Monte Carlo simulations in GS-Calc to find linear programming solutions.

12 million fast binary-search VLOOKUPs in GS-Calc.
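
Binary search is what makes look-ups of this size cheap: on a sorted key column each look-up needs about log2(n) comparisons instead of a scan of all n rows. A minimal Python sketch of an exact-match look-up using the standard `bisect` module is shown below. It illustrates the technique only and is not GS-Calc's code; the function name `vlookup_sorted` and the sample table are hypothetical.

```python
from bisect import bisect_left


def vlookup_sorted(key, keys, values, default=None):
    """Exact-match look-up in a table whose key column is sorted ascending."""
    i = bisect_left(keys, key)  # leftmost position where key could be inserted
    if i < len(keys) and keys[i] == key:
        return values[i]
    return default  # key not present


# Hypothetical table: even ids as keys, id * 10 as the looked-up value.
keys = list(range(0, 200_000, 2))
values = [k * 10 for k in keys]

lookups = [4, 5, 199_998]
results = [vlookup_sorted(k, keys, values) for k in lookups]
print(results)  # → [40, None, 1999980]
```

The table must be sorted by key for this to be correct; with 100,000 rows each look-up takes at most 17 comparisons.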
