(comments)

Original link: https://news.ycombinator.com/item?id=40179566

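This thread discusses PySheets, a Python application built on PyScript and PyScript-LTK that runs two Python VMs (MicroPython and Pyodide) in the browser. Its web server contains minimal logic and runs on Gunicorn on DigitalOcean; storage is on Firestore. The author answers questions and discusses potential improvements and target users. One commenter argues that to gain traction with finance professionals, a tool like this would need to run locally or on-prem, offer a PowerPoint/Word alternative (or easy copy/paste into them), push harder on big data and database connections, and provide APIs for common finance services; the author replies that PySheets was designed to run on-prem and on GCP. The author plans to publish video tutorials soon. Another commenter, who had recently tried ChatGPT for a resume review with mixed results, asks what "AI-driven" means; the author clarifies that AI is used to generate Python code (for example, Matplotlib visualizations of dataframes), not to analyze or generate the data in the sheet. The app supports importing data, converting it to dataframes, analyzing it, learning from it, and exporting the results. Excel formulas are not currently recognized on import, though this is on a possible roadmap. For more information, watch for the upcoming videos.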


Original text


The author of PySheets here: The app is written entirely in Python, running on PyScript, using PyScript-LTK, with two Python VMs, MicroPython and PyOdide. The web server contains minimal logic and runs on gunicorn on DigitalOcean. Storage is on Firestore. The app can easily be packaged up as a standalone, "on-prem" app, but I have not given that much priority for now. Would love to hear what you all think of writing web apps in the browser in Python.


>>> Would love to hear what you all think of writing web apps in the browser in Python.

I like the idea. I'm not a commercial dev, but a so called "scientific" programmer, meaning that I use programming mainly as a problem solving tool. But once in a while I create little apps for my colleagues to use, many of whom don't program. But they can manage spreadsheets quite well.

I'm pretty committed to Python at this point, but deployment of an app is a headache, and I've explored a variety of solutions. I've written a couple of web apps using *flet*, and they run on pretty much every platform I've tested. This seems like a nice approach.

The thing I'd like to figure out is how to give a web app access to a user's files, though I appreciate why this should be difficult for security reasons.



Wow, PyScript has come a long way. I remember when loading it into the browser would take 5-10 seconds. This seems much faster. Great work!!


Shameless plug: If you have bigger data sets, check out rowzero.io.

We implemented something like PySheets initially where the formula language was full Python. But we found the Python interpreter to be the bottleneck during (e.g.) large CSV imports, and the GIL prevented parallelizing evaluation. It was also harder for business users to adopt due to small syntactic differences between Python and the Excel formula language.

So we implemented the spreadsheet engine and formula language in Rust. We have a Python code window that allows you to write arbitrary Python functions. Those functions can be called as formulas from any spreadsheet cell. We seamlessly marshal Pandas dataframes from Python land to spreadsheet land and back. It gives you 90% of the benefits of pure Python without compromising on performance.
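To make the "Python functions callable as cell formulas" idea concrete, here is a toy sketch in pure Python. It is not Row Zero's implementation (their engine is in Rust) and not their API: the `formula` decorator, the `FORMULAS` registry, `evaluate`, and the `MARGIN` function are all hypothetical names chosen for illustration, and the parser only handles a single `=NAME(arg, ...)` call whose arguments are numbers or cell references.

```python
import re
from typing import Callable, Dict

# Hypothetical registry mapping formula names to user-written Python functions.
FORMULAS: Dict[str, Callable[..., float]] = {}


def formula(name: str) -> Callable[[Callable[..., float]], Callable[..., float]]:
    """Decorator that exposes a Python function under a formula name."""
    def register(fn: Callable[..., float]) -> Callable[..., float]:
        FORMULAS[name.upper()] = fn
        return fn
    return register


@formula("margin")
def margin(revenue: float, cost: float) -> float:
    # Profit margin as a fraction of revenue.
    return (revenue - cost) / revenue


CALL = re.compile(r"^=([A-Za-z_]\w*)\((.*)\)$")
CELL_REF = re.compile(r"[A-Za-z]+\d+")


def evaluate(cell_text: str, cells: Dict[str, float]) -> float:
    """Evaluate '=NAME(arg, ...)'; args are numbers or cell references like A1."""
    match = CALL.match(cell_text.strip())
    if match is None:
        return float(cell_text)  # a plain numeric cell
    name, raw_args = match.groups()
    args = []
    for token in (t.strip() for t in raw_args.split(",")):
        if not token:
            continue  # allow NAME() with no arguments
        # Look up cell references; parse anything else as a number.
        args.append(cells[token.upper()] if CELL_REF.fullmatch(token) else float(token))
    return FORMULAS[name.upper()](*args)


cells = {"A1": 200.0, "B1": 150.0}
print(evaluate("=MARGIN(A1, B1)", cells))  # prints 0.25
```

A real engine would also parse nested expressions and pass whole dataframes rather than scalars, but the registry-plus-lookup shape is the core of the idea.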



Rowzero is a better spreadsheet, while PySheets is a better Jupyter Notebook. Although they converge in certain aspects, their distinct target audiences set them apart. This divergence may create some overlap, but it also leaves ample room for user preference.

PySheets currently runs inside the browser, on top of WebAssembly, and the limitations there are bigger than just Python's slowness. You have only 4 GB of addressable memory, including the interpreter and libraries. Network bandwidth is also a limiting factor for client-side computation.

That said, PySheets can render a sheet based on a 50,000-row Excel sheet in 0.5s and needs about 20s to do a full end-to-end recompute run. There are limits to what you can do in the browser without using an external kernel that can run Polars on large datasets. But, I think most people will be fine with what PySheets can let them do.

Finally, as the author of PySheets I am honored that a "competitor" sees us as a threat. I am quite impressed by Rowzero myself. Nice work :-)



Kudos on the technical achievement. We considered the thick client approach you're doing, and one of the reasons we punted was because it was so hard.

One really nice thing about your approach is it minimizes infrastructure cost. That positions you well for embedding use cases, like New York Times visualizations, that we struggle to do economically.

Best of luck!



I am feeling pretty okay now, indeed. I played golf today. It was on a par-3 course, so it only tested my short game. However, I scored -1, with almost a hole-in-one. I blame it on the success of PySheets :-)


I've been trying to find a platform for creating dashboards where some data comes from spreadsheets and some comes from databases. Something like a notebook interface crossed with a Grafana interface, while also enabling forms for input, is sorely missing. While it can be stitched together, speed/performance and flexibility (in terms of JS or Python) seem to be lacking atm.

I want to use such a thing to create internal dashboards similar to retool.



Does it need to be live (i.e when database or underlying spreadsheet updates does it need to be reflected in real time on the dashboard) or are you ok with static display.

Live-updating data is a pain. I've messed around with using JavaScript to force-refresh HTML iframes on a timer, but I was never really satisfied with this. I've heard you can do things with websockets, but that is starting to get too complicated for me (I'm not a programmer).

For static stuff, one of the data scientists in my org pointed me to Streamlit (https://streamlit.io/). It's a Python package I found very easy to use. You can easily combine SQL with CSV imports and display them all on one dashboard, and you can use forms, toggle buttons, etc. to control the display.



Rowzero seems incredible, but both it and PySheets target the wrong users. You are targeting data scientists, while I would target finance people to get traction. So let me tell you why I would use it as a data scientist but not as a finance guy:

1) It runs in the cloud. I would go with something that runs locally (or on-premise), since there is sensitive data there (with Rust as a backend that should be fine; with Python you need to ship a set of libraries using Docker), or it should be integrated into GCP/AWS/Azure.
2) You need to create a PowerPoint/Word alternative as well, where you can just copy/paste stuff, or you need to make copy/paste into PowerPoint/Word easy.
3) Push hard on big data and DB connections; right now those are the bottlenecks. Also create Python APIs for popular finance services (Bloomberg, FactSet, Capital IQ, ...) so that they are available out of the box with a subscription.
4) Do something for the text part, like getting embeddings for similarity and fuzzy matching in Python; the interface for analyzing text could probably be different too (keywords highlighted in green, search in text, and so on). People in finance often also work with PDFs, and having everything in one platform is nicer than having two windows as they do today.


PySheets has been designed to run on-prem and on GCP as well. The beta version you are looking at is just offered as a zero-install experimentation platform. We are actively talking with financial institutions, and both co-founders on the team, https://pysheets.app/#Team, have a long history in Finance, so we are very sensitive to all the (correct) points you make. We will look in more detail at your very helpful suggestions!


A major part is, in the form of PyScript-LTK. I keep moving more of PySheets to LTK as I find reusable parts. I truly love open source, but I am also trying to get some revenue for the months of work I spent on developing PySheets.


Great idea: an easy-to-use GUI for non-tech users and Pandas for data-oriented users at the same time.

Is there a similar project that is self-hosted? I would be uncomfortable uploading health-related data to an external service.



There’s a cool one I’ve used called MitoSheet[0]. It runs locally and has some great features, though it didn’t support TSV files last time I checked. It’s still being actively developed. I believe it was developed with Y Combinator funding.

[0] https://www.trymito.io/



Heavy emphasis on "sort of"; it enforces data types on columns, which is a significant difference from both spreadsheets and pysheets. This enables/requires more database-like behavior and planning (which is great for a lot of applications), but importing spreadsheets is much less intuitive and spreadsheet competence won't get you very far.

Grist's closer to "what if Access had an interface that was more like Excel". Pysheets is more like "what if Python data structures had a GUI that looked like Excel".

To put it another way, I love Grist but _would not_ recommend people who are using spreadsheets to try to bring their spreadsheets into it. I also love pysheets and _would_ recommend it for that usage.



The PySheets server runs anywhere, for example: my laptop, Google AppEngine, and DigitalOcean. I designed it with on-prem in mind, so that PySheets could be deployed at companies that do not want to share data with external services.

That said, only the data stored in the sheet itself is stored in PySheets. Most use cases will load data from another place, filter and convert it, and then render a result. Still, self-hosting would be an interesting use case.



Any chance of a video walkthrough or tutorial? I can't figure out what the workflow is and which use cases PySheets addresses from looking at the landing page. I don't want to register an account just to find out.


Yes, I will do some videos in the coming week. I did an extended demo for the weekly PyScript FUN meeting, but it turned out it was not recorded.


Yes. They actually had a product, and IIRC, I had downloaded and tried it.

I think they are some of the same folks who later founded PythonAnywhere, which I had also tried.

I read recently somewhere that they were acquired by Anaconda.



This looks pretty cool. I am someone who gets annoyed by Excel, Sheets, and Numbers for not just letting you code in a nice language like Python and then visualize/query after that.

But then I see "AI-driven", which I should note is the _third_ line of text on the web page. I assume it is an important feature for the author of the page.

I control-f, "ai-driven", it is only used one other time on the page:

"Perform easy AI-driven visualization with Matplotlib"

There is no further elaboration on the home page and I have been unable to find additional docs. (Someone please post a snarky RTFM response with a link to the manual, cuz like I said I am very interested in this. I did google "pysheets docs" which uhh linked to a python library with the same name...)

Last week, for the first time ever, I used the noted "AI" ChatGPT to review a resume I had written. I wouldn't normally do this, but the company I was applying to heavily emphasized that they use ChatGPT to generate code and review things.

Ever the skeptic, I decided to try it myself. I have to say I was impressed with the results. EXCEPT that ChatGPT pointed out a grammar error in my resume which literally did not exist. The sentence it was critiquing in its feedback was not found anywhere in my resume, nor was there anything similar (from my perspective; I'm sure 1000 layers deep in its network there was some similarity to something that had the error, and wouldn't it be cool if we could effectively debug that).

ANYWAY, when I see ai-driven without elaboration in a spreadsheet program, I am very concerned that my data might be "hallucinated" and I would encourage the author to explain what exactly this means. Will my charts be correct 99% of the time but sometimes a hallucination? What's going on here? I would probably be signing up for the beta right now if I had any idea. Thanks.

(final snark: funny that one of the authors is named Kurt Vile, what are the odds https://www.youtube.com/watch?v=4uAXMl-Bfiw)



If you sign up for PySheets, we give you 7 tutorials. Two explain how to use AI to import data, convert it to Dataframes, and visualize them using Matplotlib. The generated code is impressive and can help novice data scientists explore the Pandas and Pyplot APIs.

The AI is used to generate Python code, not to analyze or generate data in the sheet. I will clarify that on the landing page. Hopefully, that will inspire you to try it out.

This is a different Kurt Vile :-)



This looks like a great and very polished project! Leveraging Python in spreadsheets is a great idea, probably why Excel is doing it already, but it's nice to see an implementation that's so clear and easy to use.

It's hardly a criticism of PySheets specifically, but I wish spreadsheets were more restrictive (i.e., forced sheets into a table format) so that people could build out spreadsheets in an org without creating an unholy mess that needs to be picked apart and reverse-engineered in something that isn't a spreadsheet.



I envisioned many of the use cases not to store data in the sheet, but to use PySheets as a better Jupyter Notebook: Import data, convert to Dataframes, massage, analyze, learn, and export. A good example is how I have a sheet that loads PySheets usage metrics, converts to dataframes, plots in graphs and then renders as live charts on the pysheets.app landing page.
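The import, convert, massage, analyze, export flow described above can be sketched in a few lines. This is a stdlib-only stand-in, assuming a small made-up CSV input; in PySheets itself the middle steps would operate on pandas DataFrames held in sheet cells, and the function names here are hypothetical, one per step of the workflow.

```python
import csv
import io
from collections import defaultdict

# Made-up sample input standing in for an imported CSV export.
RAW = "region,amount\nnorth,10\nsouth,5\nnorth,7\nsouth,3\n"


def import_rows(text):
    """Import: parse CSV text into a list of dicts (a stand-in for a DataFrame)."""
    return list(csv.DictReader(io.StringIO(text)))


def massage(rows):
    """Massage: convert the amount column from strings to numbers."""
    return [{"region": r["region"], "amount": float(r["amount"])} for r in rows]


def analyze(rows):
    """Analyze: total amount per region, like a groupby-sum."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def export(totals):
    """Export: write the summary back out as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region, total in sorted(totals.items()):
        writer.writerow([region, total])
    return out.getvalue()


summary = analyze(massage(import_rows(RAW)))
print(export(summary))
```

With pandas, the analyze step would be roughly `df.groupby("region")["amount"].sum()`; the point of the sketch is only the shape of the pipeline, where each step's output feeds the next.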


Very interesting software/app. My current company has a lot of Excel files with a lot of business logic embedded in them as Excel formulas. When we import an Excel file into PySheets, does it also recognize the formulas in the original Excel file? Are there any videos that show what PySheets can do? Thank you.


Try cut-and-pasting a sheet from Google Sheets to PySheets. It works quite well. At the moment, PySheets does not handle Excel functions. This is on our possible roadmaps, but we just did not get to it yet. I really only worked on PySheets for about 3 months, since resigning from my last job in February.


You can load spreadsheets into Jupyter today. With Pandas or Polars, you can import CSV or Excel sheets quite easily. PySheets is reimagining what Jupyter Notebooks would look like if you use a DAG, not a linear execution flow.
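To illustrate the difference between a DAG and Jupyter's top-to-bottom execution order, here is a minimal sketch of dependency-ordered recomputation using the standard library's `graphlib` (Python 3.9+). The cell layout, the `recompute` and `downstream` helpers, and the lambda "formulas" are hypothetical; this is not PySheets' actual engine, just the general technique of evaluating each cell after the cells it reads.

```python
from graphlib import TopologicalSorter

# Hypothetical cells: each maps to (formula, list of cells the formula reads).
cells = {
    "A1": (lambda: 2, []),
    "A2": (lambda: 3, []),
    "B1": (lambda a1, a2: a1 + a2, ["A1", "A2"]),
    "C1": (lambda b1: b1 * 10, ["B1"]),
}


def recompute(cells):
    """Evaluate every cell after all of the cells it depends on."""
    graph = {name: set(deps) for name, (_, deps) in cells.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = cells[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


def downstream(cells, changed):
    """Return the changed cell plus every cell that transitively reads it."""
    readers = {name: [] for name in cells}
    for name, (_, deps) in cells.items():
        for d in deps:
            readers[d].append(name)
    stale, stack = set(), [changed]
    while stack:
        node = stack.pop()
        if node not in stale:
            stale.add(node)
            stack.extend(readers[node])
    return stale


print(recompute(cells))
print(downstream(cells, "A1"))
```

The design point is that editing a cell only invalidates its downstream set (here, changing `A1` marks `A1`, `B1`, and `C1` stale but not `A2`), whereas a linear notebook would typically rerun everything below the edited cell.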

Just like Copilot or Sourcegraph's Cody is used in VS Code, PySheets uses OpenAI to suggest the Python code to write when the sheet contains a Pandas dataframe of a certain shape. The AI accelerates figuring out which APIs to call and when. I myself find Matplotlib and Pyplot highly confusing, and a coding assistant that writes my code in this niche makes me a lot more productive. It is cool to say, "Take the dataframe in E13 and generate an orange bar graph for it," and see the code generated.
