After some superficial testing with bad-quality scans you can find on Kaggle, I can't confirm that. CogVLM2 refuses to handle scans that InternVL-V1.5 can still comprehend.

I'm going to be saying "First Ever AI something" for the next 15 years for clout and capital; I'm not going to be listening to anybody's complicated ten-step funnel if they're not doing the obvious.

Whoa, this actually did quite well on table data extraction. I wonder how this could be used for long documents. Maybe paired with some kind of hybrid RAG approach.

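One way to read "hybrid RAG" here is blending exact keyword matching with a fuzzy similarity signal when retrieving chunks of a long document. A rough sketch of that idea, where the character-trigram cosine is only a stand-in for a real embedding model and `hybrid_score`/`retrieve` are hypothetical names, not any particular library's API:

```python
import math
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query, chunk, alpha=0.5):
    """Blend exact keyword overlap with a fuzzy character-trigram score."""
    q_terms, c_terms = set(query.lower().split()), set(chunk.lower().split())
    keyword = len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0
    tri = lambda s: Counter(s[i:i + 3] for i in range(len(s) - 2))
    fuzzy = _cosine(tri(query.lower()), tri(chunk.lower()))
    return alpha * keyword + (1 - alpha) * fuzzy

def retrieve(query, chunks, k=2):
    """Return the k chunks with the highest hybrid score."""
    return sorted(chunks, key=lambda c: hybrid_score(query, c), reverse=True)[:k]
```

The keyword half keeps exact table headers and figures findable, while the fuzzy half tolerates the spelling noise typical of extracted tables.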
> you can also run it multiple times to catch errors.

Does this require a slight offset and/or rotation of the image, or just a literal rerun, with the seed (or whatever) giving a different result?

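If the engine is deterministic, a literal rerun returns identical output, so the multi-run trick is usually paired with small input perturbations and a vote over the results. A minimal sketch of that idea, assuming a hypothetical `run_ocr(image, rotation)` callback (nothing here is specific to CogVLM2 or InternVL):

```python
import random
from collections import Counter

def ocr_with_jitter(image, run_ocr, n_runs=5, max_rotation=0.5):
    """Re-run OCR on slightly rotated copies of an image and keep the
    majority answer for each output line.

    run_ocr(image, rotation) is a hypothetical callback: it should apply
    the given small rotation (in degrees) and return a list of text lines.
    """
    runs = [run_ocr(image, random.uniform(-max_rotation, max_rotation))
            for _ in range(n_runs)]
    merged = []
    for i in range(max(len(r) for r in runs)):
        # Majority vote over the runs that produced an i-th line.
        votes = Counter(r[i] for r in runs if i < len(r))
        merged.append(votes.most_common(1)[0][0])
    return merged
```

With an odd `n_runs`, an error that appears in only one or two runs gets outvoted; systematic errors (wrong in every run) are not caught.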
A wide variety of PDFs (in both length and content) that can have a variety of different tables; real-estate related, with a lot of financial content. And I need to be able to run on local models/software (no parsing-as-a-service, no OpenAI, etc.).

Here's just one example: https://www.totalflood.com/samples/residential.pdf (I struggle to get accurate data out of the Sales Comp section; basically all approaches mix up the properties.)

Image-based or any other kind of visual captchas will never be extremely effective. I think we will see more PoW captchas in the coming years (just like Cloudflare's Turnstile).

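For context, a proof-of-work captcha makes the client burn CPU on a hash puzzle that the server can check with a single hash. A minimal sketch of the hashcash-style scheme (not Turnstile's actual protocol, which is more involved):

```python
import hashlib

def solve_challenge(challenge: bytes, difficulty: int = 12) -> int:
    """Client side: find a nonce so that SHA-256(challenge || nonce),
    read as a 256-bit integer, has at least `difficulty` leading zero
    bits. Expected work is about 2**difficulty hashes."""
    target = 1 << (256 - difficulty)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(challenge: bytes, nonce: int, difficulty: int = 12) -> bool:
    """Server side: one hash, regardless of how hard the puzzle was."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty))
```

The asymmetry (expensive to solve, cheap to verify) is what rate-limits bots without asking humans to read mangled letters.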
Was also looking for something like this. I can't find pricing listed anywhere for their API usage, only the free 1,000 credits - or am I completely misunderstanding how this works?

This "matching GPT-4" catchphrase has lost its meaning to me. Every time an article like this pops up, I see marketing buzz and unrealistic results in practice.

While that may be true, the opposite has also happened to hundreds of companies in other areas: https://news.ycombinator.com/item?id=39136472

Many companies also optimize for tools, like Python, that boost productivity more than price/performance ratio. OpenAI had billions of other people's money. They might just keep using tools that worked before.

Lastly, there are tons of papers published on techniques that claim to reduce cost. Most of them aren't good. Their benchmarks aren't good. Even reviewing most of them takes more time than a lot of AI researchers have. Those that make it into established communities usually have gotchas that come with the benefits. So, they could also simply miss a needle in a large haystack.

I think you're right that they'd be using whatever really worked with no loss in model performance. It's just that they might not, for a number of reasons. The rational choice is for others to keep experimenting with those things in case they get a competitive advantage.

Fair enough. Is it now safe to say that OpenAI could have done with an 8B model + $500 of fine-tuning instead of running a (much) larger model on their GPU cluster?

Would love to see Ollama support for this. It seems promising given my experience with LLaVA so far, and I'd love to get some hands-on, head-to-head experience.

Very curious how it performs on OCR tasks compared to InternVL. To be competitive at reading text you need tiling support, and InternVL does tiles exceptionally well.
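Tiling here means cutting a high-resolution page into fixed-size crops so small text survives the model's input resolution. A sketch of the geometry only, assuming InternVL-style 448-px tiles (the overlap value is my assumption, not InternVL's exact scheme):

```python
def tile_boxes(width, height, tile=448, overlap=32):
    """Split a width x height image into overlapping tile-sized crops.
    Returns (left, top, right, bottom) boxes clipped to the image, in
    row-major order; each box is fed to the VLM separately."""
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            boxes.append((left, top,
                          min(left + tile, width),
                          min(top + tile, height)))
    return boxes
```

The overlap keeps a line of text that straddles a tile boundary fully visible in at least one crop; a real pipeline still has to merge the per-tile readings back together.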