May 18, 2024, 6:57 a.m. | /u/stoefln

Computer Vision www.reddit.com

I am trying to figure out how much it would cost to install and run a multimodal LLM capable of visual grounding, just as a PoC. I would like to label around 500 images (1080x1920) per month. My input would be a prompt like "where is the button in the image", and the LLM should return the bounding box. The LLMs capable of visual grounding that I found are these:

1. [https://github.com/QwenLM/Qwen-VL](https://github.com/QwenLM/Qwen-VL)
2. [https://github.com/Vision-CAIR/MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
3. [https://github.com/magic-research/bubogpt](https://github.com/magic-research/bubogpt)
4. Did I miss another good one?


I was …
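
For a rough idea of the cost side, here is a back-of-the-envelope sketch, assuming the model runs on a rented on-demand GPU that is only started for the monthly batch. The function name and every number in the usage lines (seconds per image, hourly rate, startup overhead) are hypothetical placeholders, not measurements or real prices; replace them with figures from your own provider and a timed test run.

```python
def monthly_gpu_cost(images_per_month, seconds_per_image,
                     gpu_hourly_rate_usd, startup_overhead_hours=0.0):
    """Estimate the monthly rental cost of a GPU used only for batch labeling.

    All inputs are assumptions supplied by the caller:
    - seconds_per_image: measured inference time per image on the chosen GPU
    - gpu_hourly_rate_usd: the provider's on-demand price per hour
    - startup_overhead_hours: time billed for booting and loading the model
    """
    inference_hours = images_per_month * seconds_per_image / 3600
    return (inference_hours + startup_overhead_hours) * gpu_hourly_rate_usd


# Placeholder inputs for illustration only (NOT benchmarks or real prices):
# 500 images, 10 s per image, 1.00 USD per GPU hour, 0.5 h startup overhead.
cost = monthly_gpu_cost(500, 10, 1.00, startup_overhead_hours=0.5)
print(f"Estimated monthly cost: {cost:.2f} USD")
```

With a workload this small, the billed time is likely to be dominated by setup and model loading rather than inference, so the overhead term deserves at least as much attention as the per-image time.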

