[D] - Can multimodal models tell images apart from text? Like if a text token and an image token are close vectors, will the model be able to "tell" if it is reading or seeing?

May 20, 2024, 7:44 p.m. | /u/30299578815310

I ran into this while working with multimodal models. They seemed unable to tell which parts of the information came from the text portion of an input and which came from the image portion.

Is there any research on this?
