Multi-modal AI: what GPT-4V taught us about vision + language

A deep dive into the capabilities and failure modes of vision-language models.

5 comments

32

u/ajabah_admin, 4 days ago

Bookmarked. Coming back to this when I need to make a decision on this.

29

u/ajabah_admin, 4 days ago

Solid write-up. Came here from search and glad I found this community.

20

u/ajabah_admin, 4 days ago

Real question: how long did it take you to figure all of this out?

17

u/ajabah_admin, 4 days ago

Counterpoint: what about the cases where this doesn't apply? Genuinely curious.

u/ajabah_admin, 4 days ago

This is the type of post that makes me keep coming back to ajabah.