LLaVA: Large Language and Vision Assistant
LLaVA Summary

Intro
LLaVA: Visual Instruction Tuning
(Data) GPT-assisted Visual Instruction Data Generation

Instructional vision-language data example
(Training) Visual Instruction Tuning
Model architecture

Training data composition


Multi-turn data example
Training method
Limitation
LLaVA-1.5: Improved Baselines with Visual Instruction Tuning
LLaVA-1.6: Improved reasoning, OCR, and world knowledge






