Photo: Yonhap News
[Alpha Biz = Reporter Kim Jisun] Kanana-v, a groundbreaking Korea-specialized multimodal large language model (MLLM), has been unveiled, drawing attention for its ability to process diverse data types, including text, images, and audio, and for outperforming global models in Korean OCR (optical character recognition) and document comprehension.
On December 5, Kakao introduced Kanana-v on its official tech blog, highlighting its features and performance. This follows the first reveal of the Kanana lineup at the if kakaoAI 2024 developers’ conference in October.
The Kanana lineup comprises ten AI models categorized by size, type, and functionality: three large language models (LLMs), three MLLMs, two visual generation models, and two voice models.
Kakao aims to further refine Kanana-v to deliver not only precise answers but also personalized responses tailored to user preferences. The company is also optimizing the model for on-device environments, allowing data to be processed directly on users' devices.
Additionally, Kakao is developing Kanana-o, a next-generation multimodal model that integrates audio and video processing. Demonstrations of Kanana-o’s voice-based interactions were featured during the keynote session of if kakaoAI 2024.
This advancement marks a significant milestone in AI technology tailored to the Korean language and culture, underscoring Kakao's commitment to leading innovation in the field.
Alphabiz Reporter Kim Jisun (stockmk2020@alphabiz.co.kr)