Paper Abstract and Keywords |
Presentation |
2025-02-18 10:15
Extraction of Important Scenes by Multimodal LLM Using Video and Speech Transcription Data
-- A Study on the Accurate Understanding of Timestamp Information --
Tomoki Haruyama, Cheng Zhou (NTT DOCOMO) |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
In recent years, advances in network technology and video viewing devices have led companies to release a wide variety of video distribution services, and each company is taking various measures to promote its own service effectively. One such measure is posting short videos, cut from the content that these services distribute, on various social media platforms.
However, the amount of video content added daily is enormous, and creating short videos requires considerable effort.
Multimodal LLMs, which have made remarkable progress in video understanding in recent years, have therefore been considered for this task. However, it has been reported that multimodal LLMs have difficulty accurately grasping timestamp information when extracting specific scenes.
We therefore propose a method that uses speech transcription data to grasp timestamp information accurately.
In experiments, we confirm that using speech transcription data makes it possible to extract important scenes with accurate timestamp information. |
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
Video Understanding / Video Distribution Service / Multimodal LLM / Social Media Marketing |
Reference Info. |
ITE Tech. Rep., vol. 49, no. 4, ME2025-2, pp. 7-12, Feb. 2025. |
Paper # |
ME2025-2 |
Date of Issue |
2025-02-11 (MMS, ME, AIT, SIP) |
ISSN |
Online edition: ISSN 2424-1970 |
Conference Information |
Committee |
ME AIT MMS IEICE-IE IEICE-ITS SIP |
Conference Date |
2025-02-18 - 2025-02-19 |
Place (in Japanese) |
(See Japanese page) |
Place (in English) |
Hokkaido Univ. |
Topics (in Japanese) |
(See Japanese page) |
Topics (in English) |
Image Processing, etc. |
Paper Information |
Registration To |
ME |
Conference Code |
2025-02-ME-AIT-MMS-IE-ITS-SIP |
Language |
Japanese |
Title (in Japanese) |
(See Japanese page) |
Sub Title (in Japanese) |
(See Japanese page) |
Title (in English) |
Extraction of Important Scenes by Multimodal LLM Using Video and Speech Transcription Data |
Sub Title (in English) |
A Study on the Accurate Understanding of Timestamp Information |
Keyword(1) |
Video Understanding |
Keyword(2) |
Video Distribution Service |
Keyword(3) |
Multimodal LLM |
Keyword(4) |
Social Media Marketing |
1st Author's Name |
Tomoki Haruyama |
1st Author's Affiliation |
NTT DOCOMO, INC. (NTT DOCOMO) |
2nd Author's Name |
Cheng Zhou |
2nd Author's Affiliation |
NTT DOCOMO, INC. (NTT DOCOMO) |
Speaker |
Author-1 |
Date Time |
2025-02-18 10:15:00 |
Presentation Time |
15 minutes |
Registration for |
ME |
Paper # |
MMS2025-2, ME2025-2, AIT2025-2, SIP2025-2 |
Volume (vol) |
vol.49 |
Number (no) |
no.4 |
Page |
pp.7-12 |
#Pages |
6 |