(英) |
Stylized image captioning aims at generating captions with styles, based on the input images. In this paper, we aim at generating both emotionally positive and negative sentences. For this purpose, we propose a Bidirectional Encoder Representations from Transformers (BERT) based model, StyleCapBERT. Our model is the first image captioning model that controls the length and style of the captions at the same time. Not only did our model achieve good performance in multiple evaluation metrics such as METEOR and ROGUE, but also it can control the length and the style of the captions accurately on demand. |