Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (HA-DPO)

Zhiyuan Zhao*, Bin Wang*, Linke Ouyang*,
Xiaoyi Dong, Jiaqi Wang, Conghui He†
Shanghai AI Laboratory
*Equal Contribution       †Corresponding Author

Demo Video

Abstract

Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", where the model generates textual descriptions containing content that is inaccurate or absent from the image. To address this issue, we introduce a novel strategy: Hallucination-Aware Direct Preference Optimization (HA-DPO). Our approach treats the hallucination problem as a unique preference-selection problem: the model is trained to favor the non-hallucinating response when presented with two responses to the same image (one accurate and one hallucinating). We also present an efficient process for constructing hallucination sample pairs that ensures high-quality, style-consistent pairs for stable HA-DPO training. We applied this strategy to three mainstream multimodal models, and the results show a significant reduction in hallucination and an enhancement in the models' generalization capabilities. For example, with HA-DPO, the MiniGPT-4 model demonstrates significant gains: POPE accuracy increases from 51.13% to 85.66% (a 34.5% absolute improvement), and the MME score rises from 968.58 to 1365.76 (a 41% relative improvement).

Hallucination Mitigation strategy in HA-DPO


The hallucination mitigation strategy in HA-DPO consists of four steps:

1. Description Generation: We randomly select images from the Visual Genome (VG) dataset and use the LVLM to generate a detailed description for each image.

2. GPT-4 Hallucination Detection and Correction: Next, we feed the model-generated description, together with all annotations of the original image, into GPT-4, along with a detailed prompt template instructing GPT-4 to check whether the generated description contains hallucinations. If hallucinations exist, GPT-4 also provides a corrected description with the hallucinations removed. In this way, we obtain a positive and a negative response for each image. In practice, hallucinations almost always occur when a multimodal model produces a detailed image description. (A minimal sketch of this detection-and-correction call appears after this list.)

3. Style-consistent Data Augmentation: To ensure stylistic consistency between positive and negative sample sentences and to obtain more samples, we use GPT-4 to rewrite the positive and negative samples from the previous step while keeping their positive or negative status unchanged. We further augment the data into question-answering format: GPT-4 converts each descriptive positive-negative pair into a question with a positive-negative answer pair, the question is then posed to the LVLM, and positive and negative responses are sampled conditioned on the previously given answers. The descriptive and question-answering positive-negative pairs together form all the preference-learning data used in HA-DPO training.

4. Hallucination Mitigation: The constructed positive-negative hallucination-aware data are used to fine-tune the model with DPO (Direct Preference Optimization); a sketch of the DPO objective also follows below.
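
For concreteness, here is a minimal sketch of the detection-and-correction call in step 2, written with the openai Python client. The prompt wording and the check_and_correct helper are illustrative assumptions, not the exact template used in the paper.

      from openai import OpenAI

      client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

      # Hypothetical prompt template; the paper's actual template is more detailed.
      PROMPT = (
          "You are given ground-truth annotations of an image and a description "
          "of the image generated by a vision-language model. First list every "
          "claim in the description that conflicts with the annotations, then "
          "rewrite the description with all hallucinated content corrected.\n\n"
          "Annotations:\n{annotations}\n\nDescription:\n{description}"
      )

      def check_and_correct(annotations: str, description: str) -> str:
          """Ask GPT-4 to flag hallucinations and return a corrected description."""
          response = client.chat.completions.create(
              model="gpt-4",
              messages=[{"role": "user",
                         "content": PROMPT.format(annotations=annotations,
                                                  description=description)}],
          )
          return response.choices[0].message.content

The original model output then serves as the negative (hallucinating) response, and the corrected output as the positive response for the same image.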
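
Step 4 optimizes the standard DPO objective over these pairs. The following is a minimal PyTorch sketch of the loss, assuming the summed log-probabilities of each response under the trainable policy and the frozen reference model have already been computed; the variable names are ours, not taken from the released code.

      import torch
      import torch.nn.functional as F

      def dpo_loss(policy_chosen_logps: torch.Tensor,
                   policy_rejected_logps: torch.Tensor,
                   ref_chosen_logps: torch.Tensor,
                   ref_rejected_logps: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
          """DPO loss: prefer the non-hallucinating (chosen) response.

          Each tensor holds the summed log-probability of a response given
          the image and prompt, one entry per preference pair in the batch.
          """
          # Implicit rewards: log-prob margins of the policy over the reference.
          chosen_rewards = policy_chosen_logps - ref_chosen_logps
          rejected_rewards = policy_rejected_logps - ref_rejected_logps
          # Push the chosen reward above the rejected reward, scaled by beta.
          logits = beta * (chosen_rewards - rejected_rewards)
          return -F.logsigmoid(logits).mean()

Here beta controls how far the policy may drift from the reference model while fitting the preferences; the reference model is a frozen copy of the LVLM before HA-DPO fine-tuning.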

SHR: Sentence-level Hallucination Ratio


In response to the limitations of POPE, we propose a new GPT-4-assisted evaluation metric, termed "Sentence-level Hallucination Ratio" (SHR), which quantifies hallucination at the sentence level in multimodal models. For models that generate detailed descriptions of a given image, a response typically contains multiple sentences; SHR measures the proportion of hallucinated sentences to the total number of sentences in the response.
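
Reading directly off this definition, SHR can be computed as follows; this is a minimal sketch assuming GPT-4's per-sentence hallucination judgments have already been collected (the function name and data layout are ours):

      def shr(judgments: list[list[bool]]) -> float:
          """Sentence-level Hallucination Ratio over a set of responses.

          judgments[i][j] is True if GPT-4 marked sentence j of response i
          as hallucinated when checked against the image annotations.
          """
          hallucinated = sum(sum(sentences) for sentences in judgments)
          total = sum(len(sentences) for sentences in judgments)
          return hallucinated / total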

To reduce evaluation inaccuracy in SHR, we manually add additional factual information for each image, which helps ensure high evaluation accuracy. Overall, with the assistance of this human-annotated factual information, the accuracy of GPT-4's judgments reaches about 95%.

Compared with existing hallucination evaluation benchmarks such as POPE, our proposed SHR offers the following advantages: (1) Openness of object categories: the VG images encompass thousands of object types in total, and any object of any category within these images is part of the evaluation. (2) Coverage of various hallucination types: the SHR evaluation places no restrictions on the types of hallucinations, and any description that conflicts with the image content is considered a hallucination.


BibTeX

@misc{zhao2023hallucinations,
      title={Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization},
      author={Zhiyuan Zhao and Bin Wang and Linke Ouyang and Xiaoyi Dong and Jiaqi Wang and Conghui He},
      year={2023},
      eprint={2311.16839},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
      

Terms of Use

By using this service, users agree to the following terms: the service is a research preview intended exclusively for non-commercial use. It offers only limited safety measures and may produce offensive content. Using the service for any illegal, harmful, violent, racist, or sexually explicit purposes is strictly prohibited.

License

This service, being a research preview, is intended solely for non-commercial use and is governed by the LLaMA model license. If you encounter any potential violation, please contact us immediately.