University of Leeds
About the Project
Vision-language models (VLMs) can process both visual information and natural language, learning associations between images and their corresponding text descriptions. With their ability to extract semantics and insights from multi-modal data, they have shown impressive capabilities in tasks such as image captioning, visual question answering, and text-to-image search. However, such models have seen limited adoption in real-world healthcare applications.

Visual Question Answering (VQA) is the task of understanding and answering questions about images. It combines computer vision, which interprets the content of images, with natural language processing, which deals with understanding and generating human language. Answering such questions requires an understanding of the image, the language of the question, and domain-specific knowledge.
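As a rough illustration of the VQA task (a minimal sketch, not a description of this project's methodology), a general-domain vision-language model can be queried through the Hugging Face transformers visual-question-answering pipeline; the model name, image path, and question below are placeholders, and a clinical application would require medical-domain models and careful validation.

from transformers import pipeline

# General-domain VQA model (placeholder choice; medical use would need a
# domain-adapted model and thorough clinical validation).
vqa = pipeline("visual-question-answering", model="Salesforce/blip-vqa-base")

# Ask a natural-language question about an image (placeholder path and question).
result = vqa(image="example_scan.png", question="What organ is shown in this image?")
print(result)  # e.g. [{"answer": "...", "score": ...}]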
Medical visual question answering models can assist clinicians in clinical decision-making and improve the efficiency of the clinical workflow. They can also be used to build text-to-image search engines that let users query medical images and their visual contents through natural language, making it possible to find images that meet specific criteria for research, discovery, or educational purposes.
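As an illustrative sketch of how such a text-to-image search could work, assuming a CLIP-style dual encoder (the model name, image paths, and query below are placeholders rather than part of this project):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# General-purpose image-text encoder (placeholder; a medical deployment
# would likely use a domain-adapted model).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical image collection and natural-language query.
images = [Image.open(p) for p in ["scan_001.png", "scan_002.png", "scan_003.png"]]
query = "chest X-ray showing an enlarged heart"

inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text holds the similarity of the query to each image;
# ranking images by this score retrieves the best matches.
scores = outputs.logits_per_text[0]
best_match = scores.argmax().item()
print(f"Best matching image index: {best_match}")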