In this fourth week, our main focus was researching and documenting the state of the art. We reviewed existing work and solutions in our field: insurance fraud detection using Computer Vision and Natural Language Processing (NLP). This theoretical background will guide our upcoming architectural and technical decisions.
In the Computer Vision component, we began our first experiments. After a research phase in which we selected a dataset well suited to our use case, we cleaned and preprocessed it. With the structured data ready, we began testing pre-trained models to evaluate their ability to identify damage and anomalies in images.
In parallel, in the NLP track, we found that the project requires highly specific data. We therefore started building a custom dataset from scratch, designed to meet our system's requirements for analyzing textual descriptions of accidents.