During the spring semester of 2022, we wrote our master’s thesis on automatically extracting metadata from job advertisements, which are unstructured text. Jonas Södergren supervised us and guided us through the process. We investigated methods and techniques for extracting four types of metadata: Education, Experience, Driving License, and Full-time/Part-time. We treated the task as a classification problem and implemented and evaluated three models: a baseline, a Support Vector Machine, and BERT. To train the models, we created a labeled dataset from open data published by the Swedish Public Employment Service. The full thesis can be found here: https://gupea.ub.gu.se/bitstream/handle/2077/72171/CSE%2022-07%20Strauss%20Safdar.pdf?sequence=1.
The code is available on GitHub: evelinastrauss/jobtech_thesis.
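To give a feel for what treating one of these metadata types as a classification problem means, here is a minimal sketch of a keyword-based classifier for the Driving License label. It is not taken from the thesis or the repository: the function name, the keyword list, and the choice of keyword matching are all assumptions made for illustration, and the baseline evaluated in the thesis may work differently.

```python
import re

# Hypothetical keyword patterns, chosen for illustration only.
# They are NOT taken from the thesis or its repository.
LICENSE_PATTERNS = [
    r"körkort",                   # Swedish for "driving licence", also matches compounds
    r"\bdriving licen[cs]e\b",    # British and American spellings
    r"\bdriver'?s licen[cs]e\b",
]
_LICENSE_RE = re.compile("|".join(LICENSE_PATTERNS), re.IGNORECASE)


def mentions_driving_license(ad_text: str) -> bool:
    """Return True if the ad text contains any of the licence keywords."""
    return _LICENSE_RE.search(ad_text) is not None


if __name__ == "__main__":
    ads = [
        "Vi kräver B-körkort och erfarenhet.",
        "Vi söker en utvecklare med Python-erfarenhet.",
    ]
    for ad in ads:
        print(mentions_driving_license(ad), "-", ad)
```

A rule like this only detects that a licence is mentioned. It cannot tell "körkort krävs" (required) from "körkort krävs ej" (not required), which is one reason to compare such a rule against learned models.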