Session Outline

Our guards write reports on everything that happens — from criminal events like burglary with damages or threats to people, to warm freezers, water leaks, and issuing parking tickets — generating tens of millions of reports globally every year. The guards categorize each event using preset categories and also write a free-text comment, in their local language, to provide detail about the event. Free text is information dense but challenging to extract information from automatically, especially for multilingual data. To enrich our guard reports and maximize client value, we use a Transformer model (XLM-R), pre-trained on 100+ languages, and fine-tune it to better understand guard terminology and expression. By adding another neural network on top, we can train the model to classify the reports into a number of interesting categories. Despite fine-tuning the model in only a few languages, it classifies well in many other languages, meaning that the multilingual properties from pre-training carry over to our real-world task. To maximize the model's impact with minimal effort, we developed a methodology for creating labels collaboratively and training the model, which we will unpack. We will also share our learnings, which we will apply in other projects utilizing pre-trained language models. We have productionized these models in multiple countries, performing several tasks.
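The architecture described above — a pre-trained multilingual encoder with an added classification network — can be sketched roughly as follows. This is a minimal illustration, not the talk's actual implementation: a tiny randomly initialised encoder stands in for the real XLM-R weights so the example stays self-contained, and the pooling and head design are assumptions.

```python
import torch
import torch.nn as nn

class ReportClassifier(nn.Module):
    """A pre-trained encoder (XLM-R in the talk; a stand-in here)
    with a small classification network added on top."""
    def __init__(self, encoder, hidden_size, num_categories):
        super().__init__()
        self.encoder = encoder  # frozen or fine-tuned multilingual encoder
        self.head = nn.Sequential(
            nn.Dropout(0.1),
            nn.Linear(hidden_size, num_categories),
        )

    def forward(self, token_ids):
        hidden = self.encoder(token_ids)   # (batch, seq_len, hidden_size)
        pooled = hidden[:, 0, :]           # first-token (CLS-style) pooling
        return self.head(pooled)           # (batch, num_categories) logits

# Stand-in encoder: an embedding layer plus one Transformer encoder layer.
HIDDEN, VOCAB, NUM_CATS = 32, 1000, 5
encoder = nn.Sequential(
    nn.Embedding(VOCAB, HIDDEN),
    nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True),
)
model = ReportClassifier(encoder, HIDDEN, NUM_CATS)

# Two tokenized reports of 16 tokens each -> one logit per category.
logits = model(torch.randint(0, VOCAB, (2, 16)))
print(logits.shape)  # torch.Size([2, 5])
```

Because the encoder's token embeddings are shared across the 100+ pre-training languages, the same classification head can score reports in languages that never appeared in the fine-tuning data.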

Key Takeaways

  • We employ a pre-trained Transformer language model to solve a multitude of real problems; it greatly outperforms classical methodologies in both performance and value scalability, and outperforms LLMs in speed and cost
  • A Transformer language model can be fine-tuned to understand domain-specific data, and its multilingual capabilities from pre-training carry through to the task despite single-language fine-tuning
  • We need a relatively small number of labels to reach high classification performance, thanks to an efficient, collaborative way of generating labels
  • Iterating between label generation and model analysis ensures the most time-efficient model development, with a minimum number of labels and maximum generalization performance
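The iterate-between-labeling-and-analysis loop in the takeaways above can be sketched as an uncertainty-sampling loop — an assumption, since the talk does not name its exact selection strategy. A scikit-learn classifier on toy embeddings stands in for the fine-tuned Transformer to keep the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                # toy report embeddings
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # hidden "true" category

# Seed with a handful of labels covering both categories.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(500) if i not in labeled]

for _ in range(5):
    # 1. Train on what has been labeled so far.
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    # 2. Analyse: find the reports the model is least sure about.
    proba = clf.predict_proba(X[pool])
    uncertain = np.argsort(np.abs(proba[:, 1] - 0.5))[:20]
    # 3. Label exactly those (here the oracle is y; in practice, the experts).
    newly_labeled = [pool[i] for i in uncertain]
    labeled += newly_labeled
    pool = [i for i in pool if i not in newly_labeled]

accuracy = clf.score(X, y)
print(f"{len(labeled)} labels, accuracy {accuracy:.2f}")
```

Spending each labeling round on the most ambiguous reports is one way to realize the "minimum labels, maximum generalization" trade-off the takeaway describes.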

————————————————————————————————————————————————————

Speaker Bio

Jonas Alström Mortin, Expert Data Scientist | Securitas Digital

Jonas works as an Expert Data Scientist in NLP at Securitas. He has almost 15 years of experience in data science and business intelligence, across academic research (physics, atmospheric sciences), consulting, and product companies in the security industry. Having previously led or contributed as a data scientist in projects like "Crime and Risk prediction" and "Real alarm classification", Jonas has in the last couple of years specialized in working with text data, using methods that range from the very simple to latest-generation LLMs. Jonas holds a PhD in Atmospheric Sciences.

October 25 @ 17:10
17:10 — 17:40 (30′)

Day 1 | 25 Oct 2023 | MACHINE LEARNING + MLOPS
