Text classification is a ubiquitous capability with a wealth of use cases. While dozens of techniques now exist for this fundamental task, many of them require massive amounts of labeled data to be useful. Collecting annotations for your use case, however, is typically one of the most costly parts of any machine learning application. In this talk, I’ll explain how text representations (embeddings) can be leveraged as classifiers, trained with only a small amount of labeled data, or even with no labeled data at all. I’ll also give a demo of this method in action.


  • Learn about various limited-labeled-data paradigms and strategies
  • Understand how popular text embedding models (SentenceBERT, Word2Vec) can be used as classifiers
  • See a prototype demonstrated in a simple Streamlit application
  • Gain insight into the strengths and limitations of text embeddings as classifiers
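
To make the idea of using embeddings as classifiers with no labeled data concrete, here is a minimal sketch. It is not the speaker's implementation. It assumes one common approach: embed the text and each candidate label name in the same vector space, then pick the label whose embedding has the highest cosine similarity to the text. The word vectors in TOY_VECTORS are invented for illustration, and the helper names (embed, cosine, classify) are hypothetical; a real setup would take its vectors from a trained model such as Word2Vec or SentenceBERT.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# In practice these would come from a trained embedding model.
TOY_VECTORS = {
    "sports": [0.9, 0.1, 0.0],
    "football": [0.8, 0.2, 0.1],
    "match": [0.7, 0.1, 0.2],
    "politics": [0.1, 0.9, 0.0],
    "election": [0.2, 0.8, 0.1],
    "vote": [0.1, 0.7, 0.2],
    "science": [0.0, 0.1, 0.9],
    "telescope": [0.1, 0.0, 0.8],
    "galaxy": [0.0, 0.2, 0.9],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text, labels):
    """Return the label closest to the text in embedding space, plus all scores."""
    text_vec = embed(text)
    scores = {label: cosine(text_vec, embed(label)) for label in labels}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    labels = ["sports", "politics", "science"]
    label, scores = classify("the telescope imaged a distant galaxy", labels)
    print(label, scores)
```

No training step appears anywhere in this sketch: the label names themselves act as the only supervision, which is why the quality of the underlying embeddings matters so much.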



Melanie Beck – Machine Learning Research Engineer | Cloudera

Melanie Beck is a Research Engineer at Cloudera Fast Forward, where she delights in translating machine learning breakthroughs into practical applications and is particularly interested in natural language processing capabilities. With experience in machine learning and data science at diverse organizations – from manufacturing to cybersecurity – she is a jack-of-all-trades problem solver as well as a reformed astrophysicist, holding a PhD in Astrophysics from the University of Minnesota.

May 26 @ 13:00
13:00 — 13:30 (30′)

Day 2 | 19th of May – Machine Learning
