Recent conversation in NLP has been dominated by model size, with GPT-3 reaching 175B parameters. While these models deliver impressive performance, a number of factors beyond model size contribute to it. Training a 175B-parameter model may not be feasible for everyone, so this talk discusses key factors that can significantly improve the performance of your NLP models, as they have for BERT and GPT.


  • Representation of text is critical to the performance of NLP models
  • Identify key pre-processing steps that have a significant impact on model performance
  • Learn the many tricks used to extract maximum performance from models, which you can apply to your own models



Ashish Bansal – Director, Recommendations | Twitch

Ashish is the Director of Recommendations at Twitch, where he works on building scalable recommendation systems across a variety of product surfaces, connecting content to people. He has worked on recommendation systems at multiple organizations, most notably at Twitter, where he led Trends and Events recommendations, and at Capital One, where he worked on B2B and B2C products. Ashish was also a co-founder of GALE Partners. Over many years of building hybrid recommendation systems that balance collaborative filtering signals with content-based signals, he has spent a lot of time building NLP systems for extracting content signals. In digital marketing, he built systems to analyze coupons, offers, and subject lines. He has applied cutting-edge NLP techniques to messages, tweets, and news articles, among other types of textual data. He is the author of Advanced Natural Language Processing with TensorFlow 2.

May 26 @ 15:20
15:20 — 15:50 (30′)

Day 2 | 19th of May – Machine Learning
