Yunshi Zhao is a Machine Learning Engineer at Liftoff, a mobile app optimization platform for large-scale app marketing and monetization. Her work ranges from researching and training models to deploying and monitoring them in production. She is also a member of Liftoff’s Diversity, Equity, and Inclusion (DEI) committee, which focuses on representation in engineering. Before moving into startup life, she worked as a data scientist and aerospace engineer. Here she talks about machine learning development, best practices, use cases, and ML in production.
What is Liftoff’s mission as a company and what made you decide to join the team?
Liftoff Mobile is a technology and programmatic advertising company. The organization has many products in different parts of the ad technology ecosystem, especially after our merger with Vungle last year. But the main mission is to help mobile apps grow and make money.
I really like the vertically integrated setup, where you can work across the entire model life cycle. Most companies will hire you for data science and model development, but then hand the model over to another engineer to deploy. At Liftoff, the ML group does everything, and that really appealed to me.
How did you train for this role and what are your tips for anyone interested in making the transition to AI?
Luckily, my previous job in aerospace engineering used a lot of the same math, so I’d say anyone with a strong math background would have an easier time progressing to becoming a machine learning engineer (MLE). For the programming part, there are so many online resources to help you improve your software skills, and there is also such a large community of people you can ask for help. If you don’t have a math background, you can always start with something that isn’t as demanding on that front. Data science and data analytics are good entry points, and then you can slowly work your way up to MLE. I look at this progression as a video game where you advance through all the different levels.
Which industry do you focus on at Liftoff and what does your day-to-day look like?
I work on the Demand Side Platform (DSP), a system that helps advertisers buy the right ad at the right price. The main task of our team is to build conversion models that predict the probability of conversion for down-funnel events. My day-to-day work depends heavily on the project I’m working on, but mostly it’s about initiating pilot projects. Ahead of a model launch, I also work in our code base: updating how we train the model and making changes in the bid path, the part of the code where the model is used to bid on an ad. Liftoff has a strong documentation culture, so I also write a lot, whether for ideas I want to propose or thought experiments I want to share. I also meet with other teams to better understand the business metrics and how our model should behave in that business context.
Scalability is an important part of the infrastructure, especially in your advertising technology use case. What should be considered when it comes to data scalability?
Our Kafka pipeline processes two gigabytes of data per second, so that’s a lot of data. Much of our system is built around knowing exactly what data we need to process, and feature analysis is challenging, mainly because much of the system was built in-house for a narrow use case. It works really well for the original case we built it for, and everything was made really fast, from data processing to training, but feature analysis is harder. Because we have such a large data set, it’s not easy to do feature analysis out of the box like you would in other settings. It’s definitely something we always talk about when deciding on a system at our company.
Speaking of systems and integrations at Liftoff, every company goes through build versus buy assessments to see what it can do in-house and what’s worth outsourcing. What are some of your current or future ML systems and infrastructure tools?
Most of the products we currently use were built in-house because the company wanted to move fast. Many of our systems are really lightweight and built for a specific use case. For example, we have an experiment tracking tool that you can use to go in and look at some of the metrics of each experiment. It’s really simple and can’t do many of the fancy things that experiment tracking tools currently on the market can, but it gets the job done.
Right now we’re pushing to move to more standardized tools, since expanding beyond the original use cases can be a bit of a pain point. Earlier, our ML was focused mostly on the conversion models, but now we have so many other ML applications, for example budget pacing and market pricing. Every time we try to build a new model, it’s a bit difficult because the existing tools were built for narrow use cases, and it’s also really hard to get people on board with an in-house product outside those cases. For this reason, we are also investigating other tools that may be more flexible and applicable to the other ML applications in our company.
Do you have favorite tools in your tech stack or things that make life a little easier for you as a machine learning engineer?
I really like Trino. It’s easy to use, and I can examine data quickly. Our data set is large, so impression-level data analytics is very slow, but our product analytics team has created daily and hourly analytics tables that aggregate the raw data into the specific dimensions we care about. It’s nothing fancy, but I really like it because it makes it easy to look at data without waiting forever for a query to finish.
What are some best practices in ML model lifecycle related to model training, development and experimentation?
For training, I think it’s important to have a good protocol. Whenever we run an experiment at Liftoff, we write a report with the whole protocol so everyone knows exactly what we are doing, and the system we’ve built also ensures reproducibility. The experiment tracking and sharing mentioned earlier is an important tool as well.
Regarding models in production, I would say it depends on the type of application. For us, model freshness is important, so we have to make sure we build a system that lets us continuously train and deploy new models. But when we automate that, we also want some safety, so we built a system with automated safety checks to make sure we don’t ship bad models.
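As an illustration of what such a gate might look like, here is a minimal sketch of an automated safety check in a continuous-training pipeline: a freshly retrained model is promoted only if it does not regress against the currently deployed one. All names, metrics, and thresholds here are hypothetical, not Liftoff's actual system.

```python
# Hypothetical sketch of a pre-deployment safety gate for continuous
# training. Thresholds and metric names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ModelReport:
    """Offline evaluation metrics computed on a held-out validation set."""
    log_loss: float         # lower is better
    mean_prediction: float  # should roughly match the observed conversion rate

def passes_safety_checks(candidate: ModelReport,
                         baseline: ModelReport,
                         observed_rate: float,
                         max_loss_regression: float = 0.02,
                         max_calibration_drift: float = 0.5) -> bool:
    """Return True only if the candidate model is safe to deploy."""
    # Check 1: log loss must not regress by more than the allowed margin.
    if candidate.log_loss > baseline.log_loss + max_loss_regression:
        return False
    # Check 2: predictions must stay roughly calibrated to reality.
    drift = abs(candidate.mean_prediction - observed_rate) / observed_rate
    return drift <= max_calibration_drift

baseline = ModelReport(log_loss=0.120, mean_prediction=0.031)
good = ModelReport(log_loss=0.118, mean_prediction=0.029)
bad = ModelReport(log_loss=0.190, mean_prediction=0.030)

print(passes_safety_checks(good, baseline, observed_rate=0.030))  # True
print(passes_safety_checks(bad, baseline, observed_rate=0.030))   # False
```

In a real pipeline a check like this would sit between retraining and deployment, with the baseline metrics read from whatever model is currently serving traffic.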
Another best practice for model experimentation, when deciding whether to launch a model, is to not look only at the aggregate, because sometimes when we say a model is better at the aggregate level, there is actually much more to consider. For example, since our model is used by so many campaigns, it’s always good to look at the distribution of impact across all campaigns.
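To make the aggregate-versus-distribution point concrete, here is a small sketch with fabricated numbers: one big winning campaign can make the mean look like a win while most campaigns actually got worse. The data and the "lift" metric are invented for illustration only.

```python
# Fabricated example: aggregate lift vs. the per-campaign distribution.
# A lift > 1.0 means the new model did better for that campaign.

from statistics import mean, median

per_campaign_lift = {
    "campaign_a": 3.10,  # one big winner dominates the aggregate
    "campaign_b": 0.95,
    "campaign_c": 0.92,
    "campaign_d": 0.97,
    "campaign_e": 0.90,
}

lifts = list(per_campaign_lift.values())
share_improved = sum(l > 1.0 for l in lifts) / len(lifts)

print(f"aggregate (mean) lift:       {mean(lifts):.2f}")   # looks like a win
print(f"median lift:                 {median(lifts):.2f}") # most campaigns lost
print(f"share of campaigns improved: {share_improved:.0%}")
```

Here the mean lift is above 1.0, but the median is below it and only one campaign in five improved, which is exactly the kind of pattern an aggregate-only comparison hides.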
What is important to think about once models are live in production and what impact will they have on people in the real world?
For models in production, we have dashboards we use to track metrics and ensure the model stays healthy. Because Liftoff is a fairly large company, there are teams that help us monitor campaign health. They are closer to the front lines and can help us understand whether the model is working well. We also take precautions during the test phase: whenever we develop a model, we run an A/B test. And when we launch, we have a rigorous launch plan involving the MLEs, the teams managing the campaigns, the technical product manager, and customer-facing teams. We plan and test carefully so that hopefully there are no big surprises once we’re in production.
In the ad tech space, you get feedback on your models pretty quickly. For most of your use cases, do you get the ground truth back quickly, or do many of the models have delayed ground truth, so that you have to look for drift in production?
Some events come back pretty quickly. Installs, for example, are usually fast, but purchases are usually slower, so we have some attribution lag, and we have techniques to correct for it in our model training. We get ground truth pretty quickly, but I like to put quotes around “ground truth” because most machine learning models have a problem with feedback loops, and I think in our case it’s probably worse because the way our model behaves actually affects what traffic we buy. So there is always some bias in the sample we see. So yes, we have ground truth, but we don’t always know whether that’s the ground truth for the entire population or just for the sample we’re getting.
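The interview doesn't say which correction Liftoff uses, but one common generic technique for attribution lag is to discount recent negatives: an impression that hasn't converted yet may simply be too young for the conversion to have been reported. The sketch below weights each negative by the probability that the conversion would already have arrived, under an assumed exponential delay model; the delay distribution and all numbers are illustrative assumptions.

```python
# Generic delayed-feedback correction sketch (not Liftoff's actual method):
# down-weight young negatives, since their conversions may not have been
# attributed yet. An exponential delay distribution is assumed.

import math

MEAN_DELAY_HOURS = 24.0  # assumed average conversion-reporting delay

def p_conversion_observed(age_hours: float) -> float:
    """P(conversion already reported | it will eventually happen),
    under the exponential delay model."""
    return 1.0 - math.exp(-age_hours / MEAN_DELAY_HOURS)

def corrected_label_weight(label: int, age_hours: float) -> float:
    """Training weight for an example of the given age.
    Positives are certain; negatives are discounted when the example is
    too young for a conversion to have been reported yet."""
    if label == 1:
        return 1.0
    return p_conversion_observed(age_hours)

# A one-hour-old negative carries little weight; a week-old one nearly full.
print(round(corrected_label_weight(0, 1.0), 3))
print(round(corrected_label_weight(0, 168.0), 3))
```

More elaborate approaches model the delay distribution jointly with the conversion model, but the idea is the same: treat "no conversion yet" as uncertain rather than as a firm negative.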
Can you share your thoughts on diversity in engineering and what signals whether a company is doing a good job with it or not?
I would say it’s quite difficult to build a diverse engineering group because, unfortunately, colleges aren’t really that diverse. Liftoff’s engineering team is open about this and actively trying to improve. What’s crucial is that someone takes an active role in helping the company identify things it can change. Speaking up is important, and you know you have a good team when they listen to your feedback, whether it’s negative or positive, and then take concrete action. It’s exciting to be part of the solution.