Reinforcement Learning, Leaderboards, and Other ML Activities

Author: Larry Qiu
Posted: December 20, 2025
Some ML-adjacent projects I worked on over the past semester.

Real-World Reinforcement Learning

Report: report.pdf
One of the more promising offshoots of the current AI boom is the emergence of generalist robots and the policies controlling them. Many of them have humanoid forms, like Figure’s 03 and Tesla’s Optimus, but the main criterion is the ability to perform a diverse range of tasks, usually prompted by language. Current generalist policies, however, struggle to match the speed and reliability of humans teleoperating the same hardware. For my final project in CS138: Reinforcement Learning, I explored how reinforcement learning can be applied in the real world to directly address these limitations.
For more details, see the report (attached above). In short, I fine-tuned Physical Intelligence’s model on a simple block-picking task, then ran a GRPO- and REINFORCE-inspired policy gradient procedure in real time, which decreased the median task duration from 15 seconds to 10 seconds. There were plenty of hurdles along the way, including heavy infrastructure modification (this project would not have been possible without Tufts’ cluster), SFT problems, and hours of watching the robot try, fail, and sometimes even succeed.
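To make the idea concrete, here is a minimal sketch of a REINFORCE update with a group-relative (GRPO-style) baseline. This is not the project’s actual code: `policy.log_prob` and the rollout format are hypothetical stand-ins, and the real procedure involved far more plumbing.

```python
# A sketch only: `policy.log_prob(obs, actions)` is a hypothetical API
# returning per-step log-probabilities of the actions the policy took.
import torch

def grpo_style_loss(policy, rollouts):
    """REINFORCE loss with a group-relative (GRPO-style) baseline.

    rollouts: list of (obs, actions, reward) tuples, each one attempt
    at the same task.
    """
    rewards = torch.tensor([r for _, _, r in rollouts], dtype=torch.float32)
    # Group-relative advantage: attempts that beat the group mean get
    # their actions reinforced; the rest get suppressed.
    adv = rewards - rewards.mean()
    if rewards.std() > 1e-6:
        adv = adv / rewards.std()

    loss = torch.tensor(0.0)
    for (obs, actions, _), a in zip(rollouts, adv):
        loss = loss - a * policy.log_prob(obs, actions).sum()
    return loss / len(rollouts)
```

The appeal of a group-relative baseline on a real robot is that a handful of attempts at the same task is enough to judge whether any one attempt was better than average, with no separate value network to train.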
Here are the videos mentioned in the report:
Curled Up Failure Mode
Final Model Success
Training Timelapse
Snake Failure Mode
Final Model Success Multiple Attempts

Intro to Machine Learning Leaderboards

This semester I also took CS135: Intro to Machine Learning, and one fun part of the class is the project leaderboards, where students compete to train the best model on a given task for a small portion of the course grade. There are two projects: one on predicting the reading difficulty of a book excerpt, and another on predicting movie ratings.
For no good reason, I became obsessed with creating the best model in the class, and I ended up spending an ungodly amount of time training models and optimizing hyperparameters to secure that number one spot.
I ended up #1 on both leaderboards for the first project and #6 on the only leaderboard for the second (still not bad in a class of 80 students). I won’t share exactly how I achieved those results, for the sake of future students, but a general tip: make sure you perform the train-validation split properly, and once that’s done, try as many methods and hyperparameter combinations as you can. For the first project, I also had a ✨special insight✨ that I believe no one else figured out, which greatly improved my model’s performance. (Hint: read the paper that describes how the dataset was generated.)
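On the split point: the classic mistake is fitting any preprocessing on data that later lands in the validation set. A minimal sketch of a leakage-free setup, using synthetic placeholder data rather than the course datasets:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the course datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = rng.normal(size=500)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit preprocessing on the training split only, then apply it to the
# validation split, so nothing about the validation data leaks into training.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)
```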
 
Anyway, was this a massive waste of time? Possibly. Did I beat Jesus on one of the leaderboards? Absolutely.

EE110 Project Development

Before the start of this semester, I also worked with Professor Mai Vu on developing class projects for her new course, Optimization in Deep Learning. The goal was to demonstrate some key machine learning results and package them so that a student can work through everything and write a report within two weeks.
I ran hundreds of training runs, exploring parameters such as the following (a toy sketch of one sweep appears after the list):
  • Optimizer (SGD, Adam, RMSProp, Adagrad, and AdamW)
  • Initializer (Xavier or Kaiming, in uniform or normal variants)
  • Learning rate scheduler (constant, linear, exponential, cyclic, step, cosine annealing with warm restarts)
  • Normalization (layernorm, batchnorm)
  • Regularization (dropout, weight decay, and L2 regularization)
These experiments were done on several model types, including small MLPs and CNNs, VGG11, ResNet34, and transformers.
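Here is a toy sketch of what one sweep looked like, assuming a small PyTorch MLP on random data; the real grids, models, and datasets were of course different:

```python
import itertools
import torch
from torch import nn

def make_model(norm):
    # Tiny MLP with a swappable normalization layer.
    layers = [nn.Linear(20, 64)]
    if norm == "batchnorm":
        layers.append(nn.BatchNorm1d(64))
    elif norm == "layernorm":
        layers.append(nn.LayerNorm(64))
    layers += [nn.ReLU(), nn.Linear(64, 1)]
    return nn.Sequential(*layers)

optimizers = {"sgd": torch.optim.SGD, "adam": torch.optim.Adam,
              "rmsprop": torch.optim.RMSprop, "adamw": torch.optim.AdamW}

# Toy regression data standing in for the real datasets.
X = torch.randn(256, 20)
y = torch.randn(256, 1)

for opt_name, norm, lr in itertools.product(
        optimizers, ["batchnorm", "layernorm"], [1e-2, 1e-3]):
    torch.manual_seed(0)  # identical initialization for every sweep cell
    model = make_model(norm)
    opt = optimizers[opt_name](model.parameters(), lr=lr)
    # "CAWR" = cosine annealing with warm restarts.
    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=20)
    for _ in range(100):
        loss = nn.functional.mse_loss(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
    print(f"{opt_name:8s} {norm:10s} lr={lr:g} loss={loss.item():.4f}")
```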
Some graphs to enjoy:
[assorted plots from the training runs]