Sam Johnson

AI Safety Researcher at Indiana University and Engineering Intern at Groundwork, Inc.

About Me

I am an undergraduate researcher at Indiana University in the Luddy School of Informatics, and I will be graduating with a bachelor's degree in Data Science in the spring of 2024, followed by a master's degree in Data Science in the spring of 2025. I have worked on several research projects as part of a university research team headed by Dr. M.M. Dalkilic and Dr. Hasan Kurban, which focused primarily on Data-Centric AI and the tools that support the development of ML systems. However, I am now primarily interested in AI safety research, and the current topics of my research are alignment in RL agents and failure modes of LLMs. This work includes analyzing how reward representation in RL training influences the propensity for the incoherence failure mode to arise in an LLM trained to play chess. In addition to my research, I am an engineering intern at Groundwork, Inc., where I lead several data science projects and implement new software features. My interests include classic literature, philosophy, backpacking, and nutrition, and I always appreciate book and article recommendations.

Conference Papers

Are They What They Claim: A Comprehensive Study of Ordinary Linear Regression Among the Top Machine Learning Libraries in Python

Accepted Paper at 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

We authored a comprehensive survey of current implementations of the Ordinary Least Squares method in popular Python libraries (TensorFlow, PyTorch, scikit-learn, MXNet) to give users actionable information about the state of ML. Within this work, we conducted original experiments to analyze the runtime across platforms, space requirements, performance over big data, and strength of model implementation of these popular libraries.

AReS: An AutoML Regression Service for Data Analytics and Novel Data-centric Visualizations

Accepted Paper at 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Our service is intended to enable researchers to make use of ML that can augment data analysis and exploration in their respective fields. AReS allows users to upload data and automatically build dozens of different ML models, each with its own strengths. These models are compared to determine which is most effective, and several novel data-centric visualizations are presented in an informative report. This is intended to help users better understand their data and the effectiveness of the models over their data. AReS can be found here.

Projects

Incoherence in Predictive Chess Model

UC Berkeley Supervised Program for Alignment Research

Our team is modifying the Decision Transformer architecture to suit the domain of chess, where we will evaluate how different reward representations and RL training schemes give rise to incoherence in the model. Potential extensions to our work include analyzing the chess strategy learned by the model through activation steering.

Resolution Theorem Prover

Independent Research

I implemented a resolution theorem prover from scratch in Python. The script converts any well-formed formula in propositional logic into canonical form and then performs resolution to determine whether the premises are consistent. Additionally, I constructed a logic interpreter that yields the truth value of any well-formed formula in propositional logic, given the truth values of its variables.
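The resolution step described above can be illustrated with a minimal sketch. This is not the project's actual code: it assumes the premises are already in clause form, and it uses a hypothetical encoding in which each literal is a nonzero integer (a negative number is the negation of that variable). The function names `resolve` and `is_consistent` are likewise chosen here for illustration.

```python
from itertools import combinations


def resolve(c1, c2):
    """Return the non-tautological resolvents of two clauses.

    Assumed encoding: a clause is a frozenset of nonzero ints,
    where -n denotes the negation of variable n.
    """
    resolvents = []
    for lit in c1:
        if -lit in c2:
            candidate = (c1 - {lit}) | (c2 - {-lit})
            # A clause containing both x and -x is always true, so skip it.
            if not any(-other in candidate for other in candidate):
                resolvents.append(frozenset(candidate))
    return resolvents


def is_consistent(clauses):
    """Saturate the clause set under resolution.

    Returns False if the empty clause is derived (the premises are
    inconsistent) and True once no new clauses can be produced.
    """
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return False
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:
                    return False  # empty clause: contradiction found
                new.add(resolvent)
        if new <= known:
            return True  # fixed point reached without a contradiction
        known |= new
```

For example, with variables p = 1 and q = 2, `is_consistent([[1, 2], [-1, 2], [-2]])` encodes the premises (p or q), (not p or q), and (not q), from which the empty clause can be derived.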

SitterShare

Startup Business in Shoemaker Innovation Center

As a client in the Shoemaker Innovation Center, I conceived a plan for a new business venture: a service to match up families with babysitting needs by organizing local babysitting co-ops. I researched relevant markets and competitors, and collected consumer insights through extensive interviews and a tailored survey.

Work Experience

Software Engineering Intern May 2023 - Current

Groundwork - Indianapolis, IN

In this position, I develop, test, and maintain new features of a web application using the Ruby on Rails framework. I established several data analysis pipelines, including an exploration of the company database of client-customer interactions, aimed at increasing client lead conversion by 20%, and a projection for a new pricing structure that is expected to increase company ARR by upwards of 6%. Groundwork is a CRM and lead qualification startup with a client base of 150+ contractors in the home-improvement industry.

Undergraduate Instructor August 2022 - May 2023

Luddy School of Informatics, Indiana University - Bloomington, IN

I was an undergraduate instructor for the Introduction to Computers and Programming course at IU. I gave lessons on various fundamental computing concepts such as choice, loops, comprehensions, search, sorting, recursive algorithms, databases, and object-oriented programming. During office hours, I guided students through complex exercises in homework assignments, including writing Python scripts to perform DNA translation, Simpson's Rule for integration, and gradient descent.

Administrative Clerk May 2021 - August 2021

Johnson County Land Title - Franklin, IN

I composed title search reports and insurance policies for real estate transactions efficiently, eliminating a 60-day policy backlog that had been hurting the company's reputation for timeliness. I independently researched, proposed, and implemented a new procedure for data entry in the creation of title search reports that automated a tedious manual step and reduced process time by approximately 30 percent.

Hobbies & Interests:

Backpacking, Cooking, Classic Literature, Health & Nutrition, Weightlifting

What I'm Reading:

Infinite Jest by David Foster Wallace

In Defense of Sanity: The Best Essays of G.K. Chesterton

Doctor Zhivago by Boris Pasternak

Gödel, Escher, Bach by Douglas R. Hofstadter

The Odyssey by Homer
