Data Science Project Interest Fall 2025
We're very glad you're interested in joining a project with the Data Science Institute. Please fill out the following form to give us some additional information about your interests and background. All projects begin with a required training during the first week of classes. Projects marked as "Full Semester" continue through the semester. Other projects have variable start and completion dates, and you may be able to participate in one or more projects depending on scheduling. The deadline to apply is Monday, January 5 at noon. For more information about the projects, you can view the information session.
Email *
Participant Information
A little about you!
Last Name *
First Name *
Project Interest: Selections *
Check all boxes for the projects that you're interested in joining. If you select multiple projects, use the next question to fill in your highest ranked project choice. You can add additional information about your selection in the following box. The required and preferred skills are noted below.
Please note that not all projects may move forward, depending on student interest and DSI team availability.

Required:
* Experience with AI Models (Inference or Training - e.g., AI Summer, AI Winter, Transformers courses, DSI Bootcamp)
* A class in Python (e.g., DS 1100 for undergraduates)
Preferred qualifications:
* Prior experience with reading and using APIs
* Deep experience with programming (e.g., undergraduates should have taken two courses requiring programming)

____________________________________________________________________________________________________________

AI Driven Survey Analytics
PI - Dr. Joshua Clinton, Political Science

This project will use the Vanderbilt Poll, a long-running initiative led by Josh Clinton, as the foundation for building an AI-powered platform that transforms how survey data can be explored and understood. By leveraging large language models, the tool will ingest questionnaires, toplines, and micro-data, then generate visualizations, suggest related questions from past surveys, and surface subgroup breakdowns—while enforcing statistical guardrails to ensure rigor. Starting with a decade of Vanderbilt Poll results, the project will establish a framework for future polling analytics, ultimately enabling any survey to be uploaded and interactively analyzed. The goal is to empower non-coders, researchers, and the public alike to discover insights, compare trends, and promote transparency in survey research.

____________________________________________________________________________________________________________

Cascading Heat Assessments on Infrastructure Networks
PI - Dr. JB Ruhl, Law School

Extreme heat events pose grave challenges to cities and their citizens. Leveraging Vanderbilt's previous work on capturing and analyzing extreme climate policies and plans, the Cascading Heat Assessments on Infrastructure Networks project will use AI and digital twins to examine how resilient a city's infrastructure is to such events. These analyses will offer realistic guidance on steps to take to save lives in the future.
____________________________________________________________________________________________________________

Automatic Stuttering Detection
PI - Dr. Robin Jones, Medical School

Stuttering affects about 70 million people, approximately 1% of the world's population. About 5-10% of children stutter at some point in their lives, with a quarter of them maintaining their stutter throughout their lives. However, it has been shown that speech therapy in the early stages can have upwards of an 80% success rate in alleviating stuttering. In this project, the team will work with VUMC researchers to develop models that leverage audio, text, and other types of data to classify and better understand the nature of stuttering. This project aims to leverage automatic speech recognition models, audio models, and other multimodal models to enhance early diagnosis and inform treatment planning.

____________________________________________________________________________________________________________

Multimessenger Astronomy
PI - Chayan Chatterjee, Abbie Petulante

Multimessenger astronomy combines different signals from the Universe—such as gravitational waves, light, and neutrinos—to study extreme cosmic events. Gravitational waves, in particular, allow us to directly probe collisions of black holes and neutron stars, offering insights into the nature of gravity and matter under the most extreme conditions. However, gravitational wave signals are faint and buried in enormous volumes of detector noise, and connecting them with other messengers requires fast and reliable analysis. Artificial intelligence is becoming essential in this field: it can detect weak signals in real time, match them with observations from telescopes or particle detectors, and help reveal the full astrophysical picture. 

Project 1: This project will explore applying GW-Whisper, an adaptation of OpenAI’s Whisper model for gravitational wave data analysis. Students will help optimize and apply GW-Whisper to search for intermediate mass black hole mergers in gravitational wave data and use machine learning methods to estimate the physical properties of these cosmic events.

Project 2: In this project, students will build interactive AI-driven visualization tools to showcase how GW-Whisper and other machine learning models perform in detecting and analyzing gravitational wave and multimessenger signals.

____________________________________________________________________________________________________________

Small Language Model Reasoning Training
PI - Abbie Petulante

Small language models can be trained to use reasoning to extend their capabilities significantly. We will explore the extent to which we can train models to perform novel reasoning tasks in specialized fields that require challenging and unique reasoning. In this critical, foundational work, we will examine how far small language models can be trained to match, and perhaps exceed, much larger models on specific reasoning tasks.

____________________________________________________________________________________________________________

AI Governance Research Initiative (with Brookings/Lawfare Partnership)
PI - Mark Williams, Vanderbilt AI Law Lab, Brookings Institution (Lawfare)

This initiative advances two distinct but complementary projects: the next phase of the AI Legislation Tracker and the new AI Litigation Tracker, developed with Brookings’ Lawfare. While each tracker will stand independently, the litigation tracker will build on the methodologies, code base, and visualization frameworks established through the legislative tracker. Together, they will generate rigorous analysis and advanced visualizations of both legislative and judicial developments, creating mutually reinforcing tools that enhance understanding of the evolving AI legal and policy landscape.

Legislation Tracker - This project will focus on refining an existing code base, enhancing the current tool with visualizations that help track legislative progress over time. We will also build a deep research agent using LangChain, similar to the deep research functionality on ChatGPT, but focused on AI-related bills across the country.

Litigation Tracker - This project will develop a new tool, leaning heavily on the existing Legislation Tracker code base and working with various APIs and data sources provided by Lawfare.

____________________________________________________________________________________________________________

National Security Model Assessment Pipeline Project
PI - Institute for National Security, Brett Goldstein, National Security Assessments Project SENTINEL

The National Security Pipeline Project is an automated evaluation framework designed to rapidly assess the capabilities, differences, and national security risks of new AI models and disruptive technologies. By combining technical capability analysis, AI-augmented brainstorming, and scenario-based simulations, the system not only identifies potential malicious uses—from cyberattacks and autonomous weapons to disinformation campaigns—but also characterizes how each new model or technology differs from prior iterations. High-risk findings are automatically routed to domain experts for deeper review. The vision is to establish the Vanderbilt Model Threat Assessment Center, a trusted, independent hub that provides authoritative evaluations of new models and technologies. With rapid-response briefs, in-depth analyses, and outputs formatted for both human and machine use, the pipeline delivers continuous, actionable intelligence to help national security leaders stay ahead of evolving threats.


Project Interest: Details and Other *
Please select the project name that you're most interested in here, especially if you have indicated multiple projects.
Project Involvement *
Each project has multiple individual contributors who code, work with data, train models, etc., AND has one project manager who tracks the work, keeps the agile process running, and helps run the project. 
This semester, we are also offering the ability for a student to take on the role of Technical Lead (as opposed to a DSI staff member), who uses their expertise to decide specific project goals, tasks, and implementations.
Which role are you interested in for the projects indicated above?
On average, how many hours do you plan on dedicating to the project each week this semester outside of the meeting times of the project? *
Projects usually have 1.5 - 2 hours of meetings per week max, with a 45-60 minute demo session at the end of the week.
Programming background *
Please briefly describe your programming background and the language(s) you've used. If you have multiple years of experience, feel free to simply state "multiple years". This will help us build a team with a good skill mix from beginner to expert.
Other data science background
Please share any background you have in data science--projects you've worked on, internships, jobs, or any other experience you'd like to share, especially with team-based data science. 
Generative AI experience
Please briefly describe any background you might have working in AI.  
Project Skills *
Please indicate all skills for which you've had instruction, including through VU courses, DSI workshops and intensives, clubs or other:
GitHub username *
We'll use GitHub for collaboration and project management. If you don't already have an account, visit https://www.github.com to create one, and enter your username here.
Is there anything else you'd like us to know while considering your application?
If you don't have some of the prerequisites, or if you have a relevant background or a passion for a particular project, this is a great place to provide more explanation!
Organizational Affiliation *
This form was created inside of Vanderbilt University.
