About the Project | BrightPath

Problem & Motivation

“Undermatching” is the phenomenon of high-achieving students attending colleges with low selectivity standards (e.g., low or no SAT requirements). It is a significant issue in higher education, particularly among high-achieving, low-income students. A lack of information and of access to people with relevant experience leads to undermatching. Existing solutions are expensive and rely solely on teachers or the students themselves. Undermatching results in lower graduation rates, fewer career opportunities, reduced career earnings, and lower social mobility.

Our Mission

To provide students and educators with a powerful self-serve tool that surfaces high-quality schools matching a student's academic achievement, using machine learning models to automate and expand the resources available to teachers and students.

How it Works

BrightPath is a web-based application, hosted on BrightPath's servers, that allows an educational institution to directly integrate its school directories for evaluation.

Students - Log in using their unique student ID and password to retrieve their academic profile and generate a tailored list of universities. They also have a direct line to their assigned educators to set up meetings or send messages for counseling sessions.
Educators - Log in using their unique faculty ID and password. They can search for students in their academic cohorts to review recommendations and facilitate counseling sessions.

Data Sources & Data Science Approach

We used publicly available longitudinal studies from the National Center for Education Statistics (NCES): the High School Longitudinal Study (HSLS) of 2009 and the Education Longitudinal Study (ELS) of 2002. These datasets provide individual, de-identified student data covering the period from high school up to 10 years after initial data collection. They supplied the base inputs for the targeting model, which identifies students who are "undermatched."

We developed two components for BrightPath:

Targeting model - an ML model that processes individual student data with select features to determine whether a particular individual can be classified as undermatched
Recommendation model - an ML model that takes the output of the targeting model and produces a list of relevant, recommended colleges for a specific student

The recommendation model was built with collaborative filtering, using k-nearest neighbors (KNN) to group recommended colleges into three relevance tiers. Synthetic student data was used to ensure a large enough sample size for KNN.
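As a rough illustration of neighbor-based collaborative filtering, the sketch below finds the students most similar to a target student and tiers colleges by how many of those neighbors attended each one. It is only a minimal sketch under stated assumptions: the function name `recommend_colleges`, the two-feature profiles, the Euclidean distance metric, the value of `k`, and the tier cutoffs are all hypothetical and are not taken from BrightPath's implementation.

```python
from collections import Counter
from math import dist


def recommend_colleges(target, students, k=3):
    """Group colleges attended by the k nearest students into three tiers.

    target   -- feature vector for the student receiving recommendations
    students -- list of (feature_vector, set_of_college_names) pairs
    The tier cutoffs below are illustrative assumptions, not BrightPath's rules.
    """
    # Rank known students by Euclidean distance to the target; keep the k closest.
    neighbors = sorted(students, key=lambda s: dist(target, s[0]))[:k]

    # Count how many of the nearest neighbors attended each college.
    votes = Counter(college for _, colleges in neighbors for college in colleges)

    tiers = {"tier_1": [], "tier_2": [], "tier_3": []}
    for college, count in sorted(votes.items()):
        if count == len(neighbors):
            tiers["tier_1"].append(college)  # attended by every neighbor
        elif count > 1:
            tiers["tier_2"].append(college)  # attended by several neighbors
        else:
            tiers["tier_3"].append(college)  # attended by a single neighbor
    return tiers


# Hypothetical synthetic profiles: (normalized GPA, normalized test score).
synthetic_students = [
    ([0.90, 0.80], {"College A", "College B"}),
    ([0.85, 0.90], {"College A", "College C"}),
    ([0.95, 0.85], {"College A", "College B"}),
    ([0.20, 0.90], {"College D"}),
]

recommendations = recommend_colleges([0.90, 0.85], synthetic_students, k=3)
print(recommendations)
```

In this toy data the fourth student is far from the target profile, so "College D" never enters the vote. A production system would likely use many more features and a tuned `k`.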

Model Evaluation

Several machine learning classification models were evaluated to improve the identification of students at risk of undermatching. Various probability thresholds were tested to tune the model so that it identifies these students more accurately. This approach allows for a tailored intervention strategy, ultimately supporting students in achieving their full potential. With our models, we aim to provide targeted assistance and improve educational outcomes.

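Random Forest showed consistent performance across different probability thresholds. To minimize false negatives, we selected a probability threshold of 10%. A low threshold flags more students as at risk, which reduces missed cases at the cost of more false positives.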
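The effect of the probability threshold on false negatives can be sketched with a small threshold sweep. The labels and predicted probabilities below are made up for illustration and are not BrightPath's evaluation data. The helper names `confusion_counts` and `recall` are likewise hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count true/false positives and negatives at a given probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            counts["tp"] += 1
        elif predicted_positive and label == 0:
            counts["fp"] += 1
        elif not predicted_positive and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # an undermatched student the model missed
    return counts


def recall(counts):
    """Share of truly undermatched students that the model flags."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0


# Illustrative data only: 1 = undermatched, 0 = not undermatched.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.65, 0.30, 0.12, 0.40, 0.08, 0.05, 0.20, 0.55]

for threshold in (0.5, 0.3, 0.1):
    counts = confusion_counts(y_true, y_prob, threshold)
    print(threshold, counts, round(recall(counts), 2))
```

On this toy data, lowering the threshold from 0.5 to 0.1 removes both false negatives while adding two false positives. That is the trade-off a low threshold accepts.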

For hyperparameters, a maximum tree depth of 3 showed strong performance, while using 10 features produced a balanced model outcome.

Architecture

BrightPath's underlying architecture is AWS SageMaker AI connected end to end to a Streamlit interface.

Future Work

Data access: Substantial funding is required to purchase masked student data, such as SAT scores, through the National Student Clearinghouse. Access to this rich dataset would enhance the accuracy and precision of our model.
Partnerships: Receiving real-world feedback to fine-tune the recommendation algorithm will require key partnerships with organizations that work directly with undermatched students, such as the non-profit SEO Scholars.
Beta test: Collaborate with schools and organizations like Naviance to test implementation and integration with live student data.