Spring 5-7-2019

Document Type

Honors Project

First Advisor

Zacharski, Ron

Department Chair or Program Director

Finlayson, Ian

Degree Name

Bachelor of Science

Major or Concentration

Computer Science

Department or Program

Computer Science


As machine learning becomes more influential in everyday life, we must begin addressing its potential shortcomings. One current problem area is word embeddings, frameworks that transform words into numeric vectors and so allow the algorithmic analysis of language. Without a method for filtering implicit human bias from the documents used to create these embeddings, they contain and propagate stereotypes. Previous work has shown that one commonly used and widely distributed word embedding model, trained on articles from Google News, encoded stereotyped associations between gender and occupation (Bolukbasi 2016). While this finding is unsurprising, using biased data in machine learning models only serves to amplify the problem. Although attempts have been made to remove or reduce these biases, a true solution has yet to be found. Hiring models, tools trained to identify well-fitting job candidates, show the impact of gender stereotypes on occupations. Companies like Amazon have abandoned these systems due to flawed decision-making, even after years of development.

I investigated whether the word embedding adjustment technique from Bolukbasi 2016 made a difference in the results of an emulated hiring model. After collecting and cleaning resumes and job postings, I created a model that predicted whether candidates were a good fit for a job, based on a training set of resumes from people already hired. To assess differences, I built the same model with different word vectors, including the original and adjusted word2vec embeddings. Results were expected to show some form of bias in classification. I conclude with potential improvements and additional work being done.
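
As a rough illustration of the kind of embedding adjustment referred to above, the following sketch shows a "neutralize" step: the component of a word vector that lies along a gender direction is removed, and the result is rescaled to unit length. It is not the project's code. The toy three-dimensional vectors and the function names are hypothetical, and the gender direction is simplified to a single he/she difference, whereas Bolukbasi 2016 estimates it from several definitional word pairs.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def gender_direction(he, she):
    """Simplifying assumption: use the unit vector along he - she."""
    return normalize([a - b for a, b in zip(he, she)])


def neutralize(word_vec, direction):
    """Remove the component along `direction`, then renormalize."""
    proj = dot(word_vec, direction)
    residual = [a - proj * d for a, d in zip(word_vec, direction)]
    return normalize(residual)


# Made-up vectors for illustration only; real word2vec vectors
# have hundreds of dimensions.
toy = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "engineer": [0.4, 0.8, 0.3],
}

g = gender_direction(toy["he"], toy["she"])
adjusted = neutralize(toy["engineer"], g)

print("before:", dot(normalize(toy["engineer"]), g))
print("after: ", dot(adjusted, g))
```

After the adjustment, the occupation vector has no component along the chosen direction, so a downstream model built on the adjusted vectors cannot read gender from that one axis. Whether this changes a hiring model's classifications is the question the project investigates.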