Academic Background

University of Oxford

(October 2018-September 2019)

Oxford, England

At a Glance:

College: Trinity College

Major: Computer Science

Concentration: Machine Learning and Artificial Intelligence

Degree Type: Master of Science (MSc)

Degree Class: Distinction (highest possible honors)

Award Date: 4 October 2019

Graduation Date: 18 January 2020

Dissertation

For more on the dissertation, see Projects.

I knew that I wanted my dissertation to focus on a project that not only incorporated AI and ML but also had a positive impact on everyday people. After researching explainable AI, I arrived at this project, called Mockingbird. The abstract is as follows:

"As algorithms are increasingly used to make potentially life-altering decisions, the importance of ensuring they meet certain standards of fairness, accountability, and transparency is becoming paramount. Complex models that are poorly understood are susceptible to bias, relying on protected or irrelevant features, and adversarial examples. This concern has spawned interest in explainable and interpretable algorithms which allow humans to understand algorithmic decisions. Running parallel to this issue is an increasing reliance on algorithmically generated profiles to target advertisements online. Often these profiles are built on sensitive characteristics, without user knowledge or means of recourse. They can influence user perceptions of self, alter the content users are exposed to, and reveal to other parties characteristics which users have chosen not to share. This project aims to increase user autonomy and privacy online by leveraging techniques in explainable machine learning.

This work begins by creating a prototype tool called Mockingbird that builds more than twenty algorithmically generated profiles based on Twitter data. These profiles are created using various techniques, including lexicons, neural networks, and Gaussian Processes. Mockingbird offers explanations for most of these profiles, primarily using a post-hoc explanatory tool. Finally, using synonym suggestions and techniques used to generate adversarial examples, Mockingbird gives users suggestions on how to alter their tweets in order to change their algorithmic profiles. Mockingbird is novel in combining profiles, explanations, and tools to change algorithmic profiles.

The system was tested through several (n = 6) lab experiments to gauge user response to explanations and willingness to alter their data. It appears that users are generally not concerned about their algorithmic profiles when used for advertising, but would be much more concerned in higher stakes applications. Users were also unwilling to change their existing data to control algorithmically generated profiles, viewing the necessary changes as too disruptive to how they use social media. This work points to avenues of further research focusing explicitly on editing social media data to influence high stakes decisions such as job, visa, and loan applications."
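As an illustration of the general approach the abstract describes, here is a minimal sketch of a lexicon-based profile with a word-level explanation and synonym-based edit suggestions. The lexicon, synonym table, and function names are all invented for this example; they are not Mockingbird's actual data or code.

```python
# Illustrative sketch (not Mockingbird itself): a lexicon-based profile,
# a simple post-hoc explanation of which words drove it, and synonym
# suggestions that would shift the profile. All data here is made up.

# Hypothetical sentiment lexicon: word -> weight
LEXICON = {"great": 2, "happy": 1, "bad": -1, "awful": -2}
# Hypothetical synonym table used to suggest profile-altering edits
SYNONYMS = {"awful": ["challenging"], "bad": ["imperfect"]}

def profile(tweet):
    """Score a tweet by summing the lexicon weights of its words."""
    words = tweet.lower().split()
    contributions = {w: LEXICON[w] for w in words if w in LEXICON}
    return sum(contributions.values()), contributions

def explain(tweet):
    """Post-hoc explanation: label plus each word's contribution."""
    score, contributions = profile(tweet)
    label = "positive" if score >= 0 else "negative"
    return label, contributions

def suggest_edits(tweet):
    """Suggest synonym swaps for the negatively weighted words."""
    words = tweet.lower().split()
    return [(w, SYNONYMS[w][0]) for w in words
            if w in SYNONYMS and LEXICON.get(w, 0) < 0]

label, why = explain("the service was awful")
print(label, why)                              # negative {'awful': -2}
print(suggest_edits("the service was awful"))  # [('awful', 'challenging')]
```

A real profiling model is of course far more complex, but the same three-part shape (profile, explanation, suggested edit) carries over.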

The full text is available on my GitHub, the Oxford University Research Archive, and the Department of Computer Science website (requires Oxford login).

Relevant Coursework

  • "Advanced Machine Learning" - A new course focused on deep learning applications in NLP and Bayesian approaches to model fitting. See Projects for more information on the final project.

  • "Artificial Intelligence" - An introduction to AI, with a focus on search algorithms and problem formulation. Several practicals in Java.

  • "Computational Learning Theory" - A high-level approach to the bounds of machine learning, with a focus on Probably Approximately Correct (PAC) learning and its variants. Heavy on proofs, challenging but interesting.

  • "Computers in Society" - An interdisciplinary look at the ethical questions raised by computers and information technology in the modern age.

  • "Databases" - A general course on databases, using PostgreSQL. Covers the theory behind the design of database systems (relational algebra, relational calculus, etc.), practical SQL programming, and some information on how SQL works behind the scenes.

  • "Database Systems Implementation" - An expansion of the Databases course from the previous term, with a focus on implementation in C++. Final project involved implementing concepts not yet standard, such as skew-sensitive processing (conducted in Python using the pandas library).

  • "Foundations of Computer Science" - Since this degree is a conversion from Operations Research to Computer Science for me, I took this course to ensure I had all of the underlying theory I needed. Focuses on basic models of computation (DFAs, PDAs, Turing machines), reductions, and some complexity theory.
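To give a flavor of the models of computation mentioned in that last course, here is a toy DFA simulator. The automaton (which accepts binary strings containing an even number of 1s) is my own illustrative example, not course material.

```python
# Toy example: a DFA accepting binary strings with an even number of 1s.
# States track the parity of 1s seen so far.

TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START, ACCEPTING = "even", {"even"}

def dfa_accepts(s):
    """Run the DFA over the input string, one symbol at a time."""
    state = START
    for symbol in s:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

print(dfa_accepts("1100"))  # True  (two 1s)
print(dfa_accepts("1101"))  # False (three 1s)
```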

Princeton University

(September 2014-June 2018)

Princeton, New Jersey

At a Glance:

Major: Operations Research & Financial Engineering (ORFE)

Minor: Applications of Computing

Degree Type: Bachelor of Science in Engineering (BSE)

GPA: 3.9/4.0

Honors: magna cum laude, Tau Beta Pi, Sigma Xi, S. S. Wilks Memorial Prize

Why I Chose My Major

My major, Operations Research and Financial Engineering (ORFE for short), is a bit of an odd one. My interest in it comes mostly from the operations research end. I saw the degree as a mix of computer science, data science, optimization, statistics, probability, economics, and finance. This appealed to me, as I knew I could get the foundations of computer science in addition to a formal, rigorous education in these other quantitative disciplines. I'm most interested in topics where different fields interact to solve a complex problem, and this program gave me several tools and approaches in this vein that I might not have received from a pure computer science degree. Perhaps as importantly, understanding data is a core part of any machine learning approach, and ORFE is at its core a data-driven major. My senior thesis (see below) drew its conclusions using machine learning, but my ability to reason critically and concretely about data was equally important to the project's success. By minoring in Applications of Computing and now pursuing a further degree in Computer Science, I am able to leverage what I learned in ORFE to better solve problems in computer science.

Senior Thesis

For more on the thesis, see Projects.

As part of their senior year, every Princeton student (with a few departmental exceptions) is required to submit a thesis. These tend to be year-long research projects conducted one-on-one with a departmental advisor. For my project, I applied machine learning to the classification of "fake news" and satire. It was important to me to work on a project that was both relevant and interesting, and this problem caught my attention early on. Here is the abstract:

"The problem of false news articles has recently surged to the front of political discussion, particularly in the United States. Misinformation comes from a wide range of sources and major social media companies such as Facebook and Google have taken steps towards reducing the spread of so-called “fake news.” By their nature, many such deceptive news articles are difficult for humans to identify, as there may be conflicting reports, different interpretations of information, or a widespread distortion of the facts as the story circulates. While the notion of objective truth is one best left to the philosophers, it may be possible to make inroads by studying a related category of articles: satire. Satirical articles often arise as part of the discussion when they are taken as fact by their readers and shared in a way to confuse a large part of the public. Many guides to “fake news” contain mostly websites that claim to be satirical. This is often not easy to verify as the satirical disclaimer may be hidden deep in the website's description. The ability to reliably and automatically categorize this subset of articles from the main body of news would be a useful tool to warn readers not to accept the article as fact.

As standards and norms for politics and society change, so too must standards for news and satire. For this reason, this thesis considers a number of subsets of the data, based on date of publication. By considering both the corpus as a whole and the subsets individually, the goals of this paper are to A) develop an effective machine learning approach to identifying satire and B) see if a changing political and social climate has affected news and satire in a way discernible to a machine learning algorithm.

For the purposes of labeling the data, articles coming from sites explicitly claiming to be satirical are labeled as “satire,” and the rest “serious.” The problem of separating satire from serious news is analogous to separating valid email from spam. For this reason, this paper uses many of the most common techniques from the field of spam filtering, such as a Support Vector Machine with a linear kernel. The SVM uses a number of features established in other works, chiefly a bag of words, and two new features based on links to Twitter and other websites. This thesis also implements a deep learning approach with a C-LSTM.

The SVM with all features consistently achieved over 99% accuracy, 95% precision, and 96% recall when comparing satirical articles to serious ones. The C-LSTM achieved just under 99% accuracy with about 90% precision and recall. It was found that the “fake news” category is easier to separate from serious news but may share similarities with satire. Lastly, this thesis found that the date of publication is a significant factor in identifying satirical articles and that serious news from the past two years may be more similar to older satirical articles than previously."
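To sketch the bag-of-words pipeline the abstract describes, here is a minimal classifier on invented toy data. A perceptron stands in for the SVM's linear decision rule (both learn a linear separator, but the training objectives differ), and the vocabulary, sample headlines, and function names are all hypothetical, not the thesis code.

```python
# Illustrative sketch, not the thesis implementation: bag-of-words
# features with a perceptron standing in for a linear-kernel SVM.
# The tiny "corpus" below is invented for demonstration.

def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary word in the text."""
    words = text.lower().split()
    return [words.count(v) for v in vocabulary]

def train_perceptron(samples, labels, vocab, epochs=10):
    """Fit a linear decision rule w.x + b on bag-of-words features."""
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for text, y in zip(samples, labels):  # y: +1 satire, -1 serious
            x = bag_of_words(text, vocab)
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(text, vocab, w, b):
    """Classify a text as satire (+1) or serious (-1)."""
    x = bag_of_words(text, vocab)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

vocab = ["reportedly", "area", "man", "officials", "said"]
samples = ["area man reportedly confused",
           "officials said the report was accurate"]
labels = [1, -1]
w, b = train_perceptron(samples, labels, vocab)
print(predict("area man reportedly baffled", vocab, w, b))  # 1 (satire)
```

The real system adds the link-based features and a far larger vocabulary, but the feature map and linear decision boundary follow this shape.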

Semester Abroad

As part of my time at Princeton, I spent one semester at Worcester College of the University of Oxford. This amounted to Hilary and Trinity Terms and consisted of four courses in Mathematics and Computer Science. These courses introduced me to Oxford's individualistic style of learning and emphasis on self-driven study. Each course was taught one-on-one and pushed me to explore on my own. In my Machine Learning course, I was expected to learn basic Python, JavaScript, and HTML in just one week so that I could finish practicals and work on my research project. I found the approach a good fit and the work very rewarding, which is why I decided to return to Oxford for my Master's.

Relevant Coursework

At Princeton:

  • "Computing & Optimization for the Physical and Social Sciences" - This ORFE course mixed proof-based math with MATLAB to teach approaches to solving convex optimization problems. This included regression, convex analysis, semi-definite and linear programming, gradient descent methods, relaxation of hard or intractable problems, and reduction.

  • "Information Security" - This class covered topics in online and occasionally physical security. It melded theory with current applications and projects (in Java and Python) included implementing the Diffie-Hellman key exchange algorithm, using SQL injections and CSRF attacks to gather information from websites, and using computer forensics on a simulated target computer.

  • "Data Structures & Algorithms" - Based in Java, this course focused on fundamental data structures (trees, graphs, lists) and algorithms (sorts, flow on graphs). It also included higher level methods for thinking about the efficiency of these approaches (time/space complexity, reduction, tractability).

  • "Intro to Programming Systems" - Using C and occasionally x86-64 assembly, this course taught the lower-level basics of computers, including pointers, threading, and memory allocation. Projects included creating a buffer overflow attack and building a Linux shell.

  • "Regression & Applied Time Series" - This ORFE class used a financial lens to consider concepts such as the Gaussian Copula, quantiles, Monte Carlo simulation, basic regression techniques (autoregressive, moving average, least squares), and time series. Projects written in R included both simulations and analyses of real world data.

  • "Transportation Systems Analysis" - An ORFE course that covered the history and future of transportation in the world, largely from an optimization perspective. A major focus was on autonomous vehicles, and I worked on a project in Java for assigning routes to hypothetical autonomous taxis in a simulated city.

  • "Probability & Stochastic Systems" - Cross-listed with the Math department, this ORFE class provided a rigorous mathematical approach to topics in probability including Poisson processes, Brownian Motion, and Markov Chains. After completing this class myself, I tutored another student in it through the university.

  • "Fundamentals of Statistics" - An introduction to probability and statistics in MATLAB.

  • "Networks: Friends, Money, and Bytes" - An Electrical Engineering/Computer Science course that addressed twenty major topics in modern networks, including PageRank, rating systems, online courses, and Internet protocols. The final team project used MATLAB to analyze the New York City subway system as a graph, finding which stations and lines were most important to overall network flow.

  • "Linear Programming & Optimization" - A course dedicated entirely to linear programming and its applications, written in AMPL. The final project used SVMs to process handwriting.

At Oxford:

  • "Machine Learning" - An introductory ML course covering topics such as regression, SVMs, neural networks, collaborative filtering, regularization, and LASSO. A large part of the class centered on my project, which used a neural network and collaborative filtering to attempt to predict my Facebook friends' political leanings from their page likes - see Projects for more details.

  • "Computational Game Theory" - This was a high-level approach to game theory, covering topics such as Nash equilibria, mixed strategies, voting methods, and auction theory.

  • "Graph Theory" - A class in mathematics, it provided a basic foundation for graph theory including coloring, Euler paths, graph searching algorithms, trees, and reachability. It also helped me to formalize my proofs.

  • "Computational Linguistics" - An introduction to the field, this class focused on how to consider language computationally, focusing mostly on vectorization.

Other Activities

  • McGraw Center Tutor. For three years, I worked weekly at the McGraw Center, teaching current Princeton students courses I'd already taken. These included Calculus I and II, Linear Algebra, Physics I, Microeconomics, and Macroeconomics. I enjoyed these sessions and believe they improved my ability to communicate with others and explain complex ideas simply.

  • Global Ambassador. After my semester at Oxford, I volunteered a few times to be a resource for other students considering study abroad. This included meeting directly with students, being on panels, and communicating over email.

  • Tau Beta Pi. My senior year, I joined Tau Beta Pi, an engineering honor society limited to the top fifth of the engineering class. As part of this program, I led a weekly tour of the Engineering Quadrangle for prospective students, explaining the kind of work each department does and what my experience had been.