Missing data is a common problem in data analysis. One successful approach is k-Nearest Neighbors (kNN), a simple method that uses the known values of similar observations to impute unknown values with a relatively high degree of accuracy.
In this gentle introduction to kNN, we discuss the key concepts and applications, including:
- Measuring distance
- Selecting the optimal k
- Common use cases for kNN, including predictive regression models
- Advantages and disadvantages of the kNN approach
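To make the idea concrete, here is a minimal sketch of kNN imputation in plain Python. It is not the method taught in the training, only an illustration under stated assumptions: missing values are marked as `None`, distance is Euclidean over the columns both rows have observed, and a gap is filled with the mean of that column across the k nearest rows. The function names `distance` and `knn_impute` and the sample data are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance over the columns observed (not None) in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlapping columns, so the rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column across the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows with an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda c: c[0])
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            if nearest:  # leave the gap in place if no usable neighbor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Illustrative data: the last row is missing its third value.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],
]
result = knn_impute(data, k=2)
print(result[3])
```

In this sketch the two rows closest to the incomplete one supply the imputed value, while the distant third row is ignored. Changing `k` changes how many neighbors are averaged, which is why the choice of k matters; in practice, features are usually scaled first so that no single column dominates the distance.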
Note: This training is an exclusive benefit for members of the Statistically Speaking Membership Program and is part of the Stat’s Amore Trainings Series. Each Stat’s Amore Training is approximately 90 minutes long.
About the Instructor
Richard Dunks is the founder of Datapolitan, a small business focused on helping public sector organizations cultivate a more data-driven culture through training, project work, and strategy consulting. Richard has over 6 years of experience as a data analyst in the public sector. Since graduating from NYU’s Center for Urban Science and Progress in 2014, he has been working with innovative local government agencies and forward-thinking non-profit organizations across the country. Richard is also an instructor with the Center for Government Excellence at Johns Hopkins University, where he teaches data governance, performance management, and other key topics in local government innovation to public sector employees across the country as part of Bloomberg Philanthropies’ What Works Cities initiative. A Las Vegas native, Richard currently lives in Dallas, TX.