Using Machine Learning to Find Environmentally At-Risk Communities

10 Pages

Date Written: December 1, 2016


Environmental health remains a serious concern in many US localities. However, public agencies often lack the capacity and resources to collect comprehensive environmental health data. Inspired by CalEnviroScreen, an environmental health assessment tool used to identify environmentally at-risk communities in California, I calculate pollution burden scores at the census tract level for the entire contiguous United States. Pollution burden is a composite score that aggregates 12 environmental indicators covering air, water, and waste. I combine observed pollution burden indicator data with statistics predicted by machine learning. I publish the results as an interactive, publicly accessible National Pollution Burden Map built with ArcGIS Online. Although applied here to the United States, the same approach can be extended to other regions of the world.
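As a rough illustration of how observed and predicted indicator values might be combined into a composite score, the following minimal Python sketch fills missing observations with model predictions and then averages per-indicator percentile ranks across tracts. Everything in it is an assumption made for illustration, not the paper's method: the abstract does not state how predictions are merged with observed data or how indicators are weighted, so the gap-filling rule, the equal weights, the helper names (`fill_missing`, `percentile_ranks`, `burden_scores`), and the toy numbers are all hypothetical.

```python
from typing import List, Optional, Sequence


def fill_missing(observed: Sequence[Optional[float]],
                 predicted: Sequence[float]) -> List[float]:
    """Keep each observed value; fall back to the prediction where it is None.

    Assumption: predictions are only used to fill gaps in the observed data.
    """
    return [p if o is None else o for o, p in zip(observed, predicted)]


def percentile_ranks(values: Sequence[float]) -> List[float]:
    """Percent of tracts whose value is less than or equal to each tract's value."""
    n = len(values)
    return [100.0 * sum(v <= x for v in values) / n for x in values]


def burden_scores(indicators: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-indicator percentile ranks for each tract.

    Assumption: all indicators carry equal weight.
    """
    ranked = [percentile_ranks(column) for column in indicators]
    return [sum(tract) / len(ranked) for tract in zip(*ranked)]


# Toy data: two hypothetical indicators over four tracts.
# None marks a tract with no observation for that indicator.
pm25_observed = [8.0, None, 12.0, 10.0]
pm25_predicted = [7.5, 9.0, 11.0, 10.5]    # stand-in for model output
waste_observed = [1.0, 3.0, None, 2.0]
waste_predicted = [1.2, 2.8, 4.0, 2.1]     # stand-in for model output

pm25 = fill_missing(pm25_observed, pm25_predicted)
waste = fill_missing(waste_observed, waste_predicted)
scores = burden_scores([pm25, waste])
print(scores)
```

Percentile ranks put indicators with different units on a common 0 to 100 scale before averaging, which is why the sketch ranks each indicator separately rather than averaging raw values.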

Keywords: environmental health, environmental justice, pollution, machine learning, United States

Suggested Citation

Shen, Shiran Victoria, Using Machine Learning to Find Environmentally At-Risk Communities (December 1, 2016). Available at SSRN:

Shiran Victoria Shen (Contact Author)

University of Virginia

1540 Jefferson Park Ave
S183 Gibson Hall
Charlottesville, VA 22904
United States
