
Data Collection
The data set for this project combines data from four sources: eBird, NCEI, FEMA, and the Census Bureau. This section describes how we collected the data from each source.
The main data source is eBird. We obtained an eBird API key and collected observations for the state of Colorado for the dates 2021-01-01 to 2025-12-31. We made a separate request to the eBird API for each date in that range. Because this required so many requests, we cached the responses so that collection could resume after an interruption without starting over. The API's rate limits also impose a delay between requests, so we used an async library to run multiple tasks at the same time. We also pulled the bird taxonomy, which is a separate eBird API, so that we could do more sorting and classification during preprocessing. This data set is relevant because it is our main source of bird counts, bird locations, and bird species, so it supports all of our research questions.
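The per-date requests, cache, and concurrent tasks described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: it uses Python's standard asyncio in place of whichever async library was used, and the names daily_dates, collect, and fetch_day, the one-file-per-day cache layout, and the concurrency and delay settings are all hypothetical. The fetch_day coroutine stands in for the real eBird request.

```python
import asyncio
import json
from datetime import date, timedelta
from pathlib import Path


def daily_dates(start: date, end: date):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


async def collect(start, end, fetch_day, cache_dir, max_concurrent=5, delay=0.0):
    """Fetch one day at a time, skipping days already saved in the cache.

    fetch_day is a hypothetical coroutine (date -> list of records) that
    would wrap the real eBird request; it is passed in so the sketch does
    not depend on any particular HTTP library.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    limiter = asyncio.Semaphore(max_concurrent)  # cap on in-flight requests

    async def one_day(day):
        path = cache / f"{day.isoformat()}.json"
        if path.exists():  # saved by an earlier (possibly interrupted) run
            return json.loads(path.read_text())
        async with limiter:
            records = await fetch_day(day)
            await asyncio.sleep(delay)  # crude pause to respect rate limits
        path.write_text(json.dumps(records))
        return records

    results = await asyncio.gather(*(one_day(d) for d in daily_dates(start, end)))
    # Flatten the per-day lists into one list of records.
    return [record for day_records in results for record in day_records]
```

Because each day is written to its own file as soon as it arrives, rerunning collect after an interruption only requests the days that are still missing.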
​
The next data set that we acquired was weather station data, accessed through NCEI's API with an API key. We collected weather station data for the dates 2021-01-01 to 2025-12-31, using NCEI's Global Summary of the Month (GSOM) data set to get weather data for each month. We filtered the data by the Colorado location ID, by date, and by data type. The API has one notable limitation: it returns at most 1000 records per response, so we had to update the request parameters and send additional requests to retrieve the remaining data. This data set is relevant because several of our research questions relate to weather.
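The repeated requests needed to get past the 1000-record limit follow a common pagination pattern, sketched below. The names fetch_all and fetch_page are hypothetical, and the sketch assumes the API accepts an offset and a limit; whether the first offset is 0 or 1 should be checked against the NCEI documentation.

```python
def fetch_all(fetch_page, limit=1000):
    """Collect every record by requesting successive pages.

    fetch_page is a hypothetical callable (offset, limit) -> list of records
    that would wrap the real NCEI request with its filters applied.
    """
    records = []
    offset = 0  # assumed zero-based; adjust if the API counts from 1
    while True:
        page = fetch_page(offset, limit)
        records.extend(page)
        if len(page) < limit:  # a short (or empty) page means nothing is left
            break
        offset += limit
    return records
```

When the total is an exact multiple of the limit, the loop makes one extra request that returns an empty page and then stops.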
​
We then pulled data from the FEMA API. This data covers natural disasters declared between 2021-01-01 and 2025-12-31, and we pulled all of the records for Colorado in that period. This data set is important because it helps us answer our research questions about bird populations and natural disasters.
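A state and date filter like the one described above might be expressed as a single request URL. The sketch below assumes the OpenFEMA Disaster Declarations Summaries endpoint and its state and declarationDate fields; the endpoint, field names, and date format are assumptions to verify against the FEMA documentation, and fema_query_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the path and version in the OpenFEMA documentation.
FEMA_URL = "https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries"


def fema_query_url(state="CO", start="2021-01-01", end="2025-12-31"):
    """Build a request URL filtered by state and declaration date."""
    conditions = [
        f"state eq '{state}'",            # assumed field name
        f"declarationDate ge '{start}'",  # assumed field name and date format
        f"declarationDate le '{end}'",
    ]
    # urlencode percent-encodes the spaces, quotes, and "$" in the filter.
    params = {"$filter": " and ".join(conditions)}
    return f"{FEMA_URL}?{urlencode(params)}"
```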
​
From the Census Bureau we pulled urban population data, using the Census Bureau API for the year 2020. This data is collected only in the census every 10 years, so 2020 was the most recent available. We also set parameters to pull only Colorado and to return results by county. This data set is relevant because it helps answer our questions about how bird populations relate to urban development.
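A county-level request for Colorado could look like the sketch below. The data set path (2020 Decennial Census DHC) and the variable codes (P2_001N for total population, P2_002N for urban population) are assumptions that should be checked against the Census Bureau's variable listings; census_url and rows_to_dicts are hypothetical helpers. The state code 08 is Colorado's FIPS code, and the API returns a list of rows whose first row is the header.

```python
from urllib.parse import urlencode

# Assumed data set path; confirm it and the variable codes before use.
CENSUS_URL = "https://api.census.gov/data/2020/dec/dhc"


def census_url(variables=("NAME", "P2_001N", "P2_002N"), state_fips="08"):
    """Build a request for every county in one state."""
    params = {
        "get": ",".join(variables),
        "for": "county:*",           # all counties
        "in": f"state:{state_fips}",  # restricted to one state
    }
    # Keep ":", "*", and "," unescaped so the URL stays readable.
    return f"{CENSUS_URL}?{urlencode(params, safe=':*,')}"


def rows_to_dicts(payload):
    """Convert the header-row-plus-data-rows layout into one dict per county."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]
```

Converting each row to a dict keyed by the header makes it straightforward to join the county figures with the other data sets during preprocessing.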