“FireCaster”: How Can We Use Data to Predict Domestic Fire Risk and Save Lives?
Domestic fire incidents remain one of the leading causes of death, threatening human safety and increasing urban vulnerability [1]. They are a real burden on economies around the world because of their devastating consequences, which can turn a large, lively city into ashes in no time.
To address this problem, we developed FireCaster, a predictive analytics system for forecasting and prioritizing fire inspections in the city. The system provides recommendations that help fire departments plan both their firefighting strategies and their awareness campaigns. It can also help insurance companies deal with damages and, to a lesser extent, help providers of indoor and outdoor fire prevention and alarm systems identify their customers.
Context: The societal changes and the economic damage brought about by domestic fire incidents
Domestic fire incidents may not be on everyone’s minds, but they happen more often than we realize. Each year, numerous families and even entire communities are affected by fires in the US and all over the world. Over the years, measures have been put in place to lower the number of fires occurring each year.
A major domestic fire incident doesn’t only touch the victim; it touches the entire community. Major fires can kill and injure, and they can also destroy communities socially and economically. Take the recent Fort McMurray fire in Alberta, Canada. Almost 1.5 million acres were burned, 88,000 people were evacuated, and 2,400 homes and businesses were destroyed, making it the costliest natural disaster in that country’s history. The affected communities were paralyzed economically and socially, essentially turning them into temporary wastelands.
The US Fire Administration published comprehensive figures for 2013. In that year, there were 1.2 million fires resulting in 3,240 deaths and 15,925 injuries, with an overall economic loss of $11.5 billion. While these numbers are large, they represent major progress compared with 2004: the number of fires has dropped by 21.6%, deaths and injuries have fallen by 21% and 6.6% respectively, and total economic loss is down by 10.1%. The drop is attributed to government awareness campaigns, an increase in the number of fire alarms, and a renewed focus on firefighting over fire prevention. It should also be mentioned that since 1986 the number of firefighters per 1,000 of the population has remained constant.
Although the population has grown substantially, America’s fire departments have grown at the same rate. This has enabled a constant level of protection over the previous three decades.
But How Does the US Compare to Other Countries?
According to the WHO, 2,847 people died from fire in the United States in 2012. The only major countries surveyed that had more deaths were Pakistan, China, Russia, and India. Other first world nations had far fewer deaths in 2012: Great Britain had 335, France had 488, and Germany had 422.
However, when population differences are considered, the United States matches other first world countries in reducing deaths by fire. These countries have also seen reductions in the overall number of fires. From 2010 to 2014, the United States saw a decline of almost 30,000 in the number of annual fires, Great Britain a decline of about 90,000, and France a drop of about 60,000.
How Much Do the Government and Insurance Companies Spend on Domestic Fire Incidents?
Surprisingly, the average amount lost per property after a fire has remained relatively unchanged since 1977. One study on fire loss in the United States revealed that in 1977 the average loss was $14,600 and in 2015 it was $20,700. The United States Fire Administration budget has declined in recent years, with the 2016 budget being $41.582 million, a 5.6% decrease compared to the previous year. That decline is coupled with a decrease in the cost of fire prevention equipment. Take home sprinkler systems as an example: the cost in 2013 was a mere $1.35 per square foot, compared to $1.61 in 2008.
Actual spending on fire prevention campaigns has decreased, whereas firefighting spending has increased. The Hazardous Fuels Reduction Program in Montana was given $500 million in 2012, up from $421 million in 2002, but once inflation is taken into account this is a decrease. Similar trends have been seen across the country.
According to the Geneva Association, insurers’ losses have also been in decline: insurers paid out 15,500 million in 2008 and 11,600 million in 2010, bucking the trend in other developed nations, where insurer losses remained steady.
The main tools the government has are raising awareness and designing effective firefighting strategies. There are no laws regarding any specific fire prevention equipment people must have in their home. It’s not a legal requirement to install a smoke alarm, for example. Identifying and strategically targeting people and places with high risks of fire is the key component in reducing fire deaths, injuries, and economic loss.
Our project builds on a long-standing stream of social science literature that studies the connection between fire incidents in urban zones and socio-demographic factors [1][2][3]. These early studies revealed the following underlying domestic risk factors:
(1) Building conditions: The literature shows that the age of a building and the materials used inside it contribute strongly to the ignition of domestic fires.
(2) Demographic and economic characteristics: While overall demographics differ from one neighborhood to another, most people inhabiting the same neighborhood share similar income, educational level, and other demographic characteristics, as they tend to live the same lifestyle [4].
(3) Weather conditions: We hypothesized that weather features, in particular temperature and precipitation, would be a good proxy for the use of heaters and air conditioning. Other weather features such as snow, wind, and rain would commonly reflect people's tendency to stay at home.
However, cities are highly heterogeneous and commonly unequal with regard to the social behavior of their citizens and the structure of their architecture, among other dimensions. Our capacity to understand and evaluate our methods is bounded by the availability of contextually relevant data in different cities. To overcome the challenge of accessing local, real-world data, our study tests the theoretical models using public data from the city of New York. In this project, we conducted a series of experiments to demonstrate how we collected, filtered, and merged different sources of data from open data portals (with New York City as a case study) to develop our predictive dashboard.
In this section, we describe each of the datasets that we have used. This project relies on open data sources from the City of New York. All of the data is publicly available, and pulled using the city's open data APIs. The main data sources that we used in the project are the following:
Fire Incident Dataset - FDNY Incidents: The dataset contains detailed daily information about fire incidents handled by FDNY fire units. The data was collected using the New York Fire Incident Reporting System (NYFIRS), which the FDNY developed to provide data to the National Fire Incident Reporting System (NFIRS). The latter is a modular all-incident reporting system designed primarily to understand the nature and causes of fire, as well as civilian fire casualties and firefighter injuries. The raw dataset contains 1.33M records spanning 1 January 2013 to 31 December 2015. It includes information on where and when incidents occurred and what resources were used to mitigate them. One limitation of this data source is that it provides only the street address of an incident, not the location of the building.
MapPluto (Primary Land Use Tax Lot Output): A dataset provided by the New York City Department of City Planning (NYCDCP). It was created by merging the Primary Land Use Tax Lot Output (PLUTO) database with the geographic boundaries of tax lot features from the Department of Finance’s Digital Tax Map. It contains geographical features and information on land use, building age, number of units, and lot size.
Census: The Census Bureau’s American Housing Survey is an annual panel that tracks highly specific details about households. As summarized in the literature review, several socio-demographic factors contribute to fire accidents.
Street Data: A digital vector file of the public and private roads and streets of NYC. The dataset is maintained by the state Department of Transportation.
Open Street Map: An open source community project dedicated to mapping the world through community contributions. To obtain building characteristics, we query the OSM API, then filter and parse the responses to get relevant information on buildings as well as other geographic features such as roads and points of interest.
Weather Data: We obtained historical weather information for New York through the Weather Underground developer API. We collected daily weather summaries for our study period and selected the following features: temperature, dew point, precipitation amount and type, visibility, wind, fog, pressure, and humidity. For some features, such as temperature, we obtained more granular data by taking the minimum, mean, and maximum values.
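The minimum, mean, and maximum temperature features can be sketched as follows. This is a minimal illustration, assuming a day's temperature readings are already available as a list of numbers; the function name and the `temp_*` keys are hypothetical and not taken from the project code.

```python
from statistics import mean


def summarize_temperature(readings):
    """Collapse one day's temperature readings into min, mean and max features."""
    if not readings:
        raise ValueError("at least one reading is required")
    return {
        "temp_min": min(readings),
        "temp_mean": mean(readings),
        "temp_max": max(readings),
    }


# Illustrative readings for a single day (made-up values).
summary = summarize_temperature([2.0, 4.0, 9.0])
```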
To study and analyze spatial patterns, we investigated the available options, taking into consideration the ability to analyze distinct spatial units. The US Census Bureau has designed hierarchically nested spatial units that avoid geographical overlap and make longitudinal analysis easier. In our study, the spatial granularity is defined by either the census tract or the urban block. A census tract is a geographic unit used by the census to report demographic characteristics. It links geographic areas with socio-demographic datasets and reflects the structure of a homogeneous urban form regulated by current zoning ordinances. The urban block is a homogeneous physical territory bounded by streets. Census blocks are nested within census tracts. According to the 2010 Census, Manhattan is divided into 288 census tracts, and based on the PLUTO dataset it contains 2,870 urban blocks. The advantage of adopting these spatial units is that both census tracts and census blocks are officially used to document social and demographic statistics.
Fig. 1 depicts the distribution of domestic fire accidents over time in the city of New York. We observe a periodic temporal pattern across the years. The number of incidents fluctuates over the first three months of the year, indicating that accidents may be influenced by dynamic temporal factors such as weather. There are also differences between the incident distributions of different tracts. Based on these observations, we decided to include data from other domains, such as meteorological information, to predict fire incidents.
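The kind of aggregation behind such temporal and per-tract distributions can be sketched as below. The records and tract identifiers are made up for illustration, and the helper names are hypothetical; the sketch only shows how incidents could be counted per month and per tract.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date of incident, census tract id).
incidents = [
    (date(2013, 1, 5), "tract_A"),
    (date(2013, 1, 20), "tract_B"),
    (date(2013, 7, 4), "tract_A"),
    (date(2014, 1, 11), "tract_A"),
]


def monthly_counts(records):
    """Count incidents per (year, month) to expose periodic patterns."""
    return Counter((day.year, day.month) for day, _ in records)


def tract_counts(records):
    """Count incidents per census tract to compare spatial units."""
    return Counter(tract for _, tract in records)
```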
In what follows, we describe how we selected the feature sets used to build our model. Feature engineering is the task of researching and creating features that represent human understanding of the factors influencing a phenomenon. In our project, features are extracted from each individual domain. The set of features is summarized below:
|Domain|Feature|Description|
|---|---|---|
|Temporal|Day of week|The ordinal number of the day in the week|
|Temporal|Month|The month in which the time interval falls|
|Temporal|Holiday|Whether the day is a holiday|
|Spatial|Tract|The administrative tract|
|Meteorological|Weather condition|The weather condition on a given day|
|Meteorological|Temperature|The temperature on a given day|
|Meteorological|Wind|The orientation and speed of the wind on a given day|
|Meteorological|Humidity|The humidity index on a given day|
|Meteorological|Snow|The depth of snow on a given day|
|Meteorological|pressure|The level of pressure on a given day|
|Meteorological|precipitation|The level of precipitation on a given day|
|Buildings|avg_yearbuilt|The average age of buildings in a given region|
|Buildings|total_units|Total number of units in a given region|
|Buildings|avg_unitarea|The average unit area in a given region|
|Buildings|ratio_retailarea|Ratio of retail buildings in a given region|
|Buildings|ratio_comarea|Ratio of commercial buildings in a given region|
|Buildings|ratio_resarea|Ratio of residential buildings in a given region|
|Buildings|ratio_officerea|Ratio of office buildings in a given region|
|Buildings|avg_numfloors|The average number of floors|
|Buildings|total_bldgarea|Total building area|
|Census|household_type_by_units_in_structure|Household type by units in structure|
|Census|average_household_size_of_occupied_housing_units|Average household size of occupied housing units|
|Census|total_housing_units|Total housing units|
|Census|median_number_of_rooms|Median number of rooms|
|Census|median_contract_rent|Median contract rent|
|Census|median_gross_rent|Median gross rent|
|Census|median_household_income|Median household income|
|Census|aggregate_household_income|Aggregated household income|
|Census|owner_occupied_homes_median_value|Owner occupied homes median value|
|Census|value_for_owner_occupied_housing_units|Median value for owner occupied housing units|
|Census|total_vacancies|Total vacant apartments|
|Census|sold_not_occupied|Sold, non-occupied apartments|
|Census|for_rent|Apartments for rent|
|Census|population_5_and_over|Population ages 5-18|
|Census|adults_18_to_20|Population ages 18-20|
|Census|adults_25_to_64|Population ages 25-64|
|Census|adults_25_to_64_with_bachelors_degree|Population ages 25-64 with a bachelor's degree|
|Census|below_poverty_line|Population below the poverty line|
|Census|people_per_household|Number of persons per household|
|Census|built_total|Average age of buildings|
|Census|built_1970s|Number of buildings built in the 1970s|
|Census|built_1960s|Number of buildings built in the 1960s|
|Census|built_1950s|Number of buildings built in the 1950s|
|Census|built_1940s|Number of buildings built in the 1940s|
|Census|built_before_1940|Number of buildings built before 1940|
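As an illustration of the temporal features listed in the table, the sketch below derives the day of week, month, and holiday flag from a date. The holiday calendar and the function name are assumptions made for this example; the project's own holiday source is not described here.

```python
from datetime import date

# Hypothetical holiday calendar; a real one would cover the whole study period.
HOLIDAYS = {date(2015, 1, 1), date(2015, 7, 4), date(2015, 12, 25)}


def temporal_features(day, holidays=HOLIDAYS):
    """Build the three temporal features for a single calendar day."""
    return {
        "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "month": day.month,
        "holiday": day in holidays,
    }


features = temporal_features(date(2015, 7, 4))
```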
Results
We evaluate the performance of our model as follows. We order the samples chronologically and test the model using k-fold cross-validation to guard against over-fitting. The best performing model, with K=3, is summarized in Table 3.
Table 3: The performance of the model
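The text does not specify how the folds respect the chronological ordering. One common scheme is forward chaining, where each test block comes strictly after its training data. The sketch below shows that scheme for K=3; it is an assumption for illustration, not necessarily the exact procedure used in the project, and `chronological_folds` is a hypothetical helper.

```python
def chronological_folds(n_samples, k=3):
    """Yield (train_indices, test_indices) pairs for time-ordered samples.

    The samples are assumed to be sorted by time. The index range is cut into
    k + 1 contiguous blocks; fold i trains on blocks 0..i-1 and tests on block i,
    so no fold ever trains on data from its own future.
    """
    if k < 1 or n_samples < k + 1:
        raise ValueError("need at least k + 1 samples")
    block = n_samples // (k + 1)
    for i in range(1, k + 1):
        train_end = i * block
        # The last fold absorbs any remainder so every sample is used.
        test_end = n_samples if i == k else train_end + block
        yield list(range(train_end)), list(range(train_end, test_end))


folds = list(chronological_folds(8, k=3))
```

Each fold's training indices always precede its test indices, which is the property that matters when temporal leakage is a concern.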
To evaluate the effectiveness of the features, we list the top 10 features by feature importance in Table 5. We used ensembles of decision trees, namely Random Forest and Extra Trees, to estimate feature importance. The most contributive features are meteorological factors, which can influence people's behaviour (indoor/outdoor activities). Building conditions also contribute strongly to these events.
Table 5: The top 10 features ranked by importance score
To ease the use of our system for further analysis and urban strategy making, we designed and developed a map-centered system that lets users navigate, query, and visualize the results of FireCaster. Fig. X shows a screenshot of the system. The system has been developed in a modular way, so that components can be updated easily as new technologies and algorithms become available. We use Bootstrap, JavaScript, and D3 to build the basic front-end page, and Leaflet for the map-related elements. For the back end, we use Flask (a Python microframework for web applications) to process requests. A relational database, PostGIS, stores spatial data and census information for every tract in a standard format. From the user's side, the prototype works as follows: after the user submits a building address and the current time, the prediction result appears (see the left) as a color-coded message, green for a safe neighborhood and red for a dangerous one.
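The color-coded message can be thought of as a simple mapping from a predicted risk score to a label. The sketch below assumes the score lies in [0, 1] and uses a cut-off of 0.5; both the score range and the threshold are hypothetical, since the actual decision rule is not stated.

```python
def risk_label(score, threshold=0.5):
    """Map a predicted risk score in [0, 1] to the dashboard's color code."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Scores at or above the (assumed) threshold are flagged as dangerous.
    return "Red" if score >= threshold else "Green"
```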
The project is structured in the following components:
- FireCaster_Model: builds and validates the model.
  - data_acquisition: contains scripts to download the data.
  - data_processing: contains scripts for crawling the data and transforming it into a usable format.
  - data_analysis: contains scripts for supporting tasks carried out while building the fire risk model.
- FireCaster_Dashboard: map-centered web application to visualize the results. We run a cron job to generate scores, and the dashboard reads the scores directly from the database.
- docs: documentation and presentation.
- Install tools
- Create Postgres/Postgis database
    createdb "<DB_NAME>"
    psql -d "<DB_NAME>" -c "CREATE EXTENSION postgis"
- Init Postgis database
psql -f ./sql/init_firecaster_db.sql
- Create data folders
When you finish setting up the environment, you can use the pipeline. The general framework is the following: (1) load data into the database, (2) generate features from the data, (3) train the model, and (4) evaluate the model's performance.
- Get New York Open Datasets. You can find the list of all the datasets under
- Get New York Mappluto data
Cleaning, processing, and joining the files.
    python main.py -task CONTEXTUAL_DATA_COLLECTION
    python main.py -task DATA_CLEANING
After processing all the files, you can find all the processed files under
- Data processing & feature engineering
python main.py -task FEATURE_ENGINEERING
- Data Analysis & model selection
python main.py -task MODEL_SELECTION
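For readers curious how a single entry point can serve every `-task` value used in the commands above, here is a minimal sketch of an argparse-based dispatcher. It is not the project's actual `main.py`; the `parse_task` and `run` helpers and the handler mapping are hypothetical, and only the task names are taken from the commands above.

```python
import argparse

# Task names as they appear in the pipeline commands.
TASKS = (
    "CONTEXTUAL_DATA_COLLECTION",
    "DATA_CLEANING",
    "FEATURE_ENGINEERING",
    "MODEL_SELECTION",
)


def parse_task(argv):
    """Read the task name from a command line such as ['-task', 'DATA_CLEANING']."""
    parser = argparse.ArgumentParser(description="Illustrative pipeline entry point")
    parser.add_argument("-task", required=True, choices=TASKS)
    return parser.parse_args(argv).task


def run(argv, handlers):
    """Look up the handler registered for the requested task and call it."""
    return handlers[parse_task(argv)]()
```

A caller would pass a dictionary that maps each task name to a function, for example `run(["-task", "DATA_CLEANING"], {"DATA_CLEANING": clean_data})`, where `clean_data` stands in for whatever function performs that step.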
Run the Dashboard
Since we have validated our model, we run a daily cron job that inserts new data into the database, runs the same model on the new data, infers the prediction scores, updates the file, and inserts the new values into the database. The dashboard loads the data directly from the database.
- Run Scheduler job (on the server)
0 6 * * * python main.py -task SCHEDULER
- Run the dashboard
    cd firecaster_dashboard/
    python run.py
[1] A. Clark and J. Smith. Experiencing a domestic fire: an overview of key findings from a post incident research programme. Safer Communities, 14(2):95–103, 2015.
[2] A. Clark, J. Smith, and C. Conroy. Domestic fire risk: a narrative review of social science literature and implications for further research. Journal of Risk Research, 18(9):1113–1129, 2015.
[3] L. S. Edelman. Social and economic factors associated with the risk of burn injury. Burns, 33(8):958–965, 2007.
This project was developed during an internship at Snips in 2014 and was refactored afterwards.
If you are having issues, please let us know or submit a pull request.
The project is licensed under the MIT License.