When analysts need a simple, engaging, and scalable method for presenting geospatial data, they often turn to a heatmap. This 2D visualization divides a map into equal-sized grid cells and uses color to represent the magnitude of the aggregated data values within each cell.
Overlaying heatmaps on geographical maps permits quick visualization of spatial phenomena. Patterns such as clusters, hotspots, outliers, or gradients become immediately obvious. This format can be useful to decision-makers and the public, who might not be well-versed in raw statistical outputs.
Heatmaps may be composed of square cells (called grid-based or matrix heatmaps) or smoothly contoured “continuous” values (called spatial or kernel density heatmaps). The following maps show the density of tornado starting locations using both approaches.
If you squint your eyes while looking at the top map, you should see trends similar to those in the bottom map.
I prefer grid-based heatmaps because the sharp, distinct boundaries make it easier to compare the values of adjacent cells, and outliers don’t get “smoothed away.” I also have a soft spot for their pixelated appearance as my first video games were Pong and Wolfenstein 3D.
In addition, kernel density heatmaps can be computationally expensive and sensitive to the input parameters. Their appearance is highly dependent on the chosen kernel function and its bandwidth or radius. A poor parameterization choice can either over-smooth or under-smooth the data, obscuring patterns.
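To see that bandwidth sensitivity concretely, here's a minimal 1D sketch using synthetic data and a hand-rolled Gaussian kernel (an illustration of the general idea, not this project's code):

```python
import numpy as np

# Synthetic "event locations": two tight clusters at -2 and +2.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2, 0.3, 200),
                          rng.normal(2, 0.3, 200)])

def kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at the grid points."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (
        len(samples) * bandwidth * np.sqrt(2 * np.pi))

grid = np.linspace(-4, 4, 81)
narrow = kde_1d(samples, grid, bandwidth=0.2)  # preserves both clusters
wide = kde_1d(samples, grid, bandwidth=2.0)    # blurs them into one bump
```

With the narrow bandwidth, the estimate shows a deep valley between the two clusters; with the wide bandwidth, the same data produce a single featureless mound centered between them. Grid-based binning sidesteps this parameter-tuning problem entirely.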
In this Quick Success Data Science project, we’ll use Python to make static, grid-based heatmaps for tornado activity in the continental United States.
The Dataset
For this tutorial, we’ll use tornado data from the National Oceanic and Atmospheric Administration’s wonderful public-domain database. Extending back to 1950, it covers tornado starting and ending locations, magnitudes, injuries, fatalities, financial costs, and more.
The data is accessible through NOAA’s Storm Prediction Center. The CSV-format dataset, highlighted yellow in the following figure, covers the period 1950 to 2023. Don’t bother downloading it. For convenience, I’ve provided a link in the code to access it programmatically.
This CSV file contains 29 columns for almost 69,000 tornadoes. You can find a key to the column names here. We’ll work primarily with the tornado starting locations (slat, slon), the year (yr), and the storm magnitudes (mag).
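To preview the kind of filtering we'll do, here's a toy sketch using those column names on a few made-up rows (the real file has 29 columns and tens of thousands of rows):

```python
import pandas as pd

# Illustrative rows using the NOAA column names we'll rely on
# (the values here are invented for demonstration):
df = pd.DataFrame({
    'yr':   [1999, 2011, 2013, 1965],
    'st':   ['OK', 'AL', 'OK', 'AK'],
    'mag':  [5, 4, 2, 3],
    'slat': [35.0, 33.4, 35.5, 61.2],
    'slon': [-97.6, -86.8, -97.9, -149.9],
})

# Keep only significant tornadoes (EF3-EF5) outside Alaska:
significant = df[df['mag'].isin([3, 4, 5]) & (df['st'] != 'AK')]
print(significant['yr'].tolist())  # → [1999, 2011]
```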
Installing Libraries
You’ll want to install NumPy, Matplotlib, pandas, and GeoPandas. The previous links provide installation instructions. We’ll also use the Shapely library, which is part of the GeoPandas installation.
For reference, this project used the following versions:
python 3.10.18
numpy 2.2.5
matplotlib 3.10.0
pandas 2.2.3
geopandas 1.0.1
shapely 2.0.6
The Simplified Code
It doesn’t take a lot of code to overlay a heatmap on a geographical map. The following code illustrates the basic process, although much of it serves ancillary purposes, such as limiting the US data to the Lower 48 states and improving the look of the colorbar.
In the next section, we’ll refactor and expand this code to perform additional filtering, employ more configuration constants for easy updates, and use helper functions for clarity.
# --- Plotting ---
print("Plotting results...")
fig, ax = plt.subplots(figsize=FIGURE_SIZE)
fig.subplots_adjust(top=0.85)  # Make room for titles.

# Plot state boundaries and heatmap:
clipped_states.plot(ax=ax, color='none',
                    edgecolor='white', linewidth=1)
vmax = np.max(heatmap)
img = ax.imshow(heatmap.T,
                extent=[x_edges[0], x_edges[-1],
                        y_edges[0], y_edges[-1]],
                origin='lower',
                cmap=cmap, norm=norm,
                alpha=1.0, vmin=0, vmax=vmax)

# Annotations:
plt.text(TITLE_X, TITLE_Y, 'Tornado Density Heatmap',
         fontsize=22, fontweight='bold', ha='left')
plt.text(x=SUBTITLE_X, y=SUBTITLE_Y, s=(
    f'{START_YEAR}–{END_YEAR} EF Magnitude 3–5 '
    f'{GRID_SIZE_MILES}×{GRID_SIZE_MILES} mile cells'),
    fontsize=15, ha='left')
plt.text(x=SOURCE_X, y=SOURCE_Y,
         s='Data Source: NOAA Storm Prediction Center',
         color='white', fontsize=11,
         fontstyle='italic', ha='left')

# Clean up the axes:
ax.set_xlabel('')
ax.set_ylabel('')
ax.axis('off')

# Post the colorbar:
ticks = np.linspace(0, vmax, 6, dtype=int)
cbar = plt.colorbar(img, ax=ax, shrink=0.6, ticks=ticks)
cbar.set_label('\nTornado Count per Grid Cell', fontsize=15)
cbar.ax.set_yticklabels(list(map(str, ticks)))

print(f"Saving plot as '{SAVE_FILENAME}'...")
plt.savefig(SAVE_FILENAME, bbox_inches='tight', dpi=SAVE_DPI)
print("Plot saved.\n")
plt.show()
Here’s the result:
The Expanded Code
The following Python code was written in JupyterLab and is described by cell.
Importing Libraries / Assigning Constants
After importing the necessary libraries, we define a series of configuration constants that allow us to easily adjust the filter criteria, map boundaries, plot dimensions, and more. For this analysis, we focus on tornadoes within the contiguous United States, filtering out states and territories outside this area and selecting only significant events (those rated EF3 to EF5 on the Enhanced Fujita Scale) from the complete dataset spanning 1950 to 2023. The results are aggregated to 50×50-mile grid cells.
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import numpy as np
from shapely.geometry import box
# --- Configuration Constants ---
# Data URLs:
TORNADO_DATA_URL = 'https://bit.ly/40xJCMK'
STATES_DATA_URL = ("https://www2.census.gov/geo/tiger/TIGER2020/STATE/"
                   "tl_2020_us_state.zip")
# Geographic Filtering:
EXCLUDED_STATES_ABBR = ['AK', 'HI', 'PR', 'VI']
TORNADO_MAGNITUDE_FILTER = [3, 4, 5]
# Year Filtering (inclusive):
START_YEAR = 1950
END_YEAR = 2023
# Coordinate Reference Systems (CRS):
CRS_LAT_LON = "EPSG:4326" # WGS 84 geographic CRS (lat/lon)
CRS_ALBERS_EQUAL_AREA = "EPSG:5070" # NAD83/Conus Albers (projected CRS in m)
# Box for Contiguous US (CONUS) in Albers Equal Area (EPSG:5070 meters):
CONUS_BOUNDS_MIN_X = -2500000
CONUS_BOUNDS_MIN_Y = 100000
CONUS_BOUNDS_MAX_X = 2500000
CONUS_BOUNDS_MAX_Y = 3200000
# Grid Parameters for Heatmap (50x50 mile cells):
GRID_SIZE_MILES = 50
HEATMAP_GRID_SIZE = 80500 # ~50 miles in meters.
# Plotting Configuration:
FIGURE_SIZE = (15, 12)
SAVE_DPI = 600
SAVE_FILENAME = 'tornado_heatmap.png'
# Annotation positions (in EPSG:5070 meters, relative to plot extent):
TITLE_X = CONUS_BOUNDS_MIN_X
TITLE_Y = CONUS_BOUNDS_MAX_Y + 250000 # Offset above max Y
SUBTITLE_X = CONUS_BOUNDS_MIN_X
SUBTITLE_Y = CONUS_BOUNDS_MAX_Y + 100000 # Offset above max Y
SOURCE_X = CONUS_BOUNDS_MIN_X + 50000 # Slightly indented from min X
SOURCE_Y = CONUS_BOUNDS_MIN_Y + 20000 # Slightly above min Y
Defining Helper Functions
The next cell defines two helper functions for loading and filtering the dataset and for loading and filtering the state boundaries. Note that we call on previous configuration constants during the process.
# --- Helper Functions ---
def load_and_filter_tornado_data():
    """Load data, apply filters, and create a GeoDataFrame."""
    print("Loading and filtering tornado data...")
    df_raw = pd.read_csv(TORNADO_DATA_URL)
    # Combine filtering steps into one chained operation:
    mask = (~df_raw['st'].isin(EXCLUDED_STATES_ABBR) &
            df_raw['mag'].isin(TORNADO_MAGNITUDE_FILTER) &
            (df_raw['yr'] >= START_YEAR) & (df_raw['yr'] <= END_YEAR))
    df = df_raw[mask].copy()
    # Build point geometries from the starting lat-lon columns and
    # project them to the equal-area CRS used for binning:
    gdf = gpd.GeoDataFrame(df,
                           geometry=gpd.points_from_xy(df['slon'],
                                                       df['slat']),
                           crs=CRS_LAT_LON)
    return gdf.to_crs(CRS_ALBERS_EQUAL_AREA)

def load_and_filter_states():
    """Load state boundaries and drop excluded states and territories."""
    print("Loading state boundary data...")
    states_temp_gdf = gpd.read_file(STATES_DATA_URL)
    states_temp_gdf = states_temp_gdf[
        ~states_temp_gdf['STUSPS'].isin(EXCLUDED_STATES_ABBR)]
    return states_temp_gdf.to_crs(CRS_ALBERS_EQUAL_AREA)
Note that the tilde (~) in front of the mask and states_temp_gdf expressions inverts the results, so we select rows where the state abbreviation is not in the excluded list.
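Here's a tiny standalone demonstration of that inversion, using a made-up Series of state abbreviations:

```python
import pandas as pd

EXCLUDED = ['AK', 'HI', 'PR', 'VI']
st = pd.Series(['TX', 'AK', 'OK', 'HI'])

# The tilde flips each True/False element-wise, so we KEEP
# rows whose abbreviation is NOT in the excluded list:
keep = ~st.isin(EXCLUDED)
print(keep.tolist())  # → [True, False, True, False]
```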
Running Helper Functions and Generating the Heatmap
The following cell calls the helper functions to load and filter the dataset, clips the data to the map boundaries, generates the heatmap (with NumPy's histogram2d() function), and defines a continuous colormap for the heatmap. Note that we again call on previous configuration constants during the process.
# --- Data Loading and Preprocessing ---
gdf = load_and_filter_tornado_data()
states_gdf = load_and_filter_states()

# Create bounding box and clip state boundaries & tornado points:
conus_bounds_box = box(CONUS_BOUNDS_MIN_X, CONUS_BOUNDS_MIN_Y,
                       CONUS_BOUNDS_MAX_X, CONUS_BOUNDS_MAX_Y)
clipped_states = gpd.clip(states_gdf, conus_bounds_box)
gdf = gdf[gdf.geometry.within(conus_bounds_box)].copy()

# --- Heatmap Generation ---
print("Generating heatmap bins...")
x_bins = np.arange(CONUS_BOUNDS_MIN_X, CONUS_BOUNDS_MAX_X +
                   HEATMAP_GRID_SIZE, HEATMAP_GRID_SIZE)
y_bins = np.arange(CONUS_BOUNDS_MIN_Y, CONUS_BOUNDS_MAX_Y +
                   HEATMAP_GRID_SIZE, HEATMAP_GRID_SIZE)

print("Computing 2D histogram...")
heatmap, x_edges, y_edges = np.histogram2d(gdf.geometry.x,
                                           gdf.geometry.y,
                                           bins=[x_bins, y_bins])

# Define continuous colormap (e.g., 'hot', 'viridis', 'plasma', etc.):
print("Defining continuous colormap...")
cmap = plt.cm.hot
norm = None

print("Finished execution.")
Because this process can take a few seconds, the print() function keeps us up to date on the program's progress:
Loading and filtering tornado data...
Loading state boundary data...
Generating heatmap bins...
Computing 2D histogram...
Defining continuous colormap...
Finished execution.
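Before moving on, here's a toy example (with made-up coordinates) showing how histogram2d() turns scattered points into the grid of counts we're calling heatmap:

```python
import numpy as np

# Hypothetical projected coordinates (meters) for four points:
x = np.array([10.0, 20.0, 20.0, 95.0])
y = np.array([5.0, 15.0, 18.0, 90.0])

# Two 50 m bins per axis give a 2x2 grid of cell counts:
edges = np.arange(0, 101, 50)  # [0, 50, 100]
counts, x_edges, y_edges = np.histogram2d(x, y, bins=[edges, edges])
print(counts)  # Three points share the lower-left cell; one sits upper-right.
```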
Plotting the Results
The final cell generates and saves the plot. Matplotlib's imshow() method is responsible for plotting the heatmap. For more on imshow(), see this article.
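One detail worth calling out: the code plots heatmap.T, not heatmap. That's because histogram2d() returns counts with the x bins along the first axis, while imshow() treats the first axis as image rows (the vertical direction), so a transpose is needed. A quick sketch:

```python
import numpy as np

x = np.array([0.5, 0.5, 0.5])  # three points sharing one x bin
y = np.array([0.5, 1.5, 2.5])  # spread across three y bins

counts, _, _ = np.histogram2d(x, y, bins=[[0, 1], [0, 1, 2, 3]])
print(counts.shape)    # (1, 3): x bins along the first axis
print(counts.T.shape)  # (3, 1): rows now map to y, as imshow() expects
```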
# --- Plotting ---
print("Plotting results...")
fig, ax = plt.subplots(figsize=FIGURE_SIZE)
fig.subplots_adjust(top=0.85)  # Make room for titles.

# Plot state boundaries and heatmap:
clipped_states.plot(ax=ax, color='none',
                    edgecolor='white', linewidth=1)
vmax = np.max(heatmap)
img = ax.imshow(heatmap.T,
                extent=[x_edges[0], x_edges[-1],
                        y_edges[0], y_edges[-1]],
                origin='lower',
                cmap=cmap, norm=norm,
                alpha=1.0, vmin=0, vmax=vmax)

# Annotations:
plt.text(TITLE_X, TITLE_Y, 'Tornado Density Heatmap',
         fontsize=22, fontweight='bold', ha='left')
plt.text(x=SUBTITLE_X, y=SUBTITLE_Y, s=(
    f'{START_YEAR}–{END_YEAR} EF Magnitude 3–5 '
    f'{GRID_SIZE_MILES}×{GRID_SIZE_MILES} mile cells'),
    fontsize=15, ha='left')
plt.text(x=SOURCE_X, y=SOURCE_Y,
         s='Data Source: NOAA Storm Prediction Center',
         color='white', fontsize=11,
         fontstyle='italic', ha='left')

# Clean up the axes:
ax.set_xlabel('')
ax.set_ylabel('')
ax.axis('off')

# Post the colorbar:
ticks = np.linspace(0, vmax, 6, dtype=int)
cbar = plt.colorbar(img, ax=ax, shrink=0.6, ticks=ticks)
cbar.set_label('\nTornado Count per Grid Cell', fontsize=15)
cbar.ax.set_yticklabels(list(map(str, ticks)))

print(f"Saving plot as '{SAVE_FILENAME}'...")
plt.savefig(SAVE_FILENAME, bbox_inches='tight', dpi=SAVE_DPI)
print("Plot saved.\n")
plt.show()
This produces the following map:
This is a beautiful map and more informative than a smooth KDE alternative.
Adding Tornado Ending Locations
Our tornado density heatmaps are based on the starting locations of tornadoes. But tornadoes move after touching down. The average tornado track is about 3 miles long, but when you look at stronger storms, the numbers increase. EF‑3 tornadoes average 18 miles, and EF‑4 tornadoes average 27 miles. Long-track tornadoes are rare, however, comprising less than 2% of all tornadoes.
Because the average tornado track is shorter than the 50-mile width of our grid cells, most tracks will span only one or two cells. Including the tornado ending locations should therefore improve our density measurement.
To do this, we’ll need to combine the starting and ending lat-lon locations and filter out endpoints that share the same cell as their corresponding starting points. Otherwise, we’ll “double-dip” when counting. The code for this is a bit long, so I’ve stored it in this Gist.
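The Gist's approach can be sketched roughly like this (toy coordinates and a simplified cell-index comparison of my own devising, not the Gist's exact code):

```python
import numpy as np

# Hypothetical projected coordinates (meters) for two tornado tracks:
starts = np.array([[10.0, 10.0], [60.0, 10.0]])
ends = np.array([[20.0, 15.0], [130.0, 20.0]])

cell = 50.0  # grid-cell size in the same units
start_cells = np.floor(starts / cell).astype(int)
end_cells = np.floor(ends / cell).astype(int)

# Keep an endpoint only if it falls in a different cell than its start,
# so a single track is never counted twice in one cell:
moved = np.any(start_cells != end_cells, axis=1)
combined = np.vstack([starts, ends[moved]])
print(len(combined))  # → 3 (two starts plus one distinct endpoint)
```

The first track starts and ends in the same cell, so its endpoint is dropped; the second track crosses into a new cell, so both its points are counted.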
Here’s a comparison of the “start only” map with the combined starting and ending locations:
The prevailing winds drive most tornadoes toward the east and northeast. You can see the impact in states like Missouri, Mississippi, Alabama, and Tennessee. These areas have brighter cells in the bottom map due to tornadoes starting in one cell and ending in the adjacent easterly cell. Note also that the maximum number of tornadoes in a given cell has increased from 29 (in the upper map) to 33 (in the bottom map).
Recap
We used Python, pandas, GeoPandas, and Matplotlib to project and overlay heatmaps onto geographical maps. Geospatial heatmaps are a highly effective way to visualize regional trends, patterns, hotspots, and outliers in statistical data.
If you enjoy these types of projects, be sure to check out my book, Real World Python: A Hacker’s Guide to Solving Problems with Code, available in bookstores and online.