Scaling Disaster Response at Nextdoor

Sean Bromage
Nextdoor Engineering
Apr 29, 2016

Nextdoor, at its core, is a communications platform. We not only emphasize communication within your neighborhood, but also build tools that enable municipal agencies to reach members of their community.

Recent events, such as the flooding in Houston, have highlighted the utility of one of our core features: the Urgent Alert.

Sending an Urgent Alert allows agencies to target individual neighborhoods with critical information. When an alert goes out, Nextdoor members are notified via text message, push notification and/or email, and the conversation then continues within their neighborhood and nearby neighborhoods.

Delivering Urgent Alerts quickly and reliably, especially in large cities, takes substantial technical infrastructure. We’ve had to design an architecture that stays rock solid in these times of need.

The Challenge

Municipal agencies on Nextdoor can reach upwards of 1M households in dense urban areas. Because an Urgent Alert is, by definition, urgent, our infrastructure must be able to absorb sudden spikes in activity. Several requirements shape an architecture designed for this use case:

  • Timing is unpredictable
  • Scope can be very large
  • Messages must be delivered
  • Latency must be kept to a minimum
  • Site load times can’t be affected

Implementation

All content routing on Nextdoor depends on the member’s location. We chose PostgreSQL as our database system because it performs well when executing geospatial queries.

When an agency is created, a large geospatial query runs in the background to find the intersection of the agency’s geometry with the geometries of household plots (read more about how we store and quickly access these geometries in our writeup on Atlas). The results are stored in a separate table for quick lookup when determining which members should receive an alert.

The City of Houston’s geometry makes for complex geospatial queries
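
The precomputation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Nextdoor’s implementation. It reduces each household plot to a single representative point and replaces the database’s polygon-intersection query with an in-memory ray-casting point-in-polygon test. The names `point_in_polygon` and `build_agency_lookup`, along with the sample data, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]         # vertices of a simple polygon, in order


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: a point is inside if a ray toward +x crosses an odd number of edges."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        # This also skips horizontal edges, so the division never divides by zero.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_agency_lookup(
    agencies: Dict[str, Polygon], households: Dict[str, Point]
) -> Dict[str, Set[str]]:
    """Precompute agency_id -> household_ids once, so alert routing is a plain lookup."""
    return {
        agency_id: {hid for hid, pt in households.items() if point_in_polygon(pt, boundary)}
        for agency_id, boundary in agencies.items()
    }


# Hypothetical data: one square agency boundary and two households.
agencies = {"agency_a": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]}
households = {"h1": (5.0, 5.0), "h2": (15.0, 5.0)}
lookup = build_agency_lookup(agencies, households)  # {"agency_a": {"h1"}}
```

Because the lookup is built once when the agency is created, sending an alert only requires a table (here, dictionary) read instead of a fresh geospatial query.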

When an alert is created, a series of asynchronous tasks is created and sent to an array of Taskworker machines. These tasks include creating and sending text messages, push notifications and emails. To give these tasks priority over the other types of tasks we run (tens of millions per day), we use priority queues.
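
Below is a minimal sketch of the fan-out step, which assumes each recipient has a set of enabled delivery channels. The post does not describe how tasks are represented, so the `DeliveryTask` type, the channel names and `fan_out_alert` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class DeliveryTask:
    alert_id: str
    member_id: str
    channel: str             # hypothetical channel names: "sms", "push", "email"
    priority: str = "high"   # Urgent Alert deliveries go on the high-priority queue


def fan_out_alert(alert_id: str, recipients: Dict[str, Set[str]]) -> List[DeliveryTask]:
    """Expand one alert into one task per (member, channel) pair.

    Each task is small and independent, so any Taskworker can pick it up,
    and one failed delivery can be retried without resending the others.
    """
    return [
        DeliveryTask(alert_id, member_id, channel)
        for member_id in sorted(recipients)
        for channel in sorted(recipients[member_id])
    ]


tasks = fan_out_alert("alert-1", {"m2": {"email"}, "m1": {"sms", "push"}})
# -> m1/push, m1/sms, m2/email, all with priority "high"
```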

Every async task that a Nextdoor engineer creates must be assigned a priority. Taskworkers respect these priorities by giving resources to higher-priority tasks first. Priorities also let us tune alerting around the Service Level Agreement (SLA) assigned to each type of task, so that on-call engineers are notified when a queue is being consumed unusually slowly. In descending order, the priorities are high, default, low and large. High is reserved for very important communication events that need sub-second latency, such as Urgent Alerts.
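
The sketch below shows one way to honor the high, default, low and large ordering. It keeps one FIFO queue per priority and adds a simple SLA check based on the age of the oldest waiting task. This is an in-process illustration, not the Taskworker implementation. The class name, the timestamps and the SLA values are assumptions.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple

# Highest priority first, in the order described above.
PRIORITIES = ("high", "default", "low", "large")


class PriorityQueues:
    """One FIFO queue per priority level; higher levels are always drained first."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Tuple[float, Any]]] = {p: deque() for p in PRIORITIES}

    def push(self, priority: str, task: Any, enqueued_at: float) -> None:
        if priority not in self._queues:
            raise ValueError(f"unknown priority: {priority!r}")
        self._queues[priority].append((enqueued_at, task))

    def pop(self) -> Optional[Tuple[str, Any]]:
        # Strict priority: a lower queue is served only when all higher queues are empty.
        for p in PRIORITIES:
            if self._queues[p]:
                _, task = self._queues[p].popleft()
                return p, task
        return None

    def depth(self, priority: str) -> int:
        return len(self._queues[priority])

    def sla_breaches(self, now: float, sla_seconds: Dict[str, float]) -> List[str]:
        """Priorities whose oldest waiting task has waited longer than its (hypothetical) SLA."""
        breached = []
        for p in PRIORITIES:
            q = self._queues[p]
            if q and p in sla_seconds and now - q[0][0] > sla_seconds[p]:
                breached.append(p)
        return breached


q = PriorityQueues()
q.push("low", "rebuild-digest", enqueued_at=0.0)
q.push("high", "send-urgent-sms", enqueued_at=1.0)
q.push("default", "send-welcome-email", enqueued_at=2.0)
```

Strict ordering like this is also why lower-priority work can starve during a spike. The `large` queue is served only once every queue above it is empty.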

As millions of tasks flood the Taskworker array, lower-priority tasks can be starved of resources. We use RightScale to monitor metrics for our arrays and have set up autoscaling triggers: if the queue depth of any priority queue grows too high, more machines spin up automatically. To protect our main database from so many machines competing for resources at once, we’ve created a fleet of read replicas. These replicas process queries quickly without adding load to the main DB that would otherwise affect site load times.
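
The autoscaling trigger can be sketched as a pure function that maps queue depths to a number of machines to launch. This is not RightScale’s API or configuration format. The thresholds, the `backlog_per_machine` sizing rule and the cap are hypothetical values chosen for illustration.

```python
import math
from typing import Dict


def scale_up_count(
    queue_depths: Dict[str, int],
    depth_thresholds: Dict[str, int],
    backlog_per_machine: int,
    max_new_machines: int,
) -> int:
    """How many Taskworker machines to launch, given per-priority queue depths.

    Each queue above its threshold asks for enough machines to absorb its excess
    backlog. The total is capped so that a single spike cannot launch an unbounded fleet.
    """
    if backlog_per_machine <= 0:
        raise ValueError("backlog_per_machine must be positive")
    wanted = 0
    for priority, depth in queue_depths.items():
        threshold = depth_thresholds.get(priority)
        if threshold is not None and depth > threshold:
            wanted += math.ceil((depth - threshold) / backlog_per_machine)
    return min(wanted, max_new_machines)


# Hypothetical numbers: the high queue is 24,000 tasks over its threshold.
depths = {"high": 25_000, "default": 500, "low": 0}
thresholds = {"high": 1_000, "default": 10_000, "low": 50_000}
print(scale_up_count(depths, thresholds, backlog_per_machine=5_000, max_new_machines=20))
```

A cap matters for the reason given above: every new Taskworker adds load on the database, which is why read traffic goes to replicas.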

As tasks are processed, we send text messages through Twilio’s API from a short code, which allows higher send rates, and we generate our own push notifications and emails.
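
A short code still has a finite send rate, so a sender typically throttles itself to stay under that rate. The post does not describe how Nextdoor paces its Twilio requests. The token bucket below is a generic, hypothetical sketch with made-up rates, and it does not call Twilio’s API.

```python
class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float) -> None:
        if rate_per_sec <= 0 or capacity < 1:
            raise ValueError("rate must be positive and capacity at least 1")
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst can go out immediately
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Consume one token if one is available; False means the caller should wait."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limit: 2 messages per second, with bursts of up to 2.
bucket = TokenBucket(rate_per_sec=2.0, capacity=2.0, now=0.0)
sent = [bucket.try_acquire(now=0.0) for _ in range(3)]  # [True, True, False]
```

Passing the current time in explicitly keeps the limiter deterministic and easy to test. A real sender would pass a monotonic clock reading.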

Use In the Field

The Houston Office of Emergency Management had little time to react and mobilize when some communities received more than 17 inches of rain in less than 24 hours. Houston officials used Nextdoor for Public Agencies to reach their residents during the severe flooding:

Actual Urgent Alerts sent by the Houston Office of Emergency Management
Green dots represent Urgent Alert recipients in Houston, TX

The messages provided up-to-the-minute details to keep residents out of harm’s way. There are currently more than 1,400 agencies nationwide using Nextdoor to connect and communicate with their residents.

By building asynchronous tasks, priority queues and autoscaling Taskworker arrays into Nextdoor’s infrastructure, we can handle the sudden, large volumes of messages that municipal agencies broadcast.
