about — stoop

Data sources

NYC Dept. of Buildings

DOB Complaints Received

dataset: eabe-havv

Every complaint filed with the DOB since 2007, including category, status, inspection dates, and disposition. Primary dataset behind the DOB risk level and neighborhood percentile ranking.

NYC Housing Preservation & Development

HPD Housing Maintenance Violations

dataset: wvxf-dwi5

Formally issued violations for housing maintenance code breaches. Classified by severity: Class C (immediately hazardous), B (hazardous), A (non-hazardous).

NYC Housing Preservation & Development

HPD Housing Maintenance Complaints

dataset: ygpa-z7cr

Complaints filed directly by tenants about housing conditions like heat, hot water, pests, mold, leaks, and more. Classified as Immediate Emergency, Emergency, or Non Emergency.

NYC Dept. of City Planning

NYC Building Footprints

dataset: 5zhs-2jue

Building polygons with roof height and footprint area, used to estimate total building scale for size-normalized comparisons, plus centroids for map placement.

NYC Dept. of City Planning

Neighborhood Tabulation Areas

dataset: NTA2020

2020 NTA polygon boundaries used for point-in-polygon assignment. Each building is placed in exactly one NTA, which defines its peer group for percentile ranking.

Data processing

Download & normalize

Each sync fetches only records filed since the previous run via the Socrata JSON API. Column names are mapped from the raw DOB headers (e.g., Date Entered → date_entered), date columns are parsed, and borough is derived from the first digit of each building's BIN (1 = Manhattan, 2 = Bronx, 3 = Brooklyn, 4 = Queens, 5 = Staten Island).
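The header mapping and borough derivation can be sketched as below. This is a minimal illustration, not the pipeline's actual code: the function names and the `bin` field name are assumptions, and date parsing is left out because the raw date format is not stated here.

```python
from typing import Dict, Optional

# Borough lookup keyed by the first digit of the BIN, as listed above.
BOROUGH_BY_BIN_DIGIT = {
    "1": "Manhattan",
    "2": "Bronx",
    "3": "Brooklyn",
    "4": "Queens",
    "5": "Staten Island",
}


def normalize_column(name: str) -> str:
    """Map a raw header such as 'Date Entered' to 'date_entered'."""
    return "_".join(name.strip().lower().split())


def borough_from_bin(bin_value: Optional[str]) -> Optional[str]:
    """Return the borough for a BIN, or None if the first digit is unrecognized."""
    text = (bin_value or "").strip()
    if not text:
        return None
    return BOROUGH_BY_BIN_DIGIT.get(text[0])


def normalize_row(raw: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Rename columns and add a derived 'borough' field (hypothetical field names)."""
    row: Dict[str, Optional[str]] = {normalize_column(k): v for k, v in raw.items()}
    row["borough"] = borough_from_bin(row.get("bin"))
    return row
```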

Deduplication

Some rows appear more than once in the raw export (e.g., when a complaint status is updated). Duplicates are resolved by keeping the last occurrence of each unique complaint_number, which reflects the most recent known status.
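A keep-last deduplication can be sketched with a plain dictionary. The function name is illustrative, and the sketch assumes rows arrive in export order so that the last occurrence is the most recent status.

```python
from typing import Dict, Iterable, List


def dedupe_keep_last(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the last row seen for each complaint_number."""
    latest: Dict[str, Dict[str, str]] = {}
    for row in rows:
        # A later row with the same complaint_number overwrites the earlier one.
        latest[row["complaint_number"]] = row
    return list(latest.values())
```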

BIN validation

BINs consisting entirely of zeros (e.g., 0000000) are treated as missing and excluded from scoring. Complaints with no valid BIN cannot be attributed to a specific building.
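As a rough sketch, the check amounts to rejecting empty values and all-zero strings. The function name is an assumption, and treating `None` or blank values as missing is an added assumption beyond the all-zeros rule stated above.

```python
from typing import Optional


def is_valid_bin(bin_value: Optional[str]) -> bool:
    """Return False for missing, blank, or all-zero BINs."""
    if bin_value is None:
        return False
    text = str(bin_value).strip()
    if not text:
        return False
    # Stripping every '0' leaves an empty string only for all-zero BINs.
    return text.strip("0") != ""
```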

Neighborhood assignment

Each building is assigned to an NTA via point-in-polygon using NYC's 2020 NTA boundaries. A spatial index (STRtree) is used for efficient bulk matching. Buildings that fall outside all NTA polygons (typically those with missing or slightly inaccurate coordinates) are excluded from neighborhood comparisons but still scored.
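The idea behind the assignment can be illustrated in pure Python. This sketch is a simplified stand-in, not the STRtree-based matching described above: a bounding-box pre-filter plays the role of the spatial index, polygons are plain lists of (x, y) vertices without holes, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]
BBox = Tuple[float, float, float, float]


def bounding_box(polygon: Polygon) -> BBox:
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_index(ntas: Dict[str, Polygon]) -> List[Tuple[str, BBox, Polygon]]:
    """Precompute bounding boxes once (stand-in for a real spatial index)."""
    return [(code, bounding_box(poly), poly) for code, poly in ntas.items()]


def assign_nta(point: Point, index: List[Tuple[str, BBox, Polygon]]) -> Optional[str]:
    """Return the first NTA containing the point, or None if it is outside all."""
    x, y = point
    for code, (min_x, min_y, max_x, max_y), poly in index:
        if min_x <= x <= max_x and min_y <= y <= max_y:
            if point_in_polygon(point, poly):
                return code
    return None
```

A building whose centroid returns `None` here corresponds to the "outside all NTA polygons" case: it would still be scored but left out of neighborhood comparisons.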

DOB complaint priority

Each of the 254 DOB complaint category codes is assigned a priority tier based on the DOB's own classification system (rev. 09/21). When a complaint's category is unknown or absent, it defaults to Priority C.

Imminent danger (weight 15)

Collapse risk, falling debris, blocked egress, gas leaks, elevator accidents

Active violation (weight 8)

Illegal work in progress, no permit, SRO conversion, sprinkler defects

Minor / administrative (weight 3)

Zoning non-compliance, certificate of occupancy issues, failure to maintain

Tracking / inspection (weight 1)

Routine inspections, contractor sign absent, inter-agency referrals
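The weight lookup with its Priority C default might look like the sketch below. It assumes the four tiers correspond to priorities A through D in the order listed; the category codes in the usage example are placeholders, not real DOB codes.

```python
from typing import Dict, Optional

# Assumed mapping: tiers listed above, in order, as priorities A-D.
DOB_PRIORITY_WEIGHTS = {"A": 15, "B": 8, "C": 3, "D": 1}
DEFAULT_PRIORITY = "C"  # used when the category is unknown or absent


def dob_weight(category_code: Optional[str], priority_by_category: Dict[str, str]) -> int:
    """Return the severity weight for a complaint category code."""
    priority = priority_by_category.get(category_code, DEFAULT_PRIORITY)
    return DOB_PRIORITY_WEIGHTS[priority]


# Hypothetical lookup table; real codes come from the DOB classification.
example_table = {"X1": "A", "X2": "D"}
print(dob_weight("X1", example_table))   # known category
print(dob_weight(None, example_table))   # absent category falls back to Priority C
```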

HPD violation severity

HPD Housing Maintenance Code violations are classified into four classes by severity. The weighted violation sum uses the same recency multipliers as the DOB weighted sum (see the recency multiplier under Building size normalization below), so recent serious violations weigh more than old minor ones.

Immediately hazardous (weight 15)

Lead paint, mold, heat failure, pest infestation, structural hazard

Hazardous (weight 8)

Broken locks, defective plumbing, missing smoke detectors, damaged floors

Non-hazardous (weight 3)

Peeling paint (non-lead), minor repairs, cosmetic defects

Informational (weight 1)

Administrative notices, permit-related items

HPD tenant complaint urgency

HPD tenant complaints are classified by urgency when filed. Because complaints are typically closed once an inspector visits or a violation is issued, raw open counts understate the building's history. The weighted complaint sum captures the full record — with higher weight for urgent and recent complaints.

Immediate Emergency (weight 15)

No heat in winter, gas leak, sewage backup, structural collapse risk

Emergency (weight 8)

Mold, pest infestation, water leak, broken elevator

Non Emergency (weight 3)

Cosmetic damage, minor repairs, general maintenance

Building size normalization

A 200-unit tower will naturally accumulate more complaints than a four-unit brownstone, so raw counts penalize larger buildings unfairly. To make comparisons meaningful, all weighted sums are divided by an estimate of building scale before peer ranking. Since we don't have the exact unit count for each building, size is estimated from the building's footprint and roof height.

Estimated scale

scale = footprint × max(height / 12, 1)

Footprint area (sq ft) from the building polygon multiplied by estimated floors (roof height ÷ 12 ft per floor). This approximates total floor area without needing unit counts.

Complaint density

density = (weighted sum / scale) × 10,000

Weighted complaint or violation sum divided by estimated scale, scaled to “per 10,000 sq-ft-floors.” A small building and a large building with proportional complaint histories get the same density.
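The two formulas above translate directly into code. The function names and the example numbers are illustrative only.

```python
FEET_PER_FLOOR = 12  # assumed floor height from the scale formula


def estimated_scale(footprint_sqft: float, roof_height_ft: float) -> float:
    """scale = footprint x max(height / 12, 1)."""
    floors = max(roof_height_ft / FEET_PER_FLOOR, 1)
    return footprint_sqft * floors


def density(weighted_sum: float, scale: float) -> float:
    """density = weighted sum / scale x 10,000."""
    return weighted_sum / scale * 10_000


# Two buildings with proportional histories: ten times the floor area,
# ten times the weighted sum, same density.
small = density(30, estimated_scale(2_000, 36))
large = density(300, estimated_scale(20_000, 36))
print(small, large)
```

The `max(..., 1)` term keeps very low structures from being treated as having less than one floor.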

Recency multiplier

≤ 2 years: full weight (1.0×)

2 – 5 years: half weight (0.5×)

5 – 10 years: quarter weight (0.25×)

> 10 years: excluded

Applied to all three datasets (DOB, HPD violations, HPD complaints). Complaints with no date recorded contribute nothing to the weighted sum.
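A sketch of the recency multiplier and the resulting weighted sum follows. The handling of the exact 5-year boundary and the 365.25-day year are assumptions, since the bands above do not say which side a record at exactly 5 years falls on.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumed conversion from days to years


def recency_multiplier(filed: Optional[date], today: date) -> float:
    """Return 1.0, 0.5, 0.25, or 0.0 depending on the record's age."""
    if filed is None:
        return 0.0  # undated records contribute nothing
    age_years = (today - filed).days / DAYS_PER_YEAR
    if age_years <= 2:
        return 1.0
    if age_years <= 5:
        return 0.5
    if age_years <= 10:
        return 0.25
    return 0.0  # older than 10 years: excluded


def weighted_sum(records: Iterable[Tuple[int, Optional[date]]], today: date) -> float:
    """Sum severity weight x recency multiplier over (weight, filed_date) pairs."""
    return sum(weight * recency_multiplier(filed, today) for weight, filed in records)
```

For example, a weight-15 record from last year, a weight-8 record from four years ago, and a weight-3 record from eight years ago contribute 15, 4, and 0.75 respectively.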

Size-normalized percentile

Each building's density is ranked via PERCENT_RANK() within its NTA, separately for HPD violations, HPD complaints, and DOB complaints. Buildings without footprint or height data receive a raw count percentile instead. A density percentile of 20 means the building has fewer weighted complaints per unit of scale than 80% of its residential neighbors.
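For readers unfamiliar with PERCENT_RANK(), the sketch below reproduces its standard definition, (rank − 1) / (rows − 1) with ties sharing the lowest rank, and applies it per NTA. The tuple layout and function names are assumptions, and the raw-count fallback for buildings without scale data is not shown.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percent_rank(values: List[float]) -> List[float]:
    """Mimic SQL PERCENT_RANK(): (rank - 1) / (n - 1), 0 for a single row."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    ordered = sorted(values)
    # bisect_left counts values strictly smaller, which equals rank - 1.
    return [bisect_left(ordered, v) / (n - 1) for v in values]


def percentile_by_nta(buildings: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Map each BIN to a 0-100 percentile among buildings in the same NTA.

    `buildings` holds (bin, nta_code, density) tuples (hypothetical layout).
    """
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for bin_id, nta, dens in buildings:
        groups[nta].append((bin_id, dens))
    result: Dict[str, float] = {}
    for members in groups.values():
        ranks = percent_rank([dens for _, dens in members])
        for (bin_id, _), rank in zip(members, ranks):
            result[bin_id] = 100 * rank
    return result
```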

Risk level

A building's neighborhood percentile is mapped to a risk level label shown on building pages and the map. The label reflects how the building compares to residential peers within the same neighborhood — not citywide.

Very low (< 15th percentile)

Fewer weighted complaints per unit of scale than ~85% of residential peers in the neighborhood.

Low (15th – 39th)

Below the neighborhood median.

Moderate (40th – 69th)

Near or above the neighborhood median.

High (70th – 89th)

More weighted complaints than most residential peers.

Very high (≥ 90th percentile)

Among the most complaint-heavy buildings in the neighborhood.

“Insufficient data” and “Not comparable” are handled separately — see below.
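The percentile bands can be expressed as a simple threshold function. This is a sketch: the function name is illustrative, and placing fractional percentiles such as 39.5 in the lower band is an assumption.

```python
def risk_level(percentile: float) -> str:
    """Map a 0-100 neighborhood percentile to its risk level label."""
    if percentile < 15:
        return "Very low"
    if percentile < 40:
        return "Low"
    if percentile < 70:
        return "Moderate"
    if percentile < 90:
        return "High"
    return "Very high"
```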

Special cases

Insufficient data

Buildings with fewer than 10 total complaints and less than 2 years of complaint history cannot be reliably ranked. They are excluded from percentile comparisons.

Not comparable

Buildings in non-residential NTAs — parks, airports, cemeteries, and similar areas (NTA type ≠ 0) — are excluded from percentile ranking because there are no meaningful residential peers to compare against.

Neighborhood comparisons

Neighborhood percentile

Percentile comparisons are neighborhood-relative, not absolute — a building is compared only to residential peers in its own NTA. Within each NTA, buildings are ranked by weighted complaint density from lowest to highest. A building at the 80th percentile has higher weighted complaint density than 80% of its residential peers — meaning it received relatively more or more serious complaints. Percentiles are computed independently per NTA, so the same density may rank high in one neighborhood and low in another.

Serious complaint rate

Priority A and B complaints per year, averaged over the last 10 years (minimum 1 year). This rate is also percentile-ranked within each NTA and surfaced in the “Severity” insight card on building pages.
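One plausible reading of this rate is sketched below: count Priority A and B complaints from the last 10 years and divide by the building's years of complaint history, clamped between 1 and 10. That interpretation of "minimum 1 year", the 365.25-day year, and all names are assumptions.

```python
from datetime import date
from typing import Iterable, Tuple

DAYS_PER_YEAR = 365.25
SERIOUS_PRIORITIES = {"A", "B"}


def serious_complaint_rate(complaints: Iterable[Tuple[str, date]], today: date) -> float:
    """Average Priority A+B complaints per year over up to 10 years of history.

    `complaints` holds (priority, filed_date) pairs for one building.
    """
    aged = [((today - filed).days / DAYS_PER_YEAR, priority) for priority, filed in complaints]
    if not aged:
        return 0.0
    # Years of history, clamped to the 1-10 year range.
    history_years = max(age for age, _ in aged)
    years = min(max(history_years, 1.0), 10.0)
    serious = sum(
        1 for age, priority in aged
        if priority in SERIOUS_PRIORITIES and 0 <= age <= 10
    )
    return serious / years
```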

Trend

Complaint trend compares the average annual rate of the last 2 years against the 3 years before that. A building is “worsening” if the recent rate exceeds the prior rate by more than 1 complaint per year, and “improving” if it is more than 1 lower. The same algorithm is applied independently to DOB complaints and HPD tenant complaints.
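The comparison can be sketched as follows. The "stable" label for the in-between case is a placeholder not named in the text, and the window boundaries use an assumed 365.25-day year.

```python
from datetime import date
from typing import Iterable

DAYS_PER_YEAR = 365.25


def complaint_trend(filed_dates: Iterable[date], today: date) -> str:
    """Compare the last 2 years' annual rate with the 3 years before that."""
    recent = 0
    prior = 0
    for filed in filed_dates:
        age = (today - filed).days / DAYS_PER_YEAR
        if 0 <= age <= 2:
            recent += 1
        elif 2 < age <= 5:
            prior += 1
    recent_rate = recent / 2  # complaints per year, last 2 years
    prior_rate = prior / 3    # complaints per year, the 3 years before
    if recent_rate - prior_rate > 1:
        return "worsening"
    if prior_rate - recent_rate > 1:
        return "improving"
    return "stable"  # hypothetical label for changes within +/- 1 per year
```

For instance, six complaints in the last 2 years (3 per year) against three in the prior 3 years (1 per year) is a difference of 2 per year, which counts as worsening.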

Leaderboards

The leaderboard pages rank buildings by complaint activity in the last 2 years, not all-time totals, so they reflect current conditions rather than accumulated history. Buildings need at least 10 total complaints to appear.

DOB — Building Safety

Sorted by DOB complaints filed in the last 2 years. Ties broken by serious complaints (Priority A+B) in the same window. Only residential buildings are included (non-residential NTAs excluded).

Primary sort: Complaints last 2yr

Tiebreaker: Priority A+B last 2yr

HPD — Housing Conditions

Sorted by HPD tenant complaints filed in the last 2 years. Ties broken by emergency complaints (Emergency + Immediate Emergency) in the same 2-year window — counting all emergency complaints regardless of whether they are still open.

Primary sort: Complaints last 2yr

Tiebreaker: Emergency complaints 2yr
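Both leaderboards follow the same pattern: filter by the 10-complaint minimum, then sort by the 2-year count with a tiebreaker. The sketch below assumes descending order (most complaints first) and uses hypothetical field names; `tiebreak_2yr` stands for Priority A+B complaints on the DOB board and emergency complaints on the HPD board.

```python
from typing import Dict, List, Union

Building = Dict[str, Union[str, int]]


def leaderboard(buildings: List[Building], min_total: int = 10) -> List[Building]:
    """Rank eligible buildings by recent complaints, then by the tiebreaker."""
    eligible = [b for b in buildings if b["total_complaints"] >= min_total]
    # Negate the counts so that larger values sort first.
    return sorted(
        eligible,
        key=lambda b: (-b["complaints_2yr"], -b["tiebreak_2yr"]),
    )
```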

Limitations

Complaint ≠ confirmed violation

DOB and HPD complaints are reports filed by the public or other agencies; they are not confirmed findings. HPD violations are formally issued after inspection and carry more weight. Scores reflect the full record of complaints and violations, not only confirmed outcomes.

Records begin in 2007

Electronic DOB complaint records are available from 2007 onwards. HPD violation and complaint records vary in depth. Complaints filed before the digital record period, or those never digitized, are not reflected in scores.

BIN matching

All data is attributed to buildings using the Building Identification Number (BIN). If a complaint or violation was filed with a missing or incorrect BIN, it will not appear on the correct building's page and is excluded from scoring.

Scale estimation

Building scale is estimated from footprint area and roof height. Buildings missing either value cannot be size-normalized and fall back to raw count percentiles within their NTA. Scale is a proxy — it does not account for unit density or occupancy.

Sync frequency

All datasets are refreshed periodically from NYC Open Data. There may be a lag of several days between a complaint being filed and it appearing here.

All data is sourced from NYC Open Data and is in the public domain.
