Data Notes

The National Housing Preservation Database (NHPD) provides users with the best assumption of subsidy status based on nationally available data sources. The accuracy, format, completeness, and update frequency of each data source vary by program and have changed over time. Data for Section 8 Project Based Rental Assistance (PBRA) and US Department of Housing and Urban Development (HUD) Insured subsidies are updated monthly, while data for all other subsidy programs are updated once a year or less often. Programs primarily tracked at the local level, such as Low Income Housing Tax Credit (LIHTC) subsidies, often have insufficient information to determine subsidy status at the national level. For instance, properties with LIHTC subsidies can be kept affordable by state Housing Finance Agencies (HFAs) despite passing their extended use period or being listed as non-programmatic in HUD's LIHTC Database. Until September 2017, the true subsidy end date for Section 515 Direct Loans was not available, and the Restriction Clause Expiration Date was used instead. It is therefore possible that many Section 515 Direct Loans were incorrectly classified as inactive in versions of the database before 2017.

Each record in the database describes a property. While each property can consist of multiple buildings and addresses, only the first address listed is displayed in the database. This may hinder searches by property address. Searching for individual property addresses can be particularly challenging for public housing properties. Public housing properties are reported in Asset Management Projects (AMPs), which tend to be larger groups of buildings that can be scattered across a larger area than properties in other subsidy programs.

The number of active subsidies reported in the NHPD can change over time as the format of data files changes, as subsidy programs expand or are phased out, and as the logic applied to determine subsidy status is updated. For each data refresh, the notes below describe the procedure for incorporating new data, the resulting changes in the number of active subsidies, and factors that affect the comparability of data over time.

Data Refresh September 2017

The September 2017 data refresh incorporated 12 new file updates, a new matching protocol, and changes to how subsidy status is classified for all subsidies. These changes affect comparisons over time with earlier versions of the NHPD. Since many subsidies are either missing information on the subsidy end date or are not continually monitored by the government agencies that provide data on these programs, ‘inconclusive’ was added as a third subsidy status category. This change gives researchers more control over how conservative their estimates of active and inactive affordable housing properties are. Researchers can now choose to report inconclusive properties separately or re-categorize inconclusive subsidies using the subsidy status description field, which describes why a subsidy is labeled inactive or inconclusive. The numbers of active LIHTC, Section 515, Section 8 PBRA, and State HFA 236 subsidies were most affected by the change in the subsidy status rule. Many subsidies missing key information, such as subsidy end date, that were previously classified as active are now classified as inconclusive. Separately, the number of HUD insured mortgages increased sharply between 2016 and 2017 due to the inclusion of a handful of HFA risk sharing mortgage programs that were not previously tracked by the NHPD. Meanwhile, the large change in the number of active Section 538 subsidies can be attributed to a data update for this program, whose previous data were more than a few years out of date. To view a version of the Data Dictionary that lists all changes side by side, click here.
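As a rough illustration of the choice described above, the sketch below counts subsidies under a conservative rule (active only) and an inclusive rule (active plus inconclusive). The record layout, the `status` key, and the sample records are assumptions made for this example; they are not the NHPD's actual field names or data.

```python
from collections import Counter

# Hypothetical records; the "status" key and its values are assumptions
# for illustration, not actual NHPD field names.
subsidies = [
    {"program": "LIHTC", "status": "Active"},
    {"program": "LIHTC", "status": "Inconclusive"},
    {"program": "LIHTC", "status": "Inactive"},
    {"program": "Section 515", "status": "Active"},
    {"program": "Section 515", "status": "Inconclusive"},
]


def count_active(records, include_inconclusive=False):
    """Count subsidies treated as active under the chosen rule."""
    counts = Counter(r["status"] for r in records)
    total = counts["Active"]
    if include_inconclusive:
        total += counts["Inconclusive"]
    return total


conservative = count_active(subsidies)  # active only
inclusive = count_active(subsidies, include_inconclusive=True)
print(conservative, inclusive)
```

Reporting both figures gives a lower and an upper bound on the number of active subsidies, which is one way to make the treatment of inconclusive records explicit.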

Number of Active Subsidies Before and After September 2017 Refresh
Subsidy Type | Active Subsidies Before Refresh | Active Subsidies After Refresh | Active and Inconclusive Subsidies After Refresh | Change in Number of Active Subsidies | Change in Number of Active and Inconclusive Subsidies
Section 8 PBRA* | 21,574 | 19,989 | 22,970 | -7% | 6%
Section 202 Direct Loans* | 1,721 | 1,533 | 1,538 | -11% | -11%
State HFA 236 | 206 | 35 | 181 | -83% | -12%
HUD Insured* | 5,826 | 7,896 | 7,896 | 36% | 36%
LIHTC* | 37,865 | 33,352 | 41,384 | -12% | 9%
HOME* | 20,639 | 18,461 | 18,461 | -11% | -11%
Section 515* | 14,719 | 14,080 | 14,092 | -4% | -4%
Section 538** | 424 | 748 | 748 | 76% | 76%
Public Housing* | 6,890 | 6,778 | 6,778 | -2% | -2%
Total | 109,864 | 105,410 | 116,586 | -4% | 6%

*Data updated (previous data outdated by less than one year)
**Data updated (previous data outdated by more than one year)
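The change columns in the table above appear to be simple percent changes relative to the pre-refresh count, rounded to a whole percent. The sketch below reproduces that arithmetic for three rows; the rounding rule and the function name `percent_change` are assumptions for this example, since the source does not state how the percentages were computed.

```python
def percent_change(before, after):
    """Percent change from before to after, rounded to a whole percent."""
    return round(100 * (after - before) / before)


# Counts copied from the table above:
# (before refresh, active after, active + inconclusive after)
rows = {
    "Section 8 PBRA": (21574, 19989, 22970),
    "State HFA 236": (206, 35, 181),
    "Section 538": (424, 748, 748),
}

for name, (before, active, active_or_inconclusive) in rows.items():
    print(
        name,
        percent_change(before, active),
        percent_change(before, active_or_inconclusive),
    )
```

Comparing the two percentages for a program shows how much of an apparent decline in active subsidies is due to reclassification as inconclusive.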

Changes to the Database in September 2017

In September 2017, the NHPD was updated to incorporate new variables, prepackaged extracts, and enhanced user features. Data processing updates were also made to improve data accuracy and reduce the number of duplicate properties in the database. To learn more about these changes, click here.

Data Refresh February 2016

The February 2016 refresh incorporated ten data file updates, which led to a significant change in the total number of active subsidies. The numbers of active HOME Investment Partnerships Program (HOME), Section 202, and Section 515 subsidies changed significantly because the data for these programs were outdated by more than three years. This refresh reflects subsidies that were awarded or terminated during that time. Additionally, the HOME data used previously did not include subsidies awarded to properties with fewer than five units. The number of Section 202 Direct Loans decreased significantly because some owners prepaid their mortgages and no new Section 202 Direct Loans were awarded. Section 202 assistance is now awarded through rental assistance contracts, which are tracked under the Section 8 PBRA program.

Number of Active Subsidies Before and After February 2016 Refresh
Subsidy Type | Before Refresh | After Refresh | Change
Section 8 PBRA* | 21,733 | 21,574 | -1%
Section 202 Direct Loans** | 2,231 | 1,721 | -23%
State HFA 236 | 214 | 206 | -4%
HUD Insured* | 5,551 | 5,826 | 5%
LIHTC* | 34,220 | 37,865 | 11%
HOME** | 6,451 | 20,639 | 220%
Section 515** | 10,877 | 14,719 | 35%
Section 538 | 422 | 424 | 0%
Public Housing** | 6,813 | 6,890 | 1%
Total | 90,218 | 109,864 | 22%

*Data updated (previous data outdated by less than one year)
**Data updated (previous data outdated by more than one year)

Data Refresh February 2015

The February 2015 data refresh incorporated five new data files, an update to the logic used to determine which HUD Insurance programs would be included in the database, and an update on how LIHTC subsidy status is calculated. Previously, all properties that were HUD Insured were included in the database. During this refresh, selected HUD Insurance programs that are not affiliated with affordable rental housing were excluded from the database. This change resulted in a reduction of the number of active units in the database by one million. For a list of HUD programs included in the database, see the Data Dictionary.

Additionally, LIHTC subsidies listed as non-programmatic in the LIHTC Database were reclassified as inactive, which resulted in a significant reduction in the number of active LIHTC subsidies.

Number of Subsidies Before and After February 2015 Refresh
Subsidy Type | Before Refresh | After Refresh | Change
Section 8 PBRA* | 21,227 | 21,733 | 2%
Section 202 Direct Loans | 2,266 | 2,231 | -2%
State HFA 236 | 228 | 214 | -6%
HUD Insured* | 13,964 | 5,551 | -60%
LIHTC* | 37,184 | 34,220 | -8%
HOME | 6,595 | 6,451 | -2%
Section 515 | 11,001 | 10,877 | -1%
Section 538 | 422 | 422 | 0%
Public Housing | 6,813 | 6,813 | 0%
Total | 101,407 | 90,218 | -11%

*Data updated (previous data outdated by less than one year)
**Data updated (previous data outdated by more than one year)

Data Integration and Cleaning Protocol

Procedure for Integrating New or Updated Data

The NHPD is updated three times a year. At these times, any updates made to source datasets by the dataset originator (such as the Department of Housing and Urban Development) are imported into the Database. A list of all data sources included in the NHPD and their most recent update date can be found here.

As data quality and data format vary by data source, and each data source may contain duplicate property entries, automated procedures have been created to standardize imported data and reduce the number of incorrect or duplicate entries in the NHPD. During the import process, data inconsistencies that cannot be corrected through the automated process are flagged for manual cleaning. Manual cleaning takes place at each of the three annual data updates and on an ongoing basis. The procedures used for both automated and manual cleaning are described below.

Automated Cleaning Procedures

Automated cleaning procedures center on correcting property addresses and latitude and longitude values, as these fields are the primary matching keys for identifying and linking all of a property’s subsidies. As property addresses are imported into the Database, they are standardized according to USPS standard address protocols, and the import process attempts to remove extraneous characters or words. Likewise, property names are standardized and the process attempts to remove extraneous characters. These procedures improve the rate of positive property matches between data sources.
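To give a sense of what this kind of standardization involves, here is a minimal sketch that uppercases an address, strips stray punctuation, collapses whitespace, and abbreviates a few common street suffixes. It is not the NHPD's actual procedure and covers only a small subset of USPS conventions; the function name `standardize_address` and the suffix table are assumptions for this example.

```python
import re

# A small, illustrative subset of USPS street suffix abbreviations.
SUFFIXES = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
    "ROAD": "RD",
}


def standardize_address(raw: str) -> str:
    """Return an uppercase, punctuation-free, whitespace-normalized address."""
    text = raw.upper()
    # Replace anything that is not a letter, digit, space, hyphen, slash, or #.
    text = re.sub(r"[^A-Z0-9\s\-/#]", " ", text)
    # Splitting on whitespace also collapses repeated spaces.
    tokens = text.split()
    # Abbreviate any token that exactly matches a known suffix word.
    tokens = [SUFFIXES.get(token, token) for token in tokens]
    return " ".join(tokens)


print(standardize_address("100  Main Street."))
print(standardize_address("100 main st"))
```

Both inputs normalize to the same string, which is what allows records from different sources to be compared directly.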

Once addresses are standardized, they are matched and compared to existing address records in the database using subsidy ID. If the address is new or has changed, it is entered into an address verification system. The system currently used is based on US Postal Service Coding Accuracy Support System (USPS CASS) certification provided by SmartyStreets and Melissa Data. These address verification systems return latitude, longitude, and other geographic information about the area in which the property is located. Geocodes from Melissa Data and SmartyStreets are kept only for building level property matches. If the address verification systems do not return a match, the latitude and longitude provided by the originating data source are used if available.
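The fallback order described above can be sketched as follows. The shape of the verification result (a dict with `lat`, `lon`, and `precision` keys) is hypothetical and does not reflect the actual response format of any verification service. The sketch also assumes that a match below building level is treated the same as no match, which the text does not state explicitly.

```python
from typing import Optional, Tuple

Coordinates = Tuple[float, float]


def choose_coordinates(
    verified: Optional[dict],
    source_coords: Optional[Coordinates],
) -> Optional[Coordinates]:
    """Pick the coordinates to store for a property.

    `verified` is a hypothetical verification result: None when no match
    was returned, otherwise a dict with "lat", "lon", and "precision".
    """
    if verified is not None and verified.get("precision") == "building":
        return (verified["lat"], verified["lon"])
    # No building level match: fall back to the source data, if available.
    return source_coords


match = {"lat": 33.45, "lon": -112.07, "precision": "building"}
print(choose_coordinates(match, (33.0, -112.0)))
print(choose_coordinates(None, (33.0, -112.0)))
```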

After addresses have been standardized and verified using SmartyStreets and Melissa Data, they are matched to comparable addresses in the database using the following fields:

  • Property address, city, and state
  • Property name, city, state, zip code, and total units +/- 2
  • Property ID
  • Latitude and longitude
  • Referencing subsidy ID

Any subsidy record that matches based on these fields is considered to be awarded to the same property. If two records with different property addresses are matched by these rules, an admin user selects which property address to display in the database. Any property records missing street address, city, state, total units, latitude, or longitude are flagged for manual review and are withheld from the database until these fields can be populated.
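The matching and flagging rules above can be sketched roughly as below. The sketch assumes that each bullet is an alternative rule (a match on any one of them links two records) and that latitude and longitude must match exactly; neither detail is stated in the text. All dictionary keys and sample values are hypothetical.

```python
REQUIRED_FIELDS = ("address", "city", "state", "total_units", "latitude", "longitude")


def _same(a, b, *keys):
    """True when every key is populated in record a and equal in both records."""
    return all(a.get(k) not in (None, "") and a.get(k) == b.get(k) for k in keys)


def is_same_property(a, b):
    """Apply the matching rules as alternatives: any one rule is enough."""
    units_close = (
        a.get("total_units") is not None
        and b.get("total_units") is not None
        and abs(a["total_units"] - b["total_units"]) <= 2
    )
    return (
        _same(a, b, "address", "city", "state")
        or (_same(a, b, "name", "city", "state", "zip") and units_close)
        or _same(a, b, "property_id")
        or _same(a, b, "latitude", "longitude")
        or _same(a, b, "referencing_subsidy_id")
    )


def missing_required_fields(record):
    """List required fields that are empty, so the record can be flagged."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


# Hypothetical sample records for illustration only.
first = {"name": "OAK APARTMENTS", "address": "100 MAIN ST", "city": "PHOENIX",
         "state": "AZ", "zip": "85001", "total_units": 50,
         "latitude": 33.45, "longitude": -112.07}
second = dict(first, address="102 MAIN ST", total_units=52,
              latitude=33.46, longitude=-112.08)
print(is_same_property(first, second))
print(missing_required_fields({"address": "100 MAIN ST", "state": "AZ"}))
```

Here the two sample records have different street addresses but are still linked by the name, city, state, zip code, and unit count rule, which is the situation in which an admin user would choose the address to display.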

Manual Cleaning Procedures

Several types of data issues lead to manual review and cleaning. First, all properties that do not CASS certify with a valid USPS address in the automated cleaning process are flagged for manual review. These properties are checked using Google Maps to validate the address and are manually cross-checked against the NHPD to ensure that there are no duplicate properties in the database once the address is updated. Any changes made to property and subsidy records in the database are retained after subsequent data updates. Several common address errors and their corresponding cleaning protocols are listed below.

Incomplete or Incorrect Address:

  • Case 1: Address is incomplete and does not contain a house number. (Ex. Main St.)
  • Case 2: Address is a set of cross streets. (Ex. 5th and Vine)
  • Case 3: Address contains no street address. (Ex. apartment name or city is repeated in street address line)
  • Case 4: Address contains misspellings. (Ex. 100 Mairn St., Phonix, AZ)
  • Case 5: Address contains incorrect information. (Ex. 100 Main St., Phoenix, AR)
  • Case 6: Address contains a range of street numbers. (Ex. 1-100 Main St., Phoenix, AR)

Solution: The correct property address is researched by searching Google for the apartment name and location to identify the official address and by using Google Maps to verify the address and identify a corresponding building footprint. Once a correct address is found, it is cross-checked against the NHPD to ensure that a duplicate property with the correct address is not present. If a duplicate is found, the subsidy information for the duplicate property is merged. Each corrected address is CASS certified using SmartyStreets. If the corrected address does not CASS certify, the latitude and longitude provided by Google Maps are entered into the record. If an address is too incomplete to identify a property’s location, the property is flagged as ‘incomplete’ and remains flagged for cleaning. It cannot be updated until more information is received from the source data.

  • Case 7: Address does not offer a mail receptacle.

Solution: The address is viewed on Google Maps to determine whether the building footprint is visible. If the property address is confirmed, the latitude and longitude provided by Google Maps are entered into the record. If the footprint cannot easily be confirmed, the latitude and longitude provided by the source data are retained and the property remains flagged.
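Although the review described above is manual, a few of the address patterns listed (missing house number, cross streets, and number ranges) could in principle be pre-screened with simple pattern checks before a person looks at them. The following is only an illustrative sketch; the function name `screen_address`, its labels, and its regular expressions are assumptions and are not part of the NHPD protocol.

```python
import re


def screen_address(street: str) -> str:
    """Return a rough label for a street address line."""
    s = street.strip()
    # A house number is taken to be leading digits followed by whitespace.
    has_house_number = re.match(r"^\d+\s", s) is not None
    # Leading "number-number" pattern, e.g. "1-100 Main St."
    if re.match(r"^\d+\s*-\s*\d+\s", s):
        return "range"
    # "X and Y" or "X & Y" with no house number, e.g. "5th and Vine"
    if not has_house_number and re.search(r"\s(and|&)\s", s, flags=re.IGNORECASE):
        return "cross_streets"
    if not has_house_number:
        return "no_house_number"
    return "ok"


for example in ["Main St.", "5th and Vine", "1-100 Main St.", "100 Main St."]:
    print(example, "->", screen_address(example))
```

Checks like these cannot detect misspellings or incorrect city and state values, which still require the lookup steps described in the solutions above.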

Second, all properties that have received a comment from users or staff through databasefeedback@preservationdatabase.org are flagged for manual review. Comments may pertain to incorrect address information as described above, indicate that a property is a duplicate, state that a property’s name has changed, or raise other data issues that require change and verification. Several common duplicate and name discrepancy scenarios and their corresponding cleaning protocols are listed below.

Property is a Duplicate Entry:

  • Case 1: Properties with same street name and city/state, same or similar name, unit count is +/- 2.
  • Case 2: Properties with same property name, city/state, unit count +/- 2, different street address.

Solution: The main property address is validated as described above. Then the subsidies located at the duplicate property are attached to the property that has the valid address or the greater number of subsidies.

Property Name is Incorrect or Has Changed:

  • Case 1: Property name on Google Maps is different.
  • Case 2: Property name on official website is different.

Solution: The correct name is verified using Google and Google Maps, and the name in the database is replaced.

  • Case 3: Property name is correct, but can be linked to a duplicate property with different address and different property name (most likely from a new data source).

Solution: The correct name is verified using Google and Google Maps. The subsidy from the older dataset with the incorrect name is linked to the property with the updated name from the newer dataset.

Third, properties may be flagged for manual review if there are major inconsistencies in property information between data sources, but the property can be linked to another property from a different data source through the use of (HUD) IDs. Likewise, if there are major inconsistencies in property information from one update to the next in a particular data source, the property is flagged for manual review. Staff attempt to verify the discrepancies; if they cannot be verified, one source is chosen based on staff’s assessment of data quality and the radical nature of the change.