Purpose
The purpose of this guideline is to outline the UA standards for de-identifying data. To protect the privacy and security of data subjects and to maintain the confidentiality of other sensitive information, data (whether from human subjects or of other types) may need to be de-identified before it is used in academic, research, business, or operational functions. For example, sensitive data may need to be stripped of identifying information before it is used for research purposes, institutional effectiveness studies, operational efficiency, public safety, or information security, or before it is released to other entities.
A de-identified data set is a data set that meets both of the following conditions:
- It does not identify any individual who is a subject of the data.
- It does not provide any reasonable basis for identifying any individual who is a subject of the data.
Acceptable De-identification Methods
The University of Alabama has adopted the methods below for de-identification:
Safe Harbor
The safe harbor method of de-identification consists of removing all unique identifiers that could be used to identify the individual or the individual’s relatives, household members, or employers (when applicable), to ensure that the remaining information cannot be used for identification.
Unique Identifiers
- Names
- All geographic subdivisions smaller than a State, including:
  - Street address
  - City
  - County
  - Precinct
  - Zip code[1]
- All elements of dates (except year) for dates directly related to the individual, including:
  - Birth date
  - Admission date
  - Discharge date
  - Date of death
  - All elements of dates (including year) for individuals over 89 years old[2]
- Telephone numbers
- Fax numbers
- Social security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Email addresses
- Social media profile names (or handles)
- Web Uniform Resource Locators (URLs)
- Internet Protocol (IP) address numbers
- Device identifiers and serial numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Biometric identifiers, including finger and voice prints
- Full-face photographs and any comparable images
- Any other unique identifying number, characteristic, or code
In addition to the removal of unique identifiers, there should be reasonable assurance that the individual or entity intending to use the data does not have actual knowledge that the remaining information could be used, alone or in combination with any reasonably available information, to identify an individual who is a subject of the data. Other details that may lead to the identification of an individual include initials, circumstances associated with the care of an individual, highly publicized details, and profession or occupation.
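The identifier-removal step of the safe harbor method can be sketched in Python as follows. This is a minimal illustration, assuming each record is a dictionary whose keys are the hypothetical field names in `SAFE_HARBOR_FIELDS`; a real data set will use its own column names. Dropping date and zip code fields outright is stricter than safe harbor requires (the notes below allow the year and, in some cases, the first three zip code digits to remain), and no code can replace the judgment about other identifying details and actual knowledge described above.

```python
from typing import Any

# Hypothetical field names standing in for the safe harbor identifiers above.
# Map your data set's real column names onto this set before use.
SAFE_HARBOR_FIELDS = {
    "name", "street_address", "city", "county", "precinct", "zip_code",
    "birth_date", "admission_date", "discharge_date", "date_of_death",
    "phone", "fax", "ssn", "medical_record_number",
    "health_plan_beneficiary_number", "account_number", "license_number",
    "email", "social_media_handle", "url", "ip_address",
    "device_id", "vehicle_id", "biometric_id", "photo",
}


def remove_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with every listed identifier field dropped."""
    return {key: value for key, value in record.items()
            if key not in SAFE_HARBOR_FIELDS}
```

For example, `remove_identifiers({"name": "Jane Doe", "ssn": "123-45-6789", "major": "Biology"})` returns `{"major": "Biology"}`. The original record is left unchanged, so the identified copy must still be secured separately.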
[1] Zip codes and their equivalent geocodes must be removed, except for the initial three digits of a zip code if, according to current publicly available data from the Bureau of the Census, (1) the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000.
[2] All ages over 89 and all elements of dates (including year) indicative of such ages must be removed, except that such ages and elements may be aggregated into a single category of age 90 or older.
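The zip code, date, and age rules above can be expressed as small generalization functions. This is a sketch under stated assumptions: `zip3_population` is a hypothetical mapping from three-digit prefixes to population counts that would have to be built from current Census Bureau data, prefixes missing from that mapping are treated conservatively as having 20,000 or fewer people, and the `age` passed to `generalize_date` is assumed to be the individual's age.

```python
from datetime import date
from typing import Optional


def generalize_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Keep the first three digits only if that area has more than 20,000 people.

    zip3_population is a hypothetical lookup built from current Census data.
    """
    prefix = zip_code[:3]
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"  # areas of 20,000 or fewer (or unknown population) become 000


def generalize_date(d: date, age: int) -> Optional[int]:
    """Keep only the year of a date; drop the year too if the individual is over 89."""
    if age > 89:
        return None  # even the year could indicate an age over 89
    return d.year


def generalize_age(age: int) -> str:
    """Aggregate all ages over 89 into a single '90 or older' category."""
    return "90 or older" if age > 89 else str(age)
```

Returning `None` from `generalize_date` signals that the value must be removed entirely; the individual's age can then be reported only through the "90 or older" category.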
Masking
The masking method of de-identification consists of replacing sensitive or personally identifiable information with other characters, such as pound symbols (#) or asterisks (*), to hide a value like a name or social security number.
- Example
  - Unmasked SSN: 123-45-6789
  - Masked SSN: ###-##-####
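A masking function that produces the output shown in the example might look like the following minimal sketch. The function name `mask` is hypothetical, and keeping separators such as hyphens is one possible design choice rather than a UA requirement.

```python
def mask(value: str, mask_char: str = "#") -> str:
    """Replace every letter and digit with mask_char, keeping separators like '-'."""
    return "".join(mask_char if ch.isalnum() else ch for ch in value)
```

`mask("123-45-6789")` returns `"###-##-####"`, and `mask("Jane Doe", "*")` returns `"**** ***"`. Keeping separators preserves the format of the value; if the length or format is itself revealing, replacing the whole value with a fixed string is safer. Masking is not reversible, so it suits cases where the original value is never needed again.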
Record Coding
The record coding method of de-identification consists of replacing identifiers with unique, temporary codes. A record code must not be derivable from related information about the individual, and the link between a code and the record it replaces is accessible only to authorized parties with the authority to re-identify records. Individuals may not disclose any information about how these codes are generated or assigned that would allow a recipient to identify an individual based on a code.
- Example
  - An original data set containing CWIDs as the sole means to identify individual records:
    - Unique identifier with CWID: 12345678
  - A data set with record coding applied to unique identifiers:
    - Unique identifier with record code: 3s8b#c4?lm7
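Record coding can be sketched with Python's `secrets` module, which draws from a cryptographically secure random source, so a code carries no information derived from the CWID. The function name `assign_record_codes`, the code length, and the alphabet are illustrative assumptions (the example code above also uses symbols such as `#` and `?`). The returned crosswalk is the only link back to the original records and would need to be stored separately, with access limited to parties authorized to re-identify records.

```python
import secrets
import string

# Illustrative alphabet for record codes (an assumption, not a UA standard).
ALPHABET = string.ascii_lowercase + string.digits


def assign_record_codes(cwids: list[str], length: int = 11) -> dict[str, str]:
    """Map each CWID to a unique random code that cannot be derived from the CWID.

    The returned crosswalk must be stored apart from the coded data set and
    shared only with parties authorized to re-identify records.
    """
    crosswalk: dict[str, str] = {}
    used: set[str] = set()
    for cwid in cwids:
        if cwid in crosswalk:
            continue  # the same individual keeps the same code within a data set
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if code not in used:  # guarantee each code is unique
                break
        used.add(code)
        crosswalk[cwid] = code
    return crosswalk
```

Because the codes are random, running the function again produces a new set of codes, which fits the temporary nature of record codes: each release can receive its own codes, so coded data sets cannot be linked to one another without the crosswalks.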
Tokenization
The tokenization method of de-identification consists of substituting data with random tokens while retaining the mapping between each token and its original value, so the substitution can be reversed to retrieve the data and re-identify records if needed.
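Tokenization can be sketched as a small in-memory token vault. The `TokenVault` class and its methods are hypothetical; a production system would keep the mapping in secured, access-controlled storage rather than in memory, and would restrict who may call `detokenize`.

```python
import secrets


class TokenVault:
    """Minimal in-memory token vault (illustrative only, not secure storage)."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the token for value, creating a new random token if needed."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)  # 16 random hex characters
        while token in self._token_to_value:  # avoid the rare collision
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Recover the original value; only authorized parties should do this."""
        return self._token_to_value[token]
```

Unlike masking, the substitution is reversible through `detokenize`. Returning the same token for a repeated value keeps records linkable within the data set without exposing the original value.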