Data Retention Policy Guide
Data retention policy will become a major focus for CIOs in 2021. Here’s why:
First, enterprise organizations are producing larger volumes of data than ever before and utilizing enterprise data across a wider range of business processes and applications. To maximize its value, this data must be managed effectively throughout its entire life cycle - from collection and storage, through to usage, archiving, and eventually deletion.
As the volume, variety, and velocity of big data increase, cost-conscious organizations must be more selective about which data is retained, where it will be retained, and for how long. An effective data retention policy ensures that data is available for its intended applications, stored in a cost-effective way across its entire life cycle, and properly destroyed when it is no longer needed.
Second, lawmakers are continuing to introduce regulations that create new data retention obligations for enterprises operating in jurisdictions around the world. These include the data retention requirements in the European GDPR and industry-specific data retention requirements in HIPAA. An effective data retention policy is necessary to ensure ongoing compliance with data security, privacy, and retention laws that apply to your organization.
In this week’s blog post, we’re taking a closer look at the importance of data retention policies and how organizations can create and implement a data retention policy that supports key business processes and compliance objectives.
What is Data Retention?
Data retention is the practice of storing, archiving or otherwise retaining data to support internal business processes (e.g. analytics, auditing) and/or comply with external laws/regulations.
Data retention can be understood in the context of the data lifecycle, a model or roadmap for enterprise data utilization. This model has been described in different ways by various publications, but here’s our simplified version:
- Data Generation - Data is generated in the course of doing business.
- Data Collection - Data is collected by the organization.
- Data Storage - Data is stored by the organization in structured, unstructured, or semi-structured format.
- Data Processing - Data is cleaned, transformed, and normalized before it can be analyzed.
- Data Analysis/Utilization - Data is analyzed, visualized, or otherwise utilized by the organization.
- Data Archiving - Data is retained for future use in business processes or to comply with data retention laws.
- Data Destruction - Data is deleted or destroyed when it is no longer useful to the organization, and/or to comply with data retention laws.
Through the practice of data retention, organizations manage the activities of data archiving and data destruction in accordance with their business needs, objectives, and regulatory requirements.
What is a Data Retention Policy?
A data retention policy is a document that establishes requirements and guidelines within an organization for archiving, retaining, and destroying enterprise data. The policy should clarify questions such as:
- What categories or types of data are generated by the organization?
- Which departments or managers are in control of which data?
- What are the regulatory requirements (for every jurisdiction) for each type of data?
- Where should data be stored, archived, or retained?
- How long should data be retained by the organization?
- When should data be destroyed by the organization?
An effective data retention policy ensures that:
- Data is available and accessible when needed for internal business processes, such as analytics, reporting, or financial auditing.
- Data is stored in the most cost-effective format based on its intended purpose and how frequently it will be accessed.
- Data is retained for the appropriate time frame when required by law.
- Data is deleted or destroyed when it is no longer needed by the organization, or as required by law.
To achieve its intended purpose, a corporate data retention policy should account for the people, processes, and technologies required to ensure that enterprise data is archived and destroyed as needed to meet the organization’s business objectives and legal obligations.
Next, we’ll outline our six-step process for creating and implementing a data retention policy within your organization.
How to Create a Data Retention Policy
1. Identify Key Data Owners
Data is often siloed in departments - the sales team has ownership of sales data, the accounting department owns financial data, the HR department owns staffing data, the IT department owns log data, and so forth.
Creating a data retention policy begins with identifying key data owners within your organization, getting their buy-in, and assembling a project team that represents all data owners. Each department or data owner will be responsible for managing their data in compliance with the data retention policy, so it’s important to get everyone invested and involved in the process.
2. Sort Data into Categories
The next step in creating your data retention policy is to inventory your data. Make a list of all types of data generated by your organization. When your list is complete, sort the data into categories based on where the data is generated or its intended use.
Some common categories of enterprise data could include:
- Staffing Data (e.g. employee earnings, commissions, medical records)
- Payroll Data (e.g. time cards, payroll deductions, pension records)
- Transactional Data (e.g. purchases, invoices, payments)
- Event Log Data (e.g. application logs, system logs, security logs)
- Credit Card Data (e.g. Primary Account Number (PAN), cardholder name, service code)
- Insurance Data (e.g. policies, releases and settlements, claims)
- Project Documents (e.g. reports, correspondence, images)
- Corporate Data (e.g. annual reports, committee minutes, board minutes)
Each category of data you identify may be subject to different data retention laws and business requirements. To account for this, you’ll need to conduct research and implement individualized data retention policies for each type of data your organization collects.
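The inventory-and-sort step can be sketched in a few lines of Python. The data types and category names come from the list above; the `INVENTORY` structure and the `group_by_category` helper are hypothetical, and are only one possible way to record the mapping.

```python
from collections import defaultdict

# Hypothetical inventory: (data type, category) pairs drawn from the list above.
INVENTORY = [
    ("time cards", "Payroll Data"),
    ("payroll deductions", "Payroll Data"),
    ("invoices", "Transactional Data"),
    ("application logs", "Event Log Data"),
    ("security logs", "Event Log Data"),
]


def group_by_category(inventory):
    """Group data types under their category, preserving input order."""
    grouped = defaultdict(list)
    for data_type, category in inventory:
        grouped[category].append(data_type)
    return dict(grouped)


CATEGORIES = group_by_category(INVENTORY)
```

Keeping the inventory as flat pairs makes it easy to export from a spreadsheet; the grouped view is what you would carry into the per-category assessment.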

3. Conduct a Policy & Business Needs Assessment for Each Data Category/Type
Once you have identified and categorized your enterprise data, the next step is to conduct a policy assessment. For each data category, you will conduct research to determine the following:
Business Needs
- How are we using this data?
- Which data types are supporting internal business processes?
- Where should the data be stored?
- What is the useful life of the data?
- What should happen after that?
Compliance Needs
- What regulations or laws apply to data in this category?
- Which data types are affected?
- Where should the data be stored?
- What are the associated data retention requirements?
- What should happen after that?
In making these assessments, you will develop an understanding of how your organization is utilizing its data. You may even uncover some new and valuable use cases for the data you’re already collecting. Finally, you’ll be able to identify the specific data retention requirements that apply to your data - remember to check regulations in every jurisdiction where you operate!
Here’s how this might look for a data category we’re familiar with: event log data.
Business Needs Assessment Template
Data Category | Business Use Cases | Affected Data Types | Storage Location | Data Retention Period | Data Disposal Policy
Event Log Data | | All Types (Application logs, security logs, setup logs, system logs, user logs, routing logs, etc.) | Amazon S3 Cloud Storage | 90 days | Delete
Compliance Needs Assessment Template
Data Category | Compliance Requirements | Affected Data Types | Storage Location | Data Retention Period | Data Disposal Policy
Event Log Data | | Application, security, and user logs from systems containing ePHI. | Amazon S3 Cloud Storage | 6 years | Delete
Event Log Data | | Security logs | Amazon S3 Cloud Storage | 1 year | Delete
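When the business and compliance assessments produce different retention periods for the same data type, you need a rule for reconciling them. The sketch below shows one simple approach: keep the data for the longest period that applies. This rule is our assumption for illustration, and it only holds if no applicable regulation sets a maximum retention period. The retention figures come from the example tables for logs on a system containing ePHI; the dictionary names and the `effective_retention_days` helper are hypothetical.

```python
DAYS_PER_YEAR = 365  # simplification: ignores leap years

# Business retention periods from the example table, in days.
BUSINESS_DAYS = {
    "application logs": 90,
    "security logs": 90,
    "setup logs": 90,
}

# Compliance retention periods from the example table, in days.
# A data type can be covered by more than one requirement.
COMPLIANCE_DAYS = {
    "application logs": [6 * DAYS_PER_YEAR],
    "security logs": [6 * DAYS_PER_YEAR, 1 * DAYS_PER_YEAR],
}


def effective_retention_days(data_type):
    """Return the longest retention period that applies to a data type."""
    periods = [BUSINESS_DAYS[data_type], *COMPLIANCE_DAYS.get(data_type, [])]
    return max(periods)
```

Under these assumptions, security logs would be kept for six years, while setup logs, which have no compliance requirement in the example, would fall back to the 90-day business period.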
4. Write a Data Retention Policy for Each Data Type
At this point, you’ve categorized all of your enterprise data and investigated business use cases and regulatory compliance requirements for each data type. Based on the information you found, you’ve determined where the data should be stored, how long it should be retained, and if/when it should be destroyed.
Now you can start writing data retention policies for each type of data you collect.
Your data retention policy for each data type should include the following:
- Data Category
- Data Owner - The role/individual responsible for managing this data in compliance with the retention policy. Some or all data types in the same category may have the same data owner.
- Data Storage Location - The storage location for the data, e.g. on-prem servers, Amazon S3 buckets.
- Data Retention Period - The desired retention period for the data, based on your policy/business needs assessment.
- Data Disposal Policy - A disposal policy for the data, indicating whether it should be archived or destroyed at the end of its lifecycle.
Your data retention policy should also include general guidelines for things like revision histories and policy exemptions, as well as a communication plan for data retention issues. You may also want to document a plan for enforcing compliance with your data retention policy.
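The per-type policy fields listed above map naturally onto a small record type. Here is a minimal sketch, assuming retention is measured in days from the date the data was created; the `RetentionPolicy` class, its method names, and the example owner are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    data_category: str
    data_owner: str
    storage_location: str
    retention_days: int
    disposal_action: str  # e.g. "archive" or "delete"

    def disposal_date(self, created: date) -> date:
        """Date on which data created on `created` reaches end of retention."""
        return created + timedelta(days=self.retention_days)

    def is_due_for_disposal(self, created: date, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today >= self.disposal_date(created)


# Example values based on the event log row in the business needs table;
# the owner is an assumption for illustration.
EVENT_LOG_POLICY = RetentionPolicy(
    data_category="Event Log Data",
    data_owner="IT department",
    storage_location="Amazon S3",
    retention_days=90,
    disposal_action="delete",
)
```

Writing each policy down in a structured form like this makes it easier to review with data owners and to hand off to whatever automation enforces it.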
5. Establish SOPs for Retaining, Archiving, and Destroying Data
Standard operating procedures (SOPs) describe the processes and technologies that your organization will use to store, retain, archive, and destroy data in compliance with your documented data retention policy. Data retention policies may be executed by human operators or automated using software technology and services.
Public cloud vendors like AWS offer services that help organizations automate their data retention policies in the cloud. Two examples are Amazon S3 Intelligent-Tiering, which automatically moves objects between cost-optimized storage tiers based on access patterns, and Amazon S3 object expiration, a lifecycle feature that lets data owners schedule the deletion of objects in S3 buckets.
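As a rough illustration, here is how a 90-day expiration rule for log objects might be expressed with boto3, the AWS SDK for Python. The `build_lifecycle_rule` and `apply_lifecycle` helpers, the `logs/` prefix, and the 30-day tiering threshold are assumptions for this sketch; `put_bucket_lifecycle_configuration` is the boto3 call that sets a bucket's lifecycle rules.

```python
def build_lifecycle_rule(prefix, expire_after_days, tier_after_days=None):
    """Build one S3 lifecycle rule as a plain dict (no AWS call is made here)."""
    rule = {
        "ID": f"retain-{prefix.strip('/') or 'all'}-{expire_after_days}d",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Delete objects once they are older than the retention period.
        "Expiration": {"Days": expire_after_days},
    }
    if tier_after_days is not None:
        if tier_after_days >= expire_after_days:
            raise ValueError("tiering must happen before expiration")
        # Optionally move objects to Intelligent-Tiering before they expire.
        rule["Transitions"] = [
            {"Days": tier_after_days, "StorageClass": "INTELLIGENT_TIERING"}
        ]
    return rule


def apply_lifecycle(bucket, rules):
    """Apply the rules to a bucket. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the rule builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# Hypothetical rule: tier log objects after 30 days, delete them after 90.
LOG_RULE = build_lifecycle_rule("logs/", expire_after_days=90, tier_after_days=30)
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire existing lifecycle configuration, so in practice you would pass the complete set of rules for the bucket, not just the new one.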
New technologies like ChaosSearch can also be used to support a cloud data retention strategy. ChaosSearch is a cloud data platform that uses a proprietary indexing technology to greatly reduce the amount of storage required in S3 for a full, searchable representation of data, eliminating the need for additional data movement. ChaosSearch also has its own data retention features that can be used to augment your existing S3 retention policies and automation.
The ability to index, search, and analyze log data at scale with ChaosSearch means that organizations can retain their log data for longer periods while fully realizing its value through applications like security monitoring and cloud log analysis.
6. Implement Your Data Retention Policy
At this point, you should have everything in place to implement your data retention policy. The final step is to roll the policy out and work to ensure compliance throughout your organization.
You’ll need to communicate the new policies and expectations to department leaders and their teams, ensure that data owners understand any new responsibilities, explain the importance of compliance, implement any new processes or technologies required to support your data retention strategy, and provide additional training as needed.
Depending on the size of your organization, implementing your data retention policy can take years. We recommend focusing on your biggest compliance priorities first and finding quick wins that can help energize stakeholders and sustain momentum as you work through the implementation process.
A Final Word on Enterprise Data Retention
Big data retention is getting more complex, with organizations facing competitive pressures to maximize the value of their data and regulatory pressures to comply with a growing number of data retention laws.
By establishing and implementing a comprehensive data retention policy, organizations can drive down their data storage costs while maximizing the value of data and achieving compliance with local and international data retention laws and regulations.