The 'No Data Movement' Movement
Organizations are building data lakes, bringing raw data together from many systems in the hope of processing it and extracting differentiated value. Anyone who has tried to get value out of operational data, whether on-premises or in the cloud, understands the risks and costs of moving data from one environment to another. Businesses continuously move data from cheap temporary storage to siloed databases and data warehouses for critical analytical workloads. As part of this movement, companies ETL the data between environments, cleaning and preparing it along the way so that it lands in a more usable format.
The problems arise (as they inevitably will) when data volume grows, data composition changes and data pipelines fail. There is a real risk of data loss during this process, not to mention data security liability. For instance, GDPR restricts transfers of personal data outside the EU/EEA unless specific safeguards are in place, and many organizations face additional data residency requirements that keep data within a particular country or region. To say that movement is fraught with danger, at every step, is a painful understatement.
The obvious solution is to not move data. Store it once, as the source of truth, and extract value where it lives. Over the last several years cloud object storage, particularly Amazon S3, has become this system of record. And herein lies the problem and the reason for movement. Object storage is, frankly, dumb. Sure, it is cheap, durable, secure, and elastic, but to derive value, historically you’ve had to ETL the data into external and siloed data analytics solutions.
The simple (self-evident) answer is to turn storage such as S3 into a database for traditional analytic workloads. While easier said than done, this is exactly what ChaosSearch set out to achieve, without ever requiring customers to move data outside the boundaries of a data center, region or country. S3 offers strong security controls, including encryption and fine-grained access policies. It is designed for 11 nines (99.999999999%) of durability, and data can be replicated across buckets or regions for business continuity and disaster recovery. This keeps the data highly available.
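To put the durability figure quoted above in perspective, here is a small worked calculation. It is a simplified sketch that assumes 99.999999999% is an annual, per-object figure and that object losses are independent; the function names are illustrative only.

```python
from fractions import Fraction

# 99.999999999% (eleven nines), kept as an exact fraction to avoid
# floating-point rounding when subtracting from 1.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the simplified model."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)
```

Under these assumptions, storing 10,000,000 objects gives an expected loss of one object roughly every 10,000 years, which matches the way AWS itself describes the figure.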
ChaosSearch has a fundamental philosophy: object storage should be 'home base' and the 'source of truth' for all data, and that data should be 100% owned and controlled by the customer. With ChaosSearch, object storage can be transformed into a high-performance data analytics platform. To do this, we created the first UltraHot® solution with our patent-pending index technology. Based on this innovation, ChaosSearch created an intelligent compute fabric using AWS EC2, which runs in the same region as the customer’s S3 storage. Our service automates the process of taking raw S3 data, cataloguing it, indexing it, and storing the result back into the customer’s S3 bucket. All searches and queries are performed on this indexed data in the customer’s S3, providing fast, scalable, and extremely cost-effective access.
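The catalogue, index, and store-back flow described above can be illustrated with a toy sketch. This is not the ChaosSearch index technology: it is a plain inverted index, an in-memory dict stands in for an S3 bucket, and the key prefixes and function names are hypothetical. The point it shows is that the index lands next to the raw data and queries read only from that same store.

```python
from collections import defaultdict

# Hypothetical key layout inside the customer's bucket.
RAW_PREFIX = "raw/"
INDEX_PREFIX = "index/"


def index_in_place(bucket: dict[str, str]) -> None:
    """Catalogue raw objects, build an inverted index, and write both
    back into the same bucket. Raw objects are never moved or modified."""
    catalogue = []
    postings = defaultdict(set)  # token -> {(object key, line number)}
    for key in sorted(bucket):
        if not key.startswith(RAW_PREFIX):
            continue
        catalogue.append(key)
        for line_no, line in enumerate(bucket[key].splitlines()):
            for token in set(line.lower().split()):
                postings[token].add((key, line_no))
    bucket[INDEX_PREFIX + "catalogue"] = "\n".join(catalogue)
    for token, hits in postings.items():
        rows = (f"{key}\t{line_no}" for key, line_no in sorted(hits))
        bucket[INDEX_PREFIX + "term/" + token] = "\n".join(rows)


def search(bucket: dict[str, str], term: str) -> list[str]:
    """Return the raw lines containing `term`, located via the stored index."""
    entry = bucket.get(INDEX_PREFIX + "term/" + term.lower(), "")
    results = []
    for row in entry.splitlines():
        key, line_no = row.split("\t")
        results.append(bucket[key].splitlines()[int(line_no)])
    return results
```

Calling index_in_place(bucket) once and then search(bucket, "404") returns the matching log lines without copying any data out of the bucket, which is the property the architecture is built around.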
[Sample configurations of CloudFront log analysis]
In summary, data movement should be a serious consideration when designing a data analytics solution. Minimizing or eliminating data movement altogether should be a high priority for any compliance or security team within an organization. ChaosSearch offers a highly scalable, high-performance solution with an architecture that supports security and compliance, at a best-in-class price point.