Data Refinery

Clean, prepare, and transform data directly in your Amazon S3 data lake.

The Chaos Refinery® software platform cleans, prepares, and virtually transforms data, allowing users to interact with information programmatically and visually as needed.


Clean, Prepare and Transform Data

No data movement outside of Amazon S3. Change your schema and its associated data programmatically and dynamically, on the fly, and interact with information visually as needed, without the cost or complexity of additional services.

Create Logical Indexes and Views

Not only can you transform a field into multiple new fields to query and aggregate, but you can also adjust the schema, changing fields from strings to integers, allowing you to search on ranges. Create an entirely new view into your data, within seconds, all without ever having to spend time and money reindexing your data.
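As a conceptual illustration of what such a virtual transform does to a single record, here is a sketch in plain Python. The field names are hypothetical, and ChaosSearch applies transforms like this at query time rather than by rewriting the data:

```python
# Conceptual sketch of a virtual schema transform (hypothetical field names).
# ChaosSearch applies transforms like this at query time, without reindexing;
# this plain-Python version only illustrates the shape of the operation.

def transform(record):
    """Split one field into several new fields and cast strings to integers."""
    out = dict(record)
    # Split a combined field into multiple new, queryable fields.
    host, port = out.pop("endpoint").split(":")
    out["host"] = host
    out["port"] = int(port)            # string -> integer enables range queries
    out["status"] = int(out["status"]) # likewise searchable as a numeric range
    return out

raw = {"endpoint": "10.0.0.5:8080", "status": "404", "path": "/login"}
print(transform(raw))
```

Because the transform is a view over the original objects, changing it (or adding another one) takes effect immediately instead of triggering a reindex.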

  1. Select data sources
  2. Filter selected sources
  3. Generate source parsing
  4. Define indexing life-cycle
"With ChaosSearch, we are able to quickly create materialized views that effectively and accurately parse our various log formats."
Jimmy McDermott, CTO at Transeo
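The four steps above can be pictured as a single view definition. The following sketch maps one key to each step; none of these keys or values reflect the actual ChaosSearch API, they are purely illustrative:

```python
# Hypothetical sketch of the four-step view-creation workflow.
# Each key maps onto one step: select, filter, parse, life-cycle.
view_definition = {
    "sources": ["s3://logs-bucket/app/", "s3://logs-bucket/nginx/"],  # 1. select data sources
    "filter": {"prefix": "2024/", "suffix": ".log.gz"},               # 2. filter selected sources
    "parsing": {"format": "json", "timestamp_field": "ts"},           # 3. generate source parsing
    "lifecycle": {"retention_days": 90},                              # 4. define indexing life-cycle
}
print(sorted(view_definition))
```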

Materialize Data

Users can shape attributes (i.e., remove or move them), change attribute schema (i.e., built-in transforms), sort by multiple attributes (i.e., composite keys), and correlate two indexes (i.e., inner joins).
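To make those four operations concrete, here is a plain-Python sketch over hypothetical log records; ChaosSearch performs the equivalent operations virtually, at query time:

```python
# Plain-Python sketch of the four materialization operations named above,
# on hypothetical data (ChaosSearch performs these virtually).
requests = [
    {"id": 2, "user": "bob",   "ms": "120", "debug": "trace"},
    {"id": 1, "user": "alice", "ms": "45",  "debug": "trace"},
    {"id": 1, "user": "alice", "ms": "200", "debug": "trace"},
]
users = {"alice": "EU", "bob": "US"}  # second "index" to correlate with

shaped  = [{k: v for k, v in r.items() if k != "debug"} for r in requests]  # shape: drop an attribute
typed   = [dict(r, ms=int(r["ms"])) for r in shaped]                        # transform: string -> int
ordered = sorted(typed, key=lambda r: (r["id"], r["ms"]))                   # sort by composite key
joined  = [dict(r, region=users[r["user"]]) for r in ordered]               # inner join on "user"
print(joined)
```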

Eliminate Data Movement

No longer do you need to extract, transform, and load (ETL) data into a log analysis solution just to get better insights into your logs. Instead, you simply push your log and event data into your Amazon S3 data lake and leverage the Chaos Refinery® to perform the virtual transformations each of your users needs.
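Shipping log data into the lake is then the only pipeline step left. Below is a sketch using boto3 (a real AWS SDK); the bucket, prefix, and helper names are hypothetical, and the actual upload call is shown but commented out since it requires AWS credentials:

```python
# Sketch of shipping log batches into an S3 data lake (boto3 is the real AWS
# SDK; bucket/prefix/helper names here are hypothetical). The Refinery then
# transforms the data virtually in place, so no downstream ETL is needed.
import datetime
import gzip
import json

def object_key(prefix, service, ts):
    """Build a time-partitioned S3 key for a batch of log events."""
    return f"{prefix}/{service}/{ts:%Y/%m/%d}/{ts:%H%M%S}.json.gz"

def log_batch_body(events):
    """Serialize events as gzipped newline-delimited JSON."""
    lines = "\n".join(json.dumps(e) for e in events)
    return gzip.compress(lines.encode())

# Upload (requires AWS credentials; not executed in this sketch):
# import boto3
# boto3.client("s3").put_object(
#     Bucket="my-log-lake",
#     Key=object_key("raw-logs", "api", datetime.datetime.now(datetime.timezone.utc)),
#     Body=log_batch_body(events),
# )
```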


Stop Reindexing and Sharding

Legacy search technologies, like Elasticsearch, require reindexing your data whenever you want to change the schema. Add a new field, reindex. Change a field type, reindex. Change your mapping, reindex. Correct a mistake in your mapping, reindex. This is time-consuming and a nuisance with only a few hundred gigabytes of data, and it becomes time- and cost-prohibitive with tens or hundreds of terabytes. And as your index grows, you need to shard it across nodes, which requires capacity planning and cluster growth. All of these moving parts demand time, resources, and money.