ChaosSearch Blog - Tips for Wrestling Your Data Chaos

Getting Started with Cloudflare Logpush and CHAOSSEARCH

Written by Pete Cheslock | Jun 11, 2019

Cloudflare is one of the world’s largest cloud network platforms. They provide their customers with a robust content distribution network (CDN), the ability to protect their customers’ web properties from distributed denial-of-service (DDoS) attacks and illegitimate bot traffic, and a slew of advanced edge security features. Enterprise customers on the platform get access to all of the raw logs of their network traffic, which can be invaluable for debugging customer issues, investigating security incidents, and understanding user access and growth scaling patterns.

In the past, for an enterprise customer to get access to their logs, they would need to use the Logpull API to continually download their Cloudflare logs and send them off to another data storage location to be processed and indexed by a log management tool. Many of our customers who use Cloudflare push their downloaded logs to Amazon S3, and then use tools like the Logstash S3 input plugin to ingest them into their existing ELK stack. For companies with a large amount of traffic to their sites, this can quickly become cost-prohibitive.

Cloudflare now offers early access to its new Logpush API, which greatly simplifies the analysis of your Cloudflare logs. You can configure Cloudflare to automatically deliver gzipped JSON files of your access logs directly to either Amazon S3 or Google Cloud Storage. This removes the need to run code that continually downloads logs and re-uploads them to an object store, but it doesn’t solve the cost and complexity of storing all of this data in a hot Elasticsearch cluster.
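As a rough illustration, a Logpush job’s destination is expressed as a URI-style string pointing at your bucket. The sketch below shows the general shape for S3 and Google Cloud Storage destinations; the bucket name, path, and region are placeholders, and the exact syntax is defined by Cloudflare’s Logpush documentation, not by this post.

```python
# Sketch only: build Logpush-style destination strings for the two
# supported object stores. Bucket name, path, and region are made up.
bucket = "my-cloudflare-logs"

s3_dest = f"s3://{bucket}/http_requests?region=us-east-1"
gcs_dest = f"gs://{bucket}/http_requests"

print(s3_dest)
print(gcs_dest)
```

You pass a string like this when creating the job, and Cloudflare then writes compressed log objects under that path on a rolling schedule.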

With CHAOSSEARCH, we integrate with your Cloudflare log bucket within minutes, index your data, and write those indexes back into YOUR Amazon S3 account - providing you with the Elasticsearch API and a fully integrated Kibana interface, without ever having to move your Cloudflare logs.

You can get all of this set up in just four simple steps. First, enter the name of the bucket in your AWS account where you plan to have your logs land.

Then, in your S3 bucket’s settings, copy and paste the bucket policy given to you by Cloudflare. This policy allows Cloudflare to write your log files to the bucket.
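For reference, a policy of this kind grants Cloudflare’s log-delivery principal write access to your bucket and nothing more. The fragment below is a generic sketch of that shape - the actual principal ARN comes from the Cloudflare dashboard (shown here as a placeholder), and the bucket name is invented:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudflareLogpushWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "<principal-arn-from-cloudflare-dashboard>" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-cloudflare-logs/*"
    }
  ]
}
```

Always use the exact policy Cloudflare generates for you rather than hand-writing one, since the principal it specifies is what authorizes the log deliveries.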

As an added security measure, Cloudflare will write a special file containing an ownership token to your bucket. Simply copy and paste the token back into Cloudflare to prove that you own the bucket.

Finally, select which data fields you would like to include in the logs. Since CHAOSSEARCH has advanced compression technology and will index all fields by default, there is no storage penalty for enabling all of them.

Now that the Logpush service is enabled, we’ll start seeing our logs land in our Amazon S3 account, grouped together under a single prefix per day.
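What lands in the bucket is gzip-compressed, newline-delimited JSON: one log record per line. The sketch below simulates one such downloaded object and parses it back; the field names (ClientIP, EdgeResponseBytes) are common Cloudflare HTTP request fields, but the exact field set is whatever you enabled in the Logpush configuration, and the file name here is invented.

```python
import gzip
import json
import os
import tempfile

# Simulate a delivered Logpush object: newline-delimited JSON, gzipped.
records = [
    {"ClientIP": "203.0.113.7", "EdgeResponseBytes": 5120},
    {"ClientIP": "198.51.100.9", "EdgeResponseBytes": 1024},
]
path = os.path.join(tempfile.gettempdir(), "20190611.log.gz")
with gzip.open(path, "wt") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Reading it back is just gunzip plus one json.loads per line.
with gzip.open(path, "rt") as f:
    parsed = [json.loads(line) for line in f]

print(len(parsed), parsed[0]["ClientIP"])  # 2 203.0.113.7
```

This line-per-record layout is what makes the files easy to index incrementally as they arrive.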

From here, I can go from raw data to insights within minutes by integrating the CHAOSSEARCH service with these logs in my Amazon S3 account. First, I create an object group - a logical grouping of all the logs I want included in an index. In this case, the only logs in this bucket are my Cloudflare logs, so I can leave the discovery regex as the default wildcard.

CHAOSSEARCH will automatically identify this data as gzipped JSON and let you configure the bucket for live indexing. CHAOSSEARCH uses the native Amazon SQS notifications that S3 emits on every PUT request Cloudflare makes. This means that as objects land in your bucket, CHAOSSEARCH continually indexes the data as soon as Cloudflare writes each file.
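The wiring behind this is a standard S3 event notification: the bucket is configured to publish a message to an SQS queue whenever an object is created. A generic sketch of that notification configuration is below - the queue ARN, account ID, and configuration name are placeholders, and the actual setup is handled for you during onboarding:

```json
{
  "QueueConfigurations": [
    {
      "Id": "cloudflare-logpush-to-indexer",
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-index-queue",
      "Events": ["s3:ObjectCreated:Put"]
    }
  ]
}
```

Because the trigger is the PUT itself, indexing latency is bounded by how often Cloudflare flushes log files, not by any polling interval.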

Because the CHAOSSEARCH platform separates storage from compute, we can scale up indexing immediately and process huge amounts of data without the legacy constraints of a distributed database like Elasticsearch. As a result, we can now go into the fully integrated Kibana interface within CHAOSSEARCH and start getting answers from our data.

I can now create a series of visualizations to help me understand my usage patterns, such as identifying the IPs that most requests are coming from.

I can quickly see the top user agents hitting my site.

And in the event of a more targeted attack on my corporate web properties, I can see which IP addresses are consuming the most data from the edge.
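The three questions above - top source IPs, top user agents, and bytes served per IP - are simple aggregations over the log records. A minimal sketch over synthetic records, using a few common Cloudflare HTTP request field names (ClientIP, ClientRequestUserAgent, EdgeResponseBytes):

```python
from collections import Counter

# Synthetic records for illustration; real values come from the parsed logs.
logs = [
    {"ClientIP": "203.0.113.7", "ClientRequestUserAgent": "curl/7.64.0", "EdgeResponseBytes": 5120},
    {"ClientIP": "203.0.113.7", "ClientRequestUserAgent": "Mozilla/5.0", "EdgeResponseBytes": 2048},
    {"ClientIP": "198.51.100.9", "ClientRequestUserAgent": "curl/7.64.0", "EdgeResponseBytes": 512},
]

# Which IPs do most requests come from?
top_ips = Counter(r["ClientIP"] for r in logs).most_common(1)

# Which user agents hit the site most often?
top_agents = Counter(r["ClientRequestUserAgent"] for r in logs).most_common(1)

# Which IPs pull the most bytes from the edge?
bytes_by_ip = Counter()
for r in logs:
    bytes_by_ip[r["ClientIP"]] += r["EdgeResponseBytes"]

print(top_ips[0])                      # ('203.0.113.7', 2)
print(top_agents[0])                   # ('curl/7.64.0', 2)
print(bytes_by_ip.most_common(1)[0])   # ('203.0.113.7', 7168)
```

In Kibana these are terms aggregations and a sum metric; the point is that the underlying computation is straightforward once the raw fields are indexed.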

I can use native Kibana functionality to group all of these visualizations together in a single dashboard - giving my engineers quick access to all of these insights.

Due to the power of the CHAOSSEARCH platform, we can go from raw data directly to insights within minutes. We don’t need to spend any time creating index mappings or figuring out the appropriate schema layout. No longer do you need to move your data into an ELK stack, or ETL it into a data analytics platform. CHAOSSEARCH provides you the ability to retain and query an unlimited amount of data for an unlimited amount of time, leveraging the power and cost-effectiveness of Amazon S3.

If you are a Cloudflare customer looking to get more insight from your logs, or to rein in the cost of your growing Elasticsearch cluster, reach out today for a free trial and see how quickly you can get answers from your data.