Amazon Web Services, the cloud computing subsidiary of the Internet retail giant, recently announced Amazon Kinesis, a fully managed service for real-time processing of high-volume data streaming into Amazon's Web-based repositories. The announcement was made at the Re:Invent conference held in Las Vegas.
Amazon Kinesis allows a customer to store and process terabytes of data an hour from hundreds of thousands of sources as the data is coming in. This allows developers to create applications that act on real-time data, such as Web site traffic, marketing and financial transactions, social media feeds, or logs.
Kinesis is capable of accepting any amount of data, from any number of sources, scaling up and down as needed. The client library handles load balancing, coordination, and error handling in the background, so the developer needs to focus only on processing the data as it becomes available.
The Amazon Kinesis Client Library is written entirely in Java. All a developer has to do is add it to a Java application, and the application will be notified when new data is available for processing. Kinesis also integrates with third-party products, so developers can use their preferred data processing tools.
A Kinesis-enabled application runs on Amazon EC2 instances of the customer's choice. Examples include real-time dashboards that show how well a Web advertising campaign is going, alerts that fire when errors are detected in a server log, and tools to aggregate and transform real-time data before loading that data into a Hadoop cluster or a data warehouse like Amazon Redshift, according to Ryan Waite, general manager for data services at AWS.
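As a rough illustration of the log-alert case, the sketch below scans incoming log lines and emits an alert for each line that looks like an error. It is plain Python and does not use the Kinesis API; the function name `error_alerts`, the `ERROR` marker, and the sample log lines are all made up for this example.

```python
from typing import Iterable, Iterator


def error_alerts(log_lines: Iterable[str], marker: str = "ERROR") -> Iterator[str]:
    """Yield one alert message for every log line that contains the marker.

    The marker and the alert format are assumptions for illustration only.
    """
    for line_number, line in enumerate(log_lines, start=1):
        if marker in line:
            yield f"ALERT line {line_number}: {line.strip()}"


# Made-up sample data standing in for a live server log.
sample_log = [
    "10:00:01 INFO request served",
    "10:00:02 ERROR upstream timeout",
    "10:00:03 INFO request served",
]

alerts = list(error_alerts(sample_log))

if __name__ == "__main__":
    for alert in alerts:
        print(alert)
```

Because the function is a generator, it handles one line at a time and never needs the whole log in memory, which is the property a streaming application depends on.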
Kinesis requires two applications -- a "Producer" and a "Worker." The Producer takes data from a source and converts it into a Kinesis stream, a continuous flow of 50-kilobyte data chunks sent in the form of HTTP PUTs. The Worker then takes the data from the Kinesis stream and does whatever processing is required.
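The Producer and Worker roles can be sketched with an in-memory queue standing in for the stream. This is only a local simulation under stated assumptions: it makes no HTTP calls and does not use any Amazon library, the names `produce`, `work`, and `run_demo` are hypothetical, the chunk size treats a kilobyte as 1,024 bytes, and the `None` end-of-stream marker exists only so the demo can stop (a real stream is continuous).

```python
import queue
import threading

# 50 KB per chunk, following the article; 1 KB = 1024 bytes is an assumption.
CHUNK_SIZE = 50 * 1024


def produce(data: bytes, stream: queue.Queue, chunk_size: int = CHUNK_SIZE) -> int:
    """Producer: split the source data into chunks and put each on the stream."""
    sent = 0
    for start in range(0, len(data), chunk_size):
        stream.put(data[start:start + chunk_size])
        sent += 1
    stream.put(None)  # hypothetical end-of-stream marker, for the demo only
    return sent


def work(stream: queue.Queue, process) -> list:
    """Worker: take chunks off the stream and apply the processing function."""
    results = []
    while True:
        chunk = stream.get()
        if chunk is None:
            break
        results.append(process(chunk))
    return results


def run_demo():
    """Run one Producer and one Worker concurrently over 120,000 bytes."""
    stream = queue.Queue()
    sizes = []
    worker = threading.Thread(target=lambda: sizes.extend(work(stream, len)))
    worker.start()
    sent = produce(b"x" * 120_000, stream)
    worker.join()
    return sent, sizes


if __name__ == "__main__":
    print(run_demo())
```

The Worker here only measures each chunk, but `process` can be any function, which mirrors the division of labor described above: the Producer is concerned with getting data into the stream, and the Worker with whatever is done to it afterward.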
Waite said the system could also work with front-end applications such as Chartio, where a Kinesis feed could create simple, moving charts based on real-time data. If this stream contained the Twitter firehose, then Chartio could generate charts on trending topics on Twitter. "A company using these charts could have a real time understanding of what their customers think about their products," said Waite.
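To make the trending-topics idea concrete, here is a minimal sketch that keeps hashtag counts over the most recent messages, the kind of running tally a charting front end could plot. It is not Chartio's or Amazon's method; the class name `TrendingTopics`, the hashtag pattern, the window size, and the sample messages are assumptions for illustration.

```python
import re
from collections import Counter, deque

# Assumed hashtag pattern: "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


class TrendingTopics:
    """Count hashtags over a sliding window of the most recent messages."""

    def __init__(self, window_size: int = 1000):
        self.window_size = window_size
        self.window = deque()    # hashtags of each message still in the window
        self.counts = Counter()  # current tally per hashtag

    def add(self, message: str) -> None:
        tags = [tag.lower() for tag in HASHTAG.findall(message)]
        self.window.append(tags)
        self.counts.update(tags)
        # Once the window is full, forget the oldest message's hashtags.
        if len(self.window) > self.window_size:
            for tag in self.window.popleft():
                self.counts[tag] -= 1
                if self.counts[tag] == 0:
                    del self.counts[tag]

    def top(self, n: int = 3):
        """Return the n most frequent hashtags with their counts."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    topics = TrendingTopics(window_size=2)
    for text in ["#aws launches #kinesis", "#Kinesis demo", "#redshift"]:
        topics.add(text)
        print(topics.top())
```

Expiring old messages keeps the tally tied to recent activity, so a chart built on it moves as the conversation moves rather than accumulating totals forever.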
Paul Burns, an analyst focused on cloud computing and president of the firm Neovise, saw demos of Kinesis while at Amazon's Re:Invent conference. He said Kinesis will make it easier for developers to create big data apps.
"Right now there's a few ways of doing it. Sometimes people spend hours or days just collecting the data, then coming back and processing it, so it’s out of date. Obviously that's not a good way to do it. The other way is to build your own software with this streaming capability. There are some open source solutions out there but you need to install and tweak it and have infrastructure and data center. So Amazon said we'll take care of all that for you, just write your own program and connect to us," he said.
Right now, the one potential sticking point is where the apps are deployed. Kinesis is deployed on Amazon services and does its processing on Amazon. Only after the data is processed can it be sent to another data store, like Hadoop. So if you decide to bring a big data service in-house after deploying on Amazon, you might be out of luck.
Waite said Amazon is building Kinesis Connectors that make it easy to send data to Amazon Redshift, Amazon DynamoDB, and Amazon S3, and the company will provide connectors to other data sources in the future. "We're providing these connectors as source code so our customers can modify the connectors to suit their needs," he said.
But if you bring your big data apps in-house, you're a little stuck. You'd have to go build your own internal Kinesis. "To some degree it will handcuff people, but I think these are the kind of people who will be aware of that up front," said Burns. "They know they want to build this kind of app and be up and running in a week. If they really wanted to, they'd have to develop their own software," he said.
An early access developer edition is available now.