InstaCluster
Building A Big Data Cluster in Minutes
InstaCluster is a tool for building a big data analytics stack in minutes. The tool consists of a set of shell and Python scripts plus configuration templates for the main supported services. The quickest way to use the tool is to launch the provided public Amazon Machine Image (AMI), id ami-e99171ad, available in the N. California (us-west-1) region.
Setup
To set up a cluster, the user should first spawn a number of Slave replicas of the provided AMI, specifying only the AWS_ACCESS_KEY_ID as user-provided data (follow the syntax shown in the example configuration file).
After the Slave instances have been created, the user can spawn the Master instance, which hosts the Ambari server. When launching this instance, the user should specify the following parameters:
- AWS_ACCESS_KEY_ID - your AWS access key, needed to query AWS about running instances and discover slaves
- AWS_SECRET_ACCESS_KEY - your AWS secret key, needed to query AWS about running instances and discover slaves
- AWS_DEFAULT_REGION - the region in which to look for slaves
- REVOKE_KEYS - whether to revoke the keys after slave discovery (recommended for security, so that the credentials do not remain on the instance)
An example of the parameters required for the configuration is:
    AWS_ACCESS_KEY_ID=498o37f08b9f2vjv0982
    AWS_SECRET_ACCESS_KEY=n97375hf28047bv84075gfb128b8b7c21857b
    AWS_DEFAULT_REGION=us-west-1
    REVOKE_KEYS=No
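To make the parameter format concrete, here is a minimal Python sketch of how KEY=VALUE user data like the example above could be read. It is an illustration only, not InstaCluster's actual parsing code: the function names `parse_user_data` and `revoke_requested` are hypothetical, and treating "yes", "true" and "1" as affirmative values for REVOKE_KEYS is an assumption.

```python
def parse_user_data(text):
    """Parse whitespace- or newline-separated KEY=VALUE pairs into a dict."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {token!r}")
        params[key] = value
    return params


def revoke_requested(params):
    """Return True if REVOKE_KEYS holds an affirmative value.

    Assumption: "yes", "true" and "1" (case-insensitive) mean "revoke";
    anything else, including a missing key, means "do not revoke".
    """
    return params.get("REVOKE_KEYS", "No").strip().lower() in {"yes", "true", "1"}


# The example values from this README.
example = """
AWS_ACCESS_KEY_ID=498o37f08b9f2vjv0982
AWS_SECRET_ACCESS_KEY=n97375hf28047bv84075gfb128b8b7c21857b
AWS_DEFAULT_REGION=us-west-1
REVOKE_KEYS=No
"""

params = parse_user_data(example)
print(params["AWS_DEFAULT_REGION"], revoke_requested(params))
```

Splitting on any whitespace means the same function accepts the parameters whether they are written one per line or all on a single line.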
A video showing how to set up a cluster is available here
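The slave discovery performed by the Master can be pictured with the following Python sketch. It is a guess at the general approach, not the tool's real implementation: it assumes the third-party boto3 library is installed, that slaves are identified as running instances launched from the same AMI, and that the Master reaches them by private IP. The function names and the `own_ip` parameter are hypothetical.

```python
def extract_private_ips(response, own_ip=None):
    """Collect private IPs from an EC2 describe_instances response dict.

    own_ip (hypothetical parameter) lets the Master skip itself, since it
    is launched from the same AMI as the slaves.
    """
    ips = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip and ip != own_ip:
                ips.append(ip)
    return sorted(ips)


def discover_slaves(access_key, secret_key, region, ami_id, own_ip=None):
    """Query AWS for running instances of ami_id and return their private IPs."""
    import boto3  # third-party dependency, imported lazily (assumed installed)

    ec2 = boto3.client(
        "ec2",
        region_name=region,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    filters = [
        {"Name": "image-id", "Values": [ami_id]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
    ips = []
    # Paginate so that large clusters are not truncated to a single page.
    for page in ec2.get_paginator("describe_instances").paginate(Filters=filters):
        ips.extend(extract_private_ips(page, own_ip=own_ip))
    return sorted(ips)


# Offline demonstration with a hand-written response of the same shape.
fake_response = {
    "Reservations": [
        {"Instances": [{"PrivateIpAddress": "10.0.0.5"}, {"PrivateIpAddress": "10.0.0.4"}]},
        {"Instances": [{"InstanceId": "i-pending"}]},  # no IP assigned yet
    ]
}
print(extract_private_ips(fake_response, own_ip="10.0.0.4"))
```

Calling `discover_slaves` requires valid credentials and network access, which is why the demonstration exercises only the response-handling function.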
Services
InstaCluster delegates the provisioning of services to Ambari, so every service supported by Ambari can be used. For a complete list of these services, please refer to the official Ambari documentation. In addition, InstaCluster adds support for Spark standalone and Hue. These services can also be provisioned and managed from the Ambari user interface.