Install Apache Cassandra

Cassandra is an open source, distributed data persistence system designed to store and manage large amounts of data across multiple servers.

To quote the elevator pitch by Eben Hewitt:

Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web.
Installing Cassandra

Cassandra can be installed on most popular operating systems: Windows (XP/Vista/7/8), Mac OS X, and Linux variants such as Ubuntu, Red Hat, and CentOS.

In principle, any platform with JVM 1.6 or higher should be just fine. There is Debian packaging, and a third-party RPM distribution by DataStax.

The more popular way is to use the tar distribution, and that is the route this article follows: download the binary distribution from the Apache website and unpack it.
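The JVM requirement can be checked from a terminal before downloading anything. The following is a minimal sketch for Linux or Mac OS X; the helper name java_version_ok is made up for this example, and the parsing assumes the first line of `java -version` carries the version in double quotes, as in `java version "1.6.0_45"`.

```shell
# Succeeds when a version string such as "1.6.0_45" or "11.0.2" is 1.6 or higher.
# java_version_ok is a hypothetical helper, not part of Java or Cassandra.
java_version_ok() {
    version="$1"
    major="${version%%.*}"
    rest="${version#*.}"
    minor="${rest%%.*}"
    # Reject anything whose first two fields are not plain numbers.
    case "$major$minor" in
        ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so redirect it before parsing the first line.
    found=$(java -version 2>&1 | sed -n '1s/[^"]*"\([^"]*\)".*/\1/p')
    if java_version_ok "$found"; then
        echo "Java $found is new enough"
    else
        echo "Java '$found' may be too old, or its version could not be parsed"
    fi
else
    echo "java was not found on the PATH"
fi
```

On Windows, running `java -version` in a command prompt and reading the first line gives the same information.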

I am demonstrating this on Windows 7.

Once unpacked, the folder structure should look like this:

[Screenshot: Cassandra folder structure]

I generally create two environment variables: JAVA_HOME, which points to the Java JDK, and CASSANDRA_HOME, which points to the root directory of Cassandra shown in the screenshot above.
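For readers on Linux or Mac OS X, the same two variables can be set from a shell. This is only a sketch: both default paths are hypothetical placeholders and need to be replaced with the real install locations.

```shell
# Hypothetical locations: replace both defaults with the real install paths.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"
export CASSANDRA_HOME="${CASSANDRA_HOME:-/opt/apache-cassandra}"

# Report each variable and whether the folder it names exists.
for name in JAVA_HOME CASSANDRA_HOME; do
    eval "value=\$$name"
    if [ -d "$value" ]; then
        echo "$name=$value"
    else
        echo "$name=$value (folder not found)"
    fi
done
```

To make the settings permanent, the two export lines could go in a shell startup file such as ~/.profile.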

Quick details about the directory structure under the Cassandra root:
  1. bin

    As convention has it, this directory contains the executables (batch and shell scripts) to run Cassandra, including the startup scripts and the nodetool utility. It also has scripts for converting SSTables (the data files) to JSON and back.

  2. conf

    This folder contains the configuration for Cassandra. In versions up to 0.6, the storage-conf.xml file defined the data store by configuring keyspaces and column families; it was replaced by cassandra.yaml in 0.7. The cassandra.yaml and .sh files configure Cassandra and its environment, and the log4j files configure the logging levels.

  3. interface

    This folder contains the RPC description file cassandra.thrift, which defines Cassandra's interface.

  4. javadoc

    Contains the standard Javadoc API documentation for Cassandra. Cassandra is a wonderful project, but the code contains precious few comments, so you might find the Javadoc’s usefulness limited. It may be more fruitful to simply read the class files directly if you’re familiar with Java.

  5. lib

    This folder contains all the dependencies and libraries that Cassandra needs, such as JSON parsers and Google's collections library.
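A quick way to confirm that the archive unpacked completely is to look for the five folders listed above. The sketch below assumes CASSANDRA_HOME points at the unpacked folder and falls back to the current directory otherwise; check_layout is a made-up helper name, and other Cassandra versions may ship a different set of folders.

```shell
# Report which of the expected top-level folders are missing.
# check_layout is a hypothetical helper written for this article.
check_layout() {
    root="$1"
    missing=""
    for sub in bin conf interface javadoc lib; do
        if [ ! -d "$root/$sub" ]; then
            missing="$missing $sub"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing under $root:$missing"
        return 1
    fi
    echo "layout under $root looks complete"
}

# Assumes CASSANDRA_HOME is set; otherwise checks the current directory.
check_layout "${CASSANDRA_HOME:-.}" || true
```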

The source is also available from a git repository: git clone git://git.apache.org/cassandra.git

Before starting Cassandra

By default, Cassandra uses the following directories for data, commit log, and log file storage (these are Linux filesystem paths):

/var/lib/cassandra
/var/log/cassandra

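In the past (version <= 0.6), Cassandra configured these locations in a file called storage-conf.xml. From 0.7 onward, the data and commit log directories are set in cassandra.yaml, and the server log location is set in log4j-server.properties (log4j-tools.properties covers the command-line tools). On Windows I recommend changing these paths to suitable locations.

On a Linux/Unix machine, you may have to create these folders and give your own user ownership of them: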
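To see which configuration files still refer to the default locations, the conf folder can be searched for the two paths. This is a rough sketch that assumes CASSANDRA_HOME points at the unpacked folder and that the installed grep supports the -r option (GNU and BSD grep both do); find_default_paths is a made-up helper name.

```shell
# Print every line in a folder that mentions the default data or log locations.
# find_default_paths is a hypothetical helper written for this article.
find_default_paths() {
    scan_dir="$1"
    grep -rn -e '/var/lib/cassandra' -e '/var/log/cassandra' "$scan_dir"
}

# Assumes CASSANDRA_HOME is set; otherwise looks for ./conf.
conf_dir="${CASSANDRA_HOME:-.}/conf"
if [ -d "$conf_dir" ]; then
    find_default_paths "$conf_dir" || echo "no default paths found in $conf_dir"
else
    echo "no conf folder at $conf_dir"
fi
```

Each match is printed as file name, line number, and the matching line, which shows where a path would need to change.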
 cd apache-cassandra-$VERSION

 sudo mkdir -p /var/log/cassandra

 sudo chown -R `whoami` /var/log/cassandra

 sudo mkdir -p /var/lib/cassandra

 sudo chown -R `whoami` /var/lib/cassandra
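Before starting Cassandra, it can be worth confirming that both folders exist and are writable by the current user. A minimal sketch, with check_writable as a made-up helper name:

```shell
# Report whether a folder exists and is writable by the current user.
# check_writable is a hypothetical helper written for this article.
check_writable() {
    target="$1"
    if [ ! -d "$target" ]; then
        echo "$target does not exist"
        return 1
    fi
    if [ ! -w "$target" ]; then
        echo "$target is not writable by $(id -un)"
        return 1
    fi
    echo "$target is writable"
}

# The two default locations named in this article.
for folder in /var/lib/cassandra /var/log/cassandra; do
    check_writable "$folder" || true
done
```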

If you would like to configure other settings, DataStax has a nice description of what goes in cassandra.yaml.

Start Cassandra on Windows

Open a command prompt and type:

cd %CASSANDRA_HOME%\bin
cassandra.bat -f

Start Cassandra on *nix

cd $CASSANDRA_HOME
sh bin/cassandra -f

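The -f flag keeps Cassandra in the foreground, so its log output appears in the console. You should see something like this (Windows screenshot):

[Screenshot: Apache Cassandra starting]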
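On *nix, the start commands can be wrapped in a small function that first checks that CASSANDRA_HOME is set and that the startup script is where it is expected. This is only a sketch; start_cassandra is a made-up name, and the function does nothing until it is called.

```shell
# Start Cassandra in the foreground after two basic sanity checks.
# start_cassandra is a hypothetical wrapper written for this article.
start_cassandra() {
    cass_home="${CASSANDRA_HOME:-}"
    if [ -z "$cass_home" ]; then
        echo "CASSANDRA_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$cass_home/bin/cassandra" ]; then
        echo "no startup script at $cass_home/bin/cassandra" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left unchanged.
    (cd "$cass_home" && sh bin/cassandra -f)
}

# Usage: uncomment the next line to actually start Cassandra.
# start_cassandra
```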

Hope this helps

admin
