I've been wanting to try out Cassandra for a while now, ever since I heard how easy replication is. Let's start with the basics:

  • Cassandra is a top-level Apache project (it is not part of Apache Hadoop)

  • Cassandra is a NoSQL database

  • Cassandra needs Java

  • Cassandra has two ways to manipulate data - the Thrift API and CQL

  • Cassandra can use CQL (Cassandra Query Language) for its queries - parts of it look like SQL, but it's not the same thing

Requirements:

  • Java needs to be installed already. If you don't have it installed, follow my post here, steps 1-4, to install OpenJDK. Note: instead of steps 1-3 you can use this script
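A quick way to confirm the Java requirement before going further is sketched below. It is written for /bin/sh (not csh), and have_cmd is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# have_cmd is a hypothetical helper name, not part of Cassandra or Java.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner on stderr, so redirect it.
    java -version 2>&1 | head -n 1
else
    echo "java not found - install OpenJDK before continuing"
fi
```

If the second message appears, go back and install OpenJDK first.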

1. Download Cassandra to the /tmp directory (16 MB):

cd /tmp ; wget ftp://apache.cs.utah.edu/apache.org/cassandra/1.2.4/apache-cassandra-1.2.4-bin.tar.gz

2. Create a directory to untar it to. According to http://wiki.apache.org/cassandra/GettingStarted, Cassandra's default settings expect its data directories to be under /var/lib/cassandra. It's easy to change those, which we'll do in the next steps.

mkdir -p /opt

3. Untar it - in this case we untar to the directory we previously created: /opt

tar -xzf apache-cassandra-1.2.4-bin.tar.gz -C /opt

4. Now let's go and edit the directories that Cassandra expects to find. To do that we need to edit the file: /opt/apache-cassandra-1.2.4/conf/cassandra.yaml

ee /opt/apache-cassandra-1.2.4/conf/cassandra.yaml

5. Change lines 106-107:

data_file_directories:
    - /var/lib/cassandra/data

to:

data_file_directories:
    - /opt/cassandra/data

6. Change line 110:

commitlog_directory: /var/lib/cassandra/commitlog

to:

commitlog_directory: /opt/cassandra/commitlog

7. Change line 188:

saved_caches_directory: /var/lib/cassandra/saved_caches

to:

saved_caches_directory: /opt/cassandra/saved_caches

8. Now create the above 3 directories:

mkdir -p /opt/cassandra/data 
mkdir -p /opt/cassandra/commitlog 
mkdir -p /opt/cassandra/saved_caches
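If you would rather not edit the file by hand, steps 5-7 can be sketched as one substitution. This is for /bin/sh (not csh), relocate_dirs is a hypothetical helper name, and it assumes the three settings above are the only places where /var/lib/cassandra appears in the file - check with grep first:

```shell
#!/bin/sh
# Rewrite every /var/lib/cassandra path in a config file to /opt/cassandra.
# relocate_dirs is a hypothetical helper; it keeps the original as FILE.bak.
relocate_dirs() {
    conf="$1"
    cp "$conf" "$conf.bak" || return 1
    # Read from the backup and write the edited copy over the original,
    # which avoids the differing "sed -i" syntax on BSD and GNU sed.
    sed 's|/var/lib/cassandra|/opt/cassandra|g' "$conf.bak" > "$conf"
}

# Usage (path taken from step 4):
# relocate_dirs /opt/apache-cassandra-1.2.4/conf/cassandra.yaml
```

The backup copy makes it easy to diff the result against the original before starting Cassandra.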

9. By default Cassandra also expects its logs to go under /var/log/cassandra/. This works fine on FreeBSD, but the directory needs to be created - create it with:

mkdir /var/log/cassandra

If your needs are different, you can change this setting in the file /opt/apache-cassandra-1.2.4/conf/log4j-server.properties, on line 35:

log4j.appender.R.File=/var/log/cassandra/system.log

10. Add Cassandra's bin directory to your PATH so you don't have to change to that directory all the time (this is csh syntax):

set path=(/opt/apache-cassandra-1.2.4/bin $path)

11. Make the PATH change permanent so it's still set every time you log in:

echo 'set path=(/opt/apache-cassandra-1.2.4/bin $path)' >> ~/.cshrc
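Running that echo more than once adds the same line to ~/.cshrc again each time. A minimal sketch of an append that is safe to repeat, written for /bin/sh (not csh) with append_once as a hypothetical helper name:

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
    line="$1"
    file="$2"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage with the line from step 11:
# append_once 'set path=(/opt/apache-cassandra-1.2.4/bin $path)' ~/.cshrc
```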

12. Start Cassandra with the -f switch to run it in the foreground. This is recommended for the first run so you can see whether you get any errors:

cassandra -f

Note: If you get an error like this:

Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: weirdbricks: weirdbricks

The solution is here: http://www.wowza.com/forums/showthread.php?337-Malformed-URL-exception

In my /etc/hosts I had this entry:

127.0.0.1 localhost localhost.my.domain

I replaced it with:

127.0.0.1 weirdbricks localhost localhost.my.domain

and tried the command again:

cassandra -f
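To check up front whether a machine is likely to hit this hostname error, you can look for the hostname in /etc/hosts. The sketch below is for /bin/sh (not csh); host_listed is a hypothetical helper name, and it only inspects the hosts file, not DNS:

```shell
#!/bin/sh
# Succeed if NAME appears as a host name or alias on a non-comment line
# of a hosts-format file (default: /etc/hosts).
# host_listed is a hypothetical helper name.
host_listed() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }        # skip comment lines
        {
            # field 1 is the IP address; names start at field 2
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break     # rest of the line is a comment
                if ($i == n) found = 1
            }
        }
        END { exit !found }
    ' "$file"
}

# Usage:
# host_listed "$(hostname)" || echo "add $(hostname) to /etc/hosts"
```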

If all went well you will get loads of text but the end should look like this:

INFO 18:22:28,485 Binding thrift service to localhost/127.0.0.1:9160
INFO 18:22:28,553 Using TFramedTransport with a max frame size of 15728640 bytes.
INFO 18:22:28,569 Using synchronous/threadpool thrift server on localhost : 9160
INFO 18:22:28,572 Listening for thrift clients... 
INFO 18:22:38,482 Created default superuser 'cassandra'

13. OK - now use CTRL+C to stop it and restart without the -f flag so it runs in the background:

cassandra

Note: If you get stuck here:

INFO 18:31:16,933 Node localhost/127.0.0.1 state jump to normal 
INFO 18:31:16,944 Startup completed! Now serving reads.

Just press ENTER to continue - then you'll be back at the shell.

14. Make sure Cassandra is running - check the open ports:

sockstat -4 | grep 9160

You should get something like this:

root java 1163 55 tcp4 127.0.0.1:9160 *:*
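A plain grep for 9160 also matches lines where 9160 happens to be a PID or part of another number. A slightly stricter check is sketched below for /bin/sh (not csh). listening_on is a hypothetical helper name, and it assumes the column layout shown above, where the local address is the sixth field:

```shell
#!/bin/sh
# Read sockstat-style output on stdin and succeed if any line has a
# local address (sixth field) ending in :PORT.
# listening_on is a hypothetical helper name.
listening_on() {
    awk -v p="$1" '$6 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Usage on FreeBSD:
# sockstat -4 | listening_on 9160 && echo "Thrift port is open"
```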

15. Now connect with cassandra-cli:

cd /opt/apache-cassandra-1.2.4/bin 
./cassandra-cli

You should see something like this:

Connected to: "Test Cluster" on 127.0.0.1/9160 
Welcome to Cassandra CLI version 1.2.4 
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.

16. To make Cassandra start up on boot, let's write a small startup script:

Save the script under: /opt/apache-cassandra-1.2.4/bin/cassandra-boot.sh and add the following code:

#!/bin/csh
# set the cassandra dir
set cassandra_dir=/opt/apache-cassandra-1.2.4/bin
# set the cassandra log dir
set cassandra_log_dir=/var/log/cassandra
# set the boot log file, including a timestamp
set boot_log_file=$cassandra_log_dir/boot-`date +%m-%d-%Y-%H%M%S`.log
# check if cassandra is already running
set cassandra_running=`ps auxwww | grep java | grep cassandra | grep -v grep | wc -l`
if ($cassandra_running >= 1) then
  logger -s "Cassandra already running!"
  exit 1
endif
logger -s "Cassandra starting.."
# start cassandra and have it write its PID to a file
$cassandra_dir/cassandra -p /var/run/cassandra.pid > $boot_log_file
# check if cassandra started successfully ($status holds the last exit code)
if ($status == 0) then
  logger -s "Cassandra started - PID: `cat /var/run/cassandra.pid`"
else
  logger -s "Cassandra did not start!"
endif
# housekeeping - delete files older than 3 days (the ; must be escaped)
find $cassandra_log_dir -type f -mtime +3 -exec rm {} \;

17. Edit your crontab:

ee /etc/crontab

Add the following:

@reboot root /opt/apache-cassandra-1.2.4/bin/cassandra-boot.sh

Now on the next reboot Cassandra will start automatically!

18. Oops, almost forgot to make the script executable!

chmod +x /opt/apache-cassandra-1.2.4/bin/cassandra-boot.sh

19. To stop Cassandra, type:

kill `cat /var/run/cassandra.pid`
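The kill command above prints an error if the pidfile is missing or stale. A more defensive check before sending the signal is sketched below for /bin/sh (not csh), with pid_alive as a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the PID stored in a pidfile belongs to a live process.
# pid_alive is a hypothetical helper name.
pid_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage with the pidfile from the boot script:
# pid_alive /var/run/cassandra.pid && kill "$(cat /var/run/cassandra.pid)"
```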

20. To start Cassandra again, type from anywhere (this works because its bin directory is on the PATH from step 10):

cassandra-boot.sh

References:

http://wiki.apache.org/cassandra/GettingStarted
http://wiki.apache.org/cassandra/RunningCassandra