Counting 2,899 Big Data & Machine Learning Frameworks, Toolsets, and Examples...
Suggestion? Feedback? Tweet @stkim1

Author
Last Commit
Jun. 22, 2018
Created
May. 21, 2009

Apache Cassandra

Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

For more information, see the Apache Cassandra web site.

Requirements

  1. Java >= 1.8 (OpenJDK and Oracle JVMS have been tested)

  2. Python 2.7 (for cqlsh)

Getting started

This short guide will walk you through getting a basic one node cluster up and running, and demonstrate some simple reads and writes. For a more-complete guide, please see the Apache Cassandra website’s Getting Started Guide.

First, we’ll unpack our archive:

$ tar -zxvf apache-cassandra-$VERSION.tar.gz
$ cd apache-cassandra-$VERSION

After that we start the server. Running the startup script with the -f argument will cause Cassandra to remain in the foreground and log to standard out; it can be stopped with ctrl-C.

$ bin/cassandra -f

Note for Windows users: to install Cassandra as a service, download Procrun, set the PRUNSRV environment variable to the full path of prunsrv (e.g., C:\procrun\prunsrv.exe), and run "bin\cassandra.bat install". Similarly, "uninstall" will remove the service.

Now let’s try to read and write some data using the Cassandra Query Language:

$ bin/cqlsh

The command line client is interactive so if everything worked you should be sitting in front of a prompt:

Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh>

As the banner says, you can use 'help;' or '?' to see what CQL has to offer, and 'quit;' or 'exit;' when you’ve had enough fun. But lets try something slightly more interesting:

cqlsh> CREATE SCHEMA schema1
       WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE schema1;
cqlsh:Schema1> CREATE TABLE users (
                 user_id varchar PRIMARY KEY,
                 first varchar,
                 last varchar,
                 age int
               );
cqlsh:Schema1> INSERT INTO users (user_id, first, last, age)
               VALUES ('jsmith', 'John', 'Smith', 42);
cqlsh:Schema1> SELECT * FROM users;
 user_id | age | first | last
---------+-----+-------+-------
  jsmith |  42 |  john | smith
 cqlsh:Schema1>

If your session looks similar to what’s above, congrats, your single node cluster is operational!

For more on what commands are supported by CQL, see the CQL reference. A reasonable way to think of it is as, "SQL minus joins and subqueries, plus collections."

Wondering where to go from here?

  • Join us in #cassandra on irc.freenode.net and ask questions

  • Subscribe to the Users mailing list by sending a mail to [email protected]pache.org

  • Visit the community section of the Cassandra website for more information on getting involved.

  • Visit the development section of the Cassandra website for more information on how to contribute.

Latest Releases
cassandra-3.11.2
 Feb. 14 2018
cassandra-3.0.16
 Feb. 13 2018
cassandra-3.11.1
 Oct. 2 2017
cassandra-3.0.15
 Oct. 2 2017
cassandra-3.11.0
 Jun. 19 2017