1. First, upgrade your installed packages with the following two commands (just for good measure):
sudo apt-get update
sudo apt-get upgrade
2. Now, open up your Debian package sources list with Nano for editing using the following command:
sudo nano /etc/apt/sources.list
3. Next, add the following sources to your /etc/apt/sources.list file.
deb http://www.apache.org/dist/incubator/cassandra/debian unstable main
deb-src http://www.apache.org/dist/incubator/cassandra/debian unstable main
After you add these two lines, press Ctrl+X to close Nano. It’ll ask
“Save modified buffer?” Press Y. Press Enter when Nano asks “File Name
to Write.”
4. Refresh the package lists so that apt picks up the new Cassandra sources:
sudo apt-get update
5. ERROR! At this point you receive an error similar to this:
W: GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F758CE318D77295D
6. Use the following three commands to import the missing public key so that the signature can be verified, and continue installing:
NOTE: You must replace the key value ‘F758CE318D77295D’ with the key value you received in your error message.
gpg --keyserver wwwkeys.eu.pgp.net --recv-keys F758CE318D77295D
sudo apt-key add ~/.gnupg/pubring.gpg
sudo apt-get update
7. Install Cassandra:
sudo apt-get install cassandra
8. Next you need to change Cassandra’s default JMX port number
from 8080 to something else, because port 8080 is commonly used by
other services such as web servers and proxies. Use Nano to open up the
Cassandra configuration file using the following command:
sudo nano /usr/share/cassandra/cassandra.in.sh
9. Then change the port number 8080 on the following line to 10036, and save the file:
-Dcom.sun.management.jmxremote.port=10036 \
10. Start Cassandra with the command:
sudo /etc/init.d/cassandra start
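Step 6 asks you to copy the key value out of the warning by hand. As a minimal sketch (not part of the original steps), the following Python snippet pulls the key ID out of apt-get output with a regular expression and prints the matching gpg command. The function name missing_key_ids and the pattern are illustrative choices; the warning text is the one shown in step 5.

```python
import re

# Matches the hexadecimal key ID that apt-get prints after NO_PUBKEY.
NO_PUBKEY_PATTERN = re.compile(r"NO_PUBKEY\s+([0-9A-Fa-f]+)")


def missing_key_ids(apt_output):
    """Return the distinct key IDs reported as NO_PUBKEY, in order of appearance."""
    seen = []
    for key_id in NO_PUBKEY_PATTERN.findall(apt_output):
        if key_id not in seen:
            seen.append(key_id)
    return seen


warning = (
    "W: GPG error: http://www.apache.org unstable Release: The following "
    "signatures couldn't be verified because the public key is not available: "
    "NO_PUBKEY F758CE318D77295D"
)

for key_id in missing_key_ids(warning):
    # Print the first command from step 6 with the extracted key ID filled in.
    print("gpg --keyserver wwwkeys.eu.pgp.net --recv-keys " + key_id)
```

If the warning lists several missing keys, the loop prints one gpg command per key.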
Once you have Cassandra running, test it with Cassandra’s command line tool CLI.
Starting the CLI
You can start the CLI using the bin/cassandra-cli script in your Cassandra installation (bin\cassandra-cli.bat
on Windows). If you are evaluating a local Cassandra node, be sure
that it has been correctly configured and successfully started before
starting the CLI.
If successful you will see output similar to this:
Welcome to cassandra CLI.
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
You must then specify a system to connect to:
connect localhost/9160;
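If the connect command fails, it helps to know whether anything is listening on the port at all. The following is a minimal sketch, assuming only that the node accepts plain TCP connections on the port used above (9160); the helper name is_listening is hypothetical. It says nothing about whether the listener is really Cassandra.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 9160 is the port used in the connect command above.
    if is_listening("localhost", 9160):
        print("Something is accepting connections on localhost:9160")
    else:
        print("Nothing is listening on localhost:9160; is Cassandra running?")
```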
Creating a Keyspace
We first create a keyspace to run our examples in.
create keyspace Twissandra;
Selecting the keyspace to use
We must then select our example keyspace as our new context before we can run any queries.
use Twissandra;
To Create A Column Family
We can then create a column family to play with.
create column family User with comparator = UTF8Type;
For the later examples to work you must also update the schema using the
following command. This sets the validation class for the first and last
name columns so that their values are human readable. It also adds an
index on the age column so that you can filter your gets by age.
update column family User with
column_metadata =
[
{column_name: first, validation_class: UTF8Type},
{column_name: last, validation_class: UTF8Type},
{column_name: age, validation_class: UTF8Type, index_type: KEYS}
];
To Add Data
To add data to our new column family, we must first specify our default key
type. Otherwise we would have to specify it for each key using the format
[utf8('keyname')]. That is probably advisable if you have mixed key types, but it makes simple cases harder to read.
So we run the command below, which lasts for the length of your CLI session. After quitting and restarting, we must run it again.
assume User keys as utf8;
Then we add our data.
set User['jsmith']['first'] = 'John';
set User['jsmith']['last'] = 'Smith';
set User['jsmith']['age'] = '38';
If you get an error like cannot parse 'John' as hex bytes,
then it is likely that you either haven't set your default key type or
haven't updated your schema as in the create column family example.
To Update Data
If we need to update a value we simply set it again.
set User['jsmith']['first'] = 'Jack';
To Get Data
Now let's read back the jsmith row to see what it contains:
get User['jsmith'];
To Query Data
get User where age = '12';
For help
help;
To Quit
quit;
To Execute Script
bin/cassandra-cli -host localhost -port 9160 -f script.txt
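A script file is simply a list of CLI statements, so it can be generated. The sketch below builds the set statements used earlier from a Python dictionary and prints a script that could be saved as script.txt. The function name set_statements is hypothetical, and quote escaping is deliberately left out, so values containing single quotes are rejected.

```python
def set_statements(column_family, row_key, columns):
    """Build one CLI 'set' statement per column, in the form used above."""
    statements = []
    for name, value in columns.items():
        for text in (row_key, name, value):
            # Quote escaping is not handled in this sketch, so reject such input.
            if "'" in text:
                raise ValueError("single quotes are not supported: %r" % text)
        statements.append(
            "set %s['%s']['%s'] = '%s';" % (column_family, row_key, name, value)
        )
    return statements


lines = ["use Twissandra;", "assume User keys as utf8;"]
lines += set_statements(
    "User", "jsmith", {"first": "John", "last": "Smith", "age": "38"}
)

# Print the script; redirect the output to script.txt to use it with -f.
script_text = "\n".join(lines) + "\n"
print(script_text, end="")
```

The assume line is included because, as noted above, it only lasts for one CLI session and a script run starts a new one.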
Getting Started Using the Cassandra CLI
The Cassandra CLI client utility can be used to do basic data
definition (DDL) and data manipulation (DML) within a Cassandra cluster.
It is located in
/usr/bin/cassandra-cli in packaged installations or
<install_location>/bin/cassandra-cli in binary installations.
To start the CLI and connect to a particular Cassandra instance, launch the script together with
-host and
-port options. It will connect to the cluster name specified in the
cassandra.yaml file (which is
Test Cluster by default). For example, if you have a single-node cluster on
localhost:
$ cassandra-cli -host localhost -port 9160
Or to connect to a node in a multi-node cluster, give the IP address of the node:
$ cassandra-cli -host 110.123.4.5 -port 9160
To see help on the various commands available:
[default@unknown] help;
For detailed help on a specific command, use
help <command>;. For example:
[default@unknown] help SET;
Note
A command is not sent to the server unless it is terminated by a semicolon (;). Hitting the return key without a semicolon at the end of the line echoes an ellipsis ( . . . ), which indicates that the CLI expects more input.
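The semicolon rule can be pictured as a small buffering loop. The following Python sketch is only an illustration of that behavior, not the CLI's actual implementation: it collects typed lines until one ends with a semicolon, and reports whatever is still waiting for more input. The function name split_statements is hypothetical, and semicolons inside quoted strings are not handled.

```python
def split_statements(lines):
    """Group input lines into statements, each ending at a semicolon.

    Returns (complete_statements, pending_text), where pending_text is
    the input a prompt would still be waiting to see terminated.
    """
    statements = []
    buffer = []
    for line in lines:
        buffer.append(line.strip())
        if line.rstrip().endswith(";"):
            statements.append(" ".join(part for part in buffer if part))
            buffer = []
    pending = " ".join(part for part in buffer if part)
    return statements, pending


typed = [
    "create column family User",
    "with comparator = UTF8Type;",
    "use Twissandra",
]
done, waiting = split_statements(typed)
print(done)     # one complete statement spanning two typed lines
print(waiting)  # the last line has no semicolon yet
```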
Creating a Keyspace
You can use the Cassandra CLI commands described in this section to
create a keyspace. In this example, we create a keyspace called
demo, with a replication factor of 1 and using the
SimpleStrategy replica placement strategy.
Note the single quotes around the string value of
placement_strategy:
[default@unknown] CREATE KEYSPACE demo
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = [{replication_factor:1}];
You can verify the creation of a keyspace with the
SHOW KEYSPACES command. The new keyspace is listed along with the
system keyspace and any other existing keyspaces.
Creating a Column Family
First, connect to the keyspace where you want to define the column family with the
USE command.
[default@unknown] USE demo;
In this example, we create a
users column family in the
demo keyspace. In this column family we are defining a few columns:
full_name,
email,
state,
gender, and
birth_year. This is considered a
static column family - we are defining the column names up front and most rows are expected to have more-or-less the same columns.
Notice the settings of
comparator,
key_validation_class and
validation_class.
These are setting the default encoding used for column names, row key
values and column values. In the case of column names, the comparator
also determines the sort order.
[default@unknown] USE demo;
[default@demo] CREATE COLUMN FAMILY users
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: full_name, validation_class: UTF8Type}
{column_name: email, validation_class: UTF8Type}
{column_name: state, validation_class: UTF8Type}
{column_name: gender, validation_class: UTF8Type}
{column_name: birth_year, validation_class: LongType}
];
Next, create a
dynamic column family called
blog_entry.
Notice that here we do not specify column definitions as the column
names are expected to be supplied later by the client application.
[default@demo] CREATE COLUMN FAMILY blog_entry
WITH comparator = TimeUUIDType
AND key_validation_class=UTF8Type
AND default_validation_class = UTF8Type;
Creating a Counter Column Family
A counter column family contains counter columns. A counter column is
a specific kind of column whose user-visible value is a 64-bit signed
integer that can be incremented (or decremented) by a client
application. The counter column tracks the most recent value (or count)
of all updates made to it. A counter column cannot be mixed in with
regular columns of a column family; you must create a column family
specifically to hold counters.
To create a column family that holds counter columns, set the
default_validation_class of the column family to
CounterColumnType. For example:
[default@demo] CREATE COLUMN FAMILY page_view_counts
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;
To insert a row and counter column into the column family (with the initial counter value set to 0):
[default@demo] INCR page_view_counts['www.datastax.com'][home] BY 0;
To increment the counter:
[default@demo] INCR page_view_counts['www.datastax.com'][home] BY 1;
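To make the two INCR commands concrete, here is a toy in-memory model in Python. It is a sketch of the semantics described above (a signed 64-bit value per row key and column name), not of how Cassandra stores counters. The class name CounterColumnFamily is hypothetical, and raising on overflow is a choice made for this sketch rather than a statement about Cassandra's behavior.

```python
from collections import defaultdict

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


class CounterColumnFamily:
    """Toy stand-in for a counter column family: row key -> column name -> count."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def incr(self, row_key, column, by=1):
        """Add 'by' (which may be negative or zero) and return the new value."""
        value = self.rows[row_key].get(column, 0) + by
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError("value does not fit in a 64-bit signed integer")
        self.rows[row_key][column] = value
        return value


counts = CounterColumnFamily()
print(counts.incr("www.datastax.com", "home", by=0))  # creates the column at 0
print(counts.incr("www.datastax.com", "home", by=1))  # now 1
```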
Inserting Rows and Columns
The following examples illustrate using the
SET command to insert columns for a particular row key into the
users column family. In this example, the row key is
bobbyjo and we are setting each of the columns for this user. Notice that you can only set one column at a time in a
SET command.
[default@demo] SET users['bobbyjo']['full_name']='Robert Jones';
[default@demo] SET users['bobbyjo']['email']='bobjones@gmail.com';
[default@demo] SET users['bobbyjo']['state']='TX';
[default@demo] SET users['bobbyjo']['gender']='M';
[default@demo] SET users['bobbyjo']['birth_year']='1975';
In this example, the row key is
yomama and we are just setting some of the columns for this user.
[default@demo] SET users['yomama']['full_name']='Cathy Smith';
[default@demo] SET users['yomama']['state']='CA';
[default@demo] SET users['yomama']['gender']='F';
[default@demo] SET users['yomama']['birth_year']='1969';
In this example, we are creating an entry in the
blog_entry column family for row key
yomama:
[default@demo] SET blog_entry['yomama'][timeuuid()] = 'I love my new shoes!';
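The column name in this entry is a time-based UUID, which is why the TimeUUIDType comparator can keep blog entries in time order. As a sketch, assuming timeuuid() produces standard version 1 UUIDs, Python's uuid.uuid1() creates the same kind of value, and the embedded timestamp can be read back out. The helper name uuid1_to_datetime is hypothetical.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_to_datetime(value):
    """Return the UTC timestamp embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


column_name = uuid.uuid1()
print(column_name, uuid1_to_datetime(column_name))
```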
Note
The Cassandra CLI uses a default consistency level of
ONE for all write and read operations. Specifying different consistency
levels is not supported within Cassandra CLI.
Reading Rows and Columns
Use the
GET command within Cassandra CLI to retrieve a particular row from a column family. Use the
LIST command to return a batch of rows and their associated columns (default limit of rows returned is 100).
For example, to return the first 100 rows (and all associated columns) from the
users column family:
[default@demo] LIST users;
Cassandra stores all data internally as hex byte arrays by default.
If you do not specify a default row key validation class, column
comparator and column validation class when you define the column
family, Cassandra CLI will expect input data for row keys, column names,
and column values to be in hex format (and data will be returned in hex
format).
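To see what "hex format" means in practice, the short Python sketch below converts between readable text and the hex form of its UTF-8 bytes. The function names to_hex and from_hex are illustrative; the sample strings are the row key and column name used in this section.

```python
def to_hex(text):
    """Encode text as UTF-8 and return the bytes as a hex string."""
    return text.encode("utf-8").hex()


def from_hex(hex_string):
    """Decode a hex string back into UTF-8 text."""
    return bytes.fromhex(hex_string).decode("utf-8")


print(to_hex("bobbyjo"))               # 626f6262796a6f
print(from_hex("66756c6c5f6e616d65"))  # full_name
```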
To pass and return data in human-readable format, you can pass a value through an encoding function. Available encodings are:
- ascii
- bytes
- integer (a generic variable-length integer type)
- lexicalUUID
- long
- utf8
For example, to return a particular row key and column in UTF8 format:
[default@demo] GET users[utf8('bobbyjo')][utf8('full_name')];
You can also use the
ASSUME
command to specify the encoding in which column family data should be
returned for the entire client session. For example, to return row keys,
column names, and column values in ASCII-encoded format:
[default@demo] ASSUME users KEYS AS ascii;
[default@demo] ASSUME users COMPARATOR AS ascii;
[default@demo] ASSUME users VALIDATOR AS ascii;
Setting an Expiring Column
When you set a column in Cassandra, you can optionally set an expiration time, or
time-to-live (TTL) attribute for it.
For example, suppose we are tracking coupon codes for our users that expire after 10 days. We can define a
coupon_code column and set an expiration date on that column. For example:
[default@demo] SET users['bobbyjo']
[utf8('coupon_code')] = utf8('SAVE20') WITH ttl=864000;
After ten days (864,000 seconds) have elapsed since the setting of
this column, its value will be marked as deleted and no longer be
returned by read operations. Note, however, that the value is not
actually deleted from disk until normal Cassandra compaction processes
are completed.
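The TTL is given in seconds, so it is easy to get wrong by a factor of 60 or 24. A small Python sketch for the arithmetic follows; the helper name ttl_seconds is hypothetical.

```python
from datetime import timedelta


def ttl_seconds(days=0, hours=0):
    """Return a TTL in whole seconds for the given duration."""
    return int(timedelta(days=days, hours=hours).total_seconds())


print(ttl_seconds(days=10))  # 864000, the value used in the example above
```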
Indexing a Column
The CLI can be used to create secondary indexes (indexes on column
values). You can add a secondary index when you create a column family
or add it later using the
UPDATE COLUMN FAMILY command.
For example, to add a secondary index to the
birth_year column of the
users column family:
[default@demo] UPDATE COLUMN FAMILY users
WITH comparator = UTF8Type
AND column_metadata = [{column_name: birth_year, validation_class: LongType, index_type: KEYS}];
Because of the secondary index created for the column
birth_year, its values can be queried directly for users born in a given year as follows:
[default@demo] GET users WHERE birth_year = 1969;
Deleting Rows and Columns
The Cassandra CLI provides the
DEL command to delete a row or column (or subcolumn).
For example, to delete the
coupon_code column for the
yomama row key in the
users column family:
[default@demo] DEL users ['yomama']['coupon_code'];
[default@demo] GET users ['yomama'];
Or to delete an entire row:
[default@demo] DEL users ['yomama'];
Dropping Column Families and Keyspaces
With Cassandra CLI commands you can drop column families and
keyspaces in much the same way that tables and databases are dropped in a
relational database. This example shows the commands to drop our
example
users column family and then drop the
demo keyspace altogether:
[default@demo] DROP COLUMN FAMILY users;
[default@demo] DROP KEYSPACE demo;
Getting Started with CQL
Developers can access CQL commands in a variety of ways. Drivers are
available for Python, Twisted Python, and JDBC-based client programs.
For the purposes of administrators, the most direct way to run simple CQL commands is via the Python-based
cqlsh command-line client.
Starting the CQL Command-Line Program (cqlsh)
As of Apache Cassandra version 1.0.5 and DataStax Community version 1.0.1, the
cqlsh client is installed with Cassandra in
<install_location>/bin/cqlsh for tarball installations, or
/usr/bin/cqlsh for packaged installations.
When you start
cqlsh, you must provide the IP of a Cassandra node to connect to (default is
localhost) and the RPC connection port (default is 9160). For example:
$ cqlsh 110.123.4.5 9160
cqlsh>
To exit
cqlsh type
exit at the command prompt.
Running CQL Commands with cqlsh
Commands in
cqlsh
use an SQL-like syntax that maps to Cassandra concepts and operations.
If you are just getting started with CQL, make sure to refer to the
CQL Reference.
As of CQL version 2.0,
cqlsh has the following limitations in support for Cassandra operations and data objects:
- Super Columns are not supported; column_type and subcomparator arguments are not valid
- Composite columns are not supported
- Only a subset of all the available column family storage properties can be set using CQL.
The rest of this section provides some guidance with simple CQL commands using
cqlsh. This set of commands is similar (but not identical) to the set described in
Using the Cassandra Client.
Creating a Keyspace
You can use the
cqlsh
commands described in this section to create a keyspace. In creating
an example keyspace for Twissandra, we will assume a desired replication
factor of 3 and implementation of the NetworkTopologyStrategy replica
placement strategy. For more information on these keyspace options, see
About Replication in Cassandra.
Note the single quotes around the string value of
strategy_class:
cqlsh> CREATE KEYSPACE twissandra WITH
strategy_class = 'NetworkTopologyStrategy'
AND strategy_options:DC1 = 3;
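With NetworkTopologyStrategy, the replica count is given per data center. As a sketch, the Python function below assembles a statement in the same form as the one above from a mapping of data center names to replica counts. The function name create_keyspace_cql is hypothetical, and repeating the strategy_options clause for additional data centers is an assumption that extends the single-data-center form shown here.

```python
def create_keyspace_cql(name, replicas_by_datacenter):
    """Build a CREATE KEYSPACE statement in the form shown above.

    replicas_by_datacenter maps a data center name to its replica count.
    """
    if not replicas_by_datacenter:
        raise ValueError("at least one data center is required")
    parts = [
        "CREATE KEYSPACE %s WITH" % name,
        "  strategy_class = 'NetworkTopologyStrategy'",
    ]
    for datacenter, replicas in replicas_by_datacenter.items():
        # One strategy_options entry per data center (assumed for more than one).
        parts.append("  AND strategy_options:%s = %d" % (datacenter, int(replicas)))
    return "\n".join(parts) + ";"


print(create_keyspace_cql("twissandra", {"DC1": 3}))
```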
Creating a Column Family
For this example, we use
cqlsh to create a
users column family in the newly created keyspace. Note the
USE command to connect to the twissandra keyspace.
cqlsh> USE twissandra;
cqlsh> CREATE COLUMNFAMILY users (
... KEY varchar PRIMARY KEY,
... password varchar,
... gender varchar,
... session_token varchar,
... state varchar,
... birth_year bigint);
Inserting and Retrieving Columns
Though in production scenarios it is more practical to insert columns and column values programmatically, it is possible to use
cqlsh for these operations. The example in this section illustrates using the
INSERT and
SELECT commands to insert and retrieve some columns in the
users column family.
The following commands create and then get a user record for
“jsmith.” The record includes a value for the password column we
created when we created the column family, as well as an expiration time
for the password column. Note that the user name “jsmith” is the row
key, or in CQL terms, the primary key.
cqlsh> INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a') USING TTL 86400;
cqlsh> SELECT * FROM users WHERE KEY='jsmith';
u'jsmith' | u'password',u'ch@ngem3a' | u'ttl', 86400
Adding Columns with ALTER COLUMNFAMILY
The
ALTER COLUMNFAMILY command lets you add new columns to a column family. For example, to add a
coupon_code column with the
varchar validation type to the
users column family:
cqlsh> ALTER TABLE users ADD coupon_code varchar;
This creates the column metadata and adds the column to the column family schema, but does not update any existing rows.
Specifying Column Expiration with TTL
Both the
INSERT and
UPDATE commands support setting a column expiration time (TTL). In the
INSERT example above for the key
jsmith
we set the password column to expire at 86400 seconds, or one day. If
we wanted to extend the expiration period to five days, we could use the
UPDATE command as shown:
cqlsh> UPDATE users USING TTL 432000 SET 'password' = 'ch@ngem3a' WHERE KEY = 'jsmith';
Indexing a Column
cqlsh can be used to create secondary indexes, or indexes on column values.
In this example, we will create an index on the
state and
birth_year columns in the users column family.
cqlsh> CREATE INDEX state_key ON users (state);
cqlsh> CREATE INDEX birth_year_key ON users (birth_year);
Because of the secondary index created for the two columns, their values can be queried directly as follows:
cqlsh> SELECT * FROM users
... WHERE gender='f' AND
... state='TX' AND
... birth_year='1968';
u'user1' | u'birth_year',1968 | u'gender',u'f' | u'password',u'ch@ngem3' | u'state',u'TX'
Deleting Columns and Rows
cqlsh provides the
DELETE
command to delete a column or row. In this example we will delete user
jsmith’s session token column, and then delete jsmith’s row entirely.
cqlsh> DELETE session_token FROM users where KEY = 'jsmith';
cqlsh> DELETE FROM users where KEY = 'jsmith';
Note, however, that the phenomenon called
“range ghosts” in Cassandra may mean that keys for deleted rows are still retrieved by
SELECT
statements and other “get” operations. Deleted values, including range
ghosts, are removed completely by the first compaction following
deletion.
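The effect can be pictured with a toy in-memory model. The Python sketch below mirrors only the description above (a deleted row's key keeps showing up, with no columns, until compaction runs) and ignores everything else about how Cassandra stores and removes data. The class name ToyColumnFamily and its methods are hypothetical.

```python
class ToyColumnFamily:
    """In-memory toy: deleted rows keep their key until compact() is called."""

    def __init__(self):
        self.rows = {}            # row key -> {column name: value}
        self.tombstones = set()   # keys of rows deleted since the last compaction

    def insert(self, key, columns):
        self.tombstones.discard(key)
        self.rows.setdefault(key, {}).update(columns)

    def delete_row(self, key):
        # The columns go away, but the key stays behind as a "ghost".
        self.rows[key] = {}
        self.tombstones.add(key)

    def select_all(self):
        """Return (key, columns) pairs; deleted rows show up with no columns."""
        return sorted(self.rows.items())

    def compact(self):
        for key in self.tombstones:
            del self.rows[key]
        self.tombstones.clear()


users = ToyColumnFamily()
users.insert("jsmith", {"password": "ch@ngem3a"})
users.delete_row("jsmith")
print(users.select_all())  # the key is still listed, without columns
users.compact()
print(users.select_all())  # the key is gone
```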
Dropping Column Families and Keyspaces
With
cqlsh
commands you can drop column families and keyspaces in much the same
way that tables and databases are dropped in relational models. This
example shows the commands to drop our example
users column family and then drop the
twissandra keyspace altogether:
cqlsh> DROP COLUMNFAMILY users;
cqlsh> DROP KEYSPACE twissandra;