Thursday 31 May 2012

Using Cassandra in Ubuntu


1. First, update your installed packages with the following two commands (just for good measure):
sudo apt-get update
sudo apt-get upgrade
2. Now, open up your Debian package sources list with Nano for editing using the following command:
sudo nano /etc/apt/sources.list
3. Next, add the following sources to your /etc/apt/sources.list file.
deb http://www.apache.org/dist/incubator/cassandra/debian unstable main
deb-src http://www.apache.org/dist/incubator/cassandra/debian unstable main
After you add these two lines, press Ctrl+X to close Nano. It’ll ask “Save modified buffer?” Press Y. Press Enter when Nano asks “File Name to Write.”
4. Refresh the package lists so that the new Cassandra sources are picked up:
sudo apt-get update
5. ERROR! At this point you receive an error similar to this:
W: GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F758CE318D77295D
6. Use the following three commands to import the missing public key, which resolves the signature error, and continue installing:
NOTE: You must replace the key value ‘F758CE318D77295D’ with the key value you received in your error message.
gpg --keyserver wwwkeys.eu.pgp.net --recv-keys F758CE318D77295D
sudo apt-key add ~/.gnupg/pubring.gpg
sudo apt-get update
7. Install Cassandra:
sudo apt-get install cassandra
8. Next, change Cassandra’s default JMX port from 8080 to something else, because port 8080 is commonly used by other services such as web proxies and application servers. Use Nano to open the Cassandra configuration file with the following command:
sudo nano /usr/share/cassandra/cassandra.in.sh
9. Find the line that sets the JMX port, change 8080 to 10036 so that it reads as follows, and save the file:
-Dcom.sun.management.jmxremote.port=10036 \
10. Start Cassandra with the command:
sudo /etc/init.d/cassandra start
Once you have Cassandra running, test it with Cassandra’s command-line tool, the CLI.
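Two of the manual steps above lend themselves to scripting. The following is a minimal sketch rather than part of the original walkthrough: it pulls the key ID out of a sample apt warning and rewrites the JMX port in a throwaway one-line stand-in for the configuration file. The variable names and the temporary file are assumptions for illustration; on a real system you would point the sed command at /usr/share/cassandra/cassandra.in.sh and run it with sudo.

```shell
# Sample apt warning copied from step 5; a real run would capture apt-get's output.
msg="W: GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F758CE318D77295D"

# Extract the hexadecimal key ID that follows NO_PUBKEY.
key_id=$(printf '%s\n' "$msg" | sed -n 's/.*NO_PUBKEY \([0-9A-F][0-9A-F]*\).*/\1/p')
printf 'missing key: %s\n' "$key_id"

# Temporary stand-in for cassandra.in.sh (hypothetical, one line only).
conf=$(mktemp)
printf '%s\n' '-Dcom.sun.management.jmxremote.port=8080 \' > "$conf"

# Replace the default JMX port 8080 with the port chosen in step 9.
new_port=10036
sed "s/jmxremote\.port=8080/jmxremote.port=${new_port}/" "$conf" > "$conf.new"
new_line=$(cat "$conf.new")
printf '%s\n' "$new_line"

# Clean up the temporary files.
rm -f "$conf" "$conf.new"
```

Writing the result to a second file instead of editing in place keeps the original untouched until you have checked the new line.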

Starting the CLI

You can start the CLI using the bin/cassandra-cli script in your Cassandra installation (bin\cassandra-cli.bat on Windows). If you are evaluating a local Cassandra node, be sure that it has been correctly configured and successfully started before starting the CLI.
If successful you will see output similar to this:
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
You must then specify a system to connect to:
connect localhost/9160;

Creating a Keyspace

We first create a keyspace to run our examples in.
create keyspace Twissandra;

Selecting the keyspace to use

We must then select our example keyspace as our new context before we can run any queries.
use Twissandra;

To Create A Column Family

We can then create a column family to play with.
create column family User with comparator = UTF8Type;
For the later examples to work you must also update the schema using the following command. This sets the validation type for the first and last name columns so that their values are human readable. It also adds an index on the age column so that you can filter your gets by age.
update column family User with
        column_metadata =
        [
        {column_name: first, validation_class: UTF8Type},
        {column_name: last, validation_class: UTF8Type},
        {column_name: age, validation_class: UTF8Type, index_type: KEYS}
        ];

To Add Data

To add data to our new column family we must first specify our default key type; otherwise we would have to specify it for each key using the format [utf8('keyname')]. Specifying it per key is probably advisable if you have mixed key types, but it makes simple cases harder to read.
So we run the command below, which lasts for the length of your CLI session. After quitting and restarting, we must run it again.
assume User keys as utf8;
And then we add our data.
set User['jsmith']['first'] = 'John';
set User['jsmith']['last'] = 'Smith';
set User['jsmith']['age'] = '38';
If you get an error like cannot parse 'John' as hex bytes, then it is likely that you either haven't set your default key type or haven't updated your schema as in the create column family example.
The set command uses API#insert

To Update Data

If we need to update a value we simply set it again.
set User['jsmith']['first'] = 'Jack';

To Get Data

Now let's read back the jsmith row to see what it contains:
get User['jsmith'];
The get command uses API#get_slice

To Query Data

get User where age = '38';

For help

help;

To Quit

quit;

To Execute Script

bin/cassandra-cli -host localhost -port 9160 -f script.txt
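The file passed with -f is simply a text file of CLI statements. As a rough sketch, the commands from the sections above could be collected like this. The temporary file location is an assumption for illustration, and the final cassandra-cli invocation is left as a comment because it needs a running node.

```shell
# Collect the CLI statements from the walkthrough into one script file.
script=$(mktemp)
cat > "$script" <<'EOF'
create keyspace Twissandra;
use Twissandra;
create column family User with comparator = UTF8Type;
update column family User with column_metadata = [
  {column_name: first, validation_class: UTF8Type},
  {column_name: last, validation_class: UTF8Type},
  {column_name: age, validation_class: UTF8Type, index_type: KEYS}
];
assume User keys as utf8;
set User['jsmith']['first'] = 'John';
set User['jsmith']['last'] = 'Smith';
set User['jsmith']['age'] = '38';
get User['jsmith'];
EOF

# Each statement ends with a semicolon; count the lines that terminate one.
statement_count=$(grep -c ';$' "$script")
echo "wrote $statement_count statements to $script"

# With a node running, the file would be executed as shown above:
#   bin/cassandra-cli -host localhost -port 9160 -f "$script"
```

The quoted 'EOF' delimiter stops the shell from expanding anything inside the statements, so the single quotes and brackets reach the file unchanged.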
 
 

Getting Started Using the Cassandra CLI

The Cassandra CLI client utility can be used to do basic data definition (DDL) and data manipulation (DML) within a Cassandra cluster. It is located in /usr/bin/cassandra-cli in packaged installations or <install_location>/bin/cassandra-cli in binary installations.
To start the CLI and connect to a particular Cassandra instance, launch the script together with -host and -port options. It will connect to the cluster name specified in the cassandra.yaml file (which is Test Cluster by default). For example, if you have a single-node cluster on localhost:
$ cassandra-cli -host localhost -port 9160
Or to connect to a node in a multi-node cluster, give the IP address of the node:
$ cassandra-cli -host 110.123.4.5 -port 9160
To see help on the various commands available:
[default@unknown] help;
For detailed help on a specific command, use help <command>;. For example:
[default@unknown] help SET;
Note
A command is not sent to the server unless it is terminated by a semicolon (;). Hitting the return key without a semicolon at the end of the line echoes an ellipsis ( . . . ), which indicates that the CLI expects more input.

Creating a Keyspace

You can use the Cassandra CLI commands described in this section to create a keyspace. In this example, we create a keyspace called demo, with a replication factor of 1 and using the SimpleStrategy replica placement strategy.
Note the single quotes around the string value of placement_strategy:
[default@unknown] CREATE KEYSPACE demo
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = [{replication_factor:1}];
You can verify the creation of a keyspace with the SHOW KEYSPACES command. The new keyspace is listed along with the system keyspace and any other existing keyspaces.

Creating a Column Family

First, connect to the keyspace where you want to define the column family with the USE command.
[default@unknown] USE demo;
In this example, we create a users column family in the demo keyspace. In this column family we define a few columns: full_name, email, state, gender, and birth_year. This is considered a static column family: we define the column names up front, and most rows are expected to have more or less the same columns.
Notice the settings of comparator, key_validation_class and validation_class. These are setting the default encoding used for column names, row key values and column values. In the case of column names, the comparator also determines the sort order.
[default@unknown] USE demo;

[default@demo] CREATE COLUMN FAMILY users
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: full_name, validation_class: UTF8Type}
{column_name: email, validation_class: UTF8Type}
{column_name: state, validation_class: UTF8Type}
{column_name: gender, validation_class: UTF8Type}
{column_name: birth_year, validation_class: LongType}
];
Next, create a dynamic column family called blog_entry. Notice that here we do not specify column definitions as the column names are expected to be supplied later by the client application.
[default@demo] CREATE COLUMN FAMILY blog_entry
WITH comparator = TimeUUIDType
AND key_validation_class=UTF8Type
AND default_validation_class = UTF8Type;

Creating a Counter Column Family

A counter column family contains counter columns. A counter column is a specific kind of column whose user-visible value is a 64-bit signed integer that can be incremented (or decremented) by a client application. The counter column tracks the most recent value (or count) of all updates made to it. A counter column cannot be mixed in with regular columns of a column family; you must create a column family specifically to hold counters.
To create a column family that holds counter columns, set the default_validation_class of the column family to CounterColumnType. For example:
[default@demo] CREATE COLUMN FAMILY page_view_counts
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;
To insert a row and counter column into the column family (with the initial counter value set to 0):
[default@demo] INCR page_view_counts['www.datastax.com'][home] BY 0;
To increment the counter:
[default@demo] INCR page_view_counts['www.datastax.com'][home] BY 1;

Inserting Rows and Columns

The following examples illustrate using the SET command to insert columns for a particular row key into the users column family. In this example, the row key is bobbyjo and we are setting each of the columns for this user. Notice that you can only set one column at a time in a SET command.
[default@demo] SET users['bobbyjo']['full_name']='Robert Jones';

[default@demo] SET users['bobbyjo']['email']='bobjones@gmail.com';

[default@demo] SET users['bobbyjo']['state']='TX';

[default@demo] SET users['bobbyjo']['gender']='M';

[default@demo] SET users['bobbyjo']['birth_year']='1975';
In this example, the row key is yomama and we are just setting some of the columns for this user.
[default@demo] SET users['yomama']['full_name']='Cathy Smith';

[default@demo] SET users['yomama']['state']='CA';

[default@demo] SET users['yomama']['gender']='F';

[default@demo] SET users['yomama']['birth_year']='1969';
In this example, we are creating an entry in the blog_entry column family for row key yomama:
[default@demo] SET blog_entry['yomama'][timeuuid()] = 'I love my new shoes!';
Note
The Cassandra CLI uses a default consistency level of ONE for all write and read operations. Specifying different consistency levels is not supported within Cassandra CLI.
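Because SET handles only one column per statement, filling a row means writing one statement per column. The sketch below generates those statements from a list of column=value pairs using plain shell. The input list and variable names are assumptions for illustration, and values containing a single quote would need extra escaping that this sketch does not attempt.

```shell
# Row key from the example above.
row_key='bobbyjo'

# Turn each column=value line into one SET statement.
statements=$(
  while IFS='=' read -r column value; do
    printf "SET users['%s']['%s']='%s';\n" "$row_key" "$column" "$value"
  done <<'EOF'
full_name=Robert Jones
email=bobjones@gmail.com
state=TX
gender=M
birth_year=1975
EOF
)

# Print the generated statements, ready to paste into the CLI or a script file.
printf '%s\n' "$statements"
```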

Reading Rows and Columns

Use the GET command within Cassandra CLI to retrieve a particular row from a column family. Use the LIST command to return a batch of rows and their associated columns (default limit of rows returned is 100).
For example, to return the first 100 rows (and all associated columns) from the users column family:
[default@demo] LIST users;
Cassandra stores all data internally as hex byte arrays by default. If you do not specify a default row key validation class, column comparator and column validation class when you define the column family, Cassandra CLI will expect input data for row keys, column names, and column values to be in hex format (and data will be returned in hex format).
To pass and return data in human-readable format, you can pass a value through an encoding function. Available encodings are:
  • ascii
  • bytes
  • integer (a generic variable-length integer type)
  • lexicalUUID
  • long
  • utf8
For example to return a particular row key and column in UTF8 format:
[default@demo] GET users[utf8('bobbyjo')][utf8('full_name')];
You can also use the ASSUME command to specify the encoding in which column family data should be returned for the entire client session. For example, to return row keys, column names, and column values in ASCII-encoded format:
[default@demo] ASSUME users KEYS AS ascii;
[default@demo] ASSUME users COMPARATOR AS ascii;
[default@demo] ASSUME users VALIDATOR AS ascii;
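To get a feel for the hex format mentioned above, the following sketch uses standard shell tools (not the Cassandra CLI) to print the UTF-8 bytes of a row key as hex. It is only an illustration of the encoding, and the variable names are illustrative.

```shell
# Row key from the earlier examples.
row_key='bobbyjo'

# od prints each byte as two hex digits; tr strips the spaces and the newline.
hex_key=$(printf '%s' "$row_key" | od -An -tx1 | tr -d ' \n')
echo "$hex_key"   # prints 626f6262796a6f
```

This is the form in which a key appears when no validation class or encoding function has been specified.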

Setting an Expiring Column

When you set a column in Cassandra, you can optionally set an expiration time, or time-to-live (TTL) attribute for it.
Suppose we are tracking coupon codes for our users that expire after 10 days. We can define a coupon_code column and set an expiration time on that column. For example:
[default@demo] SET users['bobbyjo']
[utf8('coupon_code')] = utf8('SAVE20') WITH ttl=864000;
After ten days (864,000 seconds) have elapsed since the column was set, its value is marked as deleted and is no longer returned by read operations. Note, however, that the value is not actually deleted from disk until normal Cassandra compaction processes are completed.
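The ttl value is given in seconds, so a lifetime in days has to be converted first. A small sketch of that arithmetic in shell, with illustrative variable names:

```shell
# Convert a lifetime in days to the number of seconds that ttl expects.
days=10
ttl=$((days * 24 * 60 * 60))

# Build the SET statement from the example above with the computed value.
echo "SET users['bobbyjo'][utf8('coupon_code')] = utf8('SAVE20') WITH ttl=$ttl;"
```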

Indexing a Column

The CLI can be used to create secondary indexes (indexes on column values). You can add a secondary index when you create a column family or add it later using the UPDATE COLUMN FAMILY command.
For example, to add a secondary index to the birth_year column of the users column family:
[default@demo] UPDATE COLUMN FAMILY users
WITH comparator = UTF8Type
AND column_metadata = [{column_name: birth_year, validation_class: LongType, index_type: KEYS}];
Because of the secondary index created for the column birth_year, its values can be queried directly for users born in a given year as follows:
[default@demo] GET users WHERE birth_year = 1969;

Deleting Rows and Columns

The Cassandra CLI provides the DEL command to delete a row or column (or subcolumn).
For example, to delete the coupon_code column for the yomama row key in the users column family:
[default@demo] DEL users ['yomama']['coupon_code'];

[default@demo] GET users ['yomama'];
Or to delete an entire row:
[default@demo] DEL users ['yomama'];

Dropping Column Families and Keyspaces

With Cassandra CLI commands you can drop column families and keyspaces in much the same way that tables and databases are dropped in a relational database. This example shows the commands to drop our example users column family and then drop the demo keyspace altogether:
[default@demo] DROP COLUMN FAMILY users;

[default@demo] DROP KEYSPACE demo;

Getting Started with CQL

Developers can access CQL commands in a variety of ways. Drivers are available for Python, Twisted Python, and JDBC-based client programs.
For the purposes of administrators, the most direct way to run simple CQL commands is via the Python-based cqlsh command-line client.

Starting the CQL Command-Line Program (cqlsh)

As of Apache Cassandra version 1.0.5 and DataStax Community version 1.0.1, the cqlsh client is installed with Cassandra in <install_location>/bin/cqlsh for tarball installations, or /usr/bin/cqlsh for packaged installations.
When you start cqlsh, you must provide the IP of a Cassandra node to connect to (default is localhost) and the RPC connection port (default is 9160). For example:
$ cqlsh 110.123.4.5 9160
cqlsh>
To exit cqlsh type exit at the command prompt.
cqlsh> exit

Running CQL Commands with cqlsh

Commands in cqlsh combine SQL-like syntax that maps to Cassandra concepts and operations. If you are just getting started with CQL, make sure to refer to the CQL Reference.
As of CQL version 2.0, cqlsh has the following limitations in support for Cassandra operations and data objects:
  • Super Columns are not supported; column_type and subcomparator arguments are not valid
  • Composite columns are not supported
  • Only a subset of all the available column family storage properties can be set using CQL.
The rest of this section provides some guidance with simple CQL commands using cqlsh. This is a similar (but not identical) set of commands as the set described in Using the Cassandra Client.

Creating a Keyspace

You can use the cqlsh commands described in this section to create a keyspace. In creating an example keyspace for Twissandra, we will assume a desired replication factor of 3 and implementation of the NetworkTopologyStrategy replica placement strategy. For more information on these keyspace options, see About Replication in Cassandra.
Note the single quotes around the string value of strategy_class:
cqlsh> CREATE KEYSPACE twissandra WITH
       strategy_class = 'NetworkTopologyStrategy'
       AND strategy_options:DC1 = 3;

Creating a Column Family

For this example, we use cqlsh to create a users column family in the newly created keyspace. Note the USE command to connect to the twissandra keyspace.
cqlsh> USE twissandra;

cqlsh> CREATE COLUMNFAMILY users (
 ...  KEY varchar PRIMARY KEY,
 ...  password varchar,
 ...  gender varchar,
 ...  session_token varchar,
 ...  state varchar,
 ...  birth_year bigint);

Inserting and Retrieving Columns

Though in production scenarios it is more practical to insert columns and column values programmatically, it is possible to use cqlsh for these operations. The example in this section illustrates using the INSERT and SELECT commands to insert and retrieve some columns in the users column family.
The following commands create and then get a user record for “jsmith.” The record includes a value for the password column we created when we created the column family, as well as an expiration time for the password column. Note that the user name “jsmith” is the row key, or in CQL terms, the primary key.
cqlsh> INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a') USING TTL 86400;
cqlsh> SELECT * FROM users WHERE KEY='jsmith';
u'jsmith' | u'password',u'ch@ngem3a' | u'ttl', 86400

Adding Columns with ALTER COLUMNFAMILY

The ALTER COLUMNFAMILY command lets you add new columns to a column family. For example, to add a coupon_code column with the varchar validation type to the users column family:
cqlsh> ALTER TABLE users ADD coupon_code varchar;
This creates the column metadata and adds the column to the column family schema, but does not update any existing rows.

Altering Column Metadata

With ALTER COLUMNFAMILY, you can change the type of a column any time after it is defined or added to a column family. For example, if we decided the coupon_code column should store coupon codes in the form of integers, we could change the validation type as follows:
cqlsh> ALTER TABLE users ALTER coupon_code TYPE int;
Note that existing coupon codes will not be validated against the new type, only newly inserted values.

Specifying Column Expiration with TTL

Both the INSERT and UPDATE commands support setting a column expiration time (TTL). In the INSERT example above for the key jsmith we set the password column to expire after 86400 seconds, or one day. If we wanted to extend the expiration period to five days, we could use the UPDATE command as shown:
cqlsh> UPDATE users USING TTL 432000 SET 'password' = 'ch@ngem3a' WHERE KEY = 'jsmith';

Dropping Column Metadata

If your aim is to remove a column’s metadata entirely, including the column name and validation type, you can use ALTER TABLE <columnFamily> DROP <column>. The following command removes the name and validator without affecting or deleting any existing data:
cqlsh> ALTER TABLE users DROP coupon_code;
After you run this command, clients can still add new columns named coupon_code to the users column family – but they will not be validated until you explicitly add a type again.

Indexing a Column

cqlsh can be used to create secondary indexes, or indexes on column values. In this example, we will create an index on the state and birth_year columns in the users column family.
cqlsh> CREATE INDEX state_key ON users (state);
cqlsh> CREATE INDEX birth_year_key ON users (birth_year);
Because of the secondary index created for the two columns, their values can be queried directly as follows:
cqlsh> SELECT * FROM users
 ... WHERE gender='f' AND
 ...  state='TX' AND
 ...  birth_year='1968';
u'user1' | u'birth_year',1968 | u'gender',u'f' | u'password',u'ch@ngem3' | u'state',u'TX'

Deleting Columns and Rows

cqlsh provides the DELETE command to delete a column or row. In this example we will delete user jsmith’s session token column, and then delete jsmith’s row entirely.
cqlsh> DELETE session_token FROM users where KEY = 'jsmith';
cqlsh> DELETE FROM users where KEY = 'jsmith';
Note, however, that the phenomenon called “range ghosts” in Cassandra may mean that keys for deleted rows are still retrieved by SELECT statements and other “get” operations. Deleted values, including range ghosts, are removed completely by the first compaction following deletion.

Dropping Column Families and Keyspaces

With cqlsh commands you can drop column families and keyspaces in much the same way that tables and databases are dropped in relational models. This example shows the commands to drop our example users column family and then drop the twissandra keyspace altogether:
cqlsh> DROP COLUMNFAMILY users;
cqlsh> DROP KEYSPACE twissandra;
 
