Thursday 31 May 2012

Install Cassandra on Ubuntu

Installing the JRE on Debian or Ubuntu Systems

The Oracle Java Runtime Environment (JRE) has been removed from Ubuntu's official software repositories; Oracle now distributes it only as a binary (.bin) download. You can get the JRE from the Java SE Downloads page.
  1. Download the appropriate version of the JRE for your system, such as jre-6u31-linux-i586.bin, and place it directly under /opt/java/<32 or 64> (matching your 32- or 64-bit architecture).
  2. Make the file executable:
    sudo chmod 755 /opt/java/32/jre-6u31-linux-i586.bin
    
  3. Go to the new folder:
    cd /opt/java
    
  4. Execute the file:
    sudo ./jre-6u31-linux-i586.bin
    
  5. If needed, accept the license terms to continue installing the JRE.
  6. Tell the system that there’s a new Java version available:
    sudo update-alternatives --install "/usr/bin/java" "java" "/opt/java/32/jre1.6.0_31/bin/java" 1
    
Note
If updating from a previous version that was removed manually, execute the above command twice, because you’ll get an error message the first time.
  7. Set the new JRE as the default:
    sudo update-alternatives --set java /opt/java/32/jre1.6.0_31/bin/java
    
  8. Make sure your system is now using the correct JRE:
$ sudo java -version

java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b04, mixed mode)


Installing Cassandra Debian Packages

DataStax provides a Debian package repository for Apache Cassandra.
These instructions assume that you have the aptitude package management application installed, and that you have root access on the machine where you are installing.
Note
By downloading community software from DataStax you agree to the terms of the DataStax Community EULA (End User License Agreement) posted on the DataStax web site.
  1. Edit the aptitude repository source list file (/etc/apt/sources.list).
    $ sudo vi /etc/apt/sources.list
    
  2. In this file, add the DataStax Community repository.
    deb http://debian.datastax.com/community stable main
    
  3. (Debian Systems Only) Find the line that describes your source repository for Debian and add contrib non-free to the end of the line. This allows installation of the Oracle JVM instead of the OpenJDK JVM. For example:
    deb http://some.debian.mirror/debian/ $distro main contrib non-free
    
Save and close the file when you are done adding/editing your sources.
  4. Add the DataStax repository key to your aptitude trusted keys.
    $ curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
    
  5. If needed, install the Python CQL driver.
    $ sudo apt-get install python-cql=1.0.10-1
    
  6. Install the package.
    $ sudo apt-get update
    $ sudo apt-get install cassandra=1.0.10 dsc
    
    This installs the Cassandra, DataStax Community demos, and OpsCenter packages. By default, the Debian packages start the Cassandra service automatically.
  7. To stop the service and clear the initial gossip history that gets populated by this initial start:
    $ sudo service cassandra stop
    $ sudo bash -c 'rm /var/lib/cassandra/data/system/*'
    

Configuring and Starting a Cassandra Cluster

The process for initializing a Cassandra cluster (whether a single-node, multi-node, or multi-data-center cluster) is to first correctly configure the Node and Cluster Initialization Properties in each node's cassandra.yaml configuration file, and then to start each node individually, beginning with the seed node(s).
For more guidance on choosing the right configuration properties for your needs, see Choosing Node Configuration Options.

Initializing a Multi-Node or Multi-Data Center Cluster

To correctly configure a multi-node or multi-data center cluster, you must determine the following information:
  • A name for your cluster.
  • How many total nodes your cluster will have, and how many nodes per data center (or replication group).
  • The IP addresses of each node.
  • The token for each node (see Calculating Tokens).
    If you are deploying a multi-data center cluster, make sure to assign tokens so that data is evenly distributed within each data center or replication grouping (see Calculating Tokens for a Multi-Data Center Cluster).
  • Which nodes will serve as the seed nodes.
    If you are deploying a multi-data center cluster, the seed list (a comma-delimited list of addresses) should include a node from each data center or replication group. Cassandra nodes use this host list to find each other and learn the topology of the ring.
  • The snitch you plan to use.
This information is used to configure the Node and Cluster Initialization Properties in the cassandra.yaml configuration file on each node in the cluster. Each node should be correctly configured before starting up the cluster, one node at a time (starting with the seed nodes).
For example, suppose you are configuring a 6 node cluster spanning 2 racks in a single data center. The nodes have the following IPs, and one node per rack will serve as a seed:
  • node0 110.82.155.0 (seed1)
  • node1 110.82.155.1
  • node2 110.82.155.2
  • node3 110.82.156.3 (seed2)
  • node4 110.82.156.4
  • node5 110.82.156.5
The cassandra.yaml files for each node would then have the following modified property settings.
node0
cluster_name: 'MyDemoCluster'
initial_token: 0
seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "110.82.155.0,110.82.156.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node1
cluster_name: 'MyDemoCluster'
initial_token: 28356863910078205288614550619314017621
seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "110.82.155.0,110.82.156.3"
listen_address: 110.82.155.1
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node2
cluster_name: 'MyDemoCluster'
initial_token: 56713727820156410577229101238628035242
seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "110.82.155.0,110.82.156.3"
listen_address: 110.82.155.2
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node3
cluster_name: 'MyDemoCluster'
initial_token: 85070591730234615865843651857942052864
seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "110.82.155.0,110.82.156.3"
listen_address: 110.82.156.3
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node4
cluster_name: 'MyDemoCluster'
initial_token: 113427455640312821154458202477256070485
seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "110.82.155.0,110.82.156.3"
listen_address: 110.82.156.4
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node5
cluster_name: 'MyDemoCluster'
initial_token: 141784319550391026443072753096570088106
seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "110.82.155.0,110.82.156.3"
listen_address: 110.82.156.5
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
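All six nodes use the RackInferringSnitch, which infers topology from the IP address itself: the second octet is read as the data center and the third octet as the rack. A small illustration of that convention (the infer_topology helper is hypothetical, not Cassandra code):

```python
# RackInferringSnitch convention: for an IP a.b.c.d, octet "b" is treated
# as the data center and octet "c" as the rack (illustrative helper only).
def infer_topology(ip):
    octets = ip.split('.')
    return {'dc': octets[1], 'rack': octets[2]}

# Nodes 0-2 (110.82.155.x) and nodes 3-5 (110.82.156.x) land in
# different racks because their third octets differ.
print(infer_topology('110.82.155.0'))
print(infer_topology('110.82.156.3'))
```

This is why the example cluster uses 110.82.155.x for rack 1 and 110.82.156.x for rack 2.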

Calculating Tokens

Tokens are used to assign a range of data to a particular node. Assuming you are using the RandomPartitioner (the default partitioner), the approaches described in this section will ensure even data distribution.
Each node in the cluster should be assigned a token before it is started for the first time. The token is set with the initial_token property in the cassandra.yaml configuration file.
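For the RandomPartitioner, tokens are integers in the range 0 to 2**127, so evenly spaced tokens can be computed directly with token_i = i * 2**127 / N. A minimal sketch of that calculation:

```python
# Evenly spaced RandomPartitioner tokens: token_i = i * 2**127 / N
def tokens(num_nodes):
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]

# For the six-node example cluster:
for i, t in enumerate(tokens(6)):
    print('token %d: %d' % (i, t))
```

These are the same values assigned to node0 through node5 in the example above.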

Calculating Tokens for Multiple Racks

If you have multiple racks in a single data center, or a multiple data center cluster, you can use the same formula for calculating the tokens. However, you should assign the tokens to nodes in alternating racks: for example, rack1, rack2, rack3, rack1, rack2, rack3, and so on. Be sure to have the same number of nodes in each rack.
[Figure: tokens assigned to nodes in alternating racks]
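The alternating assignment can be sketched as follows: compute evenly spaced tokens for the whole cluster, then hand them out round-robin across racks (alternating_assignment is a hypothetical helper that assumes equal rack sizes):

```python
# Assign evenly spaced tokens to nodes in alternating racks
# (rack1, rack2, rack1, rack2, ...), assuming equal rack sizes.
def alternating_assignment(racks):
    # racks: list of racks, each a list of node names of equal length
    nodes = [rack[i] for i in range(len(racks[0])) for rack in racks]
    total = len(nodes)
    return [(node, i * (2 ** 127) // total) for i, node in enumerate(nodes)]

racks = [['r1n0', 'r1n1', 'r1n2'], ['r2n0', 'r2n1', 'r2n2']]
for node, token in alternating_assignment(racks):
    print(node, token)
```

Consecutive token ranges then belong to nodes in different racks, which keeps replicas spread across racks.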

Calculating Tokens for a Single Data Center

  1. Create a new file for your token generator program:
    vi tokengentool
    
  2. Paste the following Python program into this file:
    #! /usr/bin/python
    import sys
    if (len(sys.argv) > 1):
        num=int(sys.argv[1])
    else:
        num=int(raw_input("How many nodes are in your cluster? "))
    for i in range(0, num):
        print 'token %d: %d' % (i, (i*(2**127)/num))
    
  3. Save and close the file and make it executable:
    chmod +x tokengentool
    
  4. Run the script:
    ./tokengentool
    
  5. When prompted, enter the total number of nodes in your cluster:
    How many nodes are in your cluster? 6
    token 0: 0
    token 1: 28356863910078205288614550619314017621
    token 2: 56713727820156410577229101238628035242
    token 3: 85070591730234615865843651857942052864
    token 4: 113427455640312821154458202477256070485
    token 5: 141784319550391026443072753096570088106
    
  6. On each node, edit the cassandra.yaml file and enter its corresponding token value in the initial_token property.

Calculating Tokens for a Multi-Data Center Cluster

In multi-data center deployments, replica placement is calculated per data center using the NetworkTopologyStrategy replica placement strategy. In each data center (or replication group) the first replica for a particular row is determined by the token value assigned to a node. Additional replicas in the same data center are placed by walking the ring clockwise until it reaches the first node in another rack.
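The clockwise walk described above can be sketched as follows (a deliberately simplified model of placement within one data center; the real strategy also handles the case where the replication factor exceeds the number of racks):

```python
# Simplified NetworkTopologyStrategy placement within one data center:
# the first replica is the node owning the token; additional replicas
# come from walking the ring clockwise until a node on an unused rack
# is found.
def place_replicas(ring, start_index, replication_factor):
    # ring: list of (node, rack) pairs in token order
    replicas = []
    racks_used = set()
    n = len(ring)
    for step in range(n):
        node, rack = ring[(start_index + step) % n]
        if not replicas or rack not in racks_used:
            replicas.append(node)
            racks_used.add(rack)
        if len(replicas) == replication_factor:
            break
    return replicas

# Six nodes alternating between two racks, as in the example cluster:
ring = [('node0', 'rack1'), ('node1', 'rack2'), ('node2', 'rack1'),
        ('node3', 'rack2'), ('node4', 'rack1'), ('node5', 'rack2')]
print(place_replicas(ring, 0, 2))  # ['node0', 'node1']
```

With alternating racks, the second replica is always the immediate clockwise neighbor, so load stays even.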
If you do not calculate partitioner tokens so that the data ranges are evenly distributed for each data center, you could end up with uneven data distribution within a data center. The goal is to ensure that the nodes for each data center are evenly dispersed around the ring, or to calculate tokens for each replication group individually (without conflicting token assignments).
One way to avoid uneven distribution is to calculate tokens for all nodes in the cluster, and then alternate the token assignments so that the nodes for each data center are evenly dispersed around the ring.
[Figure: cluster-wide tokens alternated between data centers]
Another way to assign tokens in a multi-data center cluster is to generate tokens for the nodes in one data center, and then offset those token numbers by 1 for all nodes in the next data center, by 2 for the nodes in the data center after that, and so on. This approach works well if you are adding a data center to an established cluster, or if your data centers do not have the same number of nodes.
[Figure: tokens offset by data center]
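The offset approach can be sketched as follows (offset_tokens is a hypothetical helper; it shifts one data center's token list by the data center index, so no two nodes share a token):

```python
# Offset token assignment for multiple data centers: generate evenly
# spaced tokens for one data center, then shift them by the data
# center index (DC 0 gets +0, DC 1 gets +1, and so on).
def offset_tokens(nodes_per_dc, num_dcs):
    base = [i * (2 ** 127) // nodes_per_dc for i in range(nodes_per_dc)]
    return {dc: [t + dc for t in base] for dc in range(num_dcs)}

for dc, toks in sorted(offset_tokens(3, 2).items()):
    print('DC%d: %s' % (dc, toks))
```

Each data center ends up with its own evenly distributed ranges, which is the stated goal for NetworkTopologyStrategy.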

Starting and Stopping a Cassandra Node

After you have installed and configured Cassandra on all nodes, you are ready to start your cluster. On initial start-up, each node must be started one at a time, starting with your seed nodes.
Packaged installations include startup scripts for running Cassandra as a service. Binary packages do not.

Starting/Stopping Cassandra as a Stand-Alone Process

You can start the Cassandra Java server process as follows:
$ cd <install_location>
$ sh bin/cassandra -f
To stop the Cassandra process, find the Cassandra Java process ID (PID), and then kill -9 the process using its PID number. For example:
$ ps ax | grep java
$ kill -9 1539

Starting/Stopping Cassandra as a Service

Packaged installations provide startup scripts in /etc/init.d for starting Cassandra as a service. The service runs as the cassandra user. You must have root or sudo permissions to start or stop services.
To start the Cassandra service (as root):
# service cassandra start
To stop the Cassandra service (as root):
# service cassandra stop
Note
On Enterprise Linux systems, the Cassandra service runs as a java process. On Debian systems, the Cassandra service runs as a jsvc process.


# enable add-apt-repository
sudo apt-get install python-software-properties
# add repository for java
sudo add-apt-repository ppa:ferramroberto/java
# update
sudo apt-get update
# install Sun (I hate Oracle) java
sudo apt-get install sun-java6-jdk sun-java6-plugin
# create directory for installation
sudo mkdir /opt/cassandra
# add cassandra user [set password]
sudo adduser cassandra
# change owner of installation directory
sudo chown cassandra:cassandra /opt/cassandra/
# switch to cassandra user
su -l cassandra
# go to installation directory
cd /opt/cassandra
# download latest version (check address on cassandra.apache.org)
wget http://www.apache.net.pl//cassandra/1.0.7/apache-cassandra-1.0.7-bin.tar.gz
# untar
tar xvzf apache-cassandra-1.0.7-bin.tar.gz
# back to admin account, create cassandra var directory
logout
sudo mkdir /var/lib/cassandra/
sudo chown cassandra:cassandra /var/lib/cassandra/
sudo mkdir /var/log/cassandra/
sudo chown cassandra:cassandra /var/log/cassandra/
# switch again to cassandra user
su -l cassandra
mkdir /var/lib/cassandra/data
mkdir /var/lib/cassandra/commitlog
mkdir /var/lib/cassandra/saved_caches



To install Cassandra on Debian or other Debian derivatives such as Ubuntu or Linux Mint, use the following steps:
1- First, upgrade your software:
sudo apt-get upgrade
2- Now open sources.list
sudo vi /etc/apt/sources.list
3- Add the following lines to your sources.list:
deb http://www.apache.org/dist/cassandra/debian unstable main
deb-src http://www.apache.org/dist/cassandra/debian unstable main
4- Run update
sudo apt-get update 
Now you will see an error similar to this:
GPG error: http://www.apache.org unstable Release: The following signatures couldn’t be verified because the public key is not available: NO_PUBKEY F758CE318D77295D
This simply means you need to add the PUBLIC_KEY. You do that like this:
gpg --keyserver wwwkeys.eu.pgp.net --recv-keys F758CE318D77295D
gpg --export --armor F758CE318D77295D | sudo apt-key add -
5- Run update again and install cassandra
sudo apt-get update && sudo apt-get install cassandra
6- Now start Cassandra:
sudo /etc/init.d/cassandra start
