Sqoop Import Examples:
Sqoop Import: import data from a relational database management system (RDBMS), such as MySQL or Oracle, into the Hadoop Distributed File System (HDFS) and its subprojects (Hive, HBase).
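Before running any of the imports below, it can help to confirm that Sqoop can reach the database at all. A minimal connectivity check (only a sketch, reusing the same placeholder database and credentials as the examples that follow; -P prompts for the password):
$ bin/sqoop list-tables --connect jdbc:mysql://localhost/db1 --username root -P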
Import the data (MySQL table) to HBase:
Case 1: The table has a primary key and you want to import all the columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
Case 2: The table has a primary key and you want to import only a few columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
Note: the column list given to --columns must include the primary key column.
Case 3: The table doesn't have a primary key, so choose one column as the hbase-row-key and import all the columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table
Case 4: The table doesn't have a primary key, so choose one column as the hbase-row-key and import only a few columns of the MySQL table into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
Note: the column named in --hbase-row-key must also appear in the --columns list. Otherwise the command executes successfully, but no records are inserted into HBase. A quick way to confirm whether rows actually arrived is shown below.
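For example, from the HBase shell (a sketch using the table name from the commands above):
$ hbase shell
hbase> count 'hbase_tableName'
hbase> scan 'hbase_tableName', {LIMIT => 5}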
Note: the value of the primary key column, or of the column given in --hbase-row-key, becomes the HBase row key. If the MySQL table has no primary key, or the --hbase-row-key column does not hold unique values, some records will be lost.
Example: consider a MySQL table test_table with two columns, name and address. The table test_table has no primary key or unique key column.
Records of test_table:
name    address
------  -------
abc     123
sqw     345
abc     125
sdf     1234
aql     23dw
Run the following command to import test_table data into HBase:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table test_table --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key name --hbase-create-table
Only 4 records appear in the HBase table instead of 5. In the example above, two rows share the value 'abc' in the name column, and that column is used as the HBase row key. When the first record with name 'abc' arrives it is inserted into the HBase table; when the next record with the same value 'abc' arrives, it overwrites the previous one. The same problem occurs if the table has a composite primary key, because only one column of the composite key is used as the HBase row key. A possible workaround is sketched below.
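For instance (only a sketch, not part of the original example), a unique row key can be synthesized in a free-form query and used as --hbase-row-key. Sqoop's --query form requires the literal $CONDITIONS token; with a single mapper no --split-by is needed, and the CONCAT call assumes MySQL's default quoting:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --query 'SELECT name, address, CONCAT(name, "_", address) AS row_id FROM test_table WHERE $CONDITIONS' -m 1 --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key row_id --hbase-create-table
This still loses rows if the concatenated name and address pair repeats, so a genuinely unique source column remains the safer choice.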
Import the data (MySQL table) to Hive
Case 1: Import a MySQL table into Hive when the table has a primary key.
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
Case 2: Import a MySQL table into Hive when the table doesn't have a primary key. Without a primary key, Sqoop cannot pick a split column on its own, so either name one with --split-by or use a single mapper:
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home --split-by column_name
or
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
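After the Hive import finishes, a quick check from the Hive CLI (a sketch; it assumes the hive client is on the PATH):
$ hive -e "SELECT * FROM tableName LIMIT 5;"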
Import the data (MySQL table) to HDFS
Case 1: Import a MySQL table into HDFS when the table has a primary key.
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/me/tableName
Case 2: Import a MySQL table into HDFS when the table doesn't have a primary key, so run with a single mapper.
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/me/tableName -m 1
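To confirm that the files landed in HDFS (a sketch; part-m-00000 is the usual map-task output file name, not something taken from the original commands):
$ hadoop fs -ls /user/me/tableName
$ hadoop fs -cat /user/me/tableName/part-m-00000 | head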
Sqoop Export Examples:
Sqoop Export: export data from HDFS and its subprojects (Hive, HBase) back into an RDBMS.
By default, Hive stores its data using ^A as the field delimiter and \n as the row delimiter, so the export must be told to expect that field delimiter:
$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'
where '\001' is the octal representation of ^A.
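For data that was imported to HDFS as plain text (as in the HDFS import examples above), Sqoop's default field delimiter is a comma, so the corresponding export would look roughly like this (a sketch reusing the example paths from above):
$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName --export-dir /user/me/tableName --username root --password password -m 1 --input-fields-terminated-by ','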
[root@slave ~]# cd /usr/local
[root@slave local]# tar -xzf sqoop-1.4.1-incubating__hadoop-1.0.0.tar.gz
[root@slave local]# cd sqoop-1.4.1-incubating__hadoop-1.0.0
[root@slave sqoop-1.4.1-incubating__hadoop-1.0.0]# ls
bin CHANGES.txt conf docs ivy.xml LICENSE.txt pom-old.xml sqoop-1.4.1-incubating.jar src
build.xml COMPILING.txt DISCLAIMER.txt ivy lib NOTICE.txt README.txt sqoop-test-1.4.1-incubating.jar testdata
[root@slave sqoop-1.4.1-incubating__hadoop-1.0.0]# cd
[root@slave ~]# export SQOOP_HOME=/usr/local/sqoop-1.4.1-incubating__hadoop-1.0.0
[root@slave ~]# export PATH=$PATH:$SQOOP_HOME/bin
[root@slave ~]# cd /usr/local
[root@slave local]# tar -xzf mysql-connector-java-5.1.21.tar.gz
[root@slave local]# sqoop help
Error: /usr/lib/hadoop does not exist!
Please set $HADOOP_HOME to the root of your Hadoop installation.
[root@slave local]# echo $HADOOP_HOME
[root@slave local]# export HADOOP_HOME=/usr/local/hadoop/
[root@slave local]# sqoop help
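The transcript above only unpacks the MySQL connector; the JDBC driver jar still has to be copied into Sqoop's lib directory before MySQL imports will work. A sketch (the exact jar name inside the 5.1.21 tarball is an assumption):
[root@slave local]# cp /usr/local/mysql-connector-java-5.1.21/mysql-connector-java-5.1.21-bin.jar $SQOOP_HOME/lib/
The export lines for HADOOP_HOME, SQOOP_HOME and PATH can also be added to ~/.bashrc so they survive a new shell.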
[root@master local]# cd /usr/local
[root@master local]# tar -xzf sqoop-1.4.1-incubating__hadoop-1.0.0.tar.gz
[root@master local]# export HADOOP_HOME=/usr/local/hadoop/
[root@master local]# export SQOOP_HOME=/usr/local/sqoop-1.4.1-incubating__hadoop-1.0.0
[root@master local]# export PATH=$PATH:$SQOOP_HOME/bin
[root@master local]# sqoop help
[root@master lib]# export MSSQL_CONNECTOR_HOME=/usr/local/sqoop-1.4.1-incubating__hadoop-1.0.0/connector/mssql
[root@master lib]# cd /usr/local/sqoop-1.4.1-incubating__hadoop-1.0.0/connector/mssql
[root@master mssql]# sh ./install.sh
Starting 'Microsoft SQL Server - Hadoop' Connector installation ...
SQOOP_CONF_DIR is not specified. Using $SQOOP_HOME/conf as SQOOP_CONF_DIR
SQOOP_HOME set to: /usr/local/sqoop-1.4.1-incubating__hadoop-1.0.0
SQOOP_CONF_DIR set to: /usr/local/sqoop-1.4.1-incubating__hadoop-1.0.0/conf
Installing 'SQL Server - Hadoop' Connector library ...
Installed library
Installing 'SQL Server - Hadoop' Connector configuration ...
Installed configuration
'SQL Server - Hadoop' Connector Installation completed successfully.
Sqoop and Microsoft SQL Server
From Microsoft's TechNet:
With the SQL Server-Hadoop Connector [1], you can import data from:
Tables in SQL Server to delimited text files on HDFS
Tables in SQL Server to SequenceFile files on HDFS
Tables in SQL Server to tables in Hive*
Queries executed in SQL Server to delimited text files on HDFS
Queries executed in SQL Server to SequenceFile files on HDFS
Queries executed in SQL Server to tables in Hive*
With the SQL Server-Hadoop Connector, you can export data from:
Delimited text files on HDFS to SQL Server
SequenceFile files on HDFS to SQL Server
Hive tables* to tables in SQL Server
But before it works you have to set up the connector. First, get the MS JDBC driver [2]:
Download the driver, unpack it, and copy the driver file (sqljdbc4.jar) to the $SQOOP_HOME/lib/ directory. Then download the connector (.tar.gz) from [1], unpack it, and point MSSQL_CONNECTOR_HOME at that directory. Let's assume you unpack it into /usr/sqoop/connector/mssql, then do:
# export MSSQL_CONNECTOR_HOME=/usr/sqoop/connector/mssql
Check that the variable is set:
# echo $MSSQL_CONNECTOR_HOME
/usr/sqoop/connector/mssql
and run the install.sh in the unpacked directory.
# sh ./install.sh
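Once install.sh has finished, an import from SQL Server looks much like the MySQL examples above. The following is only a sketch; the host, database, table, and credentials are placeholders, not values from this post:
$ sqoop import --connect 'jdbc:sqlserver://sqlserver-host:1433;databaseName=testdb' --username sqoop_user --password sqoop_pass --table SomeTable --target-dir /user/me/SomeTable -m 1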
Sqoop import with where clause
If you are following on from the previous sqoop import tutorial, http://jugnu-life.blogspot.in/2012/03/sqoop-import-tutorial.html, then let's try a conditional import from an RDBMS in Sqoop.
$ sqoop import --connect jdbc:mysql://localhost/CompanyDatabase --table Customer --username root --password mysecret -m 1
The sqoop command above would import all the rows present in the table Customer.
Let's say the Customer table looks something like this:
CustomerName    DateOfJoining
Adam            2012-12-12
John            2002-1-3
Emma            2011-1-3
Tina            2009-3-8
Now let's say we want to import only those customers who joined after 2005-1-1.
We can modify the sqoop import as follows:
$ sqoop import --connect jdbc:mysql://localhost/CompanyDatabase --table Customer --username root --password mysecret --where "DateOfJoining > '2005-1-1' "
This would import only 3 records from the above table.
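The same conditional import can also be written as a free-form query (only a sketch; note that --query requires the literal $CONDITIONS token, a --target-dir, and either -m 1 or a --split-by column):
$ sqoop import --connect jdbc:mysql://localhost/CompanyDatabase --username root --password mysecret --query 'SELECT * FROM Customer WHERE DateOfJoining > "2005-1-1" AND $CONDITIONS' -m 1 --target-dir /user/me/customer_after_2005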
We are facing a problem while exporting data from Cassandra to a MySQL database. For example:
1. I have created a keyspace called test_ks and a column family called test_cf in Cassandra using CQL.
I have inserted some dummy data into test_cf, and the corresponding files are created in the folder /var/lib/cassandra/data/test_ks/test_cf, as shown below.
[root@balnsand01 test_cf]# pwd
/var/lib/cassandra/data/test_ks/test_cf
[root@balnsand01 test_cf]# ls -lrt
total 24
-rw-r--r-- 1 root root 4340 Sep 19 03:39 test_ks-test_cf-hf-1-Statistics.db
-rw-r--r-- 1 root root 22 Sep 19 03:39 test_ks-test_cf-hf-1-Index.db
-rw-r--r-- 1 root root 16 Sep 19 03:39 test_ks-test_cf-hf-1-Filter.db
-rw-r--r-- 1 root root 89 Sep 19 03:39 test_ks-test_cf-hf-1-Data.db
-rw-r--r-- 1 root root 46 Sep 19 03:39 test_ks-test_cf-hf-1-CompressionInfo.db
2. When I try to export this data into MySQL using the command below, the data gets exported in a non-readable format. I mean I am not able to see the data which I inserted in Cassandra.
./dse sqoop export --connect jdbc:mysql://127.0.0.1:3306/testdb --username testuser --password mysql123 --export-dir /var/lib/cassandra/data/test_ks/test_cf --table --columns 'col1,col2' --input-fields-terminated-by '\t';
where --export-dir /var/lib/cassandra/data/test_ks/test_cf is the path where the data files get created.
Could you please guide me on where exactly I am going wrong here?
Thanks in advance, Santosh