Sqoop Import: Imports data from a relational database management system (RDBMS), such as MySQL or Oracle, into the Hadoop Distributed File System (HDFS) and its subprojects (Hive, HBase).
Import the data (MySQL table) to HBase:
Case 1: The table has a primary key, and all columns of the MySQL table are imported into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
Case 2: The table has a primary key, and only a few columns of the MySQL table are imported into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
Note: The column names specified in the --columns attribute must include the primary key column.
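For example, for a hypothetical table employees whose primary key is id (table and column names here are illustrative, not from the examples above), the --columns list must include id:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table employees --hbase-table hbase_employees --columns id,name --column-family emp_col1 --hbase-create-table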
Case 3: The table doesn't have a primary key, so choose one column as the --hbase-row-key. All columns of the MySQL table are imported into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table
Case 4: The table doesn't have a primary key, so choose one column as the --hbase-row-key. Only a few columns of the MySQL table are imported into the HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
Note: The column specified in the --hbase-row-key attribute must be present in the --columns list. Otherwise the command will execute successfully, but no records will be inserted into the HBase table.
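To illustrate that failure mode with the hypothetical employees table above: the following command would complete without an error yet insert nothing, because the row key column id is missing from --columns:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table employees --hbase-table hbase_employees --columns name --column-family emp_col1 --hbase-row-key id --hbase-create-table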
Note: The value of the primary key column, or of the column specified in the --hbase-row-key attribute, becomes the HBase row key. If the MySQL table doesn't have a primary key, or the column specified in --hbase-row-key doesn't have unique values, then a few records are lost.
Example: Consider a MySQL table test_table with two columns, name and address. The table test_table has no primary key or unique key column.
Records of test_table:
----------------
name    address
----------------
sa      123
sb      234
sa      345
sc      456
sd      567
----------------
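A minimal sketch of MySQL statements that could produce this table (the column types are assumptions):
$ mysql -u root -proot db1 -e "CREATE TABLE test_table (name VARCHAR(20), address VARCHAR(20)); INSERT INTO test_table VALUES ('sa','123'),('sb','234'),('sa','345'),('sc','456'),('sd','567');"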
Run the following command to import test_table data into HBase:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table test_table --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key name --hbase-create-table
Only 4 records are visible in the HBase table instead of 5. In the above example, two rows have the same value 'sa' in the name column, and the value of this column is used as the HBase row key. When the first record with name 'sa' arrives, it is inserted into the HBase table. When the second record with the same name 'sa' arrives, it overwrites the previously inserted values, because both records map to the same row key.
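A scan from the HBase shell would show something like the following (illustrative, abbreviated output; note the 'sa' row keeps only the later address, 345):
hbase(main):001:0> scan 'hbase_test_table'
ROW    COLUMN+CELL
 sa    column=test_table_col1:address, value=345
 sb    column=test_table_col1:address, value=234
 sc    column=test_table_col1:address, value=456
 sd    column=test_table_col1:address, value=567
4 row(s)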
The above problem also occurs if the table has a composite primary key, because only one column from the composite key is used as the HBase row key.
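For instance, consider a hypothetical variant of test_table with a composite key (which column Sqoop picks as the row key may depend on the version, so treat this as a sketch):
$ mysql -u root -proot db1 -e "CREATE TABLE test_table2 (name VARCHAR(20), address VARCHAR(20), PRIMARY KEY (name, address));"
If name ends up as the row key, the two 'sa' rows would again collapse into a single HBase row.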
Import the data (MySQL table) to Hive:
Case 1: Import a MySQL table into Hive when the table has a primary key.
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
Case 2: Import a MySQL table into Hive when the table doesn't have a primary key.
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home --split-by column_name
or
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
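Without a primary key, Sqoop cannot decide how to split the work among mappers, so either --split-by names a column to split on, or -m 1 runs the whole import in a single mapper. Once the import finishes, a quick check from the Hive CLI (assuming hive is on the PATH):
$ hive -e 'SELECT * FROM tableName LIMIT 5;'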
Import the data (MySQL table) to HDFS:
Case 1: Import a MySQL table into HDFS when the table has a primary key.
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName
Case 2: Import a MySQL table into HDFS when the table doesn't have a primary key.
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName -m 1
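The import writes comma-delimited text files named part-m-* under the target directory; they can be inspected with the HDFS shell (paths follow the commands above):
$ hadoop fs -ls /user/ankit/tableName
$ hadoop fs -cat /user/ankit/tableName/part-m-00000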
Sqoop Export Examples:
Sqoop Export: Exports data from HDFS and its subprojects (Hive, HBase) back into an RDBMS.
By default, Hive stores data using ^A (Ctrl-A) as the field delimiter and \n as the row delimiter.
$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'
where '\001' is the octal representation of ^A.
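If the export directory instead holds plain comma-delimited text, as produced by the HDFS import commands above, the delimiter flag changes accordingly (a sketch reusing the earlier paths):
$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName --export-dir /user/ankit/tableName --username root --password password -m 1 --input-fields-terminated-by ','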