Sqoop Import :- Import data from a
relational database management system (RDBMS) such as MySQL or Oracle
into the Hadoop Distributed File System (HDFS) and its subprojects (Hive, HBase).
Import the data (MySQL table) to HBase:
Case 1: If table have primary key and import all the column of MySQL table into HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
Case 2: If table have primary key and import only few columns of MySQL table into HBase table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
Note : Column names specified in --columns attribute must contain the primary key column.
Case 3: If table doesn't have primary key then choose one column
as a hbase-row-key. Import all the column of MySQL table into HBase
table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table
Case 4: If table doesn't have primary key then choose one column
as a hbase-row-key. Import only few columns of MySQL table into HBase
table.
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
Note: Column name specified in hbase-row-key atribute must be in columns
list. Otherwise command will execute successfully but no records are
inserted into hbase.
Note : The value of primary key column or column specified in
--hbase-row-key attribute become the HBase row value. If MySQL table
doesn't have primary key or column specified in --hbase-row-key
attribute doesn't have unique value then there is a lost of few records.
Example : Let us consider a MySQL table test_table which have two
columns name,address. The table test_table doesn't have primary key or
unique key column.
Records of test_table:
________________
name address
----------------
sa 123
sb 234
sa 345
sc 456
sd 567
Run the following command to import test_table data into HBase:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table test_table --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key name --hbase-create-table
Only 4 records are visible into HBase table instead of 5. In above
example two rows have same value 'sa' of name column and value of this
column is used as a HBase row key value. If record having value 'sa' of
name column come again then those records will be inserted into HBase table. Next
time, another record having the same value 'sa' of name column come
then those column will overwrite the value of the previous column.
Above problem also occurs if table have composite primary key because
the one column from composite key is used as a HBase row key.
Import the data (MySQL table) to Hive
Case 1: Import MySQL table into Hive if table have primary key.
bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
Case 2: Import MySQL table into Hive if table doesn't have primary key.
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home --split-by column_name
or
$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
Import the data (MySQL table) to HDFS
Case 1: Import MySQL table into HDFS if table have primary key.
$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName
Case 2: Import MySQL table into HDFS if table doesn't have primary key.
$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName -m 1
Sqoop Export Examples:
Sqoop Export: export the HDFS and its subproject (Hive, HBase) data back
into an RDBMS.
By default, Hive will stored data using ^A as a field delimiter and \n as a row delimiter.
$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'
where '\001' is octal representation of ^A.
Good Article about sqoop import. While searching about sqoop import and export this http://www.geoinsyssoft.com url also has a good practices. Thank You Very Much
ReplyDelete