public class DataDrivenDBInputFormat<T extends DBWritable> extends DBInputFormat<T> implements org.apache.hadoop.conf.Configurable
Modifier and Type | Class and Description
---|---
static class | DataDrivenDBInputFormat.DataDrivenDBInputSplit: An InputSplit that spans a set of rows.

Nested classes/interfaces inherited from class DBInputFormat: DBInputFormat.DBInputSplit, DBInputFormat.NullDBWritable
Modifier and Type | Field and Description
---|---
static java.lang.String | SUBSTITUTE_TOKEN: If users are providing their own query, this string is expected to appear in the query's WHERE clause; it will be substituted with a pair of conditions on the input to allow input splits to parallelise the import.
Constructor and Description |
---|
DataDrivenDBInputFormat() |
Modifier and Type | Method and Description
---|---
protected RecordReader<org.apache.hadoop.io.LongWritable,T> | createDBRecordReader(DBInputFormat.DBInputSplit split, org.apache.hadoop.conf.Configuration conf)
protected java.lang.String | getBoundingValsQuery()
java.util.List<InputSplit> | getSplits(JobContext job): Logically split the set of input files for the job.
protected DBSplitter | getSplitter(int sqlDataType)
static void | setBoundingQuery(org.apache.hadoop.conf.Configuration conf, java.lang.String query): Set the user-defined bounding query to use with a user-defined query.
static void | setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String inputQuery, java.lang.String inputBoundingQuery): setInput() takes a custom query and a separate "bounding query" to use instead of the custom "count query" used by DBInputFormat.
static void | setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String tableName, java.lang.String conditions, java.lang.String splitBy, java.lang.String... fieldNames): Note that the "orderBy" column is called the "splitBy" in this version.
Methods inherited from class DBInputFormat: closeConnection, createRecordReader, getConf, getConnection, getCountQuery, getDBConf, getDBProductName, setConf
public static final java.lang.String SUBSTITUTE_TOKEN
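As an illustrative sketch of how the token is used (the table and column names below are placeholders, not part of this API), a user-supplied input query embeds SUBSTITUTE_TOKEN in its WHERE clause and lets the input format replace it with per-split range conditions at run time. Referencing the constant avoids hard-coding the token literal:

```java
// Sketch only: "employees" and its columns are placeholder names.
// At run time, each split sees the token replaced by a pair of range
// conditions on the split-by column (e.g. "id >= 0 AND id < 10000").
String inputQuery =
    "SELECT id, name, salary FROM employees WHERE "
        + DataDrivenDBInputFormat.SUBSTITUTE_TOKEN;
```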
protected DBSplitter getSplitter(int sqlDataType)
public java.util.List<InputSplit> getSplits(JobContext job) throws java.io.IOException
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.

Note: the split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be a <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.

Overrides:
getSplits in class DBInputFormat<T extends DBWritable>
Parameters:
job - job configuration.
Returns:
InputSplits for the job.
Throws:
java.io.IOException
protected java.lang.String getBoundingValsQuery()
public static void setBoundingQuery(org.apache.hadoop.conf.Configuration conf, java.lang.String query)
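A minimal sketch of configuring a bounding query, assuming the conventional form that returns the minimum and maximum values of the split-by column in a single row; the employees table and id column are placeholders introduced for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;

public class BoundingQueryExample {
  // Sketch only: "employees" and "id" are placeholder names.
  public static void configureBoundingQuery(Configuration conf) {
    // The query should yield the split column's minimum and maximum values.
    DataDrivenDBInputFormat.setBoundingQuery(conf,
        "SELECT MIN(id), MAX(id) FROM employees");
  }
}
```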
protected RecordReader<org.apache.hadoop.io.LongWritable,T> createDBRecordReader(DBInputFormat.DBInputSplit split, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Overrides:
createDBRecordReader in class DBInputFormat<T extends DBWritable>
Throws:
java.io.IOException
public static void setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String tableName, java.lang.String conditions, java.lang.String splitBy, java.lang.String... fieldNames)
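A hedged driver fragment for this table-based overload; the JDBC driver class, connection string, EmployeeWritable implementation of DBWritable, and the table and column names are all placeholders introduced for illustration:

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;

public class TableInputExample {
  // Sketch only: EmployeeWritable is a hypothetical DBWritable implementation;
  // the connection, table, and column names are placeholders.
  public static void configureTableInput(Job job) {
    DBConfiguration.configureDB(job.getConfiguration(),
        "com.mysql.jdbc.Driver",                       // JDBC driver class
        "jdbc:mysql://dbhost/mydb", "user", "password");

    job.setInputFormatClass(DataDrivenDBInputFormat.class);

    // "id" plays the role that DBInputFormat calls "orderBy": it is the
    // splitBy column whose value range is partitioned across the splits.
    DataDrivenDBInputFormat.setInput(job, EmployeeWritable.class,
        "employees",             // tableName
        "salary > 0",            // conditions appended to the WHERE clause
        "id",                    // splitBy column
        "id", "name", "salary"); // fieldNames to select
  }
}
```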
public static void setInput(Job job, java.lang.Class<? extends DBWritable> inputClass, java.lang.String inputQuery, java.lang.String inputBoundingQuery)
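And a corresponding sketch for this free-form-query overload, again with placeholder names: the input query carries SUBSTITUTE_TOKEN and the bounding query supplies the split column's minimum and maximum.

```java
// Sketch only: the table, columns, and EmployeeWritable are placeholders.
public static void configureQueryInput(Job job) {
  String inputQuery =
      "SELECT id, name, salary FROM employees WHERE "
          + DataDrivenDBInputFormat.SUBSTITUTE_TOKEN;
  String boundingQuery = "SELECT MIN(id), MAX(id) FROM employees";

  job.setInputFormatClass(DataDrivenDBInputFormat.class);
  DataDrivenDBInputFormat.setInput(job, EmployeeWritable.class,
      inputQuery, boundingQuery);
}
```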