public abstract class CombineFileInputFormat<K,V> extends FileInputFormat<K,V>
An abstract InputFormat that returns CombineFileSplits from its InputFormat.getSplits(JobConf, int) method.
Splits are constructed from the files under the input paths.
A split cannot have files from different pools.
Each split returned may contain blocks from different files.
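Here is a minimal sketch of the pool rule, written in plain Java rather than Hadoop code. Files are partitioned into pools first, and splits would then be built inside each group, so no split can mix pools. `PoolPartitioner`, the string paths, and the `Predicate<String>` stand-ins for `PathFilter` are hypothetical. The sketch also assumes two rules that this page does not state: a file joins the first pool that accepts it, and files that no pool matches are collected in a separate default group. Check the Hadoop source for the exact rules in your version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical, simplified model of pool partitioning; not Hadoop's implementation. */
class PoolPartitioner {
    /** Group name for files that no pool accepts (an assumption of this sketch). */
    static final String DEFAULT_POOL = "<default>";

    /**
     * Assigns each path to the first pool whose predicate accepts it.
     * Splits would then be built separately inside each group, so a split
     * never mixes files from different pools.
     */
    static Map<String, List<String>> partition(List<String> paths,
                                               Map<String, Predicate<String>> pools) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String path : paths) {
            String target = DEFAULT_POOL;
            for (Map.Entry<String, Predicate<String>> pool : pools.entrySet()) {
                if (pool.getValue().test(path)) {
                    target = pool.getKey();
                    break; // first matching pool wins in this model
                }
            }
            groups.computeIfAbsent(target, k -> new ArrayList<>()).add(path);
        }
        return groups;
    }
}
```

Take two pools: `logs` accepts `*.log` and `csv` accepts `*.csv`. The paths `/in/a.log`, `/in/b.csv`, `/in/c.log` and `/in/d.txt` then fall into three groups. `logs` holds the two `.log` files, `csv` holds `/in/b.csv`, and the default group holds `/in/d.txt`.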
If a maxSplitSize is specified, then blocks on the same node are
combined to form a single split. Blocks that are left over are
then combined with other blocks in the same rack.
If maxSplitSize is not specified, then blocks from the same rack
are combined in a single split; no attempt is made to create
node-local splits.
If maxSplitSize is equal to the block size, this class behaves like
the default splitting in Hadoop: each block becomes a split that is
processed locally.
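The node-then-rack grouping described above can be modeled in a few lines of Java. This is a simplified sketch, not Hadoop's implementation. `BlockGrouper` and `Block` are hypothetical. Each block is given a single location, although real HDFS blocks have several replicas. A `maxSplitSize` of 0 stands for "not specified". Whatever is left over in a rack after the rack pass simply becomes one last split; the real class may treat these remainders differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, single-replica model of node-then-rack block grouping. */
class BlockGrouper {
    /** A block with one assumed location (real HDFS blocks have several replicas). */
    static final class Block {
        final String id;
        final long size;
        final String node;
        final String rack;

        Block(String id, long size, String node, String rack) {
            this.id = id; this.size = size; this.node = node; this.rack = rack;
        }
    }

    /** Groups blocks into splits (lists of block ids); maxSplitSize <= 0 means "not specified". */
    static List<List<String>> group(List<Block> blocks, long maxSplitSize) {
        List<List<String>> splits = new ArrayList<>();
        Set<Block> unused = new LinkedHashSet<>(blocks);

        // Pass 1: node-local splits, attempted only when a maximum size is given.
        if (maxSplitSize > 0) {
            TreeMap<String, List<Block>> byNode = new TreeMap<>();
            for (Block b : blocks) byNode.computeIfAbsent(b.node, k -> new ArrayList<>()).add(b);
            for (List<Block> nodeBlocks : byNode.values()) {
                List<Block> current = new ArrayList<>();
                long size = 0;
                for (Block b : nodeBlocks) {
                    current.add(b);
                    size += b.size;
                    if (size >= maxSplitSize) { // close the split once the maximum is reached
                        splits.add(ids(current));
                        unused.removeAll(current);
                        current.clear();
                        size = 0;
                    }
                }
                // Leftover blocks on this node stay unused and fall through to the rack pass.
            }
        }

        // Pass 2: combine the remaining blocks rack by rack.
        TreeMap<String, List<Block>> byRack = new TreeMap<>();
        for (Block b : unused) byRack.computeIfAbsent(b.rack, k -> new ArrayList<>()).add(b);
        for (List<Block> rackBlocks : byRack.values()) {
            List<Block> current = new ArrayList<>();
            long size = 0;
            for (Block b : rackBlocks) {
                current.add(b);
                size += b.size;
                if (maxSplitSize > 0 && size >= maxSplitSize) {
                    splits.add(ids(current));
                    current.clear();
                    size = 0;
                }
            }
            // Simplification: whatever remains in this rack becomes one last split.
            if (!current.isEmpty()) splits.add(ids(current));
        }
        return splits;
    }

    private static List<String> ids(List<Block> blocks) {
        List<String> out = new ArrayList<>();
        for (Block b : blocks) out.add(b.id);
        return out;
    }
}
```

Consider five 64-byte blocks: `b1` to `b3` on node `n1` and `b4` on `n2` (both nodes in rack `r1`), plus `b5` on `n3` in rack `r2`. A `maxSplitSize` of 128 gives `[[b1, b2], [b3, b4], [b5]]`. A `maxSplitSize` of 64, equal to the block size, gives one split per block. A value of 0 gives only rack-level splits, `[[b1, b2, b3, b4], [b5]]`.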
Subclasses implement InputFormat.getRecordReader(InputSplit, JobConf, Reporter)
to construct RecordReaders for CombineFileSplits.
See Also: CombineFileSplit
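A common way to build a reader for a combined split is to read its chunks one after another, opening a fresh per-chunk reader whenever the current one runs out of records. The sketch below models that pattern with `java.util.Iterator` standing in for `RecordReader`. `CombinedReader`, its chunk type, and the `openChunkReader` factory are hypothetical names, not Hadoop types.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical model of a reader for a combined split: it opens one per-chunk
 * reader at a time and moves on when the current chunk is exhausted.
 */
class CombinedReader<C, R> implements Iterator<R> {
    private final Iterator<C> chunks;                       // chunks of the combined split, in order
    private final Function<C, Iterator<R>> openChunkReader; // creates a reader for one chunk
    private Iterator<R> current = null;

    CombinedReader(List<C> chunks, Function<C, Iterator<R>> openChunkReader) {
        this.chunks = chunks.iterator();
        this.openChunkReader = openChunkReader;
    }

    @Override
    public boolean hasNext() {
        // Skip exhausted or empty chunks until a record is available.
        while (current == null || !current.hasNext()) {
            if (!chunks.hasNext()) return false;
            current = openChunkReader.apply(chunks.next());
        }
        return true;
    }

    @Override
    public R next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}
```

Suppose the chunks are `a.txt` with records `a1` and `a2`, an empty `b.txt`, and `c.txt` with record `c1`. Iterating the reader yields `a1`, `a2`, `c1`, and the empty chunk is skipped.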
Fields inherited from class FileInputFormat: LOG
Constructor and Description |
---|
CombineFileInputFormat(): default constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | createPool(JobConf conf, java.util.List<org.apache.hadoop.fs.PathFilter> filters): Create a new pool and add the filters to it. |
protected void | createPool(JobConf conf, org.apache.hadoop.fs.PathFilter... filters): Create a new pool and add the filters to it. |
abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter): This is not implemented yet. |
InputSplit[] | getSplits(JobConf job, int numSplits): Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big. |
protected boolean | isSplitable(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file): Is the given filename splittable? Usually true, but if the file is stream compressed, it will not be. |
protected void | setMaxSplitSize(long maxSplitSize): Specify the maximum size (in bytes) of each split. |
protected void | setMinSplitSizeNode(long minSplitSizeNode): Specify the minimum size (in bytes) of each split per node. |
protected void | setMinSplitSizeRack(long minSplitSizeRack): Specify the minimum size (in bytes) of each split per rack. |
Methods inherited from class FileInputFormat: addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
protected void setMaxSplitSize(long maxSplitSize)
protected void setMinSplitSizeNode(long minSplitSizeNode)
protected void setMinSplitSizeRack(long minSplitSizeRack)
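The sketch below is one guess at how `maxSplitSize` and `minSplitSizeNode` might interact on a single node. It closes a node-local split as soon as the accumulated size reaches `maxSplitSize`. It keeps the remainder as a node-local split only if that remainder reaches `minSplitSizeNode`; otherwise it hands the remainder on to the rack pass. `NodePass` is hypothetical, and 0 is assumed to mean "unset". The rack pass is assumed to apply `minSplitSizeRack` in the same way. Because these setters are protected, a subclass would typically call them itself, for example from its constructor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of one node's pass; not Hadoop's implementation. */
class NodePass {
    /** Split sizes emitted on this node, plus block sizes handed on to the rack pass. */
    static final class Result {
        final List<Long> splitSizes;
        final List<Long> passedToRack;

        Result(List<Long> splitSizes, List<Long> passedToRack) {
            this.splitSizes = splitSizes;
            this.passedToRack = passedToRack;
        }
    }

    /** maxSplitSize and minSplitSizeNode use 0 to mean "unset" in this sketch. */
    static Result pack(List<Long> blockSizes, long maxSplitSize, long minSplitSizeNode) {
        List<Long> splits = new ArrayList<>();
        List<Long> pending = new ArrayList<>();
        long size = 0;
        for (long block : blockSizes) {
            pending.add(block);
            size += block;
            // Close a node-local split as soon as the maximum is reached.
            if (maxSplitSize > 0 && size >= maxSplitSize) {
                splits.add(size);
                pending.clear();
                size = 0;
            }
        }
        // Keep the remainder node-local only if it meets the per-node minimum.
        if (!pending.isEmpty() && minSplitSizeNode > 0 && size >= minSplitSizeNode) {
            splits.add(size);
            pending.clear();
        }
        // Anything still pending is passed on to the rack pass.
        return new Result(splits, pending);
    }
}
```

Take three 64-byte blocks and a `maxSplitSize` of 128. The node emits one 128-byte split. The remaining 64-byte block stays node-local when `minSplitSizeNode` is 32, but it is passed to the rack pass when `minSplitSizeNode` is 0 or 100.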
protected void createPool(JobConf conf, java.util.List<org.apache.hadoop.fs.PathFilter> filters)
protected void createPool(JobConf conf, org.apache.hadoop.fs.PathFilter... filters)
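The two `createPool` overloads accept the same filters in different forms. In the hypothetical model below, which is plain Java with `Predicate<String>` standing in for `PathFilter`, the varargs form simply delegates to the `List` form, as `Pool.of` does. The sketch also assumes that a pool accepts a path when any of its filters does. That is an assumption about the semantics, not something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a pool built from several path filters. */
class Pool {
    private final List<Predicate<String>> filters;

    /** Mirrors the List-based createPool overload. */
    Pool(List<Predicate<String>> filters) {
        this.filters = new ArrayList<>(filters); // defensive copy
    }

    /** Mirrors the varargs createPool overload by delegating to the List form. */
    @SafeVarargs
    static Pool of(Predicate<String>... filters) {
        return new Pool(Arrays.asList(filters));
    }

    /** Assumed semantics: a path belongs to the pool if any filter accepts it. */
    boolean accept(String path) {
        for (Predicate<String> f : filters) {
            if (f.test(path)) return true;
        }
        return false;
    }
}
```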
protected boolean isSplitable(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file)
Description copied from class: FileInputFormat
Is the given filename splittable? Usually true, but if the file is stream compressed, it will not be.
FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
Overrides: isSplitable in class FileInputFormat<K,V>
Parameters:
fs - the file system that the file is on
file - the file name to check
public InputSplit[] getSplits(JobConf job, int numSplits) throws java.io.IOException
Description copied from class: FileInputFormat
Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big.
Specified by: getSplits in interface InputFormat<K,V>
Overrides: getSplits in class FileInputFormat<K,V>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns: the InputSplits for the job.
Throws: java.io.IOException
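The description above is copied from `FileInputFormat`. This class's own override groups blocks instead, as described in the class overview. The sketch below models the base behavior the copied text refers to: `FileInputFormat`-style splitting of a single file, including the `isSplitable` check. A non-splittable file becomes one split. A splittable file is cut into pieces of `splitSize`, with a 10% slop so that the last split may be up to 1.1 times `splitSize` rather than leaving a tiny tail. The split size is `max(minSize, min(goalSize, blockSize))`, where `goalSize` is the total input size divided by the `numSplits` hint. This formula and the 1.1 slop factor reflect my understanding of the base class; check both against the Hadoop source for your version. `FileSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of FileInputFormat-style splitting of one file. */
class FileSplitter {
    static final double SPLIT_SLOP = 1.1; // lets the last split be up to 10% larger

    /** splitSize = max(minSize, min(goalSize, blockSize)). */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Returns the lengths of the splits for one file of the given length. */
    static List<Long> splitLengths(long fileLength, boolean splitable, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        if (fileLength == 0) return lengths; // zero-length files are out of scope here
        if (!splitable) {
            lengths.add(fileLength);         // e.g. stream compressed: one split for the whole file
            return lengths;
        }
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) lengths.add(remaining);
        return lengths;
    }
}
```

Take a 300-byte input with a hint of 2 splits, so `goalSize` is 150. With a 128-byte block size the split size is 128. A splittable 300-byte file then yields splits of 128, 128 and 44 bytes. A 140-byte file stays whole, because 140/128 is within the slop. A non-splittable file always yields a single split.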
public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws java.io.IOException
Specified by: getRecordReader in interface InputFormat<K,V>
Specified by: getRecordReader in class FileInputFormat<K,V>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns: a RecordReader
Throws: java.io.IOException
Copyright © 2009 The Apache Software Foundation