public abstract class FileInputFormat<K,V> extends InputFormat<K,V>

A base class for file-based InputFormats.

FileInputFormat is the base class for all file-based InputFormats. It provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure that input files are not split up and are processed as a whole by Mappers.
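As a sketch of how this is typically used, the hypothetical subclass below overrides isSplitable(JobContext, Path) to return false so that each Mapper processes a whole file. The class name WholeFileTextInputFormat and the choice of LineRecordReader for record reading are illustrative assumptions, not part of this API.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Hypothetical subclass: treats every input file as a single, unsplittable
// unit, so each Mapper sees an entire file.
public class WholeFileTextInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        // Never split input files, regardless of size or compression.
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // Delegate record reading to the standard line-oriented reader.
        return new LineRecordReader();
    }
}
```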
| Constructor and Description |
| --- |
| FileInputFormat() |
| Modifier and Type | Method and Description |
| --- | --- |
| static void | addInputPath(Job job, org.apache.hadoop.fs.Path path): Add a Path to the list of inputs for the map-reduce job. |
| static void | addInputPaths(Job job, java.lang.String commaSeparatedPaths): Add the given comma separated paths to the list of inputs for the map-reduce job. |
| protected long | computeSplitSize(long blockSize, long minSize, long maxSize) |
| protected int | getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset) |
| protected long | getFormatMinSplitSize(): Get the lower bound on split size imposed by the format. |
| static org.apache.hadoop.fs.PathFilter | getInputPathFilter(JobContext context): Get a PathFilter instance of the filter set for the input paths. |
| static org.apache.hadoop.fs.Path[] | getInputPaths(JobContext context): Get the list of input Paths for the map-reduce job. |
| static long | getMaxSplitSize(JobContext context): Get the maximum split size. |
| static long | getMinSplitSize(JobContext job): Get the minimum split size. |
| java.util.List<InputSplit> | getSplits(JobContext job): Generate the list of files and make them into FileSplits. |
| protected boolean | isSplitable(JobContext context, org.apache.hadoop.fs.Path filename): Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be. |
| protected java.util.List<org.apache.hadoop.fs.FileStatus> | listStatus(JobContext job): List input directories. |
| static void | setInputPathFilter(Job job, java.lang.Class<? extends org.apache.hadoop.fs.PathFilter> filter): Set a PathFilter to be applied to the input paths for the map-reduce job. |
| static void | setInputPaths(Job job, org.apache.hadoop.fs.Path... inputPaths): Set the array of Paths as the list of inputs for the map-reduce job. |
| static void | setInputPaths(Job job, java.lang.String commaSeparatedPaths): Sets the given comma separated paths as the list of inputs for the map-reduce job. |
| static void | setMaxInputSplitSize(Job job, long size): Set the maximum split size. |
| static void | setMinInputSplitSize(Job job, long size): Set the minimum input split size. |
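To show where these static helpers fit, here is a minimal driver sketch. The job name, the input path taken from args, and the use of TextInputFormat (a concrete FileInputFormat subclass) are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class InputSetupExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "input setup example");
        // TextInputFormat is one concrete FileInputFormat subclass.
        job.setInputFormatClass(TextInputFormat.class);
        // Register the first command-line argument as an input path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... set mapper, reducer, output format and output path here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```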
Methods inherited from class InputFormat: createRecordReader
getFormatMinSplitSize

protected long getFormatMinSplitSize()

Get the lower bound on split size imposed by the format.

isSplitable

protected boolean isSplitable(JobContext context, org.apache.hadoop.fs.Path filename)

Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.

Parameters:
context - the job context
filename - the file name to check

setInputPathFilter

public static void setInputPathFilter(Job job, java.lang.Class<? extends org.apache.hadoop.fs.PathFilter> filter)

Set a PathFilter to be applied to the input paths for the map-reduce job.

Parameters:
job - the job to modify
filter - the PathFilter class to use for filtering the input paths

setMinInputSplitSize

public static void setMinInputSplitSize(Job job, long size)

Set the minimum input split size.

Parameters:
job - the job to modify
size - the minimum size

getMinSplitSize

public static long getMinSplitSize(JobContext job)

Get the minimum split size.

Parameters:
job - the job

setMaxInputSplitSize

public static void setMaxInputSplitSize(Job job, long size)

Set the maximum split size.

Parameters:
job - the job to modify
size - the maximum split size

getMaxSplitSize

public static long getMaxSplitSize(JobContext context)

Get the maximum split size.

Parameters:
context - the job to look at

getInputPathFilter

public static org.apache.hadoop.fs.PathFilter getInputPathFilter(JobContext context)

Get a PathFilter instance of the filter set for the input paths.
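A short sketch combining the filter and split-size setters above. The VisibleFilesFilter class and the 256 MB cap are illustrative assumptions.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputTuning {

    // Hypothetical filter: skip files whose names start with "_" or "."
    // (e.g. _SUCCESS markers and hidden files).
    public static class VisibleFilesFilter implements PathFilter {
        public boolean accept(Path path) {
            String name = path.getName();
            return !name.startsWith("_") && !name.startsWith(".");
        }
    }

    public static void configure(Job job) {
        FileInputFormat.setInputPathFilter(job, VisibleFilesFilter.class);
        FileInputFormat.setMinInputSplitSize(job, 1L);                   // at least 1 byte per split
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);  // cap splits at 256 MB (assumed value)
    }
}
```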
listStatus

protected java.util.List<org.apache.hadoop.fs.FileStatus> listStatus(JobContext job) throws java.io.IOException

List input directories.

Parameters:
job - the job to list input paths for
Throws:
java.io.IOException - if zero items

getSplits

public java.util.List<InputSplit> getSplits(JobContext job) throws java.io.IOException

Generate the list of files and make them into FileSplits.

Specified by:
getSplits in class InputFormat<K,V>

Parameters:
job - job configuration
Returns:
the list of InputSplits for the job
Throws:
java.io.IOException
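getSplits is normally invoked by the framework, but it can also be called directly to see what splits a given configuration would yield. The debugging sketch below assumes a Job whose input paths are already set, and uses TextInputFormat for illustration.

```java
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitInspector {
    // Debugging sketch: print the splits a configured job would produce.
    public static void printSplits(Job job) throws Exception {
        TextInputFormat format = new TextInputFormat();
        List<InputSplit> splits = format.getSplits(job); // Job is usable as a JobContext here
        for (InputSplit split : splits) {
            System.out.println(split); // a FileSplit prints its path, start offset and length
        }
    }
}
```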
computeSplitSize

protected long computeSplitSize(long blockSize, long minSize, long maxSize)

getBlockIndex

protected int getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset)
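Neither method is documented above. As a point of reference, the sketch below shows the behavior these helpers typically have in Hadoop implementations of this era; it is an assumption for illustration, not text from this page.

```java
import org.apache.hadoop.fs.BlockLocation;

// Standalone sketch of the assumed behavior of the two undocumented helpers above.
public class SplitMath {

    // computeSplitSize: clamp the block size between the configured
    // minimum and maximum split sizes (assumed, typical Hadoop behavior).
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // getBlockIndex: find the block whose byte range contains the offset (assumed behavior).
    public static int getBlockIndex(BlockLocation[] blkLocations, long offset) {
        for (int i = 0; i < blkLocations.length; i++) {
            long start = blkLocations[i].getOffset();
            if (offset >= start && offset < start + blkLocations[i].getLength()) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }
}
```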
setInputPaths

public static void setInputPaths(Job job, java.lang.String commaSeparatedPaths) throws java.io.IOException

Sets the given comma separated paths as the list of inputs for the map-reduce job.

Parameters:
job - the job
commaSeparatedPaths - comma separated paths to be set as the list of inputs for the map-reduce job
Throws:
java.io.IOException
addInputPaths

public static void addInputPaths(Job job, java.lang.String commaSeparatedPaths) throws java.io.IOException

Add the given comma separated paths to the list of inputs for the map-reduce job.

Parameters:
job - the job to modify
commaSeparatedPaths - comma separated paths to be added to the list of inputs for the map-reduce job
Throws:
java.io.IOException
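A usage sketch for the two string-based methods; the directory names are placeholders.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class StringPathSetup {
    public static void configure(Job job) throws java.io.IOException {
        // Replace any previously configured inputs with these two directories.
        FileInputFormat.setInputPaths(job, "/data/2009/01,/data/2009/02");
        // Append two more directories to the input list.
        FileInputFormat.addInputPaths(job, "/data/2009/03,/data/2009/04");
    }
}
```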
setInputPaths

public static void setInputPaths(Job job, org.apache.hadoop.fs.Path... inputPaths) throws java.io.IOException

Set the array of Paths as the list of inputs for the map-reduce job.

Parameters:
job - the job to modify
inputPaths - the Paths of the input directories/files for the map-reduce job
Throws:
java.io.IOException
addInputPath

public static void addInputPath(Job job, org.apache.hadoop.fs.Path path) throws java.io.IOException

Add a Path to the list of inputs for the map-reduce job.

Parameters:
job - the Job to modify
path - Path to be added to the list of inputs for the map-reduce job
Throws:
java.io.IOException
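And the Path-based equivalents, with getInputPaths used to read back the configured list; the paths are placeholders.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PathSetup {
    public static void configure(Job job) throws java.io.IOException {
        // Replace the input list with a single directory, then add one more.
        FileInputFormat.setInputPaths(job, new Path("/data/base"));
        FileInputFormat.addInputPath(job, new Path("/data/extra"));

        // Read back what is configured (a Job is usable as a JobContext here).
        for (Path p : FileInputFormat.getInputPaths(job)) {
            System.out.println(p);
        }
    }
}
```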
getInputPaths

public static org.apache.hadoop.fs.Path[] getInputPaths(JobContext context)

Get the list of input Paths for the map-reduce job.

Parameters:
context - the job
Returns:
the list of input Paths for the map-reduce job

Copyright © 2009 The Apache Software Foundation