public class TeraInputFormat extends FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Fields inherited from class FileInputFormat: LOG
Constructor and Description |
---|
TeraInputFormat() |
Modifier and Type | Method and Description |
---|---|
RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) Get the RecordReader for the given InputSplit. |
InputSplit[] | getSplits(JobConf conf, int splits) Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big. |
static void | writePartitionFile(JobConf conf, org.apache.hadoop.fs.Path partFile) Use the input splits to take samples of the input and generate sample keys. |
Methods inherited from class FileInputFormat: addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
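The summary of writePartitionFile says it samples the input and generates sample keys. As a rough illustration of how sampled keys could become partition boundaries, the sketch below sorts a list of samples and picks evenly spaced split points. The class name, the method name, and the selection rule are assumptions made for illustration; none of them is taken from the Hadoop source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not part of Hadoop or TeraInputFormat.
class PartitionSampleSketch {

    /**
     * Sorts the sampled keys and picks (partitions - 1) evenly spaced
     * split points, so that keys can be routed into `partitions` ranges.
     */
    static List<String> pickSplitPoints(List<String> samples, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be at least 1");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);

        List<String> splitPoints = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splitPoints; // nothing sampled, so no boundaries
        }
        for (int i = 1; i < partitions; i++) {
            // Position of the i-th boundary within the sorted samples.
            int index = (int) ((long) i * sorted.size() / partitions);
            splitPoints.add(sorted.get(Math.min(index, sorted.size() - 1)));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList(
                "pear", "apple", "kiwi", "fig", "mango", "plum", "date", "lime");
        // Four partitions need three split points.
        System.out.println(pickSplitPoints(samples, 4));
    }
}
```

With N partitions the sketch returns N - 1 boundaries; a partitioner would then send each key to the range it falls into.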
public static void writePartitionFile(JobConf conf, org.apache.hadoop.fs.Path partFile) throws java.io.IOException
Parameters:
conf - the job to sample
partFile - where to write the output file to
Throws:
java.io.IOException - if something goes wrong

public RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws java.io.IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
Specified by:
getRecordReader in interface InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Specified by:
getRecordReader in class FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
a RecordReader
Throws:
java.io.IOException
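
The requirement that a RecordReader respect record boundaries can be illustrated with fixed-length records: a split may begin in the middle of a record, so a reader skips forward to the next record start and treats every record that starts inside its split as its own. The record length, class name, and method names below are hypothetical; this page does not state the record layout TeraInputFormat uses, and the sketch does not use the Hadoop RecordReader API.

```java
// Hypothetical illustration; not the Hadoop RecordReader API.
class RecordBoundarySketch {

    /** Returns the offset of the first record that starts at or after splitStart. */
    static long firstRecordStart(long splitStart, int recordLength) {
        long remainder = splitStart % recordLength;
        return remainder == 0 ? splitStart : splitStart + (recordLength - remainder);
    }

    /**
     * Counts the records whose first byte lies in
     * [splitStart, splitStart + splitLength). A record that starts inside
     * the split but ends past it still belongs to this split.
     * Assumes fileLength is a multiple of recordLength.
     */
    static long recordsOwnedBySplit(long splitStart, long splitLength,
                                    int recordLength, long fileLength) {
        long end = Math.min(splitStart + splitLength, fileLength);
        long first = firstRecordStart(splitStart, recordLength);
        if (first >= end) {
            return 0; // no record starts inside this split
        }
        // Record starts are first, first + recordLength, ... strictly below end.
        return (end - first + recordLength - 1) / recordLength;
    }

    public static void main(String[] args) {
        // A 1000-byte file of 100-byte records, cut into 250-byte splits.
        long total = 0;
        for (long start = 0; start < 1000; start += 250) {
            total += recordsOwnedBySplit(start, 250, 100, 1000);
        }
        System.out.println(total); // every record is counted exactly once
    }
}
```

Because ownership is decided by where a record starts, adjacent splits never read the same record twice and never drop one.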
public InputSplit[] getSplits(JobConf conf, int splits) throws java.io.IOException
Description copied from class: FileInputFormat
Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big.
Specified by:
getSplits in interface InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Overrides:
getSplits in class FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Parameters:
conf - job configuration.
splits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
java.io.IOException
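
To show why the splits argument of getSplits is only a hint, the sketch below derives a target split size from the requested count and then cuts a file into chunks of at most that size. The class name, the method names, and the sizing rule are assumptions for illustration only; the actual sizing logic in FileInputFormat is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the FileInputFormat implementation.
class SplitSketch {

    /** Target bytes per split when requestedSplits splits are asked for. */
    static long goalSize(long totalBytes, int requestedSplits) {
        return Math.max(1, totalBytes / Math.max(1, requestedSplits));
    }

    /**
     * Returns {offset, length} pairs that cover [0, fileLength) in chunks
     * of at most splitSize bytes.
     */
    static List<long[]> splitFile(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Requesting 3 splits of a 1000-byte file yields a 333-byte goal,
        // so the remainder spills into an extra, shorter split.
        List<long[]> splits = splitFile(1000, goalSize(1000, 3));
        for (long[] s : splits) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

In this sketch the caller asks for three splits but receives four, which is one way a requested count can end up being treated as a hint and not a guarantee.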
Copyright © 2009 The Apache Software Foundation