public class JobClient
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
JobClient is the primary interface for the user-job to interact with the JobTracker. JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports and logs, get the Map-Reduce cluster status information, etc.
The job submission process involves:

1. Checking the input and output specifications of the job.
2. Computing the InputSplits for the job.
3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
4. Copying the job's jar and configuration to the Map-Reduce system directory on the distributed file-system.
5. Submitting the job to the JobTracker and optionally monitoring its status.
Normally the user creates the application, describes various facets of the job via JobConf, and then uses the JobClient to submit the job and monitor its progress.

Here is an example of how to use JobClient:
```java
// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);

// Specify various job-specific parameters
job.setJobName("myjob");

FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));

job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);

// Submit the job, then poll for progress until the job is complete
JobClient.runJob(job);
```
At times clients chain map-reduce jobs to accomplish complex tasks which cannot be done via a single map-reduce job. This is fairly easy since the output of a job typically goes to the distributed file-system, and that output can be used as the input for the next job.

However, this also means that the onus of ensuring jobs are complete (success/failure) lies squarely on the clients. In such situations the various job-control options are (a submit-and-poll sketch follows the list):
- runJob(JobConf): submits the job and returns only after the job has completed.
- submitJob(JobConf): only submits the job; the client then polls the returned handle to the RunningJob to query status and make scheduling decisions.
- JobConf.setJobEndNotificationURI(String): sets up a notification on job-completion, thus avoiding polling.
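As a minimal sketch of the submit-and-poll option (the job name and driver class are hypothetical placeholders; a real JobConf would also configure input, output, mapper and reducer):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class JobControlSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(new Configuration(), JobControlSketch.class);
    conf.setJobName("myjob"); // remaining job setup elided

    JobClient client = new JobClient(conf);
    RunningJob running = client.submitJob(conf); // returns immediately

    // Poll the RunningJob handle until the job finishes.
    while (!running.isComplete()) {
      System.out.printf("map %.0f%% reduce %.0f%%%n",
          running.mapProgress() * 100, running.reduceProgress() * 100);
      Thread.sleep(5000);
    }
    System.out.println(running.isSuccessful() ? "job succeeded" : "job failed");
  }
}
```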
See Also: JobConf, ClusterStatus, Tool, DistributedCache
| Modifier and Type | Class and Description |
|---|---|
| static class | JobClient.Renewer |
| static class | JobClient.TaskStatusFilter |
| Modifier and Type | Field and Description |
|---|---|
| static long | COUNTER_UPDATE_INTERVAL |
| static long | DEFAULT_DISK_HEALTH_CHECK_INTERVAL: How often TaskTracker needs to check the health of its disks, if not configured using mapred.disk.healthChecker.interval |
| static int | FILE_NOT_FOUND |
| static java.lang.String | FOR_REDUCE_TASK: The reduce task number for which this map output is being transferred |
| static java.lang.String | FROM_MAP_TASK: The map task from which the map output data is being transferred |
| static int | HEARTBEAT_INTERVAL_MIN_DEFAULT |
| static java.lang.String | MAP_OUTPUT_LENGTH: The custom HTTP header used for the map output length |
| static int | MAPREDUCE_CLIENT_RPC_TIMEOUT_DEFAULT: Default RPC timeout for the JobClient is -1, i.e. no timeout |
| static java.lang.String | MAPREDUCE_CLIENT_RPC_TIMEOUT_KEY: Key in mapred-*.xml that determines the RPC timeout to use for the JobClient, in milliseconds |
| static java.lang.String | RAW_MAP_OUTPUT_LENGTH: The custom HTTP header used for the "raw" map output length |
| static int | SUCCESS |
| static java.lang.String | WORKDIR |
| Constructor and Description |
|---|
| JobClient(): Create a job client. |
| JobClient(org.apache.hadoop.conf.Configuration conf): Build a job client with the given Configuration, and connect to the default cluster. |
| JobClient(java.net.InetSocketAddress jobTrackAddr, org.apache.hadoop.conf.Configuration conf): Build a job client, connect to the indicated job tracker. |
| JobClient(JobConf conf): Build a job client with the given JobConf, and connect to the default JobTracker. |
| Modifier and Type | Method and Description |
|---|---|
| void | cancelDelegationToken(org.apache.hadoop.security.token.Token<DelegationTokenIdentifier> token): Cancel a delegation token from the JobTracker. |
| void | close(): Close the JobClient. |
| void | displayTasks(JobID jobId, java.lang.String type, java.lang.String state): Display the information about a job's tasks, of a particular type and in a particular state. |
| JobStatus[] | getAllJobs(): Get the jobs that are submitted. |
| TaskReport[] | getCleanupTaskReports(JobID jobId): Get the information of the current state of the cleanup tasks of a job. |
| ClusterStatus | getClusterStatus(): Get status information about the Map-Reduce cluster. |
| ClusterStatus | getClusterStatus(boolean detailed): Get status information about the Map-Reduce cluster. |
| int | getDefaultMaps(): Get status information about the max available Maps in the cluster. |
| int | getDefaultReduces(): Get status information about the max available Reduces in the cluster. |
| org.apache.hadoop.security.token.Token<DelegationTokenIdentifier> | getDelegationToken(org.apache.hadoop.io.Text renewer) |
| org.apache.hadoop.fs.FileSystem | getFs(): Get a filesystem handle. |
| RunningJob | getJob(JobID jobid): Get a RunningJob object to track an ongoing job. |
| RunningJob | getJob(java.lang.String jobid): Deprecated. Applications should rather use getJob(JobID). |
| JobStatus[] | getJobsFromQueue(java.lang.String queueName): Gets all the jobs which were added to a particular Job Queue. |
| TaskReport[] | getMapTaskReports(JobID jobId): Get the information of the current state of the map tasks of a job. |
| TaskReport[] | getMapTaskReports(java.lang.String jobId): Deprecated. Applications should rather use getMapTaskReports(JobID). |
| org.apache.hadoop.mapred.QueueAclsInfo[] | getQueueAclsForCurrentUser(): Gets the Queue ACLs for the current user. |
| JobQueueInfo | getQueueInfo(java.lang.String queueName): Gets the queue information associated with a particular Job Queue. |
| JobQueueInfo[] | getQueues(): Return an array of queue information objects about all the Job Queues configured. |
| TaskReport[] | getReduceTaskReports(JobID jobId): Get the information of the current state of the reduce tasks of a job. |
| TaskReport[] | getReduceTaskReports(java.lang.String jobId): Deprecated. Applications should rather use getReduceTaskReports(JobID). |
| static int | getRpcTimeout(org.apache.hadoop.conf.Configuration conf): Returns the RPC timeout to use, according to the configuration. |
| TaskReport[] | getSetupTaskReports(JobID jobId): Get the information of the current state of the setup tasks of a job. |
| org.apache.hadoop.fs.Path | getStagingAreaDir(): Grab the jobtracker's view of the staging directory path where job-specific files will be placed. |
| org.apache.hadoop.fs.Path | getSystemDir(): Grab the jobtracker system directory path where job-specific files are to be placed. |
| JobClient.TaskStatusFilter | getTaskOutputFilter(): Deprecated. |
| static JobClient.TaskStatusFilter | getTaskOutputFilter(JobConf job): Get the task output filter out of the JobConf. |
| void | init(JobConf conf): Connect to the default JobTracker. |
| static boolean | isJobDirValid(org.apache.hadoop.fs.Path jobDirPath, org.apache.hadoop.fs.FileSystem fs): Checks if the job directory is clean and has all the required components for (re)starting the job. |
| JobStatus[] | jobsToComplete(): Get the jobs that are not completed and not failed. |
| static void | main(java.lang.String[] argv) |
| boolean | monitorAndPrintJob(JobConf conf, RunningJob job): Monitor a job and print status in real-time as progress is made and tasks fail. |
| long | renewDelegationToken(org.apache.hadoop.security.token.Token<DelegationTokenIdentifier> token): Renew a delegation token. |
| int | run(java.lang.String[] argv) |
| static RunningJob | runJob(JobConf job): Utility that submits a job, then polls for progress until the job is complete. |
| void | setTaskOutputFilter(JobClient.TaskStatusFilter newValue): Deprecated. |
| static void | setTaskOutputFilter(JobConf job, JobClient.TaskStatusFilter newValue): Modify the JobConf to set the task output filter. |
| RunningJob | submitJob(JobConf job): Submit a job to the MR system. |
| RunningJob | submitJob(java.lang.String jobFile): Submit a job to the MR system. |
| RunningJob | submitJobInternal(JobConf job): Internal method for submitting jobs to the system. |
@InterfaceAudience.Private public static final java.lang.String MAPREDUCE_CLIENT_RPC_TIMEOUT_KEY
@InterfaceAudience.Private public static final int MAPREDUCE_CLIENT_RPC_TIMEOUT_DEFAULT
public static final int HEARTBEAT_INTERVAL_MIN_DEFAULT
public static final long COUNTER_UPDATE_INTERVAL
public static final long DEFAULT_DISK_HEALTH_CHECK_INTERVAL
public static final int SUCCESS
public static final int FILE_NOT_FOUND
public static final java.lang.String MAP_OUTPUT_LENGTH
public static final java.lang.String RAW_MAP_OUTPUT_LENGTH
public static final java.lang.String FROM_MAP_TASK
public static final java.lang.String FOR_REDUCE_TASK
public static final java.lang.String WORKDIR
public JobClient()
Create a job client.
public JobClient(JobConf conf) throws java.io.IOException
Build a job client with the given JobConf, and connect to the default JobTracker.
Parameters: conf - the job configuration
Throws: java.io.IOException
public JobClient(org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Build a job client with the given Configuration, and connect to the default cluster.
Parameters: conf - the configuration
Throws: java.io.IOException
public JobClient(java.net.InetSocketAddress jobTrackAddr, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
Build a job client, connect to the indicated job tracker.
Parameters: jobTrackAddr - the job tracker to connect to; conf - configuration
Throws: java.io.IOException
public void init(JobConf conf) throws java.io.IOException
Connect to the default JobTracker.
Parameters: conf - the job configuration
Throws: java.io.IOException
@InterfaceAudience.Private public static int getRpcTimeout(org.apache.hadoop.conf.Configuration conf)
Returns the RPC timeout to use, according to the configuration.
Parameters: conf

public void close() throws java.io.IOException
Close the JobClient.
Throws: java.io.IOException
public org.apache.hadoop.fs.FileSystem getFs() throws java.io.IOException
Get a filesystem handle.
Throws: java.io.IOException
public RunningJob submitJob(java.lang.String jobFile) throws java.io.FileNotFoundException, InvalidJobConfException, java.io.IOException
Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.
Parameters: jobFile - the job configuration
Returns: a RunningJob which can be used to track the running-job
Throws: java.io.FileNotFoundException, InvalidJobConfException, java.io.IOException
public RunningJob submitJob(JobConf job) throws java.io.FileNotFoundException, java.io.IOException
Submit a job to the MR system. This returns a handle to the RunningJob which can be used to track the running-job.
Parameters: job - the job configuration
Returns: a RunningJob which can be used to track the running-job
Throws: java.io.FileNotFoundException, java.io.IOException
public RunningJob submitJobInternal(JobConf job) throws java.io.FileNotFoundException, java.lang.ClassNotFoundException, java.lang.InterruptedException, java.io.IOException
Internal method for submitting jobs to the system.
Parameters: job - the configuration to submit
Throws: java.io.FileNotFoundException, java.lang.ClassNotFoundException, java.lang.InterruptedException, java.io.IOException
public static boolean isJobDirValid(org.apache.hadoop.fs.Path jobDirPath, org.apache.hadoop.fs.FileSystem fs) throws java.io.IOException
Checks if the job directory is clean and has all the required components for (re)starting the job.
Throws: java.io.IOException
public RunningJob getJob(JobID jobid) throws java.io.IOException
Get a RunningJob object to track an ongoing job. Returns null if the id does not correspond to any known job.
Parameters: jobid - the jobid of the job
Returns: the RunningJob handle to track the job, null if the jobid doesn't correspond to any known job
Throws: java.io.IOException
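As an illustrative sketch, an existing job can be looked up by ID and inspected (the job ID string below is a hypothetical placeholder; a real one comes from submitJob() or the web UI):

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class TrackJobSketch {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());

    // Hypothetical job ID for illustration only.
    JobID id = JobID.forName("job_201301011200_0001");

    RunningJob job = client.getJob(id);
    if (job == null) {
      System.out.println("No such job: " + id);
    } else {
      System.out.println(job.getJobName() + " complete=" + job.isComplete());
    }
  }
}
```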
@Deprecated public RunningJob getJob(java.lang.String jobid) throws java.io.IOException
Deprecated. Applications should rather use getJob(JobID).
Throws: java.io.IOException
public TaskReport[] getMapTaskReports(JobID jobId) throws java.io.IOException
Get the information of the current state of the map tasks of a job.
Parameters: jobId - the job to query
Throws: java.io.IOException

@Deprecated public TaskReport[] getMapTaskReports(java.lang.String jobId) throws java.io.IOException
Deprecated. Applications should rather use getMapTaskReports(JobID).
Throws: java.io.IOException

public TaskReport[] getReduceTaskReports(JobID jobId) throws java.io.IOException
Get the information of the current state of the reduce tasks of a job.
Parameters: jobId - the job to query
Throws: java.io.IOException

public TaskReport[] getCleanupTaskReports(JobID jobId) throws java.io.IOException
Get the information of the current state of the cleanup tasks of a job.
Parameters: jobId - the job to query
Throws: java.io.IOException

public TaskReport[] getSetupTaskReports(JobID jobId) throws java.io.IOException
Get the information of the current state of the setup tasks of a job.
Parameters: jobId - the job to query
Throws: java.io.IOException

@Deprecated public TaskReport[] getReduceTaskReports(java.lang.String jobId) throws java.io.IOException
Deprecated. Applications should rather use getReduceTaskReports(JobID).
Throws: java.io.IOException
public void displayTasks(JobID jobId, java.lang.String type, java.lang.String state) throws java.io.IOException
Display the information about a job's tasks, of a particular type and in a particular state.
Parameters: jobId - the ID of the job; type - the type of the task (map/reduce/setup/cleanup); state - the state of the task (pending/running/completed/failed/killed)
Throws: java.io.IOException
public ClusterStatus getClusterStatus() throws java.io.IOException
Get status information about the Map-Reduce cluster.
Returns: the status information about the Map-Reduce cluster as an object of ClusterStatus
Throws: java.io.IOException

public ClusterStatus getClusterStatus(boolean detailed) throws java.io.IOException
Get status information about the Map-Reduce cluster.
Parameters: detailed - if true then get a detailed status including the tracker names and memory usage of the JobTracker
Returns: the status information about the Map-Reduce cluster as an object of ClusterStatus
Throws: java.io.IOException
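A brief sketch of querying cluster state; the specific ClusterStatus fields printed here are just examples of what the object exposes:

```java
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ClusterStatusSketch {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());

    // detailed=true also fetches tracker names and JobTracker memory usage.
    ClusterStatus status = client.getClusterStatus(true);
    System.out.println("task trackers : " + status.getTaskTrackers());
    System.out.println("map slots     : " + status.getMaxMapTasks());
    System.out.println("reduce slots  : " + status.getMaxReduceTasks());
    System.out.println("running maps  : " + status.getMapTasks());
  }
}
```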
public org.apache.hadoop.fs.Path getStagingAreaDir() throws java.io.IOException
Grab the jobtracker's view of the staging directory path where job-specific files will be placed.
Throws: java.io.IOException

public JobStatus[] jobsToComplete() throws java.io.IOException
Get the jobs that are not completed and not failed.
Returns: array of JobStatus for the running/to-be-run jobs
Throws: java.io.IOException

public JobStatus[] getAllJobs() throws java.io.IOException
Get the jobs that are submitted.
Returns: array of JobStatus for the submitted jobs
Throws: java.io.IOException
public static RunningJob runJob(JobConf job) throws java.io.IOException
Utility that submits a job, then polls for progress until the job is complete.
Parameters: job - the job configuration
Throws: java.io.IOException - if the job fails

public boolean monitorAndPrintJob(JobConf conf, RunningJob job) throws java.io.IOException, java.lang.InterruptedException
Monitor a job and print status in real-time as progress is made and tasks fail.
Parameters: conf - the job's configuration; job - the job to track
Throws: java.io.IOException - if communication to the JobTracker fails; java.lang.InterruptedException
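For example, a client that wants submitJob's non-blocking submission but runJob-style console output might combine the two; this is a sketch with the job-specific JobConf setup elided:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class MonitorSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(); // job-specific setup elided

    JobClient client = new JobClient(conf);
    RunningJob job = client.submitJob(conf); // non-blocking submit

    // Blocks, printing progress as it is made, and reports the outcome.
    boolean success = client.monitorAndPrintJob(conf, job);
    System.exit(success ? 0 : 1);
  }
}
```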
@Deprecated public void setTaskOutputFilter(JobClient.TaskStatusFilter newValue)
Deprecated.
Parameters: newValue - task filter

public static JobClient.TaskStatusFilter getTaskOutputFilter(JobConf job)
Get the task output filter out of the JobConf.
Parameters: job - the JobConf to examine

public static void setTaskOutputFilter(JobConf job, JobClient.TaskStatusFilter newValue)
Modify the JobConf to set the task output filter.
Parameters: job - the JobConf to modify; newValue - the value to set

@Deprecated public JobClient.TaskStatusFilter getTaskOutputFilter()
Deprecated.
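A short sketch of the static filter accessors; ALL is one of the JobClient.TaskStatusFilter constants:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FilterSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // Ask the client to echo all task output while monitoring the job.
    JobClient.setTaskOutputFilter(conf, JobClient.TaskStatusFilter.ALL);

    // Read the setting back out of the JobConf.
    System.out.println(JobClient.getTaskOutputFilter(conf));
  }
}
```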
public int run(java.lang.String[] argv) throws java.lang.Exception
Specified by: run in interface org.apache.hadoop.util.Tool
Throws: java.lang.Exception
public int getDefaultMaps() throws java.io.IOException
Get status information about the max available Maps in the cluster.
Throws: java.io.IOException

public int getDefaultReduces() throws java.io.IOException
Get status information about the max available Reduces in the cluster.
Throws: java.io.IOException
public org.apache.hadoop.fs.Path getSystemDir()
Grab the jobtracker system directory path where job-specific files are to be placed.
public JobQueueInfo[] getQueues() throws java.io.IOException
Return an array of queue information objects about all the Job Queues configured.
Throws: java.io.IOException

public JobStatus[] getJobsFromQueue(java.lang.String queueName) throws java.io.IOException
Gets all the jobs which were added to a particular Job Queue.
Parameters: queueName - name of the Job Queue
Throws: java.io.IOException

public JobQueueInfo getQueueInfo(java.lang.String queueName) throws java.io.IOException
Gets the queue information associated with a particular Job Queue.
Parameters: queueName - name of the job queue
Throws: java.io.IOException

public org.apache.hadoop.mapred.QueueAclsInfo[] getQueueAclsForCurrentUser() throws java.io.IOException
Gets the Queue ACLs for the current user.
Throws: java.io.IOException
public org.apache.hadoop.security.token.Token<DelegationTokenIdentifier> getDelegationToken(org.apache.hadoop.io.Text renewer) throws java.io.IOException, java.lang.InterruptedException
Throws: java.io.IOException, java.lang.InterruptedException

public long renewDelegationToken(org.apache.hadoop.security.token.Token<DelegationTokenIdentifier> token) throws org.apache.hadoop.security.token.SecretManager.InvalidToken, java.io.IOException, java.lang.InterruptedException
Renew a delegation token.
Parameters: token - the token to renew
Throws: org.apache.hadoop.security.token.SecretManager.InvalidToken, java.io.IOException, java.lang.InterruptedException

public void cancelDelegationToken(org.apache.hadoop.security.token.Token<DelegationTokenIdentifier> token) throws java.io.IOException, java.lang.InterruptedException
Cancel a delegation token from the JobTracker.
Parameters: token - the token to cancel
Throws: java.io.IOException, java.lang.InterruptedException
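A minimal sketch of the token lifecycle, assuming a secure (Kerberos-enabled) cluster; the renewer principal name "renewer-user" is a hypothetical placeholder:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.security.token.Token;

public class TokenSketch {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());

    // Obtain a token that "renewer-user" may renew on our behalf.
    Token<DelegationTokenIdentifier> token =
        client.getDelegationToken(new Text("renewer-user"));

    long expiry = client.renewDelegationToken(token); // push out the expiry
    System.out.println("token now expires at " + expiry);

    client.cancelDelegationToken(token); // revoke when no longer needed
  }
}
```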
public static void main(java.lang.String[] argv) throws java.lang.Exception
Throws: java.lang.Exception