Hadoop on Demand under SGE
Hadoop on Demand (HOD) is a system for dynamically creating and using Hadoop clusters. As distributed it is designed to integrate with the Torque resource manager, but it is relatively easy to make it work with other resource managers. This document describes how to integrate with SGE and provides the needed source changes to do so. If you install this, please drop us a line and let us know how things went. We'd be happy to help you get it working if it lets us clean up any bugs in our modules.
If you don't really care how this happens, but just want to get HOD working under SGE, follow these directions. A better description of what actually happens is provided below. They are enjoyable and a better understanding of how HOD works will make for a better administrator. Really. You'll need to rename the files when you download them, I named them so you would be less likely to overwrite them as two of them are sge.py! While we get storage configured for this wiki, these links are external, so sorry about the jump!
- Download sch_sge.py and place in hod/hodlib/Schedulers/sge.py
- Download np_sge.py] and place in hod/hodlib/NodePools/sge.py
- Replace the existing hod/hodlib/Common/nodepoolutil.py with nodepoolutil.py
- In your hodrc file set resource_manager.id to "sge" (it's "torque" by default)
- You may want to take a quick look at the HODV4 page for instructions on configuring HOD in general.
- For [Ranger] go HODRanger here
HOD is reasonably modular and provides an interface to Torque commands through the hodlib/Scheduler/torque.py classes. In order to work for SGE, we'll need to provide a new Schedulers interface and register this name with hodlib/Common/nodepoolutil.py. I've added a dynamic import of modules based on the resource_manager.id provided in hodrc, which maps to classes added to the distribution. This allows the addition of other resource manager interfaces, without editing any existing HOD code. So replacing this file makes it much easier to add more modules. To learn more about this, check the explanation included in the HODMoab integration page.
Torque qsub to SGE qsub
Torque and SGE job submission commands share the same name and many of the same arguments. There exists an old standard that both qsub's adhere to, but there has been divergence at some point, so it's good to look through the qsub line and make sure we understand the parameters. Here is the qsub line from HOD:
$ qsub -l nodes=4,walltime=300 -Wx= -l nodes=4,walltime=300 -W x= -N "HOD" -r n -d /tmp/ -q v4 -A acctName -v HOD_PYTHON_HOME=/usr/local/bin/python
- '-l' sets job resource limits and is the same for Torque and SGE. However, walltime= becomes h_rt for SGE and nodes should be specified under -pe. Sge also seems less fond of the repeats here. I simplified the command application. I should probably also think about templating these job files instead of the dictionary of dictionaries used here.
- '-W x= ' this unused setting does not get safely ignored by SGE
- '-N "HOD"' This names the job for Torque and SGE
- '-r n' Specifies that the job is not re-runnable
- '-d /tmp/' For Torque, this sets the working path of the job (PBS_INITDIR), should be -wd for SGE
- '-q v4' Specifies the queue the job should be run under for Torque and SGE
- '-A acctName' Specifies the account the job should be charged against for Torque and SGE
- '-v HOD_...' Specifies environment variables that should be exported into the batch script for Torque and SGE
So we add:
- '-pe 1way nodes*16' - we interpret the nodes request as a NODES request, so we translate to procs and go
- '-wd $SCRATCH'
So the only needed change is altering -d to be -wd, this change is made in NodePools/sge.py
Torque qstat to SGE qstat
Minor change to the call, fairly major changes to the output. Torque:
$ qstat -f -1 jobid
$ qstat -f -j jobid
However, the returned information is very different. It seems like the best way to extract all information is to use the xml input and pull extended task information (-t) and basically create a set to get the execution hosts. So we end up using this command:
$ qstat -s a -t -u `whoami`
There is a lot of duplication noise in that output, but I'm not sure of another way to get this information. Note that this results in truncated hostnames in the exec_host list, but it appears to note matter. The parallel start get's the correct hostnames from PE_HOSTFILE, which is all that seems to matter(?). As the dameons come up they correctly replrt their address which is what goes into the site.xml file. Job states must be mapped here again, the most notable offender is SGE's "QW" state, which is translated to 'Q'.
Torque qdel to SGE qdel
No changes, though it's pretty likely that the force option won't work.
I have yet to see this thing used and so I haven't bothered with it yet.