One can get started by reading the
Père Cluster User's Guide, which has details for getting an account
and using the primary resource on MUGrid. Below is a short narrative
on how one might use distributed resources to solve big computational
tasks. Additional tutorials are available
as well.
One can use a grid to solve big computational tasks. Big could mean running millions of independent simulations with changing parameters (sometimes called distributed or embarrassingly parallel). Big could also mean a massively parallel finite element simulation of a really large geometry with billions of nodes. Père and Pario are suited to the latter type. The former can be done on all of MUGrid.
Suppose one has a simulation program that is called from the command line as follows:
$ runsimulation 27
Here runsimulation is the executable and 27 is the input parameter, which can change (the $ just represents the prompt). Suppose also that runsimulation writes its output to a file named out27.dat. To run that simulation on MUGrid, a submission script is needed; let's call it submit27.condor, shown here:
universe = vanilla
executable = runsimulation
transfer_output_files = out27.dat
output = 27.out
error = 27.err
log = 27.log
Arguments = 27
requirements = Arch == "INTEL" && OpSys == "LINUX"
should_transfer_files = true
when_to_transfer_output = on_exit
queue
In the file, the executable is runsimulation, the argument is 27 (you can have a list), the output file is out27.dat, and the rest of this simple example can be viewed as overhead.
To submit it, log into a submit host, make sure a working version of runsimulation and the submit27.condor file are both there, and then type:
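Since only the parameter changes between such submit files, the file can be generated rather than written by hand. The following is a minimal Python sketch, not part of the MUGrid tooling: the function names render_submit and write_submit are hypothetical, and the template simply mirrors the submit27.condor example above.

```python
from pathlib import Path

# Template mirroring the submit27.condor example; {param} is the only
# part that varies. All other lines are copied from the example.
SUBMIT_TEMPLATE = """\
universe = vanilla
executable = runsimulation
transfer_output_files = out{param}.dat
output = {param}.out
error = {param}.err
log = {param}.log
Arguments = {param}
requirements = Arch == "INTEL" && OpSys == "LINUX"
should_transfer_files = true
when_to_transfer_output = on_exit
queue
"""


def render_submit(param: int) -> str:
    """Return the text of a submit file for one input parameter."""
    return SUBMIT_TEMPLATE.format(param=param)


def write_submit(param: int, directory: str = ".") -> Path:
    """Write submit<param>.condor into `directory` and return its path."""
    path = Path(directory) / f"submit{param}.condor"
    path.write_text(render_submit(param))
    return path
```

Calling write_submit(27) would produce a submit27.condor file in the current directory with the same settings as the example.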
$ condor_submit submit27.condor
Then watch the progress of your job by typing:
$ condor_q
If you want to run runsimulation with parameters ranging from 0 to 999, a new script is needed, shown here in a file called submitall.condor:
universe = vanilla
executable = runsimulation
transfer_output_files = out$(Process).dat
output = $(Process).out
error = $(Process).err
log = $(Process).log
Arguments = $(Process)
requirements = Arch == "INTEL" && OpSys == "LINUX"
should_transfer_files = true
when_to_transfer_output = on_exit
queue 1000
This will run all 1000 jobs for you. $(Process) takes the values 0 through 999, one per job, so out0.dat through out999.dat will be waiting for you on the submit host when the jobs are done.
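With 1000 jobs it is easy to overlook one that never returned its output. The sketch below, in Python, lists which of the expected outN.dat files are absent from a directory. It assumes the output names follow the out$(Process).dat pattern used in submitall.condor; the function names expected_outputs and missing_outputs are hypothetical helpers, not Condor commands.

```python
from pathlib import Path
from typing import List


def expected_outputs(n_jobs: int) -> List[str]:
    """Names produced by out$(Process).dat for Process = 0 .. n_jobs - 1."""
    return [f"out{i}.dat" for i in range(n_jobs)]


def missing_outputs(directory: str, n_jobs: int) -> List[str]:
    """Return the expected output files that are not present in `directory`."""
    present = {p.name for p in Path(directory).iterdir()}
    return [name for name in expected_outputs(n_jobs) if name not in present]
```

For the example above, missing_outputs(".", 1000) run on the submit host would return an empty list once every job has finished and transferred its file back.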