Running Gaussian Jobs with Checkpointing
Gaussian calculations on large systems usually take a long time and many resources to complete. It is a good idea to set up checkpointing so that the calculation can continue after an interruption caused by the walltime limit or a system malfunction. Checkpointing saves a snapshot of the running Gaussian state so that a later job can restart from it. Users can also divide a long job into a series of 4-hour jobs, since jobs with a walltime of 4 hours or less can use the buy-in nodes (55% of all nodes) on the HPCC.
To set up a checkpointing run with Gaussian, a unified read-write file setting (%RWF) should appear in the Link 0 section of the input file. An example input file, water.gjf, follows:
water.gjf
The input file requests a geometry optimization of 5 water molecules with the very large basis set aug-cc-pVTZ. The whole calculation takes about 25 CPU hours. The %RWF setting specifies the file water.rwf for checkpointing, in addition to the water.chk file. Because %RWF is placed before the %NoSave line, the rwf file is deleted if the calculation completes normally without any error.
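As a minimal sketch of an input with this layout, the block below writes a water.gjf from the shell using a here-document. Only the %RWF and %NoSave ordering, the file names, the basis set and the optimization request come from the text. The method (B3LYP), the %Mem and %NProcShared values, the position of %Chk after %NoSave, and the single-molecule placeholder geometry are assumptions for illustration and should be replaced with your own settings and the full five-water geometry.

```shell
# Write a sketch of water.gjf. Values marked as assumptions in the text
# above (method, memory, cores, geometry) are illustrative only.
cat > water.gjf <<'EOF'
%RWF=water.rwf
%NoSave
%Chk=water.chk
%Mem=8GB
%NProcShared=8
#P B3LYP/aug-cc-pVTZ Opt

Water geometry optimization (placeholder geometry, one molecule)

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF

# Show the Link 0 section that was written.
grep '^%' water.gjf
```

Gaussian deletes named files whose directives appear before %NoSave and keeps those named after it, so in this sketch water.rwf is removed after a normal finish while water.chk is kept.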
To restart the calculation after the first run stops, we can build a restart input file, restart.gjf, as simply as:
restart.gjf
Since all information about the calculation is recorded in the rwf file, a route line containing "Restart" is enough for Gaussian to continue the previous job. The restart input file can also be created with the commands:
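One way to write such commands is sketched below, wrapped in a small function so it can be reused. The function name make_restart is hypothetical, the trailing blank line is added as a precaution to terminate the route section, and the stand-in input used for the demonstration contains assumed Link 0 values rather than the original file.

```shell
# make_restart is a hypothetical helper name, not part of Gaussian.
# Usage: make_restart <input.gjf> <restart.gjf>
make_restart() {
    grep '^%' "$1" > "$2"       # copy the Link 0 lines (%RWF, %Chk, ...)
    echo '#P Restart' >> "$2"   # route section: continue from the rwf file
    echo '' >> "$2"             # blank line terminating the route section
}

# Demonstration on a stand-in input; the Link 0 values are assumptions.
demo_dir=$(mktemp -d)
printf '%s\n' '%RWF=water.rwf' '%NoSave' '%Chk=water.chk' \
    '#P B3LYP/aug-cc-pVTZ Opt' '' 'stand-in title' '' '0 1' \
    > "$demo_dir/water.gjf"
make_restart "$demo_dir/water.gjf" "$demo_dir/restart.gjf"
cat "$demo_dir/restart.gjf"
```

For a real run, the equivalent call would be make_restart water.gjf restart.gjf in the directory that holds the input file.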
Here, grep copies the lines starting with the "%" sign from water.gjf into the Gaussian restart file, and a "#P Restart" line is appended at the end.
Now we need a job script to submit the Gaussian calculation. The script must keep submitting jobs that restart the previous calculation until it is completed. Here is a job script, water.sb, that does this:
water.sb
In water.sb, a background script enclosed in ( ... ) & on lines 14 to 20 keeps submitting jobs.
Once the job starts, the background script runs at the same time as the foreground script. The background script first sleeps for 3 hours and 55 minutes. During this time, the foreground script runs the Gaussian calculation, or restarts the previous calculation if the checkpointing files water.rwf and water.chk exist. Five minutes before the end of the job, the background script wakes up and prints the resource usage and the Gaussian output. If the g16 command on line 29 has not completed, it then submits another job and stops the current job (lines 19 and 20). If the g16 command finishes before the background script wakes up, the job executes the remaining commands after line 30 and finishes, and no more jobs are submitted.
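The behaviour described above can be sketched as follows. This is not the original water.sb, so the line numbers quoted in the text will not match it. The resource requests, the module name, the use of scontrol and tail for reporting, and the way g16 reads its input are all assumptions. The script is written through a here-document and then only syntax-checked, so the block can be run on a machine without Slurm or Gaussian.

```shell
# Write a sketch of water.sb. Resource requests, the module name and the
# reporting commands are assumptions; adjust them for your own job.
cat > water.sb <<'EOF'
#!/bin/bash --login
#SBATCH --job-name=water
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=10G

module load Gaussian   # assumption: module name on your cluster

# Background watchdog: wakes 5 minutes before the 4-hour walltime limit.
(
    sleep 14100                          # 3 h 55 min, in seconds
    scontrol show job "$SLURM_JOB_ID"    # report on the running job
    tail -n 20 water.log                 # show recent Gaussian output
    sbatch water.sb                      # queue the next restart job
    scancel "$SLURM_JOB_ID"              # stop the current job
) &

# Foreground: restart if both checkpoint files exist, else start fresh.
if [ -f water.rwf ] && [ -f water.chk ]; then
    g16 < restart.gjf >> water.log
else
    g16 < water.gjf > water.log
fi

# Reached only if g16 returned before the watchdog woke up, so no
# further job is submitted.
echo "Gaussian finished in job $SLURM_JOB_ID"
EOF

# Syntax check only; submitting requires Slurm: sbatch water.sb
bash -n water.sb && echo "water.sb: syntax OK"
```

The sleep is given in seconds (3 × 3600 + 55 × 60 = 14100) so that it does not depend on suffix support in the local sleep command.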
Since the rwf file usually takes up a lot of disk space, we suggest running checkpointing jobs in scratch space so that your home or research space does not go over quota. Create a directory in your scratch space, copy all the files (water.gjf, restart.gjf and water.sb) there, and submit the job script from that directory. Please check your job status frequently. Make sure to copy the necessary files back to your home or research directory from time to time, since files on scratch are purged if they have not been modified for 45 days.
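A minimal sketch of that staging step is shown below. The helper name stage_job and the directory name water_opt are hypothetical, and the SCRATCH variable is assumed to point at your scratch space; the temporary-directory fallback exists only so the example runs on any machine.

```shell
# stage_job is a hypothetical helper name.
# Usage: stage_job <run_dir> <file>...
stage_job() {
    run_dir=$1
    shift
    mkdir -p "$run_dir" && cp "$@" "$run_dir"/
}

# Assumption: SCRATCH points at your scratch space.
SCRATCH=${SCRATCH:-$(mktemp -d)}
job_dir="$SCRATCH/water_opt"   # directory name is an assumption

# Stage the three files if they are present in the current directory.
if [ -f water.gjf ] && [ -f restart.gjf ] && [ -f water.sb ]; then
    stage_job "$job_dir" water.gjf restart.gjf water.sb
    echo "Staged in $job_dir; submit with: cd $job_dir && sbatch water.sb"
fi

# Later, list files not modified for more than 30 days as a reminder to
# copy results back before the purge (the 30-day margin is an assumption).
if [ -d "$job_dir" ]; then
    find "$job_dir" -type f -mtime +30
fi
```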
Note
The background script must sleep for longer than one cycle of the Gaussian calculation takes. Otherwise, a restarted job may begin again from the same point as the previous run. Checkpointing is done between cycles.
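One rough way to check this is to count completed optimization cycles in the log after each job. The sketch below assumes the log contains one "Step number" line per geometry optimization step and that the log is appended to across restarts; the helper name count_opt_steps and the mock log are hypothetical. If the count does not grow between two consecutive jobs, the sleep time is probably shorter than one cycle.

```shell
# count_opt_steps is a hypothetical helper name. It counts the
# "Step number" lines found in a Gaussian optimization log.
count_opt_steps() {
    grep -c 'Step number' "$1"
}

# Demonstration on a mock log with two such lines (not real output).
mock_log=$(mktemp)
printf ' Step number %3d out of a maximum of %3d\n' 1 20 2 20 > "$mock_log"
count_opt_steps "$mock_log"
```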