Deploying transfer matrix method calculations on a HTCondor cluster
The transfer matrix method (TMM) is a numerically exact method for calculating the inherent optical properties (IOPs) of a particle whose size is comparable to the wavelength of light under consideration. Dr. Mishchenko released a series of articles and books detailing the method, along with a code base that can readily calculate the IOPs of homogeneous spheroids, among other axisymmetric shapes. I am interested in computing the IOPs of aquatic particles in random and wave-aligned orientations to examine the effect of wave-preferred orientation on light attenuation near the surface of lakes and oceans. This project/blog post details some of the implementation work needed to produce a dataset of IOPs using an HTCondor cluster.
Mishchenko's code is written in Fortran 77 and expects to be compiled with Intel Fortran compilers. Testing on my workstation indicated that each particle case required 4-24 hours of computation with 8 cores and 1 GB of memory. On a single workstation, testing many hundreds of cases could take months or years; however, the modest compute requirements per particle case make this workload perfect for an HTCondor cluster. By comparison, the Center for High Throughput Computing (CHTC) at the University of Wisconsin-Madison provides a 300+ node cluster that can compute the IOPs in a matter of hours or days.
The CHTC team does not provide Intel compilers by default, but OCI containers can be used on the cluster to ensure the Intel Fortran suite is available. Intel publishes ready-made containers with their oneAPI toolkit installed on DockerHub, such as docker.io/intel/oneapi-hpckit:2024.0.1-devel-rockylinux9. A custom Containerfile can be written on top of these images to add additional packages as needed.
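As a sketch, such a Containerfile might look like the following (the extra package list is an illustrative assumption, not the exact set I installed):

```dockerfile
# Start from Intel's oneAPI HPC toolkit image, which ships the Fortran compilers
FROM docker.io/intel/oneapi-hpckit:2024.0.1-devel-rockylinux9

# Add utilities the jobs may need at run time (e.g. gzip for output compression)
# The package list here is an assumption for illustration
RUN dnf install -y gzip findutils && dnf clean all
```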
With all the dependencies resolved, Mishchenko's code must now be modified for my particular use case. As distributed, the code hard-codes its input parameters like
C INPUT DATA
AXI=10D0
RAT=0.1 D0
LAM=DACOS(-1D0)*2D0
MRR=1.5 D0
MRI=0.02 D0
EPS=0.5 D0
which is fine for a single test case, but requires each case to be compiled separately. Managing hundreds or thousands of pre-compiled binaries and deploying them to HTCondor is outside of my interests, so I simply rewrote the parameter input section like
C Use input file so that we can precompile
INQUIRE(file="input.txt", exist=exists)
IF (exists) then
OPEN(newunit=io, file="input.txt", status="old", action="read")
READ(io, *) AXI, RAT, LAM, MRR, MRI, EPS, ALPHA, BETA
END IF
with matching additions to the variable declarations (a LOGICAL for exists and an INTEGER for io).
Now the OCI container can be run and used to compile a single binary that Condor will execute for every test case defined by an input.txt file.
Systematic generation of the test cases can be simplified with Python.
Looping through each parameter of interest, create a directory and write an input file in each directory:
import os, shutil

# set default values
myvals = {
    'AXI': '1 D0',
    'RAT': '1',
    'LAM': '0.665 D0',
    'MRR': '1.05 D0',
    'MRI': '0.01 D0',
    'EPS': '2.0 D0',
    'ALPHA': '0 D0',
    'BETA': '63.4 D0',
}

# calculate desired cases here
# I only vary size, shape, and zenith angle for my problem
# omitted for brevity

for (radius, beta, eps), datadir in zip(datavals, datadirs):
    # ensure directory exists here
    # omitted for brevity
    # replace defaults with looping values
    myvals['AXI'] = f"{radius:.1f} D0"
    myvals['BETA'] = f"{beta:.2f} D0"
    myvals['EPS'] = f"{eps:.2f} D0"
    # in Fortran we read as space-delimited data like
    # READ(io, *) AXI, RAT, LAM, MRR, MRI, EPS, ALPHA, BETA
    keys = ['AXI', 'RAT', 'LAM', 'MRR', 'MRI', 'EPS', 'ALPHA', 'BETA']
    writevals = [myvals[key].replace(' ', '') for key in keys]
    line = ' '.join(writevals)
    with open(f'{datadir}/input.txt', 'w') as file:
        file.write(line)
which ensures that each case of interest is generated in a way that HTCondor can locate and run.
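The case-generation step omitted above can be sketched with itertools.product. The parameter grids and directory naming scheme below are illustrative assumptions, not the values used in the study:

```python
import itertools
import os

# illustrative parameter grids (not the actual study values)
radii = [0.5, 1.0, 2.0]    # equal-volume-sphere radius
betas = [0.0, 45.0, 90.0]  # zenith angle, degrees
epss = [0.5, 1.0, 2.0]     # aspect ratio

# one (radius, beta, eps) tuple per case, with a matching directory name
datavals = list(itertools.product(radii, betas, epss))
datadirs = [f"cases/r{r:.1f}_b{b:.2f}_e{e:.2f}" for r, b, e in datavals]

# create each case directory up front
for d in datadirs:
    os.makedirs(d, exist_ok=True)
```

This yields one directory per combination, which the loop above then fills with an input.txt.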
Finally, we must tell HTCondor how to execute each test case. The submission file looks like
container_image = docker://quay.io/sharry1679/myhpc:latest
universe = container
initialdir = $(job_dir)
executable = run.sh
should_transfer_files = YES
transfer_input_files = main_precompile,input.txt
request_cpus = 8
request_memory = 1G
request_disk = 5G
queue job_dir from job-dirs.txt
where job-dirs.txt can be populated with a shell command like
find . -type d -exec test -e {}/input.txt -a ! -e {}/test.gz \; -print | tee job-dirs.txt | wc -l
because a directory needs to be processed by HTCondor only when its input file exists and its output file does not.
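The same selection can be sketched in Python with pathlib, assuming the directory layout described above, which may be handy on systems without GNU find:

```python
from pathlib import Path

def pending_job_dirs(root="."):
    """Yield directories that contain an input.txt but no test.gz yet."""
    for d in sorted(Path(root).rglob("*")):
        if d.is_dir() and (d / "input.txt").exists() and not (d / "test.gz").exists():
            yield str(d)

# write one directory per line, as HTCondor's `queue ... from` expects
with open("job-dirs.txt", "w") as f:
    for d in pending_job_dirs():
        f.write(d + "\n")
```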
By default, Mishchenko's code produces an output file called test, but the storage savings from compressing it with gzip or similar are substantial.
Finally, all of the output data can be scanned and organized into an SQLite database for postprocessing.
import sqlite3

conn = sqlite3.connect("inherent_optical_properties.db")
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS wave")
cur.execute('''
    CREATE TABLE IF NOT EXISTS wave
    (valid_results INTEGER, aspect_ratio REAL, light_length REAL, equal_sphere_radius REAL,
     minor_radius REAL, major_radius REAL, minor_diameter REAL, major_diameter REAL,
     polar_angle REAL, c_ext REAL, c_sca REAL, c_abs REAL,
     equal_sphere_radius_int INTEGER, aspect_ratio_int INTEGER,
     UNIQUE (equal_sphere_radius_int, aspect_ratio_int))
''')

def extract_data_from_dir(dirname):
    '''
    Parse input.txt and test.gz from the Fortran program output
    into a sqlite3-friendly dict of column values.
    '''
    pass

# identify all the available directories here
mydata = map(extract_data_from_dir, dirnames)
cur.executemany('''INSERT INTO wave VALUES
    (:valid_results, :aspect_ratio, :light_length, :equal_sphere_radius,
     :minor_radius, :major_radius, :minor_diameter, :major_diameter,
     :polar_angle, :c_ext, :c_sca, :c_abs, :equal_sphere_radius_int, :aspect_ratio_int)''', mydata)
conn.commit()
cur.close()
conn.close()
which allows cases of interest to be identified with SQL SELECT statements, greatly simplifying the search process.
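As an illustration, a query for the extinction cross sections of near-spherical valid cases might look like the following. To keep the snippet self-contained, it uses an in-memory database with the same schema and one synthetic row (the values are made up, not study results):

```python
import sqlite3

# in-memory database with the same schema, populated with one synthetic row
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('''CREATE TABLE wave
    (valid_results INTEGER, aspect_ratio REAL, light_length REAL, equal_sphere_radius REAL,
     minor_radius REAL, major_radius REAL, minor_diameter REAL, major_diameter REAL,
     polar_angle REAL, c_ext REAL, c_sca REAL, c_abs REAL,
     equal_sphere_radius_int INTEGER, aspect_ratio_int INTEGER)''')
cur.execute("INSERT INTO wave VALUES (1, 1.1, 0.665, 1.0, 0.9, 1.1, 1.8, 2.2, 63.4, 5.0, 4.0, 1.0, 10, 11)")

# select valid, nearly spherical cases, sorted by particle size
rows = cur.execute('''
    SELECT equal_sphere_radius, polar_angle, c_ext
    FROM wave
    WHERE valid_results = 1 AND aspect_ratio BETWEEN 0.9 AND 1.2
    ORDER BY equal_sphere_radius
''').fetchall()
conn.close()
```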