Deploying transfer matrix method calculations on a HTCondor cluster
The transfer matrix method (TMM) is a numerically exact method for calculating the inherent optical properties (IOPs) of a particle whose size is comparable to the wavelength of light under consideration. Dr. Mishchenko released a series of articles and books detailing the method, along with a code base that can readily calculate the IOPs of homogeneous spheroids, among other axisymmetric shapes. I am interested in computing the IOPs of aquatic particles in random and wave-aligned orientations to examine the effect of wave-preferred orientation on light attenuation near the surface of lakes and oceans. This project/blog post details some of the implementation work needed to produce a dataset of IOPs using an HTCondor cluster.
Mishchenko's code is written in Fortran 77 and expects to be compiled with Intel Fortran compilers. Testing on my workstation indicated that each particle case required 4-24 hours of computation with 8 cores and 1 GB of memory. On a single workstation, testing many hundreds of cases could take months or years; however, the modest compute requirements per particle case make this workload perfect for an HTCondor cluster. By comparison, the Center for High Throughput Computing (CHTC) at the University of Wisconsin-Madison provides a 300+ node cluster that can compute the IOPs in a matter of hours or days.
The CHTC team does not provide Intel compilers by default, but OCI containers can be used on the cluster to ensure the Intel Fortran suite is available. Intel publishes ready-made containers with their oneAPI toolkit installed on DockerHub, such as docker.io/intel/oneapi-hpckit:2024.0.1-devel-rockylinux9. A custom Containerfile can be written on top of these images to add additional packages as needed.
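As a sketch, such a Containerfile might look like the following (the extra package list is an illustrative assumption, not the exact set I installed):

```dockerfile
# Start from Intel's oneAPI HPC toolkit image, which ships the Fortran compilers
FROM docker.io/intel/oneapi-hpckit:2024.0.1-devel-rockylinux9

# Add utilities the jobs may need at run time (e.g. gzip for output compression)
# The package list here is an assumption for illustration
RUN dnf install -y gzip findutils && dnf clean all
```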
With all the dependencies resolved, Mishchenko's code must now be modified for my particular use case. As distributed, the code hard-codes its input parameters like
C INPUT DATA
AXI=10D0
RAT=0.1 D0
LAM=DACOS(-1D0)*2D0
MRR=1.5 D0
MRI=0.02 D0
EPS=0.5 D0
which is fine for a single test case, but requires each case to be compiled separately. Managing hundreds or thousands of pre-compiled binaries and deploying them to HTCondor is outside of my interests, so I simply rewrote the parameter input section like
C Use input file so that we can precompile
INQUIRE(file="input.txt", exist=exists)
IF (exists) then
OPEN(newunit=io, file="input.txt", status="old", action="read")
READ(io, *) AXI, RAT, LAM, MRR, MRI, EPS, ALPHA, BETA
END IF
with matching additions to the variable declarations (a LOGICAL for exists and an INTEGER for io).
Now the OCI container can be run and used to compile a single binary that Condor will execute for every test case defined by an input.txt file.
Systematic generation of the test cases can be simplified with Python.
Looping through each parameter of interest, create a directory and write an input file in each directory:
import os, shutil

# set default values
myvals = {
    'AXI': '1 D0',
    'RAT': '1',
    'LAM': '0.665 D0',
    'MRR': '1.05 D0',
    'MRI': '0.01 D0',
    'EPS': '2.0 D0',
    'ALPHA': '0 D0',
    'BETA': '63.4 D0',
}

# calculate desired cases here
# I only vary size, shape, and zenith angle for my problem
# omitted for brevity

for (radius, beta, eps), datadir in zip(datavals, datadirs):
    # ensure directory exists here
    # omitted for brevity
    # replace defaults with looping values
    myvals['AXI'] = f"{radius:.1f} D0"
    myvals['BETA'] = f"{beta:.2f} D0"
    myvals['EPS'] = f"{eps:.2f} D0"
    # in Fortran we read as space-delimited data like
    # READ(io, *) AXI, RAT, LAM, MRR, MRI, EPS, ALPHA, BETA
    keys = ['AXI', 'RAT', 'LAM', 'MRR', 'MRI', 'EPS', 'ALPHA', 'BETA']
    writevals = [myvals[key].replace(' ', '') for key in keys]
    line = ' '.join(writevals)
    with open(f'{datadir}/input.txt', 'w') as file:
        file.write(line)
which ensures that each case of interest is generated in a way that HTCondor can locate and run.
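The case-generation step omitted above can be sketched with itertools.product. The parameter grids and directory naming scheme below are illustrative assumptions, not the values used in the study:

```python
import itertools
import os

# illustrative parameter grids (not the actual study values)
radii = [0.5, 1.0, 2.0]    # equal-volume-sphere radius
betas = [0.0, 45.0, 90.0]  # zenith angle, degrees
epss = [0.5, 1.0, 2.0]     # aspect ratio

# one (radius, beta, eps) tuple per case, with a matching directory name
datavals = list(itertools.product(radii, betas, epss))
datadirs = [f"cases/r{r:.1f}_b{b:.2f}_e{e:.2f}" for r, b, e in datavals]

# create each case directory up front
for d in datadirs:
    os.makedirs(d, exist_ok=True)
```

This yields one directory per combination, which the loop above then fills with an input.txt.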
Finally, we must tell HTCondor how to execute each test case. The submission file looks like
container_image = docker://quay.io/sharry1679/myhpc:latest
universe = container
initialdir = $(job_dir)
executable = run.sh
should_transfer_files = YES
transfer_input_files = main_precompile,input.txt
request_cpus = 8
request_memory = 1G
request_disk = 5G
queue job_dir from job-dirs.txt
where job-dirs.txt can be populated with a shell command like
find . -type d -exec test -e {}/input.txt -a ! -e {}/test.gz \; -print | tee job-dirs.txt | wc -l
because a directory needs to be processed by HTCondor only when its input file exists and its output file does not.
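The same selection can be sketched in Python with pathlib, assuming the directory layout described above, which may be handy on systems without GNU find:

```python
from pathlib import Path

def pending_job_dirs(root="."):
    """Yield directories that contain an input.txt but no test.gz yet."""
    for d in sorted(Path(root).rglob("*")):
        if d.is_dir() and (d / "input.txt").exists() and not (d / "test.gz").exists():
            yield str(d)

# write one directory per line, as HTCondor's `queue ... from` expects
with open("job-dirs.txt", "w") as f:
    for d in pending_job_dirs():
        f.write(d + "\n")
```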
By default, Mishchenko's code produces an output file called test, but the storage savings from compressing it with gzip or similar are substantial.
Finally, all of the output data can be scanned and organized into an SQLite database for postprocessing.
import sqlite3

conn = sqlite3.connect("inherent_optical_properties.db")
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS wave")
cur.execute('''
    CREATE TABLE IF NOT EXISTS wave
    (valid_results INTEGER, aspect_ratio REAL, light_length REAL, equal_sphere_radius REAL,
     minor_radius REAL, major_radius REAL, minor_diameter REAL, major_diameter REAL,
     polar_angle REAL, c_ext REAL, c_sca REAL, c_abs REAL,
     equal_sphere_radius_int INTEGER, aspect_ratio_int INTEGER,
     UNIQUE (equal_sphere_radius_int, aspect_ratio_int))
''')

def extract_data_from_dir(dirname):
    '''
    Parse input.txt and test.gz from the Fortran program output
    into a sqlite3-friendly dict of column values.
    '''
    pass

# identify all the available directories here
mydata = map(extract_data_from_dir, dirnames)
cur.executemany('''INSERT INTO wave VALUES
    (:valid_results, :aspect_ratio, :light_length, :equal_sphere_radius,
     :minor_radius, :major_radius, :minor_diameter, :major_diameter,
     :polar_angle, :c_ext, :c_sca, :c_abs, :equal_sphere_radius_int, :aspect_ratio_int)''', mydata)
conn.commit()
cur.close()
conn.close()
which allows cases of interest to be identified with SQL SELECT statements, greatly simplifying the search process.
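As an illustration, a query for the extinction cross sections of near-spherical valid cases might look like the following. To keep the snippet self-contained, it uses an in-memory database with the same schema and one synthetic row (the values are made up, not study results):

```python
import sqlite3

# in-memory database with the same schema, populated with one synthetic row
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('''CREATE TABLE wave
    (valid_results INTEGER, aspect_ratio REAL, light_length REAL, equal_sphere_radius REAL,
     minor_radius REAL, major_radius REAL, minor_diameter REAL, major_diameter REAL,
     polar_angle REAL, c_ext REAL, c_sca REAL, c_abs REAL,
     equal_sphere_radius_int INTEGER, aspect_ratio_int INTEGER)''')
cur.execute("INSERT INTO wave VALUES (1, 1.1, 0.665, 1.0, 0.9, 1.1, 1.8, 2.2, 63.4, 5.0, 4.0, 1.0, 10, 11)")

# select valid, nearly spherical cases, sorted by particle size
rows = cur.execute('''
    SELECT equal_sphere_radius, polar_angle, c_ext
    FROM wave
    WHERE valid_results = 1 AND aspect_ratio BETWEEN 0.9 AND 1.2
    ORDER BY equal_sphere_radius
''').fetchall()
conn.close()
```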