For better or worse, I can’t make most of the software I write publicly available. But some I can, or at least I do and hope for forgiveness later. If I can share a repo below but access needs to be restricted (say, to Stanford employees only), I’ll note that after the name. Some things I can currently make available only to my own team, but I’ll work to change that. If you want specific examples, I can probably provide something on request.
Coordinator: A DAG-like asynchronous operation processor for node.js, for when you need a sequence of operations to either all succeed or fail together if any one fails. Easily add operational “stages”, with or without prerequisites, rollback rules, and result transformations. Allows repeated executions, as many times as needed. Published as daat-coordinator on npm, where “DAAT” stands for “Directed Acyclic Asynchronous Task”.
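The "all succeed or roll back" idea can be illustrated with a minimal Python asyncio sketch. This is not the daat-coordinator API: the class `StagedRun`, its methods `add_stage` and `execute`, and the disk/vm stages in `demo` are all hypothetical names chosen for illustration. Stages whose prerequisites are finished run concurrently, and if any stage fails, the stages that already completed are undone in reverse order.

```python
import asyncio


class StagedRun:
    """Hypothetical helper: run async stages in prerequisite order, undo on failure."""

    def __init__(self):
        self.stages = {}  # name -> (action, rollback, prerequisites)

    def add_stage(self, name, action, rollback=None, requires=()):
        self.stages[name] = (action, rollback, tuple(requires))

    async def execute(self):
        results, completed = {}, []
        pending = dict(self.stages)
        try:
            while pending:
                # Stages whose prerequisites have all finished can run together.
                ready = [name for name, (_, _, reqs) in pending.items()
                         if all(r in results for r in reqs)]
                if not ready:
                    raise RuntimeError("cycle or unknown prerequisite")
                outcomes = await asyncio.gather(
                    *(pending[name][0](results) for name in ready),
                    return_exceptions=True)
                failure = None
                for name, outcome in zip(ready, outcomes):
                    del pending[name]
                    if isinstance(outcome, Exception):
                        if failure is None:
                            failure = outcome
                    else:
                        results[name] = outcome
                        completed.append(name)
                if failure is not None:
                    raise failure
        except Exception:
            # Undo completed stages in reverse completion order, then re-raise.
            for name in reversed(completed):
                rollback = self.stages[name][1]
                if rollback is not None:
                    await rollback(results[name])
            raise
        return results


async def demo():
    """Two made-up stages: the second fails, so the first is rolled back."""
    log = []

    async def make_disk(results):
        return "disk-1"

    async def drop_disk(result):
        log.append(f"deleted {result}")

    async def make_vm(results):
        raise RuntimeError("no capacity")

    run = StagedRun()
    run.add_stage("disk", make_disk, rollback=drop_disk)
    run.add_stage("vm", make_vm, requires=("disk",))
    try:
        await run.execute()
    except RuntimeError as err:
        log.append(str(err))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each action receives the results of the stages that finished before it, and each rollback receives its own stage's result. Because `execute` keeps no state between calls, the same `StagedRun` can be executed repeatedly.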
Contact me if you want to chat about this.
Currently in use at Stanford GSB by at least 5 research groups to enable self-service, efficient, and secure cloud computing for research purposes. Our installation has managed tens of thousands of dollars worth of computing resources in 7-8 months of beta program use.
cfserver (Team only): a node.js server that interacts with mongoDB and AWS, providing the basic infrastructure for on-demand cloud computing.
cfmetrics (Team only): a node.js server that provides an interface to the TICK stack for instance monitoring.
cfdashbd (Team only): a react/redux app providing dashboard functionality for cloudforest including instance management (creation, starting/stopping, deletion), activity monitoring (CPU/memory usage), group management (viewing, adding, removing users), access to jupyter notebooks, and more.
cfsetup (Team only): a structured repository of bash scripts defining how to set instances up after they launch. Uses bitbucket pipelines and AWS CodeDeploy to consistently deploy updates to instances.
cfsite (Team only): repo for a jekyll site, with build and deploy pipelines, containing basic information about cloudforest, complete documentation, and articles about use of the platform.
idlogit (in progress): A python package for estimating (binary outcome) “Idiosyncratic Deviations Logit” models using ECOS. idLogit models are Logit models for heterogeneous observations with a non-parametric portrait of response heterogeneity and a convex maximum likelihood estimation problem. You can review the slides for an academic talk at Stanford’s ICME about this method here.
A more useful example comes from our monitoring systems. We have a Slack app built on top of a Lambda that allows us to access summary metrics data about our machines; for example,
/yen 3 publish
for the entire channel to see and
/yen 3 users publish
for the entire channel to see. Actually, this user-specific data comes from a Lambda pipeline too: code on the actual servers sends process data to S3, and a python Lambda executes a rollup operation on the data to get and store user statistics. I’ve also helped teams set up complicated serverless applications, including Machine Learning model evaluations in python (with both Lambda using Layers and Google Cloud Functions).
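The rollup step described above can be sketched as a pure Python function. The record layout (`user`, `cpu`, `mem` fields) and the names `rollup_by_user` and `top_users` are assumptions made for illustration, not the actual data format or code. Reading the samples from S3 and storing the statistics are left out.

```python
from collections import defaultdict


def rollup_by_user(samples):
    """Aggregate per-process samples into per-user totals.

    Each sample is assumed (hypothetically) to look like
    {"user": "alice", "cpu": 12.5, "mem": 1.2}.
    """
    totals = defaultdict(lambda: {"processes": 0, "cpu": 0.0, "mem": 0.0})
    for sample in samples:
        entry = totals[sample["user"]]
        entry["processes"] += 1
        entry["cpu"] += sample["cpu"]
        entry["mem"] += sample["mem"]
    return dict(totals)


def top_users(stats, n, key="cpu"):
    """Return the n user names with the largest total for the given key."""
    return sorted(stats, key=lambda user: stats[user][key], reverse=True)[:n]
```

A Lambda handler would wrap these two functions with the S3 read and write. Keeping the aggregation free of AWS calls makes it easy to test locally.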
Multilevel Optimization with Integrals (packaging for release): I have helped a GSB professor with code for solving a certain multi-level optimization problem (an optimization that depends on another optimization) whose objectives can also only be evaluated approximately, because they involve integrals. I wrote quite a bit of python code for exploring and solving this subtly difficult numerical task, which I’ll try to package for some kind of demo or release during summer 2019.
superserver: A bash script (basically) that can help load balance horizontally-scaled arbitrary servers (like from python, node.js, or go code). You specify the source, start/stop actions, and any install commands; superserver then sets up all the (restarting) systemd services needed to run an arbitrary number of instances of your server, load balanced by your Apache web server.
stanshib (Stanford only): A set of scripts and templates for making Stanford Shibboleth setup easy on new machines/addresses.
pthreader: A lightweight C++ class for executing arbitrary code in parallel using pthreads, requiring only a definition of (a) setup, (b) evaluation, and (c) cleanup for each thread. Set up and clean up once, but evaluate as many times as needed. Extends the condition variable mutual exclusion method from Divakar Viswanath’s book.
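The "set up once, evaluate many times, clean up once" pattern can be illustrated in Python with `threading.Condition`. This is a stand-in sketch, not the pthreader C++ class or the method from the book: the class `WorkerPool` and its methods `run_round` and `close` are hypothetical, and the sketch assumes the evaluation function does not raise.

```python
import threading


class WorkerPool:
    """Hypothetical pool: per-thread setup once, many evaluations, cleanup once."""

    def __init__(self, n, setup, evaluate, cleanup):
        self.n = n
        self.setup, self.evaluate, self.cleanup = setup, evaluate, cleanup
        self.cond = threading.Condition()
        self.generation = 0    # bumped once per run_round() call
        self.remaining = 0     # workers still busy in the current round
        self.stopping = False
        self.arg = None
        self.threads = [threading.Thread(target=self._run, args=(i,))
                        for i in range(n)]
        for thread in self.threads:
            thread.start()

    def _run(self, i):
        state = self.setup(i)  # runs once per thread
        seen = 0
        while True:
            with self.cond:
                # Sleep until a new round is posted or shutdown is requested.
                self.cond.wait_for(
                    lambda: self.stopping or self.generation != seen)
                if self.stopping:
                    break
                seen = self.generation
                arg = self.arg
            self.evaluate(i, state, arg)  # runs outside the lock
            with self.cond:
                self.remaining -= 1
                if self.remaining == 0:
                    self.cond.notify_all()  # wake the caller of run_round()
        self.cleanup(i, state)  # runs once per thread

    def run_round(self, arg):
        """Have every worker call evaluate(i, state, arg); block until all finish."""
        with self.cond:
            self.arg = arg
            self.remaining = self.n
            self.generation += 1
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.remaining == 0)

    def close(self):
        """Ask the workers to run cleanup and exit, then join them."""
        with self.cond:
            self.stopping = True
            self.cond.notify_all()
        for thread in self.threads:
            thread.join()
```

The generation counter lets a worker that is slow to start still notice a round posted before it began waiting, so no wakeup is lost.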
gslregressmpi: A simple example in C I wrote for a GSB professor interested in using MPI on our clusters to parallelize function/derivative calls when solving an optimization problem. This example uses OLS regression because it is trivial to implement in other ways, which makes the results easy to verify. Their real problem was much more complex, of course. This example also uses the GNU Scientific Library (GSL) because this was the faculty member’s preferred solver. The basic outline should work with any similar solver.
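The general idea of parallelizing function and derivative calls for a least-squares objective can be sketched in Python. This is not the gslregressmpi code: the function names are hypothetical, and the "workers" are simulated with a plain loop. With real MPI, each rank would hold one chunk of the data and the sums would be combined with a reduction such as `MPI_Allreduce`, before the solver uses the combined loss and gradient.

```python
def partial_loss_and_grad(rows, beta):
    """Sum of squared residuals and its gradient over one worker's rows.

    Each row is (x, y), where x is a tuple of regressors and y the response.
    """
    loss = 0.0
    grad = [0.0] * len(beta)
    for x, y in rows:
        resid = sum(b * xi for b, xi in zip(beta, x)) - y
        loss += resid * resid
        for j, xi in enumerate(x):
            grad[j] += 2.0 * resid * xi
    return loss, grad


def reduced_loss_and_grad(chunks, beta):
    """Add up the per-worker pieces; stands in for an MPI sum reduction."""
    total_loss = 0.0
    total_grad = [0.0] * len(beta)
    for rows in chunks:  # with MPI, each chunk would live on its own rank
        loss, grad = partial_loss_and_grad(rows, beta)
        total_loss += loss
        total_grad = [t + g for t, g in zip(total_grad, grad)]
    return total_loss, total_grad


def split(rows, workers):
    """Deal the rows out round-robin into one chunk per worker."""
    return [rows[i::workers] for i in range(workers)]
```

Because the objective is a sum over observations, the partitioned evaluation should match a single-process evaluation up to floating-point rounding. That is the property that makes OLS a convenient test case.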
Code Optimization: I do a decent amount of low-level code optimization when people need it, but most of these cases are specific to a particular researcher and project and are not shareable. In one particularly tangible case, I rewrote some FORTRAN code that was multithreaded using OpenMP as simpler, optimized serial code, which took a 35-day multi-task runtime down to 1 1/2 days. In another, I optimized and rewrote matlab scripts both in matlab and in C using the Intel IPP and MKL, improving speed by a factor of 5. This is actually a pretty small gain, and it is small because of the density of linear algebraic operations in the original matlab code (operations which matlab inherently does well already). Another case involved data extraction from ~50GB of detailed XML datafiles from a financial firm; using the Xerces library in C++, I was able to tackle the task in an afternoon, whereas trying the same task with python (for fun) crashed a 32-core, 256GB memory AWS EC2 instance.