For better or worse, I can’t make most of the software I write publicly available. If I can share a repo listed below but access needs to be restricted (say, to Stanford employees only), I’ll note that after the name. Some things I can only make available to my current team for now, but I’ll work to change that. If you want specific examples, I can probably provide something on request.
Coordinator: A DAG-like asynchronous operation processor for node.js, for when you need a sequence of operations to either all succeed or fail together if any one fails. Easily add operational “stages”, with or without prerequisites, rollback rules, and result transformations. Allows repeated executions, as many times as needed, even with new “data” (but not a new execution graph). Published as daat-coordinator on npm, where “DAAT” stands for “Directed Acyclic Asynchronous Task”. The current code is callback-based; I’ll make a promise-based version soon, I promise.
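To make the stage/prerequisite/rollback idea concrete, here is a minimal sketch in Python using asyncio. It is not the daat-coordinator API (that package is callback-based node.js code); the names `Stage` and `run_stages`, and the choice to roll back completed stages in reverse order, are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class Stage:
    # Hypothetical stage record; not the daat-coordinator data model.
    name: str
    run: Callable[[Dict[str, Any]], Awaitable[Any]]
    prereqs: List[str] = field(default_factory=list)
    rollback: Optional[Callable[[Dict[str, Any]], Awaitable[None]]] = None


async def run_stages(stages: List[Stage], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run stages in dependency order; undo completed stages if any stage fails."""
    by_name = {s.name: s for s in stages}
    results: Dict[str, Any] = {}
    done: List[Stage] = []      # completed stages, in completion order
    pending = set(by_name)
    try:
        while pending:
            # Every stage whose prerequisites have completed can run concurrently.
            ready = [by_name[n] for n in sorted(pending)
                     if all(p in results for p in by_name[n].prereqs)]
            if not ready:
                raise ValueError("cycle or unknown prerequisite in stage graph")
            outcomes = await asyncio.gather(
                *(s.run({**data, **results}) for s in ready),
                return_exceptions=True,
            )
            failure = None
            for stage, outcome in zip(ready, outcomes):
                if isinstance(outcome, Exception):
                    if failure is None:
                        failure = outcome
                else:
                    results[stage.name] = outcome
                    done.append(stage)
                    pending.discard(stage.name)
            if failure is not None:
                raise failure
    except Exception:
        # Roll back in reverse completion order, then re-raise the original failure.
        for stage in reversed(done):
            if stage.rollback is not None:
                await stage.rollback({**data, **results})
        raise
    return results
```

Because the stage list is separate from the `data` argument, the same graph can be executed repeatedly with new data, which mirrors the "repeated executions" property described above.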
Contact me if you want to chat about this.
The cloudforest platform is currently in use at Stanford GSB to enable self-service, efficient, and secure cloud computing for research purposes. Our installation has managed about $70k worth of computing resources in a year of platform use.
cfserver (Team only): a node.js server that interacts with mongoDB and AWS, providing the basic infrastructure for on-demand cloud computing.
cfmetrics (Team only): a node.js server that provides an interface to the TICK stack for instance monitoring.
cfdashbd (Team only): a react/redux app providing dashboard functionality for cloudforest, including instance management (creation, starting/stopping, deletion), activity monitoring (CPU/memory usage), group management (viewing, adding, removing users), access to jupyter notebooks, and more.
cfsetup (Team only): a structured repository of bash scripts defining how to set instances up after they launch. Uses bitbucket pipelines and AWS CodeDeploy to consistently deploy updates to instances.
cfsite (Team only): repo for a jekyll site, with build and deploy pipelines, containing basic information about cloudforest, complete documentation, and articles about use of the platform.
Oh boy, there is a lot of python. But most of it is specific to certain research projects or other endeavors, and probably not worth sharing outside of specific circumstances. See, for example, my code used for assisting with Glenn Carroll’s “Authentic Distilleries” studies (an implementation of the ideas discussed next).
idlogit (in progress): A python package for estimating (binary outcome) “Idiosyncratic Deviations Logit” models using ECOS. idLogit models are Logit models for heterogeneous observations with a non-parametric portrait of response heterogeneity and a convex maximum likelihood estimation problem. You can review the slides for an academic talk at Stanford’s ICME about this method here.
comparing discrete choice models: A related application I can release is a simple(-ish) notebook-python example for, well, comparing discrete choice models. The notebook probably speaks for itself, but the main idea is to discuss alternative “Logit-like” discrete choice models.
blendnpik: All right, this one is tiny. But it’s still an example. A while ago I got pretty interested in learning more about randomized methods for solving linear systems. I’ve let this thread of interest die off, and should probably pick it up again. In any case, I wrote a simple, single-file implementation of the blendnpik method in python for learning purposes.
Serverless Codes: I’ve written a number of serverless functions using python. Some are for fun; for example, we have a few Slack apps that call python Lambdas to transform or manipulate text, such as converting a string to binary.
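A string-to-binary Lambda of this kind could look roughly like the sketch below. It assumes the function sits behind an API Gateway proxy integration and receives Slack's form-encoded slash-command payload in `event["body"]`; the `handler` and `to_binary` names are hypothetical, and request verification is left out.

```python
import json
from urllib.parse import parse_qs


def to_binary(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def handler(event, context):
    # Assumption: API Gateway proxy event with Slack's form-encoded body.
    params = parse_qs(event.get("body") or "")
    text = params.get("text", [""])[0]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # "in_channel" asks Slack to show the reply to the whole channel.
        "body": json.dumps({"response_type": "in_channel", "text": to_binary(text)}),
    }
```

For example, `to_binary("Hi")` returns `"01001000 01101001"`.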
A more useful example comes from our monitoring systems. We have a Slack app built on top of a Lambda that gives us access to summary metrics data about our machines. For example,
/yen 3 publish
posts summary metrics for the entire channel to see, and
/yen 3 users publish
posts user-specific data for the entire channel to see. This user-specific data comes from a Lambda pipeline too: code on the actual servers sends process data to S3, and a python Lambda executes a rollup operation on the data to get and store user statistics.
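The rollup step can be sketched as a plain aggregation function. The record fields below (`user`, `cpu_percent`, `mem_mb`) are assumptions about what the process data might contain, not the actual schema, and reading from and writing to S3 is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def rollup_by_user(records: Iterable[Mapping]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-process samples into per-user totals.

    Assumed (hypothetical) record shape:
    {"user": str, "cpu_percent": float, "mem_mb": float}
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"processes": 0, "cpu_percent": 0.0, "mem_mb": 0.0}
    )
    for rec in records:
        stats = totals[rec["user"]]
        stats["processes"] += 1
        stats["cpu_percent"] += rec["cpu_percent"]
        stats["mem_mb"] += rec["mem_mb"]
    return dict(totals)
```

In a Lambda, a function like this would be called from the handler on the records parsed from each uploaded object, with the result stored wherever the Slack app reads it.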
I’ve also helped teams set up complicated serverless applications, including Machine Learning model evaluations in python (with both Lambda using Layers and Google Cloud Functions), code for which I can discuss but can’t share.
Multilevel Optimization with Integrals (packaging for release): I have helped a GSB professor with code for solving a certain multi-level optimization problem — an optimization whose objective/cost function depends on another optimization — with approximate evaluations (of integrals). I wrote quite a bit of python code for exploration and solution of this subtly difficult numerical task, which I’ll try to package for some kind of demo or release during 2019.
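The structure of such a problem can be shown with a toy example; this is not the professor's problem, which is not described here. In this made-up case the outer objective is (x - 1)^2 plus the optimal value of an inner problem, min over y of the integral of (y - x t)^2 for t in [0, 1], with the integral approximated by the trapezoid rule. Analytically the inner solution is y = x/2 with value x^2/12, so the outer minimizer is x = 12/13. All function names and the choice of golden-section search are assumptions for the sketch.

```python
import math


def trapezoid(g, a, b, n=200):
    """Approximate the integral of g over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h


def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi]; return (argmin, min value)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = (a + b) / 2
    return x, f(x)


def inner_value(x):
    """Inner problem: min over y of the (approximate) integral of (y - x*t)^2."""
    cost = lambda y: trapezoid(lambda t: (y - x * t) ** 2, 0.0, 1.0)
    _, value = golden_section(cost, -5.0, 5.0)
    return value


def outer_objective(x):
    # Every outer evaluation requires solving the inner optimization.
    return (x - 1.0) ** 2 + inner_value(x)


def solve_outer():
    return golden_section(outer_objective, -2.0, 2.0)
```

Even in this toy case the outer objective is only known up to the quadrature and inner-solver tolerances, which is one source of the numerical subtlety in real problems of this type.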
Like python, there’s a lot of (mostly) case-specific bash programming in my life. I’ve written a variety of simple “wrapper” utilities to make certain low-level functionality more accessible to users of GSB research computing systems, such as running (embarrassingly parallel) jobs in parallel or watching (otherwise unconstrained) resource consumption on our interactive machines.
superserver: A bash script (basically) that can help load balance horizontally-scaled arbitrary servers (like from python, node.js, or go code). You specify the source, start/stop actions, and any install commands; superserver sets up all the (restarting) systemd services for you to run an arbitrary number of instances of your server, load-balanced by your Apache web server.
stanshib (Stanford only): A set of scripts and templates for making Stanford Shibboleth setup easy on new machines/addresses.
pthreader: A lightweight C++ class for executing arbitrary code in parallel using pthreads, requiring only a definition of (a) setup, (b) evaluation, and (c) cleanup for each thread. Set up and clean up once, but evaluate as many times as needed. Extends the condition variable mutual exclusion method from Divakar Viswanath’s book.
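The setup / evaluate / cleanup pattern with condition-variable signalling can be sketched in Python as follows. This is an illustration of the pattern only, not a port of pthreader: the `PersistentWorkers` class and its method names are hypothetical, error handling inside workers is omitted, and CPython threads will not speed up CPU-bound work the way pthreads do in C++.

```python
import threading


class PersistentWorkers:
    """Persistent threads: set up once, evaluate many times, clean up once."""

    def __init__(self, n, setup, evaluate, cleanup):
        self._n = n
        self._setup, self._evaluate, self._cleanup = setup, evaluate, cleanup
        self._cond = threading.Condition()
        self._generation = 0    # bumped once per evaluate() call
        self._remaining = 0     # workers still busy in the current generation
        self._arg = None
        self._stop = False
        self._results = [None] * n
        self._threads = [threading.Thread(target=self._worker, args=(i,))
                         for i in range(n)]
        for t in self._threads:
            t.start()

    def _worker(self, i):
        state = self._setup(i)              # runs once per thread
        seen = 0
        while True:
            with self._cond:
                # Sleep until there is new work or a shutdown request.
                self._cond.wait_for(
                    lambda: self._stop or self._generation != seen)
                if self._stop:
                    break
                seen = self._generation
                arg = self._arg
            result = self._evaluate(i, state, arg)
            with self._cond:
                self._results[i] = result
                self._remaining -= 1
                if self._remaining == 0:
                    self._cond.notify_all()  # wake the caller of evaluate()
        self._cleanup(i, state)              # runs once per thread

    def evaluate(self, arg):
        """Hand `arg` to every worker and block until all have finished."""
        with self._cond:
            self._arg = arg
            self._remaining = self._n
            self._generation += 1
            self._cond.notify_all()
            self._cond.wait_for(lambda: self._remaining == 0)
            return list(self._results)

    def close(self):
        with self._cond:
            self._stop = True
            self._cond.notify_all()
        for t in self._threads:
            t.join()
```

The generation counter lets a worker tell a fresh request apart from one it has already handled, so a wake-up that arrives before the worker starts waiting is not lost.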
gslregressmpi: A simple example in C I wrote for a GSB professor interested in using MPI on our clusters to parallelize function/derivative calls when solving an optimization problem. This example used OLS regression because it is trivial to implement in other ways, which makes the results easy to verify. Their real problem was much more complex, of course. This example also uses the GNU Scientific Library because this was the faculty member’s preferred solver. The basic outline should work with any similar solver.
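The basic outline (split the data, evaluate the objective and its derivative on each piece, sum the pieces, hand the totals to the solver) can be sketched in Python. Threads stand in for MPI ranks here and a plain sum stands in for the MPI reduction; the function names are hypothetical and this is not the gslregressmpi code.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_loss_grad(rows, beta):
    """Sum of squared residuals and its gradient over one chunk of rows.

    Each row is (features, response); the model is the dot product beta . features.
    """
    loss = 0.0
    grad = [0.0] * len(beta)
    for x, y in rows:
        resid = sum(b * xi for b, xi in zip(beta, x)) - y
        loss += resid * resid
        for j, xi in enumerate(x):
            grad[j] += 2.0 * resid * xi
    return loss, grad


def loss_grad(rows, beta, workers=4):
    """Evaluate chunks concurrently, then sum the pieces (the 'reduce' step)."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: partial_loss_grad(chunk, beta), chunks))
    loss = sum(p[0] for p in parts)
    grad = [sum(p[1][j] for p in parts) for j in range(len(beta))]
    return loss, grad
```

Since the OLS objective is a sum over observations, the chunked result should match a serial evaluation up to floating-point summation order, which is what makes this a convenient test case.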
Price Equilibria: All the code for the work described in this paper was done in C. I should dig it up and post it here.
Code Optimization: I do a decent amount of low-level code optimization when people need it, but most of these cases are specific to a particular researcher and project and not shareable. In one particularly tangible case, I rewrote some FORTRAN code that was multithreaded with OpenMP as simpler, optimized serial code, which took a 35-day multi-task runtime down to 1 1/2 days. In another, I optimized and rewrote MATLAB scripts both in MATLAB and in C using the Intel IPP and MKL, improving speed by a factor of 5. This is actually a pretty small gain, and it is small because the original MATLAB code was dense with linear algebraic operations (operations which MATLAB inherently does well already). Another case involved data extraction from ~50GB of detailed XML data files from a financial firm; using the Xerces library in C++, I was able to tackle the task in an afternoon, whereas (naively) trying the same task with python (for fun) crashed a 32-core, 256GB-memory AWS EC2 instance.
I used to do quite a bit of MATLAB programming. Sometimes I still help people using MATLAB optimize their code, but mostly I don’t use it anymore. Here are some examples, though.
Smart Parallelism: I used MATLAB for this simple case study of how to (not) run things in parallel on our shared research computing servers (or your own, for that matter). We have a persistent issue of users parallelizing work that is already implicitly parallelized by their software, and thus creating counterproductive amounts of CPU contention. MATLAB was a good environment for this sort of case study because (1) it’s a CPU hog by default, (2) changing how many cores it tries to use is easy, (3) “expensive” operations like solving linear systems can be trivially written in the syntax, and (4) I don’t know other, more GSB-popular platforms with the same problem (e.g., Stata).
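The arithmetic behind the contention problem is simple enough to write down. The sketch below assumes each worker process runs software that spawns a fixed number of threads; the function names are hypothetical.

```python
import os


def oversubscription(workers, threads_per_worker, cores=None):
    """Ratio of runnable threads to cores; values above 1 mean CPU contention."""
    cores = cores or os.cpu_count() or 1
    return workers * threads_per_worker / cores


def threads_per_worker_for(workers, cores):
    """Largest per-worker thread count that keeps total threads within `cores`."""
    return max(1, cores // workers)
```

For instance, 8 parallel workers on a 16-core machine, each running software that uses all 16 cores by default, gives `oversubscription(8, 16, cores=16) == 8.0`, while capping each worker at `threads_per_worker_for(8, 16) == 2` threads brings the ratio back to 1.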
Price Equilibria: All the code for the work described in this paper was done in MATLAB. I should dig it up and post it here.