I work on a variety of projects in different areas at Stanford GSB. In fact, the wide diversity of what I get to do is a key reason I work in RSS at GSB.
- I am leading a team effort to rebuild “CloudForest”, our team’s tool for helping faculty provision and use computers in AWS EC2 for research purposes. To build this tool we have written (a) a Node.js server that manages users, groups, and requests for computers, and that receives and provides information about computer status, and (b) a React/Redux “dashboard” that allows users to interact with their account. Our server interacts with both AWS and MongoDB. This is a revision of a pre-built command-line tool that didn’t meet functional expectations and didn’t have the user experience we desired.
- I designed, wrote, and maintain a “sponsored access” form that lets non-GSB students, staff, or faculty request access to our on-premise computing servers. Anyone can fill out a request, but only the sponsoring faculty member can approve it, and all requests and actions are logged in MongoDB. Notifications of requests and approvals (with reminders) are handled via automated emails.
- For our online experiments (described below) I have composed dashboards to allow research assistants to design and launch experiments with minimal intervention from us. These dashboards include definition of “treatments” for different experimental conditions and post-run information about completion and reward rates.
- A web-based negotiation experiment called the “ABC” experiment (for Nir Halevy). Multiple participants in a lab take turns choosing among (a subset of) three options: A (add value), C (capture value), or B (both) to affect the number of “points” in a communal pot. Participants are rewarded for participation in proportion to their final point total. The purpose of the experiment is to study cooperation and competition in a variety of conditions; e.g., when the participants can use an embedded chat application during the experiment versus not, or when the final pot is divided equally at the end versus decided by a single participant. We have run this experiment with hundreds of participants at Stanford GSB. Because this is an ongoing experiment, I can’t host a demo online; contact me if you want to learn more.
- Several variations of an online shopping experiment to investigate if, and how, framing effects influence “add to cart” behaviors and final sale conversion (for Itamar Simonson). Online participants see an online-shopping-like experience and can choose whether to add items to a cart, from which they make a final selection at the end of the experiment. We vary treatments such as the text of the “add” or “reject” buttons, framings injected into the experiment instructions, the number of items participants can view, and more. We have run three phases of this type of experiment, each requiring a different front-end design, with a few thousand respondents through online platforms; another phase is in preparation. Because this is an ongoing experiment, I can’t host a demo online; contact me if you want to learn more.
- I have also written software to support dynamic question generation in online surveys (code available on GitHub). The client team needed to randomly sample questions from a “database” (Google sheet) conditional on the number of times a question had already been shown to respondents. To get around technical issues with the Google APIs, I wrote a multi-threaded server that could (a) load the dataset in each thread, (b) serve questions from the data randomly in any thread, and (c) coordinate sampling history across threads. In principle, this approach and code could be extended to any sampling strategy that depends on a “global state” (i.e., Markov sampling).
- I am also currently consulting on another, internal experiment regarding perceptions of “success” from online resumes. This (private) experiment will involve multiple phases of work and analysis, with related but distinct experimental front-ends; we may also be utilizing adaptive question generation methods.
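The dynamic question generation server described above coordinates sampling history across threads. A minimal Python sketch of that idea follows, assuming a single in-process counter protected by a lock. The class name `SharedQuestionSampler`, the helper `serve`, and the `1 / (1 + times shown)` weighting rule are hypothetical illustrations, not the actual server code or its sampling rule.

```python
import random
import threading
from collections import Counter


class SharedQuestionSampler:
    """Sample questions with weights that depend on shared display counts."""

    def __init__(self, questions, seed=None):
        self._questions = list(questions)
        self._shown = Counter()        # global state: times each question was shown
        self._lock = threading.Lock()  # guards the counter and the RNG across threads
        self._rng = random.Random(seed)

    def sample(self):
        """Pick one question and record that it was shown, as one atomic step."""
        with self._lock:
            # Assumed rule: rarely shown questions get proportionally more weight.
            weights = [1.0 / (1 + self._shown[q]) for q in self._questions]
            choice = self._rng.choices(self._questions, weights=weights, k=1)[0]
            self._shown[choice] += 1
            return choice

    def counts(self):
        """Return a snapshot of how often each question has been shown."""
        with self._lock:
            return dict(self._shown)


def serve(sampler, n_requests):
    """Stand-in for a worker thread answering n_requests survey requests."""
    for _ in range(n_requests):
        sampler.sample()


if __name__ == "__main__":
    sampler = SharedQuestionSampler(["q1", "q2", "q3", "q4"], seed=0)
    workers = [threading.Thread(target=serve, args=(sampler, 250)) for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(sampler.counts())
```

Because the weights are computed and the counter updated under the same lock, every thread sees a consistent history. Any other rule that depends on the shared counts could replace the weighting line.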
RSS also gets questions about how to run statistics on data. I’m regularly asked to provide feedback on the statistics consulting questions that come to our team. So far I’ve worked on one “major” statistics-related project:
- I developed a method for including person-specific effects (which I preferred to call “idiosyncratic deviations”) for analysis of pairwise comparison data from a wikisurvey (for Glenn Carroll). My draft writeup is available online here. The approach I took was to model discrete-choice-like pairwise comparison responses with a standard Logit model, including person-specific effects that are “shrunk” using a LASSO-like L1 penalty. The difference from methods like GLMNET is that this penalty does not apply to all model parameters. Thus, the approach attempts to find the fewest person-specific effects that rationalize the data. Using CVXPY and the ECOS solver, the estimation problems (with hundreds of respondents and thousands of observations) can be solved quickly enough (a minute or so) to use bootstrapping to compute confidence intervals, using only tens of lines of Python code.
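One way to write down an estimation problem of this kind is sketched below. The notation is assumed for illustration and is not taken from the writeup: observation n has respondent i_n comparing items j_n and k_n, s_n is +1 if j_n was chosen and -1 if k_n was chosen, θ_j is the shared value of item j, δ_{ij} is respondent i's idiosyncratic deviation for item j, σ(z) = 1/(1 + e^{-z}) is the logistic function, and λ > 0 is the penalty weight.

$$
\min_{\theta,\,\delta}\;\; -\sum_{n=1}^{N} \log \sigma\!\Big( s_n \big[ (\theta_{j_n} + \delta_{i_n j_n}) - (\theta_{k_n} + \delta_{i_n k_n}) \big] \Big) \;+\; \lambda \sum_{i}\sum_{j} \lvert \delta_{ij} \rvert
$$

The first term is the negative Logit log-likelihood, and the L1 penalty applies only to the deviations δ, not to the shared parameters θ. The problem is convex, so a larger λ drives more deviations exactly to zero without introducing local minima.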
High Performance Computing
My academic expertise in high-performance computing is very strong. I am an experienced C/C++/FORTRAN programmer who understands low-level tools including BLAS/LAPACK, the Intel IPP and MKL, compiler optimization, and MPI for parallelization. This has helped in the following projects:
- I have helped advise faculty on how to use MPI and Pthreads for complicated, but nearly embarrassingly parallel, computing tasks. This includes prototyping C/C++ code changes for them, providing examples (such as these examples of use with the GSL optimizers), and providing generic utilities like pthreader, a simple but powerful wrapper class for multithreaded evaluation in complicated simulations. I have also helped advise on whether GPUs will help faculty solve hard problems (though few are interested in engaging with OpenACC or CUDA programming), and helped secure GPU resources for exploratory computing.
- I redesigned, rewrote, compiled, and ran FORTRAN code for solving a stochastic dynamic program related to a corporate finance problem (for Ivan Marinovic). Ultimately, with my improvements and without any code-internal parallelization, I ran the desired set of simulations in 1 1/2 days instead of over 30 days. (The full-cycle project took only 22 days.)
- I consulted on scale-up for a generalized method-of-moments (GMM) estimation strategy for an econometric problem (for Rebecca Diamond). They were interested in investigating parallelization strategies for their MATLAB code so that they could include an order of magnitude more data. In less than a single day of consulting, I suggested they reorganize their computations in specific ways that ultimately made their desired scale feasible without any parallelization or environment changes.
- I have rewritten MATLAB code in C (using the Intel IPP and MKL) to improve runtime performance, and have otherwise advised faculty on parallelization and acceleration strategy for their econometric structural modeling work. I also regularly advise students and other research assistants on how to speed up their code or run it effectively in our infrastructure. This programming advice regularly takes expected runtimes from weeks down to days or even hours.
- Many researchers use Stanford infrastructure without thinking about the default settings for their environments, and what that means for compute efficiency in a multi-tenant environment. I have prepared examples regarding threading defaults, in particular, to help inform HPC users of the limits of the machines they use, and hopefully achieve their goals faster.
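As an illustration of the threading-defaults point above, here is a minimal Python sketch that caps library thread counts on a shared machine. Many numerical libraries size their thread pools to every core they can see unless variables such as `OMP_NUM_THREADS` are set. The function names and the 25% default share are assumptions chosen for this example, not a documented site policy.

```python
import os


def available_cpus():
    """CPUs this process may actually use (respects affinity masks on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def polite_thread_count(requested=None, share=0.25):
    """Cap threads at a fraction of the usable CPUs (the 25% share is an assumption)."""
    cap = max(1, int(available_cpus() * share))
    if requested is None:
        return cap
    return max(1, min(requested, cap))


def set_thread_env(n_threads, env=os.environ):
    """Set common threading variables; call before importing numerical libraries."""
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[name] = str(n_threads)
    return n_threads


if __name__ == "__main__":
    n = set_thread_env(polite_thread_count())
    print(f"usable CPUs: {available_cpus()}, library threads capped at: {n}")
```

These variables are typically read when a library initializes, so the call belongs at the very top of a script, before imports such as NumPy.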
I lead our efforts to monitor shared use of several hundred thousand dollars’ worth of on-premise computers.
- We recommended (and installed) Telegraf and the TICK stack to collect detailed metrics regarding system performance. (Stanford uses Ganglia, but that system makes data access, query, and analytics much more opaque.) The TICK stack also allows us to collect, visualize, and analyze traditional metrics across platforms; in particular, both in AWS and on premise. The ability to quickly query past data through a CLI, HTTP API, and a web dashboard is particularly useful.
- I also spearheaded user-centered data collection on our on-premise computers over a year ago. Traditional monitoring systems do not collect and aggregate user processes and commands in multi-tenant systems, which we require to make capital decisions and to enforce “soft quotas” in an interactive environment. The availability of this data has informed roughly $70k in new purchases within its first year of collection, and will soon be used to present users with a portrait of “their load” on the systems through a web-based dashboard.
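To make the idea of user-centered collection concrete, here is a small Python sketch that aggregates per-process figures into per-user totals. It assumes the input is text in the format printed by `ps -eo user,pcpu,rss,comm` on Linux. The function name and the sample rows are hypothetical, and this is not the collection pipeline described above.

```python
from collections import defaultdict


def aggregate_by_user(ps_output):
    """Sum %CPU, resident memory (KiB), and process counts per user."""
    totals = defaultdict(lambda: {"cpu_percent": 0.0, "rss_kib": 0, "processes": 0})
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)                  # the command may contain spaces
        if len(parts) < 4:
            continue                                 # ignore malformed rows
        user, pcpu, rss, _command = parts
        totals[user]["cpu_percent"] += float(pcpu)
        totals[user]["rss_kib"] += int(rss)
        totals[user]["processes"] += 1
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = """USER %CPU RSS COMMAND
alice 50.0 1000 python
alice 25.5 500 R
bob 10.0 200 matlab
"""
    for user, load in sorted(aggregate_by_user(sample).items()):
        print(user, load)
```

On a live system the same function could be fed the output of `subprocess.run(["ps", "-eo", "user,pcpu,rss,comm"], capture_output=True, text=True).stdout`, sampled on a schedule and stored alongside the other metrics.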