PROFILE

Former mathematician and senior data architect / software engineer with over a decade of experience helping companies build production systems for complex data science, machine learning and research initiatives. Extensive experience with a wide range of big data technologies, including low-latency event-driven microservice architectures and graph analytics. Special focus on deploying distributed systems to support higher-level machine learning such as deep reinforcement learning. Experience in technology team leadership, entrepreneurship and business operations.

FEATURED SKILLS

Languages

Scala, Python, Java, Go, Ruby, C++, Elixir, JavaScript. Interests in Haskell and Rust.

Distributed Computing & Data Science

Scio, Beam, Cloud Dataflow, Flink, Apache Spark, Spark MLlib, Kubeflow, Dask, Ray, Airflow, GraphX, Reinforcement Learning, Distributed Graph Theory, Collective Intelligence Theory, Collaborative Filtering and Recommendation Systems, Hierarchical Clustering, Information Theory, Applied Persistent Homology, Hadoop, Presto, Athena, Glue, EMR

Streaming Technologies

Scio streaming, Spark Streaming, Druid, Flink, Akka, fs2, lambda architectures

Database Technologies

Hive, BigQuery, Dremio, Redshift, MySQL, PostgreSQL, Cassandra, Redis, MongoDB, OrientDB, InnoDB, Dgraph, Neo4j, Elasticsearch, many more...

DevOps/SRE

Datadog, Prometheus, Docker, Kubernetes, Mesos, AWS (EMR, S3, CloudFront, EC2, Elastic Load Balancer, RDS, Redshift, Route 53), Terraform, Vault, Fastly, Puppet, Chef, SSH, Varnish, IDCF, Jenkins, Airbrake, Honeybadger, Sentry (Raven), CircleCI, New Relic, Codeship

Other Backend Technologies

Functional programming in Scala: Cats Effect, circe, shapeless, frameless, doobie. Git, AWS, IAM permissions, JSON Schema and API schema validation, error monitoring, nginx, Heroku, autoscaling. Serialization technologies including Apache Avro, Parquet, Kryo and MessagePack. Authentication and authorization: 2FA, OAuth 2.0, SSO, Devise, OmniAuth, SAML.

EDUCATION

M. A. Mathematics

May 2011

Indiana University, Bloomington, IN                                                                           

Research Project

Conducted original research on the flocking behavior of biological and behavioral systems. Used network analysis to provide new measures of collective motion. Produced simulations and gave a talk on the results, earning an A+ in mathematical biology. The work resulted in the paper "Towards a more generalized theory of flocking behavior and its application to tumorigenesis". Also studied alternative models of option pricing and applied probability theory.

B. A. Mathematics, Phi Beta Kappa, Magna Cum Laude, GPA 3.8 / 4.0

May 2009

Wabash College, Crawfordsville, IN                                                                          

Minors:  Physics, Music

FEATURED EXPERIENCE

Senior Data Engineer                                                                  2021 - present

Spotify, Stockholm, Sweden

Spearheaded a number of projects for Spotify including:

Senior Big Data Architect                                                                          Jan 2020 - 2021

Samtec Smart Platform Group, New Albany, IN

Architected big data platforms for SPG's products from start to finish, including:

- building the open source federated data pipeline tool "mason", which provides curated, abstracted data pipeline components with optimized data engineering. Mason establishes a seamless process for configuring them to run on various big data back ends. It uses Scala, and Python with mypy and some functional type machinery, to ensure correctness.

- using Terraform and kops to provide push-button deployment of Kubernetes clusters, including load balancing and autoscaling groups, without depending on EKS or other services.

- deploying networks of distributed, adaptively scaling services on Kubernetes using Helm, and developing Helm charts towards that end. Services include Spark, Dask, Dask Gateway, JupyterHub, Presto and Livy.

Senior Data Engineer                                                                       Dec 2018 - Jan 2020

Formation.ai, San Francisco, CA

Formation.ai is a bleeding-edge, machine-learning-enhanced platform that allows companies to infuse their marketing campaigns and consumer loyalty programs with hyper-personalization, using contextual bandit theory to deliver customer-specific games that learn the optimal way to interact with each customer over time. Formation’s clients include several Fortune 500 companies.

Major Projects

Pilot Data Governance Platform

Led the data engineering initiatives for Formation’s new pilot program, which was built around the concepts of data lineage and data governance. Pioneered a number of new concepts, such as churn-based programming and client-specific undergeneralization, which cut implementation time from several months to less than 30 days. The new data governance platform also offers better transparency, maintainability and debugging capability, and leverages strategically placed services and enhanced product definitions. Worked with a team of 3-4 engineers, architected the system end to end and saw it through to completion. The work leveraged multiple distributed computing platforms, including Kubernetes, gRPC, Apache Spark, Apache Arrow and Dremio, as well as functional programming in Scala.

Senior Software Engineer                                                                Jan 2018 - Dec 2018

DemandJump, Indianapolis, IN

DemandJump is a cross-channel customer acquisition platform focused on delivering competitive insights and predictive intelligence for the online marketing space. I worked to translate data science into production features using Apache Spark, GraphX, Hadoop and Kubernetes. This included pioneering work on a generalized Hybrid Transactional/Analytical Processing (HTAP) graph data pipeline for cross-channel analytics, which allowed graph insights to be translated into distributed linear algebra computations for higher-level math.

Major Projects

Email Recommendations

Built a data pipeline that expanded large Google AdWords data into RDF format for import into Dgraph. Leveraging Dgraph’s fast graph traversal, I then generated email recommendations based on collaborative filtering and other higher-level concepts. This resulted in significantly higher conversion rates than competitors’ email recommendation systems. Deployed the system using Docker containers, Kubernetes and Spark cluster deployments to make it fault tolerant and horizontally scalable.

Search Clustering

Built a data pipeline that expanded large Google AdWords datasets into a graph format using Spark and GraphX, and built a novel distributed hierarchical K-means clustering algorithm to establish hierarchies representing different levels of commonality and categorization in search traffic. Leveraged several concepts from information theory, NLP and distributed computing. Also deployed the system using Docker containers, Kubernetes and Spark clusters.

Network Effect

Built a data pipeline that analyzed the relative network effect of certain domains on other domains within a competitive ecosystem, leveraging Spark and GraphX to model the network and compute different types of self-similarity, such as cosine similarity and Jaccard distance, using linear algebra optimizations and distributed computing concepts. Then built a means of visualizing those results using Elasticsearch. Deployed the system using Docker containers, Kubernetes and Spark clusters.

Senior Software Engineer                                                        November 2016 - December 2018

DeepCrawl, London, UK

DeepCrawl is an enterprise-scale web crawling technology. Like Google, it crawls your site and tells you what is wrong (such as the dreaded 404 pages), along with much more. I worked to consolidate their Ruby and Sinatra data infrastructure into a horizontally scalable infrastructure using Apache Spark Streaming, Mesos, Kafka and Cassandra. The work involved extensive use of functional programming in Scala, deconstructing complex Ruby code, and DevOps work with cluster orchestrators and containerization to build a fault-tolerant, cloud-native infrastructure.

Major Projects

Streaming Data Infrastructure

I assessed DeepCrawl’s existing data pipeline and determined that its processing bottlenecks could be solved by adopting a lambda-style architecture, using Spark Streaming and Kafka to process fast data and Spark batch with Cassandra to finish the processing and store the results in a denormalized fashion for rapid querying. I organized the existing processing into a workflow of synchronous and asynchronous data pipelines and built a fault-tolerant, horizontally scalable distributed processing framework around it. This involved using Mesos as the cluster orchestrator for Spark through DC/OS from Mesosphere (now D2iQ).

Senior Software Engineer

July 2015 - November 2016

Treasure Data, Mountain View, California 

Treasure Data is the world’s premier hosted big data analysis software-as-a-service platform. Based in Tokyo, Japan and Mountain View, California, Treasure Data is a small but global company making waves in the world of big data by embracing compelling open source technologies and close community partnerships and integrations. I worked on interfacing Treasure Data’s Java backend with their Rails middleware using Fluentd.

Lead Software Engineer

June 2013 - June 2015

Localstake, Indianapolis, IN 

Localstake is an online investment platform focused on connecting a network of local companies with investors who are passionate about local business. With the help of a small engineering team which I managed, the Localstake software platform has raised over 4.6 million dollars for businesses around the United States. I oversaw and implemented the technology platform, from the architecture and design stages all the way to the front-end user experience, and even managed the marketing and conversion rates of users. The Localstake platform has been a success largely due to my dedication to building generalized software that can satisfy fluid business requirements, following a lean development model, and utilizing a solid test-driven development process.

 

Major Projects

Compliance Engine

The compliance engine is a highly generalized, customizable configuration layer for the Localstake platform. It allows administrative users to specify a legal framework for an offering, and that framework then determines how the offering is displayed and how it interacts with investors on the platform so as to comply with securities laws and regulations.

Messaging System

Localstake has a custom-built one-to-one messaging system which integrates with SendGrid's Inbound Parse API so that replies to emails create message objects in the Localstake platform. This allows Localstake to perform analytics and automated actions based on the messages sent between potential investors and businesses.

Payment Processing Systems and Transactions

In the highly tumultuous, cutting-edge landscape of payment processors that support crowdfunding laws, Localstake needed a payment processing system and a way to manage transactions on the platform that did not depend heavily on the particular API being used for those transactions. In response, I built a transactional system which is agnostic to the processor in use and allows payment processors to be replaced easily if needed.

Automated Email Feedback System

While some third-party vendors offer similar services, Localstake needed a custom email engagement system to track the invitation of potential investors to a particular offering. This system, which utilized the SendGrid response API, allowed Localstake administrators to build custom email templates which would then move investors through the stages of the conversion process automatically via configuration.

Software Developer/Data Analyst

Fall 2011 - June 2013

iGoDigital, Castleton, Indiana

 

 

HONORS AND INVOLVEMENT

Recipient of the distinguished Mackintosh Fellowship for further education and outstanding academic achievement - '09

J. Crawford Polley Mathematical Writing Prize - '08

Phi Beta Kappa Prize for original and creative achievement in Junior piano recital - '08

President: Wamidan World Music Ensemble. Conducted field research in Uganda during the summer of 2008 to provide better performance instruction for future members and to study the mathematical/linguistic structure of folk music.

Secretary: Society of Physics Students. Gave presentations on quantum computing and presented the poster "The Circular Motion of an Electron Beam in Real Helmholtz Coils" at the Wabash College Celebration of Student Research.

Member: Verge Indianapolis, a group dedicated to Indianapolis technology startup companies.

Member: Tau Kappa Epsilon fraternity, served on documents committee.

Eagle Scout:  Boy Scouts of America