software engineering

nnbench - PyTest for Production ML Models

I’m a core contributor to nnbench, an open-source ML benchmarking framework with a pytest-like API. nnbench defines and parametrizes test cases, manages evaluation contexts, provides adapters to data artifacts, and streams results to various data sinks.

GitHub

lakeFS-spec - The lakeFS Filesystem in Python

lakeFS-spec is a Python interface for the lakeFS data lake. It makes remote lakeFS objects available in the local filesystem and in popular libraries (e.g. Pandas, Polars, DuckDB), with no custom code required. The library includes caching and transaction contexts for fail-safety. It was recognized in lakeFS Best of 2023. I’m a main contributor, presented it at PyData, and co-authored a post on the lakeFS blog. You can find links to both in the "outreach" section.

GitHub


academic

Pre-Study and Proposal for a PhD in Social Data Science at Oxford (offer)

I conducted a computational analysis of political conversations on Reddit to quantify the temporal dynamics of polarization, using both traditional and deep learning natural language processing methods. With the proposal and other application documents, I received an offer to join Oxford in fall of '22. I declined the offer because the funding was insufficient to cover the high overseas fees. I am nonetheless proud of the research I did for the proposal alone.

proposal.pdf

Physics Master's Thesis

My master's thesis is the result of a one-year graduate research project. I evaluated and extended an information-theoretic measure, transfer entropy, to quantify information flows ("causality") between time series. I used the method to measure the COVID-19 shock response of financial markets and presented the research in a talk at PSRC and a poster at the MECO46 conference.

master_thesis.pdf

Physics Bachelor's Thesis

I wrote my physics bachelor's thesis on a performance study of silicon microstrip detectors at CERN's COMPASS experiment. The study identified problems in two of four detectors. After handing in my thesis, I stayed involved with the project for a while, helping to compare and select replacement options.

CERN Theses directory

Social Sciences Bachelor's Thesis

For my social sciences bachelor's thesis, I studied the stigma consciousness of the unemployed in Germany using a quantitative multi-year panel study. My analysis found that stigma consciousness is highest among those who lost high-status jobs.



outreach

Machine Learning Skill Profiles - An organizational Blueprint for scaling Enterprise Machine Learning.

I am the main author of this whitepaper, the result of a qualitative study and a one-year working group with large corporations on how they organize their ML and MLOps efforts. We identified the different organizational structures and role descriptions that occur in companies successfully using ML.

skill-profiles.pdf

AI Tinkerers Talk: How small can Large Language Models be?

I presented a side project of mine to an audience of about 100 people at AI Tinkerers Munich. In the project, I explored how small a transformer model can be while still producing (somewhat) coherent output. I used the "TinyStories" paper as a starting point and found that, as long as the vocabulary is kept small, a 33 million parameter model can capture the idiosyncratic language of certain memes. You can find the models on my Hugging Face profile.

Hugging Face

PyData Talk on lakeFS-spec

I presented lakeFS-spec, a Python package I co-authored that exposes the lakeFS data lake to the local filesystem, in a talk at PyData Heilbronn.

Talk on YouTube

lakeFS-spec Blog Post

I co-authored a technical post on the benefits and inner workings of lakeFS-spec, published on the lakeFS blog.

Blog Post