Tyler Burns's Website

For Tyler's company website, go here.
For the legacy version of the company website, go here.

About me
New
Popular
Single-cell analysis
Natural language processing
Philosophy and rationality
Computing
Statistics
Health
Snapshots
Collections
Where I was featured
Fun stuff
Other contributions
Links and contact info

About me

I am an American CEO and computational biologist who lives in Berlin, Germany. I am best known for my 12 years of work in the single-cell field, and more recently, my work in biosecurity. I am informally known for my passion for fitness, which gives me the balance and energy to do the rest of the work.

I run a company called Burns Life Sciences Consulting, GmbH, which allows me to impact many more individuals and organizations than I otherwise could as an employee. The problem is most of my work is behind NDAs so I can't talk about it. There is another large amount of my work that is unpublished, simply because running a company has a lot of demands that don't give me the time.

This website is a place where I can openly share my ideas and non-NDA work, regardless of how "finished" it is. I have uploaded written work, code, posters and powerpoint slides of various projects I have driven or contributed to. I intend for a lot of my work here to be "long content" slowly evolving over years if not decades. What matters most is that my thoughts and the work I do that I find meaningful is uploaded here, so when I die (hopefully way down the line) others can easily build off of any worthwhile groundwork I have laid out.

If you have quesitons or comments, or you're interested in working or collaborating with me, just send me a message (see contact info below). I'm always happy to help.

New

We choose to think about stuff

Article.

A letter to my future self, proclaiming that despite advances in AI and a societal push to value agency over intelligence, such discussions are devoid of the topic of wisdom. That operating in the absence of AI trains the "relevance filter" that in turn culminates in wisdom. Thus, an interdependence with AI rather than a dependence on AI is what's called for. That agency requires wisdom. And no matter how good and fast the machines get, I will still pursue wisdom, and thus I will still think about stuff.

10,000 random research questions generated by a LLM

This project was inspired by the LLM-generated random numbers work, which showed that LLM's have "favorite" numbers. Here, we generate research questions and determine that at least within the field of the microbiome, Gemini 2.5 Flash Lite has favorite questions.

Jupyter Notebook as html

This is the notebook that gives you all the code necessary to make these question maps.

Example output: microbiome

This is a html file that is a map of microbiome questions, where questions that are similar to each other in context are grouped near to each other on the map.

LLM-augmented PCA loadings

This project was driven by my intern Arianna Aalami (think of her as the first author and me as the last author), a Stanford bioengineering graduate, with both wet-lab and dry-lab capabilities, interested in applying these to study the microbiome.

R Markdown

Here is the code, comments, and explanations needed to generated the reports below. In order to repeat this yourself, you need an OpenAI API key. We note that the cost to run all of these experiments in the markdown, using o4-mini, is around 30 cents.

PC loading by interpretation report

Here is the primary output from the aforementioned R Markdown, whereby we have the left column that shows a biaxial plot of PC1 and PCN where N is any number between 2 and 10. This already is something that is not often viewed in single-cell sequencing workflows. The right column shows the LLM-generated biological interpretation. The first few PCs are sanity checks in this regard, and the later PCs, while not yet rigorously checked, are thought provoking.

LLM consistency experiment

Here, we aimed to check whether the output of our LLM-based biological interpretation is consistent. We tried PC1 (which is straightforward), and PC20 (which we expect to be a bit messier). We note that with the o4-mini model the results were indeed relatively consistent between the multiple runs, with the 4o-mini model (does not do reasoning), the results for PC20 were quite inconsistent (data not shown). The results suggest a degree of consistency when using reasoning models, but we would still recommend doing these runs more than once, just in case.

Levels of analysis in bits and atoms

Article.

In the world of atoms, we have distinct fields of science that follow specific levels of analysis. Math leads to physics, then chemistry, biology, psychology, and sociology. But right now computer science is one single thing. But perhaps down the line, computer science will divide into distinct fields depending on levels of analysis too. From information theory, upwards to programming languages, algorithms, AI/ML, agentic AI psychology, and bot net sociology. This has implications for what it means when you tell a biology they should "learn how to code," with further implications in recruiting, problem solving, and education.

Cluster membership purity score for single cells

Markdown.

When clustering single-cell or spatial data, there is a built-in assumption that the clusters are final. That a CD4 T cell at a CD4/CD8 boundary is nonetheless a CD4 T cell. This may not be the case every time. Here, I developed a simple way to identify and interrogate these boundary regions. I do this by taking each cell's k-nearest neighbors (KNN) and calculating the Shannon entropy of each neighborhood. Zero entropy means you're in the thicket of a cluster. Higher entropy means you are at a boundary region.

At the per-cell level, you can interrogate these cells or factor this score into downstream analyses (e.g. ML algorithms). At the per-cluster level, you can calcualte a purity score based on the mean KNN entropy for each cell in the given cluster. This has applications like determining whether clusters should be merged.

Interview with Ramji Srinivasan, CEO of Teiko

Video.

This interview took place at AACR 2025 in Chicago. The interview was spur of the moment. We discussed a variety of topics related to flow/mass cytometry and single-cell analysis.

To give credit where credit is due, I talk about the MEM paper (Diggins et al, 2018). In the interview, I say that this paper algorithmically gives you a way to determine a gating strategy given a cluster. I'll correct that here and say that the MEM paper is able to convert clusters into descriptions of what markers are positive and negative. Later work, like hypergate (Becht et al, 2019), and gatefinder (Aghaeepour et al, 2018) do the algorithmic gating strategy maker step.

Cluster stability as an evaluation metric for single-cell workflows

Markdown.

Clustering is a critical piece of any single-cell analysis workflow, be it suspension or spatial, transcript or protein. But after clusters have been computed, how sure is any given user that the clusters most accurrately represent cell subset partitions? Are there any element of arbitrariness of a given clustering scheme? One way to get at that question is to run your clustering algorithm multiple times across multiple random seeds, and look at how similar any one scheme is to the others by stringing the cluster visualizations (in this case, the cluster centroids) together into an animation. Here, we do this with the PBMC 3k dataset and standard Louvain clustering. We show that some subsets are less stable than others, and stability will differ based on what resolution you choose for your clustering scheme.

How I made a command line chatbot

Article.

The typical way LLMs are called is through some sort of user interface. Here, I show you how to use the OpenRouter API to call various LLMs (eg. Claude, Deepseek R1, GPT-4o) by simply typing "chatbot" in the command line, followed by the desired model, followed by the prompt. It can be called from anywhere on your computer. I then show how I use this in a literate programming environment, where I often like to get and store real time feedback on whatever I'm writing about. This includes a section for Emacs users for how I got this working in Org-Mode.

Automated LLM-based annotation of single-cell sequencing data, directly in a R workflow

Markdown.

A typical single-cell sequencing pipeline involves clustering the data, and then using prior biological knowledge to annotate the clusters given unique marker expression. Here, I develop a workflow that calls a LLM directly in R, with a pre-specified prompt, to annotate the PBMC 3k data, storing the results as an R object, directly in a Seurat workflow, in R. I then run it multiple times to check for stabiity of the annotations. I find that it annotates the PBMC 3k dataset accurrately, with subtle errors (eg. not going deep enough in the naming of a subset) likely able to be overcome by prompt engineering and/or model selection.

A LLM's descent into madness

Article.

At the time of writing (February 2025) I am frustrated by the best of the LLMs giving me results that are still in my opinion superficial. So I wrote a script that would prompt a LLM to give me an answer on something, and then with each iteration in a loop, take a piece of the answer and go deeper into it. I thought it would either converge upon something fundamental, or continually drift into different topics. I was wrong on both. The model eventually descended into emoji-laden verse, and then word salad, and then a sort of letter salad, where it started making up words, disregarding grammar, and so forth. Something like E. E. Cummings. I encourage the reader to try it on their machine.

From gene lists to interactive contextual maps: enhancing g:Profiler interpretation

Markdown.

In my client projects, I often have to make sense of long lists of differentially expressed genes (DEGs). This typically leads to a long list of biologically relevant terms (eg. GO terms, pathways). This too is often overwhelming. Here, I make these terms easier to understand by producing a interactive contextual map of them, where terms that are similar to each other in context are physically near each other on the map. In this R Markdown, I show you both how its done, using python's sentence-transformers package directly in R. Furthermore, I cluster the map and show you how to automatically annotate the clusters by feeding the per-cluster terms into a LLM directly in R, and asking it to return an underlying theme for each cluster given the terms.

Comparing Color Palettes for scRNA-seq Data Visualization: The Case for Viridis as a Default

Markdown.

When using "color" as a dimension (for example, coloring UMAPs by marker expression), flow cytometry users often default to the "jet" color palette, and Seurat users often default to their default light-gray to blue color palette. Here, I present the case for the use of viridis as a default color palette. I show that viridis is more balanced and colorblind friendly than jet, with more resolution than the Seurat default. Using this color scheme will be beneficial for both data interpretation and inclusivity in our field.

Running UMAP on scRNA seq data without PCA reveals importance of cosine distnace in high dimensional data

Markdown.

What happens when you omit PCA from a Seurat scRNA-seq pipeline? Here, Using the flagship pre-annotated PBMC 3k dataset, I tested this by inputting the top 2000 variable genes directly into UMAP, bypassing PCA. The results showed less spatially resolved subsets, particularly among CD4 T cells. Furthermore, I found that in the no-PCA condition, performance (measured by how well cell subsets could be resolved) was substantially worse when using Euclidean distance rather than cosine distance. These differences went away when the top 10 PCs were used as input (which again is standard best practices). These results suggest that in this context, the choice of distance metric matters more in higher dimensions. Thus, I recommend defaulting to cosine distance or testing both metrics side by side when using UMAP or similar tools on high-dimensional datasets.

Die with zero ideas

Article.

At any given point, there are ideas in my head or rough pieces of text and code on my computer that could prove useful for someone in the world. To this end, I created this website and I began posting about the stuff I put on here. This has helped my life and career immensely. My general stance these days is I want all my useful stuff to be in the public domain by the time I die. This article details both the "why" and the "how" of this effort.

Nearest neighbor similarity between PBMC 3k dataset and its foundation model embedding

Markdown.

Universal Cell Embeddings (UCE) is a foundation model that takes a single-cell dataset as input and outputs a 1280 dimensional embedding that is relevant in the context of a large number of single-cell datasets that it was trained on. This model can in turn do things like label transfer between datasets. Here, I looked at the embedding within the PBMC 3k dataset and simply asked how many k-nearest neighbors per cell are shared between the original dataset and its UCE embedding. Specifically, I used the first 50 PCs of the PBMC 3k dataset's top 2000 most variable genes (standard practice), and the first 50 PCs of the UCE model. The results suggest that nearest neighbors are preserved at the labeled cell subset level but the resolution does not go beyond this. Note: I also compared the first 50 PCs of the PBMC 3k dataset with the entire 1280 dimensions of the UCE output, and the results were slightly worse but similar.

Annotate the PBMC 3k dataset

Markdown.

This markdown is for people with PBMC datasets, who want a convenient way to annoate them using the markers specified in Seurat's guided clustering tutorial. Here, we take the aforementioned tutorial and run the PBMC 3k dataset through it. Then, we have a block of code that loops through each of the populations identified in the tutorial and looks for the relevant markers in the clustering scheme from our run. This allows the user to quickly and conveniently figure out what cluster belongs to what population.

Flow cytometry color scheme for Seurat's FeaturePlot

Markdown.

Seurat's FeaturePlot has a built in color scheme that goes from grey (zero expression) through violet, to a deep blue for the highest expression. People like me who started with CyTOF in Cytobank or similar tools are used to a color scheme that starts with blue, then goes through cyan, green, yellow, orange, then red. Here, I re-make that color palette for use in Seurat's FeaturePlot. Then, I further show how to make the FeaturePlot output with the altered color palette independent of Seurat.

The R Rabbit Hole

Article.

What happens under the hood when you do something simple in R, like add two numbers together? Here, I start with just that. I then show that arithmetic in R is actually parsed as S-expressions. I then go into LISP, where S-expressions are a hallmark. I then move into the source code of R, where the S-expressions for arithmetic are written, in C. And from there, I go into Assembly, the human-readable version of machine code that C is compiled into. By breaking down these layers, and going down the rabbit hole, this exercise gets us to first principles. These are foundational concepts from which we can reason and solve problems more effectively.

Episodic memory is the new semantic memory

Article.

First, we valued having information. After the rise of the internet and search engines, we valued synthesizing information. After the rise of AI, I think our value as humans will be increasingly in having and synthesizing information from our episodic memory, our personal experience.

Buddhism is to mindfulness, as Christianity is to…

Article.

I make a connection between idealizing my father during the "Dad is superman" phase of my childhood, and the Christian practice of imagining the ideal form of good (perfect kindness, perfect virtue, etc) and trying to move in that direction. The phrase "what would Jesus do" is pretty much this. The same way we have taken mindfulness out of Buddhism, I think this practice can be taken out of Christianity and also practiced in a perfectly secular way. It has benefitted me my whole life, so it's worth trying on for size.

The most boring man in the world

Article.

I had a fascination with the Dos Equis "Most Interesting Man In The World" ad campaign, which ran through my 20s. In this article, I xexplore what it really means to be interesting. I conclude that a lot of the aspects of my life that are interesting have been a result of doing a boring slog of hard work for a long time. I conclude that part of being interesting is the willingness to be boring.

Running only the most variable genes through UCE leads to worse cell separation than running the whole dataset

Markdown.

In a standard single cell RNA sequencing analysis pipeline, one of the first things you do is find and use only the most variable genes, as measured by gene expression and dispersion. These variable genes are sufficient to be used in downstream analysis, like clustering and dimensionality reduction. Here, I test whether I can do the same thing for the Universal Cell Embeddings (UCE) foundation model, which in theory could save time and compute. I find that running only the variable genes through UCE leads to poorer cell type separation, as measured by both UMAP and PCA. This suggests that those who use UCE should use the full datasets, not filtered ones.

Zen and the art of driving stick

Article.

I find that if I'm driving stick rather than automatic, I'm much more connected to what I'm doing, much more satisfied in the moment, and I'm objectively a better driver as a result.This concept generalizes. Pick an endeavor. Complete the analogy: automatic transmission is to your endeavor as manual transmission is to X. If you know how to do X, do it when you can. If you don't know how to do X, then learn it. I give several examples of this in my life, and I conclude by encouraging others to embody this way of doing things.

Universal Cell Embeddings with two PBMC datasets: how to test whether it grokked integration

Markdown.

In this markdown, we import two PBMC datasets, the PBMC 3k and the PBMC 10k datasets. The 3k dataset is a flagship dataset used in the early days of Seurat. The 10k dataset is the default that is run through the model if you don't specify another dataset. Here, I show that if we look at a UMAP embedding, the datasets do not sit on top of each other. However, if we use my KnnSleepwalk package, we find that the distances on the UMAP are distorted. What we find is that, for example, the T cell island for the 3k datset sits much closer to the T cell island for the 10k dataset. This in turn suggests that we should be careful using UMAP to assess foundation models in single-cell and in any field.

UMAP does not capture the proper center and outer edges of human CNS portion of the Univesal Cell Embeddings (UCE) transformer foundation model

Markdown.

This is a jupyter notebook that looks at the Universal Cell Embeddings transformer foundation model for single-cell sequencing. It is part of an emerging sub-field of foundation model building within single-cell sequencing. The output of the model is a 1280 dimensional embedding. Here, to get a sense of the geometry of the embedding, I look at the center and the outer edges. I visualize this in the context of UMAP space, and find that UMAP does not properly capture center-ness. I further find that center-ness is positively associated with both frequency of cell subset and per-subset density. I conclude that center-ness is a worthwhile feature to look at in the context of these models, and that it is not something that UMAP can be relied upon to capture.

There's some signal there

Article.

This is an expression I use a lot in casual conversation. In essence, when I come across something that feels like its part of the path forward, but can't fully articulate why, I say "there's some signal there" and then I put it in my back pocket. This essay tells the story of how doing this from my teenage years onwards, has contributed to a lot of my good decisions thus far.

The limits of dimensionality reduction tools for single-cell analysis

Webinar.

This webinar is the latest iteration of my "dimensionality reduction interrogation" work, which has spanned six years. In this talk, I show the limits of these tools by looking at the K-nearest neighbors (KNN) of a given cell in the 2-D embedding space, and compare it to the K-nearest neighbors of that same cell in the high-dimensional feature space. I look at the averages across a given dataset for a given method to compare t-SNE, UMAP and PCA. I then color the maps by each cell's KNN preservation to look for patterns across the dataset. While I point out some general trends, I conclude that at least some of this depends on the dataset. Thus, I demo my free KnnSleepwalk tool, which allows users to deterine the performance of their t-SNE's and UMAPs for their data accordingly.

In this article, I summarize my journey from biologist at the beginning of grad school to bioinformatician at the end. On top of that, I show you some of the key insights that empowered me to get to where I am now. I link out to a number of references in bioinformatics and computer science that I find meaningful. In short, I hope that anyone in the life sciences reading this article can use it as a resource on their journey to learn bioinformatics.

It's more complicated than that

Article.

This is the observation I run into with just about every line of scientific inquiry. This is despite a revolution of new technologies that allow us to look at much more data, and new algorithms to make sense of these huge datasets. I repeat this phrase every time I start to feel like I've figured it all out.

Run CyTOF analysis with Seurat

Markdown.

Seurat is an R package that runs single-cell sequencing and related data. Here, I trick Seurat into thinking my CyTOF PBMC data is single-cell sequencing data. I find out that the effective dimensionality of my CyTOF data appears to be much less than the surface markers I am using (15). This is a counter-intuitive result because our features are carefully curated before the experiment is done.

Knn sleepwalk

Software.

A wrapper I wrote around the sleepwalk R package. Hover the cursor over any cell in your embedding, and it will show you the cell's k-nearest neighbors computed from the original feature space (as opposed to the embedding space). This allows you to test your assumptions around how exact a low-dimensional embedding (eg. t-SNE, UMAP) is.

TEDx Basel talk: my scrolling problem, and how I fixed it

YouTube video of my TED talk.

In this talk, I introduce the idea of the Scrolling Problem, which is the incompatiblity of my ADD brain and modern technology built around the infinite scroll. I talk about some work I'm doing to counteract that, which can be found here. It was originally Twitter, but I switched to RSS mapping, here, after Twitter started blocking scrapers.

The Scrolling Problem

Article.

The article behind my TEDx Basel talk. We spend a large fraction of our lives endlessly scrolling through our feeds, with no control over what hypernormal, outrage-inducing content will appear next. I call this the scrolling problem. I define it, and I have a crack at it by viewing my news feed as map with the help of an AI language model based on BERT.

Single-cell sequencing analysis: the importance of data integration

Markdown.

In flow cytometry and CyTOF analysis, we distinguish between "type" and "state" markers, so we can cluster on the former and analyze per-cluster expression changes in the latter. For single-cell RNA sequencing, we cannot make this distinguishment. Thus, we have to rely on data integration algorithms when we are analyzing pre-treatment and post-treatment datasets. I show how this is done, and I show how failure to do so can lead research teams to falsely interpret the data, and make false conclusions. Thus, understanding data integration is critical to keeping research teams on track.

Single-cell analysis

I started out analyzing CyTOF data, as I did my PhD in the lab of Garry Nolan from 2012-2017, where CyTOF was initially being developed and applied to immunology and cancer bilogy. I later broadened to single-cell sequencing and high-parameter imaging (both proteins and genes). The work below consists primarily of markdowns, with code and explanations that allow users to do things that have helped me a lot in my work, but I don't necessarily have the time to turn into publications.

LLM-augmented PCA loadings

R Markdown

PC loading by interpretation report

LLM consistency experiment

Cluster membership purity score for single cells

Markdown.

Cluster stability as an evaluation metric for single-cell workflows

Markdown.

Automated LLM-based annotation of single-cell sequencing data, directly in a R workflow

Markdown.

Comparing Color Palettes for scRNA-seq Data Visualization: The Case for Viridis as a Default

Markdown.

Running UMAP on scRNA seq data without PCA reveals importance of cosine distnace in high dimensional data

Markdown.

Nearest neighbor similarity between PBMC 3k dataset and its foundation model embedding

Markdown.

Annotate the PBMC 3k dataset

Markdown.

Flow cytometry color scheme for Seurat's FeaturePlot

Markdown.

Running only the most variable genes through UCE leads to worse cell separation than running the whole dataset

Markdown.

Universal Cell Embeddings with two PBMC datasets: how to test whether it grokked integration

In this markdown, we import two PBMC datasets, the PBMC 3k and the PBMC 10k datasets. The 3k dataset is a flagship dataset used in the early days of Seurat. The 10k dataset is the default that is run through the model if you don't specify another dataset. Here, I show that if we look at a UMAP embedding, the datasets do not sit on each other. However, if we use my KnnSleepwalk package, we find that the distances on the UMAP are distorted. What we find is that, for example, the T cell island for the 3k datset sits much closer to the T cell island for the 10k dataset. This in turn suggests that we should be careful using UMAP to assess foundation models in single-cell and in any field.

UMAP does not capture the proper center and outer edges of human CNS portion of the Univesal Cell Embeddings (UCE) transformer foundation model

Markdown.

How I transitioned from biologist to biology-leveraged bioinformatician

Article.

How X-shift works

Markdown.

X-shift is a popular clustering algorithm for CyTOF and related high-dimensional data that is related to mean-shift clustering. It is especially good for the detection of rare cell subsets. While X-shift is computationally intensive and written in java to overcome several engineering hurdles accordingly, here I show you a simplified version of X-shift written in R that leverages the igraph package. The purpose of this markdown is to show you how X-shift works in a language that is less verbose and more familiar to the average CyTOF user than java.

Single-cell sequencing: integrated vs not integrated

Markdown.

CyTOF mutual nearest neighbors experiment

Markdown.

Phenograph is a popular CyTOF clustering algorithm, which is really Louvain community detection of a K-Nearest Neighbor (KNN) graph. Of note, this is the primary clustering tool used in Seurat for scRNA seq data. Here, I make the KNN graph myself for CyTOF data, and contrast it with the mutual nearest neighbor (MNN) graph, where Cell 1 is connected to Cell 2 if and only if they both are part of each other's respective KNN. I find that clustering the MNN graph might provide a little more resolution than the KNN graph, if properly optimized.

Single-cell sequencing: Schulte-Schrepping et al. Cell 2020

Markdown.

In this markdown, I take a Seurat object provided by the aforementioned paper, and I use it to do perform visualizations, which include box and whisker plots. This markdown is an example of what kinds of things a single-cell sequencing bioinoformatics workflow might entail.

Single-cell sequencing pipeline, PBMC 3k in depth

Markdown.

I use the classic Seurat PBMC 3k vignette as a foundation to explore the guts of the high-level Seurat functions within. This includes normalizing and scaling the data myself, and reverse engineering the "Seurat" clustering algorithm. Regarding the latter, I show you how to visualize the KNN graph that serves as the basis for the Louvain clustering Seurat uses.

CyTOF analysis langauge tour in R Markdown

Markdown.

I typically analyze CyTOF data in R. However, there are plenty of reasons why one might want to analyze CyTOF data in other languages as well. Here, I show that you can use python, julia, C++, SQL, and Rust directly in R Markdown. I do most of my work in R Markdown these days, but I like the flexibility of being able to switch from one language to the other and back, all in the same literate programming environement.

CyTOF UMAP with Julia: an experiment

Markdown.

Here, we compare the UMAP implementation from R with the UMAP implementation from Julia. The Julia programming language is a much faster programming language, so I expected that we might be able to speed UMAP up. Accordingly, it did. Here, I show you how to import your data into R, move in into Julia, run UMAP, get it back into R, and plot it. All in a single R markdown.

Run CyTOF analysis with Seurat

Markdown.

Anatomy of a fcs file

Markdown.

You can parse a fcs file from scratch without flowCore. I read in a fcs file line by line, rather than using the standard read.FCS from flowCore. We can't read the data directly this way, but we can read in the header and the text. For the data, we read in the bytes, convert them into decimal, and then build the expression matrix.

Continuous Visualization of Multiple Biological Conditions In Single-Cell Data

First author pre-print.

Abstract: In high-dimensional single cell data, comparing changes in functional markers between conditions is typically done across manual or algorithm-derived partitions based on population-defining markers. Visualizations of these partitions is commonly done on low-dimensional embeddings (eg. t-SNE), colored by per-partition changes. Here, we provide an analysis and visualization tool that performs these comparisons across overlapping k-nearest neighbor (KNN) groupings. This allows one to color low-dimensional embeddings by marker changes without hard boundaries imposed by partitioning. We devised an objective optimization of k based on minimizing functional marker KNN imputation error. Proof-of-concept work visualized the exact location of an IL-7 responsive subset in a B cell developmental trajectory on a t-SNE map independent of clustering. Per-condition cell frequency analysis revealed that KNN is sensitive to detecting artifacts due to marker shift, and therefore can also be valuable in a quality control pipeline. Overall, we found that KNN groupings lead to useful multiple condition visualizations and efficiently extract a large amount of information from mass cytometry data. Our software is publicly available through the Bioconductor package Sconify.

I've been asked recently why this is still a pre-print. So I published the peer review for this manuscript with some commentary here.

High Throughput Precision Measurement of Subcellular Localization in Single Cells

First author publication.

Abstract: To quantify visual and spatial information in single cells with a throughput of thousands of cells per second, we developed Subcellular Localization Assay (SLA). This adaptation of Proximity Ligation Assay expands the capabilities of flow cytometry to include data relating to localization of proteins to and within organelles. We used SLA to detect the nuclear import of transcription factors across cell subsets in complex samples. We further measured intranuclear re-localization of target proteins across the cell cycle and upon DNA damage induction. SLA combines multiple single-cell methods to bring about a new dimension of inquiry and analysis in complex cell populations. © 2017 International Society for Advancement of Cytometry.

My summer students are co-authors on this paper! Undergrads and high school students. They worked very hard and learned a lot. I am proud of each and every one of them.

Expanding the Capabilities of Mass Cytometry Data Acquisition and Analysis

PhD Thesis.

My PhD thesis dissertation, from the laboratory of Garry P. Nolan at Stanford University School of Medicine.

In sum: I started by developing a method to enable flow and mass cytometry to detect and quantify nuclear localization, called Subcellular Localization Assay (SLA), which came out of a collaboration with the lab of Ola Soederberg at University of Uppsala, Sweden.

In parallel, I was taking computer science classes as a side hobby. I reached a point where I was trying to compare two t-SNE maps between unstimulated and simulated data, and I realized that there was a K-Nearest Neighbors based solution that I could implement with my newfoud computer science competencies. I therefore developed Sconify, a now BioConductor package that allows for these visualizations. There were many use cases, and I spent the remainder of my thesis developing this method further and doing various collaborations with it.

A visual interrogation of dimension reduction tools for single-cell analysis

Slide deck.

German CyTOF User Forum; Berlin, Germany; January 2020. In this talk, I measured the accurracy of dimension reduction tools (PCA, t-SNE, and UMAP) in terms of their nearest neighbor overlap. This is the k-nearest neighbors of a given cell in the original high dimension space, in comparison to the k-nearest neighbors of a given cell in the embedding. I show that the overlap here is much lower than my audience expected. I've given this talk many times since then, for my clients.

Neighborhood-based analysis of self-organizing maps

Slide deck.

Laboratory of Yvan Saeys, VIB Ghent, Belgium. June 2018. This slide deck summarizes some work I did with Sofie Van Gassen, developer of FlowSOM and all-around awesome person. We were looking at what is called the U-Matrix, a way to visualize the self organizing maps that FlowSOM produces. The question was what insights could we derive from using the U-Matrix to visualize the output of very large FlowSOM clusterings (eg. a 100 x 100 grid rather than the default 10 x 10). So far as I know, this is not explored in any major CyTOF publication, so any CyTOF users who use FlowSOM (most people at the time of writing) should have a look at this. There are visualizations in here that are useful but remain unpublished.

A history of mass cytometry data analysis, and where the field is going

Slide deck.

German Rheumatism Research Center; Berlin, Germany; March 2019. I talk about how CyTOF data analysis developed from its inception at the beginning of 2010 to now. In doing so, I provide a template for proper CyTOF data analysis in terms of how we got there. In doing so, I test various assumptions: I show visualizations of data transformations other than asinh(x/5), and I show what a SPADE tree looks like with completely random inputs. I like to show these slides to people new to CyTOF data analysis to properly orient them.

A comprehensive interrogation of the t-SNE algorithm for mass cytometry analysis

Slide deck.

German Rheumatism Research Center; Berlin, Germany; May 2018. This talk was a response to a member of the research institue who was simply not convinced that t-SNE was providing the accurracy that the avearge CyTOF user thought. In this talk, I show that he was right. This being said, I provide recommendations for how to properly use t-SNE for CyTOF analysis.

Nearest neighborhood comparisons across biological conditions in single cell data

Slide deck.

Invited Speaker, German CyTOF User Forum; Berlin, Germany; February 2018. This is the talk version of my 2018 Sconify paper, that ended up being the final chapter of my PhD thesis. There are two aspects to this talk. The first is making visual comparisons of unstimulated and stimulated CyTOF data when looking at measurements of phosphoproteins. This was easily done on SPADE trees, but not t-SNE maps, until I started making k-nearest neighbor based comparisons. The second aspect of this talk is using the same nearest neighbor based comparisons to investigate batch effects in CyTOF data. I note that batch effects were only heavily discussed among CyTOF users starting near 2020 (in my circles), and this work goes back to 2016.

Determining which distance metrics are ideal within a mass cytometry data analysis pipeline

Poster.

CYTO Conference; Prague, Czech Republic; May 2018. Abstract: Due to the rise of high-dimensional single cell technologies in the past few years, there has been an increasing number of both computational methods and workflows to analyze the new wealth of data. However, non-intuitive properties of high-dimensional space can give rise to analysis artifacts, collectively known of as the “curse of dimensionality.” Increasing dimensions differentially affect the performance of distance metrics, and there is no clear consensus about which distance metrics to use for which analysis strategies. While the influence of many tool-specific parameters has been evaluated, we study here the impact of commonly used distance metrics on the outcome of dimensionality reduction and clustering.

Fine-Tune viSNE to Get the Most of Your Single-Cell Data Analysis

Guest blog post.

This is a guest blog post I wrote for Cytobank. The formatting of the post has since been messed up (image links are broken) since Beckman acquired Cytobank and moved all the web content over. Until it gets fixed, I'm linking you to the original PDF. At the time of writing, there was still a lot of work to be done in terms of really understanding dimension reduction for CyTOF data. As such, I spent a lot of time adjusting inputs (eg. number of cells) and parameters (eg. perplexity) to understand how that affects the resulting map.

Dimension reduction add noise

Software.

If you have one or two bad markers in your panel (noise), does it completely ruin your t-SNE/UMAP visualizations? According to my analysis so far, no. I take whole blood CyTOF data (22 dimensions) and add extra dimensions of random normal distributions, running t-SNE after each new column has been added (I've done UMAP too). What I have found:

A few dimensions of noise do not catastrophically affect the map. Lots of noise dimensions do.
The embedding space shrinks with increased number of dimensions. You have to hold the xy ranges constant to see this.
When you have many dimensions of noise, the map starts to look trajectory-like (look at the end of the gif), which could affect biological interpretation.

Dimension reduction island placement

Software.

This project asks the following question: if you run t-SNE or UMAP over and over for 100 times or more, how different does each map look from each other map? Is each map radically different? Is each map similar? Are there pockets of stability?

The spoiler alert is that the island placement of UMAP appears to be more stable than that of t-SNE, but t-SNE does display pockets of stability. This can be more easily seen by ordering the t-SNE runs by similarity.

Knn sleepwalk

Software.

A wrapper I wrote around the sleepwalk R package, that I in turn made into a package, so users even with limited bioinformatics experience can utilize it. Hover the cursor over any cell in your embedding, and it will show you the cell's k-nearest neighbors computed from the original feature space (as opposed to the embedding space). This allows you to test your assumptions around how exact a low-dimensional embedding (eg. t-SNE, UMAP) is.

Bioconductor package Sconify

Software.

Official description: This package does k-nearest neighbor based statistics and visualizations with flow and mass cytometery data. This gives tSNE maps"fold change" functionality and provides a data quality metric by assessing manifold overlap between fcs files expected to be the same. Other applications using this package include imputation, marker redundancy, and testing the relative information loss of lower dimension embeddings compared to the original manifold.

CyTOF analysis pipeline

Markdowns.

CyTOF analysis has come a long way. Along with single-cell sequencing analysis, a lot of it is high-level functions that do what needs to be done. I prefer a guts-level analysis, where I can see the low-level the details of how my data are being manipulated. This is important for understanding and innovation.

One fcs file

Keeping it to one fcs file, we can focus on what happens when a fcs file is read into R, how it is transformed, and what the best practices of clustering, dimension reduction, and visualization are. These foundations can be built upon when looking at multiple fcs files to determine where there are differences in your control versus experiment group.

Multiple fcs files

This markdown uses the diffcyt package to help us do statistics between groups, though I show you how to do per-cluster statistics yourself. We make box plots group-level comparisons for clusters we care about. We also color our dimension reduction maps by the p-value information. This pipeline requires a sample metadata file, as well as a marker file. I show you what these look like directly in the pipeline.

KNN sleepwalk examples

Software.

Some examples of output for my KNN sleepwalk tool. These are interactive, and are here to give the user intuition around the nature of dimension reduction maps. From the README: "My wrapper allows for the visualization of a given cell's K-nearest (and K-farthest) neighbors. In other words, the cursor is on a given cell, and the cells on the map that change color correspond to a pre-specified number of nearest neighbors in the original high-dimensional space." See notebooks in my repo to see the data and code. What to do with the visuals below:

K-nearest neighbors (KNN) will give you intuition around how exact the embedding is.
K-farthest neighbors (KFN) will give you intuition around how well the embedding preserves global structure.

CyTOF PBMCs

The dataset is internal, from the German Rheumatism Research Center in Berlin. These take a bit to load after you click on them, but its worth the wait.

single-cell RNA sequencing PBMCs

The dataset is from this vignette. The dimension reduction was done on the top 10 principal components of the top 2000 most variable genes.

Distance matrix metric correlations

Markdown.

Which distance metrics are right for your data analysis. While I've created a poster on this here, this is a stab at it from a different direction. I make synthetic CyTOF-like data, varying the dimensionality from 2 to 1000. I make a distance matrix for each distance metric used, and then correlate each one to that of the Euclidean distance matrix, which is often a default. The results are counter-intuitive.

asinh(mean(x)) vs mean(asinh(x))

Markdown.

If you want the means of your markers per cluster, be careful how you export the data. If you export the means of the raw values per cluster, and take the asinh(x/5) transform of that, the values will be different than if you take the means of the asinh(x/5) transformed data per cluster. The latter is the right way to do it. But don't take my word for it. Look at the markdown yourself.

Data transformations for CyTOF

Markdown.

CyTOF data are transformed using the inverse hyperbolic sine (asinh) of the data divided by 5 (aka scale argument of 5). But does it have to be like that? What happens if we use a scale argument of 1? 500? What if we do a log transform? How does t-SNE look on untransformed CyTOF data?

Natural language processing

A lot of this work is related to natural language embeddings, or taking anything from words to paragraphs and converting them into spatial coordinates that group by context. My most popular work on this is summed up in The Scrolling Problem, which culminated in a TEDx Basel talk in 2023.

10,000 random research questions generated by a LLM

Main writeup

This report shows you how to generate the questions, embed them, cluster them, reduce them to UMAP coordinates, and plot the results, saving it as a html.

R Markdown

This piece is the code to take the above output and label the clusters.

A LLM's descent into madness

Article.

From gene lists to interactive contextual maps: enhancing g:Profiler interpretation

Markdown.

In my client projects, I often have to make sense of long lists of differentially expressed genes (DEGs). This typically leads to a long list of biologically relevant terms (eg. GO terms, pathways). This too is often overwhelming. Here, I make these terms easier to understand by producing a interactive contextual map of them, where terms that are similar to each other in context are physically near each other on the map. In this R Markdown, I show you both how its done, and how to use python's sentence-transformers package (and python in general) directly in R.

How I made a command line chatbot

Article.

Graph visualization of my website

Visualization.

My website functions a bit like a personal wiki, with content linking internally to other content. Here is a graph that shows an updated version of what links to what, so the reader can get a feel for what ideas I express and how they relate to each other. It is colored by the number of links.

One million words: a tech-enabled review of 15 years of journaling

Article.

I started a typed journal back in 2009. It recently hit one million words. It is difficult to review that many words by reading it top to bottom, so I took some AI tools I developed over the past few years and utilized them here. I take you through the structure of the journal, the BERT spatial embedding method that underlies the journal analysis, and the results. I conclude by encouraging you to keep a journal and to use these methods to analyze your journal. I note that these methods are applicable to any sort of note taking that you're doing.

CNN, FoxNews, and AP: a News Space study

Markdown.

In this study, we take news articles that correspond to CNN, FoxNews, and AP, from their Twitter handles, and their BERT embeddings, and produce a map of news space. We figure out what areas of news space are heavy in one news source or the other (perhaps corresponding to political bias). We find that Fox News in general reports heavily on the topic of politics in comparison to CNN and AP, and that while AP is supposed to be neutral, there are still regions of news space that are heavy in AP content. Interactive maps are included for the user to explore.

How I curate content

Article.

We should all be active content curators. We should all be actively discussing how we curate our feeds. We should not rely on social media's recommendation algorithms to do this for us. Accordingly, here is how I curate my content. I hope this gives you some ideas, and I hope this encourages you to share your content curation strategies.

The Scrolling Problem

Article.

GPT-3 simulating students

Article.

This is a report I wrote for my uncle, who is a professor at the University of Michigan Ross School of Business. The concern was that generative language models would be able to simulate student's responses to essay questions good enough that tech-savvy students would simply offload their homework to GPT-3. I explore this option using my early access to GPT-3, with a conclusion heavily inspired by an article by Gwern. You pretty much have to at this point.

The Context Problem in Bioinformatics

Article.

In the age of big data, my bioinformatics analyses often lead to output that is still too much for a human to extract insight from. My use case here, common in my work: what GWAS traits do two or more genes have in common? I produce a context map of GWAS traits using an AI language model based on BERT. I then subset the map by traits associated with the genes the user inputs, coloring the points accordingly. One can quickly know what contexts, rather than traits, the genes share.

What would Marcus Aurelius say?

Article.

I turn the Meditations by Marcus Aurelius into a semantic search based language model, where I ask a question and it returns the most relevant passages in the book as answers. This helps me with the study of stoic philosophy, but this approach can be used in any sort of book that is structured as aphorisms.

How to utilize scientific literature trends to gain intuition around a topic

Medium post.

The scientific literature is overwhelming, and knowing how to utilize text mining and analytic tools can help you efficiently get what you want out of a literature search. Here, I utilize the PubMed API to find publication rates for particular topics. I show that among other things, single-cell sequencing began out-pacing mass cytometry in 2016. Insight like this helps you see how crowded a field is and especially identify trends.

How to identify thought leaders and visualize their influence

Medium post.

Understanding how authors of a given field are connected can help you identify key individuals to pay attention to. Here is how I utilize the PubMed API to build co-author networks, which lead me to identify thought leaders in a given domain. In this article, focusing on mass cytometry, I identify two types of thought leaders: one exclusive to a particular sub-domain, and one who spans across multiple sub-domains. It is important to know both types when approaching a new topic.

Using and mining pre-prints to stay ahead of your field, with the help of Twitter

Medium post.

I explain why pre-prints are important to staying ahead of the technology and general paradigms in your field, with single-cell analysis as an example. I then show how I utilize the Twitter API to harvest and rank tweets from automated pre-print linking bots from bioRxiv to determine what pre-print articles are being talked about (and therefore what you should probably pay attention to).

RSS map

Software.

Associated with The Scrolling Problem. An app that converts an RSS feed into a semantic map where articles that are similar to each other in context are near each other on the map.

(temporarily suspended) Gwasmap

Software.

Associated with my article The Context Problem in Bioinformatics. Given one of more genes, what are the GWAS associations? These are placed onto a semantic map where associations that are similar to each other are grouped near each other on the map. Thus, if gene 1 is associated with Alzheimer's disease and gene 2 is associated with age-related cognitive decline (different but related disease) the associations for each gene (colored accordingly) will show up near each other.

Ask Marcus Aurelius

Software.

Associated with What Would Marcus Aurelius Say. This project turned the Meditaitons by Marcus Aurelius into a semantic map that can be queried, such that the user can ask a question, and the software will return the most relevant passages in the Meditations.

Find your biases

Software.

Write your thoughts into the text box, and the app will give you a list of cognitive biases that match the thoughts. The app does this using an AI embedding model to embed both your input and Wikipedia's list of cognitive biases, and then perform a nearest neighbor search.

Twitter archive and embed

Software.

This is one of the main tools that I wrote and use to address the scrolling problem. I gave a TED talk on this project, and in progress of preparation, Twitter decimated my ability to get data. But here is what I've got.

A pipeline that takes as input a list of twitter user names that you supply. First, it scrapes the entire twitter history for the given names. Second, it uses BERT to make a topic-based high-dimensional embedding of every tweet per user name. If these two steps had already been done for a given user, it will update with the new tweets. Then, the user selects a subset of users to visualize. For these users, the BERT embeddings are converted into a UMAP, which is then clustered and annotated with extracted keywords per cluster. Finally, the results are visualized in an interactive user interface.

DuckDuckGo 2-D Search

Software.

For web searches of broad topics, where you need the first hundred results rather than the first page. Type in your search term of interest, and it will give you an interactive context map of search results and a results table with clickable links.

Preprint server archive

Software.

A searchable and sortable table of every biorxiv and medrxiv pre-print to date ([2022-11-17 Thu 13:43]). Specifically, every time a paper is uploaded to one of these pre-print servers, it is automatically tweeted out from the respective twitter handle. As such, the table contains the paper title along with various tweet metadata (eg. likes) to allow users to understand which papers are potentialy important.

Likes vs retweets

Markdown.

Search term: single cell sequencing OR single-cell sequencing

We find three regions:

High retweets/likes: open academic student and postdoc positions
Medium retweets/likes: papers, projects, data (the stuff you're probably looking for)
Low retweets/likes: memes, status updates, fun stuff

Question graph

Markdown.

You are only as good as the questions you ask yourself and others. My uncle told me that many years ago when I was getting started with my career and it stuck. This has been relevant to me in terms of having and maintaining good friendships, being a good husband, being a good family member, being a good businessman, and when I was in graduate school, being a good scientist, and simply being an interesting person. I have a very large list of questions now that is very overwhelming. So I turned them into an embedding using the BERT language model, turned that into a nearest neighbor graph, and then derived insight from looking at the questions in terms of "communities."

Philosophy and rationality

We choose to think about stuff

Article.

Die with zero ideas

Article.

Buddhism is to mindfulness, as Christianity is to…

Article.

The most boring man in the world

Article.

I had a fascination with the Dos Equis "Most Interesting Man In The World" ad campaign, which ran through my 20s. In this article, I explore what it really means to be interesting. I conclude that a lot of the aspects of my life that are interesting have been a result of doing a boring slog of hard work for a long time. I conclude that part of being interesting is the willingness to be boring.

There's some signal there

Article.

I saw the northern lights

Article.

A reflection of the first time I saw the northern lights, in Iceland in 2019. A reminder that for whatever moment you're immersed in, really take it in before you reach for your camera or whatever else.

Fear of the un-word is the beginning of wisdom

Article.

An un-word is a word that points to that which cannot be put into words. We see examples of this in religion, where words like Tao and God are meant to point us to a vastness that is beyond anything we can possibly understand. The Christian concept of fear of God, as seen through this lens, reflects the horror and awe that comes from admitting ignorance and embracing the unknown. This is the beginning of wisdom.

Limbic language learning

Article.

In the years I've lived in Germany, I have realized that a lot of my success in speaking the language has come from connecting my brain's emotion center (limbic system) with my language center. In short, I think that anyone learning a foreign language should start speaking that language with emotion sooner than later. Here, I go into personal experiences and practical advice for what I call limbic language learning.

Getting life done

Article.

There are two modes that we operate in: the doing mode and the getting-done mode. The doing mode is like a hike, where the focus is on the hike itself and not point A to point B. The is opposed to a commute, the getting-done mode, where you're focused on getting from point A to point B. Here, I argue that the doing mode is being wrestled away from us, and we are wasting our lives in the getting-done mode. We are going to deeply regret this.

Episodic memory is the new semantic memory

Article.

Zelda, the hero instinct, and narratives

Article.

I take the classic game Zelda: A Link to the Past, and draw parallels between the gameplay and many aspects of my life. I talk especially about our "hero instinct," in terms of how we really vibe with hero-centric games like Zelda, and I go into the general concept of narratives. How do these mesh with the complex, interconnected modern world, in the workplace and beyond?

Enjoyment arbitrage: you can do what you love, if everyone else hates it

Article.

I think it is possible to do what you love, if you put yourself into an environment where others simply don't want to do what you do. I show how this works in my world, where many people are simply not interested in learning or doing bioinformatics at my level of depth and involvement.

Fight complexity with complexity

Article.

A new paradigm that seems to be emerging from the bottom up, linking my work on dimensionality reduction interpretation with GPT-based interpretations of the human brain and cancer immunotherapy. We use something complex to understand or fight something complex. This is opposed to the older ideal of having perfect mechanistic understanding of what we're doing.

Finite and infinite-life games

Article.

I make a comparison between the older 2-D platforming games from the 16-bit era and a newer game called Celeste. The key difference is that in Celeste, the gameplay is incredibly difficult, but you have infinite lives. I argue that this type of gameplay is an efficient route to flow state. I describe how this type of gameplay mirrors a lot of problem solving in my professional life. I end by saying that Celeste gameplay is an empowering mental model for doing things outside your competence and comfort zone.

Replace the word "value" with "beauty"

Article.

A dialectic between my rational and my emotional/spiritual side that took a while to build up. In my professional life, I think in terms of value (value-add, ROI, etc). But if we get rid of the word "value" in all my rational calculations and replace it with "beauty," it solves a much larger swath of problems and helps me make decisions that allow for, well, a beautiful life lived.

The virtue of depth

Article.

It starts as a lament. In the real world I'm pulled many more directions than in graduate school. As such, I cannot always go deep with respect to whatever I'm doing. In my longing for depth, I can describe what depth is to me. The way the modern world is set up, I think a lot of us are missing depth in our lives. In this regard, I argue that depth should be a virtue that we strive for.

But what is Occam's razor really?

Article.

Occam's razor states that for a given phenomenon, the simplest explanation is the most likely explanation. I explore this with a fun example from my life. I then look at a computational formalization of this, which I use today for sensemaking.

What I learned about problem solving from my thesis lab

Article.

A collection of stories from my time in graduate school. The people in my thesis lab had one perplexing thing in common. They would come in as biologists and then literally invent new hardware and software, without any prior relevant background. They would just figure it out as they went. I learned the ways of the lab and learned several themes around how to solve problems, some of which fly in the face of traditional mainstream advice. So I figured I'd write them down.

Hacking: examples of seeing through and unseeing in my life

Article.

A lot of hacking is seeing through and unseeing the everyday abstractions we pretend to be true. Hacking is sometimes malicious (the Hollywood stereotype), and sometimes it is productive (known as innovation or ingenuity). It is not limited to computers, and it is definitely not limited to coding. Here, I lay out my favorite framework for what hacking is, and I provide examples that range from business to sports to computing.

Learning how to code improved how I think

Article.

Learning computer science improved both my focus and my thinking, which has contributed to a lot of my success from the end of graduate school until now. This matters because I think one should still learn the basics of how to code even if AI automates all of it in the future. This is because inherent in computer science education is the concept of computational thinking, a skill which you should have whether you use it to code, organize your thoughts, or prompt the latest AI. Even the first few months of computer science drastically helped me improve my thinking in this regard. This essay goes into the concepts around computational thinking, and tells you how you can learn it too, in a shorter time than you think.

It's more complicated than that

Article.

The way is the way

Article.

I spent a large portion of my life being goal-driven. I have realized more recently that being focused on the process rather than the goal is more beneficial in many respects. This essay is about my journey to that realization.

Making sense of the (messy) real world

Article.

Finding truth in the real world is much different than finding truth in grad school. Grad school had me working on non-controversial, dry topics that few people in the world worked on. The real world is a lot messier. I talk about the idea of collecting opposing perspectives, steelmanning them, and putting them in dielectic to find higher truth. It's simple in theory, but hard in practice.

The Tao of problem solving

Article.

One of the key components to my method of solving problems is to get into the flow state. When I'm there, some or all of the problem at hand solves itself. And it feels great. I show examples of me doing this. I talk about Taoism as an ancient philosophy built around flow state, but at the macro level rather than the "within-game" level. This is the ideal of being in a perpetual flow state that lasts one's entire lifetime.

The beauty is truth delusion

Article.

This is the idea that data visualizations that look prettier than others don't necessarily convey more truth. I use SPADE and t-SNE as examples that can produce this delusion. This article serves as a call to action for the bioinformatics community to help users distinguish between truth and beauty as data visualization tools come out and become widely used.

Zen and the art of driving stick

Article.

How I'm applying the mindset around sustainability to everyday life

Article.

My current plan for the uncertainty we face due to the pandemic and the events we have seen after that. I talk about having goals around miminizing rather than maximizing, and about the systems thinking and sustainability based mindset that one sees in subjects like permaculture. This is helping me be more effective, and figure out unique new ways I can add value to the world.

Computing

Me nerding out on general computer science topics.

Levels of analysis in bits and atoms

Article.

The R Rabbit Hole

Article.

The Lisp machine of Babel

Fiction.

I am learning Lisp at the moment. In learning about the history of the language, I realized that the story of Lisp is analogous to the Tower of Babel. I am not the first person to realize this by any means, but I saw it in a way that I haven't seen anywhere else. So I decided to put it into words here.

Metaprogramming in R

Markdown.

When I started learning Lisp, I learned of the concept of metaprogramming. This means using code to change the programming language itself. A practical example of this in English is using "they/them" to denote gender-neutral singular pronouns, overriding the plural default. Here, I show you how to change the syntax of R to fit your fancy. I show you how to change the "+" operator in ggplot2, and to zero-index vectors, as they're done in many other programming languages. The goal of this markdown is to get you to see through and unsee the arbitrary constraints that any language, spoken/written or programming, will give you.

Elementary cellular automata rule space

Markdown.

I lay out the rule space of elementary cellular automata as an eight dimensional dataset. I perform UMAP on rule space and color by complexity measures, the most interesting being the labeled Wolfram class of each rule. Class 3, the most chaotic behavior, shows up as little pockets in rule space. Class 4, where things like Turing completeness happen, shows up at or near these pockets, surrounded by Class 2 (repetitive, orderly). This supports the idea of Class 4 being "at the edge of chaos."

Explainable AI and understanding ourselves

Article.

I make the connection between understanding a black box AI algorithm (a hot topic) and understanding ourselves. I make the argument that accordingly, we as humans are prepared to take on this task. I discuss natural language explanations, which is what happens when you attach a language model to an AI system. Ideally, you can ask it why it did what it did.

Logic gates

Markdown.

What are the fundamental units of a computer? Logic gates. I show what these are. I then show that they can be created with combinations of a single type of logic gate: NAND (Not AND). In other words, you can make a general-purpose computer if you had enough NAND gates and wires. In the spirit of this, I combine NAND gates to create a calculator that can add large numbers. One of the key points in this exercise is that it does not take much to get from NAND gates to complex computations.

1-D Cellular Automata

Markdown.

Here, I write some code to produce each of the 256 Wolfram cellular automata rules, and visualize the output.

Explore Wolfram Rule 110

Markdown.

Here, I write some code to produce Rule 110, a Class 4 1-D cellular automata. I then enhance the gliders to make them easier to see. I explore how the output changes if I make the rule probabilistic (eg. 99.99% chance the rule will be followed.

Statistics

There is a lot that you can figure out on your own if you simulate coin tosses and dice rolls.

Coin toss series 1: The law of large numbers and the central limit theorem

Markdown.

I taught one of my high school summer students the basics of probability by simulating coin tosses in R. Here, we "discover" the law of large numbers and the central limit theorem using simulated coin tosses.

Coin toss series 2: Runs of luck

Markdown.

Here, we build on the initial piece in the series by looking at the properties of runs of luck. If we flip a coin a million times, how often will we get 10 heads in a row? How many times do we need to flip a coin to get 20 heads in a row on average? Related to sports. How often, statistically, would you expect Steph Curry to make 10 three pointers in a row given his 3-point shot percentage?

Coin toss series 3: Fair versus unfair coins

Markdown.

Here, we examine the properties of unfair coins, where the odds of getting heads or tails does not equal 50%. Can we figure out whether a coin is a fair coin?

Coin toss series 4: Random walks

Markdown.

Here, we show that if we simulate flipping coins, but we keep a record of the number of heads and the number of tails, we end up doing a random walk. We visualize these walks (they look somewhat like stock market data), and ask questions like how often a random walker crosses zero.

Coin toss series 5: Dice rolls

Markdown.

Here, we do an abstraction of the coin tosses we have been simulating, by coinsidering dice of three or more faces. We simulate these dice rolls and examine their properties. How often does a six sided dice land on the number 3? We can figure that out with simple math, but if you roll a dice 1000 times, and you do that again, and you do that again, what will be the standard deviation of the number of times the dice lands on 3?

Coin toss series 6: Is this sequence random?

Markdown.

Here, we look at fair coin tosses, unfair coin tosses, and random walks, and explore the randomness of the sequences by doing convolutions on the sequences with kernel size 2.

Health

I value having good health, and I have been fortunate enough to work out and eat right for several decades, and I have worked as a certified personal trainer in the past at three gyms. Accordingly, I want to get my perspective off my chest and out there for others.

The boring diet: how I prevent food addiction

Article.

I talk about the hypothesis that the existence and wide availability really good-tasting food is a contributor to the obesity epidemic. I talk about experiences where I've had food that tastes so good that it makes me feel uneasy, like I'm going to become addicted. I call this hyper-yummy food. From this comes the idea of making your diet more boring, less yummy, as a first step to taking control of your nutrition, as opposed to cutting calories.

The Tao of good health

Article.

My approach to health and fitness is not the goal-directed approach that seems to be prevalent everywhere these days. Rather, it's more of a flow-based approach rooted in Taoism, and taking elements from modern books like James Clear's Atomic Habits. This has worked for me for decades, and allows me to get my dopamine from the process of working out rather than the outcome. This is a highly sought after place to be, so I want to share my method in case this helps anyone else find that place too.

How to get fit in 20 years

Article.

Over the years, I have seen people who want to get in shape develop fitness goals that are too much over too little time, which leads to burnout. Here, I provide the opposite perspective, which has worked for me in my adult life. Take your fitness goals and ask: if I had a year to achieve this rather than a month, how would I do it?

Psychotherapy should be as normal as going to the gym

Article.

There is a renewed interest in Stoic philosophy as of late. It's a great set of tools that I have used for dealing with hardship. But it's 2000 years old. Where are the modern Stoics? They exist, but under a different name: psychotherapists. If psychotherapy was the successor to Stoicism, and there is so much interest in Stoicism, then shouldn't there be an equal amount of interest in clinical psychology and how it can help us? Shouldn't it be something we learn early, and make these tools part of our daily routine, as the Stoics do?

Just paint

Article.

An anecdote from my aunt evolves into a motivational article (mainly written for myself). I describe the art and science of how to start a project and how to keep it going. I talk about how I build psychological momentum. I discuss the concept of Long Content, and how it relates to the neuroscience of dopamine optimization.

Snapshots

Think of these as both newsletters and time capsules. They are not exhaustive, but they do represent the bigger insights and ideas on my end from that time period. I'll note that I was going to do this monthly, but life caught up with me and I stopped early. I leave these articles here as a snapshot of a particular period of time that was actually pretty interesting in terms of the long arc of human history: AI (particularly generative AI, like LLMs) was really taking off. Maybe at the time of reading this, AI has plateaued, or maybe it is still exponentially increasing, with all the debate around it that it had here, or maybe some alien is reading this a hundred thousand years after some rogue AGI killed us all. But either way, these are time capsules that maybe I'll add to here and there for the rest of my life.

May 2024

May 2023

April 2023

March 2023

Collections

These are growing lists of things that I find important.

Question Bank

Collection.

I find that asking good questions is key to orienting you in the right direction. As I get older, I am focusing less on answers and more on questions. Accordingly, here is a growing collection of questions that I have found helpful over the years. It is divided into two categories: lists of questions that have been collected by others, and my independent collection. Most of these questions are attributed to others, as opposed to being unique to me.

Social media posts

Collection.

Social media is a way for me to get things off my chest that I think are valuable to the world. Oftentimes, I will have an idea, or a piece of software, or a gif, that will give people unique intuition around a perticular topic, for example in bioinformatics. Rather than spending countless hours turning it into a paper or what not, it is easier to just turn it into a social media post. Having done this for a few years now, I have a collection of social media posts. Rather than having to scroll through each of them yourself, I have organized the ones that matter by topic here.

Where I was featured

Interview with Ramji Srinivasan, CEO of Teiko

Video.

This interview took place at AACR 2025 in Chicago. The interview was spur of the moment. We discussed a variety of topics related to flow/mass cytometry and single-cell analysis.

The limits of dimensionality reduction tools for single-cell analysis

Webinar.

Developer Stories Podcast: Part 2: Be the flame, not the moth

Guest on a Podcast.

Here is part 2 of my appearance on the Developer Stories podcast. In part 1, I talked about my transition from biology to programming and bioinformatics in graduate school. In this podcast, I talk about my life after graduation, which started with a big move from Palo Alto, California to Berlin, Germany. I talk about life out here, projects I'm working on, being self-employed, and starting my own company.

Developer Stories Podcast: Part 1: Heavenly light emanating from line 37

Guest on a podcast.

I talk with software developer Vanessa Sochat about my time in graduate school. I started out as a wet-lab biologist. But after taking an intro CS class for fun, I realized that I really enjoyed the dry-lab side of things, and my career trajectory changed accordingly. Have a listen for more details. This is part 1 of 2.

TEDx Basel talk: my scrolling problem, and how I fixed it

YouTube video of my TED talk.

TEDx Basel: caricature of my talk

Work of art.

While I was giving my TED talk, unbeknownst to me an artist in the audience was drawing it out in real time, complete with pen and watercolor. He did this for each of the speakers. I was extremely impressed with what he was able to do given the very limited time. My talk is not yet on YouTube, but if you want the gist of it, have a look at this picture.

Life and love in Berlin during the Coronazeit

Feature.

An article I wrote in 2020 for the annual Krupp Internship e-newsletter. It was written just after the first wave of the COVID-19 pandemic. It serves as a time capsule for that period, in which many of our basic assumptions about the stability and the future of the modern world were upended. I enjoy looking at it again now and then, as it captures a very unique state of mind. It also captures my wedding, which took place the day before the first lockdown, and included toilet paper as a wedding present.

Tyler Alumni Im Portrait

Feature.

"Im" is short for "in dem" in German. Not a typo. An article I did for the Stanford Krupp Internship Program, which had huge impact on my life and career trajectory. In a nutshell, I was pre-med prior to my medical internship at the Charite Hospital in Berlin in 2007. Through the internship, I realized that I wanted to do research rather than clinical work. I got my PhD and came back to Berlin to work on the Charite Hospital campus once again, this time as a researcher! I will be forever grateful for the Krupp Internship program and Stanford's Bing Overseas Study Program.

Fun stuff

Speech memorization helper

Web page.

Take the text you want to memorize and paste it into the box. The text will be split up into individual sentences. The first sentence will be displayed. Recite as much of the subsequent sentences as you can from memory. Buttons will allow you to move to the next sentence or the previous sentence all the way to the end of the text.

Conway's Game of Life Cellvivor

Game.

A game within Conway's Game of Life. You are a blue square that you can move (with arrows), and your goal is to make contact with a "goal" square, colored green, while avoiding all the squares in the Game of Life automata that come at you. Each level up leads to a denser game board. You get five seconds of invincibility (you're colored red) in the beginning of each level, that allows you to get out of the way of any Game of Life objects near you.

Breakout

Game.

A game that will always be of significance to me, because it was the assignment in my intro CS class that made me realize that I really enjoy coding. A simple implementation of breakout that runs on a single html page.

Conway's Game of Life

Web page.

I first came across Conway's Game of Life when I was 16. It completely changed the way I think about how the world works. Or, it helped me realize the way I inherently think about the world. One of those. I consider this the first major event that moved me into the world of computer science later in life. I was above to program this up for the first time in my second intro CS class (Stanford CS106B, C++).

Rules for the grid:

If one cell is alive, and it has 2 or 3 live neighbors, it stays alive.
If one cell is alive, and it has fewer than 2 live neighbors, it dies as if by starvation.
If one cell is alive, and it has greater than 3 live neighbors, it dies as if by overpopulation.
If one cell is dead, then it becomes alive if it has exactly 3 live neighbors, as if by reproduction.

I added a "rule probability" box, that sets the probability that a given rule will proceed for a given cell in the grid at a given iteration. I don't see this in standard game of life implementations, but biological life has a bit of randomness involved, so why not do the same for this? Aside from that, I have added the ability to modify the rules for the grid. This includes the ability to determine how may layers out you look for nearest neighbors. Note that when you tinker with these settings, most of the automata you get will either be too orderly or too chaotic. The Game of Life rules are a delicate balance between the two.

I have also added the ability to modify the size of the grid.

Mandelbrot Set

Web page.

I first came across this in one of my old math books, perhaps in middle school. I just thought of it as a strange cool looking thing at the time, but what I didn't appreciate until later was how simple it was to implement. This is a single html page, under 80 lines in total. Click on any point on the screen to zoom. Note that you do lose resolution if you zoom in long enough.

Other contributions

Former computational biologist, now guest researcher, at the German Rheumatism Research Center in Berlin, Germany. I will always stay connected to my academic roots.
Developing software to interrogate and visualize the local similarities between original manifolds and lower dimensional embeddings. Important for anyone wanting to determine which of these methods is the right tool for the job.
Solved a long-standing data visualization problem for mass cytometry, and developed a Bioconductor package for it, with a visual description here, and this publication…
Set the best practices in my PhD thesis lab for learning bioinformatics as a classical biologist.
Helped develop a wet-lab implementation for a cutting-edge bioinformatics concept, which became part of a patent.
Led an international collaboration between my thesis lab (USA) and a laboratory in Uppsala, Sweden, whose biochemical foundations turned into this publication, and helped this one.
Mentored several high school students and undergraduates throughout my thesis work, teaching them biology, computer science, statistics, and importantly how to integrate these fields.
Built a website for my PhD program using HTML, JavaScript, and SQL, that helped first year students connect with current and previous members of a given research laboratory (I'd link it but you have to be a Stanford student to view it).

Tyler Burns's Website

Table of Contents

About me

New

10,000 random research questions generated by a LLM

LLM-augmented PCA loadings

Popular

Single-cell analysis

LLM-augmented PCA loadings

CyTOF analysis pipeline

KNN sleepwalk examples

CyTOF PBMCs

single-cell RNA sequencing PBMCs

Natural language processing

10,000 random research questions generated by a LLM

(temporarily suspended) Gwasmap

Likes vs retweets

Philosophy and rationality

Computing