I’m thrilled to be writing about Nested Knowledge’s recent open source contribution, the R package “xrf”. We love analytics and rely on open source at all levels of our tech stack, so it felt natural to release this project to the data science community.

What is xrf?

xrf, an acronym for “eXtreme RuleFit”, is an R package for building predictive models. The RuleFit algorithm (described by Friedman and Popescu) is a method of fitting linear models with learned discretizations and interactions of the predictors.

xrf is “eXtreme” in that it uses the popular software “eXtreme Gradient Boosting” (XGBoost) to grow the tree ensemble from which its model terms are learned.
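For reference, the general RuleFit model from Friedman and Popescu's paper can be written as below. The notation is ours, and xrf's exact parameterization may differ. Each $r_k(\mathbf{x})$ is a 0/1 rule read off a tree: a product of indicators $I(x_s \in c_{sk})$ over the set $S_k$ of predictors the rule uses. Each $l_j(x_j)$ is a linear term for an original predictor; the paper uses winsorized versions of the raw predictors. The coefficients are estimated under an $\ell_1$ (lasso) penalty with strength $\lambda$, and the intercept $a_0$ is left unpenalized.

```latex
\begin{aligned}
F(\mathbf{x}) &= a_0 + \sum_{k=1}^{K} a_k \, r_k(\mathbf{x}) + \sum_{j=1}^{p} b_j \, l_j(x_j) \\
r_k(\mathbf{x}) &= \prod_{s \in S_k} I\left(x_s \in c_{sk}\right) \\
(\hat{a}, \hat{b}) &= \arg\min_{a,\, b} \; \sum_{i=1}^{N} L\big(y_i, F(\mathbf{x}_i)\big) + \lambda \Big( \sum_{k=1}^{K} \lvert a_k \rvert + \sum_{j=1}^{p} \lvert b_j \rvert \Big)
\end{aligned}
```

A rule built from a single condition, such as $I(\text{height} \geq 55)$, is a learned discretization. A rule combining conditions on several predictors is a learned interaction.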


Most readers will be familiar with the image of a least squares fit for bivariate data:

In this case, a straight line fit (blue) is reasonable because x and y show a strong linear association. However, much real-world data does not exhibit linear association. A contrived example: the amount of fun theme park attendees have as a function of their height:

In this example, the minimum height to ride is 55 inches; attendees below that height don’t get to ride and don’t have much fun. The straight line fit is no longer convincing. To capture this effect, a model can discretize rider height into two categories (< 55 and >= 55 inches). The model fit may then look like:

Discretization is the act of splitting a continuous variable into two or more discrete bins. xrf learns useful discretizations of predictors, often more complicated ones than in the example above.
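To make the contrast concrete, here is a small sketch on made-up heights and fun scores. The data is hypothetical and is not taken from the original figures. The sketch fits an ordinary least squares line and a two-bin model that uses a single indicator feature, I(height >= 55), then compares their sums of squared errors. The 55-inch cut point is hard-coded here, whereas xrf learns its cut points from trees.

```python
"""Compare a straight-line fit with a two-bin (discretized) fit.

The heights and "fun" scores are hypothetical illustrative data;
the 55-inch cut point comes from the theme park example.
"""
from statistics import mean

heights = [40, 44, 48, 52, 54, 56, 58, 62, 66, 70]  # inches (hypothetical)
fun = [2, 1, 2, 2, 1, 8, 9, 8, 9, 8]                # fun score (hypothetical)


def least_squares_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def two_bin_fit(xs, ys, cut):
    """Least squares with an intercept plus one indicator feature I(x >= cut).

    With a single 0/1 feature, the least-squares prediction in each bin
    is simply the mean of that bin.
    """
    below = mean(y for x, y in zip(xs, ys) if x < cut)
    above = mean(y for x, y in zip(xs, ys) if x >= cut)
    return lambda x: above if x >= cut else below


def sse(predict, xs, ys):
    """Sum of squared errors of a prediction function."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = least_squares_line(heights, fun)
line_sse = sse(lambda x: intercept + slope * x, heights, fun)
step_sse = sse(two_bin_fit(heights, fun, cut=55), heights, fun)

print(f"straight line SSE: {line_sse:.2f}")
print(f"two-bin SSE:       {step_sse:.2f}")
```

On this data the straight line's SSE is about 42.07, while the two-bin model's is 2.40. The jump at 55 inches is exactly what the line cannot represent.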

Variable Interaction

When the effect of one predictor depends on the values of one or more other predictors, those variables are said to interact. Continuing the example above, let’s add another predictor: whether the attendee has eaten ice cream. We’ll visualize it as color: green for attendees who had ice cream and red for those who did not:

Attendees who have ice cream and then go on rollercoasters get nauseous, which leads to having less fun. Attendees who don’t ride a rollercoaster tend to have more fun when they have ice cream because it’s delicious! In other words, the effect of ice cream on fun depends on whether the attendee was on a rollercoaster. Since an attendee’s height (at least 55 inches) determines whether they can ride a rollercoaster, height and ice cream consumption form an interaction. A model of this interaction effect may look like:

Together with discretizations, xrf discovers interaction effects. Although the example above is an interaction of two predictors, xrf may discover interactions among many more; the number of predictors in a single interaction is limited only by how deep the boosted trees are allowed to grow.
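The sketch below shows how an interaction can enter a linear model as its own 0/1 rule, "height >= 55 AND ate ice cream", alongside the single-variable rules. The riders and fun scores are hypothetical. All four height and ice cream combinations are observed, so a least-squares fit with an intercept and these three features reproduces the four cell means exactly. The code therefore reads the coefficients off the cell means instead of calling a regression library. This is an illustration of the idea, not xrf's fitting procedure.

```python
"""Encode a height x ice-cream interaction as a rule feature (sketch).

Hypothetical data only. Each rider is (height_inches, ate_ice_cream, fun).
"""
from statistics import mean

riders = [
    (48, False, 2), (50, False, 3),   # too short to ride, no ice cream
    (47, True, 5), (52, True, 4),     # too short to ride, ice cream
    (60, False, 8), (64, False, 9),   # rides coasters, no ice cream
    (58, True, 5), (66, True, 4),     # rides coasters after ice cream
]

CUT = 55  # minimum riding height from the example


def features(height, ate_ice_cream):
    """Return the 0/1 rule features a rule-based linear model could use."""
    tall = int(height >= CUT)
    ice = int(ate_ice_cream)
    # The interaction is itself a rule: height >= 55 AND ate ice cream.
    return {"tall": tall, "ice": ice, "tall_and_ice": tall * ice}


def cell_mean(tall, ice):
    """Average fun for one (tall, ice) combination."""
    return mean(f for h, a, f in riders
                if features(h, a)["tall"] == tall and features(h, a)["ice"] == ice)


# Saturated model: intercept + tall + ice + tall_and_ice fits the cell means exactly.
b0 = cell_mean(0, 0)
b_tall = cell_mean(1, 0) - b0
b_ice = cell_mean(0, 1) - b0
b_tall_and_ice = cell_mean(1, 1) - cell_mean(1, 0) - cell_mean(0, 1) + b0


def predict(height, ate_ice_cream):
    """Linear model over the rule features."""
    x = features(height, ate_ice_cream)
    return (b0 + b_tall * x["tall"] + b_ice * x["ice"]
            + b_tall_and_ice * x["tall_and_ice"])


print(b0, b_tall, b_ice, b_tall_and_ice)  # 2.5 6.0 2.0 -6.0
```

Here the ice cream effect is +2.0 for attendees too short to ride. For riders it is b_ice + b_tall_and_ice = -4.0. The interaction coefficient is what lets one linear model express both effects.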

Tree Learning

The magic behind xrf’s ability to learn interactions and discretizations is the decision tree. Without venturing into algorithmic details, a decision tree is a sequence of yes-or-no questions that leads to a final prediction. The model above can be encoded as a decision tree as follows:
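As a rough illustration, the theme park model can be written as a small tree of yes-or-no questions. Prediction follows the answers from the root down to a leaf. The `Node`/`Leaf` classes and the leaf values are hypothetical and chosen for this example; they are not xrf or XGBoost data structures.

```python
"""A decision tree as a sequence of yes/no questions (illustrative sketch).

The tree mirrors the hypothetical theme park example; the leaf values are
made-up "fun" predictions, not output from xrf or XGBoost.
"""
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    value: float  # the prediction returned at this leaf


@dataclass
class Node:
    question: str                 # human-readable yes/no question
    test: Callable[[dict], bool]  # answers the question for one attendee
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


tree = Node(
    "height >= 55?", lambda a: a["height"] >= 55,
    yes=Node("ate ice cream?", lambda a: a["ice_cream"], yes=Leaf(4.5), no=Leaf(8.5)),
    no=Node("ate ice cream?", lambda a: a["ice_cream"], yes=Leaf(4.5), no=Leaf(2.5)),
)


def predict(node, attendee):
    """Answer questions from the root down until a leaf is reached."""
    while isinstance(node, Node):
        node = node.yes if node.test(attendee) else node.no
    return node.value


print(predict(tree, {"height": 60, "ice_cream": False}))  # 8.5
```

The two questions asked on the path to a leaf are exactly a discretization (height >= 55) combined with a second predictor, which is why tree paths are a natural source of interaction rules.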

Under the hood, xrf uses tree-based gradient boosting, a decision tree ensembling approach, to fit many trees like this one to the input data. Discretizations and interactions are then extracted from the trees as rules and passed to a linear model.
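The extraction step can be sketched as a walk over each tree in which every node below the root yields a rule: the conjunction of conditions on the path to it. Each rule then becomes a 0/1 feature for the linear model. In this sketch the trees are plain nested dicts invented for the example, with the left branch meaning `feature < threshold`. This is an assumption for illustration; it is not xrf's parser for XGBoost output. In Friedman and Popescu's formulation, the resulting rule features would then go to a lasso-penalized linear fit, which is not shown here.

```python
"""Turn tree paths into 0/1 rule features (illustrative sketch).

Assumption: trees are nested dicts invented for this example, where the
left branch means `feature < threshold`. This is not xrf's actual code.
"""

# One small hypothetical tree: split on height, then on ice cream (0/1).
tree = {
    "feature": "height", "threshold": 55,
    "left": {"leaf": 2.5},
    "right": {
        "feature": "ice_cream", "threshold": 0.5,
        "left": {"leaf": 8.5},
        "right": {"leaf": 4.5},
    },
}


def extract_rules(node, conditions=()):
    """Return one rule (a tuple of conditions) per node below the root.

    Each condition is (feature, op, threshold). As in RuleFit, every
    non-root node yields a rule, not just the leaves.
    """
    rules = []
    if "leaf" in node:
        return rules
    f, t = node["feature"], node["threshold"]
    for child, op in ((node["left"], "<"), (node["right"], ">=")):
        rule = conditions + ((f, op, t),)
        rules.append(rule)
        rules.extend(extract_rules(child, rule))
    return rules


def rule_feature(rule, row):
    """1 if the row satisfies every condition in the rule, else 0."""
    return int(all(row[f] < t if op == "<" else row[f] >= t for f, op, t in rule))


rules = extract_rules(tree)
for r in rules:
    print(" AND ".join(f"{f} {op} {t}" for f, op, t in r))

# Each row becomes a vector of rule features for the downstream linear model.
row = {"height": 60, "ice_cream": 1}
print([rule_feature(r, row) for r in rules])  # [0, 1, 0, 1]
```

Single-condition rules such as `height >= 55` act as discretizations. Two-condition rules such as `height >= 55 AND ice_cream >= 0.5` act as interactions. Deeper trees produce rules with more conditions.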

In Action

At NK, we use xrf to predict patient outcomes across a variety of diseases, therapies, and background characteristics. Most recently, we have been focused on predicting the impact of thrombectomy devices on neurologic outcomes in ischemic stroke. xrf is a particularly desirable tool because it:

We use these characteristics to extract meaningful insights from our data for clinicians and researchers. These insights either inform the visualizations we develop or become a data presentation in their own right (e.g., variable importance).
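As one example of such a presentation, Friedman and Popescu define a rule's importance as the magnitude of its coefficient times the rule's standard deviation, |a_k| * sqrt(s_k * (1 - s_k)). Here s_k is the fraction of observations the rule covers. A variable's importance can then be accumulated by splitting each rule's importance equally among the variables the rule uses. The sketch below applies this to hypothetical rules and coefficients. It omits linear-term importance, and it is not NK's or xrf's code; xrf may report importance differently.

```python
"""Rule and variable importance in the style of Friedman & Popescu (sketch).

Rules, coefficients, and supports below are hypothetical. A rule's importance
is |coefficient| * sqrt(s * (1 - s)), where s is the fraction of rows it
covers; each rule's importance is split equally among the variables it uses.
Linear-term importance is omitted for brevity.
"""
from collections import defaultdict
from math import sqrt

# (variables used by the rule, fitted coefficient, support s); all made up
rule_terms = [
    (("height",), 6.0, 0.5),
    (("ice_cream",), 2.0, 0.5),
    (("height", "ice_cream"), -6.0, 0.25),
]


def rule_importance(coef, support):
    """|a_k| times the standard deviation of a 0/1 rule with this support."""
    return abs(coef) * sqrt(support * (1 - support))


def variable_importance(terms):
    """Split each rule's importance equally among the variables it mentions."""
    totals = defaultdict(float)
    for variables, coef, support in terms:
        share = rule_importance(coef, support) / len(variables)
        for v in variables:
            totals[v] += share
    return dict(totals)


print(variable_importance(rule_terms))
```

Scaling by the rule's standard deviation keeps a rule with a large coefficient but tiny support from dominating the ranking.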

A Continued Culture of OSS at Nested Knowledge

At NK, open source is our bread and butter. The prominence of high-quality OSS, like Semiotic for visualization and Postgres for data management, makes our engineering team highly productive. As we continue to build our products on open source, we strive to contribute all our bug fixes, documentation updates, and new features back in kind.