The Problem

First, a concise definition: Given a set of hyperrectangles, find any set of hyperrectangles which occupy the same space with no overlap & contain at least all the original boundaries. A hyperrectangle is the space defined by the cartesian product of ranges. With less jargon, axis-aligned rectangles/prisms/volumes in arbitrary dimensions.

Take an example - 2 dimension rectangles with some overlap:

Here’s a solution:

This is a valid solution because:

Motivation

Before describing a solution to this (possibly vacuous seeming) problem, I’d like to motivate it with an example.

The RuleFit algorithm (my implementation here) is a predictive modeling method. At a very high level, its purpose is to produce a set of conjunctive ranges on the predictor set (imagine them as a set of real valued vectors) which explain variation in the response vector. Take for example a predictor set of {age, weight, height} with a response of jumping height (i.e. we’re predicting how high someone can jump). One might imagine that lower age, lower weight, and larger height produce larger jumping height, and vice versa; RuleFit may pick up on such a pattern & produce, for example, the following set of “rules”:

  • {height:[130-200] & weight:[50-60] & age:[10-25]}
  • {height:[175-225] & weight:[55-75] & age:[20-35]}
  • {height:[100-250] & weight:[70-100] & age:[45-55]}
  • etc.

with the idea that a person qualifying for one or more of these rules provides useful information for predicting jumping height. In reality, each rule is assigned a real value (“effect”) which are summed up to produce the jumping height prediction.

Notice that many of these ranges are overlapped, which means a person may qualify for many rules. That makes interpretation & inference somewhat challenging, since we cannot longer can glance at a single rule as a single “nugget” of information. The practitioner is forced to contextualize rules & compute the high-dimensional intersections, which a human mind (or at least mine) isn’t good at. Wouldn’t it be great if these rules were disjoint, so that a person can only qualify for one rule? Solving the overlapped hyperrectangle problem does just that.

(Note: I’ve made some over-simplifications of RuleFit and encourage you to check out the above link if you’re interested.)

Solution

A conceptually simple solution is to:

To prove it’s a solution:

Without ado, we step into some code.

Representation

Hyperrectangles will be represented in dimension-range format. In this representation, an n-dimensional hyperrectangle consists of n dimension-ranges. For this exercise, hyperrectangles are given an id and all placed into one data frame. Here we define and visualize a 2 dimensional example:

example_2d <- data.frame(
    dimension = rep(c('x','y'), 3), 
  volume_id = rep(1:3, each=2), 
  min = c(1,1,
          2,2,
          2.5,3), 
  max = c(3,5,
          5,5,
          4,7)
) 

Extending Boundaries

This is as simple as picking “dimension=value” as a fixed point (while letting all other dimensions be free) for all range bounds.

build_fully_partitioned_space <- function(volumes) {
  volumes %>% 
    mutate(bound = min) %>%
    select(dimension, bound) %>%
    rbind(
      volumes %>% 
        mutate(bound = max) %>%
        select(dimension, bound),
      stringsAsFactors = FALSE
    )
}

Visualizing our example:

Recreating Hyperrectangles

The next step is to take the orthogonal hyperplanes from the above step and form then into hyperrectangles abutted against one another.

generate_volumes_from_partitioned_space <- function(partitioned_space, id_starter = 1) {
  if (nrow(partitioned_space) == 0) {
    return(data.frame())
  }
  
  # pick an arbtirary first dimension
  dimension_of_interest <- partitioned_space$dimension[1]
  dimension_bounds <- partitioned_space %>% 
    filter(dimension == dimension_of_interest) %>%
    # this is a small optimization - equal bounds are redundant
    distinct() %>%
    arrange(bound)
  
  # there should always be 2 or more, since each bound corresponds to hyperrectangle edge
  stopifnot(nrow(dimension_bounds) > 1)
  
  # subspace meaning everything outside the dimension of interest
  partitioned_subspace <- partitioned_space %>% filter(dimension != dimension_of_interest)
  # recursively build ranges from the subspace before tacking on ranges for the dimension of interest in this stack frame
  subspace_volumes <- generate_volumes_from_partitioned_space(partitioned_subspace, id_starter = id_starter)
  
  # "expanded" by the dimension of interest, that is 
  expanded_volumes <- data.frame()
  for (bound_ix in 1:(nrow(dimension_bounds) - 1)) {
    # note that we are iterating on the sorted bounds
    lower_bound <- dimension_bounds$bound[bound_ix]
    upper_bound <- dimension_bounds$bound[bound_ix + 1]
    
    if (nrow(subspace_volumes) == 0) {
      # case this is the first dimension - there's nothing to add onto
      volume_id <- paste0(id_starter, '_', dimension_of_interest, '_', bound_ix)
      new_dimension_bounds <- list(volume_id = volume_id, 
                                   min = lower_bound, 
                                   max = upper_bound,
                                   dimension = dimension_of_interest)
    }
    else {
      # case this is after the first dimension - create a new volume for each subspace volume with the new bounds added (cartesian product)
      new_dimension_bounds <- lapply(unique(subspace_volumes$volume_id), function(volume_id) {
        list(volume_id = paste0(volume_id, '_', dimension_of_interest, '_', bound_ix), # TODO this form of creating an ID could get costly in higher dimensions
             min = lower_bound, 
             max = upper_bound,
             dimension = dimension_of_interest)
      }) %>% bind_rows() %>%
        rbind(subspace_volumes %>%
                mutate(volume_id = paste0(volume_id, '_', dimension_of_interest, '_', bound_ix)),
              stringsAsFactors= FALSE)
    }
    
    expanded_volumes <- rbind(expanded_volumes, new_dimension_bounds,
                              stringsAsFactors = FALSE)
  }
  
  return(expanded_volumes)
}

Visualizing our results (with some difficulty, there’s a lot of colored rectangles):