# Funnel Charts in R with funneljoin - Gaps and Deadlines

## funneljoin

In my previous blog post, I explored the awesome funneljoin package. Check that post out for an overview of the package and some examples.

In this post, I’m going to investigate business problems that can’t be modeled with gaps, and how we can work around that.

Gap

• if the next event must happen within a window of time, use gaps.

Example:

if a customer clicks on an ad, and then purchases the product 30 days later the company may not want to attribute that purchase to the ad.

• if the next event must occur before a specific time, use deadlines.

Examples:

Turbo Tax wants to know the conversion rate between uploading your W-2s and filing your taxes. But they only care if you file your taxes before April 15.

A football coaches need to print off opponent reports before their upcoming game.

In the last blog post, I showed how funneljoin supports gaps. Let’s figure out how we can model deadlines using funneljoin.

library(dplyr)
library(tidyr)
library(purrr)
library(funneljoin)
library(lubridate)

logs
## # A tibble: 1,759 x 3
##    date        user event
##    <date>     <int> <chr>
## # … with 1,749 more rows

We want to separate the logs based on their deadline, which is the week of the upload. We can do that with lubridate::floor_date and tidyr::nest.

nested_events <- logs %>%

nested_events
## # A tibble: 4 x 2
##   <date>     <S3: vctrs_list_of>
## 1 2020-01-05 18267 , 18267 , 18267 , 18267 , 18267 , 18267 , 18267 , 18267…
## 2 2020-01-12 18274 , 18274 , 18274 , 18274 , 18274 , 18274 , 18274 , 18274…
## 3 2020-01-19 18281 , 18281 , 18281 , 18281 , 18281 , 18281 , 18281 , 18281…
## 4 2020-01-26 18288 , 18288 , 18288 , 18288 , 18288 , 18288 , 18288 , 18288…

This gives us a set of events for each deadline period.

Now, we want to apply the funnel and summary for the events in each deadline period. We’ll use purrr::map to calculate the conversions for each set of events.

customer_funnel <- function(.data) {
.data %>%
funnel_start(
moment = "event",
tstamp = "date",
user = "user"
) %>%
funnel_steps(
moment_types = c("submit", "print"),
type = "any-firstafter",
)
}
nested_events %>%
mutate(conversions = map(
events,
~customer_funnel(.) %>%
summarize_conversions(date_print)
)) %>%
unnest(conversions) %>%
select(-events)
## # A tibble: 4 x 4
##   <date>        <int>          <int>         <dbl>
## 1 2020-01-05      300            132         0.44
## 2 2020-01-12      250            184         0.736
## 3 2020-01-19      174            150         0.862
## 4 2020-01-26      118            118         1

In this case, the conversion percentages are the same whether we use gaps or deadline.

However, this isn’t real data. The simulated data I created doesn’t cover every scenario. With real data, we need to carefully think if we want gaps or deadlines.

## What I’d like to see

As much as I love funneljoin, I think the

nest -> funnel -> summarise -> unnest

workflow is a little clunky.

I think it would be cool if funnel_start respected a grouped data frame, when looking for the next step in the funnel.

For example, we can pass a grouped data frame to funnel_start:

logs %>%
mutate(week = week(date)) %>%
group_by(week) %>%
funnel_start(
moment = "event",
tstamp = "date",
user = "user"
) %>%
funnel_steps(
moment_types = c("submit", "print"),
type = "any-firstafter"
) %>%
select(date_upload, date_print)
## # A tibble: 5,476 x 3
##              <dbl> <date>      <date>
##  1               1 2020-01-06  2020-01-24
##  2               1 2020-01-06  2020-01-31
##  3               1 2020-01-06  2020-01-24
##  4               1 2020-01-06  2020-01-31
##  5               1 2020-01-06  2020-01-24
##  6               1 2020-01-06  2020-01-31
##  7               1 2020-01-06  2020-01-24
##  8               1 2020-01-06  2020-01-31
##  9               1 2020-01-06  2020-01-24
## 10               1 2020-01-06  2020-01-31
## # … with 5,466 more rows

But funneljoin finds the firstafter print date outside of the week of the upload (the group parameter for the data-frame).

I want to find the firstafter inside the week of the upload (the group parameter).

## Conclusion

Gaps or deadlines depend on the business we are trying to model. While gaps are very easy to use with funneljoin, we can still analyze conversions in a deadline-based funnel with a little extra work.