As paid search marketers, we’re always looking for better ways to give our clients the best analytics and the most accurate forecasting for their KPIs and business goals. There are many approaches to forecasting; however, we’ve found regression analysis to be consistently effective. In this article, we will take a closer look at simple linear regression, which lets us find and study the relationship between two variables: one dependent and one independent. We’ll also look at how the historical relationship between the two variables can help predict future values. I promise not to get too technical!
Let’s take one of our clients that advertises on TV and also leverages paid search online as an example. They are interested in a few things:
1) Is there any relationship between how much they spend on TV advertising and their paid search brand impressions?
2) If there is a relationship, how strong is it?
3) How much of an increase in paid search brand impressions can be expected if TV advertising spend increases by X?
In this example, TV advertising is the independent variable and paid search brand impressions is the dependent variable. In other words, TV advertising doesn’t depend on paid search brand impressions; however, paid search brand impressions depend on TV advertising. To answer the questions above, I’ve created a scatter plot in Excel using sample data:
Before reading too much into the chart, let’s first identify and explain what each element is:
- y = 14.739x - 15667 (the statistical model that predicts how the independent variable affects the dependent variable)
  - y = dependent variable (brand impressions)
  - 14.739 = slope of the line
  - x = independent variable (TV advertising)
  - -15667 = y-intercept (where the line crosses the y-axis, i.e. the predicted value of y when x = 0)
- Blue squares
  - “Observation” points: the actual historical data
- Black ascending line
  - The regression (best-fit) line; its height at any value of x is the model’s predicted value of y
- R² tells us how well the regression line fits the data
  - R² ranges from 0 to 1. An R² close to 1 means the line fits the data well and the model is a good one to use. An R² close to 0 means the line fits the data poorly and the model would be unreliable.
  - The closer the observations (blue squares) are to the line, the higher R² is and the more accurate the model will be
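Excel’s linear trendline is an ordinary least-squares fit. As a minimal sketch, assuming plain Python and made-up numbers (not the sample data behind the chart), here is how the slope, y-intercept and R² defined above are calculated. `fit_line` and `r_squared` are hypothetical helper names used only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, NOT the article's sample data:
# TV spend in dollars and paid search brand impressions for six periods.
tv_spend = [2000, 3000, 4000, 5000, 6000, 7000]
impressions = [14000, 28000, 45000, 57000, 75000, 88000]

slope, intercept = fit_line(tv_spend, impressions)
r2 = r_squared(tv_spend, impressions, slope, intercept)
print(f"y = {slope:.3f}x {intercept:+.0f}, R^2 = {r2:.3f}")
```

On these invented numbers the fit comes out to a slope of roughly 14.9 and an R² above 0.99, because the points sit very close to a straight line; noisier data would pull R² down toward 0.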
Hopefully I haven’t lost you yet! Now that all of the definitions are laid out, we can answer the questions posed earlier.
1) Yes, there is definitely a relationship between how much is spent on TV advertising and paid search brand impressions. How do we know? Read answer number 2!
2) The relationship between the two variables in this model is very strong. As we saw, R² is 0.93, which means that about 93% of the variation in paid search brand impressions is explained by the model (that is, by TV advertising spend).
3) We can use the equation at the top of the chart to estimate the incremental lift in paid search brand impressions as TV advertising spend changes.
- y = 14.739x - 15667: for every additional $1 spent on TV advertising, the client can expect roughly 14.7 (about 15) additional brand impressions. For example, if the client spends $5,000 on TV advertising, they could expect about 58,000 impressions (14.739 × 5,000 - 15,667 ≈ 58,028).
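To make answer 3 concrete, here is a small sketch, assuming plain Python, that plugs the trendline coefficients from the chart into the model. `predict_impressions` and `incremental_lift` are hypothetical helper names, not part of Excel or any library.

```python
# Coefficients of the trendline shown in the chart.
SLOPE = 14.739       # extra brand impressions per additional $1 of TV spend
INTERCEPT = -15667   # y-intercept of the fitted line

def predict_impressions(tv_spend):
    """Predicted paid search brand impressions for a given TV spend (in $)."""
    return SLOPE * tv_spend + INTERCEPT

def incremental_lift(current_spend, new_spend):
    """Change in predicted impressions when TV spend moves between two levels."""
    # The intercept cancels out, so the lift is just SLOPE * change in spend
    return predict_impressions(new_spend) - predict_impressions(current_spend)

print(round(predict_impressions(5000)))      # 58028
print(round(incremental_lift(5000, 6000)))  # 14739
```

Because the intercept is negative, the line predicts negative impressions for TV spend below about $1,063 (15,667 / 14.739), a reminder that the equation is only meaningful within the range of spend levels it was fit on.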
There are a ton of other great ways to apply regression analysis to paid search, and it isn’t limited to a single variable! We will follow up with a look at multiple linear regression, where several independent variables affect the dependent variable. For example:
- How a coupon dropped on Monday at 9AM in December affects conversions and revenue
- How quality score and max bid affect position
- How TV advertising on Wednesday at noon will affect impressions and clicks
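As a rough preview of multiple linear regression, here is a minimal sketch, assuming plain Python and two made-up predictors (quality score and max bid, with invented numbers), that fits y = b0 + b1*x1 + b2*x2 by solving the least-squares normal equations. `solve` and `fit_multiple` are hypothetical helper names; for real work, Excel’s LINEST function (which accepts several x columns) or a statistics library would be the more robust choice.

```python
def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        beta[i] = (m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))) / m[i][i]
    return beta

def fit_multiple(rows, ys):
    """OLS fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 provides the intercept b0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (quality score, max bid in $) -> average position
rows = [(5, 1.00), (6, 1.50), (7, 1.20), (8, 2.00), (9, 1.80), (7, 2.50)]
position = [4.1, 3.2, 3.0, 1.9, 1.7, 1.8]
b0, b_qs, b_bid = fit_multiple(rows, position)
print(f"position = {b0:.2f} + {b_qs:.2f}*QS + {b_bid:.2f}*bid")
```

Each coefficient estimates the effect of one variable while holding the others fixed, which is what lets multiple regression separate, say, the impact of quality score from the impact of max bid.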
For more information regarding regression analysis, I’d recommend checking out the Wikipedia page.
There are also some great introductory videos from a YouTuber.