Chloe Hsu

Predicting Retail Gas Price Trend

Consider the following scenario. Suppose I drive about the same distance every week and I have enough gas left in the tank for another week. If I know the gas price is going down next week, I could save some money by waiting till next week to fill up the tank.

With this motivation, it’d be nice to predict retail gas price trends. Compared to crude oil prices, retail gas prices change more slowly (they have larger autocorrelation), so they might be easier to predict.

This project compares three models for predicting the weekly trend of retail gas prices:

  1. Multivariate Rolling Regression
  2. ARIMA
  3. Logistic Regression

The predictors include crude oil and gasoline spot prices, as well as crude oil and gasoline stocks. All data are from the U.S. Energy Information Administration.
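To make "weekly trend" concrete, here is a minimal sketch of how a weekly price series could be turned into log returns and up/down labels, the kind of target a trend classifier would learn. The function names and the price values are hypothetical illustrations, not taken from the notebooks or from EIA data, and counting a flat week as "not up" is an assumption.

```python
import math

def log_returns(prices):
    """Week-over-week log returns: r_t = ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def trend_labels(returns):
    """Label each week 1 if the price rose, 0 otherwise (flat counts as 0)."""
    return [1 if r > 0 else 0 for r in returns]

# Hypothetical weekly retail prices in $/gallon (illustrative only).
prices = [3.50, 3.55, 3.52, 3.52, 3.60]
returns = log_returns(prices)
labels = trend_labels(returns)
print(labels)
```

A regression-style model would predict the values in `returns`, while a classifier would predict the values in `labels` directly.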

Model Comparison:

Logistic regression is the best of the three models at classifying the retail gas price trend. Under reasonable assumptions, an average driver in Los Angeles would have saved about $65 over 2011-2016 by following its predictions.
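The assumptions behind the $65 figure are not stated in this summary. As a rough sketch of the arithmetic only, suppose the driver delays a fill-up by one week whenever the model predicts a price drop, and buys a fixed number of gallons per fill. The function name, prices, predictions, and gallon count below are all hypothetical.

```python
def estimated_savings(prices, predicted_down, gallons_per_fill):
    """Total saved (negative if lost) by delaying a fill-up one week
    whenever a price drop is predicted.

    prices[t] is the price in week t; predicted_down[t] is the model's
    guess, made in week t, that prices[t + 1] will be lower.
    """
    total = 0.0
    for week, wait in enumerate(predicted_down):
        if wait:
            # Buying next week instead of this week saves the price difference.
            total += gallons_per_fill * (prices[week] - prices[week + 1])
    return total

# Hypothetical prices in $/gallon and hypothetical predictions for weeks 0-2.
prices = [3.50, 3.40, 3.45, 3.30]
predicted_down = [True, True, False]
savings = estimated_savings(prices, predicted_down, gallons_per_fill=10)
print(round(savings, 2))
```

In this toy run the first prediction is right (saving about $1.00) and the second is wrong (losing about $0.50), so the net is about $0.50. Wrong predictions cost money, so a model's accuracy bounds how much can be saved.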

Table of Contents in Jupyter Notebooks

  1. Multivariate Rolling Regression Model for National Retail Gas Price
  • Set-up
  • Differentiation
  • Predictors
    • Correlation with Predictors
    • Select Predictors by LARS Path
    • Correlation between Predictors
    • Selected Predictors
  • Rolling Regression
    • Change of Regression Coefficients over time
  • Test in Cross-Validation Period
    • Metric 1: Correlation of Predicted and Actual Log Return
    • Metric 2: Prediction of Price Trend
  2. ARIMA Model for National Retail Gas Price
  • Set-up
  • Differentiation
  • Autocorrelation
  • Partial Autocorrelation
  • Model Choice: ARIMA(3,1,0)
  • Fit Model
  • Test in Cross-Validation Period
    • Metric 1: Correlation of Predicted and Actual Log Return
    • Metric 2: Prediction of Price Trend
  • Comparison to Multivariate Rolling Regression
  3. Logistic Regression Model for National Retail Gas Price Trend
  • Set-up
  • Binary Classification Problem
  • Features
  • Logistic Regression Path
  • l2 Regularization Parameter
  • Test in Cross-Validation Period
    • Prediction
    • Recall
    • Accuracy
  • Comparison to Multivariate Rolling Regression and ARIMA
  4. Saving Money from Predicting Los Angeles Gas Price Trend
  • Set-up
  • Correlation between National and Local Gas Price Move Direction
  • Logistic Regression Model for Local Price Move
    • Features
    • Logistic Regression Path
    • l2 Regularization Parameter
  • Test in Cross-Validation Period
    • Prediction
    • Recall
    • Accuracy
  • How Much Money Can I Save?
    • Is it even possible to do much better?

Python Jupyter notebooks on GitHub