Notable Papers

  • Tao Hong and Shu Fan, "Probabilistic electric load forecasting: a tutorial review," International Journal of Forecasting, vol.32, no.3, pp. 914-938, July-September, 2016. (working paper, ScienceDirect)
  • Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, "Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond," International Journal of Forecasting, vol.32, no.3, pp. 896-913, July-September, 2016. (working paper, ScienceDirect, competition data)
  • Pu Wang, Bidong Liu and Tao Hong, "Electric load forecasting with recency effect: a big data approach," International Journal of Forecasting, vol.32, no.3, pp. 585-597, July-September, 2016. (working paper, ScienceDirect)
  • Tao Hong, Pu Wang and Laura White, "Weather station selection for electric load forecasting," International Journal of Forecasting, vol.31, no.2, pp. 286-295, April-June, 2015. (working paper, ScienceDirect)
  • Tao Hong, Pierre Pinson and Shu Fan, "Global energy forecasting competition 2012," International Journal of Forecasting, vol.30, no.2, pp. 357-363, April-June, 2014. (ScienceDirect, competition data)
  • Tao Hong, Jason Wilson and Jingrui Xie, "Long term probabilistic load forecasting and normalization with hourly information," IEEE Transactions on Smart Grid, vol.5, no.1, pp. 456-462, January, 2014. (IEEE Xplore)

A summary of my scholarly journal papers

I'm an evidence-based guy. I don't believe any methodology, technique or model is good until I see it working well in action, such as being adopted by the power companies, winning a competition, or beating others on benchmark data sets. Therefore, I'm a big advocate for forecasting competitions. I founded the Global Energy Forecasting Competition, and chaired all three competitions (GEFCom2012, GEFCom2014 and GEFCom2017) sponsored by the IEEE Power and Energy Society. After each competition, I worked with the other organizers to publish the findings. The introduction papers for GEFCom2012 and GEFCom2014 have been published already (Hong, Pinson and Fan, 2014; Hong, et al., 2016), while the one for GEFCom2017 is currently in preparation. I have also been encouraging my students to participate in those competitions. After the competitions, I sometimes worked with them to publish their winning methodologies. For instance, Jingrui Xie and I published her methodology used in the load forecasting track of GEFCom2014 (Xie and Hong, 2016).

Because the future is uncertain, a probabilistic forecast describes the uncertainties better than a point forecast can. I have made a big effort in advancing the state of the art of probabilistic load forecasting. GEFCom2014 had a track on probabilistic load forecasting (Hong, et al., 2016). My review paper (Hong and Fan, 2016) was on probabilistic load forecasting too, though one third of the review was on point load forecasting. The rationale was to try to leverage the point load forecasting literature when developing probabilistic load forecasting models. In (Hong, Wilson and Xie, 2014), we demonstrated the advantage of using hourly data to build point forecasting models for long term probabilistic load forecasting. We then conducted several further investigations into the probabilistic forecasting framework, such as simulating residuals (Xie, et al., 2017), generating temperature scenarios (Xie and Hong, in press), and selecting the underlying point forecasting models (Xie and Hong, in press). Note that the residual simulation method was part of Jingrui Xie's winning solution at GEFCom2014 (Xie and Hong, 2016; Hong, et al., 2016). We also invented another method to produce probabilistic load forecasts: applying quantile regression averaging (QRA) to a group of point forecasts (Liu, et al., 2017).
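
To give a feel for how residual simulation can turn a point forecast into a probabilistic one, here is a minimal Python sketch. It is not the exact procedure from the cited papers: it simply resamples past residuals at random (an assumption made for illustration), adds them to the point forecast, and reads quantiles off the simulated values. The function names `simulate_quantiles` and `pinball_loss` are hypothetical, and the pinball loss is included only as a common way to score a quantile forecast.

```python
import random


def simulate_quantiles(point_forecast, residuals, quantiles, n_draws=5000, seed=0):
    """Add resampled historical residuals to a point forecast and
    return the requested empirical quantiles of the simulated loads."""
    rng = random.Random(seed)
    # Each draw is the point forecast plus one randomly chosen past residual.
    draws = sorted(point_forecast + rng.choice(residuals) for _ in range(n_draws))
    result = {}
    for q in quantiles:
        # Map the quantile level to a position in the sorted draws.
        idx = min(n_draws - 1, max(0, int(q * n_draws)))
        result[q] = draws[idx]
    return result


def pinball_loss(actual, quantile_forecast, q):
    """Pinball (quantile) loss of one quantile forecast at level q."""
    if actual >= quantile_forecast:
        return q * (actual - quantile_forecast)
    return (1 - q) * (quantile_forecast - actual)


# Illustrative numbers only: a 100 MW point forecast and five past residuals.
forecast = simulate_quantiles(100.0, [-2.0, -1.0, 0.0, 1.0, 2.0], [0.1, 0.5, 0.9])
```

The resulting quantiles widen or narrow with the spread of the historical residuals, which is the basic idea behind building a probabilistic forecast on top of a point forecasting model.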

I have a genuine interest in meteorology and its applications in energy forecasting. Temperature and its variants, such as lags and summary statistics, are the most frequently used weather variables in the load forecasting literature. I borrowed the term "recency effect" from psychology to describe the effect of recent temperature on the load. I discovered that regression-based load forecasting models may be built in a "verbose" way by including hundreds of temperature variables, such as lagged temperatures, daily average temperatures and their higher order terms and interactions with calendar variables, so that a higher accuracy can be achieved (Wang, Liu and Hong, 2016). A byproduct of this recency effect paper is a set of so-called sister models, which share a similar structure but are produced through different model selection processes. We can take advantage of the forecasts from these sister models by combining them to further enhance the point forecast accuracy (Nowotarski, et al., 2016), or to generate probabilistic forecasts (Liu, et al., 2017). Another temperature-related discovery is that we can improve load forecast accuracy by leveraging a large number of weather stations, for which I invented a data-driven weather station selection methodology (Hong, Wang and White, 2015). I have also investigated other weather variables, such as relative humidity and wind speed (Xie, et al., 2018; Xie and Hong, 2018). The common finding is that using composite variables with predefined parameters embedded in the calculations, such as the temperature-humidity index and the wind chill index, is not as good as separating the individual components and letting the parameters come out of the estimation process.
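
The "verbose" temperature variables can be sketched in a few lines of Python. This is an illustration under my own simplifying assumptions rather than the model specification from the paper: the function name `recency_features`, the feature naming scheme and the choice of cubic polynomials are hypothetical, and the interactions with calendar variables are left out for brevity.

```python
def recency_features(temps, t, n_lags=2, n_days=1):
    """Build temperature features for hour t from an hourly temperature list.

    Includes the current temperature, n_lags lagged hourly temperatures and
    n_days daily average temperatures, each raised to powers 1, 2 and 3.
    """
    if t < max(n_lags, 24 * n_days):
        raise ValueError("not enough history before hour t")
    feats = {}

    def add_poly(name, value):
        # Higher order terms: value, value^2, value^3.
        for power in (1, 2, 3):
            feats[f"{name}^{power}"] = value ** power

    add_poly("T", temps[t])
    for h in range(1, n_lags + 1):
        add_poly(f"T_lag{h}", temps[t - h])
    for d in range(1, n_days + 1):
        # Average over the d-th most recent block of 24 hours before hour t.
        window = temps[t - 24 * d: t - 24 * (d - 1)]
        add_poly(f"T_avg_day{d}", sum(window) / len(window))
    return feats


# Illustrative data: 48 hours of made-up temperatures.
features = recency_features(list(range(48)), t=30, n_lags=2, n_days=1)
```

Increasing `n_lags` and `n_days`, and then interacting each term with calendar variables such as hour and month, is how the number of temperature variables can grow into the hundreds.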

Calendar information also plays an important role in load forecasting models, but calendar variables have not attracted as much attention as weather variables. Many people have been using the 12 months of the Gregorian calendar to capture the seasonal patterns that occur annually. I was wondering if a different calendar system would work better, such as the 24 solar terms from ancient China. Applying both calendars to the load of a U.S. region, we found that the 24 solar terms did lead to more accurate forecasts (Xie and Hong, 2018).

During the past few decades, the load forecasting field has been flooded with papers that report various techniques and their combinations or hybrids. Most of them have never been used in practice and never will be. Although I try not to have my papers on such a list, I do feel sorry for the readers, including myself, when I see a highly cited paper presenting a poor usage of some good techniques. Sometimes I can't help but write a paper to show how a technique should have been used. For instance, one of my first journal papers proposes an effective way of using fuzzy regression for short term load forecasting (Hong and Wang, 2014). Unfortunately, due to the difficulties in its implementation, this technique is unlikely to be adopted in practice.

Being a forecasting practitioner, I believe the ultimate place for my models to be is the production environment of the power companies. Thus, I'm very interested in working with my clients on their specific business problems. One of those problems is retail energy forecasting. Electricity retailers usually have a very volatile customer base, which significantly affects their long term demand. To resolve the issue, we dissect the retail energy forecasting problem into two subproblems: customer count forecasting and load per customer forecasting. The latter is no different from a traditional load forecasting problem. To forecast customer count, we brought in survival analysis (Xie, Hong and Stroud, 2015).
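
The two-part decomposition can be sketched as follows. This is an illustration only: the text above does not say which survival model was used, so the Kaplan-Meier estimator here is my stand-in as one standard choice, the function names are hypothetical, and new customer acquisition is ignored.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as {event_time: survival_probability}.

    durations: how long each customer has stayed (e.g. in months).
    observed:  True if the customer left at that time, False if still active
               (censored) when the data was extracted.
    """
    event_times = sorted({d for d, o in zip(durations, observed) if o})
    survival = 1.0
    curve = {}
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        left = sum(1 for d, o in zip(durations, observed) if o and d == t)
        survival *= 1 - left / at_risk
        curve[t] = survival
    return curve


def survival_at(curve, t):
    """Step-function lookup: survival probability at time t."""
    prob = 1.0
    for event_time in sorted(curve):
        if event_time <= t:
            prob = curve[event_time]
    return prob


def forecast_demand(n_customers, curve, horizon, load_per_customer):
    """Expected demand = surviving customer count x load per customer."""
    return n_customers * survival_at(curve, horizon) * load_per_customer


# Illustrative data: five customers, two of them still active (censored).
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
demand = forecast_demand(100, curve, horizon=2, load_per_customer=2.0)
```

In this sketch the load per customer is passed in as a number; in practice it would come from a conventional load forecasting model.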

I've never seen perfect data quality in real-world applications. Most of the time we have to clean up the data before running load forecasting models. Otherwise, the models have to be executed with the data quality issues. How do those load forecasting models work if the data is bad, very bad, or extremely bad? We performed a benchmark study to compare four representative load forecasting models under various data conditions (Luo, Hong, and Fang, 2018). We found that none of them can produce accurate forecasts when the data is severely attacked, though multiple linear regression and support vector regression models are more robust than artificial neural networks and fuzzy regression models. We also investigated anomaly detection, with a focus on the recent load values for very short term load forecasting (Luo, Hong, and Yue, 2018). Our proposed model-based anomaly detection method with an adaptive threshold outperforms its counterparts.
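
The general idea of model-based anomaly detection with an adaptive threshold can be sketched as below. This is not the method from the cited paper: the threshold rule (a multiple `k` of the median absolute residual over a sliding window of recent normal points) and all parameter names are assumptions chosen for illustration.

```python
import statistics


def detect_anomalies(actual, predicted, window=24, k=4.0, min_history=5,
                     min_scale=1e-6):
    """Flag load values whose model residual exceeds an adaptive threshold.

    The threshold is k times the median absolute residual of the most recent
    `window` points that were judged normal, so it adapts as the model's
    typical error changes.
    """
    flags = []
    history = []  # absolute residuals of recent points judged normal
    for y, y_hat in zip(actual, predicted):
        residual = abs(y - y_hat)
        if len(history) >= min_history:
            scale = max(statistics.median(history), min_scale)
            is_anomaly = residual > k * scale
        else:
            # Not enough history yet to set a threshold.
            is_anomaly = False
        flags.append(is_anomaly)
        if not is_anomaly:
            # Only normal points update the threshold, so a spike
            # does not inflate it.
            history.append(residual)
            if len(history) > window:
                history.pop(0)
    return flags


# Illustrative data: a flat model prediction and one corrupted load value.
predicted = [100] * 12
actual = [101, 99, 102, 98, 101, 99, 150, 101, 99, 102, 98, 100]
flags = detect_anomalies(actual, predicted)
```

Here only the seventh value (150) is flagged, since its residual of 50 is far above four times the typical residual of about 1.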