Publications
Notable Papers
- Tao Hong and Shu Fan, "Probabilistic electric load forecasting: a tutorial review," International Journal of Forecasting, vol.32, no.3, pp 914-938, July-September, 2016. (working paper; ScienceDirect)
- Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, "Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond," International Journal of Forecasting, vol.32, no.3, pp 896-913, July-September, 2016. (working paper; ScienceDirect; competition data)
- Pu Wang, Bidong Liu and Tao Hong, "Electric load forecasting with recency effect: a big data approach," International Journal of Forecasting, vol.32, no.3, pp 585-597, July-September, 2016. (working paper; ScienceDirect)
- Tao Hong, Pu Wang and Laura White, "Weather station selection for electric load forecasting," International Journal of Forecasting, vol.31, no.2, pp 286-295, April-June, 2015. (working paper; ScienceDirect)
- Tao Hong, Pierre Pinson and Shu Fan, "Global energy forecasting competition 2012," International Journal of Forecasting, vol.30, no.2, pp 357-363, April-June, 2014. (ScienceDirect; competition data)
- Tao Hong, Jason Wilson and Jingrui Xie, "Long term probabilistic load forecasting and normalization with hourly information," IEEE Transactions on Smart Grid, vol.5, no.1, pp.456-462, January, 2014. (IEEE Xplore)
A summary of my scholarly journal papers
I'm an evidence-based guy. I don't believe any methodology, technique or model is good until I see it working well in action, such as being adopted by industry organizations, commercialized by vendors, winning competitions, or beating others on benchmark datasets. Therefore, I'm a big advocate for forecasting competitions. I founded the Global Energy Forecasting Competition, and chaired all three (GEFCom2012, GEFCom2014 and GEFCom2017) sponsored by IEEE Power and Energy Society. After each competition, I worked with the other organizers to publish the findings. The introduction papers for GEFCom2012 and GEFCom2014 have been published already (Hong, Pinson and Fan, 2014; Hong, et al., 2016), while the one for GEFCom2017 is currently under construction. I have also been encouraging my students to participate in those competitions. After the competitions, I sometimes worked with them to publish their winning methodologies. For instance, Jingrui Xie and I published her methodology used in the load forecasting track of GEFCom2014 (Xie and Hong, 2016).
Because the future is uncertain, a probabilistic forecast can describe the uncertainties better than a point forecast can. I have made a big effort in advancing the state of the art of probabilistic load forecasting. GEFCom2014 had a track on probabilistic load forecasting (Hong, et al., 2016). My review paper (Hong and Fan, 2016) was on probabilistic load forecasting too, though one third of the review was on point load forecasting. The rationale was to try to leverage the point load forecasting literature when developing probabilistic load forecasting models. In (Hong, Wilson and Xie, 2014), we demonstrated the advantage of using hourly data to build point forecasting models for long term probabilistic load forecasting. We then conducted several further investigations into the probabilistic forecasting framework, such as simulating residuals (Xie, et al., 2017), generating temperature scenarios (Xie and Hong, TSG 2018 May), selecting the underlying point forecasting models (Xie and Hong, 2018 Nov), and combining probabilistic load forecasts (Wang, et al., 2018 in press). Note that the residual simulation method was part of Jingrui Xie's winning solution at GEFCom2014 (Xie and Hong, 2016; Hong, et al., 2016). We also invented another method to produce probabilistic load forecasts: applying quantile regression averaging (QRA) to a group of point forecasts (Liu, et al., 2017).
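To make the idea of turning a point forecast into a probabilistic one concrete, here is a minimal sketch in which past residuals of a point model are added back to a new point forecast and read off as quantiles. It is an illustration only, not the exact procedure of the cited papers; the function names `empirical_quantile` and `residual_quantile_forecast` and the toy numbers are assumptions made for this example.

```python
def empirical_quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def residual_quantile_forecast(point_forecast, past_residuals, quantiles):
    """Shift the empirical residual quantiles by the point forecast.

    Assumption for this sketch: future errors behave like past residuals
    (actual minus forecast) of the same point forecasting model.
    """
    res = sorted(past_residuals)
    return {q: point_forecast + empirical_quantile(res, q) for q in quantiles}


# Hypothetical residuals (MW) and a hypothetical point forecast of 100 MW.
forecast = residual_quantile_forecast(100.0, [-2.0, -1.0, 0.0, 1.0, 2.0], [0.25, 0.5, 1.0])
```

In this toy case the median of the resulting forecast equals the point forecast because the residuals are symmetric around zero; with skewed residuals the quantiles shift accordingly.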
I have a genuine interest in meteorology and its applications in energy forecasting. Temperature and its variants, such as lags and summary statistics, are the most frequently used weather variables in the load forecasting literature. I borrowed the term "recency effect" from psychology to describe the effect of recent temperature on the load. I discovered that regression-based load forecasting models may be built in a "verbose" way by including hundreds of temperature variables, such as lagged temperatures, daily average temperatures and their higher order terms and interactions with calendar variables, so that a higher accuracy can be achieved (Wang, Liu and Hong, 2016). A byproduct of this recency effect paper is a set of so-called sister models, which are built with a similar structure but obtained through different model selection processes. We can take advantage of these forecasts by combining them to further enhance the point forecast accuracy (Nowotarski, et al. 2016), or to generate probabilistic forecasts (Liu, et al., 2017). Another temperature-related discovery is that we can improve load forecast accuracy by leveraging a large number of weather stations, for which I invented a data-driven weather station selection methodology (Hong, Wang and White, 2015). I have also investigated other weather variables, such as relative humidity and wind speed (Xie, et al., 2018; Xie and Hong, SU 2018). The common finding is that using composite variables with predefined parameters embedded in the calculations, such as the temperature-humidity index and the wind chill index, is not as good as separating the individual components and letting the parameters come out of the estimation process.
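As a rough illustration of the temperature variables behind the recency effect, the sketch below builds lagged hourly temperatures and averages over preceding 24-hour blocks for a single hour. The function name `recency_features`, the feature names and the window definition are assumptions for this example; the higher order terms and the interactions with calendar variables mentioned above are left out.

```python
def recency_features(temps, t, n_lags, n_days):
    """Temperature features for hour index t of an hourly series `temps`.

    Returns the current temperature, the previous `n_lags` hourly
    temperatures, and the mean temperature of each of the previous
    `n_days` 24-hour blocks (block 1 covers hours t-24 .. t-1).
    """
    if t < max(n_lags, 24 * n_days):
        raise ValueError("not enough history before hour t")
    feats = {"T": temps[t]}
    for k in range(1, n_lags + 1):
        feats[f"T_lag{k}"] = temps[t - k]
    for d in range(1, n_days + 1):
        window = temps[t - 24 * d : t - 24 * (d - 1)]
        feats[f"T_avg_day{d}"] = sum(window) / 24
    return feats


# Hypothetical hourly temperatures: 0, 1, 2, ... for three days.
features = recency_features(list(range(72)), t=48, n_lags=2, n_days=2)
```

Each returned value would become one regressor; choosing how many lags and daily averages to include is a model selection question that this sketch does not address.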
Calendar information also plays an important role in load forecasting models, but calendar variables have not attracted as much attention as weather variables. Many people have been using the 12 months of the Gregorian calendar to capture the seasonal patterns that occur annually. I wondered whether a different calendar system would work better, such as the 24 solar terms from ancient China. Applying both calendars to the load of a U.S. region, we found that the 24 solar terms did lead to more accurate forecasts (Xie and Hong, MPCE 2018).
During the past few decades, the load forecasting field has been flooded with papers that report various techniques and their combinations or hybrids. Most of them have never been used in practice and never will be. Although I try not to have my papers on such a list, I do feel sorry for the readers, including myself, when I see a highly cited paper presenting a poor usage of some good techniques. Sometimes I can't help but write a critique to show how a technique should have been used. For instance, one of my first journal papers proposes an effective way of using fuzzy regression for short term load forecasting (Hong and Wang, 2014). Unfortunately, due to the difficulties in its implementation, this technique is unlikely to be adopted in practice.
Being a forecasting practitioner, I believe the ultimate place for my models to be is the production environment of the power companies. Thus, I'm very interested in working with my clients on their specific business problems. One of those problems is retail energy forecasting. Electricity retailers usually have a very volatile customer base, which significantly affects their long term demand. To resolve the issue, we dissect the retail energy forecasting problem into two subproblems: customer count forecasting and load per customer forecasting. The latter is no different from a traditional load forecasting problem. To forecast customer count, we brought in survival analysis (Xie, Hong and Stroud, 2015).
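The decomposition can be sketched as expected retail load = expected customer count × load per customer. The example below uses a plain Kaplan-Meier survival estimate for the customer count part. That choice is an assumption for illustration (the paragraph above only says that survival analysis was used), and the function names and toy tenure data are hypothetical.

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival estimate at each observed churn time.

    durations: tenure of each customer (e.g. in months).
    churned:   True if the customer left at that tenure, False if the
               customer was still active when observation ended (censored).
    """
    times = sorted({d for d, c in zip(durations, churned) if c})
    surv, s = {}, 1.0
    for t in times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, churned) if c and d == t)
        s *= 1 - events / at_risk
        surv[t] = s
    return surv


def expected_retail_load(n_new_customers, survival_prob, load_per_customer):
    """Expected load from a cohort: surviving customers times load per customer."""
    return n_new_customers * survival_prob * load_per_customer


# Hypothetical tenures (months) and churn flags for four past customers.
surv = kaplan_meier([1, 2, 2, 3], [True, True, False, True])
# Expected load two months out for 1000 new customers at 1.2 MWh each.
load = expected_retail_load(1000, surv[2], 1.2)
```

Keeping the two factors separate means the load per customer part can reuse any conventional load forecasting model, while the customer count part can be refined independently.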
I've never seen perfect data quality in real-world applications. Most of the time we have to clean up the data before running load forecasting models. Otherwise, the models have to be executed with the data quality issues. How do those load forecasting models work if the data is bad, very bad, or extremely bad? We performed a benchmark study to compare four representative load forecasting models under various data conditions (Luo, Hong, and Fang, 2018). We found that none of them can produce accurate forecasts when the data is severely attacked, though multiple linear regression and support vector regression models are more robust than artificial neural networks and fuzzy regression models. Then what can we do if the data is really bad? One way is to apply robust models. We tested three robust regression models and found that they outperformed the same underlying models built with conventional load forecasting techniques (Luo, Hong, and Fang, in press). Another approach is to detect anomalies and cleanse the data before using it. Focusing on the recent load values, we proposed a model-based anomaly detection method with an adaptive threshold for very short term load forecasting (Luo, Hong, and Yue, 2018). Focusing on the long-sequence anomalies, we proposed a method based on descriptive analytics (Yue, Hong, and Wang, in press).
As of November 2019...
I have 29 research articles published.
These 29 papers are in the following journals:
- IEEE Transactions on Smart Grid (11)
- International Journal of Forecasting (9)
- Journal of Modern Power Systems and Clean Energy (2)
- International Journal of Electrical Power & Energy Systems (1)
- Energy Policy (1)
- Information Sciences (1)
- Operations Research Letters (1)
- Sustainability (1)
- Energy (1)
- Fuzzy Optimization and Decision Making (1)
This is the breakdown by year:
- 2019: 7
- 2018: 9
- 2017: 3
- 2016: 5
- 2015: 2
- 2014: 3