Data Models in Power Systems Software

Data models are one of the most overlooked aspects of software design in power systems software. In a field that relies on computer simulation, poor designs lead to unnecessary complexity and poor performance. The need for a canonical data model to share data dates back to the 1950s, with the first applications of digital computers to the “load flow” problem; it appears in the literature in a discussion of how to implement a digital load flow calculator ¹. As computer-based load flow calculations became widespread, the myriad solution methods and models created a problem of data and model exchange, as illustrated by the following quote:

With the growth in complexity of the interconnected power systems in the 1960’s came a corresponding growth in the number of load flow programs being used and in the number of study groups using those programs. This growth resulted in a need to exchange data at an increasing rate.

Working Group on Common Format For Exchange of Solved Load Flow Data, 1973

Historically, electric power systems modeling has been a source of complex data requirements. Most importantly, there has been an explicit division between power systems models based on their scope: different models require different simplifications to obtain the system insights that engineers need. These engineering simplifications and assumptions have also carried over to other fields such as energy policy and economics. In this regard, the most significant model that has informed data processing and sharing is the “load flow” problem, along with its extension to the single-period “economic load flow” problem.

Leon Kirchmayer, in his seminal work on power system economic operation ², provides a detailed account of the early use of computers for the economic optimization of power systems. These rudimentary computational systems were limited to punch cards as the medium for loading data. As a result, the first data models were merely mappings from column indexes to physical quantities.

Punch cards evolved into file-based data models with fixed positions and fixed ordering. The first generally agreed data model for power systems computational analysis was the IEEE Common Format, published in 1973 ³. A common format data file had lines of up to 128 characters; the lines were grouped into sections with section headers, and data items were entered in specific columns. It provided a standard format to store and exchange data based on the original punch card specification, emulating the physical storage medium that preceded it.
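To make the idea of a fixed-position record concrete, the following is a minimal sketch of how such a line can be sliced into named fields. The field names and column ranges in `BUS_LAYOUT` are hypothetical and chosen only for illustration; they are not the actual column assignments of the IEEE Common Format.

```python
from typing import Callable, Dict, Tuple

# Hypothetical layout: field name -> (start column, end column, converter).
# Columns are 1-indexed and inclusive, in the style of punch card specifications.
# These positions are illustrative only and are NOT the IEEE Common Format.
BUS_LAYOUT: Dict[str, Tuple[int, int, Callable[[str], object]]] = {
    "number": (1, 4, int),
    "name": (6, 17, str.strip),
    "voltage_pu": (19, 24, float),
    "load_mw": (26, 33, float),
}


def parse_fixed_record(line: str, layout=BUS_LAYOUT) -> dict:
    """Slice one fixed-position record into named, typed fields."""
    record = {}
    for field, (start, end, convert) in layout.items():
        raw = line[start - 1:end]  # 1-indexed inclusive range -> Python slice
        record[field] = convert(raw)
    return record


# Build the example line programmatically so the columns are guaranteed to align.
example_line = f"{12:>4} {'NORTH SUB':<12} {1.025:>6.3f} {45.5:>8.2f}"
bus = parse_fixed_record(example_line)
```

The meaning of each value depends entirely on its position in the line, so any change to the layout has to be agreed on by every program that reads or writes the file.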

Although computational power, algorithm development, and novel applications of computers to the analysis of electrical power systems have advanced significantly since 1973, tabular data models still dominate the field. All major data formats and models for commercial and academic power systems software have employed tables with custom specifications to store and exchange system data. In open-source modeling, the data format used in MATPOWER has become the standard for encoding system data sets, owing to the popularity of MATLAB among power systems researchers.

The need to share information evolved in the early 1990s with the advent of automation, spurred by the increasingly complex data needs of power systems operations. The industry required standardized models to exchange more extensive information and turned to an object-oriented data model. The Common Information Model (CIM) was developed and later became a standard maintained by IEC Technical Committee 57, Working Group 13. The aim was to provide a standard definition of power system components geared towards automated EMS, SCADA systems, and asset-management databases. This automation-oriented design makes CIM challenging to implement for modeling purposes, and it is not widely used in the modeling software available today. It is supported by only a few commercial power system software packages, and the only open-source parsing implementation is the iTesla library.

One of the defining characteristics of electric power systems modeling is the rigid separation between steady-state and dynamic modeling practices. Simulation tools have kept separate data models for the two classes of models, and a few commercial providers dominate the market for dynamic modeling. As a result, the dynamic data model depends on the software available to the researcher. This artificial separation hinders cross-domain research and further limits the development of newer models. Some efforts to develop open data models geared towards dynamic modeling, such as PSAT, have been limited to teaching and are no longer maintained. The data model implemented in Python is described partially in ⁴ but has had little uptake.

With the advent of new algorithms, models, and programming languages, as well as broad access to computers, new software tools and data formats proliferated. Milano provides a detailed taxonomy of available commercial and open-source data sources up to 2010⁴. The review includes 17 data models categorized by the supported mathematical models and file format restrictions.

Recently, new static modeling tools such as Pandapower, PyPSA, and PowerModels.jl have used data models largely based on MATPOWER’s original schema. In the dynamic modeling domain, the tool ANDES implemented a data model using symbolic libraries. An OpenModelica library with the capability to parse PSS/e and CIM data is also available. However, developing extensions requires some source code modification, and these tools cannot be integrated with steady-state models.

The review so far highlights the progress made by the power systems community, given the more widespread adoption of certain “standard” practices. In other modeling communities, several commercial software applications dominate, and each relies on its own proprietary data format. Such is the case of production cost modeling, which requires a richer data model to handle large amounts of time series data. Significant effort has gone into processing proprietary XML data formats into open data sets, but this has not resulted in a more systematic approach.

When augmentations are required, the most commonly used approach is MATPOWER’s “extensions”, which provide some flexibility to augment the data. Extending MATPOWER’s data requires creating makeshift relationships between the user-added arrays and the arrays already in the model. Fixed-location, fixed-length representations are not inherently designed to store data with mixed representations and hierarchical structures, and tables are difficult to extend beyond their original design. For instance, adding a new feature implies adding a new column for every entry in the category. To the authors’ knowledge, the production cost modeling community does not have an effort comparable to that of the power systems community, and in most cases the data models used in production cost modeling are extensions of power systems data models. Moreover, the growing importance of data provenance and reproducibility demands solutions that minimize the need to develop ad-hoc data models.
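The cost of extending a table can be illustrated with a small sketch. The column indexes, class names, and values below are hypothetical and do not reproduce MATPOWER’s schema or the API of any particular package. The sketch only contrasts adding a column to a table with adding a field to a typed record.

```python
from dataclasses import dataclass

# Tabular approach: one row per generator, one column per attribute.
# Column meanings live outside the data, in a separate index convention.
GEN_BUS, GEN_PMAX = 0, 1
gen_table = [
    [1, 200.0],  # thermal unit
    [2, 150.0],  # thermal unit
    [3, 80.0],   # battery
]

# Adding a battery-only attribute forces a new column on every row;
# rows where the attribute does not apply need a placeholder value.
GEN_ENERGY_MWH = 2
for row, energy in zip(gen_table, [None, None, 320.0]):
    row.append(energy)


# Typed-record approach: the new attribute exists only where it applies.
@dataclass
class Generator:
    bus: int
    pmax_mw: float


@dataclass
class Battery(Generator):
    energy_mwh: float


generators = [Generator(1, 200.0), Generator(2, 150.0), Battery(3, 80.0, 320.0)]

# Count the table rows that carry a placeholder for the new column.
placeholders = sum(row[GEN_ENERGY_MWH] is None for row in gen_table)
```

In the table, two of the three rows carry a placeholder for the new column, while in the record-based version only the battery stores the extra field and the relationship to the base generator type is explicit.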

In recent years, multi-sector modeling of energy systems has grown. Initiatives like PowerGenome, Spine Toolbox, and the Open Energy Platform have focused on integrating power systems data into broad energy infrastructure models. These initiatives exploit modern computing concepts and architectures such as REST APIs, portable databases, and version control to give users a more straightforward pathway to integrate decision models with data. Importantly, they seek to contribute curated datasets as part of their repositories. Multi-sector projects commonly focus on long-term planning and strategic decision-making, which require economic data on top of the technical device-level data. These techno-economic modeling communities make outstanding contributions by exploiting modern concepts in data management for large systems. For instance, the Open Energy Platform implements advanced table-format data sets to facilitate the inspection of datasets.

As a consequence of this proliferation of data representations, model developers devote significant resources to parsing and data model conversion. In most cases, these efforts are developed to serve only within the scope of a single analytical model. Creating a standard data model and dedicated tools for data management across domains is critical to improving modeling practices for electric energy systems.

If you found this content useful, please cite:

PowerSystems.jl — A power system data management package for large scale modeling

@article{lara2021powersystems,
  title={{PowerSystems.jl} --- A power system data management package for large scale modeling},
  author={Lara, Jos{\'e} Daniel and Barrows, Clayton and Thom, Daniel and Krishnamurthy, Dheepak and Callaway, Duncan},
  journal={SoftwareX},
  volume={15},
  pages={100747},
  year={2021},
  publisher={Elsevier}
}

¹ J. M. Henderson, “Automatic Digital Computer Solution of Load Flow Studies [includes discussion],” in Transactions of the American Institute of Electrical Engineers. Part III: Power Apparatus and Systems, vol. 73, no. 2, pp. 1696-1702, Jan. 1954.
² L. K. Kirchmayer, Economic Operation of Power Systems. New York: Wiley, 1958.
³ Working Group on Common Format For Exchange of Solved Load Flow Data, “Common Format For Exchange of Solved Load Flow Data,” in IEEE Transactions on Power Apparatus and Systems, vol. PAS-92, no. 6, pp. 1916-1925, Nov. 1973, doi: 10.1109/TPAS.1973.293571.
⁴ F. Milano, Power System Modelling and Scripting. Springer Science & Business Media, 2010.