The Aggregated Challenges of Regulating Energy Usage Data

A daily stream of articles and thought pieces trumpets the promise of big data and analytics to transform the utility industry. A 2015 report, Knowledge is Power: How Improved Energy Data Access Can Bolster Clean Energy Technologies and Save Money, laid out a series of ways that energy data can create economic and environmental benefits, including up to $1 trillion in efficiency-related energy savings over 10 years. But how do you establish a regulatory approach for balancing customers’ interest in keeping their individual energy usage information private with the same customers’ desire for good public policy, a healthy environment, and robust markets?

Several public utility commissions (PUCs) have dealt with the question of who should get access to energy data and why. Commissions in California and Colorado initially led the way on energy data access and privacy issues, but Illinois, Vermont, North Carolina, and New York have also considered data aggregation standards. Texas, Oklahoma, and Ohio, among other states, have evaluated data privacy requirements specifically associated with smart meters. Yet for certain utilities in Alaska and Florida, energy usage data is not treated as confidential.

These regulatory proceedings have frequently explored customers’ rights to receive their own data and transfer it to third parties, and the rights of contracted entities like energy efficiency providers to receive customer data. However, discussions about what data can or should be released publicly have left something to be desired. Generally, customer energy data that can be publicly released without specific consent needs to be “aggregated” (combined with other data) or “anonymized” (stripped of unique identifiers). Aggregating and anonymizing data allows it to be used publicly without pinpointing specific customers. This type of data can play an important role in climate action planning for state and local governments, helping them to measure progress toward clean energy goals and identify new areas of action. It can also help emerging energy businesses evaluate new products and services.

A recent example of aggregated data reporting is the set of Community Energy Reports (CERs) that the Colorado Public Utilities Commission now requires Colorado investor-owned utilities to publish annually. These CERs provide information about how much electricity and natural gas cities and counties in Colorado use, as well as how many residents and businesses participate in voluntary solar, energy efficiency, and green pricing programs. Xcel Energy published its first set of annual CERs in July 2016.

Developing these CERs was a learning experience, and the results of this effort may not be fully apparent for a year or more, as Colorado communities begin to work the data into their climate action planning efforts. The process yielded five recommendations that are worth considering in any energy regulatory proceeding on data access and privacy standards.

  • Recommendation 1: Involve the Right Skill Sets
  • Recommendation 2: Define the Data
  • Recommendation 3: Define What Privacy Means
  • Recommendation 4: Establish a Clear Process for Data Requests
  • Recommendation 5: Consider Who Should Enforce the Rules


Recommendation 1: Involve the Right Skill Sets

First and foremost, statisticians and data scientists must be involved in any discussion like this. Where PUCs are in the lead, such as through a rulemaking, often there are other state agencies that can provide them with guidance, and sometimes lend staff expertise if needed. Departments of public health, education, and revenue are often tasked with maintaining sensitive data about taxpayers, medical records, and educational records, while still producing public information for evaluation of their programs by the public. They often document their internal practices and may be able to share them with other government staff.

Recommendation 2: Define the Data

Talking about data in the abstract doesn’t work. A critical first step is to define a series of “use cases” for the types of data that are available or may be requested. For example, do rules relate to a business’s energy consumption, or the locations of all solar installations in a particular community? Is a data requestor looking for critical infrastructure data, or the number of energy efficiency rebates provided for low-income weatherization programs? Moreover, is the data granular—either temporally (15-minute demand vs. annual consumption) or geographically (in a block, a neighborhood, or a county)? Is it from a month ago or five years ago? What are the benefits that access to that data can create, and how granular does the data need to be to lead to those benefits?

Answering questions like these helps to identify the risk associated with the data. Risk reflects both the likelihood of a negative outcome and its severity, which can vary by frequency (many people are affected) or magnitude (the harm is especially great). While there are no doubt risks from exposing an individual customer’s or business’s 15-minute data, ask whether those same risks apply when looking at annual totals for a neighborhood, or the total dollars spent on energy efficiency rebates in a community.

Moreover, it is important to ask whether the data is publicly available in other venues. For example, Xcel Energy agreed with local governments that aggressive data aggregation standards should not prevent a utility from releasing community-wide data about local solar and energy efficiency investments. On the other hand, several parties recently filed a motion to compel Xcel Energy to disclose exactly this type of solar data because it was applying data privacy rules to information requested through discovery, despite that information being publicly available in a different proceeding. Parties should be very specific about what data the privacy rules will or should apply to, rather than leaving it open for debate.

Recommendation 3: Define What Privacy Means

This is a critical — but often overlooked — step in the process. What does it mean to have one’s privacy violated, when it comes to energy data? Is it about learning that someone is a customer of an energy program? Is it about knowing their exact energy consumption minute-by-minute? Is it about a competitor being able to guess how a business uses energy?

Depending on how privacy interests are defined, there may be different approaches to aggregating or anonymizing data. For example, a 2014 Pacific Northwest National Laboratory report defined the privacy risk for tenants in commercial buildings as the likelihood that their energy usage was roughly similar to the average of all tenants’ energy usage within the building, and so could be guessed easily by a building owner. The authors found that with 2-3 meters, there was a higher risk of tenants’ energy use approximating the average, but the risk decreased steeply at 4-5 meters. Accordingly, many local government programs that require building energy benchmarking have worked with utilities to aggregate four or five tenants’ data together where building owners are the ones making the requests.

The Energy Information Administration (EIA), on the other hand, uses several methods to protect market data, including one known as the “P Percent Rule.” The P Percent Rule roughly means that data is released only if a company’s next-largest competitor could not guess that company’s electricity use within a certain level of accuracy.
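To make the idea concrete, here is a minimal sketch of one common textbook formulation of a p-percent test. It is not EIA's actual procedure, which the discussion above does not specify. The function name `is_sensitive_p_percent` and the parameter `p` are hypothetical. The assumption is that the second-largest contributor knows its own value and the published total, so it can bound the largest contributor's value to within the sum of everyone else's contributions.

```python
from typing import Sequence


def is_sensitive_p_percent(contributions: Sequence[float], p: float) -> bool:
    """Return True if the aggregate should be withheld under a p-percent test.

    Assumed formulation (illustrative only): the total is sensitive when the
    sum of all contributions other than the two largest is less than p percent
    of the largest contribution. In that case the second-largest contributor
    could subtract its own value from the total and estimate the largest
    contributor's value to within p percent.
    """
    ordered = sorted(contributions, reverse=True)
    if len(ordered) < 3:
        # With one or two contributors nothing masks the largest value.
        return True
    largest = ordered[0]
    remainder = sum(ordered[2:])  # everything except the two largest
    return remainder < (p / 100.0) * largest
```

For instance, with contributions of 100, 50, 5, and 5 and p = 15, the remainder (10) is less than 15% of the largest value (15), so the total would be withheld. With contributions of 100, 50, 20, and 20, the remainder (40) is large enough that the total could be released.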

Currently, the most commonly used data aggregation standard (at least, among states that have considered this issue) is the “15/15 Rule,” adopted by California and Colorado. The 15/15 Rule states that data cannot be released if there are fewer than 15 entries within the dataset, or if any single entry comprises more than 15% of the aggregated data. However, the American Statistical Association Committee on Privacy and Confidentiality states that the rule is overly restrictive, and recommends other approaches (including P Percent) that are based on a statistical analysis of the underlying data at issue. Defining the data and what it means for it to be “disclosed” helps establish what sort of protective measures are required, and how to calibrate them to balance privacy with access.
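The 15/15 Rule as described here can be expressed as a short check. The sketch below is illustrative only: the function name `passes_15_15` and its parameters are hypothetical, usage values are assumed to be non-negative, and the choice to withhold a zero total is an assumption rather than part of any adopted rule.

```python
from typing import Sequence


def passes_15_15(
    usages: Sequence[float],
    min_count: int = 15,
    max_share: float = 0.15,
) -> bool:
    """Return True if an aggregate of `usages` may be released.

    Mirrors the rule as stated in the text: withhold if there are fewer
    than 15 entries, or if one entry is more than 15% of the total.
    """
    if len(usages) < min_count:
        return False
    total = sum(usages)
    if total <= 0:
        # Assumption: a zero (or negative) total is withheld rather than
        # dividing by zero; real rules may treat this case differently.
        return False
    return max(usages) / total <= max_share
```

Twenty customers using 100 units each would pass (each is 5% of the total). Fourteen such customers would fail on the count, and twenty customers where one uses 1,000 units and the rest use 100 would fail on the share (about 34%).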

Recommendation 4: Establish a Clear Process for Data Requests

Data requires context to be meaningful. Information about a community’s energy consumption is not meaningful if some data is redacted in one year and not in another, and there is no basis to know whether there was even a redaction. Ideally, requests for aggregated data should be set up to be consistent, with understandable metrics and a clearly defined order of operations for aggregation. Xcel Energy and Colorado local governments collaborated to create a process that allows cities and counties to submit GIS shapefiles to be used to develop the CERs, which means both entities start from a common baseline, instead of leaving questions about how closely utility records track with city boundaries. The process addressed many other questions, including:

  • Whether customer counts were based on the end of the year or the yearly average;
  • Whether industrial customer data should be removed or merged with commercial customer data if it violates an aggregation rule; and
  • Whether commercial and industrial electricity use should be separated out based on tariffs, or based on NAICS codes.

These are the kinds of tricky decisions that Colorado local governments and Xcel Energy made in the process of establishing the CERs. Settling them up front helps to create consistency and a shared understanding of what the data reports are and aren’t.
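A fixed order of operations of this kind can be written down explicitly. The sketch below assumes, purely for illustration, that the "merge" option is chosen for industrial data and that the 15/15 Rule is the aggregation test. It is not a description of the procedure actually adopted in Colorado. The names `passes_15_15` and `community_report` and the sector labels are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def passes_15_15(usages: Sequence[float]) -> bool:
    """Illustrative 15/15 check: at least 15 entries, none above 15% of total."""
    total = sum(usages)
    if len(usages) < 15 or total <= 0:
        return False
    return max(usages) / total <= 0.15


def community_report(
    sectors: Dict[str, List[float]],
) -> Tuple[Dict[str, Optional[float]], List[str]]:
    """Build sector totals using a fixed, documented order of operations.

    Step 1: if industrial data fails the test alone, merge it into commercial.
    Step 2: test each remaining sector and redact (None) any that still fail.
    Every merge or redaction is recorded so readers know it happened.
    """
    groups = {name: list(values) for name, values in sectors.items()}
    notes: List[str] = []

    # Step 1: fold industrial into commercial when it cannot stand alone.
    if "industrial" in groups and not passes_15_15(groups["industrial"]):
        merged = groups.pop("industrial")
        groups["commercial"] = groups.get("commercial", []) + merged
        notes.append("industrial merged into commercial")

    # Step 2: publish totals that pass; redact and flag those that do not.
    totals: Dict[str, Optional[float]] = {}
    for name, values in groups.items():
        if passes_15_15(values):
            totals[name] = sum(values)
        else:
            totals[name] = None
            notes.append(f"{name} redacted")
    return totals, notes
```

Returning the notes alongside the totals reflects the point made earlier in this section: a report is hard to interpret across years unless the reader can tell whether a merge or redaction occurred.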

Unfortunately, the rulemaking in which these decisions were made did not last long enough to allow for the creation of other reports that local governments could request, such as monthly reports, or reports by neighborhood. Instead, Colorado’s rules explicitly authorize local governments to ask for other data reports if they want to, but leave Xcel Energy solely responsible for determining whether those reports are “overlapping.” This approach led to challenges early in the data rules’ implementation. For example, Xcel treated two requests for energy use by two unconnected neighborhoods as potentially overlapping simply because they were from the same local government requestor. Hopefully, this practice has since been tightened.

Recommendation 5: Consider Who Should Enforce the Rules

Given these issues, are utilities the right entities to be implementing data privacy rules and responding to data requests?

In the absence of clear PUC direction and policy, utilities may not have the right incentives to respond to data requests in ways that promote reasonable access. Their software may be too old to process and combine data quickly, and outside the load forecasting department, their staff may not have data science expertise. Utilities also have liability concerns, and they may not want to be tasked with policing non-disclosure agreements (NDAs) or other tools designed to ensure that parties who request data use it consistently with the terms of the request. Finally, utilities may simply not want to give their data away, even if they haven’t figured out quite how to leverage it. Public Service Company of New Mexico (PNM), for example, proposed a Customer Analytics Initiative (PDF, p. 557) to better segment potential energy efficiency customers and target them with improved messaging and products. Utilities’ access to customer data to market their own programs, and their disincentives to share it with competitive industries like solar and storage, or local governments trying to run energy programs, have come up in California’s Distribution Resource Plan and New York’s Reforming the Energy Vision proceedings.

With these challenges, are there entities other than utilities that are better positioned to implement energy data rules? In fact, there are two other options. First, state PUCs or energy offices could develop in-house statistical branches (or add to existing statistical staff). This is common among other state agencies. For example, the Colorado Department of Public Health and Environment has a Health Statistics and Evaluation Branch with public datasets available for download. The U.S. Census Bureau is an example of a federal agency tasked with protecting privacy while making massive amounts of data available to the public and to researchers, and there are so many federal statistical agencies that there is a federal website dedicated to documenting their data and practices.

Second, a research center or university could take over data management. A virtue of these institutions is that they may have access to advanced computing power and staff from diverse disciplines. California has considered the possibility of transferring responsibility to a research center, and Colorado’s General Assembly attempted — but failed — to pass legislation requiring this in 2011.

Transferring at least some responsibility for managing data requests to these organizations has several benefits. First, they are able to marshal statisticians to set internal rules and policies about data requests. Second, they can be tasked with the objective of creating clear and consistent data sets that are available on a regular basis, which reduces the risk of re-identification through unusual requests. Third, they could combine non-energy data sets with energy data sets, which would generally be outside utilities’ purview. Fourth, they are able to work individually with researchers or institutions to provide data under NDAs and other tools, and to enforce those tools. Finally, and most critically, these entities face no competitive risk in obtaining, processing, and protecting data.

Conclusion

The debate over how to handle energy data in an increasingly digital world is under way at PUCs around the country, often with less-than-satisfactory results for utilities, customers, or third parties. An overly restrictive approach will create a “tragedy of the data commons,” where useful data is lost even if no one would have been harmed by its release. Applying the recommendations above, which are drawn from both experience and good practices by state and federal statistical agencies, can create practical, workable results. Stakeholders at PUCs will need to think carefully about how to articulate their objectives within data privacy proceedings, and become very concrete about what opportunities certain practices may create, or what customers may lose when certain practices hinder markets and public policy.

Kelly Crandall is a Senior Rates and Research Analyst for EQ Research, LLC. She provides clients in the government, business, and nonprofit sectors with expert witness and policy advising services, with a focus on grid modernization and data privacy. Prior to joining EQ Research, Ms. Crandall was the Energy Strategy Coordinator for the City of Boulder, where she worked with staff at several other cities and counties to establish rules and procedures that would require Colorado utilities to provide annual reports on energy consumption to support communities’ climate action plans.