First of two stories.
At 3:57 p.m. on Aug. 14, 2003, a fog of ignorance and confusion enveloped a critical crossroads of the eastern U.S. power grid.
Don Hunter, reliability coordinator at a Midwest grid control center in Indiana, buzzed operators at FirstEnergy in Akron, Ohio, the power company running generating plants and transmission in eastern Ohio.
Grid and power plant operators in and around Ohio were seeing increasing instability in the power network, and the FirstEnergy operators had been fielding anxious calls from Hunter and others for an hour. "I think we've got something seriously sick," one caller told the FirstEnergy control room.
This time, Hunter asked about the status of a particularly critical high-voltage line southeast of Cleveland. "We have no clue," FirstEnergy's Jerry Sanicky responded, according to a transcript of the phone call. "Our computer is giving us fits too. We don't even know the status of some of the stuff around us." Neither FirstEnergy operators nor its neighbors called for emergency actions to save the grid.
What the operators couldn't see was a succession of transmission line outages that began that afternoon when one line heated by the current it carried sagged into a tree underneath -- a tree that FirstEnergy had failed to properly trim. The outage sent more power onto adjoining lines, and more cables sagged, contacting other trees the company had not maintained, and shorted out.
At 4:06 p.m., the failures of one line after another finally became an uncontrollable chain reaction, sending a massive surge of power around Lake Erie into New York, Ontario and Michigan. It triggered grid protective devices that shut down generators, substations and power lines, throwing 50 million people in all or parts of eight states and Ontario, Canada, into a 19th-century world without electricity. Some parts of the United States did not get power restored for four days.
The blackout response
The blackout prompted changes in federal law and regulations and industry practices meant to prevent such widespread outages.
In 2005, Congress passed the Energy Policy Act authorizing the Federal Energy Regulatory Commission to issue mandatory reliability standards, backed up by maximum fines of $1 million per violation. Before that, grid reliability had been a hit-or-miss responsibility left up to the power industry.
Ten years later, grid reliability has a higher priority, said FERC Chairman Jon Wellinghoff. FERC and the power industry have wrestled over the standards-setting process, with continuing frustration on both sides. The rules are a reality, however.
"We didn't have the authority until 2005 to put rules in place," Wellinghoff noted. Some utilities followed the voluntary standards that existed in 2003, but others didn't. "Now it is codified into a rule," he said in an interview, as he prepares to leave FERC with the conclusion of his term.
FERC made an example out of Florida Power and Light Co., following a 2008 blackout in the state blamed on an engineer's unauthorized actions. FPL, while not acknowledging it violated reliability rules, agreed to pay a record $25 million civil penalty. "Now that we have those kinds of penalties, and rules and procedures in place, it is ratcheting things up for the entire industry on their response and how their response can be enforced," Wellinghoff said.
"I think we are at least holding our own right now," Wellinghoff said.
"One of the most fundamental changes over the past 10 years is making the reliability standards mandatory," said Jeff Dagle, an engineer with the Pacific Northwest National Laboratory who served on the U.S.-Canadian task force that investigated the blackout. Before 2003, "there was more of a mixed bag in terms of utilities at the top of the game and those that had a lot of issues. Today, you have a more uniform requirement to maintain minimum reliability requirement or face penalties."
"Reliability had been something that had been left to the engineers to deal with. This forced it higher on everyone's agenda," said David Cook, senior legal counsel of the Atlanta-based North American Electric Reliability Corp. (NERC), the industry organization selected by FERC to draft the reliability standards and oversee compliance under the 2005 law.
"The presence of the standards really caused people to take a new look at how they were dealing with these issues. And that's a good thing," Cook said.
Along with mandatory rules, the grid's technology has improved, as some utilities make use of advanced monitoring devices called synchrophasors that track changes in voltages and other grid conditions in milliseconds, Dagle said.
However, the grid itself keeps becoming more complex, experts say. More wind and solar generation is in the mix, sometimes requiring fast access to backup power when the wind dies or clouds move in. The shale gas revolution and emissions regulations are accelerating a switch from coal-fired generators to gas turbines, changing the geography of power supply. Overshadowing these challenges is the threat of cyberattacks on grid command computers and devices.
"Part of me is saying it's kind of a miracle we don't have more problems on the reliability side" because of the expanding challenges facing the grid, Dagle said. That is countered by FERC and NERC being much more aggressive in trying to enforce standards, and by the technology that is becoming available, he said.
"They pull against each other; maybe net we're more or less ahead," Dagle said.
The initiating causes of the 2003 blackout were human -- not technical -- failures that ran from control room operators to FirstEnergy management, the investigators determined.
There were not even voluntary standards on vegetation management then, and cutting corners on that responsibility was a convenient way for some companies to reduce operating costs, experts say.
The FirstEnergy control room had an alarm system to alert operators when major lines went down. But it had crashed more than an hour before the outage, and the operators said they hadn't noticed. More fundamentally, the company had neither emergency procedures in place for such an event nor operators trained to respond, the investigation found.
The Midwest Independent Transmission System Operator (MISO) in Carmel, Ind., the regional reliability coordinator where Hunter worked, was also fog-bound. A grid-monitoring computer had crashed early in the afternoon. A technician fixed the system but went out to lunch, forgetting to turn on the control that refreshes data every five minutes -- a requirement for MISO to track the fast-moving crisis.
In the days immediately following, grid managers pointed fingers over who and what caused the outage. They didn't know.
A 228-page investigative report by a U.S. and Canadian task force issued in 2004 became the primer for grid operating changes overseen by FERC.
The FERC-NERC standards process produced a new standard on tree trimming, one of its most effective outcomes, Wellinghoff said. "We haven't had a vegetation contact grow-in in two years," Cook said.
Another change zeroed in on a technical issue: the settings on many of the protective relays on power lines that trip circuit breakers when they detect high current and low voltage after a short circuit. The relays had not been set with a cascading blackout scenario in mind, and they disconnected some grid units that didn't need to be shut down, Cook said.
"They perceived things happening on the system that weren't really happening," he said. "That had the effect of spreading the cascade further."
FERC-NERC standards on relay settings have been made more precise to address that issue, he said. "Operating training is getting more attention now," he said, another standards focus.
But the standards process has frustrated FERC and NERC, officials concede. Instead of vesting sole responsibility for reliability with FERC, authors of the 2005 act heeded the power industry's push for divided roles. FERC could call for new standards to address a reliability risk. The standards would have to be written by NERC and submitted to FERC. If the commission disapproved, it would have to pitch the issue back to NERC. It couldn't write new rules on its own.
Initial inspections by NERC led to a rash of violations, major and minor. A FERC review counted 3,300 active violations in September 2011, with new ones coming in faster than old ones were being cleared.
The process focused too much on "chasing each individual violation to ground almost without regard to how significant it was," Cook said.
"We have taken some significant strides in reducing it [the backlog]," Cook said. A year ago, NERC switched to a "find, fix and track" process for low-level issues that calls on companies to make fixes without recording them as violations. That leaves more time for the significant issues, Cook added. "They are working the pile down," he said.
Cook noted that NERC has changed its standards approval process to speed it up in response to FERC's complaints. "I know the commission is frustrated with the pace of some of the standards efforts," Cook said. "We've made changes. ... It's too soon to see results from that."
NERC remains an industry organization seeking common ground among power companies with widely different profiles and agendas. It stands in between FERC, pressing for faster action, and the industry members, which are wary about top-down operational mandates. FERC and NERC do not regulate local power distribution utilities, the last link in the chain -- state commissions do.
"Whenever you have something go through a consensus building process, you're not going to get everything that everybody wanted in the standard. That's too high a bar. And I think the commission understands that," Cook said. "I think we're in a better place with the commission than we were."
Wellinghoff said FERC's staff has been working closely with NERC to achieve an orderly work flow on standards and to get results that the commission is seeking. "I think our relationship with NERC has improved substantially," Wellinghoff said. "We are getting to a point of doing as much as we can with the system, given its limitations."
History repeats itself
Despite all the work on grid reliability since 2003, a cascading outage that knocked out power for 2.7 million customers in San Diego, Calif., and its surrounding region on Sept. 8, 2011, revealed some of the same human failings that triggered the earlier outage, according to a FERC and NERC staff investigation.
It began when an Arizona Public Service technician was sent to remedy a problem at a substation on a 500-kilovolt transmission line that provides vital power supply from Arizona generators to San Diego. Although the technician had done the same procedure a dozen times, this time he accidentally omitted two steps that had to be taken in sequence. The high-voltage line tripped.
The operators erroneously thought they could quickly remedy the problem, not realizing the scope of the outage conditions, the investigators said. The relay settings issues that magnified the 2003 outage did so again this time, automatically shutting down key equipment so quickly that operators couldn't respond and initiating cascading outages.
The network's planners had created seriously flawed contingency plans that failed to predict the impact of the outage as it developed. Just as in 2003, control rooms didn't know what was happening, the investigation concluded. When the disruption began, operators at the Imperial Irrigation District -- a key part of the transmission network between Arizona and San Diego -- were not actively paying attention to the real-time contingency analysis monitor and didn't recognize the need to act. And the monitor's alarms were not audible, the investigators found.
"The outages on the transmission level have been primarily because of errors by maintenance and operations people in the field, and to some extent by lack of systems visibility," Wellinghoff said. "That was primarily the cause of the error in San Diego," he said, calling it a disappointment.
"I'm not sure why the planning didn't take place," he added. FERC is concentrating on learning and applying the lessons from the San Diego investigation to prevent cascading outages in the future, he said.
"The engineering problems are soluble, and there's a lot of work going on to do that," said Nancy Brockway, an independent consultant and former commissioner with the New Hampshire Public Utilities Commission. "The problems we have are human problems, and they aren't going away."
Next: Can technology advances since 2003 stay ahead of grid challenges?