Americans have come face-to-face with the technology behind air travel in the United States, as thousands of flights were canceled or delayed this week after a massive, devastating computer outage at the Federal Aviation Administration.
As the country settles down again, bewildered air travelers may be wondering why planes suddenly seem to be plagued with devastating IT problems.
According to current and former industry insiders, government reports, and outside analysts, the answer lies not only in aging hardware and software, but also in institutional failures that make technology updates more difficult.
Over the years, and in the face of explosive demand for air travel, the system has become increasingly complex, with far more points of failure than many consumers realize. Meanwhile, bureaucratic snafus and deferred maintenance have contributed to an increasingly fragile system.
Southwest Airlines’ recent system-wide collapse, which lasted several days in the middle of a winter storm and during the most important travel period of the year, and Wednesday’s massive flight disruptions compounded many of these problems, but they are only the latest symptoms of a long-standing and very complex problem.
The glitch at the heart of this week’s headaches was a corrupted database file in the pilot advisory system known as NOTAM, which alerts pilots to a variety of hazards that can affect flights, from closed runways to the presence of nearby construction equipment. The corrupted file was also present in the FAA’s backup system, a source familiar with the matter told CNN.
Authorities moved to restart the main NOTAM system early Wednesday morning, but were unable to fully restore it by the time rush hour began on the East Coast, leading to an FAA ground stop. A senior official told CNN Wednesday that there was no evidence of wrongdoing in the case, the details of which the FAA later officially confirmed.
“The FAA continues a thorough investigation to determine the root cause of the Notice to Air Missions (NOTAM) system outage,” the agency said in a statement Wednesday night. The agency said it had traced the outage to database file corruption and that there was no evidence of a cyberattack at this time. It added that it was working to further identify the cause of the issue and to take all necessary steps to prevent this type of disruption from happening again.
The FAA said Thursday night that the data files were “corrupted by personnel who did not follow procedures.”
The NOTAM problem came days after the FAA said an “air traffic computer problem” was responsible for multi-hour flight delays in Florida on January 2. That problem involved the ERAM air traffic system, which is seen as a key component of the FAA’s efforts to modernize U.S. airspace.
In the case of Southwest Airlines, an outdated scheduling system that could not automatically adapt to the disruptions caused by severe winter weather required painstaking manual intervention, exacerbating the airline’s weather-related problems.
Despite efforts to modernize equipment, airlines and the U.S. government in some cases still rely on technology that is years or even decades old.
The FAA’s software that failed this week is 30 years old and will take at least six years to update, a US government official told CNN on Thursday.
The notices issued by the FAA’s NOTAM system are “Jurassic,” said Kathleen Bangs, a former airline pilot and aviation expert. She said the system overburdens pilots with pages and pages of less urgent notifications, written in archaic code, that sometimes bury the critical safety information pilots really need, and called it “a clumsy system.”
The FAA acknowledges the antiquity of the NOTAM system. In a recent budget request to Congress, the agency asked for money to help “eliminate failed vintage hardware” behind it.
In early 2012, the FAA decided to replace the aging legacy voice switches used in air traffic control communications with new internet-based communications technology. However, the FAA currently intends to keep the old switches in use until at least 2030, according to a Department of Transportation inspector general report last year.
The ERAM air traffic system, which was at the center of the turmoil on January 2, is much newer and only became fully operational in 2015, replacing another system that had been in operation for over 40 years. There have been at least seven ERAM failures since 2014, a track record that has prompted congressional scrutiny. The FAA is currently working on ERAM hardware and software updates, but it may not complete the upgrade until 2026, according to a 2020 report.
Meanwhile, many of the IT systems that airlines rely on were custom built a long time ago, some running on legacy mainframe computers, and were not designed to process the vast amounts of information they are now expected to handle, aviation experts say.
Seth Miller, IT consultant, aviation journalist, and editor of travel magazine PaxExAero, said: “These are old, old systems.”
As a result, a severe crisis could easily overwhelm these fragile setups, according to an aviation industry official who spoke on condition of anonymity to discuss the matter more freely.
“These systems were built when airlines were smaller and weren’t built to handle large amounts of data at once,” the official said. “If we had a big winter storm over the year-end and New Year holidays, we wouldn’t be able to handle a lot of changes at once because it’s on a system that wasn’t built to handle large moving datasets.”
Industry experts say the age of the technology is not inherently the problem. The problem is what that age implies: an inability to scale to meet new demands, and a lack of adequate support as the rest of the world moves on, Miller said. The problem worsens with the use of bespoke technology as opposed to off-the-shelf solutions, because it requires more and more specialized parts and know-how to maintain.
And because the global aviation industry never sleeps, constantly trying to integrate old and new systems in real time can invite catastrophic mistakes.
All flight delays and cancellations tend to create similar experiences for air travelers, but the underlying causes of outages can be very different. More things can go wrong than expected. The complexity of the airline industry is striking, highlighting the lack of easy solutions to IT-related travel disruptions.
Takeoffs involve a complex stew of information, and disruptions anywhere in that information supply chain can cause delays, according to industry experts.
Vulnerabilities are magnified by the vast number of companies involved in the ecosystem (not just airlines, but their vendors, and their vendors’ vendors).
“There are so many different systems talking to each other,” said Ross Feinstein, a former spokesman for American Airlines and the Transportation Security Administration.
For example, Feinstein said the TSA vets airline passenger manifests. “If the TSA goes down, the booking review process will stop, which means passengers will not be able to check in and get boarding passes. We may not be able to retrieve the data.”
In 2019, a computer issue with a third-party company that provides a flight planning tool that helps airlines calculate aircraft weight and balance caused delays for several airlines nationwide.
In 2021, an outage at Sabre, one of the world’s largest airline booking companies, caused global turmoil.
The interconnected nature of the aviation sector involving dozens of countries, companies, institutions and databases creates multiple points of failure. Backups and redundancy help, but it’s still a very complex system of systems.
Beneath the surface-level symptoms of aviation IT problems lie deeper, thornier, more human challenges.
Consider the FAA’s attempt to replace voice switches in air traffic control. A dispute between the FAA and its prospective vendor over contract requirements was the main cause of the effort’s collapse, according to the inspector general’s report. The dispute focused on possible software flaws in the new switches and whether the vendor could still deliver a good product on time.
The root of the problem was not technical per se. It was a procurement issue. But it has had a lasting impact on FAA technology. The contract’s eventual termination means the FAA will have to spend more than $270 million by 2030 to continue using aging legacy voice switches.
“Continued reliance on these switches risks disrupting communications,” concludes the report.
The debate over 5G wireless technology near airports, which threatened to wreak havoc last year, followed a similar pattern. Bureaucratic divisions and years of deferred avionics upgrades led to a crisis in which US aircraft lacked the technology to deal with potential 5G interference.
Meanwhile, the FAA continues to be led by an acting administrator, with no Senate-confirmed head. That has real-world implications for IT upgrades and other projects, according to one person who asked to remain anonymous to discuss the matter more freely.
“It’s very difficult to set direction and vision when you don’t know if you’re going to be there for a week or 18 months,” the person said.
Meanwhile, much of the airline industry’s outstanding technical debt may stem from a string of mergers and bankruptcies after 9/11, when many airlines focused more on finances than on technology upgrades, the industry official said.
Bureaucratic myopia is not the only cause of today’s technological fatigue in the aviation industry. In some circumstances, institutional inertia and commercial priorities have outweighed costly and tedious infrastructure investments.
But as systems become more interconnected and digitized, when things go wrong they can fail in ever more disastrous ways.
Aviation experts say only more investment and better planning can solve the challenge.
“[The FAA is] doing more with less and needs more money to modernize,” said Feinstein. “In Washington, they will talk about this issue for the next 24 to 48 hours and then forget about it.”
–– CNN’s Pete Muntean, Gregory Wallace and Marnie Hunter contributed to this report.