NASA needs 215 more petabytes of storage by the year 2025, and expects Amazon Web Services to provide the bulk of that capacity. However, the space agency didn’t realize this would cost it plenty in cloud egress charges: it will have to pay every time scientists download its data.
That omission alone has left NASA’s cloud strategy pointing at the ground rather than at the heavens.
The data in question will come from NASA’s Earth Science Data and Information System (ESDIS) program, which collects information from the many missions that observe our planet. NASA makes those readings available through the Earth Observing System Data and Information System (EOSDIS).
To store all the data and run EOSDIS, NASA operates a dozen Distributed Active Archive Centers (DAACs) that provide pleasing redundancy. But NASA is tired of managing all that infrastructure, so in 2019, it picked AWS to host it all, and started migrating its records to the Amazon cloud as part of a project dubbed Earthdata Cloud. The first cut-over from on-premises storage to the cloud was planned for Q1 2020, with more to follow. The agency expects to transfer data off-premises for years to come.
NASA also knows that a torrent of petabytes is on the way. Some 15 imminent missions, such as the NASA-ISRO Synthetic Aperture Radar (NISAR) and the Surface Water and Ocean Topography (SWOT) satellites, are predicted to deliver more than 100 terabytes a day of data. We mention SWOT and NISAR because they’ll be the first missions to dump data directly into Earthdata Cloud.
The agency therefore projects that by 2025 it will have 247 petabytes to handle, rather more than the 32 it currently wrangles.
NASA thinks this is all a great idea. Its documentation for the migration says:
“Researchers and commercial users of NASA Earth Science data will have increased opportunity to access and process large quantities of data quickly, allowing new types of research and analysis. Data that was previously geographically dispersed will now be accessible via the cloud, saving time and resources.”
And it will – if NASA can afford to operate it.
And that’s a live question because a March audit report from NASA’s Inspector General found that EOSDIS hadn’t properly modeled what data egress charges would do to its cloudy plan.
“Specifically, the agency faces the possibility of substantial cost increases for data egress from the cloud,” the Inspector General’s Office wrote, explaining that today NASA doesn’t incur extra costs when users access data from its DAACs. “However, when end users download data from Earthdata Cloud, the agency, not the user, will be charged every time data is egressed.
“That means ESDIS wearing cloud egress costs. Ultimately, ESDIS will be responsible for both cloud costs, including egress charges, and the costs to operate the 12 DAACs.”
And to make matters worse, NASA “has not yet determined which data sets will transition to Earthdata Cloud nor has it developed cost models based on operational experience and metrics for usage and egress.
“As a result, current cost projections may be lower than what will actually be necessary to cover future expenses and cloud adoption may become more expensive and difficult to manage.”
There’s more. The watchdog concluded: “Collectively, this presents potential risks that scientific data may become less available to end users if NASA imposes limitations on the amount of data egress for cost control reasons.”
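To see why a cost cap turns into a download cap, here is a minimal sketch of the arithmetic. The flat per-gigabyte rate and the monthly budget are hypothetical placeholders, not figures from the audit or from any AWS price list; real egress pricing is tiered and negotiated.

```python
# Sketch only: the rate and budget below are assumptions for illustration.
ASSUMED_RATE_USD_PER_GB = 0.05   # hypothetical flat egress rate
GB_PER_PB = 1024 ** 2            # binary units assumed


def egress_bill(downloaded_gb, rate=ASSUMED_RATE_USD_PER_GB):
    """Charge the data owner pays when users download `downloaded_gb`."""
    return downloaded_gb * rate


def egress_cap_gb(monthly_budget_usd, rate=ASSUMED_RATE_USD_PER_GB):
    """Largest monthly download volume a fixed egress budget can cover."""
    return monthly_budget_usd / rate


# One petabyte of downloads in a month, billed to the agency, not the users.
bill_for_one_pb = egress_bill(1 * GB_PER_PB)

# A hypothetical $100,000 monthly egress budget, expressed as a download limit.
cap_in_pb = egress_cap_gb(100_000) / GB_PER_PB

print(f"1 PB of downloads: ${bill_for_one_pb:,.2f}")
print(f"$100,000 budget covers about {cap_in_pb:.2f} PB of downloads")
```

Under these assumed numbers the bill scales linearly with whatever users choose to download, which is why the only levers are a bigger budget or a limit on egress.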
And to put a cherry on top, the report found the project’s organizers didn’t consult widely enough, didn’t follow NIST data integrity standards, and didn’t look for savings properly during internal reviews, in part because half of the review team worked on the project itself.
The result is three recommendations from the auditors:
- Once NISAR and SWOT are operational and providing sufficient data, complete an independent analysis to determine the long-term financial sustainability of supporting the cloud migration and operation while also maintaining the current DAAC footprint.
- Incorporate in appropriate agency guidance language specifying coordination with ESDIS and OCIO early in a mission’s life cycle during data management plan development.
- Ensure all applicable information types are considered during DAAC categorization, that appropriate premises are used when determining impact levels, and that the appropriate categorization procedures are standardized.
At least NASA seems to have bagged a good deal from AWS: The Register used Amazon’s cloudy cost calculator to tot up the cost of storing 247PB in the cloud giant’s S3 service. The promised pay-as-you-go price for us on the street was a staggering $5,439,526.92 per month, not taking into account the free tier discount of 12 cents. The audit, meanwhile, suggests an increased cloud spend of around $30m a year by 2025, on top of NASA’s $65m-per-year deal with AWS.
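For readers who want to check the storage sum, the sketch below reproduces that kind of tiered, pay-as-you-go calculation. The tier sizes and per-gigabyte rates are assumptions chosen to resemble published list-price tiers for standard object storage, and binary units (1PB = 1024 × 1024GB) are assumed too; neither comes from the audit.

```python
GB_PER_TB = 1024
GB_PER_PB = 1024 ** 2

# Assumed (tier size in GB, USD per GB per month) pairs; illustrative only.
ASSUMED_TIERS = [
    (50 * GB_PER_TB, 0.023),    # first 50TB
    (450 * GB_PER_TB, 0.022),   # next 450TB
    (float("inf"), 0.021),      # everything beyond 500TB
]


def monthly_storage_cost(total_gb, tiers=ASSUMED_TIERS):
    """Walk the tiers in order, charging each slice of data at its tier rate."""
    remaining = total_gb
    cost = 0.0
    for tier_size_gb, rate in tiers:
        in_tier = min(remaining, tier_size_gb)
        cost += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost


estimate = monthly_storage_cost(247 * GB_PER_PB)
print(f"${estimate:,.2f} per month")
```

With these assumed rates, the estimate for 247PB lands within about a cent of the monthly figure quoted above, and almost all of it comes from the cheapest tier.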
You don’t need to be a rocket scientist to learn about and understand data egress costs. Which left The Register wondering how an agency capable of sending stuff into orbit or making marvelously long-lived Mars rovers could also make such a dumb mistake.
It turns out NASA makes plenty: your humble vulture found this story after looking into Tuesday’s audit of the agency’s development work on its mobile launchers – the colossal vehicles designed to assemble, transport, and launch SLS and Orion rockets and capsules.
That audit found the project “has greatly exceeded its cost and schedule targets in developing ML-1. As of January 2020, modification of ML-1 to accommodate the SLS has cost $693 million — $308 million more than the agency’s March 2014 budget estimate — and is running more than 3 years behind schedule.” ®