AWS Batch Speeds Up Mirvie's Bioinformatics Processing

Cloud303 Helps Integrate Complex Nextflow Jobs For Pregnancy Health Research Company

 Migration  Modernization  HIPAA



Summary

Mirvie is a pregnancy health research company using incredible new RNA research to predict problems in pregnancy earlier than ever before. Before working with Cloud303, they ran complex computations on servers at their offices, but with the volume of incoming test samples growing constantly, they were looking for a scalable option that would allow their research to grow with their company.


Industries:
Life Sciences
Regions: 
NALADEMEAAPAC
AWS Segment: 
Enterprise, SMB


Our Customer

One in five pregnancies is impacted by unforeseen complications, yet only 20% of these complications can be detected using the generalized risk assessments used by most medical practices. Mirvie has developed groundbreaking RNA research and is using it to predict complications months before they occur. Using just a simple blood test, Mirvie’s platform reveals the unique biology of each pregnancy by analyzing tens of thousands of RNA messages that drive changes throughout pregnancy. Having this information opens a window for preventative, personalized pregnancy care that hasn’t been previously possible.

The Challenge

Before working with Cloud303, Mirvie was running servers locally at their offices in San Francisco. Given the growth of the company and the number of samples they were receiving, this setup would no longer meet their needs. They were looking to switch to a scalable platform that would allow them to grow easily and naturally.

Why Mirvie Chose AWS?

AWS was an obvious solution for Mirvie. The scalability that AWS provides, while keeping costs low by using only the necessary resources at any given time, is exactly what Mirvie was looking for.

Why Mirvie Chose Cloud303?

This was an extremely complicated migration, and Cloud303 has become known at AWS for their ability to solve challenging problems for their customers. From a privacy and security standpoint, everything had to be HIPAA compliant, an area in which Cloud303 also had extensive experience.

 
 
Phil Supinski, CEO/Solutions Architect
Sujaiy Shivakumar, CTO/Solutions Architect

AWS Services Employed:
 EC2 ECS VPC

Cloud303's Solution

Cloud303’s goal was to move Mirvie to the cloud, using an S3 bucket for storage and AWS Batch to orchestrate the compute environment that runs the Nextflow jobs in their bioinformatics pipeline, and to do all of this under the HIPAA compliance umbrella so that their clients’ data was protected at the highest level.

Cloud303 started by taking the Nextflow scripts that Mirvie was already using and moving them to the cloud to run through AWS Batch. Since this deployment targeted cost optimization in addition to greater power and efficiency, the plan was to run compute tasks on spot instances, rather than on-demand instances, because of their excellent value. Nextflow is designed for parallel scientific computing, but it does not normally cope with compute nodes appearing and disappearing, as they can when working with spot fleets. By using S3 to preserve the application’s state, Cloud303 designed a head node that can resume processes and retry jobs that were dropped. This allows Nextflow to run in an environment where node availability is not guaranteed, vastly increasing both its flexibility and affordability as a parallel computing platform in the cloud.
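The resume-and-retry behavior described above can be illustrated with a minimal Python sketch. It is not Cloud303's head node implementation: the function names, the retry limit, and the example S3 work directory are assumptions made for illustration. The only Nextflow features it relies on are the `-work-dir` and `-resume` options of `nextflow run`.

```python
import subprocess
from typing import Callable, List, Optional


def nextflow_command(pipeline: str, work_dir: str, resume: bool) -> List[str]:
    """Build a `nextflow run` command line.

    `-resume` asks Nextflow to reuse cached results of tasks that already
    completed instead of recomputing them.
    """
    cmd = ["nextflow", "run", pipeline, "-work-dir", work_dir]
    if resume:
        cmd.append("-resume")
    return cmd


def run_with_resume(
    pipeline: str,
    work_dir: str,
    max_attempts: int = 3,  # hypothetical limit, not taken from the case study
    runner: Optional[Callable[[List[str]], int]] = None,
) -> int:
    """Run the pipeline, relaunching with -resume after each failed attempt.

    Returns the number of attempts used, or raises RuntimeError if every
    attempt fails. `runner` takes a command and returns its exit code; it is
    injectable so the loop can be exercised without Nextflow installed.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    for attempt in range(1, max_attempts + 1):
        # The first attempt starts fresh; later attempts resume from the
        # state persisted in the work directory.
        cmd = nextflow_command(pipeline, work_dir, resume=attempt > 1)
        if runner(cmd) == 0:
            return attempt
    raise RuntimeError(f"pipeline failed after {max_attempts} attempts")
```

A call might look like `run_with_resume("main.nf", "s3://example-bucket/work")`, where both the script name and the bucket are placeholders. Keeping the work directory outside the compute nodes is what lets a relaunch pick up where an interrupted run stopped.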

Nodes share an EFS volume so they have a common ephemeral data directory to work with, though staged files are copied locally to increase overall speed. The whole workload is encrypted using customer-managed KMS keys (S3, EFS, local volumes). Secrets are managed by SSM Parameter Store. When files for a job are submitted, they must be accompanied by a text file containing the details needed for the job to complete successfully. S3 Events are used to monitor for those file submissions and, when the files are uploaded, a Lambda function is triggered to configure the job and get all the data where it needs to be. By using S3 Events and Lambda in this way, there is no need for a persistently running server to monitor for new jobs, which further helps with cost savings.
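As a rough illustration of the S3 Events and Lambda pattern described above, the Python sketch below turns an S3 event into AWS Batch job submissions. The queue name, job definition name, the `.txt` suffix used to recognize the job details file, and the `manifest` parameter are all hypothetical; the case study does not describe how the actual Lambda function configures jobs.

```python
import re
from urllib.parse import unquote_plus

JOB_QUEUE = "example-nextflow-queue"      # hypothetical queue name
JOB_DEFINITION = "example-nextflow-head"  # hypothetical job definition name


def build_job_requests(event: dict) -> list:
    """Translate an S3 event into keyword arguments for Batch submit_job.

    Only objects ending in .txt are treated as job details files (an
    assumption); all other uploads are ignored.
    """
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".txt"):
            continue
        # Batch job names allow letters, digits, hyphens, and underscores.
        safe = re.sub(r"[^A-Za-z0-9_-]", "-", key[: -len(".txt")])
        requests.append(
            {
                "jobName": ("nf-" + safe)[:128],
                "jobQueue": JOB_QUEUE,
                "jobDefinition": JOB_DEFINITION,
                "parameters": {"manifest": f"s3://{bucket}/{key}"},
            }
        )
    return requests


def handler(event, context):
    """Lambda entry point: submit one Batch job per details file uploaded."""
    import boto3  # imported lazily so build_job_requests can be tested offline

    batch = boto3.client("batch")
    return [batch.submit_job(**req)["jobId"] for req in build_job_requests(event)]
```

Separating the pure request builder from the handler keeps the AWS call in one place and lets the event parsing be checked locally with a sample event.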

Once the job is complete, output files are stored in an S3 bucket where they can be viewed and downloaded by whoever needs them.

By utilizing spot instances and removing the need for continuously running servers, Cloud303 created a parallel computing deployment that was a great deal less expensive than even one running on on-demand EC2 instances, let alone dedicated on-prem servers. 

Mirvie has a twofold need for robust audit tracking and logging: their own internal needs and HIPAA’s compliance requirements. So from the beginning of the process until the end, Cloud303 made sure that Mirvie was auditing everything and keeping logs of everything that happened in the cloud. To achieve this, Cloud303 implemented AWS CloudTrail and AWS Config to make sure that all API calls, as well as all configuration changes, were recorded. The logs recorded by the individual services, along with application logs, are sent to CloudWatch Logs and then to S3 for long-term storage. They remain in standard storage for 30 days in case they need to be queried with Athena, and then they are moved to long-term storage in Glacier, where they remain for 6 years, in accordance with HIPAA regulations. The buckets also have S3 Server Access Logs enabled so any attempts to view log data are recorded.
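The log retention schedule above (30 days in standard storage, then 6 years in Glacier) maps naturally onto an S3 lifecycle rule. The Python sketch below builds such a configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the empty prefix, and the optional expiration after the Glacier period are assumptions; the case study does not say whether logs are deleted once the 6 years have passed.

```python
def log_lifecycle_configuration(
    standard_days: int = 30,
    glacier_years: int = 6,
    expire_after_retention: bool = False,
) -> dict:
    """Build an S3 lifecycle configuration for log objects.

    Objects move to the GLACIER storage class after `standard_days`. If
    `expire_after_retention` is set (an assumption, not stated in the case
    study), they are deleted after a further `glacier_years` * 365 days.
    """
    rule = {
        "ID": "logs-to-glacier",   # hypothetical rule name
        "Filter": {"Prefix": ""},  # apply to every object in the bucket
        "Status": "Enabled",
        "Transitions": [{"Days": standard_days, "StorageClass": "GLACIER"}],
    }
    if expire_after_retention:
        rule["Expiration"] = {"Days": standard_days + glacier_years * 365}
    return {"Rules": [rule]}
```

The result could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-log-bucket", LifecycleConfiguration=log_lifecycle_configuration())`, where the bucket name is a placeholder.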

CloudWatch Alarms were also used to create budget alarms, so Mirvie could be confident in their budget without having to keep checking their bill as the month goes on.
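One common way to build such a budget alarm is a CloudWatch alarm on the `EstimatedCharges` metric in the `AWS/Billing` namespace. The Python sketch below builds the request for boto3's `put_metric_alarm`; the alarm name, threshold, and SNS topic are hypothetical, and the case study does not state which metric or threshold Cloud303 configured.

```python
def billing_alarm_request(monthly_budget_usd: float, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for an estimated-charges alarm.

    The alarm fires when the month-to-date estimated charges exceed the
    budget and notifies the given SNS topic (a hypothetical ARN here).
    """
    if monthly_budget_usd <= 0:
        raise ValueError("budget must be positive")
    return {
        "AlarmName": f"estimated-charges-over-{int(monthly_budget_usd)}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(monthly_budget_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

Billing metrics are published in the us-east-1 region, so the request would be sent with a client such as `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**request)`, and billing alerts must be enabled on the account first.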

CodePipeline was deployed to automate the development process, sending built containers of the runtime environment to Elastic Container Registry (ECR) and the necessary Nextflow job files to S3.

One important aspect of this build was strict version control so jobs could be closely tracked. Commit IDs from the source repository were used to tag builds and Nextflow code, so every file was associated with a unique identifier that could be monitored. CodePipeline is also part of this effort, updating parameters in SSM Parameter Store to help keep track of the current build.
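The commit-based tagging scheme can be sketched in Python as follows. This is an illustration under stated assumptions: the repository URI, the S3 prefix layout, and the parameter name `/example/pipeline/current-build` are hypothetical, and only the shape of the `put_parameter` arguments follows the boto3 SSM API.

```python
import re


def release_identifiers(repository_uri: str, commit_id: str) -> dict:
    """Derive build identifiers from a Git commit ID.

    Returns the container image reference, the S3 prefix for the Nextflow
    job files, and the keyword arguments for ssm.put_parameter that record
    the current build. All naming conventions here are assumptions.
    """
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_id):
        raise ValueError(f"not a valid commit ID: {commit_id!r}")
    return {
        "image": f"{repository_uri}:{commit_id}",
        "s3_prefix": f"pipelines/{commit_id}/",
        "ssm_parameter": {
            "Name": "/example/pipeline/current-build",  # hypothetical name
            "Value": commit_id,
            "Type": "String",
            "Overwrite": True,
        },
    }
```

A pipeline stage could then call `boto3.client("ssm").put_parameter(**ids["ssm_parameter"])` after a successful build, so that any job can look up which commit produced the runtime image and job files it is using.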

Results/Benefits

Now that Mirvie has migrated to AWS, they are able to keep up with the number of samples they are receiving from the doctors using their tests. They’ve been able to increase the number of clients they can accommodate without worrying about whether the resources they need will be available to them. And despite this sharp increase in the size of the workload, their platform is still affordable due to its design. The initial goal was to use on-demand instances alone to save money compared to dedicated servers, but using spot instances as well saved a great deal more.


AWS Programs/Funding Used:
Partner Opportunity Acceleration Funding
"MAP" Migration Acceleration Program