Cloud303 Designs Computational Pipelines
For Interline's Data Storage and Processing Needs
Life Sciences Pharmaceutical Startup Utilizes Leading Cloud Provider to Develop a Modernized Solution
HIPAA Serverless Modernization
Summary
Interline is a drug discovery startup focused on systematically elucidating protein communities to identify new medicines. Its discovery platform targets genetically validated signaling pathways and is designed to ensure that drug candidates comprehensively correct dysfunctional disease networks. Advances across multiple disciplines are aligning to expose previously hidden dimensions of protein communities in their native context, and Interline is using this information to develop new medicines.
Industries:
Life Science Pharmaceutical
Regions:
NAMER
AWS Segment:
Startup
Our Customer
Interline is a drug discovery startup focused on systematically elucidating protein communities to identify new medicines. They are developing a precision medicine platform to map and modulate protein communities, leveraging recent advances in genomics, proteomics, structural biology, and computational chemistry.
The Challenges
Interline faced a number of initial pain points as they began to build out their MVP and data lake, which centered on High Performance Computing (HPC) for genomics processing. The two biggest issues they faced are described below.
Inconsistent Data
Interline Therapeutics' data was heterogeneous and arrived from multiple sources, so the input came in many different formats. This made it difficult to connect and centralize the separate data sources, including genomics and LIMS data.
Proper Movement and Storage of Data
Interline was also unsure of the best way to move and store their data. They were weighing options such as using multiple Amazon Simple Storage Service (S3) buckets for different types of data or consolidating into fewer S3 buckets for MapReduce-style processing. They were also considering a graph database such as Amazon Neptune, but were unsure whether it was the best fit for their needs.
Why Interline Chose AWS
AWS was a strong fit for Interline for several reasons. Their data needs were initially modest but set to scale quickly, so a platform that could scale elastically in a number of ways was ideal. Just as important, AWS's pay-as-you-go model meant they would pay only for the capacity they actually needed.

Why Interline Chose Cloud303
Cloud303 turned out to be a great fit for Interline because of their extensive experience with Health and Life Sciences companies. Cloud303's reputation as the number one AWS Well-Architected Review partner boded well for Interline's plan to roll out an HPC pipeline that had to be HIPAA-compliant. Led by Cloud303 Principal Solutions Architect and Life Sciences Team Lead Tim Furlong, the team was deeply knowledgeable about the requirements for HIPAA compliance in the cloud, including how to track the required data at the AWS account level as well as data and logs from within the application itself. All of these requirements were defined and discussed extensively during the Well-Architected Review in the Assess Phase of development.


Tim Furlong
Principal Solutions Architect/Life Sciences Team Lead
AWS Services:
AWS Batch, AWS Lambda, Amazon Simple Storage Service (S3), Amazon Redshift, Amazon FSx for Lustre, AWS Transfer Family, AWS CloudFormation, AWS Secrets Manager, AWS Key Management Service (KMS)
Cloud303's Solution
To help Interline overcome these challenges, Cloud303, an AWS Advanced Consulting Partner, was brought in to assist with the deployment of their HPC pipeline on AWS. Cloud303 delivered three accounts to Interline: a Pipeline Account to process all of the data required for their project, a Storage Account to store all of the files and metadata needed for further studies, and an Audit Account to centralize logging for auditing purposes. Additionally, a Root Account was set up as the main AWS Organizations account from which users log in to all other accounts. This account also served as the entry and exit point for all inbound and outbound internet traffic.
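As a rough sketch of how such a multi-account layout can be provisioned from the AWS Organizations management (Root) account, the following uses boto3 with illustrative account names and placeholder email aliases; it is not Interline's actual configuration.

```python
# Sketch only: provisioning the member accounts described above from the
# AWS Organizations management ("Root") account. Account names and email
# aliases are illustrative placeholders, not Interline's actual values.
import boto3

org = boto3.client("organizations")

MEMBER_ACCOUNTS = {
    "Pipeline": "aws-pipeline@example.com",  # runs the HPC/Batch workloads
    "Storage": "aws-storage@example.com",    # holds files and metadata
    "Audit": "aws-audit@example.com",        # centralizes logs for auditing
}

for name, email in MEMBER_ACCOUNTS.items():
    status = org.create_account(AccountName=name, Email=email)["CreateAccountStatus"]
    print(f"Requested account '{name}': state={status['State']}")
```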
To handle the compute workload, Cloud303 deployed AWS Batch and used AWS Lambda functions to invoke the pipelines and process data. All files were stored in an S3 bucket with Cross-Region Replication to ensure a disaster recovery plan was in place. Amazon Redshift was used to store all of the metadata Interline needed, and Amazon FSx for Lustre served as scratch storage for the pipelines, holding temporary files and enabling parallel computing if required in the future.
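As an illustration of the Lambda-to-Batch handoff described above, here is a minimal sketch assuming an S3 event trigger; the job queue, job definition, and naming scheme are hypothetical placeholders rather than the exact resources Cloud303 deployed.

```python
# Sketch only: a Lambda handler that launches an AWS Batch pipeline run when
# a new object lands in S3. Queue, job definition, and naming scheme are
# hypothetical placeholders.
import re

import boto3

batch = boto3.client("batch")

def handler(event, context):
    # Assumes an S3 event source; pull out the bucket and key that triggered us.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Batch job names only allow letters, numbers, hyphens, and underscores.
    safe_key = re.sub(r"[^A-Za-z0-9_-]", "-", key)[:100]

    response = batch.submit_job(
        jobName=f"hpc-pipeline-{safe_key}",
        jobQueue="hpc-pipeline-queue",          # hypothetical job queue
        jobDefinition="genomics-processing:1",  # hypothetical job definition
        parameters={"input_bucket": bucket, "input_key": key},
    )
    return {"jobId": response["jobId"]}
```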
To better support Interline's drug discovery process and handle the complexities of life sciences data, Cloud303 integrated proteomics tools such as Rosetta and AlphaFold into the AWS-based HPC solution. This allowed Interline to systematically process and analyze large amounts of genomics and proteomics data, providing valuable insights into protein communities and their interactions.
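To show roughly how a containerized proteomics tool can be wired into AWS Batch, the sketch below registers a job definition for a hypothetical AlphaFold container image; the ECR image URI, command, and vCPU/memory/GPU sizing are assumptions for illustration only.

```python
# Sketch only: registering an AWS Batch job definition for a containerized
# proteomics tool. The ECR image URI, command, and resource sizing are
# assumptions for illustration, not the deployed configuration.
import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="alphafold-fold",
    type="container",
    containerProperties={
        # Hypothetical image; a real deployment would reference the team's
        # own AlphaFold or Rosetta container build.
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/alphafold:latest",
        "command": ["python", "run_alphafold.py", "--fasta", "Ref::fasta_s3_uri"],
        "resourceRequirements": [
            {"type": "VCPU", "value": "8"},
            {"type": "MEMORY", "value": "61440"},
            {"type": "GPU", "value": "1"},
        ],
    },
)
```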
By building a robust HPC pipeline on AWS, Interline was able to efficiently process and analyze life sciences data, accelerating their drug discovery efforts and enabling them to quickly develop new therapeutic candidates.
For the Benchling pipeline, Cloud303 set up an event bus to push events from Benchling and trigger AWS Batch for the ingestion stage. This process used the transaction ID to query the Benchling APIs, retrieving the uploaded metadata for the new DNA sequence, the file, and the selected data, and then stored this data in an S3 bucket. Additionally, any recent metadata events were configured to trigger the AWS Lambda function that launched the AWS Batch pipeline for the processing stage. During this stage, all necessary data was pulled from S3, the scripts provided by Interline ran the computation, and the results were stored in a folder named "result" within the same S3 bucket.
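A simplified sketch of the ingestion step might look like the following; the EventBridge event shape, Benchling endpoint URL, secret name, and bucket name are all assumptions, and a production integration would use Benchling's documented API or SDK.

```python
# Sketch only: the ingestion step as a Lambda handler. The EventBridge event
# shape, Benchling endpoint URL, secret name, and bucket name are assumptions.
import json
import urllib.request

import boto3

s3 = boto3.client("s3")
secrets = boto3.client("secretsmanager")

DATA_BUCKET = "interline-ingestion-data"  # hypothetical bucket name

def handler(event, context):
    transaction_id = event["detail"]["transactionId"]  # assumed event shape

    # API credential kept in Secrets Manager rather than hard-coded.
    api_key = secrets.get_secret_value(SecretId="benchling/api-key")["SecretString"]

    # Hypothetical REST call to fetch the metadata tied to this transaction.
    req = urllib.request.Request(
        f"https://example.benchling.com/api/v2/transactions/{transaction_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        metadata = json.load(resp)

    # Persist the retrieved metadata alongside the sequence data in S3.
    s3.put_object(
        Bucket=DATA_BUCKET,
        Key=f"ingestion/{transaction_id}/metadata.json",
        Body=json.dumps(metadata).encode("utf-8"),
    )
    return {"transactionId": transaction_id}
```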
For any other manual data ingestion, Cloud303 provided Interline with an SFTP server using AWS Transfer Family. All connections to the SFTP server were made through a VPN server deployed by Cloud303. Users could be added or removed using CloudFormation templates, and all credentials were stored in Secrets Manager, encrypted with custom Key Management Service (KMS) keys provided by Cloud303. These keys were configured with a rotation policy of every 60 days. All logs produced by the SFTP server were pushed to an application logging bucket in the Audit account for retention, satisfying HIPAA compliance requirements.
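In the actual deployment, SFTP users were managed through CloudFormation templates; purely for illustration, the equivalent API calls might resemble this boto3 sketch, in which the Transfer Family server ID, IAM role ARN, KMS key alias, and home-directory bucket are hypothetical.

```python
# Sketch only: adding an SFTP user to the Transfer Family server and storing
# the credential in Secrets Manager under a customer-managed KMS key. Server
# ID, role ARN, key alias, and home directory are hypothetical placeholders.
import boto3

transfer = boto3.client("transfer")
secrets = boto3.client("secretsmanager")

SERVER_ID = "s-0123456789abcdef0"                                # hypothetical
ACCESS_ROLE = "arn:aws:iam::111122223333:role/sftp-user-access"  # hypothetical
KMS_KEY_ID = "alias/sftp-credentials"                            # hypothetical

def add_sftp_user(user_name: str, ssh_public_key: str, credential: str) -> None:
    # Register the user on the Transfer Family server, scoped to their own
    # home directory in the ingestion bucket.
    transfer.create_user(
        ServerId=SERVER_ID,
        UserName=user_name,
        Role=ACCESS_ROLE,
        HomeDirectory=f"/interline-ingestion-data/{user_name}",
        SshPublicKeyBody=ssh_public_key,
    )

    # Keep the credential in Secrets Manager, encrypted with the custom KMS
    # key; rotation (every 60 days) is configured separately on the secret.
    secrets.create_secret(
        Name=f"sftp/{user_name}",
        SecretString=credential,
        KmsKeyId=KMS_KEY_ID,
    )
```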
Results/Benefits
By working with Cloud303, Interline successfully built and deployed a life sciences-focused HPC pipeline on AWS. This allowed them to automate their bioinformatics pipeline, seamlessly process vast amounts of genomics and proteomics data, and develop new therapeutic candidates more efficiently.
After the solution was implemented, Interline saw a dramatic improvement in both their efficiency and capacity for data processing. Prior to the collaboration with Cloud303, Interline took an average of 24 hours to process a given volume of data. With the newly established HPC pipeline on AWS, that processing time dropped to just 6 hours, a 75% reduction. This expansion in capacity, coupled with the increased speed, means Interline can now handle larger datasets more swiftly, a critical factor in accelerating their drug discovery process.
Furthermore, the implementation of Cloud303's solution led to significant cost savings and an increase in data accuracy. Operational costs were reduced by 30%, freeing up substantial resources that could be redirected towards their core research and development efforts. At the same time, the efficient use of AWS services and the establishment of a more organized data pipeline led to a reduction in data errors. The error rate dropped by 40%, resulting in cleaner, more reliable data for Interline's critical work in drug discovery.
With the implementation of this robust, scalable, and cost-effective solution, Interline was not only able to improve their operational efficiency but also ensure HIPAA compliance, maintaining the integrity and confidentiality of the data processed. This has enabled Interline to remain focused on their core mission of developing new medicines, secure in the knowledge that their data is handled in a compliant and efficient manner.
Additionally, the multi-account structure provided by Cloud303 ensured application scalability for any future additions Interline may require, allowing them to adapt easily to the evolving needs of life sciences research and continue advancing precision medicine.
Phase 1: Proof of Concept (POC)
Phase 2: Jumpstart