Senior Data Engineer, Data Curation Job at Formation Bio, New York, NY

  • Formation Bio
  • New York, NY

Job Description

Senior Data Engineer, Data Curation

Formation Bio is a tech- and AI-driven pharma company differentiated by radically more efficient drug development. Advancements in AI and drug discovery are creating more candidate drugs than the industry can advance, given the high cost and long timelines of clinical trials. Recognizing that this development bottleneck may ultimately limit the number of new medicines that can reach patients, Formation Bio, founded in 2016 as TrialSpark Inc., has built technology platforms, processes, and capabilities to accelerate all aspects of drug development and clinical trials. Formation Bio partners with pharma companies, research organizations, and biotechs, or acquires or in-licenses their drugs, to develop programs through clinical proof of concept and beyond, ultimately helping to bring new medicines to patients. The company is backed by investors across pharma and tech, including a16z, Sequoia, Sanofi, Thrive Capital, Sam Altman, John Doerr, Spark Capital, SV Angel Growth, and others.

As a Senior Data Engineer at Formation Bio, you will focus on building the semantic layer that makes diverse data pillars interoperable, consistent, and actionable. You'll work across healthcare (EHR, claims, real-world data), commercial/pharma (pricing, formulary, market data), biomedical (scientific and trial data), and finance (operational and business datasets) to design models that unify disparate sources into a common language for analytics, decision-making, and AI applications.

While ingestion pipelines are part of the work, your primary responsibility will be transforming both structured and unstructured data into scalable, ontology-driven data models that teams can trust and reuse. This includes everything from traditional relational datasets to text-heavy unstructured sources that feed NLP, embeddings, and semantic search.

This role requires partnering closely with engineers, analysts, data scientists, and business stakeholders to ensure every data pillar is represented in a robust semantic foundation that supports today's needs and tomorrow's AI-native platforms.

Responsibilities
  • Semantic Modeling & Ontologies: Build and maintain SQL/dbt models that unify datasets across healthcare, commercial/pharma, biomedical, and finance domains, leveraging ontologies (e.g., SNOMED CT, ICD, RxNorm, HL7 FHIR, OMOP).
  • Structured + Unstructured Data Integration: Design models that handle not only structured datasets but also unstructured data sources (e.g., documents, free text, biomedical literature), preparing them for AI-driven applications.
  • Data Layer Architecture: Own and evolve the semantic layer that transforms raw data into consistent, reusable models powering analytics and advanced AI.
  • Ingestion & Integration: Contribute to pipelines that bring in data from APIs, partner feeds, flat files, and unstructured text, ensuring inputs are reliable, well-documented, and metadata-rich.
  • Data Quality & FAIR Principles: Apply FAIR principles to ensure data is traceable, interoperable, and reusable across structured and unstructured domains.
  • Cross-functional Collaboration: Partner with commercial, scientific, finance, and healthcare stakeholders to align semantic models with real-world use cases.
  • Enablement & Documentation: Document data standards and reusable modeling patterns to empower downstream teams and reduce cognitive load.
  • Future-Proofing: Anticipate how today's semantic modeling will support tomorrow's AI workflows such as NLP, embeddings, knowledge graphs, and retrieval-augmented generation.

About You

Required Experience:

  • 5+ years of experience as a Data Engineer, Analytics Engineer, or similar role in healthcare, pharma, biotech, finance, or other highly regulated industries.
  • Deep expertise in at least one data domain (e.g., healthcare/EHR/claims, commercial/pharma, biomedical/scientific, or finance), with a track record of translating complex, domain-specific datasets into consistent and usable models.
  • Strong SQL and data modeling skills, with proven experience designing semantic or analytical layers.
  • Exposure to additional domains beyond your core area of expertise, and the ability to learn and adapt to new datasets quickly.
  • Experience working with both structured data (e.g., relational tables, APIs) and unstructured data (e.g., documents, free text, biomedical literature, healthcare notes).
  • Familiarity with healthcare/life sciences ontologies (SNOMED CT, ICD, RxNorm, LOINC, HL7 FHIR, OMOP, Mondo) and/or financial/commercial taxonomies.

Preferred Experience (Valued but Not Required):

  • Hands-on experience with Snowflake, dbt, Dagster, and modern data stacks.
  • Experience with unstructured data workflows (NLP, embeddings, semantic search, knowledge graphs).
  • Understanding of regulatory and compliance considerations in healthcare, pharma, or finance.
  • Practical use of metadata management and data catalog platforms.
  • Hands-on experience structuring dbt projects with testing, quality checks, and reusable design patterns.

Key Attributes:

  • Curious & Investigative: Always looking deeper into how and why datasets work the way they do.
  • Structured & Methodical: Brings rigor to semantic modeling, ontology mapping, and data quality management.
  • Collaborative Partner: Works seamlessly across pillars, enabling others while owning core responsibilities.
  • Adaptable: Leverages deep domain expertise while learning quickly in unfamiliar data areas.
  • Enablement-Minded: Strives to reduce complexity for downstream users by standardizing and documenting.
  • Future-Oriented: Builds today's models with tomorrow's AI-native and data-driven applications in mind.

Formation Bio is prioritizing hiring in key hubs, primarily the New York City and Boston metro areas, with additional growth in the Research Triangle (NC) and San Francisco Bay Area. Please only apply if you reside in these locations or are willing to relocate.

Compensation: The target salary range for this role is $180,000 - $230,000. Salary ranges are informed by a number of factors, including geographic location. The range provided includes base salary only. In addition to base salary, we offer equity, comprehensive benefits, generous perks, hybrid flexibility, and more. If this range doesn't match your expectations, please still apply, because we may have something else for you.

You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

