Overview
We started the Tuva Project out of frustration. As data scientists working in healthcare, we were spending nearly all of our time cleaning and transforming healthcare data and almost none discovering actual insights. For example, we saw our teams doing the same types of activities over and over:
  • Building measures
  • Scraping and organizing terminology datasets
  • Pre-processing raw data (e.g. merging claims into encounters)
We realized that every healthcare data team is rebuilding these same things from scratch. The goal of the Tuva Project is to commoditize this core healthcare data infrastructure, so that healthcare data engineers can focus on more complex, higher-value problems and get closer to generating actual insights from healthcare data.
The Tuva Project has four main components:
  1. Common Data Model
  2. Common Terminology
  3. Data Marts
  4. Documentation

Common Data Model

At the core of every good healthcare analytics stack is a common data model. Whenever you have multiple data sources, you need a common data model to normalize them into. Otherwise every analysis has to be rewritten for each source's layout, and it becomes impossible to scale your analytics. You can read more about our common data model here.
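
To make the idea of normalizing sources concrete, here is a minimal Python sketch. It is not the Tuva Project's actual data model or implementation: the `CommonClaim` fields, the two source layouts, and the function names are all hypothetical, chosen only to show two differently shaped sources being mapped into one common shape.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CommonClaim:
    """Hypothetical common-model row; field names are illustrative only."""
    claim_id: str
    patient_id: str
    service_date: date
    paid_amount: float


def from_source_a(row: dict) -> CommonClaim:
    # Hypothetical source A: ISO dates, amounts as dollar strings.
    return CommonClaim(
        claim_id=row["claim_no"],
        patient_id=row["member_id"],
        service_date=date.fromisoformat(row["dos"]),
        paid_amount=float(row["paid"]),
    )


def from_source_b(row: dict) -> CommonClaim:
    # Hypothetical source B: MM/DD/YYYY dates, amounts as integer cents.
    month, day, year = (int(part) for part in row["ServiceDate"].split("/"))
    return CommonClaim(
        claim_id=row["ClaimID"],
        patient_id=row["PatientKey"],
        service_date=date(year, month, day),
        paid_amount=row["PaidCents"] / 100,
    )


source_a_rows = [
    {"claim_no": "A-1", "member_id": "P1", "dos": "2023-01-15", "paid": "120.50"},
]
source_b_rows = [
    {"ClaimID": "B-7", "PatientKey": "P2", "ServiceDate": "02/03/2023", "PaidCents": 9900},
]

# Once both sources share one shape, downstream logic is written only once.
claims = [from_source_a(r) for r in source_a_rows] + [
    from_source_b(r) for r in source_b_rows
]
total_paid = sum(c.paid_amount for c in claims)
```

The point of the sketch is the last two statements: after normalization, the same calculation runs over every source without knowing where a row came from.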

Common Terminology

Terminologies (classifications) and ontologies (hierarchical relations of terminologies) are vital for nearly all healthcare analytics. Unfortunately, healthcare terminology datasets are scattered all over the web, maintained by a variety of organizations, updated on different schedules, and not always formatted for a data warehouse. The Tuva Project organizes dozens of publicly available terminology datasets so that your data team doesn't have to. You can read more about our common terminology datasets here.

Data Marts

Data marts contain logic that transforms raw healthcare data into new types of data needed to answer common healthcare analytics questions. Common examples include measures and groupers. A data mart can be queried directly, wired up to a dashboard, or fed into a machine learning model for training or deployment. Data teams can easily build on top of and combine data marts to answer more nuanced questions. For most questions, however, data teams won’t need much additional transformation before analysis: that’s the point of the data mart! You can read more about our data marts here.
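
As a rough illustration of what a grouper and a measure can look like, the Python sketch below maps diagnosis codes to condition groups and counts distinct patients per group. Everything in it is an assumption made for the example: the `CONDITION_GROUPER` mapping, the row layout, and the function name are hypothetical and are not taken from any Tuva Project data mart.

```python
from collections import defaultdict

# Hypothetical grouper: maps ICD-10-CM diagnosis codes to a condition group.
CONDITION_GROUPER = {
    "E11.9": "diabetes",
    "I10": "hypertension",
}

# Hypothetical diagnosis rows, assumed to be already normalized.
diagnoses = [
    {"patient_id": "P1", "code": "E11.9"},
    {"patient_id": "P1", "code": "I10"},
    {"patient_id": "P2", "code": "I10"},
    {"patient_id": "P3", "code": "Z00.00"},  # a code this grouper does not cover
]


def build_condition_mart(rows):
    """Return the number of distinct patients per condition group."""
    patients_by_condition = defaultdict(set)
    for row in rows:
        condition = CONDITION_GROUPER.get(row["code"])
        if condition is not None:  # skip codes outside the grouper
            patients_by_condition[condition].add(row["patient_id"])
    return {
        condition: len(patients)
        for condition, patients in patients_by_condition.items()
    }


condition_mart = build_condition_mart(diagnoses)
```

The resulting dictionary plays the role of a tiny data mart: it can be read directly, charted, or joined with other outputs without repeating the grouping logic.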

Documentation

You're reading the documentation! We found that clear and comprehensive documentation was often lacking within our healthcare data teams. With the Tuva Project we're documenting every aspect of the project here, so that data teams can easily understand exactly how their source data is being transformed.