About the DC Data Explorer

Douglas Connect is undertaking a major effort to simplify and modernize access to scientific data sources, currently focusing on the field of toxicology. Three of the most popular toxicological data sources are already publicly available for consumption: the EPA's in vitro ToxCast/Tox21 database, the EPA's in vivo ToxRefDB database, and the NIBIOHN's toxicogenomics Open TG-Gates database.

We have made the data accessible online in real time through a REST-style API. Such APIs can easily be consumed by a wide range of workflow tools (e.g. KNIME, Garuda) and programming languages (e.g. R, Python or JavaScript).
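As a rough illustration of what consuming a REST-style API from Python can look like, here is a minimal sketch using only the standard library. The base URL, the resource name `compounds` and the query parameters `search` and `limit` are placeholders invented for this example; they are not the actual endpoints, which should be taken from the API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: placeholder base URL, not the real service address.
BASE_URL = "https://api.example.org/toxcast/v1"


def build_url(resource, params, base_url=BASE_URL):
    """Compose a GET URL for a resource, with filters as a query string."""
    # Sort the parameters so the same filters always yield the same URL.
    query = urlencode(sorted(params.items()))
    url = f"{base_url}/{resource}"
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=30):
    """Send the GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# ASSUMPTION: "compounds", "search" and "limit" are hypothetical names.
url = build_url("compounds", {"search": "bisphenol", "limit": 10})
print(url)
# data = fetch_json(url)  # would perform the actual network request
```

Because the request is a plain HTTP GET that returns JSON, the same pattern carries over to R, JavaScript or a workflow tool with an HTTP node.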

Our guiding principles are:

  • Expose the data faithfully and completely: no missing information, no transformations or edits to the data, and existing field names reused as they are.
  • Provide means for easy data access and exploration: powerful mechanisms for filtering, searching and aggregating the data.
  • Open source the implementation so the results can be audited end to end, from downloading the official release to the data arriving at the client.

And while APIs are great, we also realize that it is often people, not machines, who first need to understand and explore the data. For this, we are developing accompanying web-based user interfaces that scientists can use to quickly explore, search, compare and finally select and export the data they need. And yes, we provide means to use the selected data anywhere, from opening it in Excel to piping it downstream (via APIs) to various modeling and processing tools.

APIs and the Data Explorer

So how exactly do we expose hundreds of millions of data points in a way that allows users to quickly drill down to exactly what they need? Instead of relying on a black-box search box alone, we heavily aggregate all the data. This in turn allows us to construct dynamic user interfaces in which users filter the data by selecting values from the calculated aggregations. And yes, we always combine this with a powerful search box as well.
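The idea of filtering by selecting values from calculated aggregations can be sketched in a few lines of Python. This is a toy, in-memory illustration of the general technique, not the actual implementation behind the Data Explorer; the records and the field names `compound`, `assay` and `outcome` are made up for the example.

```python
from collections import Counter

# ASSUMPTION: made-up records and field names, for illustration only.
RECORDS = [
    {"compound": "compound-1", "assay": "assay-a", "outcome": "active"},
    {"compound": "compound-1", "assay": "assay-b", "outcome": "inactive"},
    {"compound": "compound-2", "assay": "assay-a", "outcome": "active"},
    {"compound": "compound-3", "assay": "assay-b", "outcome": "active"},
]


def aggregate(records, fields):
    """Count how often each distinct value occurs, per field."""
    return {field: Counter(r[field] for r in records) for field in fields}


def apply_filters(records, selected):
    """Keep records whose value is in the selected set for every filtered field."""
    return [
        r for r in records
        if all(r[field] in values for field, values in selected.items())
    ]


# Step 1: the aggregations tell the interface which values can be offered.
facets = aggregate(RECORDS, ["assay", "outcome"])

# Step 2: the user picks values; the data is filtered accordingly.
subset = apply_filters(RECORDS, {"assay": {"assay-a"}, "outcome": {"active"}})

# Step 3: aggregations are recomputed on the subset for further drill-down.
remaining = aggregate(subset, ["compound"])
print(facets, subset, remaining, sep="\n")
```

Each selection narrows the data and the recomputed counts show what is left to choose from, so users never have to guess a search term to find out which values exist.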

Learn more:

  • Check out our recent blog post on our data APIs.
  • A short presentation from OpenTox Euro 2016 is available that explains the rationale for exposing the ToxCast data as an API.
  • The ToxCast/Tox21 API is already open source and available on GitHub, including the code for both the API itself and the importer that parses the official upstream data release. The APIs for ToxRefDB and Open TG-Gates will be open sourced in the near future.