The AIRR Community has established the AIRR Data Commons (ADC) [Christley et al.], a network of geographically distributed AIRR-compliant repositories that adhere to the AIRR Standards. Both the AIRR Community and the ADC follow the FAIR principles of data sharing (Findable, Accessible, Interoperable, and Reusable). The ADC API is a web-based query API that makes AIRR-seq studies and their associated annotated sequence data in the ADC findable and accessible (the FA in FAIR). Because the ADC API uses the MiAIRR standard and the AIRR file formats, the ADC also promotes interoperability and data reuse (the IR in FAIR), thereby supporting both reproducibility and meta-analysis.
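As a rough illustration of what a programmatic ADC query can look like, the Python sketch below builds JSON query bodies for the `repertoire` and `rearrangement` endpoints and POSTs them using only the standard library. It assumes the ADC API v1 filter syntax (nested objects with an `op` and a `content` holding `field` and `value`). The base URL, the study identifier in the usage note, and all helper function names are illustrative assumptions, not part of any official ADC client.

```python
import json
import urllib.request

# Assumed example: one ADC repository's base URL. Each ADC-compliant
# repository serves the same /airr/v1 endpoints under its own host.
BASE_URL = "https://ipa1.ireceptor.org/airr/v1"


def eq_filter(field, value):
    """ADC-style equality filter: field == value."""
    return {"op": "=", "content": {"field": field, "value": value}}


def in_filter(field, values):
    """ADC-style membership filter: field is one of values."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def and_filter(*filters):
    """Combine filters so that all of them must match."""
    return {"op": "and", "content": list(filters)}


def build_repertoire_query(study_id):
    """Query body selecting the repertoires that belong to one study."""
    return {
        "filters": eq_filter("study.study_id", study_id),
        "fields": ["repertoire_id", "study.study_id", "subject.subject_id"],
    }


def build_rearrangement_query(junction_aa, repertoire_ids, size=10):
    """Query body for rearrangements with a given junction amino-acid
    sequence, restricted to a set of repertoires and capped at `size` records."""
    return {
        "filters": and_filter(
            eq_filter("junction_aa", junction_aa),
            in_filter("repertoire_id", repertoire_ids),
        ),
        "fields": ["sequence_id", "repertoire_id", "v_call", "j_call", "junction_aa"],
        "size": size,
    }


def post_query(endpoint, body, timeout=60):
    """POST a JSON query body to BASE_URL/<endpoint> and decode the JSON reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `post_query("repertoire", build_repertoire_query("PRJNA000000"))` would request the repertoires of a hypothetical study ID. The `repertoire_id` values it returns could then be passed to `build_rearrangement_query` and sent to the `rearrangement` endpoint. Because every ADC repository exposes the same API, the same query body can in principle be sent unchanged to each repository's base URL in order to search across the Commons.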
The AIRR Data Commons has grown from just under 400 million BCR and TCR sequence annotations (unique rearrangements) in late 2018 to nine distributed repositories that currently hold 89 studies, 9,869 sample repertoires, and 5.2 billion sequence annotations available for data exploration and download. More recently, clone and cell data have also been added to the ADC, with an initial two studies comprising 67,000 clones and 133 paired-chain B and T cells, including gene expression data. Of the nine community repositories, five are in Canada and managed by iReceptor (the iReceptor Public Archive (IPA), iReceptor COVID-19, Roche/King's College London, Type 1 Diabetes (https://thesugarscience.org/t1d-tcr-repository/), and Human Pancreas Analysis Program (https://hpap.pmacs.upenn.edu/) repositories), one is in the US (managed by VDJServer), one is the VDJBase repository at Bar Ilan University in Israel, one is in Germany at DKFZ, and another is in Germany at the University of Muenster. The ADC can also be searched interactively through a web user interface, using either the iReceptor Gateway or the VDJServer Community Data Portal.
In 2023, more than 338 unique users accessed the AIRR Data Commons through the iReceptor Gateway, generating over 250,000 queries to the ADC repositories, and 133 users downloaded over 1.5 TB of compressed data (over 17 billion sequences). The iReceptor Gateway and the AIRR Data Commons have been cited over 100 times in the literature as an important biodata resource (see the iReceptor LitMap for a visualization: https://app.litmaps.com/shared/4a8cef43-3443-462d-94b1-8ed84def2888). The VDJServer repository holds 39 studies, 3,408 repertoires, and approximately 2.5 billion rearrangements. These include 11 10x Genomics single-cell studies, 5 COVID-related studies, and a large study from the Human Vaccines Project with roughly one billion unique TCR rearrangements.