Identifying cancer cells from calling single-nucleotide variants in scRNA-seq data

Motivation: Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells and it can be difficult to fully separate these two populations based on transcriptomi...

Descripción completa

Detalles Bibliográficos
Autores: Marot-Lassauzaie, Valérie, Beneyto Calabuig, Sergi, Obermayer, Benedikt, Velten, Lars, Beule, Dieter, Haghverdi, Laleh
Tipo de recurso: artículo
Estado:Versión publicada
Fecha de publicación:2024
País:España
Institución:Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repositorio:Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier:oai:recercat.cat:10230/68875
Acceso en línea:http://hdl.handle.net/10230/68875
http://dx.doi.org/10.1093/bioinformatics/btae512
Access Level:acceso abierto
Palabra clave:Cèl·lules canceroses
RNA
Descripción
Sumario:Motivation: Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells and it can be difficult to fully separate these two populations based on transcriptomics alone. If available, somatic single-nucleotide variants (SNVs) observed in the scRNA-seq data could be used to identify the cancer population and match that information with the single cells' expression profile. However, calling somatic SNVs in scRNA-seq data is a challenging task, as most variants seen in the short-read data are not somatic, but can instead be germline variants, RNA edits or transcription, sequencing, or processing errors. In addition, only variants present in actively transcribed regions for each individual cell will be seen in the data. Results: To address these challenges, we develop CCLONE (Cancer Cell Labelling On Noisy Expression), an interpretable tool adapted to handle the uncertainty and sparsity of SNVs called from scRNA-seq data. CCLONE jointly identifies cancer clonal populations, and their associated variants. We apply CCLONE on two acute myeloid leukaemia datasets and one lung adenocarcinoma dataset and show that CCLONE captures both genetic clones and somatic events for multiple patients. These results show how CCLONE can be used to gather insight into the course of the disease and the origin of cancer cells in scRNA-seq data. Availability and implementation: Source code is available at github.com/HaghverdiLab/CCLONE.