Identifying cancer cells from calling single-nucleotide variants in scRNA-seq data

Motivation: Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells and it can be difficult to fully separate these two populations based on transcriptomics alone.


Bibliographic details
Authors: Marot-Lassauzaie, Valérie; Beneyto Calabuig, Sergi; Obermayer, Benedikt; Velten, Lars; Beule, Dieter; Haghverdi, Laleh
Format: article
Status: Published version
Publication date: 2024
Country: Spain
Resources: Various* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Repository: Recercat. Dipósit de la Recerca de Catalunya
OAI Identifier: oai:recercat.cat:10230/68875
Online access: http://hdl.handle.net/10230/68875
http://dx.doi.org/10.1093/bioinformatics/btae512
Access level: open access
Keywords: Cèl·lules canceroses (cancer cells)
RNA
id ES_9dbf2ff4f5f07bf20fa67c0519dafe94
oai_identifier_str oai:recercat.cat:10230/68875
network_acronym_str ES
network_name_str España
spelling © The Author(s) 2024. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. 2026-05-29T05:05:01Z
dc.title.none.fl_str_mv Identifying cancer cells from calling single-nucleotide variants in scRNA-seq data
dc.creator.none.fl_str_mv Marot-Lassauzaie, Valérie
Beneyto Calabuig, Sergi
Obermayer, Benedikt
Velten, Lars
Beule, Dieter
Haghverdi, Laleh
dc.subject.none.fl_str_mv Cèl·lules canceroses
RNA
description Motivation: Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells and it can be difficult to fully separate these two populations based on transcriptomics alone. If available, somatic single-nucleotide variants (SNVs) observed in the scRNA-seq data could be used to identify the cancer population and match that information with the single cells' expression profile. However, calling somatic SNVs in scRNA-seq data is a challenging task, as most variants seen in the short-read data are not somatic, but can instead be germline variants, RNA edits or transcription, sequencing, or processing errors. In addition, only variants present in actively transcribed regions for each individual cell will be seen in the data.
Results: To address these challenges, we develop CCLONE (Cancer Cell Labelling On Noisy Expression), an interpretable tool adapted to handle the uncertainty and sparsity of SNVs called from scRNA-seq data. CCLONE jointly identifies cancer clonal populations, and their associated variants. We apply CCLONE on two acute myeloid leukaemia datasets and one lung adenocarcinoma dataset and show that CCLONE captures both genetic clones and somatic events for multiple patients. These results show how CCLONE can be used to gather insight into the course of the disease and the origin of cancer cells in scRNA-seq data.
Availability and implementation: Source code is available at github.com/HaghverdiLab/CCLONE.
publishDate 2024
dc.date.none.fl_str_mv 2024
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/10230/68875
http://dx.doi.org/10.1093/bioinformatics/btae512
dc.language.none.fl_str_mv English
dc.relation.none.fl_str_mv Bioinformatics. 2024 Sep 2;40(9):btae512
dc.rights.none.fl_str_mv http://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Oxford University Press
dc.source.none.fl_str_mv reponame:Recercat. Dipósit de la Recerca de Catalunya
instname:Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)