Bio / topics
I am a CNRS researcher (chargée de recherche) in the BONSAI team (Lille, France). My work focuses on methods and data structures for sequence bioinformatics, often with applications to RNA.
Interested in bioinformatics training for sequencing data? The CNRS offers a training course in Lille (for labs and companies): link.
After an engineering degree in Bioinformatics from INSA de Lyon and an MSc in Ecology and Evolution from Université Claude Bernard Lyon 1, I worked for two years as an engineer in the ERABLE team (LBBE, Lyon) with Vincent Lacroix. I then obtained PhD funding in the GenScale team (Rennes, France), where I was supervised by Pierre Peterlongo. I defended my PhD in 2018 and then joined BONSAI in the CRIStAL lab as a postdoc. I was later recruited by the CNRS as a researcher in the same lab (detailed CV). My research is currently supported by the ANR through PRC and JCJC projects.
My postdoc was part of the Transipedia ANR project, with Rayan Chikhi and Mikaël Salson. Transipedia aims to be a transcriptome encyclopedia, i.e., to facilitate the indexing, querying and exploitation of the many publicly available RNA-seq datasets. I mostly work on new data structures to index large collections of NGS datasets.
Before and during my PhD, I worked on methods for transcriptomics, in particular for de novo variant discovery and RNA long-read analysis.
Job offers
News
Blog posts / short articles
Research projects
Present
- 2024-.. Find-RNA ANR JCJC project, Data-structures for collections of sets of (meta-)transcriptomics data (coordinator)
- 2021-.. INSSANE ANR project, Novel methods for studying RNA structures (partner)
- 2022-.. Full RNA ANR project, Indexing large scale 2nd and 3rd generation RNA datasets (member)
- 2020-.. ALPACA ITN, Methods for pangenomics (informal member)
Past
- 2018-22 Transipédia ANR project, Indexing large scale RNA-seq datasets (member)
- 2017 CNRS MASTODONS project, Correction of 3rd generation sequencing data (member)
- 2013-16 Colib’read ANR project, De novo methods for the variant calling in short read sequencing (member)
Journal Publications
- A survey of mapping algorithms in the long-reads era, K Sahlin, T Baudeau, B Cazaux, C Marchet; Genome Biology, 2023
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; Bioinformatics, 2023
- BLight: Efficient exact associative structure for k-mers, C Marchet, M Kerbiriou, A Limasset; Bioinformatics, 2021
- Scalable long read self-correction and assembly polishing with multiple sequence alignment, P Morisse, C Marchet, A Limasset, T Lecroq, A Lefebvre; Scientific Reports, 2021
- Data structures based on k-mers for querying large collections of sequencing data sets, C Marchet, C Boucher, S. J. Puglisi, P Medvedev, M Salson, R Chikhi; Genome Research, 2020
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; Bioinformatics, 2020
- ELECTOR: Evaluator for long reads correction methods, C Marchet, P Morisse, L Lecompte, A Limasset, A Lefebvre, T Lecroq, P Peterlongo; NAR Genomics and Bioinformatics, 2019
- Comparative assessment of long-read error-correction software applied to RNA-sequencing data, L Ishi Soares de Lima, C Marchet, S Caboche, C Da Silva, B Istace, J-M Aury, H Touzet, R Chikhi; Briefings in Bioinformatics, 2019
- Clustering de Novo by Gene of Long Reads from Transcriptomics Data, C Marchet, L Lecompte, C Da Silva, C Cruaud, J-M Aury, J Nicolas, P Peterlongo; Nucleic Acids Research, 2018
- A de novo approach to disentangle partner identity and function in holobiont systems, A Meng, C Marchet, E Corre, P Peterlongo, A Alberti, C Da Silva, P Wincker, E Pelletier, I Probert, J Decelle, S Le Crom, F Not, L Bittner; Microbiome, 2018
- A resource-frugal probabilistic dictionary and applications in bioinformatics, C Marchet, L Lecompte, A Limasset, L Bittner, P Peterlongo; Discrete Applied Mathematics, 2018
- Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data, C Benoit-Pilven, C Marchet, E Chautard, L Lima, M-P Lambert, G Sacomoto, A Rey, C Bourgeois, D Auboeuf, V Lacroix; Scientific Reports, 2018
- Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads, L Lima, B Sinaimeri, G Sacomoto, H Lopez-Maestre, C Marchet, V Miele, M-F Sagot and V Lacroix; Algorithms for Molecular Biology, 2017
- SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence, H Lopez Maestre, L Brinza, C Marchet, J Kielbassa, S Bastien, M Boutigny, D Monnin, A El Filali, CM Carareto, C Vieira, F Picard, N Kremer, F Vavre, M-F Sagot, V Lacroix; Nucleic Acids Research, 2016
- Colibread on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads, E Rivals, A Andrieux, A Z El Aabidine, B Cazaux, C Marchet, C Lemaitre, C Monjeaud, G Sacomoto, L Salmela, O Collin, P Peterlongo, R Uricaru, S Alves-Carvalho, V Lacroix, V Miele, Y LeBras; GigaScience, 2016
Research papers accepted to peer-reviewed conferences
- Fractional Hitting Sets for Efficient and Lightweight Genomic Data Sketching, T Rouzé, I Martayan, C Marchet & A Limasset, WABI 2023
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; ISMB 2023
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; ISMB 2020
- Indexing De Bruijn graphs with minimizers, C Marchet, M Kerbiriou, A Limasset; RECOMB-SEQ 2019
- CONSENT: Scalable self-correction of long reads with multiple sequence alignment, P Morisse, C Marchet, A Limasset, A Lefebvre, T Lecroq; RECOMB-SEQ 2019
- A resource-frugal probabilistic dictionary and applications in (meta)genomics, C Marchet, A Limasset, L Bittner, P Peterlongo; PSC 2016
- Navigating in a Sea of Repeats in RNA-seq without Drowning, G Sacomoto, B Sinaimeri, C Marchet, V Miele, MF Sagot, V Lacroix; WABI 2014
Preprints
Communications
Invited talks
- 2023 Hashing-based data-structures for querying large k-mer (collections of) sets, CiE, Batumi (Georgia)
- 2023 Hashing-based data-structures for querying large k-mer (collections of) sets, Sequences in London, UK
- 2022 Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, Genome Informatics, Cambridge (UK)
- 2022 How to improve student/advisor relationships, WABI, ALPACA 2nd Annual Workshop, Potsdam (Germany)
- 2022 Data-structures for querying large k-mer (collections of) sets, JOBIM mini-symposium, Rennes (France)
- 2022 Data-structures for querying large k-mer (collections of) sets, KIM Data and Life Sciences seminars, Montpellier (France, online) [slides] [recording (in French)]
- 2022 Data structures for large k-mer sets, MIAT seminars, Toulouse (France, online) [slides]
- 2020 From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing, SIF congress, INSA de Lyon (France)
- 2020 Scalable data structures for sequencing data, Symposium GDR Madics, Lyon (France) (canceled due to COVID-19)
- 2019 New methodologies for the analysis of transcriptome sequences, MAB team seminar, LIRMM Montpellier (France)
- 2018 CARNAC-LR and C2C: de novo clustering and detection of alternative isoforms in Third Generation Sequencing transcriptomes, Genotoul Biostats/Bioinfo day, INRA Toulouse (France)
- 2018 From reads to transcripts: de novo methods to analyze transcriptome 2d and 3d generations sequencing data, Roscoff Biological Station seminar (France)
- 2018 A de novo approach to disentangle partner identity and function in holobiont systems, Advances techniques to study and exploit the sponge and coral microbiomes Workshop, ULB Brussels (Belgium)
- 2018 A highly scalable data structure for read similarity computation and its application to marine holobionts, EEB group meeting, ULB Brussels (Belgium)
- 2018 CARNAC-LR: clustering genes expressed variants from long read RNA sequencing, team TIBS seminar, LITIS Rouen (France)
- 2017 De novo Clustering of Gene Expressed Variants in Transcriptomic Long Reads Data Sets, workshop RNA-Seq and Nanopore sequencing, Genoscope Evry (France)
- 2017 Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, LBBE NGS group seminar, Lyon (France)
Workshops/conferences/seminars
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, BiATA, St Petersburg (Russia, virtual conference)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, ISMB, Montreal (Canada, virtual conference)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, JOBIM, Montpellier (France, virtual conference)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, DSB, Rennes (France)
- 2019 Indexing De Bruijn graphs with minimizers, RECOMB-seq, Washington DC (US)
- 2019 Indexing De Bruijn graphs with minimizers, BiATA, St Petersburg (Russia)
- 2019 Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, DSB, Dortmund (Germany)
- 2019 Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, Helsinki Bioinformatics Day (Finland)
- 2019 Read correction for non-uniform coverages, RCAM Institut Pasteur Paris (France)
- 2019 Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, SeqBIM, Marne La Vallée (France)
- 2018 BCOOL-Trans: accurate and variant-preserving correction for RNA-seq, Seqbio, Rouen (France)
- 2018 ELECTOR: EvaLuator of Error Correction Tools for lOng Reads, Seqbio, Rouen (France)
- 2018 CARNAC-LR: Clustering coefficient-based Acquisition of RNA Communities in Long Reads, JOBIM, Marseille (France)
- 2017 De novo Clustering of Gene Expression in Transcriptomic Long Reads Data Sets, Seqbio, Lille (France)
- 2017 A highly scalable data structure for read similarity computation and its application to marine holobionts, RCAM, Paris (France)
- 2017 A scaling transcriptomic approach to study holobiont data sets, 4e colloque Génomique Environnementale, Marseille (France)
- 2016 Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, SeqBio, Nantes (France)
- 2015 kissDE: a replicate-wise and annotation-free R package for testing the association between differential variants and experimental conditions in high throughput sequencing data, SeqBio 2015, Orsay (France)
- 2015 kissDE: a package to test for differences in reads counts derived from variants in RNA sequencing data, Quatrièmes rencontres R 2015, Lyon (France)
Other talks
Involvement in scientific events
Program committee and venue organization
- 2023 PC, OC @SeqBIM (France)
- 2021 OC @SPIRE (virtual)
- 2020 PC @SPIRE (virtual)
- 2020 PC @RECOMB-Seq (virtual)
- 2019 PC @SeqBIM (France)
- 2018 Volunteer @RECOMB-seq and RECOMB, Paris
- 2016 OC “Biological insights from raw high-throughput sequencing data” - Colib’read ANR workshop, Paris
Tutoring for scientists
- 2020, 2022, 2023 Evomics, workshop on Genomics, Cesky Krumlov, Czech Republic
- 2019, 2022, 2023 Tutor at Bilille training courses (RNA-seq analysis), Lille
- 2015 Tutor at CNRS course: “Bioinformatique pour les NGS”, Montpellier
- 2014 Tutor at BGE & EMBnet tutorials: “RNA-seq analysis”, Lyon
- 2014 Tutor at PRABI training courses, “Analyse de données RNA-seq sous l’environnement Galaxy”, Lyon
Supervision
- 2022 Louis-Maël Gueguen (M2 internship, with Laurent Jacob), subject: GWAS for metagenomics
- 2022 Thomas Baudeau (M2 internship), subject: Mapping of structural long reads
- 2021-.. Khodor Hannoush (PhD student, with Pierre Peterlongo), subject: Dynamic pangenome graphs
- 2020 Agathénaïs Adiguna (L3 internship, with Rayan Chikhi), subject: Data-structures for large scale RNA queries
- 2018 Benjamin Churcheward (M2 internship, with Pierre Peterlongo), subject: Isoform prediction using long reads sequencing
- 2017 Lolita Lecompte (M2 internship, with Pierre Peterlongo), subject: Conception and evaluation of a pipeline for de novo study of long reads in transcriptomics
- 2014 Camille Sessegolo (L3 internship, with Vincent Lacroix), subject: Improvements in the kissplice2refgenome software
Teaching
Year | Level | Topic |
2021/2022 | M1 Bioinformatics, Université de Lille | Head of the Data Structures course unit |
2019/2020 | L2 Computer Science, Université de Lille | Practicals (TP): Algorithms and data structures |
2018/2019 | L2 Computer Science, Université de Lille | Practicals (TP): Algorithms and data structures |
2016/2017 | 4th and 5th year of engineering school (M1/M2), INSA Rennes | Modeling and engineering for life sciences |
2016/2017 | 2nd year of preparatory cycle, INSA Rennes | Practicals (TP): Databases |
2016/2017 | M1, ENS Rennes | Biostatistics, programming with R |
2015/2016 | M1, ENS Rennes | Biostatistics, programming with R |
2014/2015 | L1, Université Lyon 1 | Tutorials (TD): Mathematics for Life Sciences |
Press review