Bio / topics
I am a CNRS researcher (chargée de recherche) in the BONSAI team (Lille, France). My work focuses on methods and data structures for sequence bioinformatics, often with applications to RNA.
Interested in bioinformatics training for sequencing data? The CNRS offers a training course in Lille (for labs and companies): link.
After an engineering degree in Bioinformatics from INSA de Lyon and an MSc in Ecology and Evolution from Université Claude Bernard Lyon 1, I worked for two years as an engineer in the ERABLE team (LBBE, Lyon) with Vincent Lacroix. I then obtained PhD funding in the GenScale team (Rennes, France), where I was supervised by Pierre Peterlongo. I defended my PhD in 2018 and then joined BONSAI in the CRIStAL lab as a postdoc. I was later recruited by the CNRS as a researcher in the same lab (CV).
My postdoc was part of the Transipedia ANR project, with Rayan Chikhi and Mikaël Salson. Transipedia aims to be a transcriptome encyclopedia, i.e., to facilitate the indexing, querying and exploitation of the many publicly available RNA-seq datasets. I mostly work on new data structures to index large collections of NGS datasets.
Before and during my PhD I worked on methods for transcriptomics, in particular for de novo variant discovery and RNA long-read analysis.
Job offers
- Internship position on pangenomics visualization, see the job offers page
- Internship position on sequence partitioning using minimizers, see the offer
News
Blog posts / short articles
Research projects
Present
- 2024-.. Find-RNA ANR JCJC project, Data-structures for collections of sets of (meta-)transcriptomics data (PI)
- 2024-.. ESCALATE INSERM MIC project, Scalable methods for human cancer RNA-seq data (member)
- 2021-.. INSSANE ANR project, Novel methods for studying RNA structures (co-PI)
- 2022-.. Full RNA ANR project, Indexing large scale 2nd and 3rd generation RNA datasets (member)
- 2020-.. ALPACA ITN, Methods for pangenomics (informal member)
Past
- 2018-22 Transipédia ANR project, Indexing large scale RNA-seq datasets (member)
- 2017 CNRS MASTODONS project, Correction of 3rd generation sequencing data (member)
- 2013-16 Colib’read ANR project, De novo methods for variant calling in short-read sequencing (member)
Journal Publications
- Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets, I Martayan, B Cazaux, A Limasset & C Marchet, Bioinformatics 2024 (in press)
- KaMRaT: a C++ toolkit for k-mer count matrix dimension reduction, H Xue, M Gallopin, C Marchet, H N Nguyen, Y Wang, A Lainé, C Bessiere, D Gautheret; Bioinformatics 2024
- A survey of mapping algorithms in the long-reads era, K Sahlin, T Baudeau, B Cazaux, C Marchet; Genome Biology, 2023
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; Bioinformatics 2023
- BLight: Efficient exact associative structure for k-mers, C Marchet, M Kerbiriou, A Limasset; Bioinformatics, 2021
- Scalable long read self-correction and assembly polishing with multiple sequence alignment, P Morisse, C Marchet, A Limasset, T Lecroq, A Lefebvre; Scientific Reports, 2021
- Data structures based on k-mers for querying large collections of sequencing data sets, C Marchet, C Boucher, S. J. Puglisi, P Medvedev, M Salson, R Chikhi; Genome Research 2020
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; Bioinformatics, 2020
- ELECTOR: Evaluator for long reads correction methods, C Marchet, P Morisse, L Lecompte, A Limasset, A Lefebvre, T Lecroq, P Peterlongo; NAR Genomics and Bioinformatics, 2019
- Comparative assessment of long-read error-correction software applied to RNA-sequencing data, L Ishi Soares de Lima, C Marchet, S Caboche, C Da Silva, B Istace, J-M Aury, H Touzet, R Chikhi; Briefings in Bioinformatics, 2019
- Clustering de Novo by Gene of Long Reads from Transcriptomics Data, C Marchet, L Lecompte, C Da Silva, C Cruaud, J-M Aury, J Nicolas, P Peterlongo; Nucleic Acids Research, 2018
- A de novo approach to disentangle partner identity and function in holobiont systems, A Meng, C Marchet, E Corre, P Peterlongo, A Alberti, C Da Silva, P Wincker, E Pelletier, I Probert, J Decelle, S Le Crom, F Not, L Bittner; Microbiome, 2018
- A resource-frugal probabilistic dictionary and applications in bioinformatics, C Marchet, L Lecompte, A Limasset, L Bittner, P Peterlongo; Discrete Applied Mathematics, 2018
- Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data, C Benoit-Pilven, C Marchet, E Chautard, L Lima, M-P Lambert, G Sacomoto, A Rey, C Bourgeois, D Auboeuf, V Lacroix; Scientific Reports, 2018
- Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads, L Lima, B Sinaimeri, G Sacomoto, H Lopez-Maestre, C Marchet, V Miele, M-F Sagot, V Lacroix; Algorithms for Molecular Biology, 2017
- SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence, H Lopez Maestre, L Brinza, C Marchet, J Kielbassa, S Bastien, M Boutigny, D Monnin, A El Filali, CM Carareto, C Vieira, F Picard, N Kremer, F Vavre, M-F Sagot, V Lacroix; Nucleic Acids Research, 2016
- Colibread on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads, E Rivals, A Andrieux, A Z El Aabidine, B Cazaux, C Marchet, C Lemaitre, C Monjeaud, G Sacomoto, L Salmela, O Collin, P Peterlongo, R Uricaru, S Alves-Carvalho, V Lacroix, V Miele, Y LeBras; GigaScience, 2016
Research papers accepted to peer-reviewed conferences
- Cdbgtricks: strategies to update a compacted de Bruijn graph, K Hannoush, C Marchet and P Peterlongo, PSC 2024
- Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets, I Martayan, B Cazaux, A Limasset & C Marchet, ISMB 2024
- Fractional Hitting Sets for Efficient and Lightweight Genomic Data Sketching, T Rouzé, I Martayan, C Marchet & A Limasset, WABI 2023
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; ISMB 2023
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; ISMB 2020
- Indexing De Bruijn graphs with minimizers, C Marchet, M Kerbiriou, A Limasset; RECOMB-SEQ 2019
- CONSENT: Scalable self-correction of long reads with multiple sequence alignment, P Morisse, C Marchet, A Limasset, A Lefebvre, T Lecroq; RECOMB-SEQ 2019
- A resource-frugal probabilistic dictionary and applications in (meta)genomics, C Marchet, A Limasset, L Bittner, P Peterlongo; PSC 2016
- Navigating in a Sea of Repeats in RNA-seq without Drowning, G Sacomoto, B Sinaimeri, C Marchet, V Miele, MF Sagot, V Lacroix; WABI 2014
Preprints
- Advances in practical k-mer sets: essentials for the curious, C Marchet (2024)
- Advances in colored k-mer sets: essentials for the curious, C Marchet (2024)
- Constrained enumeration of k-mers from a collection of references with metadata, F Ingels, I Martayan, M Salson and C Marchet (2024)
- Exploring a large cancer cell line RNA-sequencing dataset with k-mers, C Bessière, H Xue, B Guibert, A Boureux, F Rufflé, J Viot, R Chikhi, M Salson, C Marchet, T Commes and D Gautheret (2024)
Communications
Invited talks
- 2024: Upcoming keynote (title TBA), Genome Informatics, Cambridge (UK)
- 2024: The de Bruijn graph, a computational structure for pangenomes, Iggsy, Ascona (Switzerland)
- 2024: Reference-free pangenomics and other large indexes, EAGS International Environmental and Agronomical Genomics symposium, Toulouse (France)
- 2024: Reference-free transcriptomics and other large indexes, Statistical Methods for Post Genomic Data workshop, Paris (France)
- 2023: Hashing-based data-structures for querying large k-mer (collections of) sets, CiE, Batumi (Georgia)
- 2023: Hashing-based data-structures for querying large k-mer (collections of) sets, Sequences in London (UK)
- 2022 Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, Genome Informatics, Cambridge (UK)
- 2022 How to improve student/advisor relationships, WABI, ALPACA 2nd Annual Workshop, Potsdam (Germany)
- 2022 Data-structures for querying large k-mer (collections of) sets, JOBIM mini-symposium, Rennes (France)
- 2022 Data-structures for querying large k-mer (collections of) sets, KIM Data and Life Sciences seminars, Montpellier (France, online) [slides] [recording (in French)]
- 2022 Data structures for large k-mer sets, MIAT seminars, Toulouse (France, online) [slides]
- 2020 From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing, SIF congress, INSA de Lyon (France)
- 2020 Scalable data structures for sequencing data, Symposium GDR Madics, Lyon (France) (canceled due to COVID-19)
- 2019 New methodologies for the analysis of transcriptome sequences, MAB team seminar, LIRMM Montpellier (France)
- 2018 CARNAC-LR and C2C: de novo clustering and detection of alternative isoforms in Third Generation Sequencing transcriptomes, Genotoul Biostats/Bioinfo day, INRA Toulouse (France)
- 2018 From reads to transcripts: de novo methods to analyze transcriptome 2nd and 3rd generation sequencing data, Roscoff Biological Station seminar (France)
- 2018 A de novo approach to disentangle partner identity and function in holobiont systems, Advanced techniques to study and exploit the sponge and coral microbiomes Workshop, ULB Brussels (Belgium)
- 2018 A highly scalable data structure for read similarity computation and its application to marine holobionts, EEB group meeting, ULB Brussels (Belgium)
- 2018 CARNAC-LR: clustering genes expressed variants from long read RNA sequencing, team TIBS seminar, LITIS Rouen (France)
- 2017 De novo Clustering of Gene Expressed Variants in Transcriptomic Long Reads Data Sets at workshop RNA-Seq and Nanopore sequencing, Genoscope Evry (France)
- 2017 Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, LBBE NGS group seminar, Lyon (France)
Workshops/conferences/seminars
- 2024: Vizitig: interactive sequence de Bruijn graphs using databases, MIGGS Lille (France)
- 2023: KmerStore: interactive manipulation and visualization of graphs from collections of sequences, seqBIM Lille (France)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, BiATA, St Petersburg (Russia, virtual conference)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, ISMB, Montreal (Canada, virtual conference)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, JOBIM, Montpellier (France, virtual conference)
- 2020 REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, DSB, Rennes (France)
- 2019 Indexing De Bruijn graphs with minimizers, RECOMB-seq, Washington DC (US)
- 2019 Indexing De Bruijn graphs with minimizers, BiATA, St Petersburg (Russia)
- 2019 Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, DSB, Dortmund (Germany)
- 2019 Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, Helsinki Bioinformatics Day (Finland)
- 2019 Read correction for non-uniform coverages, RCAM Institut Pasteur Paris (France)
- 2019 Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, at SeqBim, Marne La Vallée (France)
- 2018 BCOOL-Trans: accurate and variant-preserving correction for RNA-seq, Seqbio, Rouen (France)
- 2018 ELECTOR: EvaLuator of Error Correction Tools for lOng Reads, Seqbio, Rouen (France)
- 2018 CARNAC-LR: Clustering coefficient-based Acquisition of RNA Communities in Long Reads, JOBIM, Marseille (France)
- 2017 De novo Clustering of Gene Expression in Transcriptomic Long Reads Data Sets, Seqbio, Lille (France)
- 2017 A highly scalable data structure for read similarity computation and its application to marine holobionts, RCAM, Paris (France)
- 2017 A scaling transcriptomic approach to study holobiont data sets, 4e colloque Génomique Environnementale, Marseille (France)
- 2016 Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, SeqBio, Nantes (France)
- 2015 kissDE: a replicate-wise and annotation-free R package for testing the association between differential variants and experimental conditions in high throughput sequencing data, SeqBio 2015, Orsay (France)
- 2015 kissDE: a package to test for differences in read counts derived from variants in RNA sequencing data, Quatrièmes Rencontres R 2015, Lyon (France)
Other talks
Involvement in scientific events
PC Chair
PC
- 2024 MIGGS (France), ISMB (Canada), RECOMB-Seq (US), ECCB (Finland), SPIRE (Mexico)
- 2023 SeqBIM (France)
- 2020 SPIRE (virtual)
- 2020 RECOMB-Seq (virtual)
- 2019 SeqBIM (France)
Organization committee
- 2024 MIGGS (France)
- 2023 SeqBIM (France)
- 2021 SPIRE (virtual)
- 2018 Volunteer at RECOMB-Seq and RECOMB, Paris
- 2016 “Biological insights from raw high-throughput sequencing data” - Colib’read ANR workshop, Paris
Training in summer/winter schools, graduate schools courses
- 2020, 2022, 2023, 2024: Teacher at Evomics, workshop on Genomics (Czech Republic)
- 2022: Teacher at Bilille training courses on RNA-seq analysis (France)
- 2021, 2024: Teacher at (JC)2BIM, GDR BIMMM’s school on algorithmics and statistics for bioinformatics (France)
Training for researchers and professionals
- 2023, 2024: Teacher at CNRS Formation Entreprises on RNA-seq analysis and Assembly (France)
- 2019, 2021-23: Teacher at Bilille training courses on RNA-seq analysis (France)
- Teaching assistant:
- 2015: CNRS Formation Entreprises “Bioinformatique pour les NGS” (France)
- 2014: BGE & EMBnet tutorials “RNA-seq analysis” (France) and PRABI training courses “Analyse de données RNA-seq sous l’environnement Galaxy” (France)
Supervision and direction
Direction of PhD students
- 2024-.. Bastien Degardins (with Charles Paperman), subject: Integration of sequence de Bruijn graphs in databases and visualization
Supervision
- 2023-.. Igor Martayan (PhD student, with Jean-Stéphane Varré), subject: locality-preserving k-mer data structures
- 2023-.. Florian Ingels (postdoc)
- 2022-.. Thomas Baudeau (PhD student, with Mikaël Salson), subject: Mapping of structural long reads
- 2021-.. Khodor Hannoush (PhD student, with Pierre Peterlongo), subject: Dynamic pangenome graphs
Teaching
Year | Level | Topic |
2023/2024 | M1 Bioinformatics, Université de Lille | Course coordinator, Data Structures |
2021/2022 | M1 Bioinformatics, Université de Lille | Course coordinator, Data Structures |
2019/2020 | L2 Computer Science, Université de Lille | Lab sessions (TP), Algorithms and Data Structures |
2018/2019 | L2 Computer Science, Université de Lille | Lab sessions (TP), Algorithms and Data Structures |
2016/2017 | 4th and 5th year of engineering school (M1/M2), INSA Rennes | Modeling and Engineering for Life Sciences |
2016/2017 | 2nd year of preparatory cycle, INSA Rennes | Lab sessions (TP), Databases |
2016/2017 | M1, ENS Rennes | Biostatistics, programming with R |
2015/2016 | M1, ENS Rennes | Biostatistics, programming with R |
2014/2015 | L1, Université Lyon 1 | Tutorials (TD), Mathematics for Life Sciences |
Press review