Camille Marchet - CV and publication list, 2024
I am a CNRS researcher in sequence bioinformatics. My topics are data structures for indexing sequencing data at scale and algorithms for sequence analysis, in particular de novo methods for long and short RNA-seq reads.
1. Education
- 2015-2018: PhD in Computer Science, University of Rennes (France). Thesis title: From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing. Jury:
- Reviewers: Eric Coissac (MCF HDR Univ Grenoble Alpes, LECA), Hélène Touzet (DR CNRS, CRIStAL)
- Examiners: Thomas Derrien (CR CNRS, IGDR), Dominique Lavenier (DR CNRS, IRISA), Thierry Lecroq (PR Univ Rouen, LITIS), Anne Siegel (DR CNRS, IRISA), Hagen Tilgner (Lecturer, Weill Cornell NY)
- Director: Pierre Peterlongo (CR Inria, IRISA)
- qualified in 2018 for CNU section 27
- 2013: MS in Ecology Evolution Biometry, Claude Bernard University of Lyon (France)
- 2013: Engineer degree in Bioinformatics and modeling, INSA Lyon (France)
- 2007: Baccalauréat, Lycée Romain Rolland (Clamecy, France)
2. Professional career
- 2021-present: Chargée de recherche CNRS (junior researcher) in BONSAI team, CRIStAL lab, Lille, France
- 2018-2020: Postdoc in BONSAI team, CRIStAL lab, Lille, France (supervisors: Rayan Chikhi, Mikaël Salson)
- 2015-2018: PhD student in GenScale team, IRISA lab, Rennes, France (director: Pierre Peterlongo)
- 2013-2015: Research engineer in ERABLE team, LBBE lab, Lyon, France (supervisor: Vincent Lacroix)
- Career breaks: parental leaves (September 2020-September 2021) and (October 2022-September 2023)
3. Prizes, awards, fellowships
- 2019: 2nd prize for the Gilles Kahn prize of the Société Informatique de France for PhDs in Computer Science
- 2015: MESR PhD funding (ranked 1/11)
4. Teaching, training and dissemination of science
Supervision
PhD students
- As a director
- 2024-present: Bastien Degardins, Enhancing RNA-seq data analysis through advanced de Bruijn graph structures and relational database integration
- Funding: JCJC ANR
- Involvement: co-supervised with Charles Paperman (Inria Lille)
- As a co-supervisor
- 2021-present: Khodor Hannoush, Dynamic pangenome graphs, at IRISA/Inria Rennes (France)
- Funding: ALPACA ITN
- Involvement: 50%, co-supervised with Pierre Peterlongo (Inria Rennes)
- 2022-present: Thomas Baudeau, Mapping methods for new long reads viral data
- Funding: INSSANE ANR
- Involvement: co-supervised with Mikaël Salson (CRIStAL Lille)
- 2023-present: Igor Martayan, Discrete data structures for improved features on k-mer sets
- Funding: ENS Rennes
- Involvement: co-supervised with Jean-Stéphane Varré (CRIStAL Lille)
Alumni
Pierre Berriet (M2, 2024, now PhD student in our team), Margaux Mouton (M1, 2024), Louis-Maël Gueguen (M2, 2022, now PhD student in Montreal), Timothé Rouzé (M1, 2022, now PhD student in our team), Zakaria Tougui (L3, 2021, now PhD student in Grenoble), William Desaintjean (L3, 2021), Nika Zarubina (L3, 2021), Agathénaïs Adiguna (L3, 2020), Benjamin Churcheward (M2, 2018, defended his PhD in 2022 in Nantes), Lolita Lecompte (M2, 2017, now a research engineer at Institut Curie), Camille Sessegolo (L3, 2014, defended her PhD in 2022 in Lyon, now in the private sector)
Teaching
As a course coordinator
- 2021, 2022, 2024: University of Lille, Data structures (MISO Masters, Bioinformatics)
As a part-time lecturer
- At University of Lille
- 2019, 2020: Algorithmics and data structures (Licence 2, Computer Science)
- At INSA Rennes
- 2016: Modeling and engineering for the living (Masters, Computer Science), 2016-17
- 2016: Databases (Licence 2), 2016-17
- At ENS Rennes
- 2015, 2016: Biostatistics, R programming (Licence 3, Mathematics/Comp Sci)
- At Lyon University Claude Bernard
- 2014: Mathematics for biology (Licence 1, Biology), 2014-15
Training in summer/winter schools, graduate schools courses
- 2020, 2022, 2023, 2024: Teacher at Evomics Workshop on Genomics (Czech Republic)
- 2022: Teacher at Bilille training courses on RNA-seq analysis (France)
- 2021, 2024: Teacher at (JC)2BIM, GDR BIMMM’s school on algorithmics and statistics for bioinformatics (France)
Training for researchers and professionals
- 2023, 2024: Teacher at CNRS Formation Entreprises on RNA-seq analysis and Assembly (France)
- 2019, 2021-23: Teacher at Bilille training courses on RNA-seq analysis (France)
- Teaching assistant:
- 2015: CNRS Formation Entreprises “Bioinformatique pour les NGS” (France),
- 2014: Teaching Assistant at BGE & EMBnet tutorials: “RNA-seq analysis” (France) and PRABI training courses, “Analyse de données RNA-seq sous l’environnement Galaxy” (France)
Popularization
- 2020: Press article SARS COV2 et covid-19 : on-va jouer sur les mots, a short popularization article about coronavirus assembly with long reads (in French)
- 2020: Press article Rencontre à la frontière entre l’informatique et la biologie about my PhD thesis (in French)
- 2015: Genome Assembly workshop, IRISA Lab open day (France)
- 2014-15: Introduction to academic jobs to undergrad students, INSA Lyon (France)
Mentoring
- 2023-24: Mentor of one mentee in the CRIStAL mentoring program
Organization of scientific events
As a coordinator
- 2024: MIGGS - National workshop on pangenome visualization, budget granted by GDR BIMMM (2k euros)
- 2024: K-mer days - Symposium organized for ANR Full-RNA (Dijon, France)
- 2022: TUDASTIC - National workshop on data-structures (Lille, France)
Chairs
- 2025: Recomb-seq (Seoul, Korea)
- 2022: Genome informatics (Hinxton, UK)
In the organization committee
- 2023: SeqBIM (Lille)
- 2022: JOBIM Mini symposium “Indexation et requêtage de grandes collections de données de séquençage” (Rennes)
- 2022: Transipedia ANR k-mer days mini Symposium (Marville)
- 2022: GDR IM (Lille)
- 2021: SPIRE (Lille)
- 2018: Volunteer at RECOMB (France)
- 2016: Colib’read ANR workshop “Biological insights from raw high-throughput sequencing data” (Evry)
Research visits
- 2019: (3 months) Iqbal lab, EBI, (Cambridge, UK)
- 2015: (3 weeks) Laboratório Nacional de Computação Científica (LNCC, Petrópolis, Brazil) and Universidade de São Paulo (USP, Brazil)
5. Research administration and management
PhD defense juries
As an examiner
- 2023: Victor Epain, Assemblage de fragments ADN : structures de graphes et échafaudage de génomes de chloroplastes (directed by Rumen Andonov), IRISA Rennes (France)
- 2021: Claudio Lorenzi, Design and implementation of bioinformatic tools for RNA sequencing data analysis (directed by William RITCHIE & Alban MANCHERON), IGH Montpellier (France)
PhD Thesis advisory committee
- 2024-26: Sasha Darmon, Détection de répétitions dans les graphes de de Bruijn issus de données RNA-seq (supervised by Vincent Lacroix and Arnaud Mary), LBBE (Lyon)
- 2024-26: Nina Marthe, Transfert d’annotation dans des graphes de pangénome (supervised by François Sabot), IRD (Montpellier)
- 2024-26: Siegfried Dubois, Comparaison et visualisation de graphes de pangénomes (supervised by Claire Lemaitre and Matthias Zytnicki), IRISA Rennes (France)
- 2021-23: Sandra Romain, Représentation, détection et quantification de variants de structure dans les pan-génomes (supervised by Claire Lemaitre and Fabrice Legeai), IRISA Rennes (France)
6. Animation, editorial activities
Reviewing for journals & conferences
- Regular reviews for journals: Nature Methods, Nature Communications, NAR, Bioinformatics, GigaScience, …
- External reviewer for conferences: RECOMB (2019, 2020, 2022), ISMB/ECCB (2020), RECOMB-Seq (2020), SPIRE (2020)
Program committee for conferences/workshops
- 2024: ISMB (Canada), Recomb-Seq (US), ECCB (Finland), SPIRE (Mexico)
- 2023: SeqBIM (France)
- 2022: ISMB (USA), JOBIM (France), WABI (Germany), ECCB (Spain)
- 2021: SeqBIM (France)
- 2020: SPIRE (Online), RECOMB-seq (Italy/Online), SeqBIM (France)
- 2018: RECOMB-seq and RECOMB (volunteer, France)
Animation
- 2024-.. Coordinator of the Sequences data-structures journal club (on multiple sites in France, 2 past sessions)
- 2023-.. Member of the parity work group (cellule parité-égalité femmes hommes) at CRIStAL
7. Research
Themes
- De novo algorithms for RNA sequences
- Mapping for RNA structure data
- Data structures for representing and indexing sets of read sets
- Pangenomics data structures
- Search for cancer signatures using k-mer approaches
Publications
Peer-reviewed journal articles (17)
- Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets, I Martayan, B Cazaux, A Limasset & C Marchet, Bioinformatics 2024
- KaMRaT: a C++ toolkit for k-mer count matrix dimension reduction, H Xue, M Gallopin, C Marchet, H N Nguyen, Y Wang, A Lainé, C Bessiere, D Gautheret; Bioinformatics 2024
- A survey of mapping algorithms in the long-reads era, K Sahlin, T Baudeau, B Cazaux & C Marchet; Genome Biology 2023
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; Bioinformatics 2023
- BLight: Efficient exact associative structure for k-mers, C Marchet, M Kerbiriou, A Limasset; Bioinformatics, 2021
- Scalable long read self-correction and assembly polishing with multiple sequence alignment, P Morisse, C Marchet, A Limasset, T Lecroq, A Lefebvre; Scientific Reports, 2021
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; Bioinformatics, 2020
- Data structures based on k-mers for querying large collections of sequencing data sets, C Marchet, C Boucher, SJ Puglisi, P Medvedev, M Salson, R Chikhi; Genome Research, 2020
- ELECTOR: Evaluator for long reads correction methods, C Marchet, P Morisse, L Lecompte, A Limasset, A Lefebvre, T Lecroq, P Peterlongo; Nucleic Acids Research Genomics and Bioinformatics, 2019
- Clustering de Novo by Gene of Long Reads from Transcriptomics Data, C Marchet, L Lecompte, C Da Silva, C Cruaud, J-M Aury, J Nicolas, P Peterlongo; Nucleic Acids Research, 2018
- A de novo approach to disentangle partner identity and function in holobiont systems, A Meng+, C Marchet+, E Corre, P Peterlongo, A Alberti, C Da Silva, P Wincker, E Pelletier, I Probert, J Decelle, S Le Crom, F Not, L Bittner; Microbiome, 2018 (+ co-first authors)
- A resource-frugal probabilistic dictionary and applications in bioinformatics, C Marchet, L Lecompte, A Limasset, L Bittner, P Peterlongo; Discrete Applied Mathematics, 2018
- Comparative assessment of long-read error-correction software applied to RNA-sequencing data, L Ishi Soares de Lima, C Marchet, S Caboche, C Da Silva, B Istace, J-M Aury, H Touzet, R Chikhi; Briefings in Bioinformatics, 2019
- SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence, H Lopez Maestre, L Brinza, C Marchet, J Kielbassa, S Bastien, M Boutigny, D Monnin, A El Filali, CM Carareto, C Vieira, F Picard, N Kremer, F Vavre, M-F Sagot, V Lacroix; Nucleic Acids Research, 2016
- Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data, C Benoit-Pilven, C Marchet, E Chautard, L Lima, M-P Lambert, G Sacomoto, A Rey, C Bourgeois, D Auboeuf, V Lacroix; Scientific Reports, 2018
- Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads, L Lima, B Sinaimeri, G Sacomoto, H Lopez-Maestre, C Marchet, V Miele, M-F Sagot and V Lacroix; Algorithms for Molecular Biology, 2017
- Colibread on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads, E Rivals, A Andrieux, A Z El Aabidine, B Cazaux, C Marchet, C Lemaitre, C Monjeaud, G Sacomoto, L Salmela, O Collin, P Peterlongo, R Uricaru, S Alves-Carvalho, V Lacroix, V Miele, Y LeBras; GigaScience, 2016
Papers accepted to conferences with proceedings (9)
- Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets, I Martayan, B Cazaux, A Limasset & C Marchet, ISMB 2024
- Cdbgtricks: strategies to update a compacted de Bruijn graph, K Hannoush, C Marchet, P Peterlongo, PSC 2024
- Fractional Hitting Sets for Efficient and Lightweight Genomic Data Sketching, T Rouzé, I Martayan, C Marchet & A Limasset, WABI 2023
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; ISMB 2023
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; ISMB, 2020
- Indexing De Bruijn graphs with minimizers, C Marchet, M Kerbiriou, A Limasset, RECOMB-Seq, 2019
- CONSENT: Scalable self-correction of long reads with multiple sequence alignment, P Morisse, C Marchet, A Limasset, A Lefebvre, T Lecroq, RECOMB-Seq, 2019
- A resource-frugal probabilistic dictionary and applications in (meta)genomics, C Marchet, A Limasset, L Bittner, P Peterlongo, PSC, 2016
- Navigating in a Sea of Repeats in RNA-seq without Drowning, G Sacomoto, B Sinaimeri, C Marchet, V Miele, MF Sagot, V Lacroix, International Workshop on Algorithms in Bioinformatics (WABI), 2014
PhD manuscript
- From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing.
Software
REINDEER in Transipédia:
- https://transipedia.fr
- github.com/kamimrcht/REINDEER
Former activities:
- kissplice.prabi.fr
- bioconductor.org/packages/release/bioc/html/kissDE.html
Involvement in research projects
- 2024-.. Find-RNA ANR, Data-structures for collections of sets of (meta-)transcriptomics data (PI)
- 2024-.. ESCALATE INSERM MIC project, Efficient data-structures for hundreds of thousands of human cancer RNA-seq datasets (member)
- 2023-.. Full-RNA ANR, Indexing large scale 2nd and 3rd generation RNA datasets (member)
- 2022-.. INSSANE ANR, Novel methods for studying RNA structures (co-PI)
- 2021-.. ALPACA ITN, Methods for pangenomics (member)
- 2019-.. Seqdigger ANR, Indexing large scale genomic samples (member)
- 2018-2022 Transipédia ANR, Indexing large scale RNA-seq datasets (member)
- 2018-.. ASTER ANR, Algorithms for 3rd generation RNA sequencing (member)
- 2017 CNRS MASTODONS project, Correction of 3rd generation sequencing data (member)
- 2015-18 Hydrogen ANR, Indexing datasets for environmental genomics (member)
- 2013-16 Colib’read ANR, De novo methods for the variant calling in short read sequencing (member)
Communication
Invited talks to international workshops/conferences
- Upcoming: keynote TBA, Genome Informatics, Cambridge (UK)
- 2024: Keynote, The de Bruijn graph, a computational structure for pangenomes, IGGSy, Ascona (Switzerland)
- 2024: Reference-free pangenomics and other large indexes, EAGS International Environmental and Agronomical Genomics symposium, Toulouse (France)
- 2023: Hashing-based data-structures for querying large k-mer (collections of) sets, CiE, Batumi (Georgia)
- 2023: Hashing-based data-structures for querying large k-mer (collections of) sets, Sequences in London (United Kingdom)
- 2022: Scalable sequence database search using approximate membership data structures, Genome Informatics, Cambridge (United Kingdom)
- 2022: How to improve student/advisor relationships, WABI, ALPACA 2nd Annual Workshop, Potsdam (Germany)
- 2018: A de novo approach to disentangle partner identity and function in holobiont systems, Advances techniques to study and exploit the sponge and coral microbiomes Workshop, ULB Brussels (Belgium)
Invited talks to national workshops/conferences
- 2024: Reference-free transcriptomics and other large indexes, Statistical Methods for Post Genomic Data workshop, Paris (France)
- 2022: Data-structures for querying large k-mer (collections of) sets, JOBIM mini-symposium, Rennes (France)
- 2020: From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing, SIF congress, INSA de Lyon (France)
- 2020: Scalable data structures for sequencing data, Symposium GDR Madics, Lyon (France) (canceled due to COVID-19)
- 2018: CARNAC-LR and C2C: de novo clustering and detection of alternative isoforms in Third Generation Sequencing transcriptomes, Genotoul Biostats/Bioinfo day, INRA Toulouse (France)
- 2017: De novo Clustering of Gene Expressed Variants in Transcriptomic Long Reads Data Sets at workshop RNA-Seq and Nanopore sequencing, Genoscope Evry (France)
Invited talks to seminars
- 2023: Sketching in sequence bioinformatics: methods and applications, RT MIA “Réduction de dimension pour l’apprentissage et la visualisation” seminars, Lyon (France)
- 2022: Data-structures for querying large k-mer (collections of) sets, KIM Data and Life Sciences seminars, Montpellier (France, online)
- 2022: Data structures for large k-mer sets, MIAT seminar, INRAE Toulouse (France, online)
- 2019: New methodologies for the analysis of transcriptome sequences, MAB team seminar, LIRMM Montpellier (France)
- 2018: From reads to transcripts: de novo methods to analyze transcriptome 2d and 3d generations sequencing data, Roscoff Biological Station seminar (France)
- 2018: From reads to transcripts: de novo methods to analyze transcriptome 2d and 3d generations sequencing data, INRAE Toulouse (France)
- 2018: A highly scalable data structure for read similarity computation and its application to marine holobionts, EEB group meeting, ULB Brussels (Belgium)
- 2018: CARNAC-LR: clustering genes expressed variants from long read RNA sequencing, team TIBS seminar, LITIS Rouen (France)
- 2017: Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, LBBE NGS group seminar, Lyon (France)
Selected talks in workshops/conferences
- 2024: Vizitig: interactive sequence de Bruijn graphs using databases, MIGGS Lille (France)
- 2023: KmerStore: interactive manipulation and visualization of graphs from collections of sequences, seqBIM Lille (France)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, BiATA, St Petersburg (Russia, virtual conference)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, ISMB, Montreal (Canada, virtual conference)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, JOBIM, Montpellier (France, virtual conference)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, DSB, Rennes (France)
- 2019: Indexing De Bruijn graphs with minimizers, RECOMB-seq, Washington DC (US)
- 2019: Indexing De Bruijn graphs with minimizers, BiATA, St Petersburg (Russia)
- 2019: Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, DSB, Dortmund (Germany)
- 2019: Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, Helsinki Bioinformatics Day (Finland)
- 2019: Read correction for non-uniform coverages, RCAM, Institut Pasteur Paris (France)
- 2019: Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, seqBIM Marne la Vallée (France)
- 2018: BCOOL-Trans: accurate and variant-preserving correction for RNA-seq, Seqbio, Rouen (France)
- 2018: ELECTOR: EvaLuator of Error Correction Tools for lOng Reads, Seqbio, Rouen (France)
- 2018: CARNAC-LR : Clustering coefficient-based Acquisition of RNA Communities in Long Reads, JOBIM, Marseille (France)
- 2017: De novo Clustering of Gene Expression in Transcriptomic Long Reads Data Sets, Seqbio, Lille (France)
- 2017: A highly scalable data structure for read similarity computation and its application to marine holobionts, RCAM, Paris (France)
- 2017: A scaling transcriptomic approach to study holobiont data sets, 4e colloque Génomique Environnementale, Marseille (France)
- 2016: Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, SeqBio, Nantes (France)
- 2015: kissDE : a replicate-wise and annotation-free R package for testing the association between differential variants and experimental conditions in high throughput sequencing data, SeqBio 2015, Orsay (France)
- 2015: kissDE : a package to test for differences in reads counts derived from variants in RNA sequencing data, Quatriemes rencontres R 2015, Lyon (France)
8. Research contracts, grants
As a coordinator
- 2024-2028 ANR project Find-RNA
- ~190k euros
- collaboration with CRIL, University of Lens and SciLife lab, University of Stockholm
- data-structures for collections of sets of RNA-seq and meta-transcriptomics
As a partner
- 2022-2026 ANR project INSSANE
- ~330k euros
- collaboration with LIX Polytechnique, CiTCoM Université de Paris and LCBPT Université de Paris
- integration of novel algorithms, custom chemistry, and new probing protocols for studying large structured RNA