Camille Marchet
I am a CNRS researcher in sequence bioinformatics. My research topics are data structures for indexing sequencing data at scale and algorithms for sequence analysis, in particular de novo methods for long and short RNA-seq reads.
1. Education
- 2015-2018
- PhD in Computer Science, University of Rennes (France). Thesis title: From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing. Jury:
- Eric Coissac (MCF HDR Univ Grenoble Alpes, LECA) - Reviewer
- Hélène Touzet (DR CNRS, CRIStAL) - Reviewer
- Examiners: Thomas Derrien (CR CNRS, IGDR), Dominique Lavenier (DR CNRS, IRISA), Thierry Lecroq (PR Univ Rouen, LITIS), Anne Siegel (DR CNRS, IRISA), Hagen Tilgner (Lecturer, Weill Cornell NY)
- Pierre Peterlongo (CR Inria, IRISA) - Supervisor
- 2013
- MS in Ecology Evolution Biometry, Claude Bernard University of Lyon (France)
- Engineering degree in Bioinformatics and modeling, INSA Lyon (France)
- 2007
- Baccalauréat, Lycée Romain Rolland (Clamecy, France)
2. Professional career
- 2021-..
- Chargée de recherche CNRS in BONSAI team, CRIStAL lab, Lille, France
- 2020-2021
- Career break (parental leave) (September 2020-September 2021)
- 2018-2020
- Postdoc in BONSAI team, CRIStAL lab, Lille, France (supervisors: Rayan Chikhi, Mikaël Salson)
- 2015-2018
- PhD student in GenScale team, IRISA lab, Rennes, France (director: Pierre Peterlongo)
- 2013-2015
- Research engineer in ERABLE team, LBBE lab, Lyon, France (supervisor: Vincent Lacroix)
3. Prizes, awards, fellowships
- 2019: Honorable mention for the Gilles Kahn prize of the Société Informatique de France for PhDs in Computer Science
- 2020: SFBI travel grant
- 2015: MESR PhD funding
4. Teaching, training and dissemination of science
Supervision
PhD students
- 2021-..
- Khodor Hannoush, subject: Dynamic pangenome graphs, at IRISA/Inria Rennes (France)
- Funding: ALPACA ITN.
- Involvement: 75%. Co-supervised with Pierre Peterlongo (Inria Rennes)
Interns
Thomas Baudeau (M2, 2022), Louis-Maël Gueguen (M2, 2022), Agathénaïs Adiguna (L3, 2020), Benjamin Churcheward (M2, 2018), Lolita Lecompte (M2, 2017), Camille Sessegolo (L3, 2014)
Teaching
As a course coordinator
- University of Lille, Data structures (MISO Masters, Bioinformatics), 2021-22
As a part-time lecturer
- At University of Lille
- Algorithmics and data structures (Licence 2, Computer Science), 2019-20, 2018-19
- At INSA Rennes
- Modeling and engineering for the living (Masters, Computer Science), 2016-17
- Databases (Licence 2), 2016-17
- At ENS Rennes
- Biostatistics, R programming (Licence 3, Mathematics/Comp Sci), 2015-16, 2016-17
- At Lyon University Claude Bernard
- Mathematics for biology (Licence 1, Biology), 2014-15
Training for PhD students & postdocs (summer/winter schools, graduate school courses)
- 2021: Teacher at (JC)2BIM, GDR BIM’s school on algorithmics and statistics for bioinformatics (France)
- 2020, 2022: Teacher at Evomics Workshop on Genomics (Czech Republic)
- 2022: Teacher at Bilille training courses on RNA-seq analysis (France)
Training for researchers and professionals
- 2019, 2021, 2022: Teacher at Bilille training courses on RNA-seq analysis (France)
- 2015: Teaching Assistant at CNRS course: “Bioinformatique pour les NGS” (France)
- 2014: Teaching Assistant at BGE & EMBnet tutorials: “RNA-seq analysis” (France)
- 2014: Teaching Assistant at PRABI training courses, “Analyse de données RNA-seq sous l’environnement Galaxy” (France)
Popularization
- 2020: Press article SARS COV2 et covid-19 : on-va jouer sur les mots, a short popularization article about coronavirus assembly with long reads (in French)
- 2020: Press article Rencontre à la frontière entre l’informatique et la biologie about my PhD thesis (in French)
- 2015: Genome Assembly workshop, IRISA Lab open day (France)
- 2014-15: Introduction to academic jobs for undergraduate students (INSA Lyon, France)
Organization of scientific events
As a coordinator
- 2022: TUDASTIC (France)
National workshop, Lille University participation: ~600 euros, GDR IM: ~800 euros, Inria: ~800 euros.
In the organization committee
- 2022: JOBIM Mini symposium “Indexation et requêtage de grandes collections de données de séquençage” (Rennes)
- 2022: Transipedia ANR k-mer days seminar (Marville)
- 2022: GDR IM (France)
- 2021: SPIRE (France)
- 2018: Volunteer at RECOMB (France)
- 2016: Colib’read ANR workshop “Biological insights from raw high-throughput sequencing data” (France)
5. Research administration and management
PhD defense juries
As an examiner
- 2021: Claudio Lorenzi, Design and implementation of bioinformatic tools for RNA sequencing data analysis (directed by William Ritchie & Alban Mancheron), IGH Montpellier (France)
PhD Thesis advisory committee
- 2021-..: Sandra Romain, Représentation, détection et quantification de variants de structure dans les pan-génomes (directed by Claire Lemaitre), IRISA Rennes (France)
6. Editorial/program committees
Reviewing for journals
- 2022: Bioinformatics, BMC Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2021: BMC Bioinformatics, Natural Computing
- 2020: BMC Supplements, Bioinformatics, Nucleic Acids Research, GigaScience, Natural Computing (NACO)
- 2019: BMC Supplements, Bioinformatics, Nucleic Acids Research
- 2018: GigaScience
- 2017: BMC Bioinformatics
Program committee for conferences/workshops
- 2022: ISMB (USA), JOBIM (France), WABI (Germany), ECCB (Spain)
- 2021: SeqBIM (France)
- 2020: SPIRE (Online), RECOMB-seq (Italy/Online), SeqBIM (France)
- 2018: RECOMB-seq and RECOMB (volunteer, France)
External reviewer for conferences
- 2022: RECOMB
- 2020: ISMB/ECCB, RECOMB-Seq, RECOMB, SPIRE
- 2019: RECOMB
7. Research
Themes
- De novo algorithms for RNA sequences
- Mapping for RNA structure data
- Data structures for representing and indexing sets of read sets
- Pangenomics data structures
- Search for cancer signatures using k-mer approaches
Publications
Peer-reviewed Journal articles
- BLight: Efficient exact associative structure for k-mers, C Marchet, M Kerbiriou, A Limasset; Bioinformatics, 2021
- Scalable long read self-correction and assembly polishing with multiple sequence alignment, P Morisse, C Marchet, A Limasset, T Lecroq, A Lefebvre; Scientific Reports 11, 2021
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; Bioinformatics, 2020
- Data structures based on k-mers for querying large collections of sequencing data sets, C Marchet, C Boucher, SJ Puglisi, P Medvedev, M Salson, R Chikhi; Genome Research, 2020
- ELECTOR: Evaluator for long reads correction methods, C Marchet, P Morisse, L Lecompte, A Limasset, A Lefebvre, T Lecroq, P Peterlongo; Nucleic Acids Research Genomics and Bioinformatics, 2019
- Clustering de Novo by Gene of Long Reads from Transcriptomics Data, C Marchet, L Lecompte, C Da Silva, C Cruaud, J-M Aury, J Nicolas, P Peterlongo; Nucleic Acids Research, 2018
- A de novo approach to disentangle partner identity and function in holobiont systems, A Meng+, C Marchet+, E Corre, P Peterlongo, A Alberti, C Da Silva, P Wincker, E Pelletier, I Probert, J Decelle, S Le Crom, F Not, L Bittner; Microbiome, 2018 (+ co-first authors)
- A resource-frugal probabilistic dictionary and applications in bioinformatics, C Marchet, L Lecompte, A Limasset, L Bittner, P Peterlongo; Discrete Applied Mathematics, 2018
- Comparative assessment of long-read error-correction software applied to RNA-sequencing data, L Ishi Soares de Lima, C Marchet, S Caboche, C Da Silva, B Istace, J-M Aury, H Touzet, R Chikhi; Briefings in Bioinformatics, 2019
- SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence, H Lopez Maestre, L Brinza, C Marchet, J Kielbassa, S Bastien, M Boutigny, D Monnin, A El Filali, CM Carareto, C Vieira, F Picard, N Kremer, F Vavre, M-F Sagot, V Lacroix; Nucleic Acids Research, 2016
- Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data, C Benoit-Pilven, C Marchet, E Chautard, L Lima, M-P Lambert, G Sacomoto, A Rey, C Bourgeois, D Auboeuf, V Lacroix; Scientific Reports, 2018
- Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads, L Lima, B Sinaimeri, G Sacomoto, H Lopez-Maestre, C Marchet, V Miele, M-F Sagot and V Lacroix; Algorithms for Molecular Biology, 2017
- Colib’read on Galaxy: a tools suite dedicated to biological information extraction from raw NGS reads, E Rivals, A Andrieux, A Z El Aabidine, B Cazaux, C Marchet, C Lemaitre, C Monjeaud, G Sacomoto, L Salmela, O Collin, P Peterlongo, R Uricaru, S Alves-Carvalho, V Lacroix, V Miele, Y LeBras; GigaScience, 2016
Papers accepted to conferences with proceedings
- Scalable sequence database search using Partitioned Aggregated Bloom Comb-Trees, C Marchet & A Limasset; RECOMB-SEQ 2022.
- REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, C Marchet, Z Iqbal, D Gautheret, M Salson, R Chikhi; ISMB, 2020
- Indexing De Bruijn graphs with minimizers, C Marchet, M Kerbiriou, A Limasset, RECOMB-SEQ, 2019
- CONSENT: Scalable self-correction of long reads with multiple sequence alignment, P Morisse, C Marchet, A Limasset, A Lefebvre, T Lecroq, RECOMB-Seq, 2019
- A resource-frugal probabilistic dictionary and applications in (meta)genomics, C Marchet, A Limasset, L Bittner, P Peterlongo, PSC, 2016
- Navigating in a Sea of Repeats in RNA-seq without Drowning, G Sacomoto, B Sinaimeri, C Marchet, V Miele, MF Sagot, V Lacroix, International Workshop on Algorithms in Bioinformatics (WABI), 2014
PhD manuscript
- From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing.
Software
- github.com/kamimrcht
- kissplice.prabi.fr
- bioconductor.org/packages/release/bioc/html/kissDE.html
Involvement in research projects
- 2023-.. Full-RNA ANR, Indexing large scale 2nd and 3rd generation RNA datasets
- 2022-.. INSSANE ANR, Novel methods for studying RNA structures
- 2021-.. ALPACA ITN, Methods for pangenomics
- 2019-.. Seqdigger ANR, Indexing large scale genomic samples
- 2018-2022 Transipédia ANR, Indexing large scale RNA-seq datasets
- 2018-.. ASTER ANR, Algorithms for 3rd generation RNA sequencing
- 2017 CNRS MASTODONS project, Correction of 3rd generation sequencing data
- 2015-18 Hydrogen ANR, Indexing datasets for environmental genomics
- 2013-16 Colib’read ANR, De novo methods for variant calling in short read sequencing
Communication
Research visits
- 2019: (3 months) Iqbal lab, EBI, (Cambridge, UK)
- 2015: (3 weeks) Laboratório Nacional de Computação Científica (LNCC, Petrópolis, Brazil) and Universidade de São Paulo (USP, Brazil)
Invited talks to international workshops/conferences
- 2022: Scalable sequence database search using approximate membership data structures, Genome Informatics, Cambridge (United Kingdom)
- 2022: TBA, ALPACA 2nd Annual Workshop, Potsdam (Germany)
- 2018: A de novo approach to disentangle partner identity and function in holobiont systems, Advanced techniques to study and exploit the sponge and coral microbiomes Workshop, ULB Brussels (Belgium)
Invited talks to national workshops/conferences
- 2022: Data-structures for querying large k-mer (collections of) sets, JOBIM mini-symposium, Rennes (France)
- 2020: From reads to transcripts: de novo methods for the analysis of transcriptome second and third generation sequencing, SIF congress, INSA de Lyon (France)
- 2020: Scalable data structures for sequencing data, Symposium GDR Madics, Lyon (France) (canceled due to COVID-19)
- 2018: CARNAC-LR and C2C: de novo clustering and detection of alternative isoforms in Third Generation Sequencing transcriptomes, Genotoul Biostats/Bioinfo day, INRA Toulouse (France)
- 2017: De novo Clustering of Gene Expressed Variants in Transcriptomic Long Reads Data Sets at workshop RNA-Seq and Nanopore sequencing, Genoscope Evry (France)
Invited talks to seminars
- 2022: Data-structures for querying large k-mer (collections of) sets, KIM Data and Life Sciences seminars, Montpellier (France, online)
- 2022: Data structures for large k-mer sets, MIAT seminar, INRAE Toulouse (France, online)
- 2019: New methodologies for the analysis of transcriptome sequences, MAB team seminar, LIRMM Montpellier (France)
- 2018: From reads to transcripts: de novo methods to analyze transcriptome 2nd and 3rd generation sequencing data, Roscoff Biological Station seminar (France)
- 2018: A highly scalable data structure for read similarity computation and its application to marine holobionts, EEB group meeting, ULB Brussels (Belgium)
- 2018: CARNAC-LR: clustering genes expressed variants from long read RNA sequencing, team TIBS seminar, LITIS Rouen (France)
- 2017: Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, LBBE NGS group seminar, Lyon (France)
Selected talks in workshops/conferences
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, BiATA, St Petersburg (Russia, virtual conference)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, ISMB, Montreal (Canada, virtual conference)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, JOBIM, Montpellier (France, virtual conference)
- 2020: REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets, DSB, Rennes (France)
- 2019: Indexing De Bruijn graphs with minimizers, RECOMB-seq, Washington DC (US)
- 2019: Indexing De Bruijn graphs with minimizers, BiATA, St Petersburg (Russia)
- 2019: Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, DSB, Dortmund (Germany)
- 2019: Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, Helsinki Bioinformatics Day (Finland)
- 2019: Read correction for non-uniform coverages, RCAM, Institut Pasteur Paris (France)
- 2019: Survey of k-mer set of sets data structures for querying large collections of sequencing datasets, seqBIM Marne la Vallée (France)
- 2018: BCOOL-Trans: accurate and variant-preserving correction for RNA-seq, Seqbio, Rouen (France)
- 2018: ELECTOR: EvaLuator of Error Correction Tools for lOng Reads, Seqbio, Rouen (France)
- 2018: CARNAC-LR : Clustering coefficient-based Acquisition of RNA Communities in Long Reads, JOBIM, Marseille (France)
- 2017: De novo Clustering of Gene Expression in Transcriptomic Long Reads Data Sets, Seqbio, Lille (France)
- 2017: A highly scalable data structure for read similarity computation and its application to marine holobionts, RCAM, Paris (France)
- 2017: A scaling transcriptomic approach to study holobiont data sets, 4e colloque Génomique Environnementale, Marseille (France)
- 2016: Rconnector: a resource-frugal probabilistic dictionary and applications in (meta)genomics and transcriptomics, SeqBio, Nantes (France)
- 2015: kissDE : a replicate-wise and annotation-free R package for testing the association between differential variants and experimental conditions in high throughput sequencing data, SeqBio 2015, Orsay (France)
- 2015: kissDE : a package to test for differences in reads counts derived from variants in RNA sequencing data, Quatriemes rencontres R 2015, Lyon (France)
8. Research contracts
As a partner
- 2022-2026
- ANR project INSSANE
- ~330k euros
- collaboration with LIX Polytechnique, CiTCoM Université de Paris and LCBPT Université de Paris
- integration of novel algorithms, custom chemistry, and new probing protocols for studying large structured RNA
- 2022
- MAMMALOC
- collaboration with INSERM Lille and Bilille bioinformatics platform
- 3 months of funded engineer time
- proteo-genomics approach to detect novel cancer signatures