This project aims to begin work on the nf-core/proteinannotator pipeline.
Vision
Build the best protein annotator in the world.
Protein FASTA -> ??? -> Profit!
- the ??? is nf-core/proteinannotator. We want to build the pipeline of choice for people who sequence the genomes of new organisms and need to annotate protein FASTA files with function.
- Future options include using gene synteny, but that is beyond the scope of the 1.0.0 release.
BEFORE WRITING ANY CODE, we will first draw out the metromap for the pipeline.
Similar pipelines
Below are pipelines that also process protein fasta files and add either functional or structural information to them, but don’t have exactly the same purpose as proteinannotator. We will likely use their modules.
- funcscan to search (meta)genomic nucleotide data for functional protein sequences, e.g. for biosynthetic gene clusters, antimicrobial peptide genes, and antimicrobial resistance genes
- reportho to compare ortholog predictions across methods
- proteinfamilies to cluster protein sequences into families and update existing families with new sequences
- proteinfold to fold protein sequences with ESMFold or AlphaFold2
Annotation Tools to Include
Please contribute more tools! This is just a starting point.
- DIAMOND-blastp
- InterProScan
- UniProt’s UniFIRE (Instructions)
- Foldseek: requires predicted protein structures as input, e.g. from ESMFold or AlphaFold2
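To make the first tool on this list concrete, here is a minimal Python sketch of what the input and one annotation step could look like. It is an illustration only, not the planned implementation (nf-core pipelines are written in Nextflow). The helper names `parse_fasta` and `diamond_blastp_command` and the file names `proteins.fasta`, `reference.dmnd`, and `hits.tsv` are hypothetical. The sketch builds a DIAMOND blastp command but does not run it.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


def diamond_blastp_command(query: str, db: str, out: str) -> List[str]:
    """Build (but do not execute) a DIAMOND blastp call with tabular output."""
    return [
        "diamond", "blastp",
        "--query", query,   # protein FASTA to annotate
        "--db", db,         # DIAMOND database (hypothetical file name below)
        "--out", out,       # where the hits would be written
        "--outfmt", "6",    # BLAST-style tabular format
    ]


# Hypothetical two-protein input, given as lines of a FASTA file.
example = [">prot1 hypothetical protein", "MKT", "AYI", ">prot2", "MSLQ"]
proteins = parse_fasta(example)
command = diamond_blastp_command("proteins.fasta", "reference.dmnd", "hits.tsv")
```

In the pipeline itself, a step like this would more likely be covered by an existing nf-core module than by custom code.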
We welcome contributors of all experience levels.