How can we compute a partitioned topology on LONI?
Objectives
Stage One: Data management (10 pts)
Initialize a new Git repo for your final project.
Populate it with the directories we normally put in these repositories. Add the hybrid assembly file and the Illumina reads (a concatenated fastq is fine). Do not commit genomic files to Git unless told to.
Each of the stages below asks for specific files to be committed. Do not make a single commit of the whole group. Commit each file as noted.
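A minimal sketch of this setup follows. All names in it are assumptions for illustration: the final_project repository name, the data, scripts, and output directories, and the ignore patterns are hypothetical, not the required course layout.

```shell
# Create the repository; "final_project" is a hypothetical name.
git init final_project
cd final_project

# Hypothetical directory layout; use the directories from class instead.
mkdir -p data scripts output

# Keep genomic files out of version control via .gitignore.
printf '%s\n' '*.fasta' '*.fastq' '*.sam' '*.bam' '*.bai' > .gitignore

# Commit one file at a time rather than everything at once.
git add .gitignore
git commit -m "Ignore genomic data files"

# Later stages follow the same pattern, for example:
#   git add scripts/minimap.pbs && git commit -m "Add minimap job script"
git status --short
```

Git does not track empty directories, so the new directories only appear in a commit once they contain a tracked file. If you are told to commit a genomic file, git add -f overrides the ignore patterns for that one file.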
Stage Two: Mapping reads to the assembly (10 pts)
Install minimap2 (called minimap below) in your work directory. It is the equivalent of bwa for long-read assemblies.
Create a script to run minimap. It must:
Run on a workq node
Have a walltime limit of six hours
Produce an output file called minimap.out
Run minimap. The command is as follows:
/work/UserName/minimap2-2.10_x64-linux/minimap2 -ax sr -t 20 Path/To/Hybrid_assembly.fasta Path/to/IlluminaReads > aln.sam
This will take about 05:47:10 of walltime, which is close to the six-hour limit.
Commit the minimap script and the minimap.out file.
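The script requirements above could be met along these lines. This is a sketch that assumes a PBS/Torque scheduler; the file name minimap.pbs, the job name, and the nodes=1:ppn=20 request (chosen to match -t 20) are assumptions, and your cluster may also require an allocation directive. The placeholder paths are left exactly as given in the assignment.

```shell
# Write the job script; the directives assume a PBS/Torque scheduler.
cat > minimap.pbs <<'EOF'
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=06:00:00
#PBS -o minimap.out
#PBS -j oe
#PBS -N minimap

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Command from the assignment; replace the placeholder paths.
/work/UserName/minimap2-2.10_x64-linux/minimap2 -ax sr -t 20 \
    Path/To/Hybrid_assembly.fasta Path/to/IlluminaReads > aln.sam
EOF

# Check the script for shell syntax errors before submitting.
bash -n minimap.pbs
# Submit on the cluster with: qsub minimap.pbs
```

Because the alignments are redirected to aln.sam, minimap.out ends up holding only the log messages (standard error is merged in by -j oe).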
Stage Three: Sam to Bam Conversion
Convert the SAM file to BAM using samtools view. Sort with sambamba sort, and index with samtools index.
These steps should be performed on the workq.
Commit your script and the sam.out file.
walltime=00:59:09
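One possible shape for this stage, again assuming a PBS/Torque scheduler. The script name sam_to_bam.pbs, the two-hour walltime request, the thread counts, and the intermediate file names aln.bam and aln.sorted.bam are assumptions; only aln.sam and sam.out come from the assignment.

```shell
# Write the conversion job script; the directives assume PBS/Torque.
cat > sam_to_bam.pbs <<'EOF'
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=02:00:00
#PBS -o sam.out
#PBS -j oe

cd "$PBS_O_WORKDIR"

# Convert SAM to BAM (-b writes BAM, -@ sets additional threads).
samtools view -b -@ 20 -o aln.bam aln.sam

# Sort by coordinate with sambamba (-t threads, -o output file).
sambamba sort -t 20 -o aln.sorted.bam aln.bam

# Index the sorted BAM; this writes aln.sorted.bam.bai.
samtools index aln.sorted.bam
EOF

# Check the script for shell syntax errors before submitting.
bash -n sam_to_bam.pbs
# Submit on the cluster with: qsub sam_to_bam.pbs
```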
Stage Four: Improving the assembly
Create a pilon script to run on the indexed bam file, using the hybrid assembly as the reference.
Decide whether to run pilon with the default settings, or whether there are specific improvements you’d like to try. (Note: there is no right answer.)
Briefly justify the choice you made in the previous step.
Run pilon on a bigmem node. Request 24 cores.
Add and commit the run script and the .out file.
Walltime: 02:40:38
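A sketch of a pilon job with default settings, assuming a PBS/Torque scheduler with a queue named bigmem. The script name pilon.pbs, the log name pilon.out, the four-hour walltime request, the Java heap size, the jar path, the sorted BAM name aln.sorted.bam, and the output prefix pilon_improved are all assumptions to adapt.

```shell
# Write the pilon job script; the directives assume PBS/Torque.
cat > pilon.pbs <<'EOF'
#!/bin/bash
#PBS -q bigmem
#PBS -l nodes=1:ppn=24
#PBS -l walltime=04:00:00
#PBS -o pilon.out
#PBS -j oe

cd "$PBS_O_WORKDIR"

# Default settings. Adding an option such as --fix snps,indels would
# restrict which classes of problems pilon attempts to correct.
# The heap size (-Xmx) and jar path are placeholders.
java -Xmx100G -jar Path/To/pilon.jar \
    --genome Path/To/Hybrid_assembly.fasta \
    --frags aln.sorted.bam \
    --output pilon_improved \
    --changes \
    --threads 24
EOF

# Check the script for shell syntax errors before submitting.
bash -n pilon.pbs
# Submit on the cluster with: qsub pilon.pbs
```

The --changes flag asks pilon to list the edits it made, which can help when justifying the settings you chose.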
Stage Five: Quantifying assembly metrics
Create a BUSCO script for each assembly, the raw Hybrid_assembly and the one improved by Pilon.
Run busco. Workq will be fine.
Have a look at the metrics - were you able to recover any more complete genes? Did improving the assembly with Pilon make a difference?
Add and commit the BUSCO scripts, and the summary gene table of each.
wall_hours=2.90 per run
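A sketch of the BUSCO script for the raw assembly, assuming a PBS/Torque scheduler and a BUSCO release that provides the busco command (older releases use run_BUSCO.py with the same options). The script name, log name, walltime request, CPU count, output name busco_raw, and the lineage dataset path are assumptions. Pick the lineage appropriate to your organism.

```shell
# Write the BUSCO job script for the raw assembly (PBS/Torque assumed).
cat > busco_raw.pbs <<'EOF'
#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=20
#PBS -l walltime=04:00:00
#PBS -o busco_raw.out
#PBS -j oe

cd "$PBS_O_WORKDIR"

# -i input assembly, -l lineage dataset, -o run name,
# -m genome selects genome mode, -c sets the number of CPUs.
busco -i Path/To/Hybrid_assembly.fasta \
      -l Path/To/lineage_dataset \
      -o busco_raw \
      -m genome \
      -c 20
EOF

# Check the script for shell syntax errors before submitting.
bash -n busco_raw.pbs
# Submit on the cluster with: qsub busco_raw.pbs
```

The script for the Pilon-improved assembly would differ only in the -i path, the -o name, and the log file name, so the two runs can be compared directly.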
Have it done by May 11th. Add and commit it, and email me a link.