Linux For Biologists

Working in a Linux environment and handling sequencing data

Synopsis

As the volume of bioinformatic data rapidly increases, skills in handling and manipulating these data are becoming essential for biologists. Most bioinformatics software and pipelines run exclusively in a Linux (or Unix/macOS) environment, and being able to work comfortably in this ecosystem speeds up data analysis and reduces errors.

Objectives

This course is designed to give you the knowledge to work in the Linux environment and to understand and run command-line software, including the powerful tools it offers to assist your research. Being able to create and manipulate data this way will give you the skills required to take on much more complex bioinformatics and data analysis!

All examples use real biological data to perform common bioinformatic processes, e.g. subsampling nucleotide sequence files, searching a genome for a sequence motif, or calculating gene lengths.
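
As a rough illustration of the kind of task listed above, here is a minimal sketch of a motif search and a sequence-length calculation on the command line. It is not taken from the course material: the file name example.fa, the two sequences, and the choice of the GAATTC motif are all made up for this example, and it assumes each sequence sits on a single line.

```shell
# Work in a temporary directory so nothing in your own folders is overwritten
workdir=$(mktemp -d)

# Create a tiny, made-up FASTA file (hypothetical sequences, not real data)
cat > "$workdir/example.fa" <<'EOF'
>seq1
ATGCGAATTCGT
>seq2
GGGAATTCAATGAATTC
EOF

# Drop the header lines (those containing '>'), then count the
# sequence lines that contain the motif GAATTC
motif_lines=$(grep -v '>' "$workdir/example.fa" | grep -c 'GAATTC')

# For each record, remember the name from the header line and
# print it alongside the length of the sequence line that follows
lengths=$(awk '/^>/ {name = substr($0, 2); next} {print name, length($0)}' "$workdir/example.fa")

echo "Sequence lines containing the motif: $motif_lines"
echo "$lengths"
```

Real sequence files are often wrapped over several lines per record, so a sketch like this would need adapting before use on actual data.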

Instructor: Dr. Daniel Pass
Class size: 10-12 people
All days run 9:00-17:00 with a 1-hour break

Schedule

Day 1

  • What is Linux anyway?!
  • Connecting to remote servers
  • Navigating the command line and creating, moving, and copying files
    Lunch
  • Working with text files and running scripts
  • Searching and manipulating files: grep, sed, awk, and regular expressions
  • Review and practice exercises

Day 2

  • Exploring and manipulating DNA/RNA sequence files – FastQC and fastp
  • Using loops for high throughput analysis
    Lunch
  • Loops and sequence data! Review and practice exercises
  • Installing packages and using Docker
  • Final Q&A, practice, review, and personal data queries