What is a genome?
Your genome is the instructions for making and maintaining you. It is written in a chemical code called DNA. All living things have a genome: plants, bacteria, viruses and animals.
Your genome is all 3.2 billion letters of your DNA. It contains around 20,000 genes. Genes are the instructions for making the proteins our bodies are built of – from the keratin in hair and fingernails to the antibody proteins that fight infection.
Genes make up about 1-5% of your genome. The rest of the DNA, between the genes, used to be called ‘junk’ DNA. It wasn’t thought to be important. But we now know that DNA between genes is important for regulating the genes and the genome. For example, it can switch genes on and off at the right time. There is still much more to learn about what it all does.
What is a genome and how does genome sequencing work? Find out in this short film, courtesy of Great Ormond Street Hospital.
What is DNA?
DNA (deoxyribonucleic acid) is a long molecule. It has a twisted, double helix shape. DNA is made up of four different chemicals, or bases. These are represented by the letters A, T, C and G. The bases are attached to two sugar-phosphate backbones. The bases are paired together: A with T, G with C. The two backbones twist around each other to give the characteristic double helix.
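The pairing rule can be written down in a few lines of Python. This is only an illustrative sketch: the function name `complement` and the example sequence are made up for this page.

```python
# Pairing rules described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the letter that pairs with each letter of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand)


example = "ATGCCG"             # made-up sequence, not real genomic data
partner = complement(example)  # the letters sitting opposite on the other strand
print(example)
print(partner)
```

Because each letter has exactly one partner, knowing one strand is enough to work out the other.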
As well as being helix-shaped, DNA is tightly packed so it takes up less space. If you stretched the DNA in one cell all the way out, it would be about 2 metres long.
What is genome sequencing?
Sequencing is a technique that is used to ‘read’ DNA. It finds the order of the letters of DNA (A, T, C and G), one by one.
Sequencing a human genome means finding the sequence of someone’s unique 3 billion letters of DNA. There are different machines and methods that can sequence genomes. In the 100,000 Genomes Project, DNA is sequenced by our partners at Illumina. It takes about a day to sequence the genome, although the analysis of all that data takes a lot longer. Sequencing machines don’t sequence DNA all in one go. Instead, they sequence it in short pieces, around 150 letters long, with each sequence being known as a ‘read’.
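To get a feel for what reads look like, the sketch below cuts a made-up sequence into overlapping 150-letter pieces. It is a simplification: the function `make_reads` and its `step` parameter are invented for illustration, and a real machine reads fragments that start at effectively random positions rather than at regular steps.

```python
def make_reads(sequence: str, read_length: int = 150, step: int = 50) -> list[str]:
    """Cut `sequence` into overlapping pieces, each `read_length` letters long."""
    last_start = len(sequence) - read_length
    return [sequence[i:i + read_length] for i in range(0, last_start + 1, step)]


toy_sequence = "ACGT" * 100  # 400 made-up letters standing in for a genome
reads = make_reads(toy_sequence)
print(len(reads), "reads of", len(reads[0]), "letters each")

# Fewest 150-letter reads needed to cover 3.2 billion letters exactly once
# (rounded up). In practice each position is read many times over.
min_reads = -(-3_200_000_000 // 150)
print(min_reads)
```

Even reading every letter only once would take over 21 million reads, which is part of why the analysis takes much longer than the sequencing itself.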
Mapping
The reads from the sequencing machine are matched to a ‘reference genome sequence’. This is done by ‘mapping’ software on high performance computers. The software finds where each read belongs on the genome.
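The idea of mapping can be shown with a toy example that looks for an exact match of each read in a short, made-up reference. This is not how production mapping software works: real tools build an index of the reference and tolerate small differences. The function `map_read` is hypothetical.

```python
def map_read(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Keep searching from the next letter in case the read fits more than once.
        start = reference.find(read, start + 1)
    return positions


reference = "GATTACAGATTTCCGA"     # made-up reference sequence
reads = ["TTACAG", "TTCC", "GGGG"]  # made-up reads
placements = {read: map_read(reference, read) for read in reads}
print(placements)
```

A read that matches nowhere, like the last one here, gets an empty list. A read that matches in several places is harder to place with confidence.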
The reference sequence is used by scientists world-wide. It is a representative example of a human genome sequence. It is made up of DNA sequences from 13 anonymous donors, so is not any single person. The reference sequence was the result of the original Human Genome Project, which published a draft sequence in 2001 and was declared complete in 2003.
The position of most of our genes is known, and is shown on the reference sequence. The next step is to identify the differences between your genome and the reference.
Analysis
Every person has millions of differences to the reference sequence. The differences are called ‘variants’. A variant might be a single changed letter. Or a string of letters may be in a different place or missing. Most of the differences are completely harmless. They are the reason we are different from each other.
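The simplest kind of variant, a single changed letter, can be found by comparing two sequences position by position. The sketch below assumes the two sequences are already lined up and the same length. Missing or moved strings of letters need more sophisticated methods. The function name and both sequences are made up.

```python
def single_letter_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference letter, sample letter) wherever the two differ.

    Positions are counted from 1. Both sequences must be the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (position, ref, alt)
        for position, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


reference = "GATTACAGATTTCCGA"  # made-up reference
sample = "GATTACGGATTTCCTA"     # made-up sample with two changed letters
variants = single_letter_variants(reference, sample)
print(variants)
```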
Some differences could be causing a disease. Scientists use a range of software to filter millions of differences down to just a few that could be harmful.
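As a rough illustration of filtering, the sketch below keeps only variants that are rare and predicted to change a protein. The fields, the 1% cut-off and the effect labels are all assumptions made for this example. They are not the criteria used in the 100,000 Genomes Project.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    position: int
    population_frequency: float  # hypothetical: share of people carrying it
    predicted_effect: str        # hypothetical labels: "none" or "changes_protein"


def shortlist(variants: list[Variant], max_frequency: float = 0.01) -> list[Variant]:
    """Keep variants that are rare and predicted to change a protein."""
    return [
        v for v in variants
        if v.population_frequency <= max_frequency
        and v.predicted_effect == "changes_protein"
    ]


candidates = [
    Variant(position=7, population_frequency=0.40, predicted_effect="none"),
    Variant(position=15, population_frequency=0.30, predicted_effect="changes_protein"),
    Variant(position=42, population_frequency=0.0001, predicted_effect="changes_protein"),
]
for variant in shortlist(candidates):
    print(variant)
```

Common variants are filtered out because a change carried by many healthy people is unlikely to explain a rare disease.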
Any change that is likely to be the cause of someone’s symptoms or disease is given back to the NHS. They then confirm the result in their laboratories. The findings and any implications are then discussed with the patient.
If it is not clear that a change is causing disease, it is sent to researchers for further analysis.
Bioinformatics
Bioinformatics is the science of collecting and analysing complex biological data, such as genomic data.
Bioinformaticians are scientists who specialise in analysing genomic or other biological data. They develop methods and software tools to understand and interpret genomic data. They may have studied biology, engineering, computing or maths, and have training in bioinformatics.
Why sequence a genome?
Learning more about genomes can help us to identify the cause of genetic diseases.
Some rare diseases are caused by as little as a single change (variant), like a spelling mistake, in someone’s DNA. Looking at the genome of a person affected by a rare disease can help find which DNA changes might be causing the problem.
In cancer, the tumour cells have developed a different genome to the healthy cells. Comparing the normal and cancer genomes may give clues about ways to treat the cancer.
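One simple way to picture this comparison is as a set difference: changes seen in the tumour but not in the healthy cells. The variants below are invented, and real tumour analysis has to deal with mixed samples and sequencing errors, so treat this as a sketch only.

```python
# Each variant is written as (position, reference letter, changed letter).
# All values are made up for illustration.
normal_variants = {(7, "A", "G"), (15, "G", "T")}
tumour_variants = {(7, "A", "G"), (15, "G", "T"), (42, "C", "T")}

# Changes found only in the tumour arose in the cancer cells.
tumour_only = tumour_variants - normal_variants
print(sorted(tumour_only))
```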
For some patients, knowing more about their genome may mean that a particular treatment can be recommended.
When the genome sequences of patients with the same condition are compared, it is possible to see patterns. These patterns can be put together with health information. Once this is done we may be able to link particular patterns with whether people are likely to become ill and, if so, how severe their illness is likely to be.
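The idea of looking for patterns can be sketched by counting how often each variant appears among patients with severe and milder illness. The patient records, variant names and the `carrier_counts` function are all hypothetical. Real studies need many more patients and proper statistical tests before drawing any conclusion.

```python
from collections import Counter

# Made-up records: which variants each patient carries, and how ill they were.
patients = [
    {"variants": {"v1", "v2"}, "severe": True},
    {"variants": {"v1"}, "severe": True},
    {"variants": {"v2", "v3"}, "severe": False},
    {"variants": {"v3"}, "severe": False},
]


def carrier_counts(records: list[dict], severe: bool) -> Counter:
    """Count how many patients in one severity group carry each variant."""
    counts = Counter()
    for record in records:
        if record["severe"] == severe:
            counts.update(record["variants"])
    return counts


severe_counts = carrier_counts(patients, severe=True)
mild_counts = carrier_counts(patients, severe=False)
print("severe:", dict(severe_counts))
print("milder:", dict(mild_counts))
```

In this toy data, `v1` appears only in the severe group, which is the kind of pattern researchers would then test on a much larger group of patients.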