Overview Of Bioinformatics: Methods, Impact, Analysis & Applications
The human genome project and the sequencing projects in other species have provided an unparalleled wealth of biological data. The enormous demand for analysis and interpretation of these data is being met by the emerging field of bioinformatics.
Bioinformatics is characterized as the use of computational and analytical tools for the collection and analysis of biological data. It is an interdisciplinary field that uses informatics, mathematics, physics, and biology. Bioinformatics is important for modern biology and medical data management.
This article outlines the key methods of the bioinformatician and explores how biological data are interpreted to improve our understanding of disease. The potential therapeutic uses of these data in drug discovery and development are also explored.
- Bioinformatics is the use of computing and research methods to collect and interpret biological data.
- Bioinformatics is important for modern biology and medical data management.
- The bioinformatician's toolbox contains software programs such as BLAST and Ensembl, which depend on access to the internet.
- The analysis of genome sequence data, particularly human genome analysis, is one of the most important bioinformatics achievements to date.
- Opportunities for bioinformatics include its potential contribution to the functional understanding of the human genome, leading to improved discovery of drug targets and individualized therapy.
This paper is based on personal experience in bioinformatics and on selected papers from recent issues of Nature Genetics, Nature Reviews Genetics, Nature Medicine and Research. Applicable papers in the peer-reviewed scientific literature were searched for with keywords such as bioinformatics, comparative and functional genomics, proteomics, microarray, disease, and medicine.
The effect of bioinformatics on genomics
Last year it was announced that, as a result of the efforts of the global human genome project and a private genome corporation, the entire human genome had been mapped.
In recent years, however, the scientific community has also seen complete genome sequences produced for several other species. The analysis of the emerging genomic sequence data and of the human genome project is a key accomplishment of bioinformatics.
In 1995 a novel method, random sequencing of the entire genome (the so-called “shotgun” technique), was used to sequence the genome of Haemophilus influenzae.
This was the first complete genome to be sequenced for any free-living organism. Other bacterial genomes were sequenced shortly after, including Mycoplasma genitalium and Mycobacterium tuberculosis, and sequencing of the plague bacterium Yersinia pestis was recently completed.
The sequence and annotation of Saccharomyces cerevisiae (a yeast) gave the first eukaryotic genome, followed by the sequence and annotation of other eukaryotic organisms including Caenorhabditis elegans (a worm), Drosophila melanogaster (a fly), and Arabidopsis thaliana (mustard weed). Sequencing of many other animals, including zebrafish, mouse, rat, and non-human primates, is either underway or about to be completed by private and public sequencing initiatives.
The knowledge gained from these sequence data will have important consequences for our understanding of biology and medicine. Based on comparative genomic and proteomic studies, we should soon be able to locate and fully characterize each human gene.
Useful bioinformatic websites (available freely on the internet)
- National Center for Biotechnology Information (www.ncbi.nlm.nih.gov)—maintains bioinformatic tools and databases
- National Center for Genome Resources (www.ncgr.org/)—links scientists to bioinformatics solutions by collaborations, data, and software development
- Genbank (www.ncbi.nlm.nih.gov/Genbank)—stores and archives DNA sequences from both large scale genome projects and individual laboratories
- Unigene (www.ncbi.nlm.nih.gov/UniGene)—gene sequence collection containing data on the map location of genes in chromosomes
- European Bioinformatics Institute (www.ebi.ac.uk)—center for research and services in bioinformatics; manages databases of biological data
- Ensembl (www.ensembl.org)—automatic annotation database on genomes
- BioInform (www.bioinform.com)—global bioinformatics news service
- SWISS-PROT (www.expasy.org/sprot/)—important protein database with sequence data from all organisms, which has a high level of annotation (includes function, structure, and variations) and is minimally redundant (few duplicate copies)
- International Society for Computational Biology (www.iscb.org/)—aims to advance scientific understanding of living systems through computation; has useful bioinformatic links
Tools for bioinformatics
Computer software and the Internet are the key instruments of a bioinformatician. A fundamental task is the sequence analysis of DNA and proteins using different web-based programs and databases. Anyone with access to the Internet and the relevant websites, from physicians to molecular biologists, can freely discover the composition of biological molecules such as nucleic acids and proteins using simple bioinformatic methods.
This does not mean that everyone can easily manage and interpret raw genomic data. Bioinformatics is an emerging discipline. Experts from the bioinformatics field today use sophisticated software programs to capture, sort, analyze, predict, and store data on DNA and protein sequences.
Large business entities such as pharmaceutical firms are hiring bioinformaticians to meet and sustain these industries’ large and complicated bioinformatic needs. With the growing need for constant contributions from bioinformatics experts, most biomedical laboratories will soon have their own in-house bioinformatics specialists.
Beyond the basic acquisition and analysis of simple data, the individual researcher will almost certainly require external bioinformatics advice for any complicated study.
Bioinformatics has grown globally, with computer networks built to allow easy access to biological data and software programs developed for easy analysis. Numerous international programs that supply gene and protein databases are openly accessible to the entire scientific community via the internet.
Study of bioinformatics
The growing amount of genome data has demanded computer databases that allow quick assimilation and usable formats, together with algorithmic software that can handle biological data effectively.
Because the emerging data are so diverse, there is no single comprehensive database that gives access to all this information. There is, however, an increasing array of databases providing valuable knowledge for clinicians and researchers.
Most of these databases are open to academics, but some sites require a subscription, and industrial users pay a license fee for specialized sites. Examples range from sites that provide detailed descriptions of clinical conditions and list disease-associated genetic mutations and polymorphisms to sites that support a search for genes starting from a DNA sequence (box).
These databases comprise both “public” genetic data repositories and repositories established by private companies. The simplest way to find databases is to search for bioinformatics resources and databases in one of the popular search engines. Another approach to identifying bioinformatic sources is to use the database links and searchable indexes provided by one of the major public databases.
For example, the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov) offers the Entrez browser, an integrated database retrieval system that combines DNA and protein sequence databases. The European Bioinformatics Institute archives gene and protein records from genome studies of all species, while Ensembl produces and maintains automated annotations of eukaryotic genomes. The quality and reliability of databases vary; some of the better-known and established databases, such as those above, are superior to others.
BLAST (basic local alignment search tool, www.ncbi.nlm.nih.gov/BLAST/) is one of the simplest and best-known search tools. This algorithm searches for genes with a similar nucleotide structure by comparing an unknown DNA or amino acid sequence with hundreds or thousands of sequences from humans or other organisms until a match is found.
Databases of known sequences are therefore used to detect related sequences, which may be homologs of the query sequence. Homology means that sequences may be related by divergence from a common ancestor or share common functional aspects. When a database is searched with a newly determined sequence (the query sequence), local alignment takes place between the query sequence and any related sequence in the database.
The search results are ranked in order of priority on the basis of maximal similarity. The highest-scoring sequence in the database of known genes is taken as the homolog. If homologs or related molecules exist for a query sequence, a newly identified protein can be modeled, and the gene product can be predicted without further laboratory experiments.
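The idea of local alignment followed by ranking can be illustrated with a minimal Python sketch. It uses the Smith-Waterman dynamic programming method, which is not the faster heuristic that BLAST itself applies, and the scoring values, function names, and toy sequences are all assumptions made only for illustration.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    The match/mismatch/gap values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always zero
        for j, s in enumerate(subject, start=1):
            diagonal = prev[j - 1] + (match if q == s else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def rank_hits(query, database):
    """Score the query against every entry and sort with the highest score first."""
    scored = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; real searches run against curated sequence collections.
toy_database = {
    "seq_a": "TTGACCGTAAGC",
    "seq_b": "GGGGGGGGGGGG",
    "seq_c": "ACCGTTAGC",
}
hits = rank_hits("ACCGTAAG", toy_database)
for name, score in hits:
    print(name, score)
```

In this toy case the entry that contains the query exactly is ranked first. A real search tool additionally reports how likely each score is to arise by chance, which this sketch does not attempt.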
Since the first draft of the human genome was completed, the focus has moved from genes to gene products. Functional genomics assigns functional significance to genomic information. It consists of studying genes, their resulting proteins, and the function of those proteins.
The processing and interpretation of biological data draw on information not only at the genome level but also at the proteome and transcriptome levels. Proteomics is the study of the total set of proteins in a cell (the proteome), and transcriptomics refers to the analysis of the RNA transcripts produced by a cell (the transcriptome). DNA microarray technology determines the level of gene expression and is also used in genotyping and gene sequencing.
Gene expression arrays allow the messenger RNA levels of thousands of genes to be analyzed simultaneously in benign and malignant tumors such as keloid and melanoma. Expression profiles categorize tumors and suggest possible therapeutic targets.
Bioinformatic protein research builds on annotated protein databases and on two-dimensional electrophoresis. The next challenge in bioinformatics after the isolation, identification, and characterization of a protein is the prediction of its structure. Structural biologists often employ bioinformatic technology to handle the large, complex data behind three-dimensional molecular models from x-ray crystallography, nuclear magnetic resonance, and electron microscopy studies.
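One common way to compare expression levels between two samples is the log2 fold change, sketched below in Python. The article does not prescribe this measure; the gene names, intensity values, pseudocount, and threshold are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Return log2(tumor / normal) for each gene measured in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in tumor
        if gene in normal
    }


def differentially_expressed(fold_changes, threshold=1.0):
    """Return genes whose expression changed at least 2**threshold-fold in either direction."""
    return sorted(gene for gene, lfc in fold_changes.items() if abs(lfc) >= threshold)


# Hypothetical array intensities, not real measurements.
tumor_sample = {"gene_1": 799.0, "gene_2": 99.0, "gene_3": 24.0}
normal_sample = {"gene_1": 99.0, "gene_2": 99.0, "gene_3": 199.0}

fold_changes = log2_fold_changes(tumor_sample, normal_sample)
changed_genes = differentially_expressed(fold_changes)
print(fold_changes)
print(changed_genes)
```

A positive value marks a gene that is more strongly expressed in the tumor sample and a negative value one that is less strongly expressed. Real array analyses also need normalization and statistical testing across replicates, which are omitted here.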
Other bioinformatics applications
In addition to the analysis of genome sequence data, bioinformatics is being used for an extensive array of other major tasks. These include the analysis of gene variation and gene expression, the prediction of gene and protein structure and function, the prediction and detection of gene regulation networks, simulation environments for whole-cell modeling, and the modeling of complex gene regulatory dynamics and networks.
On a smaller scale, the simpler bioinformatics tasks of the clinical researcher can range from designing primers (short oligonucleotide sequences required for DNA amplification in polymerase chain reaction experiments) to predicting the function of gene products.
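As a small illustration of the primer design task, the Python sketch below computes three properties that are commonly checked for a candidate primer: GC content, an approximate melting temperature from the Wallace rule of thumb (2 °C for each A or T and 4 °C for each G or C, intended for short oligonucleotides), and the reverse complement. The primer sequence and function names are hypothetical, and dedicated primer design software applies far more detailed criteria.

```python
def gc_content(primer):
    """Return the fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees Celsius (Wallace rule of thumb)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written with A, C, G, T."""
    complement = str.maketrans("ACGT", "TGCA")
    return sequence.upper().translate(complement)[::-1]


# Hypothetical 20-base primer used only for illustration.
primer = "AGCGTTCAGGATCCATGCAA"
print(gc_content(primer))
print(wallace_tm(primer))
print(reverse_complement(primer))
```

The reverse complement is included because a reverse primer is written against the opposite strand of the target region.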
Bioinformatics clinical application
The clinical applications of bioinformatics can be viewed in the immediate, short, and long term. The human genome project plans to finish the human sequence by 2003 and to create a database of the sequence variations that distinguish us all. For example, a full list of human gene products may provide new treatments, and gene therapy may become routine for single-gene disorders (www.ornl.gov/hgmis/medicine/tnty.html).