How Bioinformatics Tools Took Genetic Research to the Masses
Computational biologists are starting to design user-friendly platforms for analyzing and interpreting genetic sequence data.
When doctors want to treat patients whose symptoms have no apparent cause, gene-sequencing technology may help point them to a diagnosis. However, the huge quantity of data produced can make it difficult to get answers fast.
Until a few years ago, physicians at the US Naval Medical Research Unit 6 (NAMRU-6) in Lima had to send their sequence data to the United States for analysis, a process that could take several weeks, far too long for pressing treatment decisions. “If all you could do was obtain the data that you then had to ship to the US, it’s almost useless,” says Mariana Leguia, head of the center’s Genomics and Pathogen Discovery Unit.
But Leguia no longer has to wait for the analysis; she can obtain results in days or even hours, and she can do it in her own laboratory. Her unit uses EDGE, a bioinformatics tool that hides typical microbial-genomics tasks, such as sequence assembly and species identification, behind a sleek interface that allows users to produce polished analyses. “We can provide actionable on-site information, which allows us to decide how to proceed very quickly,” Leguia says.
EDGE is not the first tool to simplify computing with a point-and-click graphical user interface (GUI). Indeed, it lacks much of the versatility and breadth of more established alternatives such as Galaxy and Illumina’s BaseSpace platform. However, its simplicity attracts users who otherwise might shy away from bioinformatics. “People have used [EDGE] who would never have bothered learning command-line tools,” says Clinton Paden, who uses EDGE for work on viral pathogenesis at the US Centers for Disease Control and Prevention in Atlanta, Georgia. As such, it is a case study in the democratization of genome informatics, one that may allow pure biologists to accelerate the field.
Computers in the field
Patrick Chain, who leads a bioinformatics team at the Los Alamos National Laboratory (LANL) in New Mexico, says that EDGE was developed in an attempt to balance the fast-growing availability of low-cost DNA sequencers with the relative scarcity of the know-how needed to make sense of the data. It is designed for use in facilities with little bioinformatics experience, says Joe Anderson, a computational biologist who has developed software for military applications at the Biological Defense Research Directorate (BDRD) of the Naval Medical Research Center in Frederick, Maryland.
The tool is self-contained and open source, and it offers end-to-end microbial genomics analyses at the click of a button, from raw sequence to species identification and phylogeny. It is also relatively cheap to run, because the recommended hardware configuration (256 gigabytes of memory and 64 processors) can be purchased for less than $10,000, Anderson says.
This means that most laboratories that can afford the hardware to run sequencing projects can also afford a server for the analysis. “That’s not throwaway money, but it’s cheap enough,” he says. It also helps that the set-up can run without an internet connection, powered by a generator.
Users with reliable network connections can instead deploy the tool on a cloud network. Nicholas Loman, a bioinformatician at the University of Birmingham, UK, points to CLIMB, the Cloud Infrastructure for Microbial Bioinformatics, which he helped to set up. CLIMB is a free service dedicated to UK academics working on microbial genomics.
Funded with £8.4 million (US$10.5 million) from the UK Medical Research Council, CLIMB includes many computational tools, among them sequence databases and an analysis workbench known as the Genomics Virtual Laboratory. “I certainly think that EDGE is also a possible option,” says Loman.
EDGE has been formally installed at 18 laboratories of the US Department of Defense and partner nations, on every continent except Antarctica, says Theron Hamilton, head of the Department of Genomics and Bioinformatics at BDRD.
One is at NAMRU-2 in Phnom Penh, which uses the tool for surveillance of vector-borne diseases. “Traditionally, it’s not the kind of place you’d go for bioinformatics,” Anderson says. But EDGE is changing that. “One of the things I have learned is that if you give [researchers] tools and get out of the way, they’ll surprise you,” says Anderson.
The latest release of EDGE, version 1.5, launched last October, contains 54 third-party tools. All of the algorithms, databases, visualizers, and reference genomes are housed on the server in six interlocking modules: sequence clean-up, assembly and annotation, benchmarking, taxonomic identification, evolutionary analysis, and PCR primer design. Other modules, including RNA analysis and pathogen detection, are expected in the upcoming EDGE 2.0, Chain says.
Last November, Chain and his colleagues demonstrated EDGE’s capacity to assemble, identify, and map mutations on a phylogenetic tree in bacterial isolates of Bacillus anthracis and Yersinia pestis; deconvolute a mock human microbiome; and analyze a variety of human clinical samples, including cases of Ebola virus and Escherichia coli infection (ref. 1).
But the first published use of the tool came several months before that report. In a study published last June (ref. 2), Leguia’s laboratory used EDGE to refine methods for whole-genome sequencing of the dengue virus.
Users can explore these and other data sets through a free demo on the LANL server. Researchers who want to analyze their own sequences must install the software on their own systems. The code can be freely downloaded from GitHub, and a Docker container and a virtual machine image are available, but installation will probably require an information-technology expert, Chain says. It is possible to tweak the source code to incorporate additional tools and workflows, but that is beyond the ability of many users, Chain acknowledges. A mechanism to simplify the process is being developed, he says.
Paden, who is a researcher in computer science, says that the tool’s simplicity opens computational biology to researchers who might otherwise be intimidated by the usual bioinformatics tool: the text-based command line.
But Titus Brown, a computational scientist at the University of California, Davis, warns that EDGE’s benefits are tempered by shortcomings that could hinder the software’s long-term use. He describes EDGE as an example of “opinionated software”. “It provides you with a small set of software that is tuned to a specific set of use cases, and it provides nice graphical summaries and results.” However, it is not clear how other researchers can build on the tool, nor what will happen if its funding dries up.
Chain says the team made EDGE open source partly because of concerns about future funding, which are also informing plans for its development. “Sustainability is a problem we have to think about,” says Chain, “which is why we are striving to make it much easier for third-party developers to plug and play their own tools, most likely using Docker.”
A galaxy of tools
EDGE is not the first bioinformatics framework to offer an easy-to-use GUI. Galaxy, first published (ref. 3) in 2005, helps researchers to assemble computational pipelines from a large and versatile web-based toolbox of free software. By combining these tools in various ways, users can tackle almost any problem they can conceive of.
However, Galaxy can be daunting. And, in contrast to EDGE’s graphical outputs, such as phylogenetic trees or interactive ‘Krona’ plots of taxonomic data, the output of Galaxy tends to take the form of processed files that users must visualize elsewhere.
“Galaxy is more like a kitchen, but there is no dining room,” says Jeremy Leipzig, a software developer in the Department of Biomedical and Health Informatics at the Children’s Hospital of Philadelphia in Pennsylvania. “The system doesn’t really have a way to deliver this output in an attractive way,” he says. “With EDGE, they thought about what the reports should look like.”
Nathan Watson-Haigh, a bioinformatician at the University of Adelaide, Australia, says that EDGE could help to relieve the pressure on overworked bioinformaticians. But he warns that it is still a complex bioinformatics tool, and biologists who are unfamiliar with computing would be wise to consult an expert before placing too much confidence in their findings.
Kathleen Fisch, Interim Director of the Center for Computational Biology and Bioinformatics at the University of California, San Diego, adds that users must also understand what the algorithms do and how different parameters influence their performance. “Just because you can run the tools doesn’t mean that you should.”
Still, as bioinformatics software becomes easier to use, computing may lose its aura of complexity. That could lead to broader adoption, and to the democratization of bioinformatics for biologists.