Supramap is a completely new method of generating and sharing knowledge about evolution and biogeography.
A supramap gives people a quick and easy way to integrate genotypic and phenotypic data in a geospatial context. When viewed in a virtual globe (e.g. Google Earth or NASA WorldWind), the user has an interactive map of the spread of various lineages of organisms (e.g. strains of pathogens) over the earth.
Biologists have powerful new tools for collecting large comparative genomic, phenotypic, and geospatial datasets. However, our ability to make sense of these data lags far behind our ability to acquire them. New platforms for analysis and visualization of large integrated datasets, such as Supramap, are necessary to generate and test hypotheses.
The field of phylogenetics, the analysis of the relationships and evolutionary trajectory of organisms, is fundamental to making sense of phenotypic and genomic data. Because phylogenetics is able to detect not only changes that discriminate among organismal lineages but also changes that have occurred many times in parallel, phylogenetic trees provide significant predictive power. Models of the direction, frequency, order, and reversibility of genotypic and phenotypic changes inferred from phylogenetic analyses are increasingly critical tools for numerous disciplines in biomedicine and public health, and thus are important to national and economic security.
Geographic mapping of evolutionary trees projected into a virtual globe allows users to analyze the spread of organismal lineages into areas of interest. When all these data are integrated, we can visualize patterns in order to develop and test hypotheses. For example, we have used Supramap to combine phylogenetic and virtual globe technologies to pinpoint which strains of a virus are infecting which hosts in specific areas (Janies et al., 2007). Finally, because phylogenetic analysis groups like strains into lineages, information drawn from limited experimentation on one strain in a lineage can be used to predict the properties of another strain in the same lineage. This transitive property of phylogenetic inference will help us predict which strains are capable of infecting humans, are pathogenic, and/or are resistant to drugs. These capabilities help the public health community make informed decisions on where and how to allocate resources to prepare for emerging diseases.
Email us at email@example.com
How to install POY5 (and make Supramaps offline)
You will need a computer running a recent Linux distribution or Mac OS X. Under OS X, make sure you have the latest Xcode.
Download the latest source code of OCaml and install. Unarchive and compile with the following commands:
./configure
make world
make opt
make opt.opt
make install
Download the POY-Supramap source from https://code.google.com/p/poy/downloads/
Compile POY and the Supramap plug-in by first configuring the install. Inside the src directory, run one of the following commands:
(parallel, assumes an MPI implementation is installed)
./configure --enable-interface=flat --enable-mpi CC=mpicc CFLAGS="-O3 -msse" --enable-long-sequences
(sequential, without MPI)
./configure --enable-interface=flat --enable-long-sequences
make
make install
ocamlbuild supramap.cmxs
cp _build/plugins/supramap.cmxs path
(path is where you want to store the plug-in; you will need it later when running POY)
A couple of notes:
a. An error about installing man pages might occur, which can be safely ignored.
b. Running 'make install' might require sudo privileges
Run POY with the Supramap plug-in to check if it loads correctly:
poy -plugin path
If you get no errors then everything compiled correctly and is in working order. You can now type
to exit the application. If you get an error, then something didn't compile correctly or there might be a path problem.
Link to Salmonella-based supramaps: Salmonella
A sequence file is presented in FASTA format (.fas). The file contains sequence data (e.g. nucleotides or amino acids) labeled with taxon names. The sequence data can be prealigned or raw unaligned data. One file can be used for each locus, and multiple files can be used. Missing data is tolerated if multiple files are used. See the POY documentation for details on how to manage missing data.
The first taxon in the file will be considered the outgroup. The outgroup will be used to root the tree. The choice of the outgroup taxon is up to the user. In the case of a temporal series of isolates of pathogens, the outgroup is often the oldest isolate. In natural sciences, the outgroup is often selected because it is outside of the set of interest, termed the ingroup. If the outgroup is related to but not a member of the ingroup, then these two groups share a more ancient common ancestor than that shared by the ingroup. Rooting on an ancestor more ancient than the ancestor of the ingroup provides a baseline from which the branching pattern and polarities of changes within the ingroup can be elucidated.
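As a rough illustration of the sequence-file conventions above, the following Python sketch reads FASTA text and reports which taxon would be treated as the outgroup (the first one in the file). It is not part of POY or Supramap. The function name read_fasta is made up for this example, the sequences are invented placeholders, and the taxon names are borrowed from the tree example elsewhere in this document.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (taxon_name, sequence) pairs in the order they appear."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; the label is the taxon name.
            records.append((line[1:].strip(), []))
        elif records:
            # Sequence lines may be wrapped, so collect all of them.
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


# Placeholder sequences, for illustration only.
example = """>NC_001608
ACGTACGT
>EU051644
ACGTACGA
>EU051642
ACGTTCGA
"""

records = read_fasta(example)
outgroup = records[0][0]                     # first taxon roots the tree
ingroup = [name for name, _ in records[1:]]  # everything else
print("outgroup:", outgroup)
print("ingroup:", ingroup)
```

Reordering the records in the file is therefore enough to change the root; nothing else in the file marks the outgroup.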
Categorical data files are in TNT (.tnt) format, including a header and footer. The file can be used for any phenotypic data (e.g., viral host). The various character states (e.g., human, chimp, swine, avian) should be represented as the integers 0 through 9 for the first ten states and then as letters, up to a total of 32 states (e.g., 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V).
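The state coding described above can be sketched in Python as follows. This is only an illustration of assigning symbols to states; it does not write the TNT header or footer. The function name assign_state_symbols is hypothetical, and the symbol alphabet is an assumption: digits first, then capital letters, cut off at the 32-state limit given in the text.

```python
import string

MAX_STATES = 32  # limit stated in the text above
# Assumption: symbols run 0-9, then A, B, C, ... until 32 symbols are available.
STATE_SYMBOLS = (string.digits + string.ascii_uppercase)[:MAX_STATES]


def assign_state_symbols(values):
    """Map each distinct value to a symbol, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            if len(mapping) == MAX_STATES:
                raise ValueError("more than 32 distinct character states")
            mapping[value] = STATE_SYMBOLS[len(mapping)]
    return mapping


# Host labels come from the example in the text; their order here is invented.
hosts = ["human", "chimp", "swine", "avian", "human"]
symbols = assign_state_symbols(hosts)
coded = [symbols[host] for host in hosts]
print(symbols)
print(coded)
```

Assigning symbols by order of first appearance keeps the coding reproducible for a given input order; any other fixed mapping would serve equally well.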
A comma-separated values (.csv) file that contains the geographic and temporal data for each taxon is required. The latitude and longitude coordinates must be in decimal format. If you provide dates of isolation, Supramap will build a Keyhole Markup Language (KML) file that allows you to animate the tree's growth over time in Google Earth™. Microsoft Excel™ can output csv format, but make sure the file is in Unix format (Unix line endings); a text editor such as BBEdit™ or TextPad™ can ensure this.
The csv file must have a header followed by the data. The header line should be "strain_name,latitude,longitude" or "strain_name,latitude,longitude,date". If you do not have coordinates for a specific taxon, include a line like "strain_name, 0, 0". If you have problems, make sure that the taxon names in the .fas and the .csv files match exactly in spelling and content.
Temporal data is optional. When provided, dates must be in the format YYYY-MM-DD.
In filling out your geographic and temporal references file, you might use an external data source such as the Getty Thesaurus of Geographic Names, or the search field of Google Earth™, to look up place names and convert them to decimal degrees. If you have many place names to convert, feel free to contact us; we may be able to help automate the process.
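The rules for the geographic and temporal references file can be checked before upload with something like the Python sketch below. It is not Supramap's own validation; the function name check_georef_csv and the error messages are invented for this example, and the coordinates in the sample are placeholders. It assumes that an empty date cell is acceptable and that latitude and longitude fall in the usual ranges of -90 to 90 and -180 to 180.

```python
import csv
import io
import re
from datetime import datetime

VALID_HEADERS = (
    ["strain_name", "latitude", "longitude"],
    ["strain_name", "latitude", "longitude", "date"],
)
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_georef_csv(text, fasta_names):
    """Return a list of problems; an empty list means no problem was found."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    if not rows or rows[0] not in VALID_HEADERS:
        return ["header must be strain_name,latitude,longitude[,date]"]
    header, problems, seen = rows[0], [], set()
    for number, cells in enumerate(rows[1:], start=2):
        if len(cells) != len(header):
            problems.append(f"row {number}: expected {len(header)} fields")
            continue
        seen.add(cells[0])
        try:
            ok = (-90.0 <= float(cells[1]) <= 90.0
                  and -180.0 <= float(cells[2]) <= 180.0)
        except ValueError:
            ok = False
        if not ok:
            problems.append(f"row {number}: coordinates must be decimal degrees")
        if len(cells) == 4 and cells[3]:
            try:
                # The pattern enforces zero padding; strptime checks the calendar.
                if not DATE_PATTERN.fullmatch(cells[3]):
                    raise ValueError(cells[3])
                datetime.strptime(cells[3], "%Y-%m-%d")
            except ValueError:
                problems.append(f"row {number}: date must be a valid YYYY-MM-DD")
    # Taxon names must match the sequence file exactly.
    for name in sorted(set(fasta_names) - seen):
        problems.append(f"taxon {name} is in the FASTA file but not the CSV")
    for name in sorted(seen - set(fasta_names)):
        problems.append(f"taxon {name} is in the CSV but not the FASTA file")
    return problems


# Placeholder coordinates; the second data row has an invalid month.
example_csv = """strain_name,latitude,longitude,date
NC_001608,10.5,20.25,1980-01-15
EU051644,0,0,2007-13-01
"""
problems = check_georef_csv(example_csv, ["NC_001608", "EU051644", "EU051642"])
for problem in problems:
    print(problem)
```

The name comparison is deliberately exact (after trimming surrounding spaces), since mismatched taxon names between the .fas and .csv files are the problem the text warns about.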
A tree file contains the evolutionary relationships of the taxa in nested parentheses ending in a semicolon. It is usually generated by a phylogenetic tree search program. For example:
(NC_001608, (EU051644, EU051642));
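A tree file can be pre-checked before uploading with something like the Python sketch below. It is not part of Supramap, and the function name tree_taxa is made up here. It accepts only plain nested parentheses ending in a semicolon. It rejects colons and labels placed after a closing parenthesis, on the assumption that these indicate branch lengths or support values.

```python
import re


def tree_taxa(newick: str):
    """Return the taxon names of a plain parenthetical tree, or raise ValueError."""
    text = newick.strip()
    if not text.endswith(";"):
        raise ValueError("tree must end with a semicolon")
    body = text[:-1]
    if ":" in body:
        raise ValueError("branch lengths are not supported")
    depth = 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    # Anything other than ',' or ')' right after ')' is treated as an annotation.
    if re.search(r"\)\s*[^\s,()]", body):
        raise ValueError("labels after ')' are not supported")
    return re.findall(r"[^\s(),]+", body)


taxa = tree_taxa("(NC_001608, (EU051644, EU051642));")
print(taxa)
```

The returned names can then be compared against the taxon names in the sequence and csv files.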
Annotations in the tree, such as branch lengths or bootstrap values, are not supported.
Which files do I need and how do I organize them?
We have created a web interface where users can upload data files, name projects, and organize sets of data files into jobs to be executed on a computing cluster (treated below). As each file is uploaded, the user is asked to identify what kind of file it is. A user can keep several files in a project and mix and match them as sets for various jobs. A valid job consists of at least one sequence data file and one and only one file of geographic and temporal references.
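The rule for a valid job, as stated in the sentence above, can be written as a small Python check. This encodes only that one sentence and is not the web interface's actual logic. The kind labels "sequence", "categorical", "georef", and "tree", and the function name job_problems, are hypothetical.

```python
from collections import Counter


def job_problems(kinds):
    """Return reasons why a set of file kinds is not a valid job."""
    counts = Counter(kinds)  # a missing kind counts as zero
    problems = []
    if counts["sequence"] < 1:
        problems.append("need at least one sequence data file")
    if counts["georef"] != 1:
        problems.append("need exactly one geographic and temporal references file")
    return problems


print(job_problems(["sequence", "sequence", "georef"]))  # no problems expected
print(job_problems(["sequence", "categorical"]))         # missing references file
```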
These are some valid sets of files for jobs: