Thursday, May 20, 2010

UniFrac :: Setup :: Execution

Download the source code from:
http://bmf2.colorado.edu/unifrac/about.psp

$ unzip unifrac.zip
$ cd unifrac

Read README.txt.

Requirements
- Python
- Python module Numeric

To check whether the Numeric module is present:
$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>>
- If the import above raises an error, install the module:
$ sudo apt-get install python-numeric-ext-dbg
- I first tried installing only python-numeric, but that gave an error saying the MLab module was missing, so I used the package above instead.
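
The same check can be done from a script instead of the interactive prompt. The following is only a minimal sketch: the helper name module_available is made up for illustration and is not part of UniFrac or Numeric.

```python
def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


# Report on the two modules mentioned in these notes.
for module_name in ("Numeric", "MLab"):
    if module_available(module_name):
        print("%s: found" % module_name)
    else:
        print("%s: missing" % module_name)
```

If either module is reported as missing, install the package mentioned above and run the check again.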

Steps
>>> from sets import Set
- UniFrac contains some deprecated code, so run this line first. (Set lives in the Python 2 module sets, not set.)
>>> from tree_comparison_api import TreeAnalysis
- This imports the class used to start the analysis.
>>> a=TreeAnalysis()
- Creates the session object a.
>>> print a.__doc__
- Prints the class documentation, which lists all the methods along with the parameters they require:
session for doing an analysis on a tree
   
Class: TreeAnalysis(interactive=True)

    Usage: session = TreeAnalysis()
        Starts a new session called 'session'.
        Interactive indicates whether the session will be run on the command
            line (True) or from a script (False). By default is True.

    Properties:
        Tree: tree on which all called stats functions are run
        LastSession: saves the current TreeAnalysis object whenever
            the tree is changed
        DataInfo: maps the name of a node (node.Data) to the node object
        Output: records the results of analyses done during a session as a list
        SessionLog: records the steps performed during a session as a list

    Methods:

    To get detailed info on any method type:
    print TreeAnalysis.method_name.__doc__


    ***Loading a Tree:***
   
    loadTreeFromFile(self, file_path, format, tree_name=None)
            loads a tree from a Nexus or Newick from Arb formatted file
    loadTreeFromString(self, input)
            loads a tree from a string or list of strings


    ***Setting the Branch Lengths:***

    setBlFromFile(self, file_path, format='NexLog')
            sets node BranchLengths from info in a file
    setBlToValue(self, value=1.0)
            sets the BranchLengths property of all nodes to value


    ***Loading Environment Information***

    loadEnvs(self, format, file_path=None):
            sets Envs property of nodes


    ***Modifying Tree Contents (taxa represented by tree)***

    pruneTree(self):
        removes terminal descendants that have no environment assigned
    removeNodeByName(name):
        removes a node from the tree identified by name (Data property)
    removeNodeByLCA(self, first_name, second_name):
        removes a node identified as Last Common Ancestor of 2 nodes
    isolateNodeByName(self, name):
        sets the Tree to a node identified by name
    isolateNodeByLCA(self, first_name, second_name):
        sets the Tree to the Last Common Ancestor of 2 nodes
    restoreLastSession(self):
        sets the current TreeAnalysis session to LastSession


    ***Modify Node Names (Data property of nodes in a tree)***

    nameUnnamedNodes(self):
        assigns an arbitrary name to unnamed nodes
    addEnvsToName(self):
        appends the envs to the node name (Data property of node)


    ***Statistical analysis***

    phylogeneticTestP(self, p_output='Tree', pop_size):
        generates p-values for the Tree using the Phylogenetic (P) test
    uniFracP(self, p_output='Tree', pop_size=1000, Weight=False):
        generates p-values for the Tree using the UniFrac metric
    makeEnvDistanceMatrix(self, Weight=False, pad=None, norm=False):
        generates a Distance Matrix for environments in Tree using UniFrac
    clusterEnvs(self, Weight=False, norm=False):
        uses UPGMA to cluster the environments in a tree based on UniFrac
    UniFracPCA(self, Weight=False, norm=False):
        performs principal coordinates analysis on self.Tree using UniFrac
    PD_rarefaction(self, file_path_base='PD_rarefaction_output', env_list=None
        num_reps = 50, stride=1)
        Creates PD rarefaction curves
    G_rarefaction(self, file_path_base='G_rarefaction_output', env_list=None
        num_reps = 50, stride=1)
        Creates G rarefaction curves
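
The docstring above suggests printing TreeAnalysis.method_name.__doc__ for one method at a time. As a rough sketch, the loop below collects the first docstring line of every public method on an object in one go. DemoSession is a hypothetical stand-in class used only so the example runs without UniFrac installed; with UniFrac available you would presumably pass the TreeAnalysis object instead.

```python
import inspect


class DemoSession(object):
    """Hypothetical stand-in for TreeAnalysis, for illustration only."""

    def loadTree(self, path):
        """loads a tree from a file"""

    def pruneTree(self):
        """removes tips without an environment"""


def summarize_methods(obj):
    """Return (name, first docstring line) pairs for public callables on obj."""
    summary = []
    for name in sorted(dir(obj)):
        if name.startswith('_'):
            continue  # skip private and special attributes
        member = getattr(obj, name)
        if callable(member):
            doc = inspect.getdoc(member) or ''
            first_line = doc.splitlines()[0] if doc else ''
            summary.append((name, first_line))
    return summary


for name, first_line in summarize_methods(DemoSession()):
    print('%-12s %s' % (name, first_line))
```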
       
The exact Python code depends on the type of UniFrac analysis you want to do (for example UPGMA clustering, PCoA, or pairwise significance testing) and on whether you want a weighted or unweighted analysis. It also depends on things like the format of your tree file, which can be Newick or Nexus. An example of the Python commands one might type for an analysis is below:
# load the necessary code
>>> from tree_comparison_api import TreeAnalysis

# make a UniFrac session object
>>> a = TreeAnalysis()

# load a tree file in Newick format
>>> a.loadTreeFromFile('path_to_tree_file.tree', 'NwA')

# load environment information that has no abundance information
>>> a.loadEnvs('TabDel', 'env_file.txt')

# run the unweighted UniFrac pairwise significance test
>>> a.uniFracP(p_output='Pairwise', Weight=False, pop_size=1000)
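
The same commands could be wrapped in a function and run from a script. This is an untested sketch that relies only on what the docstring above states: interactive=False is described as the setting for scripts, and the Output property is described as recording analysis results as a list. The function name run_pairwise_unifrac and its parameter names are my own, and what uniFracP itself returns is not documented here, so the sketch reads session.Output instead.

```python
def run_pairwise_unifrac(tree_path, env_path, permutations=1000):
    """Run the unweighted pairwise UniFrac significance test non-interactively.

    Assumes tree_comparison_api (from the UniFrac download) is on the
    Python path. Returns the session's Output list.
    """
    # Imported inside the function so this file still loads on a machine
    # where UniFrac is not installed.
    from tree_comparison_api import TreeAnalysis

    session = TreeAnalysis(interactive=False)    # script mode, per the docstring
    session.loadTreeFromFile(tree_path, 'NwA')   # same format code as above
    session.loadEnvs('TabDel', env_path)         # tab-delimited environment file
    session.uniFracP(p_output='Pairwise', Weight=False, pop_size=permutations)
    return session.Output
```

A call would then look like run_pairwise_unifrac('path_to_tree_file.tree', 'env_file.txt'), using the same placeholder file names as the interactive example.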
