Data, data everywhere, but not a web to analyze

For my dissertation research, I have been collecting any and all available datasets of ecological networks. My main interest is in food webs, but I have also been searching for other network types (plant/pollinator, plant/seed disperser, parasite/host, etc.). As a result, I have found datasets in a variety of places: in large databases like the Interaction Web Database (IWDB), Ecological Archives, or Dryad; from personal correspondence with Jennifer Dunne of the PEaCE Lab; or from authors' websites, like Robert Ulanowicz's.

Let me first say that it was very easy for me to find and get these datasets, and I am very appreciative of the people who have made them available. But I also have to say that dealing with other people's data is frustrating, as anyone who has taken a look at Twitter's #otherpeoplesdata may know. When you are working with data from multiple sources there is no set standard for how those data are stored. With network data you typically get one of two forms (setting aside .xls vs. .csv): (1) an adjacency matrix with row/column names that are either node identities or numbers, or (2) an edge list where the first column is the predator and the second is the prey.

The good part is that R can read in and understand both of these forms, which is nice. The bad part is the idiosyncrasies: the IWDB data (matrices) have some strangely formatted "species names", and the PEaCE Lab data (edge lists) have three columns instead of two, which presented its own challenges (although I was able to use R to reformat the data into a more directly usable form, and I may post on that later). Then there is always data formatted for a specific program, such as the data provided by Ulanowicz, which is in SCOR format for use with the particular ecosystem network analysis programs he uses (hooray for R and the enaR package for being able to read that format; I just had to re-save each network as its own separate file, since they are all provided in one .txt file). So it is small idiosyncrasies like these that can make other people's data a little more difficult to use, and I would think that in today's era of open and reproducible science you would want your data to be as easy to use and understand as possible.
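To make the two forms concrete, here is a minimal sketch of reading each into R and converting it into an igraph object (the file names are hypothetical placeholders):

library(igraph)

# Form 1: an adjacency matrix with species names as row/column names
adj <- as.matrix(read.csv("web_matrix.csv", row.names = 1))
g1 <- graph.adjacency(adj)

# Form 2: an edge list with the predator in column 1 and the prey in column 2
el <- read.csv("web_edgelist.csv")
g2 <- graph.edgelist(as.matrix(el[, 1:2]))

# Either way, you can move between the two representations
head(get.edgelist(g1))
get.adjacency(g2, sparse = FALSE)[1:5, 1:5]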

Now I want to mention two in-progress efforts to make this type of data acquisition as painless as possible. The first is an effort by Tim Poisot, who introduced mangal (first here, then examples here, here, and here), a web-based means to store network data in a new and imaginative way. He has also, conveniently, released an R package to interface with mangal via an API, currently available on GitHub.

I have not yet been able to play around with Tim's rmangal package, although I can definitely see a huge benefit to interacting with the web API directly from R. For someone like me it would make my workflow much smoother and make reproducibility simpler; having everything from data download to analysis in one script would simplify things as well. Maybe I should stop making excuses and try it out already. My only questions are how much data is available through mangal, and whether those wonderful people who put food web/ecological network data together are going to adopt this format. As it stands, it is pretty easy to put together a csv file that is either an adjacency matrix or an edge list. From what I have read of Tim's mangal database, he wants to put the data in a much more flexible and informative framework (as a theoretician I am all for this).

The second effort is a recently released website/database from the Bascompte Lab, the Web of Life. It currently has 89 mutualistic networks: 59 plant-pollinator (20 weighted, the rest binary) and 30 seed dispersal (12 weighted). I really like the visualization on this website. The homepage is itself a network, with the different network types as nodes around a hub displaying the site name. When you go to a network page you get a map of the Earth, and each network of the selected type is shown as a point on the map, so you get a quick idea of where these data are coming from. You can also subset the available networks by type (e.g., pollinator vs. seed dispersal), weighted vs. binary, number of species, and number of interactions.

Once you have selected the subset of networks you want, there is a convenient button that downloads all of them into a single folder. Most importantly, you can get your data in one of three forms: csv, Excel, or Pajek, and for each you can choose whether or not to include species names. I found this useful because csv adjacency matrices are very easy to work with, especially when they are standardized in format.
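Because the downloaded files share a standardized format, reading a whole folder of them into R is straightforward. Here is a minimal sketch (the folder name is a hypothetical placeholder, and using graph.incidence to treat these as two-mode networks is my assumption):

library(igraph)

# read every downloaded csv adjacency matrix into a list of graphs
files <- list.files("web-of-life", pattern = "\\.csv$", full.names = TRUE)
webs <- lapply(files, function(f){
  adj <- as.matrix(read.csv(f, row.names = 1))
  graph.incidence(adj, weighted = TRUE)  # bipartite plant-animal webs
})
names(webs) <- basename(files)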

I am definitely looking forward to seeing how these two projects develop and grow in the future. While I do think that both of these projects are very promising, I wonder whether they will be able to get those who have data on board, and which will be more successful at doing so.


Random walks in R

For some reason I thought it would be an interesting test of my programming skills to write code that generates random walks in a two-dimensional plane. It turns out this was not super hard, so I wound up having some fun making the code relatively nice.

Below is the function I wrote to model a random walk for a specified number of individuals (the argument n.org) over a specified number of time steps (the argument steps). The next two arguments specify the probability of an individual moving left (with p(right) = 1 - p(left)) and the probability of moving up (with p(down) = 1 - p(up)). I included the last argument to toggle a plot of all individuals' paths through time.


random_walk <- function(n.org, steps, left.p = .5, up.p = .5, plot = TRUE){

  whereto <- matrix(ncol = 2)

  for(x in 1:n.org){
    # each walker starts at the origin (0, 0)
    walker <- matrix(0, nrow = steps + 1, ncol = 2)

    for(i in 1:steps){
      # left/right = 1/0
      horizontal <- rbinom(1, 1, left.p)

      # horizontal step length, drawn from a half-normal distribution
      h.dist <- abs(rnorm(1, 0, 1))

      # Horizontal Movement
      if(horizontal == 0){
        walker[i+1, 1] <- walker[i, 1] + h.dist
      }
      if(horizontal == 1){
        walker[i+1, 1] <- walker[i, 1] - h.dist
      }

      # up/down = 1/0
      vertical <- rbinom(1, 1, up.p)

      # vertical step length, drawn from a half-normal distribution
      v.dist <- abs(rnorm(1, 0, 1))

      # Vertical Movement
      if(vertical == 1){
        walker[i+1, 2] <- walker[i, 2] + v.dist
      }
      if(vertical == 0){
        walker[i+1, 2] <- walker[i, 2] - v.dist
      }
    }

    whereto <- rbind(whereto, walker)
  }

  # steps + 1 rows per individual (not a hard-coded 1001)
  id <- rep(1:n.org, each = steps + 1)
  colnames(whereto) <- c("x", "y")
  whereto <- as.data.frame(whereto)
  # drop the NA row used to initialize the matrix
  whereto <- cbind(whereto[2:nrow(whereto), ], org = factor(id))

  if(plot){
    require(ggplot2)
    p <- ggplot(whereto, aes(x = x, y = y, colour = org))
    p <- p + geom_path()
    print(p)
  }

  return(whereto)
}

rw.test <- random_walk(1, 1000, .5, .5)

Running this code gives a three-column data frame with (steps + 1) * n.org rows, since each individual's starting position is included. Each row of the data frame gives the (x, y) position of a given individual, with the individual's number in the third column. When plot = TRUE, the function will also print a plot of the path taken by each individual (color coded by individual). Note that the plotting part of this function does require ggplot2.
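As a quick sanity check on the returned object:

dim(rw.test)      # 1001 rows -- that is, steps + 1 -- by 3 columns for one individual
head(rw.test, 3)  # x and y positions plus the individual's id (org)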


Each individual in this case starts at (0, 0) and follows a random walk from there.

Animation of the random walk, with a red point tracking the current position.

I have also been playing around with the animation package; the above gif was created using the saveGIF function, iteratively creating one image per row of the data frame, with a red point marking the current position. The code I used to generate this is:


library(ggplot2)
library(animation)

ani.options(interval = .25)
saveGIF({
  for(i in seq(2, 1001, 1)){
    gp <- ggplot(rw.test, aes(x = x, y = y, col = org)) + geom_path(alpha = .5)
    # red point marks the walker's position at step i
    gp <- gp + geom_point(x = rw.test[i, 1], y = rw.test[i, 2], col = "red")
    print(gp)
  }},
  movie.name = "randowalk.gif", interval = .25, nmax = 1000,
  ani.width = 600, ani.height = 600, outdir = getwd()
)

It was not super hard to write, but it did take me a while before things worked properly.


ind10.1 <- random_walk(10, 1000, .5, .5)
ind10.2 <- random_walk(10, 1000, .5, .5)
ind10.3 <- random_walk(10, 1000, .5, .5)
ind10.4 <- random_walk(10, 1000, .5, .5)

ind10.1 <- cbind(ind10.1, group = factor("gr1"))
ind10.2 <- cbind(ind10.2, group = factor("gr2"))
ind10.3 <- cbind(ind10.3, group = factor("gr3"))
ind10.4 <- cbind(ind10.4, group = factor("gr4"))

fourgrp <- rbind(ind10.1, ind10.2, ind10.3, ind10.4)

ggplot(fourgrp, aes(x = x, y = y, col = org)) + geom_path() +
  facet_wrap(facets = ~group, nrow = 2, ncol = 2)

The above code will generate four examples of the random walk with 10 individuals each. The plot then looks like:


Four runs of the random walk with 10 individuals each

Messing around with some of the functions in the animation package again, you can watch multiple individuals randomly walk around the plane…

Animation of multiple individuals' random walks.

Well, now I have committed to doing more with this, so I will be posting more about this kind of random walk stuff in the future. My plan is to (1) make the code better, more modular, and more versatile, and (2) build in interactions between individuals. I think (1) is important because it is how I become a better programmer, but (2) is what I find most interesting, because species interactions are what I study and I would like to find a way to incorporate this kind of thing into my research.

 


#swc Reflecting on my first bootcamp

This week, I was asked to help out with one of Software Carpentry's bootcamps at NYU. If you are a Twitter follower, I encourage you to check out the stream of #swc tweets that went out over the past couple of days.

I had heard of Software Carpentry (SWC) before, and I had used some of the tutorials listed on their site. As a side note: if you are interested in learning more about programming, SWC has a great library of tutorial videos on a range of subjects, from the basics of the shell and version control to regular expressions and program design. What is great about these videos is that you can either watch them (with audio and video) or read through the slide images with a script. I think this is great for self-directed learning.

As a helper, I was supposed to help the people taking the bootcamp with any issues or troubleshooting. I was admittedly nervous about this because, although I was helping out in the R room, a number of other topics are covered during the workshop. The SWC bootcamps typically cover four main areas of programming skill: the shell, Python/R, Git and GitHub, and SQL. While I am fairly comfortable programming in R and using GitHub (I typically use the GUI), my only experience with the shell was learning the basics of navigating through directories from one of the SWC tutorials, and I had never used SQL before.

When I relayed my concerns about how useful I would be as a helper to the two instructors, they reminded me that some experience is typically better than none, and that my experience with programming has led to my being able to work through problems faster than people who are new to it.

Overall I felt like the SWC bootcamp was a great experience. Based on some of the feedback that the instructors got, it seems as if most of the participants also found the material both helpful and interesting. Almost everyone learned something new, even if it was only how to structure code in an organized and logical manner.

I will admit that I was a bit surprised by the content of the bootcamp. My expectation going in was that there would be more actual coding in R than there was. In all we spent about 3 hours (a quarter of the total time) actually doing things in R, with roughly equal time split between the shell, Git, and SQL. Even though it was different from my expectation (and I was a little disappointed, because I enjoy R coding so much), I thought the workshop was wonderful. Everyone came away with at least some understanding of how to do things in the shell (including making files and running scripts), an account with GitHub, knowledge of how to initialize and use git repos, an understanding of data structures and data manipulation in the SQLite database manager, and the ability to run queries to select subsets of their data.

During the R portion of the workshop, participants were taught how to approach problem solving as a programmer. One of the good things about this bootcamp was that creating custom functions was at the forefront of the lesson. In addition, the instructor showed how organizing functions in a modular manner, building functions that complete small tasks and then wrapping them into a larger function, makes life easier. I thought it was an especially useful introduction given that the participants skewed toward the novice level of programming knowledge.

Also, while I feel like they could have covered a bit more of what can be done with the R programming language, the instructors used their course materials as examples in the git/GitHub portion of the bootcamp, so not only do all the participants have access to the (open source) teaching materials on the GitHub page, they also have a cloned copy of the repository locally on their own machines. With the skills taught in the two-day workshop, they should feel comfortable enough with the material to continue learning on their own. Looking through the material provided by the instructors, there is enough there that anyone who wants to learn more about programming in R can get quite far on their own.

I really liked being a part of the SWC bootcamp, and all the people involved were great. I am definitely going to have to be on the lookout for more opportunities to become a part of the organization.


New section on the blog

Due to the popularity of my posts about the basics of networks using R and the igraph package (and some others), I decided it might be a good idea to collect them all in one location (as their own page). Plus, this gives me the opportunity to do more tutorial-type posts and keep them centralized. As I learn new tools and techniques I will probably do more posts in this style, both to help myself learn and practice and, hopefully, to help other people in the same position.

Network basics with R and igraph

Also, more is on the way. I have not posted in a while, but I am building up more content so that I can post more frequently in the future.


The tangled disentangled tangle

In the most recent issue of Nature (Aug 22) there was a brief communication about a 2012 article by James et al. entitled "Disentangling nestedness from models of ecological complexity." The communication, from Saavedra and Stouffer, offered a criticism of the James et al. paper based on its choice of randomization techniques. Included with the communication is a reply by James et al.


A nested network has interactions where specialists utilize a subset of the resources used by generalists.

For those who don't know, nestedness here refers to a statistical pattern in ecological networks (mutualistic ones in this case) where specialist species utilize a subset of the resources used by more generalist species. There are a number of ways to measure nestedness given a binary interaction matrix; in the bipartite package in R I believe there are five or six measures a user can implement (two nestedness temperatures, two versions of NODF, and I think one or two others that I cannot think of right now). There has been a lot of uncertainty surrounding nestedness in mutualistic networks in the literature over the past few years. Some suggest that nestedness increases stability, while others argue the opposite. Additionally, there are questions over how best to measure nestedness: not only which metric to use, but how to know whether the value you measure is significant. By far the most agreed-upon statistical method is comparison to a null model, but of course the question then becomes which null model. For network-level metrics there are a number of possible null models, which vary in both implementation and degree of conservatism (a more conservative null model makes significance harder to achieve). Some examples are comparison to purely random networks, and link-swapping techniques.
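As a concrete illustration of the null model issue, here is a sketch of testing NODF against two very different null models using the vegan package (the web here is just a random toy matrix):

library(vegan)

set.seed(42)
web <- matrix(rbinom(15 * 20, 1, 0.3), nrow = 15, ncol = 20)  # toy binary web

# A liberal null: links placed completely at random ("r00")
oecosimu(web, nestednodf, method = "r00", nsimul = 999)

# A conservative null: row and column totals fixed ("quasiswap"),
# i.e., the degree distribution is preserved
oecosimu(web, nestednodf, method = "quasiswap", nsimul = 999)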

Disentangling nestedness

First I should set the stage and summarize the findings of the James et al. paper. They used 59 empirically described plant-pollinator networks as their data set, and parameterized dynamic models incorporating mutualism, intra- and interspecific competition, and intrinsic growth. They measured persistence as the proportion of species surviving at equilibrium. To isolate the dynamic impact of mutualism, they also "shut off" mutualistic interactions and ran the model with the plants and pollinators as two separate groups (with only competition and intrinsic growth). Their results compare the change in persistence between runs of the model with vs. without mutualism, and against runs on random networks. They show first that mutualism decreases the persistence of the community, but more importantly they show that there is no correlation between a community's degree of nestedness and the change in persistence. Ostensibly, if nestedness increased persistence we would expect more nested networks to show a smaller decline in persistence than non-nested networks.
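To make "persistence at equilibrium" concrete, here is a toy Lotka-Volterra sketch in the same spirit. This is my own minimal caricature (random parameters, weak linear mutualism), not the parameterization used by James et al.:

library(deSolve)

set.seed(1)
S <- 20                                     # 10 plants + 10 pollinators
r <- runif(S, 0.1, 0.5)                     # intrinsic growth rates
plants <- 1:10; polls <- 11:20
A <- matrix(0, S, S)
A[plants, plants] <- -runif(100, 0, 0.05)   # competition among plants
A[polls, polls]   <- -runif(100, 0, 0.05)   # competition among pollinators
mut <- matrix(rbinom(100, 1, 0.3), 10, 10)  # binary mutualism structure
A[plants, polls]  <- 0.02 * mut             # weak mutualistic benefits
A[polls, plants]  <- 0.02 * t(mut)
diag(A) <- -1                               # self-limitation

lv <- function(t, N, parms){
  dN <- N * (r + A %*% N)
  list(c(dN))
}

out <- ode(y = runif(S, 0.5, 1), times = seq(0, 500), func = lv, parms = NULL)
N.final <- out[nrow(out), -1]
mean(N.final > 1e-6)   # persistence: proportion of species surviving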

Disentangling “disentangling nestedness”

Saavedra and Stouffer (hereafter SS) assert in their communication that the lack of correlation between nestedness and persistence when comparing against random networks is an artifact of the methods used by James et al. SS suggest that the random networks used by James and colleagues altered each species' number of interactions (the degree distribution), leading to more homogeneous random networks. When they repeated the analysis controlling for the degree distribution, either statistically or through the choice of null model, they found a positive correlation between the degree of nestedness and the persistence of the community.

Tangling “disentangling ‘disentangling nestedness'”

James et al. offered a reply to SS's communication, reasserting their claim that nestedness does not have an impact on persistence. They redid their analysis, this time using the swap algorithm (which preserves the degree distribution, as suggested by SS) as the null model for randomization. They then plot the change in nestedness from real to random networks against the change in persistence from real to random networks. The result is basically a cloud of points without any clear correlation. Presumably, given the findings of SS, we would expect a negative change in nestedness to lead to a negative change in persistence, and vice versa.

So, the question of the day must be: who got it right, and who is wrong? These are clearly contradictory findings, although it is difficult to compare the results directly because the authors of the original paper and SS chose to illustrate and report their results differently. I wonder if small variations in methods, e.g., choice of parameters or number of randomizations, play any role (although given the apparent size of the differences, I doubt it). I am also curious why SS chose to address only part of the findings of James et al.: the randomization methods, but not the comparison with competition-only models. James et al. used both methods to test their hypothesis. Certainly if one method is flawed it takes away some of the weight behind the findings of the other, but the result is still the result. Maybe it will take a detailed analysis of both sets of methods to truly disentangle the apparent contradictions being debated here. I will admit, however, that I have not looked deeply enough into their methods to do that now.

Long story short, I really don’t know which result is best. What do you think?

 


Models of food web evolution

The two major types of food web models are those that are static and phenomenological, and those that are dynamic. The first type includes models such as the cascade, niche, and nested hierarchy models (I have talked about these models elsewhere). The static models use a set of rules to reproduce statistical patterns of food web topology, and they have proven valuable to food web research, but they have their limitations. One criticism is that using these models forces us to use what are called "trophic food webs," where trophically similar species are collapsed into a single node. I think that the future of food web research lies in models that explicitly incorporate the dynamics of predator-prey interactions. Within this class of models there are two basic methods of building food webs: community assembly algorithms and evolution-based algorithms. In this post I am going to briefly go over a number of models that utilize evolution-based algorithms to introduce new species to the community. The following is modified from an appendix I wrote for my dissertation proposal, with links to arXiv preprints (when available) or journals.

There are five models of food webs based on evolution that have been proposed, as well as several modifications of some of these (Loeuille & Loreau 2010). These models are not designed to be realistic representations of biological evolution. Instead they should be thought of as theoretical approximations of how coevolution may act to drive the construction of community interactions (Christensen et al. 2002). The models I will describe here are Webworld (Caldarelli et al. 1998; Drossel et al. 2001, 2004), Recursive Evolutionary Branching (Ito & Ikegami 2006), Matching (Rossberg et al. 2006), Tangled Nature (Christensen et al. 2002; Anderson & Jensen 2005), and Body Size Evolution (Loeuille & Loreau 2005, 2010).

In the Webworld model, species are defined by a particular set of abstract traits, which could in theory represent either morphological or behavioral characteristics (Caldarelli et al. 1998). Each species is assigned L traits out of a pool of K traits. It is important to note once again that these traits are abstract, represented only by the integers 1…K. A K×K matrix m is defined such that its element m(a,b) represents the score of trait a against trait b. Predator-prey relationships are then assigned by summing the scores of species i's traits against species j's, where a positive total score means i is adapted to be a predator of j and zero means no interaction (Drossel et al. 2001). Competition for resources is hierarchically based on predation scores for a given species, meaning that a higher score indicates a better competitor. Speciation in the Webworld model occurs probabilistically as a function of population size. When a speciation event occurs, the new species is generated by randomly replacing one of the L traits of the chosen species (constrained such that the event must generate a unique set of species traits). Other than the choice of L and K, the model has only three parameters to be set: the trophic efficiency of consumers, the total amount of external resources, and a parameter that determines the strength of competition (Caldarelli et al. 1998). This model does not consider individual variation, genetics, the mechanisms of speciation, or the distinction between modes of reproduction (Caldarelli et al. 1998). It has, however, been used to consider the implications of varying the form of the functional response in predator-prey equations, so that more realistic population dynamics can be incorporated (Drossel et al. 2001, 2004).
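Here is a minimal toy sketch of the Webworld scoring scheme; the antisymmetric Gaussian score matrix follows Caldarelli et al., but everything else is stripped down:

set.seed(1)
K <- 500   # size of the trait pool
L <- 10    # traits per species

# antisymmetric trait-score matrix: m[a,b] = -m[b,a]
m <- matrix(rnorm(K^2), K, K)
m <- m - t(m)

sp.i <- sample(1:K, L)   # traits of species i
sp.j <- sample(1:K, L)   # traits of species j

# score of i against j: a positive score means i is adapted to prey on j
S.ij <- max(0, sum(m[sp.i, sp.j]) / L)
S.ij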

I suppose the model proposed by Ito and Ikegami (2006) would be termed the Recursive Evolutionary Branching (REB) model, although they do not explicitly name it. This model is essentially a continuous version of the Webworld model described above (Loeuille & Loreau 2010). Unlike the Webworld model, however, in the REB model species are given two sets of traits: one defining a species' location as a resource (resource traits), and one defining the set of resources it uses (utilization traits). Ito and Ikegami (2006) assumed a one-dimensional resource space and a two-dimensional phenotype space to reduce the complexity of the model. In other words, given a one-dimensional resource space, each of the k species has a distribution describing it as a resource and a distribution describing the resources it utilizes (both described by delta functions). Evolutionary dynamics in the REB model are described as a diffusion process, where most species are assumed to have a large number of individuals and the magnitude of a mutation is assumed to be small.

The Matching model (Rossberg et al. 2006) combines parts of both the Webworld and REB models. Like the Webworld model, the Matching model represents species as sets of traits, and like the REB model there is one set describing a species as a resource (vulnerability traits) and one describing it as a predator (foraging traits). Unlike either model, these traits are represented as binary character strings of a defined length n. The strength of an interaction is then defined by the number of predator foraging traits that match prey vulnerability traits, with a link present only when the match exceeds some threshold value. Additionally, species are given a size parameter, and a predator can only consume prey whose size is below its own size plus lambda (Rossberg et al. 2006); lambda is a "loopiness" parameter determining the probability of trophic loops occurring. Unlike the previously described models, the Matching model does not explicitly incorporate population dynamics, instead defining extinction, speciation, and invasion probabilities with specified parameters. New species invading the system are given their two binary trait sets and size randomly, while speciation events flip foraging and vulnerability traits with a given probability and add a random value (from a standard normal distribution) to the size parameter. This model assumes that evolution in all directions is equally likely.
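And a toy sketch of the matching idea (the exact matching and threshold rules in Rossberg et al. differ in their details):

set.seed(2)
n <- 20          # length of the binary trait strings
threshold <- 12  # minimum number of matches needed for a link

foraging.i      <- rbinom(n, 1, 0.5)  # predator i's foraging traits
vulnerability.j <- rbinom(n, 1, 0.5)  # prey j's vulnerability traits

# interaction strength: number of positions where the strings match
matches <- sum(foraging.i == vulnerability.j)
link <- matches >= threshold
c(matches = matches, link = link)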

I will only briefly outline the Tangled Nature model, as I do not anticipate using it in my research (and it gets confusing). The model is unique in that it can incorporate multiple interaction types, rather than being restricted to trophic interactions (Loeuille & Loreau 2010). Additionally, it is an individual-based model, rather than taking populations or species as the base unit. Individuals are defined by a binary string of "alleles" of length L, which are allowed to mutate with a defined probability. The model assumes that the interactions between species are determined by the interactions between the L loci. The strength of the interaction is given by a matrix with terms drawn, with a given probability, from the uniform distribution on an interval [-c, c].

While the first four models described use multiple traits to determine the interactions between organisms, the Body Size Evolution model uses only a single trait (body size, as the name implies). Loeuille and Loreau (2005) developed a model of community evolution where each species is given a characteristic body size. Body size allows demography to be determined from allometric relationships, trophic interactions to be described by an optimum feeding size range, and competition between similarly sized species to be modeled. Population dynamics are modeled using differential equations with allometrically determined production efficiency (growth rate) and mortality, as well as functions describing consumption and competition based on body size differences between species. Basal species in the model consume an inorganic nutrient, which is also modeled with a differential equation describing its rate of change as a function of input, output, and nutrient recycling. Species are modified at a rate (usually 10^-6) per unit biomass at each time step. If a given population has a mutation event, a new daughter population is created with a body size drawn from a uniform distribution on the interval [0.8x, 1.2x], where x is the parent body size.
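The mutation step, at least, is simple enough to sketch directly from that description:

# a daughter species' body size is drawn uniformly within 20% of the parent's size x
mutate_size <- function(x) runif(1, 0.8 * x, 1.2 * x)

set.seed(3)
mutate_size(10)   # a parent of size 10 yields a daughter somewhere in [8, 12]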


Network basics with R and igraph (part III of III)

In this third and final post in this little series of mine, I want to bring it all together with my favorite kind of network: food webs. I will be using a real food web to go over all the different tools and properties, specifically the food web of the Otago Harbour intertidal zone in New Zealand (found here).


###############################################################
###############################################################
####
####            PART III: FOOD WEBS
####
###############################################################
###############################################################
library(igraph)
library(NetIndices)
# Kim N. Mouritsen, Robert Poulin, John P. McLaughlin and David W. Thieltges. 2011.
# Food web including metazoan parasites for an intertidal ecosystem in New Zealand.
# Ecology 92:2006.

# Website: http://esapubs.org/archive/ecol/E092/173/

# Otago Harbour: intertidal mudflat
otago.links.data<-read.csv("~/Desktop/Projects/FoodwebAlpha/Data/Otago_Data_Links.csv")
otago.nodes.data<-read.csv("~/Desktop/Projects/FoodwebAlpha/Data/Otago_Data_Nodes.csv")

# Column names for data
colnames(otago.links.data)
colnames(otago.nodes.data)

# Convert the data into a graph object using the first 2 columns of the dataset as an edgelist
otago.graph<-graph.edgelist(as.matrix(otago.links.data[,1:2]))
# Create graph object of just predator prey links
otago.graph.p<-graph.edgelist(as.matrix(otago.links.data[1:1206,1:2]))

# Get the web into matrix form
otago.adjmatrix<-get.adjacency(otago.graph,sparse=F)
otago.adjmatrix.p<-get.adjacency(otago.graph.p,sparse=F)

# Get the basic network indices from the matrices with GenInd()
ind.otago<-GenInd(otago.adjmatrix)
ind.otago.p<-GenInd(otago.adjmatrix.p)

# Now to plot these two webs to get a feel for what we are dealing with

par(mar=c(.1,.1,.1,.1))
plot.igraph(otago.graph,vertex.label=NA,vertex.size=3,edge.arrow.size=.25,layout=layout.circle)
plot.igraph(otago.graph.p,vertex.label=NA,vertex.size=3,edge.arrow.size=.25,layout=layout.circle)

# The NetIndices package also has a function to get some of the trophic properties of the food web
# TrophInd() takes in an adjacency matrix and gives an output of the trophic level of each node,
# as well as an index of the degree of omnivory for each node

troph.otago<-TrophInd(otago.adjmatrix)
troph.otago.p<-TrophInd(otago.adjmatrix.p)

# An interesting aside: adding parasites to the web increases the trophic
# level of every species in this web.

plot(troph.otago[1:123,1]~troph.otago.p[,1],xlab="Level Without Parasites",ylab="Level With Parasites")
abline(a=0,b=1)

Trophic level of each species with parasites included vs. without.


# An interesting use for this trophic level function is to then use trophic level as a plotting parameter.
# This way, I can plot the food web nodes according to trophic height. I think that this adds greatly to a plot
# of a food web, since you can gain more information about the trophic structure of the web by simply
# glancing at the plot.

# First we need to create a two-column matrix identifying the x and y values for each node.
layout.matrix.1 <- matrix(
  nrow = length(V(otago.graph)),  # Rows equal to the number of vertices
  ncol = 2)
layout.matrix.1[,1] <- runif(length(V(otago.graph))) # randomly assign along x-axis
layout.matrix.1[,2] <- troph.otago$TL # y-axis value based on trophic level

layout.matrix.1p <- matrix(
  nrow = length(V(otago.graph.p)),  # Rows equal to the number of vertices
  ncol = 2)
layout.matrix.1p[,1] <- runif(length(V(otago.graph.p)))
layout.matrix.1p[,2] <- troph.otago.p$TL

# Now we can use these matrices to define the layout instead of using the circle layout

par(mar=c(.1,.1,.1,.1),mfrow=c(1,2))

plot.igraph(otago.graph,
vertex.label.cex=.35,
vertex.size=3,
edge.arrow.size=.25,
layout=layout.matrix.1)

plot.igraph(otago.graph.p,
vertex.label.cex=.35,
vertex.size=3,
edge.arrow.size=.25,
layout=layout.matrix.1p)

# I am still working on the best way to plot the nodes along the x-axis. You may notice that using
# runif() means that there is some chance that two nodes with the same trophic level
# will be right on top of one another
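# One possible fix (just a sketch): space nodes evenly along the x-axis within
# rounded trophic levels, so that nodes of equal trophic level cannot coincide
layout.matrix.alt <- layout.matrix.1
tl.bin <- round(troph.otago$TL)
layout.matrix.alt[,1] <- ave(rep(1, length(tl.bin)), tl.bin, FUN = seq_along)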


# It is also a bit interesting to see how the inclusion of parasites impacts community detection
wtc.otago<-walktrap.community(otago.graph)
wtc.otago.p<-walktrap.community(otago.graph.p)

par(mar=c(.1,.1,.1,.1),mfrow=c(1,2))

plot.igraph(otago.graph,
  vertex.label.cex=.35,
  vertex.size=3,
  edge.arrow.size=.25,
  layout=layout.matrix.1,
  mark.groups=communities(wtc.otago),  # mark.groups expects a list of vertex groups
  mark.col="green")

plot.igraph(otago.graph.p,
  vertex.label.cex=.35,
  vertex.size=3,
  edge.arrow.size=.25,
  layout=layout.matrix.1p,
  mark.groups=communities(wtc.otago.p),
  mark.col="green")

# It is clear that the increase in the connectivity of the web with parasites has led to
# a larger densely connected community


Webs with walktrap community shown in green



# The degree distribution of a food web can tell us a lot about the amount of specialization and
# generalization in the web (in degree), as well as vulnerability (out degree)

deg.otago<-degree(otago.graph)
deg.otago.p<-degree(otago.graph.p)

# Using the degree distribution gives a better way to visualize any differences
# Looking at the in degree tells us about how general the diets of consumers are
dd.otago.in<-degree.distribution(otago.graph,mode="in",cumulative=T)
dd.otago.in.p<-degree.distribution(otago.graph.p,mode="in",cumulative=T)

# Out degree is a measure of the vulnerability of organisms, telling us how many consumers
# eat each species.
dd.otago.out<-degree.distribution(otago.graph,mode="out",cumulative=T)
dd.otago.out.p<-degree.distribution(otago.graph.p,mode="out",cumulative=T)

# And finally the degree ("all") simply tells us about how well connected that species is
# within the network
dd.otago<-degree.distribution(otago.graph,mode="all",cumulative=T)
dd.otago.p<-degree.distribution(otago.graph.p,mode="all",cumulative=T)

par(mfrow=c(2,2))
plot(dd.otago.in,xlim=c(0,80))
plot(dd.otago.out,xlim=c(0,80))
plot(dd.otago.in.p,xlim=c(0,80))
plot(dd.otago.out.p,xlim=c(0,80))

power.fit<-power.law.fit(deg.otago)
power.fit.p<-power.law.fit(deg.otago.p)

par(mfrow=c(1,2))
plot(dd.otago,log="xy")
lines(1:180,10*(1:180)^((-power.fit$alpha)+1))

plot(dd.otago.p,log="xy")
lines(1:100,10*(1:100)^((-power.fit.p$alpha)+1))

# I can look at the diameter of the two versions of the web
# For food webs the diameter is going to be the longest food chain
# since energy only flows in one direction, the diameter will read from
# basal species to top predator.

get.diameter(otago.graph)
get.diameter(otago.graph.p)

# I think that here it is interesting to note that the diameter of the predator-prey only
# food web (which we expect to be smaller) is not a subset of the diameter for the
# larger parasites included network

# The next few properties are all related to the small world-ness of the network:

transitivity(otago.graph)
transitivity(otago.graph.p)

# Betweenness is the number of shortest paths going through a specified node or edge

otago.between<-betweenness(otago.graph)
otago.between.p<-betweenness(otago.graph.p)

plot(otago.between[1:123]~otago.between.p)
abline(a=0,b=1)

otago.edge.between<-edge.betweenness(otago.graph)
otago.edge.between.p<-edge.betweenness(otago.graph.p)

closeness(otago.graph)

# Here are the adjacency matrices for each of the 13 subgraphs again
s1<-matrix(c(0,1,0,0,0,1,0,0,0),nrow=3,ncol=3)
s2<-matrix(c(0,1,1,0,0,1,0,0,0),nrow=3,ncol=3)
s3<-matrix(c(0,1,0,0,0,1,1,0,0),nrow=3,ncol=3)
s4<-matrix(c(0,0,1,0,0,1,0,0,0),nrow=3,ncol=3)
s5<-matrix(c(0,1,1,0,0,0,0,0,0),nrow=3,ncol=3)
d2<-matrix(c(0,1,1,1,0,1,0,0,0),nrow=3,ncol=3)
d1<-matrix(c(0,1,1,0,0,1,0,1,0),nrow=3,ncol=3)
d3<-matrix(c(0,0,1,1,0,0,1,0,0),nrow=3,ncol=3)
d4<-matrix(c(0,0,0,1,0,1,0,1,0),nrow=3,ncol=3)
d5<-matrix(c(0,1,1,0,0,1,1,0,0),nrow=3,ncol=3)
d6<-matrix(c(0,1,1,1,0,1,1,1,0),nrow=3,ncol=3)
d7<-matrix(c(0,1,1,1,0,1,1,0,0),nrow=3,ncol=3)
d8<-matrix(c(0,1,1,1,0,0,1,0,0),nrow=3,ncol=3)

# Turn them into a convenient list
subgraph3.mat<-list(s1,s2,s3,s4,s5,d1,d2,d3,d4,d5,d6,d7,d8)
# And then into a list of graph objects
subgraph3.graph<-lapply(subgraph3.mat,graph.adjacency)

# Count the number of the 13 different 3-node subgraphs in the two webs
subgraph.freq.otago <- c()
subgraph.freq.otago.p <- c()
for(i in 1:13){
  subgraph.freq.otago[i] <- graph.count.subisomorphisms.vf2(otago.graph, subgraph3.graph[[i]])
  subgraph.freq.otago.p[i] <- graph.count.subisomorphisms.vf2(otago.graph.p, subgraph3.graph[[i]])
}

plot(subgraph.freq.otago,type="o",lty=3, xlab="Subgraph",ylab="Frequency")
points(subgraph.freq.otago.p,type="o",lty=2)

plot(subgraph.freq.otago~subgraph.freq.otago.p)
abline(a=0,b=1)
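# igraph also has a built-in motif counter that should get at the same
# information; graph.motifs() returns counts for every size-3 isomorphism
# class (with NA for the unconnected classes)
graph.motifs(otago.graph, 3)
graph.motifs(otago.graph.p, 3)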


Frequencies of the 13 possible 3-node subgraphs

Note that I did not spend a great deal of time on the degree distribution or the small-world properties of the food web. I am planning separate posts that will explore these properties in greater detail. Mostly, as I was writing up this post I noticed a few things that didn't match my expectations, and that there are a few other tools out there that give different answers. So be on the lookout for future posts about degree distributions and fitting functions to them, and about the multitude of different tools that are supposedly all measuring the same thing (do they all give the same answer?).
