Package 'TRAMPR' reference manual

Title:	'TRFLP' Analysis and Matching Package for R
Description:	Matching terminal restriction fragment length polymorphism ('TRFLP') profiles between unknown samples and a database of known samples. 'TRAMPR' facilitates analysis of many unknown profiles at once, and provides tools for working directly with electrophoresis output through to generating summaries suitable for community analyses with R's rich set of statistical functions. 'TRAMPR' also resolves the issues of multiple 'TRFLP' profiles within a species, and shared 'TRFLP' profiles across species.
Authors:	Rich FitzJohn [aut, cre], Ian Dickie [aut]
Maintainer:	Rich FitzJohn <[email protected]>
License:	GPL-2
Version:	1.0-10
Built:	2025-02-13 02:39:59 UTC
Source:	https://github.com/richfitz/trampr

The TRAMPR Package (TRFLP Analysis and Matching Package for R)

Description

This package contains a collection of functions to help analyse terminal restriction fragment length polymorphism (TRFLP) profiles, by matching unknown peaks to known TRFLP profiles in order to identify species.

The TRAMPR package contains a vignette, which includes a worked example; type vignette("TRAMPRdemo") to view it. To see all documented help topics, type library(help=TRAMPR).

Details

Start by reading the TRAMP (and perhaps create.diffsmatrix) help pages, which explain the matching algorithm.

Then read load.abi to learn how to load ABI format data into the program. Alternatively, read TRAMPsamples and read.TRAMPsamples to load already-processed data.

If you already have a collection of knowns, read TRAMPknowns and read.TRAMPknowns to learn how to load them. Otherwise, read build.knowns to learn how to automatically generate a set of known profiles from your data.

Once your data are loaded, reread TRAMP to do the analysis, then read plot.TRAMP and summary.TRAMP to examine the analysis. update.TRAMP may also be useful for modifying your matches. summary.TRAMP is also useful for preparing presence/absence matrices for analysis with other tools (e.g. the vegan package; see the vignette indicated above).

TRAMPR works with database-like objects, and a basic understanding of relational databases and primary/foreign keys will aid in understanding some aspects of the package.

Citation

Please see citation("TRAMPR") for the citation of TRAMPR.

Note

TRAMPR is designed specifically for “database TRFLP” (identifying species based on a database of known TRFLP profiles: see Dickie et al. 2002). It is not designed for direct community analysis of TRFLP profiles as in peak-profile TRFLP.

Author(s)

Rich FitzJohn and Ian Dickie, Landcare Research

References

Dickie IA, FitzJohn RG 2007: Using terminal-restriction fragment length polymorphism (T-RFLP) to identify mycorrhizal fungi; a methods review. Mycorrhiza 17: 259-270.

Dickie IA, Xu B, Koide RT 2002. Vertical distribution of ectomycorrhizal hyphae in soil as shown by T-RFLP analysis. New Phytologist 156: 527-535.

FitzJohn RG, Dickie IA 2007: TRAMPR: An R package for analysis and matching of terminal-restriction fragment length polymorphism (TRFLP) profiles. Molecular Ecology Notes [doi:10.1111/j.1471-8286.2007.01744.x].

Add Knowns To TRAMPknowns Databases

Description

Add a single known or many knowns to a knowns database in a TRAMPknowns object. add.known takes a TRAMPknowns object, and adds the peak profile of a single sample from a TRAMPsamples object. combine.TRAMPknowns combines two TRAMPknowns objects (similar to combine.TRAMPsamples). add.known and combine are generic, so if the x argument is a TRAMP object, then the knowns component of that object will be updated.

Usage

add.known(x, ...)
## S3 method for class 'TRAMPknowns'
add.known(x, samples, sample.fk, prompt=TRUE, default.species=NULL, ...)
## S3 method for class 'TRAMP'
add.known(x, sample.fk, rebuild=TRUE, ...)

## S3 method for class 'TRAMPknowns'
combine(x, y, rewrite.knowns.pk=FALSE, ...)
## S3 method for class 'TRAMP'
combine(x, y, rebuild=TRUE, ...)

Arguments

`x`	A `TRAMPknowns` or `TRAMP` object, containing identified TRFLP patterns.
`samples`	A `TRAMPsamples` object, containing unidentified samples.
`sample.fk`	`sample.fk` of sample in `samples` to add to the knowns database. If `x` is a `TRAMP` object, then `sample.fk` refers to a sample in the `TRAMPsamples` object used in the creation of that `TRAMP` object (stored as `x$samples`: see `labels(x$samples)` for codes).
`prompt`	Logical: Should the function interactively prompt for a new species name?
`default.species`	Default species name. If `NULL` (the default), the name chosen will be the value of `samples$info$species` for the current sample. Set to `NA` if no name is currently known (see `group.knowns` - identical non-`NA` names are considered related).
`y`	A second `TRAMPknowns` object, containing knowns to add to `x`.
`rewrite.knowns.pk`	Logical: If the new knowns data contain `knowns.pk` values that conflict with those in the original `TRAMPknowns` object, should the new knowns be renumbered? If this is `TRUE`, do not rely on any `knowns.pk` values staying the same for the newly added knowns. `knowns.pk` values in the original `TRAMPknowns` object will never be changed.
`rebuild`	Logical: should the `TRAMP` object be rebuilt after adding knowns, by running `rebuild.TRAMP` on it? This is important to determine if the new known(s) match any of the samples in the `TRAMP` object. This should be left as `TRUE` unless you plan on manually rebuilding the object later.
`...`	Additional arguments passed to future methods.

Details

(add.known only): When adding the profile of a single individual via add.known, if more than one peak per enzyme/primer combination is present we select the most likely profile by picking the highest peak (largest height value) for each enzyme/primer combination (a warning will be given). If two peaks are of the same height, then the peak taken is unspecified (similar to build.knowns with min.ratio=0).
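The peak selection described above can be sketched in base R. This is only an illustration, not the package's implementation: the peak table and its enzyme/primer names are hypothetical, and `which.max` takes the first peak on a tie, whereas the package leaves that case unspecified.

```r
## Hypothetical peak table for one sample (enzyme/primer names are made up;
## the column names follow the enzyme/primer/size/height columns used in
## TRAMPsamples data).
peaks <- data.frame(
  enzyme = c("HpyCH4IV", "HpyCH4IV", "BsuRI", "BsuRI"),
  primer = c("ITS1F", "ITS1F", "ITS4", "ITS4"),
  size   = c(212.4, 305.1, 98.7, 410.2),
  height = c(1500, 230, 90, 2100)
)

## Split by enzyme/primer combination and keep the tallest peak in each.
combo <- interaction(peaks$enzyme, peaks$primer, drop = TRUE)
profile <- do.call(rbind, lapply(split(peaks, combo), function(p)
  p[which.max(p$height), ]))
rownames(profile) <- NULL
profile
```

The result has one row per enzyme/primer combination, which is the shape a known's profile must have.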

(combine only): rewrite.knowns.pk provides a simple way of merging knowns databases that use the same values of knowns.pk. Because knowns.pk must be unique, if y (the new knowns database) uses knowns.pk values present in x (the original database), then the knowns.pk values in y must be rewritten. This will be done by adding max(labels(x)) to every knowns.pk value in y$info and knowns.fk value in y$data.

If retaining knowns.pk information is important, we suggest saving the value of knowns.pk before running this function, e.g.

info$knowns.pk.old <- info$knowns.pk

If more control over the renaming process is required, manually adjust y$info$knowns.pk yourself before calling this function. However, by default no translation will be done, and an error will occur if x and y share knowns.pk values.
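The renumbering rule described above can be illustrated with plain data frames. This is a minimal sketch under assumptions: the tables `x.info`, `y.info` and `y.data` are hypothetical stand-ins for the `info` and `data` components, and the code only mimics the documented offset, not the package internals.

```r
## Hypothetical info/data tables for two knowns databases that reuse keys.
x.info <- data.frame(knowns.pk = c(1, 2, 3), species = c("A", "B", "C"))
y.info <- data.frame(knowns.pk = c(2, 3), species = c("D", "E"))
y.data <- data.frame(knowns.fk = c(2, 2, 3), size = c(101.5, 240.0, 330.2))

## Keep the old keys for reference, then shift every key in y by the
## largest key in x so that no key collides.
offset <- max(x.info$knowns.pk)
y.info$knowns.pk.old <- y.info$knowns.pk
y.info$knowns.pk <- y.info$knowns.pk + offset
y.data$knowns.fk <- y.data$knowns.fk + offset

y.info
anyDuplicated(c(x.info$knowns.pk, y.info$knowns.pk))  # 0 means no clashes
```

Because the same offset is applied to `knowns.pk` and `knowns.fk`, the rows in the data table still point at the right rows in the info table.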

For add.known, only a subset of columns are passed to the knowns object (a future version may be more inclusive):

From samples$info: sample.pk (as knowns.pk).
From samples$data: sample.fk (as knowns.fk), primer, enzyme, size.

For combine, the data and info elements of the resulting TRAMPknowns object will have the union of the columns present in both sets of knowns. If any additional elements exist as part of the second TRAMPknowns object (e.g. passed as ... to TRAMPknowns when creating y), these will be ignored.

Value

An object of the same class as x: if a TRAMP object is supplied, a new TRAMP object with an updated TRAMPknowns component will be returned, and if the object is a TRAMPknowns object an updated TRAMPknowns object will be returned.

Note

If the TRAMPknowns object has a file.pat element (see TRAMPknowns), then the new knowns database will be written to file. This may be confusing when operating on TRAMP objects directly, since both the TRAMPknowns object used in the TRAMP object and the original TRAMPknowns object will share the same file.pat argument, but contain different data as soon as add.known or combine is used. In short - be careful! To avoid this issue, set file.pat to NULL before using add.known or combine.

Examples

data(demo.knowns)
data(demo.samples)

## (1) Using add.known(), to add a single known:

## Sample "101" looks like a potential known, add it to our knowns
## database:
plot(demo.samples, 101)

## Add this to a knowns database:
## Because there is more than one peak per enzyme/primer combination, a
## warning will be given.  In this case, since there are clear peaks it
## is harmless.
demo.knowns.2 <- add.known(demo.knowns, demo.samples, 101,
                           prompt=FALSE)

## The known has been added:
demo.knowns.2[101]
try(demo.knowns[101]) # error - known didn't exist in original knowns

## Same, but adding to an existing TRAMP object.
res <- TRAMP(demo.samples, demo.knowns)
plot(res, 101)
res2 <- add.known(res, 101, prompt=FALSE, default.species="New known")

## Now the new known matches itself.
plot(res2, 101)

## (2) Using combine() to combine knowns databases.

## Let's split the original knowns database in two:
demo.knowns.a <- demo.knowns[head(labels(demo.knowns), 10)]
demo.knowns.b <- demo.knowns[tail(labels(demo.knowns), 10)]

## Combining these is easy:
demo.knowns.c <- combine(demo.knowns.a, demo.knowns.b)

## Knowns from both of the small databases are present in the new one:
identical(c(labels(demo.knowns.a), labels(demo.knowns.b)),
          labels(demo.knowns.c))


## Demonstration of knowns rewriting:
demo.knowns.d <- demo.knowns.a
demo.knowns.a$info$from <- "a"
demo.knowns.d$info$from <- "d"

try(combine(demo.knowns.a, demo.knowns.d)) # error
demo.knowns.e <- combine(demo.knowns.a, demo.knowns.d,
                         rewrite.knowns.pk=TRUE)

## See that both data sets are here (check the "from" column).
demo.knowns.e$info

## Note that a better approach might be to manually resolve
## conflicting knowns.pk values before combining.

Automatically Build Knowns Database

Description

This function uses several filters to select likely knowns, and construct a TRAMPknowns object from a TRAMPsamples object. Samples are considered to be “potential knowns” if they have data for an adequate number of enzyme/primer combinations, and if for each combination they have either a single peak, or a peak that is “distinct enough” from any other peaks.

Usage

build.knowns(d, min.ratio=3, min.comb=NA, restrict=FALSE, ...)

Arguments

`d`	A `TRAMPsamples` object, containing samples from which to build the knowns database.
`min.ratio`	Minimum ratio of maximum to second highest peak to accept known (see Details).
`min.comb`	Minimum number of enzyme/primer combinations required for each known (see Details for behaviour of default).
`restrict`	Logical: Use only cases where `d$info$species` is non-blank? (These are assumed to come from samples of a known species. However, it is not guaranteed that all samples with data for `species` will become knowns; if they fail either the `min.ratio` or `min.comb` checks they will be excluded.)
`...`	Additional arguments passed to `TRAMPknowns` (e.g. `cluster.pars`, `file.pat` and any additional objects).

Details

For all samples and enzyme/primer combinations, the ratio of the largest to the second largest peak is calculated. If it is greater than min.ratio, then that combination is accepted. If the sample has at least min.comb valid enzyme/primer combinations, then that sample is included in the knowns database. If min.comb is NA (the default), then every enzyme/primer combination present in the data is required.
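The two filters can be sketched in base R as follows. The peak table is hypothetical (with enzyme and primer collapsed into a single `combo` column for brevity), and the code is an illustration of the rule as documented here rather than the function's actual source.

```r
## Hypothetical peak heights for two samples; sample 2 has an ambiguous
## second combination.
peaks <- data.frame(
  sample.fk = c(1, 1, 1, 2, 2, 2, 2),
  combo     = c("e1/p1", "e1/p1", "e2/p1", "e1/p1", "e2/p1", "e2/p1", "e2/p1"),
  height    = c(900, 100, 500, 800, 400, 300, 50)
)
min.ratio <- 3

## Ratio of tallest to second-tallest peak; a single peak always passes.
peak.ratio <- function(h) {
  h <- sort(h, decreasing = TRUE)
  if (length(h) < 2) Inf else h[1] / h[2]
}
ratios <- aggregate(height ~ sample.fk + combo, data = peaks, FUN = peak.ratio)
names(ratios)[names(ratios) == "height"] <- "ratio"
ratios$ok <- ratios$ratio > min.ratio

## Number of accepted combinations per sample; mimicking min.comb = NA,
## every combination present in the data (here 2) is required.
n.ok <- tapply(ratios$ok, ratios$sample.fk, sum)
min.comb <- length(unique(peaks$combo))
names(n.ok)[n.ok >= min.comb]
```

Only sample 1 survives: sample 2 has two peaks of similar height for one combination, so that combination is rejected and the sample falls short of `min.comb`.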

Value

A new TRAMPknowns object. It will generally be necessary to edit this object; see read.TRAMPknowns for details on how to write, edit, and read back a modified object.

Note

If two peaks have the same height, then using min.ratio=1 will not allow the entry as part of the knowns database; use min.ratio=0 instead if this is desired. In this case, the peak chosen is unspecified.

Note that this function is sensitive to data quality. In particular split peaks may cause a sample not to be added. These samples may be manually added using add.known.

Examples

data(demo.samples)
demo.knowns.auto <- build.knowns(demo.samples, min.comb=4)
plot(demo.knowns.auto, cex=.75)

Combine Two Objects

Description

This function is used to combine TRAMPsamples together, and to combine TRAMPknowns to TRAMPknowns or TRAMP objects. combine is generic; please see combine.TRAMPsamples and combine.TRAMPknowns for more information.

Usage

combine(x, y, ...)

Arguments

`x`, `y`	Objects to be combined. See `combine.TRAMPsamples` and `combine.TRAMPknowns` for more information.
`...`	Additional arguments required by methods.

Combine TRAMPsamples Objects

Description

Combines two TRAMPsamples objects into one large TRAMPsamples object containing all the samples for both original objects.

Usage

## S3 method for class 'TRAMPsamples'
combine(x, y, rewrite.sample.pk=FALSE, ...)

Arguments

`x`, `y`	`TRAMPsamples` objects, containing TRFLP patterns.
`rewrite.sample.pk`	Logical: If the new sample data (`y`) contains `sample.pk` values that conflict with those in the original `TRAMPsamples` object (`x`), should the new samples be renumbered? If this is `TRUE`, do not rely on any `sample.pk` values staying the same for the newly added samples. `sample.pk` values in the original `TRAMPsamples` object will never be changed.
`...`	Further arguments passed to or from other methods.

Details

For a discussion of rewrite.sample.pk, see the comments on rewrite.knowns.pk in the Details of combine.TRAMPknowns.

The data and info elements of the resulting TRAMPsamples object will have the union of the columns present in both sets of samples.
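As a rough illustration of what "union of the columns" means, the base R sketch below stacks two hypothetical info tables that do not share all columns. Filling the missing cells with NA is an assumption made for this sketch; the helper `pad` is not part of the package.

```r
## Two hypothetical info tables with partly different columns.
a <- data.frame(sample.pk = 1:2, species = c("A", NA))
b <- data.frame(sample.pk = 3:4, plot = c(7, 9))

## Add any missing columns (filled with NA, an assumption of this sketch),
## put the columns in a common order, then stack the rows.
all.cols <- union(names(a), names(b))
pad <- function(df) {
  for (nm in setdiff(all.cols, names(df))) df[[nm]] <- NA
  df[all.cols]
}
combined <- rbind(pad(a), pad(b))
combined
```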

If any additional elements exist as part of the second TRAMPsamples object (e.g. passed as ... to TRAMPsamples), these will be ignored with a warning (see Example).

Examples

data(demo.samples)

## Let's split the original samples database in two, and recombine.
demo.samples.a <- demo.samples[head(labels(demo.samples), 10)]
demo.samples.b <- demo.samples[tail(labels(demo.samples), 10)]

## Combining these is easy:
demo.samples.c <- combine.TRAMPsamples(demo.samples.a, demo.samples.b)

## There is a warning message because demo.samples.b contains extra
## elements:
names(demo.samples.b)

## In this case, these objects should not be combined, but in other
## cases it may be necessary to rbind() the extra objects together:
## Not run: 
demo.samples.c$soilcore <- rbind(demo.samples.a$soilcore,
                                 demo.samples.b$soilcore)

## End(Not run)

## This must be done manually, since there is no way of telling what
## should be done automatically.  Ideas/contributions are welcome here.

Calculate Matrix of Distances between Peaks

Description

Generate an array of goodness-of-fit (or distance) between samples and knowns based on the sizes (in base pairs) of TRFLP peaks. For each sample/known combination, and for each enzyme/primer combination, this calculates the minimum distance between any peak in the sample and the single peak in the known.

Usage

create.diffsmatrix(samples, knowns)

Arguments

`samples`	A `TRAMPsamples` object, containing unidentified samples.
`knowns`	A `TRAMPknowns` object, containing identified TRFLP patterns.

Details

This function will rarely need to be called directly, but does most of the calculations behind TRAMP, so it is useful to understand how this works.

This function generates a three-dimensional $s \times k \times n$ matrix of the (smallest, see below) distance in base pairs between peaks in a collection of unknowns (run data) and a database of knowns for several enzyme/primer combinations. $s$ is the number of different samples in the samples data (length(labels(samples))), $k$ is the number of different types in the knowns database (length(labels(knowns))), and $n$ is the number of different enzyme/primer combinations. The enzyme/primer combinations used are all combinations present in the knowns database; combinations present only in the samples will be ignored. Not all samples need contain all enzyme/primer combinations present in the knowns.

In the resulting array, m[i,j,k] is the difference (in base pairs) between the ith sample and the jth known for the kth enzyme/primer combination. The ordering of the $n$ enzyme/primer combinations is arbitrary, so a data.frame of combinations is included as the attribute enzyme.primer, where enzyme.primer$enzyme[k] and enzyme.primer$primer[k] correspond to enzyme and primer used for the distances in m[,,k].

Each case in the knowns database has a single (or no) peak for each enzyme/primer combination, but each sample may contain multiple peaks for an enzyme/primer combination; the difference is always the smallest distance from the sample to the known peak. Where a sample and/or a known lacks an enzyme/primer combination, the value of the difference is NA. The smallest absolute distance is taken between sample and known peaks, but the sign of the difference is preserved (negative where the closest sample peak was less than the known peak, positive where greater; see absolute.min).
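The "smallest absolute distance, sign preserved" rule for one sample/known/combination cell can be sketched as below. The helper name `signed_min_diff` is hypothetical and the code is only an illustration of the rule described above, not the package's absolute.min.

```r
## Signed difference with the smallest absolute value between any sample
## peak and a single known peak (NA when either side has no peak).
signed_min_diff <- function(sample.sizes, known.size) {
  if (length(sample.sizes) == 0 || length(known.size) == 0 || is.na(known.size))
    return(NA_real_)
  d <- sample.sizes - known.size
  d[which.min(abs(d))]
}

signed_min_diff(c(98.2, 251.6, 402.0), 250)  # closest peak is above the known
signed_min_diff(c(98.2, 248.9, 402.0), 250)  # closest peak is below the known
signed_min_diff(numeric(0), 250)             # combination missing in sample
```

The first call gives a positive difference, the second a negative one, and the third NA, matching the three cases in the paragraph above.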

Value

A three-dimensional matrix, with an attribute enzyme.primer, described above.

Examples

data(demo.samples)
data(demo.knowns)

s <- length(labels(demo.samples))
k <- length(labels(demo.knowns))
n <- nrow(unique(demo.knowns$data[c("enzyme", "primer")]))

m <- create.diffsmatrix(demo.samples, demo.knowns)

dim(m)
identical(dim(m), c(s, k, n))

## Maximum error for each sample/known (i.e. across all enzyme/primer
## combinations), similar to how it is calculated by TRAMP
error <- apply(abs(m), 1:2, max, na.rm=TRUE)
dim(error)

## Euclidian error (see ?TRAMP)
error.euclid <- sqrt(rowSums(m^2, TRUE, 2))/rowSums(!is.na(m), dims=2)

## Euclidian and maximum error will require different values of
## accept.error in TRAMP:
plot(error, error.euclid, pch=".")

Demonstration Knowns Database

Description

A knowns database, for demonstrating the TRAMPR package. This is a subset of a full knowns database, is not intended to represent any real data set, and should not be assumed to be accurate.

The data are stored as a TRAMPknowns object. Columns in the info and data components are described on the TRAMPknowns page.

Usage

data(demo.knowns)

Licence

This data set is provided under a Creative Commons “Attribution-NonCommercial-NoDerivs 2.5” licence. Please see https://creativecommons.org/licenses/by-nc-nd/2.5/ for details.

Demonstration Samples Database

Description

A samples database, for demonstrating the TRAMPR package. This is a subset of a full samples database, is not intended to represent any real data set, and should not be assumed to be accurate.

The data are stored as a TRAMPsamples object. Columns in the info and data components are described on the TRAMPsamples page, but with some additions:

info:
- soilcore.fk: Key to the soil core from which a sample came. See soilcore, below.
data:
- sample.file.name: Original .fsa file corresponding to the TRFLP run. This is included in all TRAMPsamples objects created by load.abi.
soilcore: A data.frame with information about the soilcore from which samples came.
- soilcore.pk: Key, distinguishing soil cores.
- plot: Plot number (1 to 10).
- elevation: Height above mean sea level, in metres.
- east: Easting (New Zealand Map Grid/NZMG).
- north: Northing (NZMG).
- vegetation: Vegetation type (Nothofagus solandri or Pinus contorta).

Usage

data(demo.samples)

Format

A TRAMPsamples object.

Licence

This data set is provided under a Creative Commons “Attribution-NonCommercial-NoDerivs 2.5” licence. Please see https://creativecommons.org/licenses/by-nc-nd/2.5/ for details.

Knowns Clustering

Description

Group a TRAMPknowns object so that knowns with similar TRFLP patterns and knowns that share the same species name “group” together. In general, this function will be called automatically whenever appropriate (e.g. when loading a data set or adding new knowns). Please see Details to understand why this function is necessary, and how it works.

The main reason for manually calling group.knowns is to change the default values of the arguments; if you call group.knowns on a TRAMPknowns object, then any subsequent automatic call to group.knowns will use any arguments you passed in the manual group.knowns call (e.g. after doing group.knowns(x, cut.height=20), all future groupings will use cut.height=20).

Usage

group.knowns(x, ...)
## S3 method for class 'TRAMPknowns'
group.knowns(x, dist.method, hclust.method, cut.height, ...)
## S3 method for class 'TRAMP'
group.knowns(x, ...)

Arguments

`x`	A `TRAMPknowns` or `TRAMP` object, containing identified TRFLP patterns.
`dist.method`	Distance method used in calculating similarity between different knowns (see `dist`). Valid options include `"maximum"`, `"euclidian"` and `"manhattan"`.
`hclust.method`	Clustering method used in generating clusters from the similarity matrix (see `hclust`).
`cut.height`	Passed to `cutree`; controls how similar members of each group should be (the larger `cut.height`, the more inclusive knowns groups will be).
`...`	Arguments passed to further methods.

Details

group.knowns groups together knowns in a TRAMPknowns object based on two criteria: (1) TRFLP profiles that are very similar across shared enzyme/primer combinations (based on clustering) and (2) TRFLP profiles that belong to the same species (i.e. share a common species column in the info data.frame of x; see TRAMPknowns for more information). This is to solve three issues in TRFLP analysis:

1. The TRFLP profile of a single species can have variation in peak sizes due to DNA sequence variation. By including multiple collections of each species, variation in TRFLP profiles can be accounted for. If a TRAMPknowns object contains multiple collections of a species, these will be aggregated by group.knowns. This aggregation is essential for community analysis, as leaving individual collections will artificially inflate the number of “present species” when running TRAMP.

   Some authors have taken an alternative approach by using a larger tolerance in matching peaks between samples and knowns (effectively increasing accept.error in TRAMP) to account for within-species variation. This is not recommended, as it dramatically increases the risk of incorrect matches.
2. Distinctly different TRFLP profiles may occur within a species (or in some cases within an individual); see Avis et al. (2006). group.knowns looks at the species column of the info data.frame of x and joins any knowns with identical species values as a group. This can also be used where multiple profiles are present in an individual.
3. Different species may share a similar TRFLP profile and therefore be indistinguishable using TRFLP. If these patterns are not grouped, two species will be recorded as present wherever either is present. group.knowns prevents this by joining knowns with “very similar” TRFLP patterns as a group. Ideally, these problematic groups can be resolved by increasing the number of enzyme/primer pairs in the data.

Group names are generated by concatenating all unique (sorted) species names together, separated by commas.

To determine if knowns are “similar enough” to form a group, we use R's clustering tools: dist, hclust and cutree. First, we generate a distance matrix of the knowns profiles using dist, and using method dist.method (see Example below; this is very similar to what TRAMP does, and dist.method should be specified accordingly). We then generate clusters using hclust, and using method hclust.method, and “cut” the tree at cut.height using cutree.
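The dist/hclust/cutree pipeline can be tried on a small hypothetical matrix of peak sizes (knowns in rows, enzyme/primer combinations in columns). The values, the combination labels and the choice of `"complete"` linkage are assumptions made for this sketch; they are not taken from the package defaults.

```r
## Hypothetical knowns-by-combination matrix of peak sizes (base pairs).
profiles <- rbind(
  k1 = c(120.0, 305.0, 410.0),
  k2 = c(121.0, 304.5, 411.0),
  k3 = c(250.0, 180.0, 90.0),
  k4 = c(251.5, 181.0, 92.0)
)
colnames(profiles) <- c("e1/p1", "e2/p1", "e3/p1")

d <- dist(profiles, method = "maximum")   # largest per-combination difference
tree <- hclust(d, method = "complete")    # linkage method assumed here
groups <- cutree(tree, h = 2.5)           # plays the role of cut.height
groups

## A smaller cut height is less inclusive: k3 and k4 no longer group.
cutree(tree, h = 1.5)
```

With a cut height of 2.5, k1 and k2 fall in one group and k3 and k4 in another; lowering it to 1.5 splits k3 from k4 because they differ by 2 bp in one combination.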

Knowns are grouped together iteratively, so that all groups sharing a common cluster are grouped together, and all knowns that share a common species name are grouped together. In certain cases this may chain together seemingly unrelated groups.

Because group.knowns is generic, it can be run on either a TRAMPknowns or a TRAMP object. When run on a TRAMP object, it updates the TRAMPknowns object (stored as x$knowns), so that subsequent calls to plot.TRAMPknowns or summary.TRAMPknowns (for example) will use the new grouping parameters.

Parameters set by group.knowns are retained as part of the object, so that when adding additional knowns (add.known and combine), or when subsetting a knowns database (see [.TRAMPknowns, aka TRAMPindexing), the same grouping parameters will be used.

Value

For group.knowns.TRAMPknowns, a new TRAMPknowns object. The cluster.pars element will have been updated with new parameters, if any were specified.

For group.knowns.TRAMP, a new TRAMP object, with an updated knowns element. Note that the original TRAMPknowns object (i.e. the one from which the TRAMP object was constructed) will not be modified.

Warning

Warning about missing data: where there are NA values in certain combinations, NAs may be present in the final distance matrix, which means we cannot use hclust to generate the clusters! In general, NA values are fine. They just can't be everywhere.

References

Avis PG, Dickie IA, Mueller GM 2006. A ‘dirty’ business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Molecular Ecology 15: 873-882.

Examples

data(demo.knowns)
data(demo.samples)

demo.knowns <- group.knowns(demo.knowns, cut.height=2.5)
plot(demo.knowns)

## Increasing cut.height makes groups more inclusive:
plot(group.knowns(demo.knowns, cut.height=100))

res <- TRAMP(demo.samples, demo.knowns)
m1.ungrouped <- summary(res)
m1.grouped <- summary(res, group=TRUE)
ncol(m1.grouped) # 94 groups

res2 <- group.knowns(res, cut.height=100)
m2.ungrouped <- summary(res2)
m2.grouped <- summary(res2, group=TRUE)
ncol(m2.grouped) # Now only 38 groups

## group.knowns results in the same distance matrix as produced by
## TRAMP, therefore using the same method (e.g. method="maximum") is
## important.  The example below shows how the matrix produced by
## dist(summary(x)) (as calculated by group.knowns) is the same as that
## produced by TRAMP:
f <- function(x, method="maximum") {
  ## Create a pseudo-samples object from our knowns
  y <- x
  y$data$height <- 1
  names(y$info)[names(y$info) == "knowns.pk"] <- "sample.pk"
  names(y$data)[names(y$data) == "knowns.fk"] <- "sample.fk"
  class(y) <- "TRAMPsamples"

  ## Run TRAMP, clean up and return
  ## (If method != "maximum", rescale the error to match that
  ## generated by dist()).
  z <- TRAMP(y, x, method=method)
  if ( method != "maximum" ) z$error <- z$error * z$n
  names(dimnames(z$error)) <- NULL
  z
}

g <- function(x, method="maximum")
  as.matrix(dist(summary(x), method=method))

all.equal(f(demo.knowns, "maximum")$error,   g(demo.knowns, "maximum"))
all.equal(f(demo.knowns, "euclidian")$error, g(demo.knowns, "euclidian"))
all.equal(f(demo.knowns, "manhattan")$error, g(demo.knowns, "manhattan"))

## However, TRAMP is over 100 times slower in this special case.
system.time(f(demo.knowns))
system.time(g(demo.knowns))

Load ABI Output Files

Description

These functions help convert data from Applied Biosystems Gene Mapper (ABI) output format into TRAMPsamples objects for analysis. Note that this operates on the summarised output (a text file), rather than the .fsa files containing data for individual runs.

Details of the procedure of this function are given below, and a worked example is given in the package vignette; type vignette("TRAMPRdemo") to view it.

The function peakscanner.to.genemapper is an experimental function to convert from peakscanner output to ABI genemapper output. The peakscanner output is very slightly different in format, and currently load.abi is very fussy about the input file's structure. Eventually load.abi will be made more tolerant, but as an interim solution, run peakscanner.to.genemapper on your file. By default, running peakscanner.to.genemapper(myfile.csv) will produce a file myfile.txt. This can then be loaded using load.abi as described below, specifying myfile.txt as the file argument.

Usage

load.abi(file, file.template, file.info, primer.translate, ...)
load.abi.create.template(file, file.template)
load.abi.create.info(file, file.template, file.info)

peakscanner.to.genemapper(filename, output)

Arguments

`file`	The name of the file from which the ABI data are to be read.
`file.template`	The name of the file containing the “template” file (see Details).
`file.info`	(Optional) the name of the file containing extra information associated with each sample (see Details).
`primer.translate`	List used to translate dye codes into primers. The same codes are assumed to apply across the whole file. See Details for format.
`...`	Additional objects to incorporate into a `TRAMPsamples` object. See `TRAMPsamples` for details.
`filename`	In `peakscanner.to.genemapper`, the name of the csv file containing output.
`output`	In `peakscanner.to.genemapper`, the name of the file to be output in abi format (if omitted, this will be automatically generated).

Details

Some terminology: a “sample” refers to a physical sample (e.g. a root tip), while a “run” refers to an individual TRFLP run (i.e. one enzyme and one primer). Because two primers are run at once, each “runfile” contains information on two “runs”, but each “sample” may contain more than one “runfile”. Runfiles are distinguished by different sample.file.name values in the ABI file, while different samples are distinguished by different sample.fk/sample.pk values.

primer.translate is a list used to translate between the dyes recorded in the ABI file and the primers used. Each element corresponds to a different primer, and is a vector of different colour dyes. The list:

list(ITS1F="B", ITS4="G")

would translate all dyes with the value "B" to "ITS1F", and all dyes with the value "G" to "ITS4". The list:

list(ITS1F="B", ITS4=c("G", "Y"))

would do the same, except that both "G" and "Y" dyes would be converted to "ITS4". If a dye is used in the data that is not represented within primer.translate, then it will be excluded (e.g., all rows of data with dye as "R" will be excluded).
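
The translation rule can be illustrated with a small base R sketch. This is not the package's implementation: the helper name translate.dyes is hypothetical and only mimics the behaviour described above.

```r
## Hypothetical helper (not part of TRAMPR): map dye codes to primer
## names using a list in the primer.translate format.
translate.dyes <- function(dye, primer.translate) {
  ## Build a lookup vector: names are dye codes, values are primers
  lookup <- rep(names(primer.translate), lengths(primer.translate))
  names(lookup) <- unlist(primer.translate, use.names=FALSE)
  ## Dyes that are absent from the lookup become NA
  unname(lookup[dye])
}

primer.translate <- list(ITS1F="B", ITS4=c("G", "Y"))
dye <- c("B", "G", "Y", "R")
primer <- translate.dyes(dye, primer.translate)
primer               # "ITS1F" "ITS4" "ITS4" NA

## Rows with an untranslated dye (here "R") would be excluded:
dye[!is.na(primer)]  # "B" "G" "Y"
```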

The procedure for loading in ABI data is:

Create the “template” file. Template files are required to record which enzymes were used for each run, since that is not included in the ABI output, and to group together separate runs (typically different enzymes) that apply to the same individual. The function load.abi.create.template will create a template that contains all the unique file names found in the ABI file (as sample.file.name), and blank columns titled enzyme and sample.index. Running

load.abi.create.template(x)

where x is the name of your ABI file will create a template file in the same directory as the ABI file. The function will print the name and location of the template file to the console.
Edit the template file and save. The enzyme and sample.index columns are initially empty and need filling in, which can be done in Excel, or another spreadsheet program. The sample.index column links sample.file.name back to an individual sample; multiple sample.file.names that share sample.index values come from the same individual sample. (If editing with Excel, ignore all the warnings about incompatible file formats when saving.) sample.index should be a positive integer (but see Note below).
Optionally create an “info” file, which is useful if you want to associate extra information against your samples. The function load.abi.create.info will create an info file that contains all the unique values of sample.index, and an empty column titled species. The species column can be filled in where the species is known (e.g. from collections of sporocarps). Any additional columns may be added. Running

load.abi.create.info(x)

where x is the name of your ABI file will create an info file in the same directory as the ABI file. The function will print the name and location of the info file to the console. Edit and save this file.
Create the TRAMPsamples object by running load.abi. This loads your ABI data, plus the new template file, plus an optional information file. Running

my.samples <- load.abi(x, primer.translate=primer.translate)

will create an object “my.samples” containing your data.

By default, the filenames of the template and info files will be automatically generated: <prefix>.<ext> becomes <prefix>_template.csv or <prefix>_info.csv. If you choose to specify file.template or file.info manually when running load.abi.create.template or load.abi.create.info, you must use the same values of file.template and file.info when running load.abi.
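
As a rough illustration of the default naming rule, the following base R sketch derives both file names from an ABI file name. The helper default.filename is hypothetical and is not a function provided by the package.

```r
## Hypothetical helper (not part of TRAMPR): derive the default
## template/info file names from an ABI file name, following the
## <prefix>.<ext> -> <prefix>_template.csv rule described above.
default.filename <- function(file, type=c("template", "info")) {
  type <- match.arg(type)
  prefix <- tools::file_path_sans_ext(file)  # strip the final extension
  paste0(prefix, "_", type, ".csv")
}

default.filename("data/myrun.txt", "template")  # "data/myrun_template.csv"
default.filename("data/myrun.txt", "info")      # "data/myrun_info.csv"
```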

Warning

Do not change the names of any columns produced by load.abi.create.template or load.abi.create.info.

Note

There is no reason that data from other types of output files could not be manually imported using TRAMPsamples. We welcome contributions for other major data formats.

When creating sample.index values, these should be positive integers. If you enter strings (e.g. a1, b1), these will be automatically converted into integers. Once loaded, sample.pk/sample.fk is always a positive integer key, but sample.index will be retained as your string keys.
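
The sketch below shows one way string keys could be mapped to positive integer keys while the original strings are kept. It is an assumption for illustration only; the mapping that load.abi actually applies may differ.

```r
## Illustration only: map string sample.index values to positive
## integer keys (in order of first appearance), keeping the strings.
sample.index <- c("a1", "a1", "b1", "c1", "b1")
sample.pk <- match(sample.index, unique(sample.index))
data.frame(sample.index, sample.pk)
```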

Plot a TRAMP Object

Description

Creates a graphical representation of matches performed by TRAMP. The plot displays (1) “matches”, showing how samples match the knowns and (2) “peak profiles”, showing the locations of peaks for individual enzyme/primer combinations.

Usage

## S3 method for class 'TRAMP'
plot(x, sample.fk, ...)
TRAMP.plotone(x, sample.fk, grouped=FALSE, ignore=FALSE,
              all.knowns=TRUE, all.samples=FALSE,
              all.samples.global=FALSE, col=1:10,
              pch=if (grouped) 15 else 16, xmax=NULL, horiz.lines=TRUE,
              mar.default=.5, p.top=.5, p.labels=1/3, cex.axis=NULL,
              cex.axis.max=1)

Arguments

`x`	A `TRAMP` object.
`sample.fk`	The `sample.fk` to plot. If omitted, then all samples are plotted, one after the other (this is useful for generating a summary of all fits for printing out: see Example).
`grouped`	Logical: Should the matched knowns be grouped?
`ignore`	Logical: Should matches marked as ignored by `remove.TRAMP.match` be excluded?
`all.knowns`, `all.samples`, `all.samples.global`	Controls which enzyme/primer combinations are displayed (see Details)
`col`	Vector of colours to plot the different enzyme/primer combinations. There must be at least as many colours as there are different combinations.
`pch`	Plotting symbol to use (see `points` for possible values and their interpretation). By default, this will use filled circles when ungrouped and filled squares when grouped.
`xmax`	Maximum size (in base pairs) for the plots to cover. `NULL` (the default) uses the range of all data found in the `TRAMPsamples` object (rounded up to the nearest 100). `NA` will use the range of all data in the current sample.
`horiz.lines`	Logical: Should horizontal grid lines be used for each matched known?

The following arguments control the layout and margins of the plot:

`mar.default`	Margin size (in lines of text) to surround the plot.
`p.top`	Proportion of the plotting area to be used for the “matches”. The “peak profiles” will share the bottom `1-p.top` of the plot.
`p.labels`	Proportion of the plotting area to be used for labels to the left of the plots. `1-p.labels` will be used for the plots (try increasing this if you have very long species or group names).
`cex.axis`	Size of the text used for axes. If `NULL` (the default), then the largest cex that will exactly fit labels is chosen (up to `cex.axis.max`).
`cex.axis.max`	Maximum size of the text used for axes, if automatically determining the label size (i.e. `cex.axis` is `NULL`).
`...`	Additional arguments passed to `TRAMP.plotone`.

Details

This constructs a plot of a TRAMP fit, illustrating where knowns match the sample data, and which sample peaks remain unmatched.

The top portion of the plot displays “matches”, showing how samples match the knowns. Individual species (or groups if grouped is TRUE) are represented by different horizontal lines. Where the sample matches a particular known, a symbol is drawn (Beware: it may look like only one symbol is drawn when several symbols are plotted on top of one another).

The bottom portion of the plot displays the “peak profile” of the sample, showing the locations and heights of peaks for various enzyme/primer combinations (the exact combination depends on the values of all.knowns, all.samples and all.samples.global; see below). The height is arbitrary, so units are omitted.

The arguments all.knowns, all.samples and all.samples.global control which enzyme/primer combinations are displayed in the plot. all.knowns=TRUE displays all combinations present in the knowns database, and all.samples=TRUE displays all combinations present in the samples: when all.samples.global=TRUE these are the combinations across the entire samples data set; otherwise they are the combinations present in the current sample only. At least one of all.knowns and all.samples must be TRUE.

Note

While TRAMP.plotone does the actual plot, it should not be called directly; please use plot(x, sample.fk, ...).

Examples

data(demo.samples)
data(demo.knowns)
res <- TRAMP(demo.samples, demo.knowns)

plot(res, 101)
plot(res, 110)
plot(res, 117)

plot(res, 117, grouped=TRUE)

## Not run: 
# Create a PDF file with all matches:
pdf("all_matches.pdf")
plot(res)
dev.off()

## End(Not run)

Summary Plot of Knowns Data

Description

Creates a plot showing the clustering and profiles of a TRAMPknowns object (a “knowns database”). The plot has three vertical panels:

The leftmost contains a dendrogram, showing how similar the profiles of knowns are (see group.knowns for details).
The rightmost displays the TRFLP profile for each individual (with a different colour symbol for each different enzyme/primer combination).
The middle panel displays information on the species names and groups of the knowns.

Usage

## S3 method for class 'TRAMPknowns'
plot(x, cex=1, name="species", pch=1, peaks.col, p=.02,
     group.clusters=TRUE, groups.col=1:4, grid.by=5, grid.col="gray",
     widths=c(1, 2, 1), ...)

Arguments

`x`	A `TRAMPknowns` object.
`cex`	Character size for the plot. Because knowns databases can be large, this should be small and may need to be adjusted. Most aspects of the plot will scale with this.
`name`	Column name to use when generating species names; must be one of `species` or `group.name`.
`pch`	Plotting symbol to use for peaks in the peak profiles.
`peaks.col`	Vector of colours to plot the different enzymes in the peak profiles. These will be used in the order of the columns of `summary(x)`.
`p`	Scaling factor for the middle plot; this specifies the proportion of the width that elements are spaced horizontally from one another. Columns of text are `p` apart, brackets grouping knowns are `p/2` apart, and cluster groups (if present) are `p*2/3` apart.
`group.clusters`	Logical: Should groups of clusters (determined by `group.strict` - see `group.knowns`) be joined together?
`groups.col`	Vector of colours to plot different group clusters in. This will be recycled as necessary.
`grid.by`	Interval between horizontal grid lines. Grid lines start at `ceiling(grid.by/2)` from the bottom of the plot. A value of `NA` suppresses grid lines.
`grid.col`	Colour of the horizontal grid lines.
`widths`	Relative widths of the three panels of the plot (see `layout`). `widths` must be a vector of 3 elements, corresponding to the three panels from left to right.
`...`	Additional arguments (ignored).

Note

In general, there will probably be too many knowns to make a legible plot when displayed on the screen. We recommend creating a PDF of the plot and viewing that instead (see Example).

When plotted on the interactive plotting device, if the plot is resized, the plot is likely to look strange.

Examples

data(demo.knowns)
plot(demo.knowns)

## Not run: 
pdf("knowns_summary.pdf", paper="default", width=8, height=11)
plot(demo.knowns)
plot(demo.knowns, group.clusters=FALSE)
dev.off()

## End(Not run)

Plot a TRAMPsamples Object

Description

Shows the peak profiles of samples in a TRAMPsamples object, showing the locations and heights of peaks for individual enzyme/primer combinations. This is the same information that is displayed in the bottom portion of a plot.TRAMP plot, but may be useful where a TRAMP fit has not been performed yet (e.g. before a knowns database has been constructed).

Usage

## S3 method for class 'TRAMPsamples'
plot(x, sample.fk, ...)
TRAMPsamples.plotone(x, sample.fk, all.samples.global=FALSE, col=1:10,
                     xmax=NULL, mar.default=.5, mar.labels=8, cex=1)

Arguments

`x`	A `TRAMPsamples` object, containing profiles to plot.
`sample.fk`	The `sample.fk` to plot. If omitted, then all samples are plotted, one after the other (this is useful for generating a summary of all fits for printing out: see Example).
`all.samples.global`	Logical: Should plots be set up for all enzyme/primer combinations present in `x`, even if the combinations are not present for all individual cases? Analogous to the same argument in `plot.TRAMP`. (This is useful for keeping combinations in the same place, and plotted with the same colours.)
`col`	Vector of colours to plot the different enzyme/primer combinations. There must be at least as many colours as there are different combinations.
`xmax`	Maximum size (in base pairs) for the plots to cover. `NULL` (the default) uses the range of all data found in the `TRAMPsamples` object (rounded up to the nearest 100). `NA` will use the range of all data in the current sample.
`mar.default`	Margin size (in lines of text) to surround the plot.
`mar.labels`	Number of lines of text to be used for labels to the left of the plots. Increase this if labels are being truncated.
`cex`	Scaling factor for text.
`...`	Additional arguments (ignored).

Examples

data(demo.samples)

plot(demo.samples, 101)
plot(demo.samples, 117)

## Not run: 
# Create a PDF file with all profiles:
pdf("all_profiles.pdf")
plot(demo.samples)
dev.off()

## End(Not run)

Read ABI Output Files

Description

Read an Applied Biosystems Gene Mapper (ABI) output file, and prepare for analysis.

Note that this operates on the summarised output (a text file), rather than the .fsa files containing data for individual runs.

Usage

read.abi(file)

Arguments

`file`	The name of the file from which the data are to be read.

Details

The ABI file format contains a few features that make it difficult to interact with directly, so read.abi provides a wrapper around read.table to work around these. The three issues are (1) trailing tab characters, (2) mixed case and punctuation in column names, and (3) parsing the “Dye/Sample Peak” column.

Because each line of an ABI file contains a trailing tab character (\t), read.table fails to read the file correctly. read.abi renames all columns so that non-alphanumeric characters all become periods, and all uppercase letters are converted to lower case.

The column Dye/Sample Peak contains data of the form <Dye>,<Sample Peak>, where <Dye> is a code for the dye colour used and <Sample Peak> is an integer indicating the order of the peaks. Entries where the contents of Dye/Sample Peak terminates in a "*" character (indicating an internal size standard) are automatically excluded from the analysis.

The final column names are:

sample.file.name: Name of the file containing data.
size: Size of the peak (in base pairs).
height: Height of the peak (arbitrary units).
dye: Code for dye used.
sample.peak: Rank of peak within current sample.

In addition, other column names may be retained from ABI output, but not used.
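
The column renaming and the parsing of the Dye/Sample Peak column can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code used by read.abi: the column names and peak entries are made up, and the variable names are hypothetical.

```r
## Illustration only (not read.abi itself): tidy ABI-style column
## names and parse a "Dye/Sample Peak" column. All values are made up.
raw.names <- c("Dye/Sample Peak", "Sample File Name", "Size", "Height")
## Non-alphanumeric characters become periods; letters become lower case
clean.names <- tolower(gsub("[^[:alnum:]]", ".", raw.names))
clean.names

dye.sample.peak <- c("B,1", "B,2", "G,1", "R,3*")
## Drop entries ending in "*" (internal size standard)
keep <- !grepl("\\*$", dye.sample.peak)
## Split the remaining entries into <Dye> and <Sample Peak>
parts <- strsplit(dye.sample.peak[keep], ",", fixed=TRUE)
parsed <- data.frame(dye=sapply(parts, `[`, 1),
                     sample.peak=as.integer(sapply(parts, `[`, 2)),
                     stringsAsFactors=FALSE)
parsed
```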

Note

There is no reason that data from other types of output files could not be manually imported using TRAMPsamples. We welcome contributions for other major data formats.

Read/Write TRAMPknowns and TRAMPsamples Objects

Description

Saves and loads TRAMPknowns and TRAMPsamples objects as a series of “csv” (comma separated value) files for external editing.

If you do not want to edit your data, then saving with save is preferable; it is faster, creates smaller files, and will save any additional components in the objects (see Examples).

Usage

read.TRAMPknowns(file.pat, auto.save=TRUE, overwrite=FALSE)
write.TRAMPknowns(x, file.pat=x$file.pat, warn=TRUE)

read.TRAMPsamples(file.pat)
write.TRAMPsamples(x, file.pat)

Arguments

`x`	A `TRAMPknowns` or `TRAMPsamples` object.
`file.pat`	Pattern, with the filename prefix: “info” and “data” objects will be read/written as `<file.pat>_info.csv` and `<file.pat>_data.csv`, respectively.
`auto.save`	Logical: Should the `TRAMPknowns` object be automatically saved back to the loaded filename as it is modified (e.g. when knowns are added to the database)? If this is `TRUE`, the original files will be backed up as `<file.pat>_(info|data)_<YYYYMMDD>.csv`, where `<YYYYMMDD>` is the ISO date.
`overwrite`	Should previous backup files be overwritten when creating new backups?
`warn`	Should the function warn when no filename is given? (Because this function is called automatically when adding new knowns, and because `TRAMPknowns` objects need not contain a `file.pat` element, it may not be possible or necessary to save).

Details

file.pat may contain a path. It is best to use forward slashes as directory separators (path/to/file), but on Windows (only), double backslashes will also work (path\\to\\file).

Paths may be either relative (e.g. path/to/file), or absolute (e.g. /path/to/file, or x:/path/to/file on Windows).
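
To make the naming scheme concrete, the sketch below builds the file names implied by a given file.pat, including a dated backup name. The path and date are made-up examples, and the code only reproduces the documented patterns; it does not call any package function.

```r
## Illustration only: file names implied by a given file.pat.
file.pat <- "path/to/my_knowns"
info.file <- paste0(file.pat, "_info.csv")
data.file <- paste0(file.pat, "_data.csv")

## Backup name for the info file, with a date stamp of the form YYYYMMDD
stamp <- format(as.Date("2025-02-13"), "%Y%m%d")
backup.file <- paste0(file.pat, "_info_", stamp, ".csv")

c(info.file, data.file, backup.file)
```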

Examples

## Not run: 
# Preferred way of saving/loading objects, if editing is not required:
save(demo.knowns, file="my_knowns.Rdata")

# (possibly in a different session, but _after_ loading TRAMPR)
load("my_knowns.Rdata") # -> creates 'demo.knowns' in global environment

## End(Not run)

Rebuild a TRAMP Object

Description

This function rebuilds a TRAMP object. Typically this will be called automatically after adding knowns (see add.known); there should be little need to call this manually. The same parameters that were used in the original call to TRAMP are used again, and these cannot currently be modified during this call.

Usage

rebuild.TRAMP(x)

Arguments

`x`	A `TRAMP` object.

Value

A new TRAMP object, with all components recalculated.

Mark a TRAMP Match as Ignored

Description

Mark a match in a TRAMP object as ignored; when this is set, a match will be ignored when producing presence/absence matrices (see summary.TRAMP) or when plotting (plot.TRAMP) when ignore is TRUE. update.TRAMP provides an interactive interface for doing this, but remove.TRAMP.match may be useful directly.

Usage

remove.TRAMP.match(x, sample.fk, knowns.fk)

Arguments

`x`	A `TRAMP` object.
`sample.fk`, `knowns.fk`	Key of sample and known, respectively. See `TRAMPsamples` and `TRAMPknowns` for more information.

Value

A modified TRAMP object.

Warning

This should be regarded as experimental. There is currently no mechanism for restoring ignored matches, aside from recreating the TRAMP object, or through editing x$presence.ign directly (the format of that table is self-explanatory, but is not guaranteed not to change between TRAMP versions). Note that by default, summary.TRAMP and plot.TRAMP will not remove matches; you must specify ignore=TRUE to enable this.

Note

This function returns a modified object - the TRAMP object is not modified in place. You must do:

x <- remove.TRAMP.match(x, sample.fk, knowns.fk)

to mark a match as ignored in the object x.

Create Presence/Absence Matrices from TRAMP Objects

Description

Generate a summary of a TRAMP object, by producing a presence/absence matrix. This is the preferred way of extracting the presence/absence matrix from a TRAMP object, and allows for grouping, naming knowns, and ignoring matches (specified by remove.TRAMP.match).

Usage

## S3 method for class 'TRAMP'
summary(object, name=FALSE, grouped=FALSE, ignore=FALSE, ...)

Arguments

`object`	A `TRAMP` object.
`name`	Logical: Should the knowns be named?
`grouped`	Logical: Should the knowns be grouped?
`ignore`	Logical: Should matches marked as ignored be excluded?
`...`	Further arguments passed to or from other methods.

Value

A presence/absence matrix, with samples as rows and knowns as columns. If name is TRUE, then names of knowns (or groups of knowns) are used, otherwise the knowns.fk is used (group.strict if grouped). If grouped is TRUE, then the knowns are collapsed by group (using group.strict; see group.knowns). A group is present if any of the knowns belonging to it are present. If ignore is TRUE, then any matches marked by remove.TRAMP.match are excluded.
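
The grouping rule (a group is present if any of its knowns are present) can be sketched in base R. This is only an illustration with made-up data and hypothetical object names; it is not the code used by summary.TRAMP.

```r
## Illustration only: collapse a presence/absence matrix by group.
## Rows are samples, columns are knowns; 'group' assigns each known
## (column) to a group. All values here are made up.
presence <- matrix(c(TRUE,  FALSE, FALSE,
                     FALSE, FALSE, TRUE,
                     FALSE, TRUE,  FALSE),
                   nrow=3, byrow=TRUE,
                   dimnames=list(c("101", "102", "103"),
                                 c("k1", "k2", "k3")))
group <- c(k1="A", k2="A", k3="B")

## A group is present in a sample if any of its knowns is present
grouped <- sapply(split(colnames(presence), group[colnames(presence)]),
                  function(cols) rowSums(presence[, cols, drop=FALSE]) > 0)
grouped
```

Here group A is recorded as present in samples 101 and 103, and group B in sample 102.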

Examples

data(demo.knowns)
data(demo.samples)
res <- TRAMP(demo.samples, demo.knowns)

head(summary(res))
head(summary(res, name=TRUE))
head(summary(res, name=TRUE, grouped=TRUE))

## Extract the species richness for each sample (i.e. the number of
## knowns present in each sample)
rowSums(summary(res, grouped=TRUE))

## Extract species frequencies and plot a rank abundance diagram:
## (i.e. the number of times each known was recorded)
sp.freq <- colSums(summary(res, name=TRUE, grouped=TRUE))

sp.freq <- sort(sp.freq[sp.freq > 0], decreasing=TRUE)
plot(sp.freq, xlab="Species rank", ylab="Species frequency", log="y")
text(1:2, sp.freq[1:2], names(sp.freq[1:2]), cex=.7, pos=4, font=3)

TRFLP Analysis and Matching Program

Description

Determine if TRFLP profiles may match those in a database of knowns. The resulting object can be used to produce a presence/absence matrix of known profiles in environmental samples.

The TRAMPR package contains a vignette, which includes a worked example; type vignette("TRAMPRdemo") to view it.

Usage

TRAMP(samples, knowns, accept.error=1.5, min.comb=4, method="maximum")

Arguments

`samples`	A `TRAMPsamples` object, containing unidentified samples.
`knowns`	A `TRAMPknowns` object, containing identified TRFLP patterns.
`accept.error`	The largest acceptable difference (in base pairs) between any peak in the sample data and the knowns database (see Details; interpretation will depend on the value of `method`).
`min.comb`	Minimum number of enzyme/primer combinations required before presence will be tested. The default (4) should be reasonable in most cases. Setting `min.comb` to `NA` will require that all enzyme/primer combinations in the knowns database are present in the samples.
`method`	Method used in calculating the difference between samples and knowns; may be one of `"maximum"`, `"euclidian"` or `"manhattan"` (or any unambiguous abbreviation).

Details

TRAMP attempts to determine which species in the ‘knowns’ database may be present in a collection of samples.

A sample matches a known if it has a peak that is “close enough” to every peak in the known for every enzyme/primer combination that they share. The default is to accept matches where the largest distance between a peak in the knowns database and the sample is less than accept.error base pairs (default 1.5), and where at least min.comb enzyme/primer combinations are shared between a sample and a known (default 4).

The three-dimensional matrix of match errors is generated by create.diffsmatrix. In the resulting array, m[i,j,k] is the difference (in base pairs) between the ith sample and the jth known for the kth enzyme/primer combination.

If $p_k$ and $q_k$ are the sizes of peaks for the $k$th enzyme/primer combination for a sample and known (respectively), then maximum distance is defined as

$\max(|p_k - q_k|)$

Euclidian distance is defined as

$\frac{1}{n}\sqrt{\sum (p_k - q_k)^2}$

and Manhattan distance is defined as

$\frac{1}{n}\sum{|p_k - q_k|}$

where $n$ is the number of shared enzyme/primer combinations, since this may vary across sample/known combinations. For Euclidian and Manhattan distances, accept.error then becomes the mean distance, rather than the total distance.
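
The three measures can be computed by hand for a single sample/known pair, as in the base R sketch below. It is a simplified illustration, not the package's code: the peak sizes are made up, p is assumed to hold the sample peak nearest to the known's peak for each combination, and the acceptance test is assumed to be a strict "less than" comparison.

```r
## Illustration only: the three distance measures for one sample/known
## pair. 'p' and 'q' hold peak sizes (in base pairs) for each
## enzyme/primer combination; NA marks a combination that is not shared.
p <- c(100.5, 250.0, 310.2, 480.0, NA)
q <- c(101.0, 251.0, 310.0, 478.0, 95.0)

shared <- !is.na(p) & !is.na(q)
d <- p[shared] - q[shared]
n <- sum(shared)  # number of shared enzyme/primer combinations

err <- c(maximum   = max(abs(d)),
         euclidian = sqrt(sum(d^2)) / n,
         manhattan = sum(abs(d)) / n)
err

## A match requires enough shared combinations and a small enough error
accept.error <- 1.5
min.comb <- 4
is.match <- n >= min.comb & err < accept.error
is.match
```

With these values the pair fails under the maximum distance (one peak differs by 2 base pairs) but passes under the Euclidian and Manhattan distances, which shows why the interpretation of accept.error depends on method.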

Value

A TRAMP object, with elements:

`presence`	Presence/absence matrix. Rows are different samples (with rownames from `labels(samples)`) and columns are different knowns (with colnames from `labels(knowns)`). Do not access the presence/absence matrix directly, but use `summary.TRAMP`, which provides options for labelling knowns, grouping knowns, and excluding “ignored” matches.
`error`	Matrix of distances between the samples and knowns, calculated by one of the methods described above. Rows correspond to different samples, and columns correspond to different knowns. The matrix dimension names are set to the values `sample.pk` and `knowns.pk` for the samples and knowns, respectively.
`n`	A two-dimensional matrix (same dimensions as `error`), recording the number of enzyme/primer combinations present for each combination of samples and knowns.
`diffsmatrix`	Three-dimensional array of output from `create.diffsmatrix`.
`enzyme.primer`	Different enzyme/primer combinations present in the data, in the order of the third dimension of `diffsmatrix` (see `create.diffsmatrix` for details).
`samples`, `knowns`, `accept.error`, `min.comb`, `method`	The input data objects and arguments, unmodified.

In addition, an element presence.ign is included to allow matches to be ignored. However, this interface is experimental and its current format should not be relied on - use remove.TRAMP.match rather than interacting directly with presence.ign.

Matching is based only on peak size (in base pairs), and does not consider peak heights.

Examples

data(demo.knowns)
data(demo.samples)

res <- TRAMP(demo.samples, demo.knowns)

## The resulting object can be interrogated with methods:

## The goodness of fit of the sample with sample.pk=101 (see
## ?plot.TRAMP).
plot(res, 101)

## Not run: 
## To see all plots (this produces many figures), one after another.
op <- par(ask=TRUE)
plot(res)
par(op)

## End(Not run)

## Produce a presence/absence matrix (see ?summary.TRAMP).
m <- summary(res)
head(m)

Index (Subset) TRAMPsamples and TRAMPknowns Objects

Description

This provides very basic support for subsetting TRAMPsamples and TRAMPknowns objects.

Usage

## S3 method for class 'TRAMPknowns'
x[i, na.interp=TRUE, ...]
## S3 method for class 'TRAMPsamples'
x[i, na.interp=TRUE, ...]

Arguments

`x`	A `TRAMPsamples` or `TRAMPknowns` object.
`i`	A vector of `sample.fk` or `knowns.fk` values. For valid values, use `labels(x)`. If any index values are not present in `x`, then an error will be raised. Alternatively, this may be a logical vector, of the same length as the number of samples or knowns in `x`. See Examples for use of this.
`na.interp`	Logical: Controls how `NA` values should be interpreted when `i` is a logical vector.
`...`	Further arguments passed to or from other methods.

Details

When indexing by logical vectors, NA values do not make valid indexes, but may be produced when testing columns that contain missing values, so these must be converted to either TRUE or FALSE. If i is a logical index that contains missing values (NAs), then na.interp controls how they will be interpreted:

If na.interp=TRUE, then TRUE, FALSE, NA becomes TRUE, FALSE, TRUE.
If na.interp=FALSE, then TRUE, FALSE, NA becomes TRUE, FALSE, FALSE.
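
The two rules above can be mimicked in base R. This is only a sketch of the documented behaviour, not the package's internal code; the helper name resolve_na is hypothetical.

```r
## Hypothetical helper (not part of TRAMPR): replace the NA entries of
## a logical index with the value given by na.interp.
resolve_na <- function(i, na.interp = TRUE) {
  stopifnot(is.logical(i))
  i[is.na(i)] <- na.interp
  i
}

i <- c(TRUE, FALSE, NA)
resolve_na(i, na.interp = TRUE)   # NA is treated as TRUE
resolve_na(i, na.interp = FALSE)  # NA is treated as FALSE
```

Which setting is appropriate depends on whether entries with missing information should be kept (na.interp=TRUE) or dropped (na.interp=FALSE).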

Warning

For TRAMPknowns objects, if the file.pat element is specified as part of the object (see TRAMPknowns), then the subsetted TRAMPknowns object will be written to a file. This may not be what you want, so it is probably best to disable knowns writing by doing x$file.pat <- NULL before doing any subsetting (where x is the name of your TRAMPknowns object).

Examples

data(demo.samples)
data(demo.knowns)

## Subsetting by sample.fk values
labels(demo.samples)
demo.samples[c(101, 102, 110)]
labels(demo.samples[c(101, 102, 110)])

## Take just samples from the first 10 soilcores:
demo.samples[demo.samples$info$soilcore.fk <= 10]

## Indexing also works on TRAMPknowns:
demo.knowns[733]
labels(demo.knowns[733])

TRAMPknowns Objects

Description

These functions create and interact with TRAMPknowns objects (collections of known TRFLP patterns). Knowns contrast with “samples” (see TRAMPsamples) in that knowns contain identified profiles, while samples contain unidentified profiles. Knowns must have at most one peak per enzyme/primer combination (see Details).

Usage

TRAMPknowns(data, info, cluster.pars=list(), file.pat=NULL,
            warn.factors=TRUE, ...)


## S3 method for class 'TRAMPknowns'
labels(object, ...)
## S3 method for class 'TRAMPknowns'
summary(object, include.info=FALSE, ...)

Arguments

`data`	data.frame containing peak information.
`info`	data.frame, describing individual samples (see Details for definitions of both data.frames).
`cluster.pars`	Parameters used when clustering the knowns database. See Details.
`file.pat`	Optional partial filename in which to store knowns database after modification. Files `<file.pat>_info.csv` and `<file.pat>_data.csv` will be created.
`warn.factors`	Logical: Should a warning be given if any columns in `info` or `data` are converted into factors?
`object`	A `TRAMPknowns` object.
`include.info`	Logical: Should the output be augmented with the contents of the `info` component of the `TRAMPknowns` object?
`...`	`TRAMPknowns`: Additional objects to incorporate into a `TRAMPknowns` object. Other methods: Further arguments passed to or from other methods.

Details

The object has at least two components, which relate to each other (in the sense of a relational database). info holds information about the individual samples, and data holds information about individual peaks (many of which may belong to a single sample).

Column definitions:

info:

knowns.pk:

Unique positive integer, used to identify individual knowns (i.e. a “primary key”).

species:

Character, giving species name.

data:

knowns.fk:

Positive integer, indicating which sample the peak belongs to (by matching against info$knowns.pk) (i.e. a “foreign key”).

primer:

Character, giving the name of the primer used.

enzyme:

Character, giving the name of the restriction digest enzyme used.

size:

Numeric, giving size (in base pairs) of the peak.

In addition, TRAMPknowns will create additional columns holding clustering information (see group.knowns). Additional columns are allowed (and retained, but ignored) in both data.frames. Additional objects are allowed as part of the TRAMPknowns object, but these will not be written by write.TRAMPknowns; any extra objects passed (via ...) will be included in the final TRAMPknowns object.

The cluster.pars argument controls how knowns will be clustered (this will happen automatically as needed). Elements of the list cluster.pars may be any of the three arguments to group.knowns, and will be used as defaults in subsequent calls to group.knowns. If not provided, default values are: dist.method="maximum", hclust.method="complete", cut.height=2.5 (if only some elements of cluster.pars are provided, the remaining elements default to the values above). To change values of clustering parameters in an existing TRAMPknowns object, use group.knowns.
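
The following base R sketch illustrates the two ideas in this paragraph: partially specified parameters falling back to the documented defaults, and what those defaults mean in terms of the standard stats functions. It assumes that the parameter names map onto dist, hclust and cutree in the obvious way; this is an assumption made for illustration, not a description of the internals of group.knowns. The use of modifyList and the toy matrix sizes are likewise illustrative.

```r
## Documented default values for cluster.pars.
defaults <- list(dist.method = "maximum",
                 hclust.method = "complete",
                 cut.height = 2.5)

## A user supplies only one element; the others keep their defaults.
cluster.pars <- utils::modifyList(defaults, list(cut.height = 4))
str(cluster.pars)

## Toy peak sizes (rows: knowns; columns: enzyme/primer combinations).
sizes <- rbind(a = c(100, 200),
               b = c(101, 201.5),
               c = c(150, 300))

## Cluster with the default parameters (assumed mapping, see above).
d <- dist(sizes, method = defaults$dist.method)
h <- hclust(d, method = defaults$hclust.method)
groups <- cutree(h, h = defaults$cut.height)
groups  # a and b fall in one group; c is on its own
```

With the "maximum" distance, two profiles are only as close as their most different peak, so a and b (largest difference 1.5 bp) are grouped at a cut height of 2.5, while c is not.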

A known contains at most one peak per enzyme/primer combination. Where a species is known to have multiple TRFLP profiles, these should be treated as separate knowns with different, unique, knowns.pk values, but with identical species values. A sample containing either pattern will then be recorded as having that species present (see group.knowns).
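
Before constructing a TRAMPknowns object, it may help to check the "at most one peak per enzyme/primer combination" rule by hand. The sketch below uses only base R on an artificial peak table; it is not part of TRAMPR, and the data values are invented for illustration.

```r
## Artificial peak table: known 2 has two ITS4/BsuRI peaks.
knowns.data <- data.frame(
  knowns.fk = c(1, 1, 2, 2, 2),
  primer = c("ITS1F", "ITS4", "ITS1F", "ITS4", "ITS4"),
  enzyme = "BsuRI",
  size = c(120.5, 310.2, 98.7, 402.1, 455.0))

## Rows that repeat an earlier knowns.fk/primer/enzyme combination.
key <- knowns.data[c("knowns.fk", "primer", "enzyme")]
extra <- duplicated(key)
knowns.data[extra, ]
```

Any rows reported here would need to be moved to a separate known (with its own knowns.pk but the same species value), as described above.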

Value

`TRAMPknowns`	A new `TRAMPknowns` object: a list with components `info`, `data` (the provided data.frames, with clustering information added to `info`), `cluster.pars` and `file.pat`, plus any extra objects passed as `...`.
`labels.TRAMPknowns`	A sorted vector of the unique knowns present in `object` (from `info$knowns.pk`).
`summary.TRAMPknowns`	A data.frame, with the size of the peak (if present) for each enzyme/primer combination, with each known (indicated by `knowns.pk`) as rows and each combination (in the format `<primer>_<enzyme>`) as columns.

Note

Across a TRAMPknowns object, primer and enzyme names must be exactly the same (including case and whitespace) to be considered the same. For example "ITS4", "Its4", "ITS 4" and "ITS4 " would be considered to be four different primers.
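
The effect of exact matching can be seen with plain character vectors. The clean-up step below is a hypothetical pre-processing suggestion (it is not something TRAMPknowns does for you), and is only appropriate when the variants really do refer to the same primer.

```r
primers <- c("ITS4", "Its4", "ITS 4", "ITS4 ")

## Exact comparison: all four names are distinct.
length(unique(primers))

## Hypothetical clean-up: drop all whitespace and use upper case.
cleaned <- toupper(gsub("[[:space:]]+", "", primers))
unique(cleaned)
```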

Factors will not merge correctly (with combine.TRAMPknowns or add.known). TRAMPknowns will attempt to catch factor columns and convert them into characters for the info and data data.frames. Other objects (passed as part of ...) will not be altered.

Examples

## This example builds a TRAMPknowns object from completely artificial
## data:

## The info data.frame:
knowns.info <-
  data.frame(knowns.pk=1:8,
             species=rep(paste("Species", letters[1:5]), length=8))
knowns.info

## The data data.frame:
knowns.data <- expand.grid(knowns.fk=1:8,
                           primer=c("ITS1F", "ITS4"),
                           enzyme=c("BsuRI", "HpyCH4IV"))
knowns.data$size <- runif(nrow(knowns.data), min=40, max=800)

## Construct the TRAMPknowns object:
demo.knowns <- TRAMPknowns(knowns.data, knowns.info, warn.factors=FALSE)

## A plot of the pretend knowns:
plot(demo.knowns, cex=1, group.clusters=TRUE)

TRAMPsamples Objects

Description

These functions create and interact with TRAMPsamples objects (collections of TRFLP patterns). Samples contrast with “knowns” (see TRAMPknowns) in that samples contain primarily unidentified profiles. In contrast with knowns, samples may have many peaks per enzyme/primer combination.

Usage

TRAMPsamples(data, info=NULL, warn.factors=TRUE, ...)

## S3 method for class 'TRAMPsamples'
labels(object, ...)
## S3 method for class 'TRAMPsamples'
summary(object, include.info=FALSE, ...)

Arguments

`data`	data.frame containing peak information.
`info`	(Optional) data.frame, describing individual samples (see Details for definitions of both data.frames). If this is omitted, a basic data.frame will be generated.
`warn.factors`	Logical: Should a warning be given if any columns in `info` or `data` are converted into factors?
`object`	A `TRAMPsamples` object.
`include.info`	Logical: Should the output be augmented with the contents of the `info` component of the `TRAMPsamples` object?
`...`	`TRAMPsamples`: Additional objects to incorporate into a `TRAMPsamples` object. Other methods: Further arguments passed to or from other methods.

Details

Column definitions:

info:

sample.pk:

Unique positive integer, used to identify individual samples (i.e. a “primary key”).

species:

Character, giving species name if samples were collected from an identified species. If this column is missing, it will be initialised as NA.

data:

sample.fk:

Positive integer, indicating which sample the peak belongs to (by matching against info$sample.pk) (i.e. a “foreign key”).

primer:

Character, giving the name of the primer used.

enzyme:

Character, giving the name of the restriction digest enzyme used.

size:

Numeric, giving size (in base pairs) of the peak.

height:

Numeric, giving the height (arbitrary units) of the peak.

Additional columns are allowed (and ignored) in both data.frames, and will be retained. This allows notes on data quality and treatments to be easily included. Additional objects are allowed as part of the TRAMPsamples object; any extra objects passed (via ...) will be included in the final TRAMPsamples object.

If info is omitted, then a basic data.frame will be generated, containing just the unique values of sample.fk, and NA for species.
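
As a rough picture of what such a generated info data.frame contains, the base R sketch below builds one by hand from an artificial peak table. It is an illustration only; the row order and column types that TRAMPsamples actually produces are not specified here and are assumptions.

```r
## Artificial peak table (values invented for illustration).
peaks <- data.frame(sample.fk = c(3, 1, 3, 2, 1),
                    primer = "ITS4",
                    enzyme = "BsuRI",
                    size = c(150.2, 98.4, 301.7, 220.0, 410.9),
                    height = c(500, 320, 810, 150, 275))

## One row per unique sample.fk, with species unknown (NA).
info <- data.frame(sample.pk = unique(peaks$sample.fk),
                   species = NA_character_)
info
```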

Value

`TRAMPsamples`	A new `TRAMPsamples` object, as described above.
`labels.TRAMPsamples`	A sorted vector of the unique samples present in `object` (from `info$sample.pk`).
`summary.TRAMPsamples`	A data.frame, with the number of peaks per enzyme/primer combination, with each sample (indicated by `sample.pk`) as rows and each combination (in the format `<primer>_<enzyme>`) as columns.
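
The shape of the summary.TRAMPsamples result can be approximated in base R by cross-tabulating peaks. This sketch produces a table rather than a data.frame and is not the package's implementation; the peak data are artificial.

```r
## Artificial peaks: sample 1 has two ITS1F/BsuRI peaks.
peaks <- data.frame(
  sample.fk = c(1, 1, 1, 2, 2),
  primer = c("ITS1F", "ITS1F", "ITS4", "ITS1F", "ITS4"),
  enzyme = "BsuRI")

## Column labels in the documented <primer>_<enzyme> format.
combo <- paste(peaks$primer, peaks$enzyme, sep = "_")

## Number of peaks per sample (rows) and combination (columns).
counts <- table(sample.fk = peaks$sample.fk, combo)
counts
```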

Note

Across a TRAMPsamples object, primer and enzyme names must be exactly the same (including case and whitespace) to be considered the same. For example "ITS4", "Its4", "ITS4 " and "ITS 4" would be considered to be four different primers.

Factors will not merge correctly (with combine.TRAMPsamples). TRAMPsamples will attempt to catch factor columns and convert them into characters for the info and data data.frames. Other objects (passed as part of ...) will not be altered.
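
If you prefer to remove factors yourself before calling TRAMPsamples (for example, to avoid the warning controlled by warn.factors), a conversion along the following lines should work. This is a base R sketch on an artificial info data.frame, not the code TRAMPsamples uses internally.

```r
## Artificial info data.frame with a factor column.
info <- data.frame(sample.pk = 1:3,
                   species = factor(c("Species a", NA, "Species b")))

## Convert every factor column to character, leaving others untouched.
is_factor <- vapply(info, is.factor, logical(1))
info[is_factor] <- lapply(info[is_factor], as.character)
str(info)
```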

Interactively Alter a TRAMP Object

Description

This function allows some manual checking and correction of a TRAMP object. By default, it steps through each sample, and offers to (1) add a new known to the TRAMPknowns database within the TRAMP object (see add.known for details), (2) mark matches to be ignored in future calls to plot.TRAMP (see remove.TRAMP.match), (3) save the current plot as a PDF.

Usage

## S3 method for class 'TRAMP'
update(object, sample.fk=labels(object$samples), grouped=FALSE,
       ignore=TRUE, delay.rebuild=FALSE, default.species=NULL,
       filename.fmt="TRAMP_%d.pdf", ...)

Arguments

`object`	A `TRAMP` object.
`sample.fk`	A vector of `sample.fk` to cycle through. If omitted, this will default to all samples present in the `TRAMPsamples` component of the `TRAMP` object.
`grouped`, `ignore`	Plotting parameters, as in `plot.TRAMP`. Currently these cannot be altered from their default values.
`delay.rebuild`	Logical: Should the rebuild of the `TRAMP` object be delayed until the function returns? If this is `FALSE` (the default), then the `TRAMP` object will rebuild every time a new known is added. This may take a while for large objects, so if set to `TRUE`, then the `TRAMP` object will not be rebuilt until all `sample.fk`s have been displayed. This means that any new samples added as knowns will not be included in plots.
`default.species`	Default species name for newly added knowns. Passed to `add.known`.
`filename.fmt`	Format used to generate filenames when saving PDFs. Include a `"%d"` to stand in for the `sample.fk` (so `"TRAMP_%d.pdf"` becomes `"TRAMP_12.pdf"` for `sample.fk` 12).
`...`	Further arguments passed to the plotting function `plot.TRAMP`.
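
The "%d" placeholder in filename.fmt follows the convention of base R's sprintf. Whether update.TRAMP calls sprintf internally is an assumption, but the expansion can be previewed like this:

```r
filename.fmt <- "TRAMP_%d.pdf"

## One filename for sample.fk 12.
sprintf(filename.fmt, 12L)

## sprintf is vectorised, so several sample.fk values can be previewed
## at once.
sprintf(filename.fmt, c(101L, 102L, 110L))
```

The first call gives "TRAMP_12.pdf", matching the example in the filename.fmt description.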

Warning

If an error occurs while running update, all modifications will be lost.

Note

update.TRAMP returns a modified TRAMP object, and does not modify the original TRAMP object in place. You must use it like:

x <- update(x)

x2 <- update(x)

to modify the original object or to create a new, modified object, respectively. Note that when creating multiple objects, if the TRAMPknowns object has a file.pat element, then any changes to either x or x2 will be written back to file, even though the knowns contained in x and x2 may differ. See the note in add.known.

The action “Quit” will always exit the update function and save the object.

Be careful when using a TRAMP object whose TRAMPknowns element has a file.pat element; new knowns added will be immediately written to file.

Examples

## Since this function runs interactively, there can be no sample.
