Total evidence versus conditional combination in the age of phylogenomics: Analyses of Pancrustacean phylogeny and divergence times

Abstract –

 

Keywords:  

The ever-intensifying deluge of molecular sequence information presents both challenges and opportunities for reconstructing the history and timing of life on Earth. A major challenge is that the sheer volume of data can quickly outstrip the computational power available for running the most statistically rigorous and sophisticated methods, especially during exploratory phases of analysis. Although complex model-based phylogenetic techniques have recently made enormous strides in speed {raxml, garli, phyml, beagle}, multi-gene datasets large enough to overload any supercomputer are now commonplace, owing to EST and next-generation sequencing technologies {EST; scorpion; arthMBE}. Yet the magnitude of available data also affords opportunities. When data are cheap and abundant, the data best suited to the question at hand can be identified and retained, and inappropriate data can be culled {cite Phillipe 2007 paper}. For example, arthMBE took this approach to restrict EST data. Although such approaches are likely to rekindle philosophical debates on the merits of ‘total evidence’ versus conditional combination of data {Bull, 1993}, sound a priori definitions of appropriate data, coupled with the pragmatic need for computational tractability, make conditional analysis attractive. Here, we explore the challenges and opportunities afforded by a large, concatenated data set including morphological, rDNA, mitochondrial protein, and Expressed Sequence Tag (EST) characters from Pancrustacea, with particular attention to an understudied, often neglected group of crustaceans, the Ostracoda.
Because a foundational goal of comparative biology is to elucidate the tree of life, including both the pattern and the timing of common ancestry, and because understudied groups like Ostracoda can contribute to a deeper understanding of the evolutionary process, such neglected groups deserve heightened attention from phylogenetic biologists.

Pancrustacea is a riotously speciose and evolutionarily and ecologically important animal clade, and its phylogeny and taxonomy have received considerable attention for decades. Although some progress has been made toward consensus on formerly contentious hypotheses, such as the monophyly of Crustacea plus Hexapoda (Friedrich & Tautz 1995; Zrzavý et al. 1997; Boore et al. 1998; Dohle 2001; Giribet et al. 2001; Hwang et al. 2001; Delsuc et al. 2003; Nardi et al. 2003; Regier et al. 2005; Harzsch 2004) and the polyphyly of Maxillopoda {Boxshall 1983; Abele et al. 1992; Shultz and Regier 2000; Regier et al. 2005}, a number of phylogenetic questions remain. One contentious question is particularly relevant to the current data set.

Whether Ostracoda are monophyletic remains an open question. Ostracods are small (usually 1–2 mm) crustaceans that today live in virtually all aquatic habitats worldwide, including deep and shallow seas and freshwater bodies ranging from small and temporary to large. Ostracods fossilize well because many live in ocean sediments and possess a calcified, bivalved carapace that fully encloses the body. Ostracoda comprise three major taxonomic groups: Myodocopa, Podocopa, and the monotypic Palaeocopa (Manawa staceyi). rDNA data, although based on rather limited taxon sampling, did not support monophyly {Spears; Oakley, 2002}. Differences in carapace ontogeny between Podocopa and Myodocopa have also been used to argue for polyphyly {Wakayama, 2007}, even though no intervening taxonomic groups were proposed. In contrast, morphological phylogenetic analyses supported monophyly {Horne, 2005}, although multiple putative near outgroups were not included in those analyses. A recent analysis of 62 protein-coding genes was consistent with monophyly, but it included only three ostracod species and recovered low support values {Regier, 2010}.

        

  1. (Divergence Time of Holometabolous Insects)
  2. (Divergence Times with fossils included versus constraints)

 

METHODS

Data

Existing molecular data. — We included previously published data, focusing on major pancrustacean clades and on species included in multiple previous data sets. We analyzed data from 62 single-copy nuclear protein-coding genes of XX species {Regier, 2010}, plus Expressed Sequence Tag (EST) data from XX species, all 13 mitochondrial protein-coding genes from XX species, complete genome sequences from XX species, 18S rDNA data from XX species, and 28S rDNA data from XX species. The availability and sources of data for each species are detailed in Table 1.

Specimen collection. — To the previously published data, we added new transcriptome data generated by 454 pyrosequencing. In particular, we focused on superfamilies of Ostracoda, because ostracods have an extensive and ancient fossil record but are often neglected in molecular phylogenies of Pancrustacea. Collection information for the 8 species analyzed by pyrosequencing is detailed in Table 2.

Morphological data. — We scored 183 characters, mainly from literature sources, for 59 (**this includes Actinoseta AND Parasterope, also Manawa etc) extant pancrustaceans plus 21 (**excludes marrellomorph) fossils. Characters came primarily from three previous publications. We used all 28 characters scored by Horne et al. {Horne et al, 2005} for ostracod superfamilies. We used 36 of the 97 characters of Hou et al. {Hou, 2010}, whose matrix is based on that of Wills et al. {Wills et al, 1998}, excluding characters that are constant within Pancrustacea or redundant with those of Horne et al. In addition, we analyzed 86 characters from Rota-Stabelli et al. {Rota-Stabelli et al, 2011}. Additional characters came from other sources (Høeg and Kolbasov, 2002; Huys and Boxshall, 1981; Olesen, 2009; Perez-Losada et al, 2004; Syme and Oakley, in ????; Wheeler et al, 2001). We added new codings based on personal observation of fossils. We used MorphoBank {Morphobank ref, P385} to concatenate the morphological data sets and to score all taxa for as many characters as possible.

RNA extraction and cDNA synthesis. — Because our future studies will analyze genes expressed in ostracod eyes, we obtained tissue for cDNA from whole bodies, bodies minus eyes, and/or eyes alone of pooled individuals for each species (see Table XX for details). We usually extracted RNA using the organic solvent TRIzol (Invitrogen) according to the manufacturer's protocol, followed by treatment with TURBO DNase (Applied Biosystems). For C. californica and A. jonesi, we used the NucleoSpin RNA XS isolation kit (Macherey-Nagel). We quantified purified RNA on a Qubit Fluorometer (Invitrogen). We generated cDNA using the SMART or SMARTer cDNA synthesis kit (Clontech). To reduce sequencing artifacts due to poly-T tracts, we used modified 3’ primers for first-strand synthesis: (SMART) 5’- AAG CAG TGG TAT CAA CGC AGA GTG GCC GAG GCG GGC CTTTTTTTTTTCTTTTTTTTTT – 3’ and (SMARTer) 5’- AAG CAG TGG TAT CAA CGC AGA GTA CTTTTTTCTTTTTT -3’. We conducted second-strand synthesis using the amplification protocol outlined in the SMART/SMARTer cDNA kits, varying the cycle number from 18 to 22 depending on initial RNA concentration (Table 3). We purified amplified cDNA using a phenol:chloroform:isoamyl alcohol extraction and quantified it on a Qubit Fluorometer (Invitrogen). We pooled separate second-strand reactions for each species and tissue type to reach 5-7 µg of cDNA per pool. The resulting cDNA samples were shipped to either Duke University or Brigham Young University for Titanium pyrosequencing on the Roche 454 platform according to the manufacturer’s instructions, using partial runs with either a manifold or barcodes (Table 3).

Transcriptome Analyses

Assembly. — We assembled the new transcriptome data de novo with GS De Novo Assembler v2.3 (‘newbler’; 454 Life Sciences/Roche) using default threshold options. We used LUCY {Chou and Holmes 2001} to trim low-quality reads and deleted any assembled contig shorter than 100 nucleotides. Assembled ESTs from public databases were provided by Roeding {Roeding}, except for XXX, which we assembled with newbler as above. We obtained the data of Regier et al. {Regier, 2010} from GenBank and treated those protein-coding genes like ESTs in our analyses.
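The contig length filter described above is simple to reproduce. The Python sketch below is illustrative only: it assumes the assembled contigs are available as FASTA text, and the helper names `read_fasta` and `drop_short_contigs` are hypothetical, not part of newbler or LUCY.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_short_contigs(records, min_length=100):
    """Keep only contigs whose sequence is at least `min_length` nucleotides long."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

Because wrapped sequence lines are joined before measuring, the length test applies to the whole contig rather than to a single FASTA line.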

Ortholog determination. — We used HaMStR {Ebersberger et al. 2009} to determine orthologs. HaMStR first employs GeneWise {genewise} to translate cDNA sequences in all reading frames. It then uses profile hidden Markov models (HMMs) and HMMER {hmmr} to search all translations for matching genes. We used the ‘arthropoda_hmmr3’ set of core orthologs provided with HaMStR. XX of the 62 proteins of Regier et al. {2010} were not present in these core orthologs, so we trained new HMMs for those proteins using HMMER3 and alignments of each gene from six species representing different clades: Skogsbergia lerneri, Cypridopsis vidua, Speleonectes tulumensis, Triops longicaudatus, Limulus polyphemus, and Scutigera coleoptrata. HaMStR next uses BLAST {blast} to search a reference genome, for which we used Drosophila melanogaster. If a putative ortholog did not recover the fly ortholog as its top hit, it was not retained. As a result, genes with in-paralogs {sensu Koonin}, including, for example, the common phylogenetic marker EF1alpha {EF1 paralogs ref}, were not always retained as orthologs.
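The top-hit criterion can be illustrated with a short Python sketch. It is not HaMStR's own code; it assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`, bitscore in the last column), and the candidate-to-reference mapping, gene identifiers, and function names are hypothetical.

```python
def top_hits(blast_tabular):
    """Return {query_id: subject_id} for the highest-bitscore hit of each query."""
    best = {}
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def retain_orthologs(candidates, blast_tabular):
    """Keep a candidate only if its top reference-genome hit is the expected gene.

    `candidates` maps each candidate sequence ID to the reference-genome gene ID
    of the core ortholog whose profile HMM it matched (hypothetical layout).
    """
    hits = top_hits(blast_tabular)
    return {c: ref for c, ref in candidates.items() if hits.get(c) == ref}
```

A candidate with no BLAST hit at all is also rejected, because `hits.get` returns `None`.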

Alignment and Alignment Masking. — We next aligned each gene family using MUSCLE (Edgar, 2004) and estimated the ML tree topology and branch lengths under the WAG model in RAxML. We used BioPerl to determine the average branch length of each gene family and then excluded any genes whose terminal branches were more than four times the average. We found that this approach removed artifactual sequences, including poorly translated ones. Finally, we reduced noise in the data by identifying and removing aligned regions that showed no more similarity than expected by chance, using ALISCORE (Misof and Misof 2009, Kück et al. 2010) and ALICUT (http://www.utilities.zfmk.de) with default settings plus the ‘-e’ option recommended for EST data. We stored all data in a local MySQL database and wrote custom Perl and bash scripts to generate data subsets based on criteria such as data type, species, and estimated rate of evolution of the gene family.
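One reading of the branch-length filter is sketched below in Python. It assumes that "average" means the mean terminal branch length of a single gene-family tree and that the filter flags individual terminal taxa; the function name is hypothetical, and the original analysis used BioPerl rather than this code.

```python
from statistics import mean


def flag_long_branches(terminal_lengths, factor=4.0):
    """Return taxa whose terminal branch exceeds `factor` times the mean terminal length.

    terminal_lengths: dict mapping taxon name -> terminal branch length taken
    from one gene-family ML tree (assumed input layout).
    """
    avg = mean(terminal_lengths.values())
    return sorted(t for t, bl in terminal_lengths.items() if bl > factor * avg)
```

Because the long branch itself contributes to the mean, a single outlier in a small gene tree can escape a fourfold threshold; a median-based threshold is one alternative, though not the one described here.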

Rate of Evolution

        

Phylogenetic Analyses

RAxML. — Analyses with RAxML 7.2.6 {Stamatakis} using HPC options {parallel rax ref} allowed us to concatenate all data types, including morphological, rDNA, EST, and mitochondrial protein data. We analyzed various subsets of the full data set (explained in Results), each time partitioning the data by type. We divided the morphological data into two partitions (binary and multistate) so that different models could be applied to each. For the multistate data, we report analyses under the Mk model, because preliminary analyses with the GTR model gave nonsensical results. We assumed a GTR model for the rDNA, which is routinely the best-fitting model for these data {Tinn, 2008}. For ESTs we employed the WAG model in all cases, and for mitochondrial proteins we employed the arthropod mitochondrial (mtART) model.
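To make the partitioning scheme concrete, the Python sketch below writes partition definitions in the comma-separated `MODEL, name = start-end` form used by RAxML 7.x partition files. The partition names and the EST block length are hypothetical placeholders, and the model keywords (`BIN`, `MULTI`, `DNA`, `WAG`, `MTART`) should be checked against the manual for the RAxML version in use; to our understanding, in RAxML 7.2.x the model for multistate partitions is chosen on the command line with `-K` (e.g. `-K MK`) rather than in the partition file.

```python
def raxml_partitions(blocks):
    """Build partition-file lines from (model, name, n_columns) blocks.

    Blocks are assumed to be laid out consecutively in the concatenated
    alignment, so each block's 1-based column range starts right after
    the previous block ends.
    """
    lines, start = [], 1
    for model, name, n_cols in blocks:
        end = start + n_cols - 1
        lines.append(f"{model}, {name} = {start}-{end}")
        start = end + 1
    return "\n".join(lines)


# Morphology and rDNA widths follow the Results; the EST width is a placeholder.
print(raxml_partitions([
    ("BIN", "morph_binary", 136),
    ("MULTI", "morph_multi", 46),
    ("DNA", "rDNA", 7013),
    ("WAG", "est", 300),
    ("MTART", "mito", 2545),
]))
```

Generating the ranges from block widths, rather than typing them by hand, keeps the partition file consistent with the matrix when subsets add or drop genes.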

MrBayes. —

BEAST. —

Calibration and divergence times. —

An often overlooked element of divergence time estimation is analysis of the phylogenetic relationships of fossils, which can strongly influence final results {e.g. Tinn, 2008}. Fossil placement is instead often assumed based solely on taxonomic information. We used two different methods to determine the phylogenetic placement of fossils. First, we used a maximum likelihood (ML) fossil placement algorithm developed by Berger and Stamatakis {Berger, 2009}. This method assumes a molecular phylogeny for a set of extant taxa and then generates a weight for each morphological character based on its congruence with the molecular phylogeny. Next, the method attaches each fossil to every possible branch of the molecular tree and, in each case, calculates the likelihood of the weighted morphological data. Each fossil is placed on the branch that yields the maximum likelihood. This method is currently implemented only for binary characters in RAxML 7.2.6, so we could not include our multistate characters in this analysis without developing new software. Second, we examined the placement of fossils through concatenated analysis of molecular and morphological data in RAxML 7.2.6.
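The logic of likelihood-based fossil placement can be illustrated with a toy Python sketch. This is not the RAxML implementation: it assumes a fixed, rooted reference tree with fixed branch lengths, a symmetric two-state Mk model for binary characters, attachment of the fossil at the midpoint of each branch with a fixed pendant length, and character weights supplied as input rather than estimated from congruence with the molecular tree. All names (`Node`, `place_fossil`, etc.) and the example data are hypothetical.

```python
import copy
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Node:
    name: Optional[str] = None   # tip label; None for internal nodes
    length: float = 0.1          # length of the branch above this node
    children: List["Node"] = field(default_factory=list)


def p_change(t: float) -> float:
    """Two-state Mk model (rate-normalised): P(state differs after branch length t)."""
    return 0.5 - 0.5 * math.exp(-2.0 * t)


def preorder(node: Node):
    yield node
    for child in node.children:
        yield from preorder(child)


def leaf_names(node: Node) -> List[str]:
    return [n.name for n in preorder(node) if not n.children]


def partials(node: Node, char: int, states: Dict[str, str]) -> List[float]:
    """Felsenstein pruning: [L(state 0), L(state 1)] for the subtree below `node`."""
    if not node.children:
        s = states[node.name][char]
        return [1.0, 1.0] if s == "?" else [float(s == "0"), float(s == "1")]
    result = [1.0, 1.0]
    for child in node.children:
        child_l = partials(child, char, states)
        q = p_change(child.length)
        for i in (0, 1):
            result[i] *= (1.0 - q) * child_l[i] + q * child_l[1 - i]
    return result


def weighted_loglik(root: Node, states: Dict[str, str], weights: Sequence[float]) -> float:
    """Sum over characters of weight * log-likelihood, with equal root frequencies."""
    total = 0.0
    for char, w in enumerate(weights):
        l0, l1 = partials(root, char, states)
        total += w * math.log(0.5 * l0 + 0.5 * l1)
    return total


def place_fossil(tree, fossil, states, weights, pendant=0.1):
    """Attach `fossil` to every non-root branch; return (best branch label, all scores)."""
    scores: Dict[str, float] = {}
    n_nodes = len(list(preorder(tree)))
    for idx in range(1, n_nodes):  # preorder index 0 is the root, which has no branch above it
        trial = copy.deepcopy(tree)
        target = list(preorder(trial))[idx]
        label = "+".join(sorted(leaf_names(target)))
        # Split the target's branch at its midpoint and hang the fossil from the new node.
        below = Node(target.name, target.length / 2, target.children)
        target.name, target.children = None, [below, Node(fossil, pendant)]
        target.length /= 2
        scores[label] = weighted_loglik(trial, states, weights)
    best = max(scores, key=scores.get)
    return best, scores
```

Missing fossil codings ('?') contribute a likelihood of 1 for both states, so characters that cannot be scored in the fossil do not influence where it is placed.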

RESULTS

Data. — Our final data set contained 70 species and 259,694 characters. Of the 70 species, 46 were extant and 24 were fossils. The data set contained 136 binary and 46 multistate morphological characters. The final aligned rDNA data (28S plus 18S) comprised 7,013 nucleotide characters. The nuclear protein-coding data comprised 971 genes and 249,951 amino acid characters. The mitochondrial genome data comprised 12 proteins, totaling 2,545 aligned amino acid characters. We primarily analyzed subsets of this full data set.

Phylogenetic analysis

Topology. — The analysis of all extant taxa and all character partitions present in six or more species, partitioned by data type in RAxML, yielded strong bootstrap support for most nodes (Figure XX). We call this data set ‘Extant Total’ (Table 1). Monophyly of major clades, including Thecostraca, Copepoda, Malacostraca, Hexapoda, and Branchiopoda, was supported, each with 100% bootstrap support. As in Regier et al. {Regier, 2010}, we found support for Multicrustacea (Thecostraca, Copepoda, and Malacostraca) and for Oligostraca (Ostracoda, Pentastomida, Mystacocarida, and Branchiura), again with 100% bootstrap support. Unlike Regier et al., we recovered a clade of Hexapoda, Branchiopoda, and Xenocarida (Remipedia plus Cephalocarida). Like Regier et al. {Regier, 2010} and Giribet et al. {Giribet, 2001}, we found support for Xenocarida with the ‘Extant Total’ data set.

Although most nodes had very high bootstrap support in the Extant Total analysis, three did not. First, relationships within Multicrustacea were equivocal: Thecostraca and Copepoda grouped together in only 28% of bootstrap replicates, whereas Regier et al. {Regier, 2010} instead placed Thecostraca with Malacostraca. Second, the sister group of Hexapoda was equivocal. This analysis placed Branchiopoda and Xenocarida together, but with only 24% support. Together, these two clades formed the sister group of Hexapoda, unlike in Regier et al. {Regier, 2010}, who found Xenocarida to be the sister group of Hexapoda and Branchiopoda to be the sister group of Multicrustacea. The third poorly supported node concerns the monophyly of Ostracoda, a prime topological focus of this paper. We recovered ostracods as non-monophyletic, with Podocopa grouping with Ichthyostraca (Pentastomida, Branchiura, and Mystacocarida) with 75% bootstrap support. Regier et al. recovered ostracod monophyly from three ostracod species, but with weak support (<78%). To better understand the three equivocal nodes, we analyzed different data subsets (Table 1). Two of the uncertain nodes were clarified, but one (the sister group of Hexapoda) was not.
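Bootstrap percentages such as those above are the proportion of replicate trees that contain a given clade. A minimal Python sketch, assuming replicate trees already rooted on the outgroup and written as nested tuples of tip names (the tip names in any example are placeholders), is:

```python
from typing import FrozenSet, Iterable, Set, Union

Tree = Union[str, tuple]  # a tip name, or a tuple of subtrees


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Tip sets of every internal node in a rooted nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def bootstrap_support(replicates: Iterable[Tree], clade: Iterable[str]) -> float:
    """Percentage of replicate trees in which `clade` appears as a clade."""
    target = frozenset(clade)
    reps = list(replicates)
    hits = sum(target in clades(tree) for tree in reps)
    return 100.0 * hits / len(reps)
```

For unrooted replicate trees, the comparison would instead be made on bipartitions, so that a clade and its complement count as the same split.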

Table 1 - Analyses exploring phylogenetic topology of extant species

Dataset Name

N Species

N Characters

N Genes*

% Gaps

Extant Total

47

937

Extant Slow 2.5

47

Extant Slow 2.0

47

129780

484

Multicrustacea Total

17

14152

61

42.99

Hexapod Sister

17

86889

XX

58.14

Oligostraca Total

22

58440

262

75.31**

* To be included in a data set, a gene had to be present in six or more species, and its alignment had to contain 50 or more characters. All data sets include morphological and rDNA characters.

** The inclusion of five fossils, which cannot have molecular data, reduces the overall completeness of this matrix.
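The ‘% Gaps’ column can be read as the share of empty cells in the concatenated matrix. The Python sketch below assumes that both alignment gaps ('-') and missing data ('?') count as gaps, which may differ from how the table was computed; under that assumption, fossil rows, which are '?' for every molecular column, raise the percentage. The function name is hypothetical.

```python
def percent_gaps(matrix, missing="-?"):
    """Percentage of cells in an aligned matrix that are gaps or missing data.

    matrix: dict mapping taxon name -> aligned row (all rows the same length).
    """
    total = sum(len(row) for row in matrix.values())
    empty = sum(row.count(ch) for row in matrix.values() for ch in missing)
    return 100.0 * empty / total
```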

In analyses of data subsets, we found support for a clade grouping Thecostraca and Copepoda. First, we analyzed the well-supported Multicrustacea clade (Thecostraca, Copepoda, and Malacostraca) alone, using all data partitions and Limulus and Scutigera as outgroups. This data set, ‘Multicrustacea Total’, yields 92% support for Thecostraca plus Copepoda (Figure XX). In addition, we analyzed all extant species after removing the fastest-evolving genes (those with a rate of 2.5 or more), retaining morphology and rDNA. This data set, ‘Extant Slow 2.5’, showed slightly increased bootstrap support for the Thecostraca plus Copepoda clade (42%) compared with ‘Extant Total’ (28%). We then removed additional fast-evolving genes (those with a rate of 2.0 or higher), again retaining morphological and rDNA data. This data set, ‘Extant Slow 2.0’, showed even higher bootstrap support for Thecostraca plus Copepoda (62%).
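The inclusion criteria from the Table 1 footnote and the rate thresholds used for the ‘Extant Slow’ subsets can be expressed as a single filter. The Python sketch below is hypothetical: the original subsets were generated from a MySQL database with Perl and bash scripts, the `GeneFamily` fields are assumptions, and rates are treated as unitless numbers on whatever scale the rate analysis produced.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class GeneFamily:
    name: str
    n_species: int    # species with a sequence for this family
    n_columns: int    # aligned characters after masking
    rate: float       # estimated relative rate of evolution (assumed scale)


def select_genes(families: Sequence[GeneFamily], max_rate: Optional[float] = None,
                 min_species: int = 6, min_columns: int = 50) -> List[str]:
    """Return names of gene families passing the inclusion and optional rate criteria."""
    keep = []
    for fam in families:
        if fam.n_species < min_species or fam.n_columns < min_columns:
            continue  # fails the general inclusion criteria
        if max_rate is not None and fam.rate >= max_rate:
            continue  # 'Extant Slow' subsets drop genes at or above the threshold
        keep.append(fam.name)
    return keep
```

Calling `select_genes(families)` corresponds to the full data set, while `max_rate=2.5` and `max_rate=2.0` correspond to the two slow-gene subsets; morphology and rDNA would be appended separately.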

Analyses of data subsets did not clarify the sister group of Hexapoda, which remains one of the most important open questions in pancrustacean phylogeny. First, we analyzed the well-supported clade of Hexapoda + Branchiopoda + Xenocarida alone, using Scutigera and Limulus as outgroups. In this data set, ‘Hexapod Sister’, support for Xenocarida (Remipedia plus Cephalocarida) decreased to 49%, and support for Xenocarida plus Branchiopoda remained low at 34%. However, in the analyses of more slowly evolving genes, bootstrap values for Branchiopoda + Xenocarida did increase, to 53% for the ‘Extant Slow 2.5’ data set and 54% for the ‘Extant Slow 2.0’ data set.

We also investigated ostracod monophyly further and found support for it in analyses of multiple data subsets. First, we analyzed the well-supported Oligostraca clade alone, using all data partitions and Limulus and Scutigera as outgroups. This data set, ‘Oligostraca Total’, yields 88% support for ostracod monophyly and 96% for Ichthyostraca monophyly (Figure XX). Second, when we excluded the most rapidly evolving genes, the ML tree recovered ostracod monophyly, and bootstrap support increased as additional fast-evolving genes were removed. For the ‘Extant Slow 2.5’ data set, bootstrap support was 27% for Ostracoda and 81% for Ichthyostraca; for ‘Extant Slow 2.0’, it was 53% for Ostracoda and 92% for Ichthyostraca.

Divergence Time Analyses

Fossil Placement.— 

The placement of most fossil species was consistent under both models (Figure XX). The exceptions are Nahecaris, Nymphatelina, Pattersoncypris, Waptia, and Yicaris, each of which is indicated by both a white circle (regular ML) and a black box (RWM/BS model). In most cases, the positions inferred by regular ML are more congruent with existing morphological hypotheses (see Discussion).

Possibly a sentence about performing divtime analysis on competing topologies

DISCUSSION

Fossil Placement

OSTRACODS AND OSTRACOD-LIKE FOSSILS

Because a particular aim of this study is to resolve the placement of ostracods within Pancrustacea, we followed Hou et al. (2010) in studying two groups of bivalved fossils that have in the past been allied with ostracods: bradoriids (Sylvester-Bradley 1961) and phosphatocopines (Müller 1964). We also included several crown-group ostracods from the Silurian, Triassic, and Cretaceous that have been hypothesized to be members of Myodocopa and Podocopa. This is especially important because the incongruence between ostracod divergence times estimated from molecular versus fossil data (Tinn and Oakley 2008) may have been driven by problems with fossil placement. In particular, carapace characters may be homoplastic (Tinn and Oakley 2008; Siveter et al 2007). In addition, the division between Myodocopa and Podocopa is based on soft anatomy. Therefore, for more accurate comparison, we included only fossils with exceptionally preserved limbs.

As in Hou et al. (2010), the bradoriids, which have in the past been allied with ostracods because of their bivalved carapace (Sylvester-Bradley 1961), fall outside crown-group Mandibulata. This is not surprising, as they lack differentiated mandibles, instead bearing biramous trunk limbs. Kunyangella also has only four pairs of cephalic limbs (Hou et al 2010), whereas five pairs of cephalic limbs are a key apomorphy of Pancrustacea (Rota-Stabelli et al 2011).

The phosphatocopines, traditionally assumed to be related to ostracods (e.g. Müller 1964; Williams et al. 2008; Hou et al. 2010), are here allied with the rhizocephalan barnacle Loxothylacus, for which most morphological characters could not be coded because the adult is a highly reduced endoparasite lacking limbs and organs. Four morphological characters link phosphatocopines with Thecostraca: an all-encompassing ventral carapace, a nauplius larval stage, the lack of a differentiated limbless abdomen, and inwardly directed spines on the antennal exopods. This placement is surprising, as recent analyses by Hou et al. (2010) placed phosphatocopines either as sister to ostracods or as basal within Crustacea (with only remipedes more basal). In the latter tree, phosphatocopines were excluded from the other crustaceans by a lack of externally visible tagmosis (Hou et al 2010); however, this feature is homoplastic, as it is shared at least by ostracods. Phosphatocopines differ from ostracods primarily in lacking maxillae (they bear trunk limbs on the posterior cephalic segments) (Maas et al 2003). No phylogenetic analysis testing the position of phosphatocopines relative to multiple extant pancrustacean groups has yet recovered the hypothesis of Maas et al. (2003) that phosphatocopines are the sister group of all extant crustaceans (=Eucrustacea). Clearly, the affinities of this group are still under debate.

Two of the fossil ostracods had placements that differed depending on the model of evolution. The Cretaceous Pattersoncypris was described as a member of the extant podocope family Cyprididae (Bate 1971; Smith 2000). Regular ML supports this placement, but the BS/RWM model contradicts it entirely, placing the fossil on the stem lineage of Myodocopa. We agree that the Cyprididae placement is more likely, as Pattersoncypris possesses very similar limbs (especially the fifth, sixth, and seventh, as noted by Smith 2000). Nymphatelina, on the other hand, was described by Siveter et al. (2007) as a stem-group myodocope, and both of its alternative positions, stem myodocopid (BS) and stem halocyprid (regular ML), agree with that suggestion. Our confirmation of the hypotheses of Siveter et al. (2003; 2010) that the Herefordshire ostracods Colymbosathon and Nasunaris are members of the extant family Cylindroleberididae is particularly interesting, because it would push back the origin of Myodocopida to at least ___ mya. This exacerbates the discrepancy between rates of evolution in the sister lineages Myodocopida (slow) and Halocyprida (fast) (Tinn and Oakley 2008).

*should we mention anything about the position of Manawa (since we didn’t analyze), or would that go in another section?

MULTICRUSTACEA

Morphological studies have suggested that Malacostraca is basal to all other living pancrustaceans (=Entomostraca), based primarily on tagmosis (presence of a pleon), secondary flagella on the antennule, and retention of the mandibular palp (endopod) in the adult (Walossek 1999? 2003?). This hypothesis, however, has never been supported by molecular data (Jenner 2010). Therefore, including fossils at the base of Malacostraca may provide crucial shared character states that link malacostracans to other groups.

Waptia is one of the most enigmatic Burgess Shale arthropods. Under regular ML it falls at the base of Multicrustacea, and under BS/RWM at the base of Malacostraca. The position at the base of Malacostraca is supported mainly by eye morphology, which is fairly weak evidence (e.g. Oakley and Cunningham 2002).

Cinerocaris is recovered here as the sister taxon of Nebalia. This is consistent with the suggestion of Briggs et al. (2004) that it is a stem-group leptostracan, based on the morphology of its trunk epipods. However, our analysis does not support the basal malacostracan affinity proposed by Boxshall (2007), who interpreted Cinerocaris as a mosaic of entomostracan limbs and malacostracan tagmosis. The Devonian Nahecaris has also been regarded as a stem-group leptostracan (e.g. Bergström et al 1987), an affinity supported by the regular ML analysis. Interestingly, the BS method places it on the stem lineage of Malacostraca, apparently because it lacks the leptostracan epipod morphology (the same characters that support the position of Cinerocaris in our analysis). A recent phylogeny based on carapace characters (Collette and Hagadorn 2010) placed both fossils in a polytomy with extant leptostracans. Our results suggest that soft-part morphology provides additional resolution.

HEXSISTER (LARVAE)

Interestingly, several of the ‘Orsten-type’ fossils (Bredocaris, Rehbachiella, Yicaris) cluster together basal to Branchiopoda under regular ML (name for analysis??). These fossils are unusual in that they are known mainly from larval stages, with adults presumably not preserved (Boxshall 2007; for an interpretation of the adult Bredocaris as a highly neotenic meiofaunal species, see Müller and Walossek 1988). A number of limb characters and the presence of a neck organ appear to support this relationship, but note that our codings do not account for morphological differences through ontogeny beyond presence or absence in nauplius larvae (for taxa that hatch as nauplii). Coding characters for each larval stage (for fossils and the extant species to which they are compared) is beyond the scope of this paper, but could greatly improve the accuracy of phylogenetic placement of Orsten species.

Additionally, the position of Yicaris is sensitive to analytical parameters. Under the BS method, it falls basal to Xenocarida. In particular, it is allied to Cephalocarida by characters from the matrix of Edgecombe (2010), including trunk epipodites (Maas et al 2009), elongated fleshy protopodites, and lobate endites on the maxillule (Zhang et al 2007). These characters, however, are also shared by many branchiopods. It remains to be seen whether a fossil arthropod preserved mainly in early larval stages (and the posterior end of a later larva) can be placed among living taxa.

OVERALL

Might want to discuss performance of BS/RWM method, i.e. I don’t support any of the placements where they conflict (except Nymphatelina). I don’t really support either placement of Yicaris but that is outlined above.

 

Acknowledgments

Shigetaka Yamaguchi, Garret Tom, Tom Near, Greg Edgecombe, Tom Hegna, NSF. ID help: Tom Cronin, Robin Smith, Mark Angelos, Anna Syme. SEM: James Weaver. Collection help: I. Oakley, R. Lampe, A. Oakley, TA Oakley, C. Fong. Duke and BYU sequence techs. N. Shaner and S. Haddock kindly provided unpublished 454 data for the halocyprid ostracod. German scorpion guy sent data by email

If JMW does any phylogenies: This work was supported in part by the facilities and staff of the Yale University Faculty of Arts and Sciences High Performance Computing Center.

Author contributions: THO, AKZ and BJ collected and identified ostracod specimens.  ARL and AKZ developed 454 protocols for Ostracoda with help from THO. THO performed bioinformatic analyses, with assistance from ARL, AKZ, and BJ.  JMW collected and scored morphological characters, with assistance from THO, BJ, and ARL.  All authors contributed to writing and all approved the final manuscript.

 

 

 

 

 

 References

 

 

Figure Captions

Figure 1.

Figure 2.

Figure 3.

 

 

Table 1. Existing Molecular data

Clade

Genus

62 genes (Regier et al)

Source of additional genes

Mt Genome

18S

28S

OUTGROUP

Limulus

LpoXIPHOS

EST (Roeding et al)

NC_003057

3403240

OUTGROUP

Scutigera

ScolCHILO

EST (Roeding et al)

NC_005870

8574582

HEXAPODA

Periplaneta

PamNEOPT

--

none available

Eumesocampa

EfrDIPLUR

--

no 18S

Bombyx

Full Genome

NC_002355

84310305

Acyrthosiphon

Full Genome

886117

Apis

Full Genome

Drosophila

Full Genome

Tribolium

Full Genome

REMIPEDIA

Speleonectes

StuREMI

--

NC_005938

2 18 s

CEPHALOCARIDA

Hutchinsoniella

HmaCEPHAL

--

NC_005937

3851184

THECOSTRACA

Semibalanus

BbaTHECOS

--

4 thecostraca

41353770

Chthamalus

CfrTHECOS

--

none

Lepas

LeanTHECOS

--

158633989

Loxothylacus

LoxTHECOS

--

603458

MALACOSTRACA

Libinia

LemMALA

--

37 malacostraca

54402070

Armadillidium

Avu3MALA

--

7228260

Neogonodactylus

NeoMALA

--

none

Nebalia

--

none

COPEPODA

Mesocyclops

MesoCOPE

--

260534318

Acanthocyclops

A369COPE

--

54401768

Eurytemora

MesoCOPE

--

no 18S

Calanus

EST (Roeding et al)

13877160

Lernaeocera

EST (Roeding et al)

no 18S

Caligus

EST (Roeding et al)

27657712

Lepeophtheirus

EST (Roeding et al)

NC_007215

11493983

BRANCHIOPODA

Daphnia

DmaBRANCH

Full genome

NC_000844

2317765

Limnadia

Lle2BRANCH

?

4733881

Lynceus

LynBRANCH

?

7208203 218s

Triops

TloBRANCH

?

NC_006079

7208207

Streptocephalus

ufsBRANCH

?

7208205

Artemia

Asa3BRANCH

EST (Roeding et al)

NC_001620

25251406 218s

PENTASTOMIDA

Armillifer

AarPENTA

--

NC_005934

223049411

BRANCHIURA

Argulus

Arg2BIURA

New 454

NC_005935

no 18S

MYSTACOCARIDA

Derocheilocaris

DtyMYSTACO

--

??

18s not ava

OSTRACODA (M)

Actinoseta

New 454

Harbansus

HapaOST

--

2 na 1 ava

Euphilomedes

New 454

?

Skogsbergia

SkleOST

New 454

17432251

Vargula

New 454

NC_005306

no data

Halocyprida

New 454

many 18S

OSTRACODA (P)

Cypridopsis

OstOST

many 18S

Heterocypris

New 454

172072969

Puriana

New 454

no 18S

Darwinula

New 454

many 18S

Cytherelloidea

New 454

34556149

Table 2.  Collection information for material processed for 454 pyrosequencing.

Species

Locality

Method

Latitude

Longitude

Date(s)

Depth

Museum Number

Actinoseta jonesi

Cayo Enrique, La Parguera, Puerto Rico

net collecting

17°57.335’N

67°03.185’W

September 12th, 2010

2-3m

?

Argulus sp.

Purchased from Gulf Coast Marine Specimens

?

?

?

?

?

?

Halocyprida sp.

GET INFO FROM HADDOCK

Trawl

December 10th, 2009

Cytherelloidea californica

Camino de la Costa Beach Access, La Jolla, San Diego

algae collecting

24º46.9'N

80º54.58'W

May 14th, 2010

intertidal only on very low tide

?

Darwinula sp.

Isla Colon, Bocas del Toro, Panama

net collecting

9º21.17'N

82º15.45'W

July 29th, 2009

10cm

?

Euphilomedes morini

Stern’s Wharf Pier, Santa Barbara

Ekman grab

34º24.4'N

119º40.5'W

?

10m

?

Puriana sp.

Isla Colon, Bocas del Toro, Panama

net collecting

9º21'N

82º15.45'W

July 23rd, 2009

1m

?

Heterocypris sp.

More Mesa, Santa Barbara, CA

net collecting

34º25.23'N

119º47.29'W

?

10cm

?

Skogsbergia lerneri

Duck Key Viaduct, FL

bait trap

24º46.9'N

80º54.58'W

July 16th to 18th, 2009

2-3m

?

Vargula tsujii

Fishermen’s Cove, Twin Harbors, Catalina Island, CA

bait trap

33º26.66'N

118º29.34'W

July 10th and 11th, 2009

5-10m

?

Table 3. Description of material collected and processed for 454 pyrosequencing. In cases where multiple tissue types were processed, data were pooled prior to assembly and phylogenetic analyses. Abbreviations: B = Bodies, CE = Compound Eyes, ME = Median Eyes.

Species

Tissue type

# Individuals/

Tissue Type

Facility

Proportion

# cycles

Actinoseta jonesi

Bodies with carapace

10 B

BYU, SMARTer

1/7

24

Argulus sp.

Compound and median eyes

50 CE

25 ME

Duke, SMART

1/8

1/8

22

22

Cytherelloidea californica

Bodies with carapace

26 B

BYU, SMARTer

1/8

22

Darwinula sp.

Bodies

~25 B

Duke, SMARTer

1/8

24

Euphilomedes morini

Bodies, compound and median eyes

30 B

27 CE

75 ME

Duke, SMARTer

Duke, SMART

Duke, SMART

1/8

1/8

1/8

24

24

24

Puriana sp.

Bodies, median eyes

?? B

?? ME

Duke, SMARTer

1/8

1/8

24

24

Halocyprida sp.

Bodies

3 B

GET INFO FROM HADDOCK

Heterocypris sp.

Bodies, median eyes

30 B

100 ME

Duke, SMART

1/8

1/8

22

22

Skogsbergia lerneri

Compound and median eyes

>50 CE

>50 ME

Duke, SMARTer

1/8

1/8

21

21

Vargula tsujii

Bodies

33 B

Duke, SMARTer

1/8

22

Table - Ages of fossils scored for morphological data matrix

| Genus | Known Localities | Period | Stage (Type Locality) | Age (mya) (Type Locality) | References |
|---|---|---|---|---|---|
| Branchiocaris | Burgess Shale, Chengjiang, Wheeler Shale | Cambrian | Stage 5/Drumian | 505 | Briggs 1976; Budd 2002; 2008 |
| Bredocaris | Orsten | Cambrian | Paibian | 501 | Müller 1983; Müller and Walossek 1988 |
| Canadaspis | Burgess Shale, Chengjiang, Wheeler Shale | Cambrian | Stage 5/Drumian | 505 | Briggs 1978; Budd 2002; 2008; Orlov 1960 |
| Cinerocaris | Herefordshire | Silurian | Wenlock | 425 | Briggs et al. 2004 |
| Colymbosathon | Herefordshire | Silurian | Wenlock | 425 | Siveter et al. 2003a |
| Klausmuelleria | Comley | Cambrian | Toyonian | 514-511 | Siveter et al. 2001; 2003b |
| Kunmingella | Chengjiang | Cambrian | Atdabanian | 525 | Hou et al. 1996; 2010 |
| Kunyangella | Chengjiang | Cambrian | Atdabanian | 525 | Huo 1965; Hou et al. 2010 |
| Lepidocaris | Rhynie Chert, Windyfield Chert | Devonian | Pragian | 410-396 | Anderson and Trewin 2003; Scourfield 1926; 1940 |
| Martinssonia | Orsten | Cambrian | Paibian | 501 | Haug et al. 2010; Müller and Walossek 1986 |
| Nahecaris | Hunsrück Slate | Devonian | Emsian | 392-388 | Bergström et al. 1987 |
| Nasunaris | Herefordshire | Silurian | Wenlock | 425 | Siveter et al. 2010 |
| Nymphatelina | Herefordshire | Silurian | Wenlock | 425 | Siveter et al. 2007 |
| Odaraia | Burgess Shale | Cambrian | Stage 5/Drumian | 505 | Briggs 1981; Budd 2002; 2008; Walcott 1912 |
| Pattersoncypris | Santana | Cretaceous | Aptian/Albian | 108-92 | Bate 1971; 1972; 1973; Smith 2000 |
| Rehbachiella | Orsten | Cambrian | Paibian | 501 | Müller 1983; Walossek 1993; 1995 |
| Skara | Hel Peninsula, Bitiao Formation, Orsten | Cambrian | Paibian | 501 | Dong et al. 2005; Liu and Dong 2007; Müller 1983; Walossek and Szaniawski 1991 |
| Triadocypris | Spitsbergen | Triassic | Lower Triassic | 251-245 | Weitschat 1983a; 1983b |
| Vestrogothia | Bitiao Formation, Orsten | Cambrian | Paibian | 501 | Müller 1964; Zhang and Dong 2009 |
| Waptia | Burgess Shale, Wheeler Shale | Cambrian | Stage 5/Drumian | 505 | Strausfeld in press; Walcott 1912 |
| Yicaris | Yunnan | Cambrian | Atdabanian | 525-520 | Zhang et al. 2007 |

Table - Ages of oldest crown-group fossils

| Class | Earliest Probable Crown-Group Genus | Locality | Period | Stage | Age (mya) | References |
|---|---|---|---|---|---|---|
| Branchiopoda | Lepidocaris | Rhynie Chert | Devonian | Pragian | 410-396 | Anderson and Trewin 2003; Scourfield 1926; 1940 |
| Branchiura | No known fossils | | | | | |
| Cephalocarida | No known fossils | | | | | |
| Copepoda | Unnamed | Al Khlata | Carboniferous | Westphalian/Stephanian | 303 | Selden et al. 2010 |
| Hexapoda | Rhyniella (Collembola), Rhyniognatha (Insecta) | Rhynie Chert | Devonian | Pragian | 410-396 | Engel and Grimaldi 2004; Hirst and Maulik 1926; Whalley and Jarzembowski 1981 |
| Malacostraca | Cinerocaris | Herefordshire | Silurian | Wenlock | 425 | Briggs et al. 2004 |
| Mystacocarida | No known fossils | | | | | |
| Ostracoda | Nanopsis | Lashkarak Formation | Ordovician | Tremadoc | 485 | Williams et al. 2008; Ghobadi Pour et al. 2011 |
| Pentastomida | No known crown-group fossils | | | | | Waloszek et al. 2007 |
| Remipedia | No known fossils* | | | | | |
| Thecostraca | Rhamphoverritor** | Herefordshire | Silurian | Wenlock | 425 | Briggs et al. 2005 |

* Tesnusocaris (Brooks 1955) has been allied with Remipedia (e.g. Emerson and Schram 1991; Koenemann et al. 2007; Yager 1981), but this fossil is demonstrably not a remipede: two of the specimens are polychaetes, and the remaining one is an arthropod of unknown affinity (Neiber et al. 2011).

** A possible stalked barnacle is known from the Fezouata biota (Ordovician, Tremadoc, 488-479 mya), but it has not yet been described; see Figures 2c and S3h in Van Roy et al. (2010). If it is a stem-group member of Cirripedia (or Pedunculata), it would fall within crown-group Thecostraca and extend the group's range by more than 50 million years.

DELETED TEXT:

(We may not focus on the question of the hexapod sister group.)

First, there is little consensus about the crustacean sister group of Hexapoda.  Contenders include Malacostraca, Branchiopoda, Remipedia, and Cephalocarida.  Analyses of complete mitochondrial genomes support a monophyletic Malacostraca plus Branchiopoda clade as the sister group to Hexapoda (Cook et al. 2005).  Other molecular analyses indicate Branchiopoda alone as the sister group to Hexapoda (Regier et al. 2005; Mallatt & Giribet 2006).  An analysis of 62 nuclear genes placed Remipedia plus Cephalocarida (termed Xenocarida) as the sister group of hexapods {Regier, 2010}, a result that has been suggested to be influenced by long-branch attraction in a smaller subset of genes {Regier, 2007?}. [could add morphological ideas here].

[a]although he used probably insufficient OG: phosphatocopines and cephalocarid —jo.wolfe

[b]are we going to mention that, I assume, part of the motivation is to test Tinn & Oakley with better OG and explicit fossil sampling? —jo.wolfe

[c]our estimate or 425 —jo.wolfe

[d]exclude from the table if we haven’t put it in the analysis? —jo.wolfe