A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Description | Author | ESDA Types: P, R, A | ESDA Goals | ||||||||||||||||||||||
2 | Preparation, Reduction, Analysis | #1 to #10 | ||||||||||||||||||||||||
3 | TOOLS | See: https://docs.google.com/spreadsheets/d/11fobj9rkwOBtYwqSg5ne2LgouPteDCW15PC42wxmzEU/edit#gid=0 for descriptions of Types and Goals | ||||||||||||||||||||||||
4 | R | R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. (Wikipedia) | Steve | P,R,A | 1,2,4,5,8 | What languages are used for; what people are doing with them. Google 'tool' and "analytics". Many tools are used together; tools have useful capabilities for specific tasks. Facilitate Data Analytics: to determine what is available to facilitate data analytics. Categorize by discipline. | ||||||||||||||||||||
5 | SAS | SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics. SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated. (Wikipedia) | Steve | P,R,A | 1,2,4,5,8 | |||||||||||||||||||||
6 | Python | Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale. (Wikipedia) | Steve | P,R,A | Use of programming language to reach all goals? | |||||||||||||||||||||
7 | Java | Java is a general-purpose computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. | Steve | P,R,A | Use of programming language to reach all goals? | |||||||||||||||||||||
8 | C++ | C++ is a general-purpose programming language. It has imperative, object-oriented and generic programming features, while also providing facilities for low-level memory manipulation. | Steve | P,R,A | Use of programming language to reach all goals? | |||||||||||||||||||||
9 | SPSS | SPSS Statistics is a software package used for statistical analysis. | Beth | A (There is also an available module that does Data Preparation) | 5, 6, 7, and 9 | |||||||||||||||||||||
10 | MATLAB | MATLAB is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, Fortran and Python. | Beth | P, R, A | 1, 2, 4, 5, 8 | It might be useful to differentiate between COTS tools and programming languages/development environments. I think something like MATLAB can be used to develop programs that achieve the ESDA goals mentioned, but there may be a lot of programming involved in getting there. Some other tools, on the other hand, are already programmed to perform certain functions (even if they still need some customization). Also, my impression is that some of the tools are better at handling very large data sets than others, and this may affect their ability to achieve ESDA goals, even if they have a desired capability. That is, they may be capable of performing some sort of analytical task on data, but unable to do it (efficiently) on very large data sets. I wonder if that sort of information should be captured somehow. | ||||||||||||||||||||
11 | Minitab | Minitab is a statistics package developed at the Pennsylvania State University | Beth | A | 6 and 7 | |||||||||||||||||||||
12 | CPLEX | IBM ILOG CPLEX Optimization Studio (often informally referred to simply as CPLEX) is an optimization software package. | Beth | P, R, A | ||||||||||||||||||||||
13 | GAMS | GAMS is a high-level modeling system for mathematical optimization. | Barb | |||||||||||||||||||||||
14 | Tableau | A tool that enables data visualization using a drag-and-drop interface. | Barb | |||||||||||||||||||||||
15 | Spotfire | A tool that enables data mining and visualization of very large data sets. Similar to Excel but apparently easier to use for large data sets. | Barb | |||||||||||||||||||||||
16 | VBA | (Visual Basic for Applications) An implementation of Visual Basic that enables user-defined functions and interaction with the Windows API and libraries. | Barb | |||||||||||||||||||||||
17 | Excel | A spreadsheet program created by Microsoft that enables data analysis and visualization. It includes VBA. | Barb | |||||||||||||||||||||||
18 | MySQL | MySQL is an open-source relational database management system (RDBMS). | Steve | A | 5,8 | |||||||||||||||||||||
19 | JavaScript | A high-level interpreted language used by most websites and browsers. | Steve | N/A? | ||||||||||||||||||||||
20 | Perl | A high-level interpreted scripting language frequently used on UNIX computers. It is often used to wrap other programs together. | Steve | P, A | Could be used for any goals? | |||||||||||||||||||||
21 | PHP | A scripting language designed for web development. It can be used to create CGI (Common Gateway Interface) executable for web pages. | Steve | N/A? | ||||||||||||||||||||||
22 | Open Source Databases | http://webresourcesdepot.com/25-alternative-open-source-databases-engines/ | Steve | A | 5,8 | |||||||||||||||||||||
23 | Parallel NetCDF | Parallel NetCDF is a library providing high-performance parallel I/O while still maintaining file-format compatibility with Unidata's NetCDF, specifically the formats of CDF-1 and CDF-2. | Brian | |||||||||||||||||||||||
24 | AWS | Amazon Web Services (AWS) is a collection of cloud computing services that make up the on-demand computing platform offered by Amazon.com. | Brian | |||||||||||||||||||||||
25 | Hadoop | Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware | Brian | |||||||||||||||||||||||
26 | GIS | A geographic information system is a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. | Brian | |||||||||||||||||||||||
27 | ROI-PAC | ROI-PAC is a software package created by the Jet Propulsion Laboratory division of NASA and Caltech for processing SAR images to create InSAR images, called interferograms. | Chung-Lin | P, A | 4, 6, (2, 3, 5) | |||||||||||||||||||||
28 | GDAL | Geospatial Data Abstraction Library (GDAL) is a computer software library for reading and writing raster and vector geospatial data formats, and is released under the permissive X/MIT style free software license by the Open Source Geospatial Foundation. | Chung-Lin | P, A | 4, (2, 3, 5, 6) | |||||||||||||||||||||
29 | TECHNIQUES | |||||||||||||||||||||||||
30 | Machine Learning | Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. | Chung-Lin | P, A, R | 4, 6, 7, 8, (9) | |||||||||||||||||||||
31 | Data Mining | Data mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets (“big data”) involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. | Bob | |||||||||||||||||||||||
32 | Natural Language Processing | Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation. | Bob | |||||||||||||||||||||||
33 | Linear/Non-linear Regression | In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable Y (e.g., a sounding temperature) and one or more explanatory variables (or independent variables) denoted X (or X1, X2, ...) (e.g., the satellite-retrieved temperature(s)). The case of one explanatory variable is called simple linear regression. In statistics, nonlinear regression is a form of regression analysis in which observational data (e.g., Y) are modeled by a function that is a nonlinear combination of the model parameters (e.g., Y = a·exp(bX)) and depends on one or more independent variables (e.g., X or X1, X2, ...). The data are fitted by a method of successive approximations. | Bob | |||||||||||||||||||||||
34 | Time Series Models | Time series models describe measurements ordered in time and are used to characterize patterns such as trends in the sequence, often presented graphically. | Bob | |||||||||||||||||||||||
35 | Clustering | Clustering is an approach to organizing objects into groups and can be accomplished using various methods, including statistical techniques. | Bob | |||||||||||||||||||||||
36 | Decision Tree | A Decision Tree is a graphical representation of the sequence of decisions to be completed when answering a particular question. | Tiffany | |||||||||||||||||||||||
37 | Factor Analysis | Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. | Tiffany | |||||||||||||||||||||||
38 | Principal Component Analysis | Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components | Tiffany | |||||||||||||||||||||||
39 | Neural Networks | Tiffany | ||||||||||||||||||||||||
40 | Bayesian Techniques | Bayesian analysis, a method of statistical inference (named for English mathematician Thomas Bayes) that allows one to combine prior information about a population parameter with evidence from information contained in a sample to guide the statistical inference process. | Tiffany | |||||||||||||||||||||||
41 | Text Analytics | Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful intelligence | Ethan | |||||||||||||||||||||||
42 | Graph Analytics | Graph analytics leverage graph structures to understand, codify, and visualize relationships that exist in a network. | Ethan | |||||||||||||||||||||||
43 | Visual Analytics | Visual analytics is a form of inquiry in which data that provides insight into solving a problem is displayed in an interactive, graphical manner. | Ethan | |||||||||||||||||||||||
44 | Map Reduce | MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. | Ethan | |||||||||||||||||||||||
45 | INTEGRATED PRODUCTS | |||||||||||||||||||||||||
46 | EarthServer (http://www.earthserver.eu) | EarthServer has established open ad-hoc analytics on massive Earth Science data | Bar | P, R, A | 1, 5, 6, 7, 9 | |||||||||||||||||||||
47 | NASA Earth Exchange (https://nex.nasa.gov/nex/) | NEX is a platform for scientific collaboration, knowledge sharing and research for the Earth science community. | Bar | A | 5, 6, 7, 8, 9 | |||||||||||||||||||||
48 | EDEN (http://cda.ornl.gov/projects/eden/#) | EDEN is a visual analytics tool for exploring multivariate data sets. | Bar | A | 1, 5, 6, 7, 8, 9 | |||||||||||||||||||||
49 | EARTHDATA (https://earthdata.nasa.gov) | Earthdata.nasa.gov offers access to community-based resources that enable the analysis of Earth science data, including web-accessible tools and services for data discovery, data products and services that pertain to disciplinary topics, and links to the Distributed Active Archive Centers (DAACs) and to relevant community initiatives and organizations. | Bar | A | 4 | |||||||||||||||||||||
50 | Giovanni (http://giovanni.gsfc.nasa.gov/giovanni/) | Quick data visualization, exploration, and analysis tool, with data download capability | Steve | P, R, A | 2, 4, 5, 6, 9 | |||||||||||||||||||||
51 | ||||||||||||||||||||||||||
52 | ||||||||||||||||||||||||||
53 | ||||||||||||||||||||||||||
54 | ||||||||||||||||||||||||||
55 | ||||||||||||||||||||||||||
56 | ||||||||||||||||||||||||||
57 | ||||||||||||||||||||||||||
58 | “The Field Guide to Data Science,” Booz Allen Hamilton, 2015 | |||||||||||||||||||||||||
59 | ||||||||||||||||||||||||||
60 | Data Science: | |||||||||||||||||||||||||
61 | ||||||||||||||||||||||||||
62 | - Describe | |||||||||||||||||||||||||
63 | - Processing | |||||||||||||||||||||||||
64 | - Enrichment | |||||||||||||||||||||||||
65 | ||||||||||||||||||||||||||
66 | - Discover | |||||||||||||||||||||||||
67 | - Regression | |||||||||||||||||||||||||
68 | - Clustering | |||||||||||||||||||||||||
69 | - Hypothesis Testing | |||||||||||||||||||||||||
70 | ||||||||||||||||||||||||||
71 | - Predict | |||||||||||||||||||||||||
72 | - Regression | |||||||||||||||||||||||||
73 | - Recommendation | |||||||||||||||||||||||||
74 | ||||||||||||||||||||||||||
75 | - Advise | |||||||||||||||||||||||||
76 | - Logical reasoning | |||||||||||||||||||||||||
77 | - Optimization | |||||||||||||||||||||||||
78 | - Simulation | |||||||||||||||||||||||||
79 | ||||||||||||||||||||||||||
80 | ||||||||||||||||||||||||||
81 | In Atmospheric Research (study of gases): | |||||||||||||||||||||||||
82 | - Regression Analysis; Bivariate Regression | |||||||||||||||||||||||||
83 | - Correlation Analysis; Bias Correlation | |||||||||||||||||||||||||
84 | - Decision Tree | |||||||||||||||||||||||||
85 | - Machine Learning | |||||||||||||||||||||||||
86 | - Data Mining | |||||||||||||||||||||||||
87 | - Data Fusion | |||||||||||||||||||||||||
88 | - Computational Tools | |||||||||||||||||||||||||
89 | - Constrained Variational Analysis | |||||||||||||||||||||||||
90 | - Model Simulations | |||||||||||||||||||||||||
91 | - Ratios | |||||||||||||||||||||||||
92 | - Time Series Analysis | |||||||||||||||||||||||||
93 | - Spectral Analysis | |||||||||||||||||||||||||
94 | - Temporal Trending; Trend Analysis | |||||||||||||||||||||||||
95 | - Revised Averaging Scheme | |||||||||||||||||||||||||
96 | - Forward Modeling; Inverse Modeling | |||||||||||||||||||||||||
97 | - Radiative Transfer Model | |||||||||||||||||||||||||
98 | - Bayesian Synthesis Inversion | |||||||||||||||||||||||||
99 | - Gaussian Distribution | |||||||||||||||||||||||||
100 | - Exponential Differentiation |
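
As a small illustration of the Linear/Non-linear Regression entry (row 33), the sketch below fits a simple linear regression of a sounding temperature Y on a satellite-retrieved temperature X using the closed-form least-squares formulas. The function name `fit_simple_linear` and the temperature values are hypothetical and made up for illustration; they are not taken from any data set referenced in this sheet.

```python
def fit_simple_linear(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Hypothetical helper for illustration; uses the closed-form
    solution for a single explanatory variable.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up example values (kelvin): X = satellite-retrieved, Y = sounding.
satellite_temp = [250.0, 260.0, 270.0, 280.0, 290.0]
sounding_temp = [251.2, 260.7, 271.1, 280.4, 291.0]

intercept, slope = fit_simple_linear(satellite_temp, sounding_temp)
print(f"Y = {intercept:.2f} + {slope:.3f} * X")
```

A slope near 1 and a small intercept would suggest the two temperature measurements agree closely; a nonlinear model would instead need an iterative fitting method, as the row 33 description notes.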