Name | Description | Author | ESDA Types: P, R, A (Preparation, Reduction, Analysis) | ESDA Goals (#1 to #10) | Notes
See https://docs.google.com/spreadsheets/d/11fobj9rkwOBtYwqSg5ne2LgouPteDCW15PC42wxmzEU/edit#gid=0 for descriptions of Types and Goals.

TOOLS
R | R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. (Wikipedia) | Steve | P, R, A | 1, 2, 4, 5, 8 | What languages are used for; what people are doing with them
Google 'tool' and "analytics"
Many tools used together; Tools have useful capabilities for specific tasks
Facilitate Data Analytics: To determine what is available to facilitate data analytics
Categorize by discipline
SAS | SAS (Statistical Analysis System) is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics. SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated. (Wikipedia) | Steve | P, R, A | 1, 2, 4, 5, 8
Python | Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale. (Wikipedia) | Steve | P, R, A | Use of programming language to reach all goals?
Java | Java is a general-purpose computer programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible. | Steve | P, R, A | Use of programming language to reach all goals?
C++ | C++ is a general-purpose programming language. It has imperative, object-oriented and generic programming features, while also providing facilities for low-level memory manipulation. | Steve | P, R, A | Use of programming language to reach all goals?
SPSS | SPSS Statistics is a software package used for statistical analysis. | Beth | A (There is also an available module that does Data Preparation) | 5, 6, 7, and 9
MATLAB | MATLAB is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, Fortran and Python. | Beth | P, R, A | 1, 2, 4, 5, 8 | It might be useful to differentiate between COTS tools and programming languages/development environments. I think something like MATLAB can be used to develop programs that achieve the ESDA goals mentioned, but there may be a lot of programming involved in getting there. Some other tools, on the other hand, are already programmed to perform certain functions (even if they still need some customization). Also, my impression is that some of the tools are better at handling very large data sets than others, and this may affect their ability to achieve ESDA goals, even if they have a desired capability. That is, they may be capable of performing some sort of analytical task on data, but unable to do it (efficiently) on very large data sets. I wonder if that sort of information should be captured somehow.
Minitab | Minitab is a statistics package developed at the Pennsylvania State University. | Beth | A | 6 and 7
CPLEX | IBM ILOG CPLEX Optimization Studio (often informally referred to simply as CPLEX) is an optimization software package. | Beth | P, R, A
GAMS | GAMS is a high-level modeling system for mathematical optimization. | Barb
Tableau | A tool that enables data visualization using a drag-and-drop interface. | Barb
Spotfire | A tool that enables data mining and visualization of very large data sets. Similar to Excel but apparently easier to use for large data sets. | Barb
VBA | (Visual Basic for Applications) An implementation of Visual Basic that enables user-defined functions and interaction with the Windows API and libraries. | Barb
Excel | A spreadsheet program created by Microsoft that enables data analysis and visualization. It includes VBA. | Barb
MySQL | MySQL is an open-source relational database management system (RDBMS). | Steve | A | 5, 8
JavaScript | A high-level interpreted language used by most websites and browsers. | Steve | N/A?
Perl | A high-level interpreted scripting language frequently used on UNIX computers. It is often used to wrap other programs together. | Steve | P, A | Could be used for any goals?
PHP | A scripting language designed for web development. It can be used to create CGI (Common Gateway Interface) executables for web pages. | Steve | N/A?
Open Source Databases | http://webresourcesdepot.com/25-alternative-open-source-databases-engines/ | Steve | A | 5, 8
Parallel NetCDF | Parallel NetCDF is a library providing high-performance parallel I/O while still maintaining file-format compatibility with Unidata's NetCDF, specifically the formats of CDF-1 and CDF-2. | Brian
AWS | Amazon Web Services (AWS) is a collection of cloud computing services that make up the on-demand computing platform offered by Amazon.com. | Brian
Hadoop | Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. | Brian
GIS | A geographic information system is a system designed to capture, store, manipulate, analyze, manage, and present all types of spatial or geographical data. | Brian
ROI-PAC | ROI-PAC is a software package created by the Jet Propulsion Laboratory division of NASA and Caltech for processing SAR images to create InSAR images, called interferograms. | Chung-Lin | P, A | 4, 6, (2, 3, 5)
GDAL | Geospatial Data Abstraction Library (GDAL) is a computer software library for reading and writing raster and vector geospatial data formats, and is released under the permissive X/MIT style free software license by the Open Source Geospatial Foundation. | Chung-Lin | P, A | 4, (2, 3, 5, 6)

TECHNIQUES
Machine Learning | Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. | Chung-Lin | P, A, R | 4, 6, 7, 8, (9)
Data Mining | Data mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets (“big data”) involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. | Bob
Natural Language Processing | Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation. | N | Bob
Linear/Non-linear Regression | In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable Y (e.g., a sounding temperature) and one or more explanatory variables (or independent variables) denoted X (or X1, X2, ...) (e.g., the satellite-retrieved temperature(s)). The case of one explanatory variable is called simple linear regression. In statistics, nonlinear regression is a form of regression analysis in which observational data (e.g., Y) are modeled by a function which is a nonlinear combination of the model parameters (e.g., aX + bX^2 + ...) and depends on one or more independent variables (e.g., X or X1, X2, ...). The data are fitted by a method of successive approximations. | Bob
Time Series Models | Time series models are used to represent trends, often graphically, from measurements taken in temporal sequence. | Bob
Clustering | Clustering is an approach to organizing objects into a classification and can be accomplished using various methods, including statistical techniques. | Bob
Decision Tree | A decision tree is a graphical representation of the sequence of decisions to be completed when answering a particular question. | Tiffany
Factor Analysis | Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. | Tiffany
Principal Component Analysis | Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. | Tiffany
Neural Networks | | Tiffany
Bayesian Techniques | Bayesian analysis is a method of statistical inference (named for English mathematician Thomas Bayes) that allows one to combine prior information about a population parameter with evidence from information contained in a sample to guide the statistical inference process. | Tiffany
Text Analytics | Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful intelligence. | Ethan
Graph Analytics | Graph analytics leverages graph structures to understand, codify, and visualize relationships that exist in a network. | Ethan
Visual Analytics | Visual analytics is a form of inquiry in which data that provides insight into solving a problem is displayed in an interactive, graphical manner. | Ethan
MapReduce | MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. | Ethan
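
The MapReduce entry above can be illustrated with a minimal single-process sketch in Python. It only mimics the three conceptual steps (map, group by key, reduce) on one machine and is not a distributed implementation. The function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample records are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result (a count)."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records; on a real cluster these would be file splits.
records = ["aerosol ozone aerosol", "ozone methane", "aerosol"]
counts = map_reduce(records)
print(counts)
```

In a framework such as Hadoop, the map and reduce calls would run in parallel on different nodes; the sketch keeps them sequential so the data flow is easy to follow.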

INTEGRATED PRODUCTS
EarthServer (http://www.earthserver.eu) | EarthServer has established open ad-hoc analytics on massive Earth Science data. | Bar | P, R, A | 1, 5, 6, 7, 9
NASA Earth Exchange (https://nex.nasa.gov/nex/) | NEX is a platform for scientific collaboration, knowledge sharing and research for the Earth science community. | Bar | A | 5, 6, 7, 8, 9
EDEN (http://cda.ornl.gov/projects/eden/#) | EDEN is a visual analytics tool for exploring multivariate data sets. | Bar | A | 1, 5, 6, 7, 8, 9
EARTHDATA (https://earthdata.nasa.gov) | Earthdata.nasa.gov offers access to community-based resources that enable the analysis of Earth science data, including web-accessible tools and services for data discovery, data products and services that pertain to disciplinary topics, and links to the Distributed Active Archive Centers (DAACs) and to relevant community initiatives and organizations. | Bar | A | 4
Giovanni (http://giovanni.gsfc.nasa.gov/giovanni/) | Quick data visualization, exploration, and analysis tool, with data download capability. | Steve | P, R, A | 2, 4, 5, 6, 9
“The Field Guide to Data Science,” Booz Allen Hamilton, 2015

Data Science:
- Describe
  - Processing
  - Enrichment
- Discover
  - Regression
  - Clustering
  - Hypothesis Testing
- Predict
  - Regression
  - Recommendation
- Advise
  - Logical reasoning
  - Optimization
  - Simulation

In Atmospheric Research (study of gases):
- Regression Analysis; Bivariate Regression
- Correlation Analysis; Bias Correlation
- Decision Tree
- Machine Learning
- Data Mining
- Data Fusion
- Computational Tools
- Constrained Variational Analysis
- Model Simulations
- Ratios
- Time Series Analysis
- Spectral Analysis
- Temporal Trending; Trend Analysis
- Revised Averaging Scheme
- Forward Modeling; Inverse Modeling
- Radiative Transfer Model
- Bayesian Synthesis Inversion
- Gaussian Distribution
- Exponential Differentiation
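
As an illustration of the regression and correlation items in the list above, here is a minimal Python sketch of simple linear regression (one explanatory variable) fitted by ordinary least squares, together with the Pearson correlation coefficient and the mean difference between the two series. It follows the sounding-versus-satellite temperature example from the Linear/Non-linear Regression entry. The temperature values and the names `satellite_k`, `sounding_k`, and `simple_linear_regression` are made up for illustration; they are not measurements or part of any listed tool.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical temperatures in kelvin (illustrative values only).
satellite_k = [280.0, 285.0, 290.0, 295.0]  # explanatory variable X
sounding_k = [281.0, 286.5, 290.5, 296.0]   # dependent variable Y

slope, intercept, r = simple_linear_regression(satellite_k, sounding_k)

# Mean difference (sounding minus satellite), a simple bias estimate.
bias = sum(y - x for x, y in zip(satellite_k, sounding_k)) / len(satellite_k)

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f} bias={bias:.2f}")
```

A nonlinear fit would replace the closed-form slope and intercept with an iterative solver, in line with the "method of successive approximations" mentioned in the regression entry.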