1
NameLinkResource or datasetcategories and labelsNotes
2
Open Officehttps://www.openoffice.org/Data analysisFree, open source alternative to the Microsoft Office suite, including a spreadsheet tool. Often handles CSV files better than Excel.
3
ExcelData analysisThe industry standard spreadsheet programme and a powerful analytical tool
4
Google Sheetshttps://drive.google.com/Data analysisCollaborative online spreadsheet similar to Microsoft Excel, with some additional tools for extracting data from other web sources
5
Open Refinehttp://openrefine.org/Data cleaningData cleaning power tool for your spreadsheets. Very good for getting naming consistency and doing routine data cleaning tasks such as stripping out white space and changing capitalisation. Can also be used to extract data from web applications, for instance the OpenCorporates API
6
MetaCleanhttp://www.adarsus.com/en/metaclean.htmlData cleaningManage and scrub metadata on a range of documents including PDF and MS Office files. Can also be used to look at the metadata from a range of files.
7
ABBYY FineReaderhttps://www.abbyy.com/finereader/Data extractionVery good OCR tool for extracting searchable, machine-readable data from scans. There is also a simplified online version that charges you per sheet.
8
PDF Tableshttps://pdftables.com/Data extractionExports PDFs into Excel or Google spreadsheets. Only appropriate for relatively "clean" PDFs. Will not recognise handwriting or poor quality scans.
9
Morph.iohttps://morph.io/Data extractionOpen source platform for hosting web scrapers that extract data from websites
10
Import.iohttps://www.import.io/Data extractionPlatform that lets you scrape data from websites using an intuitive graphical user interface
11
wgethttps://www.gnu.org/software/wget/Data extractionCommand line tool for downloading whole contents of websites
12
HTTrackhttps://www.httrack.com/Data extractionDesktop tool for downloading the whole contents of websites to a local drive. Good for tracking and monitoring web content.
13
Google reverse image searchhttps://images.google.com/Data extractionSearching for the origin of an image, or where else it occurs on the web, by uploading it to Google
14
Exif viewerhttp://araskin.webs.com/exif/exif.htmlData extractionMetadata extraction from photos
15
Google Chrome Scraper
https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?hl=en
Data extractionGoogle Chrome plug-in for extracting tables on websites into spreadsheets for analysis.
16
SQLitehttps://www.sqlite.org/Data storageLightweight relational database. Requires knowledge of SQL.
17
VIShttps://vis.occrp.org/Data visualizationOnline network visualization tool. Still in beta but already used by a number of GW staff to map out connections between people, assets and legal entities. Best for use with very heterogeneous and smaller datasets. If you have large amounts of structured networked data (e.g. a scraped company registry), Linkurious is a better option.
18
Datawrapperhttp://datawrapper.de/Data visualizationEasy-to-use tool for creating simple, interactive charts. We have a subscription that enables us to create branded charts with our colour scheme and logo. Email Data Lead for a log-in.
19
Tableauhttp://www.tableau.com/Data visualizationPowerful data visualisation tool for creating maps and charts for online and print. Email Data Lead for an account.
20
Gephihttps://gephi.org/Data visualizationNetwork visualization and analysis tool, good for working with large volumes of highly structured data
21
CartoDBhttps://cartodb.com/Data visualizationPowerful online mapping tool for producing a range of different maps. Particularly good for working with point data where you have precise coordinates for events. GW has a subscription that enables us to create private maps. Can also be used for geocoding addresses.
22
Document Cloudhttps://www.documentcloud.org/Document analysisPlatform for storing, publishing and analysing documents and other text data. Has built-in OCR for scans and also attempts to extract named entities (e.g. places, companies, people) from documents.
23
Linkurioushttps://linkurious.globalwitness.orgData visualizationTool for visualising large sets of networked data. Used in Companies We Keep Report and Narco-a-lago. Requires support from a technician to import databases, but can be a powerful way to explore them.
24
Alephhttps://github.com/alephdata/alephData storageOCCRP-developed tool for making large datasets and document sets searchable. See data.occrp.org. Global Witness is in the process of implementing one at data.globalwitness.org. Data import requires support of the Data Team.
25
QGIShttps://www.qgis.org/en/site/GeospatialFree tool
26
Jupyter Notebookshttps://jupyter.org/softwareData analysisTool for making code for data processing and analysis transparent. Normally used with the Python programming language. Example: https://github.com/Global-Witness/the-companies-we-keep/blob/master/companies_we_keep_analysis.ipynb
27
Pythonhttps://www.python.org/softwareAllHighly flexible programming language for extracting, transforming and analysing data. Suits a range of tasks that off-the-shelf solutions like the tools here can't handle. The Data Team can talk to you about what it can do. Particularly useful for web scraping.
28
Neo4Jhttps://neo4j.com/softwareData storageDatabase that powers Linkurious. Used to store Panama Papers and Paradise Papers public data. Useful for linking and storing large sets of networked data and for pattern querying. For an example of it in use see the Companies We Keep report and its identification of circular ownership patterns: https://www.globalwitness.org/it/blog/how-we-mined-worlds-first-open-data-register-company-control/. Talk to the Data Team if you think it may be useful.
29
CronosData storageSoftware for querying certain types of Russian database. Global Witness has a virtual machine with a license for this software. Talk to the Data Team to get access, or if you think you have or will be receiving data in Cronos format.
30
Tabulahttps://tabula.technology/Data extractionTool for extracting tables from PDFs. Does not in general work on scanned documents.
31
WorldBank microdata
https://microdata.worldbank.org/index.php/home
datasetworld socioeconomic data
32
Central Bank of Armenia Databank
https://www.cba.am/en/SitePages/statdatabank1.aspx
datasettime series of aggregated statistical data (in Armenia)
33
IPUMS international data
https://international.ipums.org/international-action/variables/group
datasetcensus data, social science and health
34
ՀՀ վիճակագրական կոմիտե (Statistical Committee of RA)https://www.armstat.am/am/datasetmany sectors: agriculture, finances, labour market, etc.
35
DHS Statcompilerhttps://www.statcompiler.com/en/datasetdemographic and health indicators
36
DHS Program
https://dhsprogram.com/publications/Journal-Articles-Search.cfm?C_id=71&cn=Armenia&page=1
datasetjournal search by population indicator
37
Intro to Econometrics with Rhttps://www.econometrics-with-r.org/resourcedigital textbook, economics, programming
38
Principles of Econometrics with Rhttps://bookdown.org/ccolonescu/RPoE4/resourcedigital textbook, economics, programming
39
Data Science: Theories, Models, Algorithms, and Analytics
https://srdas.github.io/MLBook/resourcedigital textbook, data science
40
R for Data Sciencehttps://r4ds.had.co.nz/resourcedigital textbook, programming, data science
41
Python Data Science Handbook
https://jakevdp.github.io/PythonDataScienceHandbook/
resourcehandbook, programming, data science
42
edXhttps://www.edx.org/resourceonline courses
43
USAID Foreign Aid
https://explorer.usaid.gov/cd/ARM?implementing_agency_id=1
datasetUS Foreign Aid to Armenia
44
Foreign Aid & Assistance to Armenia
https://foreignassistance.gov/explore/country/Armenia
dataset
Comprehensive Datasets for Foreign Assistance & Aid to Armenia
45
USAID Development Data Libraryhttps://data.usaid.gov/datasetData on a Wide Range of Indicators Worldwide
46
USAID Development Cooperation Landscapehttps://explorer.usaid.gov/donor/armeniadatasetCountry-level Development Cooperation Data
47
IDEA (International Data & Economic Analysis)https://idea.usaid.gov/querydatasetUSAID's comprehensive source of economic and social data
48
IPUMS international datahttps://ipums.orgdatasetIPUMS International has data on the Armenian economy
49
Kagglehttps://www.kaggle.comdataset
50
Data Science: Theories, Models, Algorithms, and Analytics
https://srdas.github.io/MLBook/resourcebookReally good applied ML Book
51
Data 8 at UC Berkeleyhttp://data8.orgcourse
52
CS 188 UC Berkeleyhttps://inst.eecs.berkeley.edu/~cs188/sp20/courseBerkeley's Introduction to AI
53
Data 100 at UC Berkeleyhttp://www.ds100.orgcourse
54
Zil (powered by ISTC)https://zil.am/courses/courseonline course: AI, software, business, etc.
55
Introduction To Statistical Learning
http://faculty.marshall.usc.edu/gareth-james/ISL/
bookData Analysis
56
quanteconhttps://quantecon.org/CourseData AnalysisBig Community of Economics teaching and applications in Julia and Python
57
Weapons of Math Destructionhttps://weaponsofmathdestructionbook.combookMath for Data Science
If you want to be able to trust your AI outputs, then you need to read this book. It explains some of the different avenues by which bias can infiltrate your data and algorithms and what you can do about it.
58
Cancer Today: World Health Organizationhttps://gco.iarc.fr/today/homeData Visualization Data VisualizationData Visualization for global burden of cancer
59
Armenian Named Entity Recognitionhttps://arxiv.org/abs/1810.08699NLP163000-token named entity corpus automatically generated and annotated from Wikipedia, and another 53400-token corpus of news sentences with manual annotation of people, organization and location named entities
60
Quantopianhttps://www.quantopian.com/lecturescourseData Analysis / Algo TradingLectures and tutorials on algorithmic trading
61
DataSet Search by Googlehttps://datasetsearch.research.google.comdataset
62
Google Colabs
https://colab.research.google.com/notebooks/welcome.ipynb
resourceonline codinga platform for online coding collaborations
63
Datacampdatacamp.comcourseData AnalysisGreat resource to help beginners get acquainted with data science. The resource is paid, but you can find their slides online for free.
64
Jupyter Notebookshttps://jupyter.orgresourcethis should have been the first thing on the list :/
65
Armenian Khan Academy
https://hy.khanacademy.org/math/statistics-probability
courseStatistics and probabilityArmenian video walkthroughs
66
Stitch Fix Bloghttps://multithreaded.stitchfix.com/blog/resourceData Science BlogStitch Fix is one of the top companies applying DS to retail. They have a blog dedicated to explaining a lot of the stuff they do.
67
Medium Towards Data Sciencehttps://towardsdatascience.com/resourceData Science BlogMedium's Towards Data Science page has become the go-to place to post data science articles, tutorials, and resources.
68
YerevaNNhttps://yerevann.com/resourceData Science Research
69
Stanford Neural Network Course
https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
courseNN
70
Linear Digressionshttp://lineardigressions.com/resourceData Science PodcastKatie Malone and Ben Jaffe host Linear Digressions, a weekly podcast that explores recent developments in data science, machine learning, and artificial intelligence. The hosts are good friends, and their rapport makes each episode very accessible and easy to understand.
71
Data Skeptichttps://dataskeptic.com/resourceData Science PodcastData Skeptic is one of the best-known data science podcasts. This weekly show explores topics in data science, statistics, machine learning and artificial intelligence.
72
Towards Data Sciencehttps://towardsdatascience.comresourceData Scienceconcepts explained
73
GapMinder (Global Facts) https://www.gapminder.orgdatasets/visualizations Datasets and visualizations on global indicators
74
Netflix's Tech Bloghttps://netflixtechblog.com/resourceData Science TechThis is Netflix's tech blog, where they post anything tech related
75
Open Source Data Science Master'shttp://datasciencemasters.orgcourseData Science Open source data science master's, with many references to online courses and books
76
List of Armenian Startups
https://angel.co/companies?locations[]=22926-Yerevan
resource
77
Atomhttps://atom.io/resourcetext editor
78
Sublime Texthttps://www.sublimetext.com/softwaretext editor
79
Pycharmhttps://www.jetbrains.com/pycharm/softwareIDE
80
Rodeohttps://rodeo.yhat.com/softwareIDE
81
IntelliJhttps://www.jetbrains.com/idea/softwareIDE
82
Apache Zeppelinhttps://zeppelin.apache.org/softwareIDE
83
DataBricks Notebooks
https://docs.databricks.com/notebooks/index.html
softwareIDE
84
Analyzing Big Data with Twitter
https://blogs.ischool.berkeley.edu/i290-abdt-s12/
courseBig DataOpen Source Course at UC Berkeley
85
Data Cataloghttps://catalog.data.gov/datasetdatasetDataset Search
86
Spyderhttps://www.spyder-ide.org/softwareIDE
87
Anacondahttps://www.anaconda.com/softwareIDE
88
https://github.com/jvns/pandas-cookbookresourceData StructuresData Structure Library
89
Data.worldhttps://data.world/datasets/armeniadatasetsocioeconomic, etc.
90
List of Armenian Startups
https://trello.com/b/dhtZ7B6Q/armenian-tech-landscape
resourceArmenia Startup
91
Framingham Heart Study https://framinghamheartstudy.org
92
Talking Machines
93
Five Thirty Eighthttps://fivethirtyeight.com/resource / datasetData Science BlogNate Silver's Data Science
94
Nurses Health Study https://www.nurseshealthstudy.org/researchersPublications/request access to datasources
95
UCI Machine Learning Repository https://archive.ics.uci.edu/ml/index.phpdatasetSeveral comprehensive datasets
96
Quandlhttps://www.quandl.com/datasetSource for financial, economic, and alternative datasets, serving investment professionals
97
Academic Torrentshttp://academictorrents.com/browse.phpdatasetSeveral comprehensive datasets
98
Caucasus Research Resource Centerhttp://www.crrccenters.org/2datasets / resourceDatasets and Analysis toolsData on all the Caucasus
99
Humanitarian Data Exchange https://data.humdata.org/group/armdataset
100
Canva for designhttps://www.canva.com/resourceDesign
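
Example for the Open Refine, Python and SQLite entries: a minimal Python sketch of the routine cleaning tasks mentioned in the notes (stripping out white space, changing capitalisation, getting naming consistency), followed by storing the result in a SQLite table. It uses only the Python standard library. The function names, the `companies` table and the sample values are hypothetical and for illustration only; this is not how Open Refine itself works internally.

```python
import sqlite3


def clean_name(raw: str) -> str:
    """Collapse runs of whitespace and normalise capitalisation."""
    return " ".join(raw.split()).title()


def load_names(names):
    """Store cleaned, de-duplicated names in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout: one row per distinct cleaned name.
    conn.execute("CREATE TABLE companies (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO companies (name) VALUES (?)",
        [(clean_name(n),) for n in names],
    )
    conn.commit()
    return conn


# Hypothetical sample values, not taken from any real dataset.
sample = ["  acme   holdings ltd", "ACME HOLDINGS LTD ", "Borealis  trading"]
conn = load_names(sample)
rows = [r[0] for r in conn.execute("SELECT name FROM companies ORDER BY name")]
print(rows)
```

The two spellings of the first company collapse into a single row once white space and capitalisation are normalised. Note that `str.title()` is a blunt instrument: it will change abbreviations such as "LLC" to "Llc", so real data usually needs more careful rules.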