A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Name | Link | Resource or dataset | categories and labels | Notes | ||||||||||||||||||
2 | Open Office | https://www.openoffice.org/ | Data analysis | Free, open-source alternative to the Microsoft Office suite, including a spreadsheet tool. Often handles CSV files better than Excel. | |||||||||||||||||||
3 | Excel | Data analysis | The industry-standard spreadsheet programme and a powerful analytical tool | ||||||||||||||||||||
4 | Google Sheets | https://drive.google.com/ | Data analysis | Collaborative online spreadsheet similar to Microsoft Excel, with some additional tools for extracting data from other web sources | |||||||||||||||||||
5 | Open Refine | http://openrefine.org/ | Data cleaning | Data cleaning power tool for your spreadsheets. Very good for getting naming consistency and doing routine data cleaning tasks such as stripping out white space and changing capitalisation. Can also be used to extract data from web applications, for instance the OpenCorporates API | |||||||||||||||||||
6 | MetaClean | http://www.adarsus.com/en/metaclean.html | Data cleaning | Manage and scrub metadata on a range of documents including PDF and MS Office files. Can also be used to look at the metadata from a range of files. | |||||||||||||||||||
7 | ABBYY FineReader | https://www.abbyy.com/finereader/ | Data extraction | Very good OCR tool for extracting searchable, machine-readable data from scans. There is also a simplified online version that charges you per sheet. | |||||||||||||||||||
8 | PDF Tables | https://pdftables.com/ | Data extraction | Exports PDFs into Excel or Google spreadsheets. Only appropriate for relatively "clean" PDFs. Will not recognise handwriting or poor-quality scans. | |||||||||||||||||||
9 | Morph.io | https://morph.io/ | Data extraction | Open source platform for hosting web scrapers that extract data from websites | |||||||||||||||||||
10 | Import.io | https://www.import.io/ | Data extraction | Platform that lets you scrape data from websites using an intuitive graphical user interface | |||||||||||||||||||
11 | wget | https://www.gnu.org/software/wget/ | Data extraction | Command line tool for downloading the whole contents of websites | |||||||||||||||||||
12 | HTTrack | https://www.httrack.com/ | Data extraction | Desktop tool for downloading the whole contents of websites to a local drive. Good for tracking and monitoring web content. | |||||||||||||||||||
13 | Google reverse image search | https://images.google.com/ | Data extraction | Search for the origin of an image, or where else it occurs on the web, by uploading it to Google | |||||||||||||||||||
14 | Exif viewer | http://araskin.webs.com/exif/exif.html | Data extraction | Metadata extraction from photos | |||||||||||||||||||
15 | Google Chrome Scraper | https://chrome.google.com/webstore/detail/scraper/mbigbapnjcgaffohmbkdlecaccepngjd?hl=en | Data extraction | Google Chrome plug-in for extracting tables on websites into spreadsheets for analysis. | |||||||||||||||||||
16 | SQLite | https://www.sqlite.org/ | Data storage | Lightweight relational database. Requires knowledge of SQL. | |||||||||||||||||||
17 | VIS | https://vis.occrp.org/ | Data visualization | Online network visualization tool. Still in beta but already used by a number of GW staff to map out connections between people, assets and legal entities. Best for use with very heterogeneous and smaller datasets. If you have large amounts of structured networked data (e.g. a scraped company registry), Linkurious is a better option. | |||||||||||||||||||
18 | Datawrapper | http://datawrapper.de/ | Data visualization | Easy-to-use tool for creating simple, interactive charts. We have a subscription that enables us to create branded charts with our colour scheme and logo. Email Data Lead for a log-in. | |||||||||||||||||||
19 | Tableau | http://www.tableau.com/ | Data visualization | Powerful data visualisation tool for creating maps and charts for online and print. Email Data Lead for an account. | |||||||||||||||||||
20 | Gephi | https://gephi.org/ | Data visualization | Network visualization and analysis tool, good for working with large volumes of highly structured data | |||||||||||||||||||
21 | CartoDB | https://cartodb.com/ | Data visualization | Powerful online mapping tool for producing a range of different maps. Particularly good for working with point data where you have precise coordinates for events. GW has a subscription that enables us to create private maps. Can also be used for geocoding addresses. | |||||||||||||||||||
22 | Document Cloud | https://www.documentcloud.org/ | Document analysis | Platform for storing, publishing and analysing documents and other text data. Has built-in OCR for scans and also attempts to extract named entities (e.g. places, companies, people) from documents. | |||||||||||||||||||
23 | Linkurious | https://linkurious.globalwitness.org | Data visualization | Tool for visualising large sets of networked data. Used in the Companies We Keep report and Narco-a-lago. Requires support from a technician to import databases, but can be a powerful way to explore them. | |||||||||||||||||||
24 | Aleph | https://github.com/alephdata/aleph | Data storage | OCCRP-developed tool for making large datasets and document sets searchable. See data.occrp.org. Global Witness is in the process of implementing one at data.globalwitness.org. Data import requires support of the Data Team. | |||||||||||||||||||
25 | QGIS | https://www.qgis.org/en/site/ | Geospatial | Free tool | |||||||||||||||||||
26 | Jupyter Notebooks | https://jupyter.org/ | software | Data analysis | Tool for making code for data processing and analysis transparent. Normally used with the Python programming language. Example: https://github.com/Global-Witness/the-companies-we-keep/blob/master/companies_we_keep_analysis.ipynb | ||||||||||||||||||
27 | Python | https://www.python.org/ | software | All | Highly flexible programming language for extracting, transforming and analysing data. Suits a range of tasks that off-the-shelf solutions like the tools here can't handle. The Data Team can talk to you about what it can do. Particularly useful for web scraping. | ||||||||||||||||||
28 | Neo4j | https://neo4j.com/ | software | Data storage | Database that powers Linkurious. Used to store Panama Papers and Paradise Papers public data. Useful for linking and storing large sets of networked data and for pattern querying. For an example of it in use, see the Companies We Keep report and its identification of circular ownership patterns: https://www.globalwitness.org/it/blog/how-we-mined-worlds-first-open-data-register-company-control/. Talk to the Data Team if you think it may be useful. | ||||||||||||||||||
29 | Cronos | Data storage | Software for querying certain types of Russian database. Global Witness has a virtual machine with a license for this software. Talk to the Data Team to get access, or if you think you have or will be receiving data in Cronos format. | ||||||||||||||||||||
30 | Tabula | https://tabula.technology/ | Data extraction | Tool for extracting tables from PDFs. Does not generally work on scanned documents. | |||||||||||||||||||
31 | WorldBank microdata | https://microdata.worldbank.org/index.php/home | dataset | world socioeconomic data | |||||||||||||||||||
32 | Central Bank of Armenia Databank | https://www.cba.am/en/SitePages/statdatabank1.aspx | dataset | time series of aggregated statistical data (in Armenia) | |||||||||||||||||||
33 | IPUMS international data | https://international.ipums.org/international-action/variables/group | dataset | census data, social science and health | |||||||||||||||||||
34 | ՀՀ վիճակագրական կոմիտե (Statistical Committee of RA) | https://www.armstat.am/am/ | dataset | many sectors: agriculture, finances, labour market, etc. | |||||||||||||||||||
35 | DHS Statcompiler | https://www.statcompiler.com/en/ | dataset | demographic and health indicators | |||||||||||||||||||
36 | DHS Program | https://dhsprogram.com/publications/Journal-Articles-Search.cfm?C_id=71&cn=Armenia&page=1 | dataset | journal search by population indicator | |||||||||||||||||||
37 | Intro to Econometrics with R | https://www.econometrics-with-r.org/ | resource | digital textbook, economics, programming | |||||||||||||||||||
38 | Principles of Econometrics with R | https://bookdown.org/ccolonescu/RPoE4/ | resource | digital textbook, economics, programming | |||||||||||||||||||
39 | Data Science: Theories, Models, Algorithms, and Analytics | https://srdas.github.io/MLBook/ | resource | digital textbook, data science | |||||||||||||||||||
40 | R for Data Science | https://r4ds.had.co.nz/ | resource | digital textbook, programming, data science | |||||||||||||||||||
41 | Python Data Science Handbook | https://jakevdp.github.io/PythonDataScienceHandbook/ | resource | handbook, programming, data science | |||||||||||||||||||
42 | edX | https://www.edx.org/ | resource | online courses | |||||||||||||||||||
43 | USAID Foreign Aid | https://explorer.usaid.gov/cd/ARM?implementing_agency_id=1 | dataset | US Foreign Aid to Armenia | |||||||||||||||||||
44 | Foreign Aid & Assistance to Armenia | https://foreignassistance.gov/explore/country/Armenia | dataset | Comprehensive Datasets for Foreign Assistance & Aid to Armenia | |||||||||||||||||||
45 | USAID Development Data Library | https://data.usaid.gov/ | dataset | Data on a Wide Range of Indicators Worldwide | |||||||||||||||||||
46 | USAID Development Cooperation Landscape | https://explorer.usaid.gov/donor/armenia | dataset | Country-level Development Cooperation Data | |||||||||||||||||||
47 | IDEA (International Data & Economic Analysis) | https://idea.usaid.gov/query | dataset | USAID's comprehensive source of economic and social data | |||||||||||||||||||
48 | IPUMS international data | https://ipums.org | dataset | IPUMS International has data on the Armenian economy | |||||||||||||||||||
49 | Kaggle | https://www.kaggle.com | dataset | ||||||||||||||||||||
50 | Data Science: Theories, Models, Algorithms, and Analytics | https://srdas.github.io/MLBook/ | resource | book | Really good applied ML Book | ||||||||||||||||||
51 | Data 8 at UC Berkeley | http://data8.org | course | ||||||||||||||||||||
52 | CS 188 UC Berkeley | https://inst.eecs.berkeley.edu/~cs188/sp20/ | course | Berkeley's Introduction to AI | |||||||||||||||||||
53 | Data 100 at UC Berkeley | http://www.ds100.org | course | ||||||||||||||||||||
54 | Zil (powered by ISTC) | https://zil.am/courses/ | course | online courses: AI, software, business, etc. | |||||||||||||||||||
55 | Introduction To Statistical Learning | http://faculty.marshall.usc.edu/gareth-james/ISL/ | book | Data Analysis | |||||||||||||||||||
56 | quantecon | https://quantecon.org/ | Course | Data Analysis | Big Community of Economics teaching and applications in Julia and Python | ||||||||||||||||||
57 | Weapons of Math Destruction | https://weaponsofmathdestructionbook.com | book | Math for Data Science | If you want to be able to trust your AI outputs, then you need to read this book. It explains some of the different avenues by which bias can infiltrate your data and algorithms and what you can do about it. | ||||||||||||||||||
58 | Cancer Today: World Health Organization | https://gco.iarc.fr/today/home | Data Visualization | Data Visualization | Data Visualization for global burden of cancer | ||||||||||||||||||
59 | Armenian Named Entity Recognition | https://arxiv.org/abs/1810.08699 | NLP | 163000-token named entity corpus automatically generated and annotated from Wikipedia, and another 53400-token corpus of news sentences with manual annotation of people, organization and location named entities | |||||||||||||||||||
60 | Quantopian | https://www.quantopian.com/lectures | course | Data Analysis / Algo Trading | Lectures and Tutorials on algorithmic trading | ||||||||||||||||||
61 | DataSet Search by Google | https://datasetsearch.research.google.com | dataset | ||||||||||||||||||||
62 | Google Colabs | https://colab.research.google.com/notebooks/welcome.ipynb | resource | online coding | a platform for online coding collaborations | ||||||||||||||||||
63 | Datacamp | datacamp.com | course | Data Analysis | Great resource to help beginners get acquainted with data science. The resource is paid, but you can find their slides online for free. | ||||||||||||||||||
64 | Jupyter Notebooks | https://jupyter.org | resource | this should have been the first thing on the list :/ | |||||||||||||||||||
65 | Armenian Khan Academy | https://hy.khanacademy.org/math/statistics-probability | course | Statistics and probability | Armenian video walkthroughs | ||||||||||||||||||
66 | Stitch Fix Blog | https://multithreaded.stitchfix.com/blog/ | resource | Data Science Blog | Stitch Fix is one of the top companies applying DS to retail. They have a blog dedicated to explaining a lot of what they do. | ||||||||||||||||||
67 | Medium Towards Data Science | https://towardsdatascience.com/ | resource | Data Science Blog | Medium's Towards Data Science page has become the go-to page to post data science articles, tutorials, and resources. | ||||||||||||||||||
68 | YerevaNN | https://yerevann.com/ | resource | Data Science Research | |||||||||||||||||||
69 | Stanford Neural Network Course | https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC | course | NN | |||||||||||||||||||
70 | Linear Digressions | http://lineardigressions.com/ | resource | Data Science Podcast | Katie Malone and Ben Jaffe host Linear Digressions, a weekly podcast that explores recent developments in data science, machine learning, and artificial intelligence. The hosts are good friends, and their rapport makes each episode very accessible and easy to understand. | ||||||||||||||||||
71 | Data Skeptic | https://dataskeptic.com/ | resource | Data Science Podcast | Data Skeptic is one of the best-known data science podcasts. This weekly show explores topics in data science, statistics, machine learning and artificial intelligence. | ||||||||||||||||||
72 | Towards Data Science | https://towardsdatascience.com | resource | Data Science | concepts explained | ||||||||||||||||||
73 | GapMinder (Global Facts) | https://www.gapminder.org | datasets/visualizations | Datasets and visualizations on global indicators | |||||||||||||||||||
74 | Netflix's Tech Blog | https://netflixtechblog.com/ | resource | Data Science Tech | Netflix's tech blog, where they post anything tech related | ||||||||||||||||||
75 | Open Source Data Science Master's | http://datasciencemasters.org | course | Data Science | Open source data science master's. Has many references to online courses and books | ||||||||||||||||||
76 | List of Armenian Startups | https://angel.co/companies?locations[]=22926-Yerevan | resource | ||||||||||||||||||||
77 | Atom | https://atom.io/ | resource | text editor | |||||||||||||||||||
78 | Sublime Text | https://www.sublimetext.com/ | software | text editor | |||||||||||||||||||
79 | Pycharm | https://www.jetbrains.com/pycharm/ | software | IDE | |||||||||||||||||||
80 | Rodeo | https://rodeo.yhat.com/ | software | IDE | |||||||||||||||||||
81 | IntelliJ | https://www.jetbrains.com/idea/ | software | IDE | |||||||||||||||||||
82 | Apache Zeppelin | https://zeppelin.apache.org/ | software | IDE | |||||||||||||||||||
83 | DataBricks Notebooks | https://docs.databricks.com/notebooks/index.html | software | IDE | |||||||||||||||||||
84 | Analyzing Big Data with Twitter | https://blogs.ischool.berkeley.edu/i290-abdt-s12/ | course | Big Data | Open Source Course at UC Berkeley | ||||||||||||||||||
85 | Data Catalog | https://catalog.data.gov/dataset | dataset | Dataset Search | |||||||||||||||||||
86 | Spyder | https://www.spyder-ide.org/ | software | IDE | |||||||||||||||||||
87 | Anaconda | https://www.anaconda.com/ | software | IDE | |||||||||||||||||||
88 | https://github.com/jvns/pandas-cookbook | resource | Data Structures | Data Structure Library | |||||||||||||||||||
89 | Data.world | https://data.world/datasets/armenia | dataset | socioeconomic, etc. | |||||||||||||||||||
90 | List of Armenian Startups | https://trello.com/b/dhtZ7B6Q/armenian-tech-landscape | resource | Armenia Startup | |||||||||||||||||||
91 | Framingham Heart Study | https://framinghamheartstudy.org | |||||||||||||||||||||
92 | Talking Machines | ||||||||||||||||||||||
93 | FiveThirtyEight | https://fivethirtyeight.com/ | resource / dataset | Data Science Blog | Nate Silver's Data Science | ||||||||||||||||||
94 | Nurses Health Study | https://www.nurseshealthstudy.org/researchers | Publications/request access to datasources | ||||||||||||||||||||
95 | UCI Machine Learning Repository | https://archive.ics.uci.edu/ml/index.php | dataset | Several comprehensive datasets | |||||||||||||||||||
96 | Quandl | https://www.quandl.com/ | dataset | source for financial, economic, and alternative datasets | serving investment professionals | ||||||||||||||||||
97 | Academic Torrents | http://academictorrents.com/browse.php | dataset | Several comprehensive datasets | |||||||||||||||||||
98 | Caucasus Research Resource Center | http://www.crrccenters.org/2 | datasets / resource | Datasets and Analysis tools | Data on all the Caucasus | ||||||||||||||||||
99 | Humanitarian Data Exchange | https://data.humdata.org/group/arm | dataset | ||||||||||||||||||||
100 | Canva for design | https://www.canva.com/ | resource | Design |
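
Several entries above mention routine cleaning tasks (stripping white space, changing capitalisation) and storing data in SQLite. As a rough illustration of how those two steps might fit together in Python, here is a minimal sketch using only the standard library. The sample rows, the `companies` table name, and the `clean_name`, `load_clean_rows` and `store_rows` helpers are all hypothetical, made up for this example; real data will usually need more careful handling than title-casing.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a messy CSV export.
RAW_CSV = """name,country
  acme HOLDINGS ltd ,armenia
Acme Holdings Ltd,Armenia
 globex   corp,ARMENIA
"""


def clean_name(value):
    """Collapse runs of white space and apply title case."""
    return " ".join(value.split()).title()


def load_clean_rows(raw):
    """Parse CSV text and return cleaned, de-duplicated (name, country) rows."""
    reader = csv.DictReader(io.StringIO(raw))
    seen = set()
    rows = []
    for record in reader:
        row = (clean_name(record["name"]), clean_name(record["country"]))
        if row not in seen:  # keep only the first occurrence of each row
            seen.add(row)
            rows.append(row)
    return rows


def store_rows(rows):
    """Write the rows to an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (name TEXT, country TEXT)")
    conn.executemany("INSERT INTO companies VALUES (?, ?)", rows)
    conn.commit()
    return conn


rows = load_clean_rows(RAW_CSV)
conn = store_rows(rows)
count = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
print(rows, count)
```

Replacing `":memory:"` with a file path would keep the database on disk, so it can be opened later with any SQLite client.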