Data Sources and Types
Idongesit Efaemiode Eteng (Ph.D.)
Computer Science Department
University of Calabar
Presentation Outline
Brief overview of the importance of data in social science and other research areas
Primary Data
Primary data is original data collected directly by researchers for a particular purpose.
Examples of primary data include:
Pros of Primary Data:
Cons of Primary Data:
Secondary Data
Secondary data is data that was collected by someone else for a purpose other than the current research.
Researchers use secondary data in their own studies to answer research questions or investigate phenomena without having to collect new data.
Examples of Secondary Data:
Pros of Secondary Data:
Cost and Time Efficiency:
Historical and Longitudinal Analysis:
Large Sample Sizes:
Comparative Studies:
Ethical Considerations:
Cons of Secondary Data:
Limited Control:
Data Availability:
Data Quality:
Lack of Specific Variables:
Contextual Understanding:
Public vs. Private Data
Public Data:
Private Data:
Section 2: Types of Data
Numeric Data:
String Data:
Boolean Data:
Temporal Data:
Geospatial Data:
Categorical Data:
Ordinal Data:
Network Data:
Panel Data:
Bibliometric Data:
Bioinformatics Data:
Digital Trace Data:
Types of Data – Other classification schemes
Hands-on Session 1
1. Using the data instruments provided in class, fill in the survey questions: https://forms.gle/BfvrFZ7feeg7ehFx8
2. Separate into groups of five members.
3. Brainstorm on the hypotheses that can be postulated from the survey.
4. What sort of analysis can be carried out?
SUGGESTED ANSWERS TO HANDS-ON 1
SUGGESTED ANSWERS TO HANDS-ON 1: Potential Analyses
Section 3: Data Sources
Databases:
Relational Databases (e.g., MySQL, PostgreSQL)
NoSQL Databases (e.g., MongoDB, Cassandra)
Online Databases:
Web Scraping:
Collecting data from websites by extracting information with Python tools such as Scrapy.
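As a rough illustration of what "extracting information" from a web page means, the sketch below pulls link addresses and link text out of a small HTML snippet using only Python's standard library. The page content, the `LinkCollector` class, and the `extract_links` helper are hypothetical names made up for this example. A real scraper would first download the page over HTTP and would more likely use a dedicated library.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Survey Reports</h1>
  <ul>
    <li><a href="/reports/2021.html">Report 2021</a></li>
    <li><a href="/reports/2022.html">Report 2022</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return the links it contains."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
```

The resulting list of tuples is already structured data and can be written to a CSV file or a database table for analysis.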
Digital Trace Data:
Government Data Sources:
Academic Databases:
Examples: Accessing academic literature and datasets from sources like PubMed, JSTOR, and IEEE Xplore.
Citations:
PubMed (https://pubmed.ncbi.nlm.nih.gov/)
JSTOR (https://www.jstor.org/)
IEEE Xplore (https://ieeexplore.ieee.org/)
Social Media APIs:
Examples: Utilizing APIs of platforms like Facebook, Instagram, or LinkedIn to collect social media data.
Citations: API documentation for each platform (e.g., Facebook Graph API: https://developers.facebook.com/docs/graph-api/)
Subscription-based Data Providers:
Examples: Paid access to specialized data sources, such as financial market data from Bloomberg or consumer behavior data from Nielsen.
Citations: Provider-specific sources, as these often require subscriptions.
Market Research Firms:
Nielsen (https://www.nielsen.com/)
Forrester (https://go.forrester.com/)
Gartner (https://www.gartner.com/)
Environmental and Scientific Data Repositories:
Section 4: Data Collection Software
Google Forms is a free survey tool integrated with Google Workspace.
Popular online survey platform for creating and distributing surveys and questionnaires
Robust survey and data collection platform.
Research Electronic Data Capture (REDCap) is a secure web application for building and managing online surveys and databases.
ODK is an open-source data collection and survey platform designed for mobile data collection in challenging environments.
Data Collection Software
Qualitative data analysis software designed for researchers working with unstructured or text-based data.
Powerful data visualization tool that helps transform data into interactive and visually appealing charts, graphs, and dashboards.
Spreadsheet software that plays a fundamental role in data analysis. It allows users to perform basic data cleaning, manipulation, and statistical analysis.
SPSS (Statistical Package for the Social Sciences): specialized statistical software used for in-depth quantitative data analysis.
Data science platform that combines data preparation, machine learning, and advanced analytics.
Data Collection Software - R
Using R for Data Collection:
Web Scraping: R has packages like rvest and httr that enable web scraping. Researchers can extract data from websites by sending HTTP requests, parsing HTML, and collecting structured information. The collected data can be stored in data frames for further analysis.
API Access: R provides packages like httr and jsonlite that facilitate interaction with APIs. Researchers can access data from various online sources, including social media platforms, government databases, and third-party data providers.
Data Entry and Surveys: R can be used to create custom data entry forms or surveys using packages like shiny or questionr. This allows researchers to collect structured data directly from participants or data entry personnel.
Data Import: R supports importing data from various file formats, including Excel, CSV, JSON, and databases. Researchers can use packages like readxl, readr, and DBI to import data into R for analysis.
Data Collection Software - Python
Using Python for Data Collection:
Web Scraping: Python is widely used for web scraping, with libraries like Beautiful Soup and Requests. Researchers can automate the process of navigating web pages, extracting data, and storing it in various formats, such as CSV or databases.
API Access: Python has libraries like requests and pandas that allow researchers to make API requests and work with JSON data. This is useful for collecting data from platforms like Twitter, Reddit, or web services.
Data Entry and Forms: Python web frameworks like Django and Flask can be used to create web-based data entry forms or survey applications. Researchers can customize forms for data collection and store the submitted data in databases.
Data Import: Python has extensive support for importing data from different file formats. Libraries like pandas, openpyxl, and sqlite3 can be used to read data from Excel files, CSV files, SQLite databases, and more.
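The API access and data import points above can be sketched with the standard library alone. The example below is a minimal sketch: the CSV text, the JSON body, the field names, and the `load_records` helper are all hypothetical stand-ins for a real survey export and a real API response. In practice the CSV would come from a file and the JSON from an HTTP request.

```python
import csv
import io
import json

# Hypothetical survey export; in practice this would be read from a file.
CSV_TEXT = "name,age,income\nAda,34,52000.50\nBassey,41,61000.00\n"

# Hypothetical JSON body of the kind a web API might return.
JSON_TEXT = '{"results": [{"name": "Chika", "age": 29, "income": 48000.0}]}'


def load_records(csv_text, json_text):
    """Return a list of (name, age, income) tuples from both sources."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV values arrive as strings, so convert them explicitly.
        records.append((row["name"], int(row["age"]), float(row["income"])))
    for item in json.loads(json_text)["results"]:
        # JSON values already carry types; the conversions keep both sources uniform.
        records.append((item["name"], int(item["age"]), float(item["income"])))
    return records


records = load_records(CSV_TEXT, JSON_TEXT)
```

Converting every value to an explicit type at import time ties back to Section 2: the same column can arrive as string data from a CSV file and as numeric data from an API.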
Ethical Considerations
Importance of ethical data usage and protection
Guidelines and best practices
Conclusion
Hands-on Session 2
Using MySQL, create a database called ICT. Create a table called FINANCE using five data types identified from the survey data gathered in Hands-on Session 1.
Hint
To create a database called "ICT" and a table called "FINANCE" in MySQL using five identified data types from survey data, you would typically follow these steps:
Create the Database:
CREATE DATABASE ICT;
This SQL command creates a new database named "ICT."
Use the Database:
USE ICT;
This command selects the "ICT" database for further operations.
Create the "FINANCE" Table:
Now create the "FINANCE" table. The example below has six columns covering four data types (INT, VARCHAR, DECIMAL, TIMESTAMP). Replace the column names and data types with those relevant to your survey data, adding a fifth type such as DATE or BOOLEAN to meet the exercise requirement.
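CREATE TABLE FINANCE (
    ID INT AUTO_INCREMENT PRIMARY KEY,  -- unique identifier, generated automatically
    Name VARCHAR(255),                  -- variable-length string, up to 255 characters
    Age INT,                            -- whole number
    Income DECIMAL(10, 2),              -- exact decimal: 10 digits in total, 2 after the point
    Education VARCHAR(50),              -- variable-length string, up to 50 characters
    Timestamp TIMESTAMP                 -- date and time value
);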
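To try the table out without a MySQL server, the same structure can be sketched in Python with the built-in sqlite3 module. This is only an approximation under stated assumptions: SQLite stands in for MySQL, so `AUTO_INCREMENT` becomes `INTEGER PRIMARY KEY AUTOINCREMENT`, the `DEFAULT CURRENT_TIMESTAMP` clause is an addition not present in the table above, and the inserted respondent is invented sample data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL "ICT" database.
conn = sqlite3.connect(":memory:")

conn.execute(
    """
    CREATE TABLE FINANCE (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite spelling of AUTO_INCREMENT
        Name VARCHAR(255),
        Age INT,
        Income DECIMAL(10, 2),
        Education VARCHAR(50),
        Timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- assumption: filled in on insert
    )
    """
)

# Invented sample row; real values would come from the survey responses.
conn.execute(
    "INSERT INTO FINANCE (Name, Age, Income, Education) VALUES (?, ?, ?, ?)",
    ("Respondent A", 30, 45000.50, "B.Sc."),
)
conn.commit()

row = conn.execute(
    "SELECT ID, Name, Age, Income, Education, Timestamp FROM FINANCE"
).fetchone()
```

The `?` placeholders let the database driver handle quoting, which is safer than building the SQL string from survey answers by hand.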
References
Q&A
THANK YOU