1 of 31

Data Sources and Types

2 of 31

Idongesit Efaemiode Eteng (Ph.D.)
Computer Science Department
University of Calabar

3 of 31

Presentation Outline

  • Introduction 
  • Brief overview of the importance of data in social science and other research
  • Data Sources
  • Data Types
  • Hands-on Practice 1
  • Data Collection Software
  • R and Python as Data Collection Tools
  • Ethical Considerations for Data Use
  • Hands-on Practice 2 and 3

4 of 31

Brief overview of the importance of data in social science and other research areas

  • Empirical Foundation
  • Informed Decision-Making
  • Understanding Complex Social Phenomena
  • Testing Hypotheses
  • Predictive Modeling
  • Policy Evaluation
  • Cross-Cultural and Cross-National Comparisons
  • Longitudinal Studies
  • Ethical Considerations

5 of 31

Primary Data

Primary data is original data collected directly by researchers for a particular purpose.

Examples of primary data include:

  • Surveys
  • Observations
  • Experiments
  • Interviews
  • Focus Groups
  • Diaries or Journals
  • Sensor Data
  • Biological Samples

6 of 31

Primary Data

Pros of Primary Data:

  • Relevance:
  • Control:
  • Freshness:
  • Customization:
  • Uniqueness:

Cons of Primary Data:

  • Cost and Time-Consuming:
  • Sampling Issues:
  • Data Collection Errors:
  • Ethical Considerations:
  • Limited Historical Data:
  • Availability:

7 of 31

Secondary Data

Secondary data is data that was collected by someone else for a purpose other than the current research.

Researchers use secondary data in their own studies to answer research questions or investigate phenomena without having to collect new data.

8 of 31

Secondary Data

Examples of Secondary Data:

  • Census Data
  • Surveys: e.g., surveys published by the WHO
  • Academic Journals
  • Published sources such as archives, newspapers, and diaries
  • Company Reports: Financial reports, market research, and sales data published by companies

Pros of Secondary Data:

  • Cost and Time Efficiency
  • Historical and Longitudinal Analysis
  • Large Sample Sizes
  • Comparative Studies
  • Ethical Considerations

Cons of Secondary Data:

  • Limited Control
  • Data Availability
  • Data Quality
  • Lack of Specific Variables
  • Contextual Understanding

9 of 31

Public vs. Private Data

Public Data:

  • Accessibility: Openly available from government agencies, public institutions, or other organizations.
  • Ownership: Held by governments, public entities, or organizations.
  • Privacy: Anonymized or aggregated to protect personal privacy.
  • Use: Research, policy analysis, transparency, and public awareness.
  • Examples: Census data, government reports, publicly available research publications, public records (e.g., birth certificates), and data published on government websites.

Private Data:

  • Accessibility: Restricted to the owner and authorized users.
  • Privacy: Often contains sensitive or confidential information.
  • Use: Typically internal purposes, such as decision-making, analytics, and operations within organizations.
  • Examples: Personal financial records, customer databases, proprietary research data, and confidential business information.

10 of 31

Section 2: Types of Data

Numeric Data:

  • Examples: Age, income, test scores
  • Practical Usage: Numeric data is fundamental for statistical analysis, hypothesis testing, and exploring relationships between variables.

String Data:

  • Examples: Textual responses to open-ended survey questions, interview transcripts
  • Practical Usage: String data is used to capture qualitative information in the form of text. Researchers analyze text data to identify themes, sentiments, and patterns in responses.

11 of 31

Types of Data

Boolean Data:

  • Examples: Yes/No responses, binary-coded survey answers
  • Practical Usage: Boolean data is used to record binary or dichotomous choices. Researchers use it to analyze binary outcomes, such as yes/no decisions, agreement/disagreement, or presence/absence of certain attributes in a study.

Temporal Data:

  • Examples: Dates of events, time spent on tasks
  • Practical Usage: Temporal data is used to analyze trends, changes over time, and the timing of events. It is essential for longitudinal studies and time series analysis.
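As a rough illustration of the four types covered so far (numeric, string, Boolean, temporal), one survey response could be represented in Python as sketched below. The field names and values are hypothetical and are not taken from the class survey.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SurveyResponse:
    """One hypothetical survey response, with one field per data type."""
    age: int             # numeric
    comment: str         # string (free text)
    has_internet: bool   # Boolean (yes/no)
    submitted_on: date   # temporal


response = SurveyResponse(
    age=34,
    comment="Mobile banking saves me time.",
    has_internet=True,
    submitted_on=date(2023, 10, 5),
)

# Temporal values support date arithmetic, e.g. days elapsed since submission.
days_elapsed = (date(2023, 10, 12) - response.submitted_on).days
print(type(response.age).__name__, type(response.has_internet).__name__, days_elapsed)
```

Choosing the type up front determines which operations make sense later: averages for numeric fields, counts for Boolean fields, and date differences for temporal fields.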

12 of 31

Types of Data

Geospatial Data:

  • Examples: Latitude and longitude coordinates, GIS data
  • Practical Usage: Geospatial data is vital for analyzing spatial patterns, location-based phenomena, and spatial relationships. Social scientists use it for studying urban planning, regional disparities, and environmental factors affecting communities.
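A common first step with latitude and longitude data is computing the distance between two points. The sketch below uses the haversine formula with a mean Earth radius, which is an approximation; the coordinates are hypothetical, rounded values chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical, approximate (latitude, longitude) pairs for two respondents.
calabar = (4.96, 8.33)
lagos = (6.52, 3.38)
print(round(haversine_km(*calabar, *lagos)), "km")
```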

Categorical Data:

  • Examples: Gender, ethnicity, education level
  • Practical Usage: Categorical data classifies individuals into distinct categories. Researchers use it for demographic analysis, studying group differences, and conducting cross-tabulations.


Ordinal Data:

  • Examples: Survey responses on a Likert scale (e.g., strongly agree, agree, neutral, disagree, strongly disagree)
  • Practical Usage: Ordinal data represents ordered categories with meaningful rank. Researchers use it to assess attitudes, preferences, and satisfaction levels.
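Because ordinal categories have a rank but no guaranteed equal spacing, they are often coded as integers and summarized with the median rather than the mean. The following is a minimal sketch with hypothetical responses to a single Likert item.

```python
from statistics import median

# Ordered categories: the position in the list carries the rank.
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: i + 1 for i, label in enumerate(LIKERT)}  # codes 1..5

# Hypothetical responses to one Likert item.
responses = ["agree", "neutral", "strongly agree", "agree", "disagree"]
codes = [RANK[r] for r in responses]

# The median respects order without assuming equal distances between categories.
median_code = median(codes)
median_label = LIKERT[int(median_code) - 1]
print(codes, median_label)
```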

Network Data:

  • Examples: Social network connections, communication networks
  • Practical Usage: Network data helps analyze social interactions, influence, and information diffusion within social structures. Social scientists use it for studying social ties, collaborations, and communication patterns.
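Network data is usually stored as a list of ties (edges) between actors. The sketch below builds an adjacency list from hypothetical ties and counts each person's direct connections (degree), one of the simplest measures of how connected someone is.

```python
from collections import defaultdict

# Hypothetical undirected ties: each pair means "these two communicate".
edges = [("Ada", "Bola"), ("Ada", "Chidi"), ("Bola", "Chidi"), ("Chidi", "Dayo")]

# Build an adjacency list: person -> set of direct contacts.
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Degree is the number of direct ties each person has.
degree = {person: len(contacts) for person, contacts in neighbors.items()}
most_connected = max(degree, key=degree.get)
print(degree, most_connected)
```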

13 of 31

Types of Data

Panel Data:

  • Examples: Repeated observations of the same individuals over time
  • Practical Usage: Panel data combines temporal and numeric data to analyze individual-level changes over time. It is valuable for longitudinal studies in the social sciences.
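Panel data is often kept in "long" format, with one row per individual per time point. The sketch below, using hypothetical income figures, groups rows by person and computes each person's change between the first and last observed year.

```python
from collections import defaultdict

# Hypothetical long-format panel: one row per (person, year) observation.
panel = [
    {"person_id": 1, "year": 2021, "income": 1000.0},
    {"person_id": 1, "year": 2022, "income": 1200.0},
    {"person_id": 2, "year": 2021, "income": 800.0},
    {"person_id": 2, "year": 2022, "income": 750.0},
]

# Group observations by individual, keyed by year.
by_person = defaultdict(dict)
for row in panel:
    by_person[row["person_id"]][row["year"]] = row["income"]

# Within-person change between the first and last observed year.
change = {
    pid: series[max(series)] - series[min(series)]
    for pid, series in by_person.items()
}
print(change)
```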

Bibliometric Data:

  • Examples: Citation records, publication metadata (authors, titles, journals), citation networks
  • Practical Usage: Bibliometric data is used to analyze scholarly publications and their impact.

Bioinformatics Data:

  • Examples: Genomic sequences, protein structures, gene expression data
  • Practical Usage: Bioinformatics data can be applied to interdisciplinary studies involving genetics, health behavior, and epidemiology.

Digital Trace Data:

  • Examples: Social media posts, website clickstreams, online reviews, mobile phone call records.
  • Practical Usage: Digital trace data offers insights into human behavior and interactions in online spaces. Social scientists use it to analyze social media sentiments, study information diffusion, examine online political discourse, and understand patterns of digital communication.


14 of 31

Types of Data: Other Classification Schemes

  • Structured, semi-structured, and unstructured
  • Quantitative vs. qualitative
  • Categorical vs. continuous
  • Primary vs. secondary
  • Cross-sectional vs. longitudinal
  • Operational vs. analytical
  • Monomodal vs. multimodal
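The first distinction in the list (structured, semi-structured, unstructured) is easiest to see side by side. The sketch below parses a small CSV table, a JSON object, and a free-text comment; all three samples are hypothetical.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = list(csv.DictReader(io.StringIO("age,has_internet\n34,yes\n51,no\n")))

# Semi-structured: self-describing keys, but fields may vary or nest.
semi_structured = json.loads('{"age": 34, "devices": ["phone", "laptop"]}')

# Unstructured: free text with no predefined fields.
unstructured = "I mostly use my phone to sell goods online."

print(structured[0]["age"], semi_structured["devices"], len(unstructured.split()))
```

Note that the CSV reader returns every value as text, so numeric columns still need explicit conversion before analysis.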

15 of 31

Hands-on Session 1

1. Using the data instruments provided in class, fill in the survey questions: https://forms.gle/BfvrFZ7feeg7ehFx8
2. Separate into groups of five members.
3. Brainstorm on the hypotheses that can be postulated from the survey.
4. What sort of analysis can be carried out?

16 of 31

SUGGESTED ANSWERS TO HANDS-ON 1

  • Hypothesis 1: Age is a significant predictor of ICT usage for wealth creation.
    • Analysis: Perform a one-way ANOVA or Kruskal-Wallis test to determine if there are age-related differences in the use of ICT for wealth creation.
  • Hypothesis 2: Access to the internet has a relationship with intention to expand ICT usage for wealth creation.
    • Analysis: Use a chi-squared test or logistic regression to examine the relationship between internet access and future plans for ICT use in wealth creation.
  • Hypothesis 3: The primary benefits of using ICT for wealth creation vary based on location (urban, suburban, rural).
    • Analysis: Perform chi-squared tests or logistic regression to explore if location influences the selection of primary benefits.
  • Hypothesis 4: Respondents who have received formal ICT training are more likely to report increased income through ICT.
    • Analysis: Conduct t-tests or ANOVA to compare income levels between those who received training and those who did not.
  • Hypothesis 5: Gender influences the perception of benefits and challenges in wealth creation through ICT.
    • Analysis: Conduct chi-squared tests or logistic regression to assess whether gender is associated with specific benefits and challenges related to ICT-based wealth creation.
  • Hypothesis 6: The sentiment expressed in additional comments is related to the perceived benefits or challenges of ICT use for wealth creation.
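Several of the analyses above rely on a chi-squared test of association between two categorical variables. The sketch below computes the Pearson chi-squared statistic by hand for a 2x2 table so the arithmetic is visible. The counts are hypothetical, no continuity correction is applied, and in practice a statistics package would also report the p-value.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed_count - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = internet access (yes, no),
# columns = plans to expand ICT use (yes, no).
observed = [[30, 10],
            [10, 30]]
stat = chi_squared(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# 3.841 is the standard 5% critical value for 1 degree of freedom.
print(stat, dof, stat > 3.841)
```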

17 of 31

SUGGESTED ANSWERS TO HANDS-ON 1: Potential Analyses

  • Descriptive Statistics: Calculate descriptive statistics (e.g., means, percentages) for demographic variables like age, gender, and location to understand the survey sample's characteristics.
  • Chi-Squared Tests: Use chi-squared tests to examine associations between categorical variables, such as gender and benefits/challenges of ICT use for wealth creation.
  • Logistic Regression: Perform logistic regression to assess the likelihood of future plans for ICT use based on demographic factors like age, gender, and internet access.
  • ANOVA/Kruskal-Wallis: Analyze the impact of age on various aspects of ICT usage for wealth creation, such as income levels or the number of ICT devices used.
  • Content Analysis: For open-ended questions (e.g., Additional Comments), conduct content analysis to identify recurring themes, concerns, and suggestions related to wealth creation through ICT.
  • Correlation Analysis: Examine correlations between variables, such as the relationship between income and perceived benefits of ICT use.
  •  Sentiment Analysis:
  • Causation Analysis:
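The descriptive statistics and correlation analysis listed above can be sketched with the Python standard library alone. The rows below are hypothetical responses, and the variable layout (age, location, income, benefit score) is an assumption made for illustration.

```python
from collections import Counter
from math import sqrt
from statistics import mean

# Hypothetical responses: (age, location, monthly income, perceived-benefit score 1-5).
rows = [
    (25, "urban", 100.0, 2),
    (32, "urban", 200.0, 3),
    (41, "rural", 300.0, 4),
    (38, "suburban", 400.0, 5),
]

# Descriptive statistics: a mean for a numeric variable, percentages for a categorical one.
mean_age = mean(row[0] for row in rows)
location_pct = {loc: 100 * n / len(rows) for loc, n in Counter(row[1] for row in rows).items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


corr = pearson_r([row[2] for row in rows], [row[3] for row in rows])
print(mean_age, location_pct, round(corr, 3))
```

A correlation close to 1 or -1 indicates a strong linear relationship, but it does not by itself establish causation.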

18 of 31

Section 3: Data Sources

Databases:

  • Examples:
    • Relational databases (e.g., MySQL, PostgreSQL)
    • NoSQL databases (e.g., MongoDB, Cassandra)

Online Databases:

Web Scraping:

  • Examples: Collecting data from websites by extracting information with Python tools such as Scrapy.

19 of 31

Data Sources

Digital Trace Data:

  • Examples: Analyzing digital traces left by users, such as clickstreams, app usage logs, or online behavior.

Government Data Sources:

  • Examples: Data provided by government agencies, such as census data, economic indicators, and public health statistics.
  • Citations: U.S. Census Bureau (https://www.census.gov/), Bureau of Labor Statistics (https://www.bls.gov/), World Bank Data (https://data.worldbank.org/)

20 of 31

Data Sources

Academic Databases:

Examples: Accessing academic literature and datasets from sources like PubMed, JSTOR, and IEEE Xplore.

Citations:

PubMed (https://pubmed.ncbi.nlm.nih.gov/)

JSTOR (https://www.jstor.org/)

IEEE Xplore (https://ieeexplore.ieee.org/)

Social Media APIs:

Examples: Utilizing APIs of platforms like Facebook, Instagram, or LinkedIn to collect social media data.

Citations: API documentation for each platform (e.g., Facebook Graph API: https://developers.facebook.com/docs/graph-api/)

Subscription-based Data Providers:

Examples: Paid access to specialized data sources, such as financial market data from Bloomberg or consumer behavior data from Nielsen.

Citations: Provider-specific sources, as these often require subscriptions.

21 of 31

Data Sources

Market Research Firms:

  • Examples: Subscription-based access to market research reports and data from firms like Nielsen, Forrester, and Gartner.
  • Citations:

Nielsen (https://www.nielsen.com/)

Forrester (https://go.forrester.com/)

Gartner (https://www.gartner.com/)

Environmental and Scientific Data Repositories:

  • Examples: Accessing climate data from sources like NOAA or astronomical data from NASA.
  • Citations:
    • National Oceanic and Atmospheric Administration (NOAA): https://www.noaa.gov/
    • NASA Open Data Portal: https://data.nasa.gov/

22 of 31

Section 4: Data Collection Software

  • Google Forms:

Google Forms is a free survey tool integrated with Google Workspace.

  • SurveyMonkey:

Popular online survey platform for creating and distributing surveys and questionnaires

  • Qualtrics

Robust survey and data collection platform.

  • REDCap:

Research Electronic Data Capture (REDCap) is a secure web application for building and managing online surveys and databases.

  • Open Data Kit (ODK):

ODK is an open-source data collection and survey platform designed for mobile data collection in challenging environments.

23 of 31

Data Collection Software

  • NVivo

Qualitative data analysis software designed for researchers working with unstructured or text-based data.

  • Tableau

Powerful data visualization tool that helps transform data into interactive and visually appealing charts, graphs, and dashboards.

  • Excel

Spreadsheet software that plays a fundamental role in data analysis. It allows users to perform basic data cleaning, manipulation, and statistical analysis.

  • SPSS (Statistical Package for the Social Sciences)

Specialized statistical software used for in-depth quantitative data analysis.

  • RapidMiner

Data science platform that combines data preparation, machine learning, and advanced analytics.

24 of 31

Data Collection Software - R

Using R for Data Collection:

Web Scraping: R has packages like rvest and httr that enable web scraping. Researchers can extract data from websites by sending HTTP requests, parsing HTML, and collecting structured information. The collected data can be stored in data frames for further analysis.

API Access: R provides packages like httr and jsonlite that facilitate interaction with APIs. Researchers can access data from various online sources, including social media platforms, government databases, and third-party data providers.

Data Entry and Surveys: R can be used to create custom data entry forms or surveys using packages like shiny, while questionr helps with processing the collected survey data. This allows researchers to collect structured data directly from participants or data entry personnel.

Data Import: R supports importing data from various file formats, including Excel, CSV, JSON, and databases. Researchers can use packages like readxl, readr, and DBI to import data into R for analysis.

25 of 31

Data Collection Software - Python

Using Python for Data Collection:

Web Scraping: Python is widely used for web scraping, with libraries like Beautiful Soup and Requests. Researchers can automate the process of navigating web pages, extracting data, and storing it in various formats, such as CSV or databases.
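The core of scraping is turning HTML markup into rows of data. The sketch below uses the standard-library html.parser module rather than Beautiful Soup so that it runs without installing anything, and it parses a hypothetical HTML snippet instead of fetching a live page. In real use the page would come from an HTTP response, and the site's terms of use should be checked first.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this string would come from an HTTP response.
PAGE = """
<table>
  <tr><td class="state">Cross River</td><td class="value">12</td></tr>
  <tr><td class="state">Lagos</td><td class="value">30</td></tr>
</table>
"""


class CellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self.rows:
            self.rows[-1].append(data.strip())


parser = CellParser()
parser.feed(PAGE)
print(parser.rows)
```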

API Access: Python has libraries like requests and pandas that allow researchers to make API requests and work with JSON data. This is useful for collecting data from platforms like Twitter, Reddit, or web services.
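An API request typically has two halves: building a URL with query parameters, and decoding the JSON that comes back. The sketch below shows both using only the standard library and makes no network call. The endpoint, parameters, and response body are all hypothetical; real platforms define their own and usually require an access key.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters (not a real service).
BASE_URL = "https://api.example.org/v1/posts"
params = {"query": "ict wealth creation", "limit": 2}
request_url = f"{BASE_URL}?{urlencode(params)}"

# Stand-in for the JSON body an API might return for that request.
response_body = '{"results": [{"id": 1, "text": "Great tool"}, {"id": 2, "text": "Too costly"}]}'

# Decode the JSON and pull out the field of interest from each record.
payload = json.loads(response_body)
texts = [item["text"] for item in payload["results"]]
print(request_url)
print(texts)
```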

Data Entry and Forms: Python web frameworks like Django and Flask can be used to create web-based data entry forms or survey applications. Researchers can customize forms for data collection and store the submitted data in databases.

Data Import: Python has extensive support for importing data from different file formats. Libraries like pandas, openpyxl, and sqlite3 can be used to read data from Excel files, CSV files, SQLite databases, and more.
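A minimal sketch of the import step, using only the standard-library csv and sqlite3 modules: CSV text is parsed, converted to numeric types, loaded into an in-memory SQLite table, and queried back. The column names and values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; a real file would be opened with open(path, newline="").
csv_text = "age,income\n34,1200.50\n51,800.00\n"
records = [
    (int(row["age"]), float(row["income"]))
    for row in csv.DictReader(io.StringIO(csv_text))
]

# Load the rows into an in-memory SQLite database and query them back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (age INTEGER, income REAL)")
conn.executemany("INSERT INTO responses VALUES (?, ?)", records)
avg_income = conn.execute("SELECT AVG(income) FROM responses").fetchone()[0]
conn.close()
print(records, avg_income)
```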

26 of 31

Ethical Considerations

Importance of ethical data usage and protection

  • Privacy Preservation:
  • Trust and Reputation:
  • Data Security:
  • Fairness and Equity:
  • Transparency and Accountability:
  • Legal and Regulatory Compliance:
  • Data Governance and Management:
  • Research and Innovation:

Guidelines and best practices

  • Data Privacy and Consent:
  • Data Security:
  • Data Quality:
  • Data Governance:
  • Data Retention and Deletion:
  • Transparency:
  • Data Sharing and Collaboration:
  • Data Access Control:
  • Data Ethics and Bias:
  • Compliance with Regulations:
  • Data Backup and Disaster Recovery:
  • Data Documentation:
  • Data Training and Awareness
  • Data Monitoring and Auditing:
  • Ethical Data Research:
  • Data Use for Decision-Making:
  • Data Destruction:

27 of 31

Conclusion

  • Data is the foundation of all research.
  • Databases store data in an organized form.
  • Data can be obtained from many sources.
  • Technologies and tools for capturing data are varied.
  • The major reason for gathering data is to make informed decisions through analysis.
  • There are ethical guidelines for the proper use of data.

28 of 31

Hands-on Session 2

Using MySQL, create a database called ICT. Then create a table called FINANCE using five data types identified from the survey data gathered in Hands-on Session 1.

Hint

To create a database called "ICT" and a table called "FINANCE" in MySQL using data types identified from the survey data, you would typically follow these steps:

1. Create the database:

```sql
CREATE DATABASE ICT;
```

This SQL command creates a new database named "ICT".

2. Use the database:

```sql
USE ICT;
```

This command selects the "ICT" database for further operations.

3. Create the "FINANCE" table:

The example below has six columns covering four MySQL data types (INT, VARCHAR, DECIMAL, and TIMESTAMP). Replace the column names and data types with those relevant to your specific survey data.

```sql
CREATE TABLE FINANCE (
    ID INT AUTO_INCREMENT PRIMARY KEY,  -- unique row identifier
    Name VARCHAR(255),                  -- respondent name (string)
    Age INT,                            -- age in years (numeric)
    Income DECIMAL(10, 2),              -- income with two decimal places (numeric)
    Education VARCHAR(50),              -- education level (categorical, stored as text)
    Timestamp TIMESTAMP                 -- submission time (temporal)
);
```
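For anyone without a MySQL server at hand, the same table can be tried out from Python with the built-in sqlite3 module. This is only a stand-in sketch: SQLite's column types differ from MySQL's (for example, AUTOINCREMENT instead of AUTO_INCREMENT, and TEXT for timestamps), and the inserted respondent is hypothetical.

```python
import sqlite3

# SQLite stand-in for the MySQL FINANCE table; column types are adapted, not identical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE FINANCE (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name TEXT,
        Age INTEGER,
        Income NUMERIC,
        Education TEXT,
        Timestamp TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# Hypothetical respondent; placeholders (?) keep values separate from the SQL text.
conn.execute(
    "INSERT INTO FINANCE (Name, Age, Income, Education) VALUES (?, ?, ?, ?)",
    ("Respondent A", 34, 1200.50, "BSc"),
)
row = conn.execute("SELECT ID, Name, Age, Income, Education FROM FINANCE").fetchone()
conn.close()
print(row)
```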

30 of 31

Q&A

31 of 31

THANK YOU