1 of 19

Intro To Biostatistics

Statistics is the branch of applied mathematics involving the collection, organization, analysis, interpretation, and presentation of data to inform decisions and draw conclusions across various fields.
Biostatistics: The application of statistical methods to biological and health-related data. It covers collecting, analyzing, interpreting, and presenting information in fields like medicine, public health, and genetics.

2 of 19

Intro To Biostatistics

Data: A collection of individual facts, numbers, measurements, or observations that are collected for analysis.

Datum: A single piece of information. "Data" is the plural: a collection of such pieces.

Variable: Any characteristic, number, or quantity that can be measured or counted and that may vary in value among different individuals or over time. Examples include age, gender, height, or test scores. In computer science, by contrast, a variable is a container for storing a data value.

5 of 19

Classification of data (1)

Quantitative Data:

Data that can be measured numerically.

Examples: Height, weight, age, or the number of bacteria colonies on a plate.

Qualitative Data

Data that represents categories or qualities and cannot be measured numerically.

Examples: Gender, eye color, stage of a disease (e.g., I, II, III, IV), or blood type.
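
As a rough illustration of the distinction above, the following Python sketch sorts the columns of a small patient table into quantitative and qualitative. The records and the helper name `classify_column` are hypothetical, invented for this example. The rule it applies (every value is a number) is a simplification: a disease stage coded as 1 to 4 would pass the check but is still qualitative.

```python
from numbers import Number

# Hypothetical patient records, invented for illustration only.
patients = [
    {"height_cm": 172.5, "age": 34, "blood_type": "A", "eye_color": "brown"},
    {"height_cm": 160.0, "age": 51, "blood_type": "O", "eye_color": "blue"},
    {"height_cm": 181.2, "age": 27, "blood_type": "AB", "eye_color": "brown"},
]


def classify_column(values):
    """Return 'quantitative' if every value is a number, else 'qualitative'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if all_numeric else "qualitative"


# Classify each column by looking at all of its values.
column_types = {
    name: classify_column([p[name] for p in patients])
    for name in patients[0]
}
print(column_types)
```

Here `height_cm` and `age` come out as quantitative, while `blood_type` and `eye_color` come out as qualitative.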

7 of 19

Classification of data (2)

Nominal Data: Data that can only be categorized into names or labels without an inherent order or rank. You can count the frequency of each category, but not order them meaningfully.
Examples: Gender (male, female), hair color (blonde, brown, black), nationality.

Ordinal Data: Data with categories that have a natural, meaningful order or ranking, but the differences (intervals) between the values are not quantifiable or necessarily equal.
Examples: Customer satisfaction ratings (poor, fair, good, excellent), educational levels (high school, college, graduate), letter grades.
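
A minimal Python sketch of what each type allows, using made-up values: nominal data can be counted (so the mode is available), while ordinal data can also be sorted (so the median is available). The variable names and the sample lists are assumptions for illustration.

```python
from collections import Counter

# Nominal: categories can be counted, but have no meaningful order.
hair_colors = ["brown", "black", "blonde", "brown", "black", "brown"]
hair_counts = Counter(hair_colors)
most_common_color = hair_counts.most_common(1)[0][0]  # the mode

# Ordinal: categories have a rank, stated here explicitly from low to high.
SATISFACTION_ORDER = ["poor", "fair", "good", "excellent"]
rank = {label: i for i, label in enumerate(SATISFACTION_ORDER)}

ratings = ["good", "poor", "excellent", "fair", "good"]
sorted_ratings = sorted(ratings, key=rank.__getitem__)

# The median only needs an ordering, so it is meaningful for ordinal data.
# (Odd-length list, so the middle element is the median.)
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(most_common_color, sorted_ratings, median_rating)
```

Note that the ranks 0 to 3 are used only for sorting. Averaging them would assume equal spacing between categories, which ordinal data does not guarantee.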

8 of 19

Quantitative Data

Discrete Data: Data that can only take on distinct, separate values, typically whole numbers, obtained by counting.
Examples: The number of students in a class, the number of cars in a parking lot, the number of phone calls received.
Continuous Data: Data that can assume any value within a specific range or interval and can be infinitely subdivided into finer levels of precision, obtained by measurement.
Examples: Height, weight, temperature, time.
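
The sketch below contrasts the two in Python, with invented numbers. A count is always a whole number, while a measurement is recorded only as finely as the instrument allows. The helper `looks_discrete` is a hypothetical heuristic, not a rule: continuous data recorded in whole units (for example, age in years) would fool it.

```python
# Discrete: obtained by counting, so values are whole numbers.
calls_per_day = [3, 0, 7, 2, 3]
total_calls = sum(calls_per_day)

# Continuous: obtained by measuring. The recorded value depends on the
# precision of the instrument, not on the quantity itself.
true_height_cm = 172.4638  # hypothetical measurement
recorded = {digits: round(true_height_cm, digits) for digits in (0, 1, 2)}


def looks_discrete(values):
    """Heuristic: True if every value is a whole number."""
    return all(float(v).is_integer() for v in values)


print(total_calls, recorded)
print(looks_discrete(calls_per_day), looks_discrete([172.5, 160.0]))
```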

9 of 19

Dichotomous and polytomous

	Dichotomous	Polytomous
Meaning	Divided into two parts or values	Divided into more than two parts or values
Example variable	A Yes/No question; Gender (Male/Female)	A rating scale (e.g., Strongly Disagree, Disagree, Agree, Strongly Agree)

Dichotomous refers to something with only two possible values, while polytomous refers to something with more than two.

This distinction applies to variables, items, and scoring methods: a dichotomous variable has two categories (e.g., Yes/No), while a polytomous variable has more than two.
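
The two-versus-more-than-two rule can be sketched in Python as follows. The function name `category_type` is hypothetical, and it judges only the categories actually observed: a Yes/No question answered "Yes" by everyone in a sample would be reported as undetermined even though the variable itself is dichotomous.

```python
def category_type(values):
    """Label a categorical variable by its number of distinct observed categories."""
    n_categories = len(set(values))
    if n_categories == 2:
        return "dichotomous"
    if n_categories > 2:
        return "polytomous"
    return "undetermined"  # fewer than two categories observed


answers = ["Yes", "No", "No", "Yes"]
scale = ["Agree", "Disagree", "Strongly Agree", "Agree", "Strongly Disagree"]

print(category_type(answers))
print(category_type(scale))
```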

10 of 19

Levels of measurement

Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded.

There are 4 levels of measurement:

Nominal: the data can only be categorized
Ordinal: the data can be categorized and ranked
Interval: the data can be categorized, ranked, and evenly spaced
Ratio: the data can be categorized, ranked, and evenly spaced, and have a natural zero.
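
The four levels build on each other, which the Python sketch below tries to capture. The temperature example is an added illustration, not taken from the slides: Celsius is commonly treated as interval (no natural zero), while Kelvin is ratio. The names `LEVELS`, `PROPERTIES`, and `properties_of` are assumptions made for this example.

```python
# Each level has every property of the previous level, plus one more.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
PROPERTIES = ["categorize", "rank", "equal spacing", "natural zero"]


def properties_of(level):
    """Return the properties available at a given level of measurement."""
    return PROPERTIES[: LEVELS.index(level) + 1]


# Celsius is interval: differences are meaningful, ratios are not.
c_low, c_high = 10.0, 20.0
celsius_difference = c_high - c_low  # a real 10-degree difference
celsius_ratio = c_high / c_low       # 2.0, but 20 C is not "twice as hot"

# Kelvin has a natural zero (ratio level), so its ratio is meaningful.
kelvin_ratio = (c_high + 273.15) / (c_low + 273.15)  # about 1.035

for level in LEVELS:
    print(level, properties_of(level))
print(celsius_ratio, round(kelvin_ratio, 3))
```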

14 of 19

Data collection and types

Primary Data:

Original data collected for the first time from a population for a specific study.

Secondary Data:

Data that has already been collected by another agency or for a different purpose and is then used for a new analysis.

15 of 19

Advantages Of Primary Data

Highly Relevant and Specific: The data is tailored precisely to the researcher's needs and research questions, ensuring it is highly applicable to the problem at hand.

Accuracy and Reliability: The researcher has full control over the data collection methods and quality, leading to a higher degree of confidence in the accuracy and consistency of the data.

Up-to-Date: Because the researcher collects the data in real time, the information is current and reflects the latest trends or conditions.

Proprietary Information: The data is unique to the research and not available to competitors, which can provide a competitive edge.

16 of 19

Disadvantages Of Primary Data

Time-Consuming: Designing the study, collecting the data, and analyzing it can take months to years.

Costly: Collecting original data can be expensive, requiring significant resources in terms of manpower, materials, and funding for recruitment and fieldwork.

Resource-Intensive: Requires expertise in research design and data collection to avoid bias or data quality issues.

Limited Scope: Due to resource constraints, primary data collection may be limited to a specific population or location, potentially limiting the generalizability of findings.

17 of 19

Advantages Of Secondary Data

Cost-Effective: Utilizing existing data is much cheaper than collecting new data from scratch.

Time-Saving: Data is immediately available, allowing for quick insights and a faster start to the analysis phase.

Broad Scope and Context: Secondary data often provides large-scale, historical, or longitudinal data (e.g., census data), which would be impossible for an individual researcher to gather alone.

Foundation for Primary Research: Secondary research can help define research questions, identify gaps in existing knowledge, and provide a solid background before beginning costly primary research.

18 of 19

Disadvantages Of Secondary Data

May Not Be Specific: The data was collected for another purpose and may not perfectly align with the current research needs, potentially requiring the researcher to adjust their research question.

Data Quality Concerns: The researcher has no control over the original data collection methodology, sampling, or quality control, which could introduce unknown biases or errors.

Outdated Information: The data might be old and not reflect current market conditions or trends, which is particularly problematic in fast-moving fields.

Lack of Control: The researcher has limited control over the data format, structure, or content, which can make analysis or integration with other datasets challenging.
