Intro To Biostatistics
Data: A collection of individual facts, numbers, measurements, or observations that are collected for analysis.
Datum: A single piece of information. "Data" is the plural: a collection of such pieces.
Variable: Any characteristic, number, or quantity that can be measured or counted and that may vary in value among individuals or over time. Examples include age, gender, height, and test scores. In computer science, a variable is a named container for storing a data value.
Classification of data (1)
Quantitative data: Data that can be measured or counted numerically.
Examples: Height, weight, age, or the number of bacteria colonies on a plate.
Qualitative data: Data that represent categories or qualities and cannot be measured numerically.
Examples: Gender, eye color, stage of a disease (e.g., I, II, III, IV), or blood type.
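The quantitative/qualitative split can be illustrated with a minimal Python sketch. The function name `classify_variable` and the sample values are hypothetical, invented only for illustration, and the rule used here (all values numeric means quantitative) is a simplifying assumption rather than a formal definition.

```python
from numbers import Number


def classify_variable(values):
    """Return 'quantitative' if every value is numeric, else 'qualitative'."""
    # bool is a subclass of int in Python, so it is excluded explicitly
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"


# Hypothetical sample values, not taken from any real study
heights_cm = [162.5, 170.0, 181.2]
blood_types = ["A", "O", "AB"]

print(classify_variable(heights_cm))
print(classify_variable(blood_types))
```

Note that this rule looks only at how values are stored. A category coded with numbers (for example, disease stage recorded as 1 to 4) would be labelled quantitative by this sketch even though it is qualitative, so the final decision still rests on what the variable means.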
Classification of data (2)
Qualitative Data
Aspect | Dichotomous | Polytomous |
Meaning | Divided into two parts or values | Divided into more than two parts or values |
Example Variable | A Yes/No question; Gender (Male/Female) | A rating scale (e.g., Strongly Disagree, Disagree, Agree, Strongly Agree) |
Dichotomous refers to something with only two possible values, while polytomous refers to something with more than two.
This distinction applies to variables, items, and scoring methods: a dichotomous variable has two categories (e.g., Yes/No), while a polytomous variable has more than two categories.
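As a rough sketch of this distinction, the Python function below counts the distinct categories observed in a variable. The name `categorical_type` and the sample responses are hypothetical. The function assumes the observed values cover all possible categories, which may not hold in a small sample.

```python
def categorical_type(values):
    """Label a categorical variable by its number of distinct observed categories."""
    n_categories = len(set(values))
    if n_categories < 2:
        # With fewer than two observed categories the variable cannot be labelled
        raise ValueError("need at least two observed categories")
    return "dichotomous" if n_categories == 2 else "polytomous"


# Hypothetical survey responses, for illustration only
yes_no = ["Yes", "No", "Yes", "Yes"]
rating = ["Agree", "Strongly Agree", "Disagree", "Agree"]

print(categorical_type(yes_no))
print(categorical_type(rating))
```

Counting observed values is a convenience. In practice the number of categories is fixed by the study design (for example, the answer options on a questionnaire), not by which answers happen to appear in the data.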
Levels of measurement
Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded.
There are four levels of measurement: nominal, ordinal, interval, and ratio.
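The sketch below assumes the four levels are the standard nominal, ordinal, interval, and ratio scales, and shows one common way to think about them: each higher level supports every summary statistic of the levels below it plus more. The names `Level` and `meaningful_statistics` are hypothetical, and the list of statistics is a simplified convention, not an exhaustive rule.

```python
from enum import IntEnum


class Level(IntEnum):
    NOMINAL = 1   # categories with no order (e.g., blood type)
    ORDINAL = 2   # ordered categories (e.g., disease stage I to IV)
    INTERVAL = 3  # equal spacing, no true zero (e.g., temperature in Celsius)
    RATIO = 4     # equal spacing with a true zero (e.g., weight)


def meaningful_statistics(level):
    """Return summary statistics commonly considered meaningful at a level."""
    stats = ["mode"]
    if level >= Level.ORDINAL:
        stats.append("median")
    if level >= Level.INTERVAL:
        stats.append("mean")
    if level >= Level.RATIO:
        stats.append("coefficient of variation")
    return stats


for level in Level:
    print(level.name, meaningful_statistics(level))
```

Using `IntEnum` makes the ordering of the levels explicit, so the cumulative nature of the scales can be written as simple comparisons.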
Data collection and types
Primary data: Original data collected for the first time from a population for a specific study.
Secondary data: Data that have already been collected by another agency or for a different purpose and are then used for a new analysis.
Advantages Of Primary Data
Highly Relevant and Specific: The data is tailored precisely to the researcher's needs and research questions, ensuring it is highly applicable to the problem at hand.
Accuracy and Reliability: The researcher has full control over the data collection methods and quality, leading to a higher degree of confidence in the accuracy and consistency of the data.
Up-to-Date: Because the researcher collects the data in real-time, the information is current and reflects the latest trends or conditions.
Proprietary Information: The data is unique to the research and not available to other groups, which can give the researcher an advantage.
Disadvantages Of Primary Data
Time-Consuming: Designing a study, collecting the data, and analyzing it can take months to years.
Costly: Collecting original data can be expensive, requiring significant resources in terms of manpower, materials, and funding for recruitment and fieldwork.
Resource-Intensive: Collecting primary data requires expertise in research design and data collection to avoid bias and data quality issues.
Limited Scope: Due to resource constraints, primary data collection may be limited to a specific population or location, potentially limiting the generalizability of findings.
Advantages Of Secondary Data
Cost-Effective: Utilizing existing data is much cheaper than collecting new data from scratch.
Time-Saving: Data is immediately available, allowing for quick insights and a faster start to the analysis phase.
Broad Scope and Context: Secondary data often provides large-scale, historical, or longitudinal data (e.g., census data), which would be impossible for an individual researcher to gather alone.
Foundation for Primary Research: Secondary research can help define research questions, identify gaps in existing knowledge, and provide a solid background before beginning costly primary research.
Disadvantages Of Secondary Data
May Not Be Specific: The data was collected for another purpose and may not perfectly align with the current research needs, potentially requiring the researcher to adjust their research question.
Data Quality Concerns: The researcher has no control over the original data collection methodology, sampling, or quality control, which could introduce unknown biases or errors.
Outdated Information: The data might be old and not reflect current conditions or trends, which is particularly problematic in fast-moving fields.
Lack of Control: The researcher has limited control over the data format, structure, or content, which might make analysis or integration with other datasets challenging.