Presentation on
Introduction to Statistics
Presented by
Mr. Somnath D. Karande
Department of Statistics
Raje Ramrao Mahavidyalaya, Jath
Dist- Sangli
History: The existence of statistics is as old as the existence of human society. Statistics has been in use since man began to count and measure.
The word ‘Statistics’ seems to be derived from the Italian word ‘Statista’ or the Latin word ‘Status’ or the German word ‘Statistik’ or the French word ‘Statistique’. All these words have the same meaning: ‘political state’.
Prof. Gottfried Achenwall used the word ‘Statistik’ at a German university in 1749. In 1771, W. Hooper (an Englishman) used the word ‘Statistics’ in his translation of the famous book ‘Elements of Universal Erudition’ by Baron J. F. von Bielfeld.
Meaning: The word ‘Statistics’ has three different meanings (senses): i) Plural Sense, ii) Singular Sense and iii) Plural of the word ‘Statistic’, which are discussed below.
i) Plural Sense: In the plural sense, the word ‘statistics’ refers to numerical facts and figures collected in a systematic manner with a definite purpose in any field of study. In this sense statistics are aggregates of facts expressed in numerical form. For example, statistics on industrial production, the population of a country in different years, income, imports and exports etc.
ii) Singular Sense: In the singular sense, it refers to the subject ‘Statistics’, the science whose principles and methods are used in the collection, analysis, interpretation and presentation of data. These methods are used to draw conclusions about population parameters.
iii) Plural of the word ‘Statistic’: The word ‘Statistics’ is also used as the plural of the word ‘Statistic’, which refers to a numerical quantity such as the mean, median, mode or standard deviation calculated from sample values.
In brief, ‘statistics’ is used in both the singular and the plural form. It represents i) numerical data, ii) the science of statistics and iii) statistical measures, as is clear from Tate’s statement: “You compute statistics by statistics from statistics.”
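The third sense, a ‘statistic’ as a quantity calculated from sample values, can be illustrated with a minimal Python sketch. The sample values and the helper name `describe` are assumptions made for illustration only; they do not come from the presentation.

```python
import statistics

# Hypothetical sample of 7 observations (e.g. marks of students); illustrative only.
sample = [12, 15, 15, 18, 20, 22, 24]

def describe(values):
    """Return a few common statistics calculated from sample values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Sample standard deviation (divisor n - 1).
        "stdev": statistics.stdev(values),
    }

summary = describe(sample)
print(summary)
```

Each value in `summary` is a statistic in sense iii), while the list `sample` is statistics in the plural sense i).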
The importance of statistics will be clear from the following points.
Thus statistics reveals several aspects of phenomena.
Statistics has been defined in different ways by different writers. In general, it may be defined as follows: “Statistics is the branch of science which deals with the collection, classification, analysis and interpretation of numerical data.”
Statistics plays a vital role in every field of human activity. It has an important role in determining the existing position of per capita income, unemployment, population growth rate, housing, schooling, medical facilities etc. in a country. Statistics is indispensable for bankers, brokers, insurance companies and almost all other fields.
In this age of cut-throat competition, industrialists use statistical procedures to maximize profit. They collect and analyse data relating to sales, raw materials, wages and salaries. Experts then advise on these data using statistical methods such as measures of central tendency and dispersion, correlation and regression analysis, index numbers, sampling, statistical quality control etc.
Industry makes use of statistics in several areas such as administration, planning, production, growth and development. Many industries run a separate ‘Statistical Quality Control’ division. Its main task is to examine, using various control charts, whether manufactured goods meet the desired standard. On-line process capability studies are conducted to set up the machines so that they give the desired standard. Moreover, purchased or semi-finished goods are inspected using acceptance sampling plans of various types.
Statistical techniques have proved immensely useful in the solution of a variety of economic problems. The construction of cost of living and wholesale price index numbers, the compilation of national income accounts, time series analysis and forecasting techniques are some of the very powerful statistical tools used in the analysis of economic data and for framing various economic policies. Following are some of the fields of economics where statistics is extensively used.
Sampling, Time series, Index Numbers, Probability, Correlation and Regression are some other concepts which are used in economic analysis.
Nowadays, because of the ever-increasing complexity of the business and industrial environment, most decision-making processes rely upon quantitative techniques. As far as statistics is concerned, sampling plays an important role in the development of certain criteria. Statistical decision theory focuses on the analysis of complicated business strategies.
A shrewd businessman always makes a proper and scientific analysis of past records in order to predict the future course of business conditions. Index numbers help in predicting the future course of business events. Statistics or statistical methods help the business establishment in analysing business activities such as:
Statistical techniques are used in accountancy and auditing. Financial analysis can be performed well on the basis of data related to cost, expenditure, production etc. The development of inflation accounting is itself an indicator of the importance of statistics. Correlation and regression analysis and theoretical distributions are some important statistical tools used by accountants. In auditing, sampling techniques are widely used for test checking.
Statistics is indispensable for the government for the proper implementation of its policies. A government needs data on population (to know the manpower), agricultural production, industrial production, consumption, saving, unemployment, literacy etc., because they affect the policies of the government in the sphere of taxation, money to be spent on public works and so on.
The government needs statistics on sales, purchases, personnel, prices etc. for running public sector industries (such as railways, waterworks, power supply, post offices, schools, hospitals etc.) on a profitable basis.
A government needs the records of army personnel and weapons. Statistical methods are employed to evaluate the performance of every sort of military equipment from bullets used in pistols to huge missiles.
The methodology of statistics is constantly used in research conducted in various fields. For example, in medical research, statistical methods are used to determine the effectiveness of a new drug.
It is impossible to understand the meaning and implications of research findings in various disciplines without at least a speaking acquaintance with the subject of statistics.
There is hardly any branch of knowledge where statistics is not used. Statistics is useful in any area where decisions are made on the basis of incomplete information.
& more.
A population is the group of items, units or objects under study. The number of individuals belonging to a population is known as the population size and is denoted by N. A population may consist of a finite or an infinite number of units.
For example: If we want to study the expenditure habits of the families in a particular city, then the population will consist of all the households in that city. If we want to study the quality of the manufactured product in an industrial concern during a day, then the population will consist of the day’s total production. Thus a population may be a group of living beings such as people or trees, or a group of non-living things such as cars or houses.
Finite population: A population consisting of a finite number of individuals (or objects) is called a finite population. E.g. the students in a class, the workers in a factory, the industrial production in a day, etc.
Infinite population: A population containing an infinite number of individuals (or objects) is called an infinite population. E.g. all real numbers lying between 10 and 20. Populations that are too large to count in practice, such as the trees in a jungle or the stars in the sky, are also treated as infinite.
There are two methods for collecting data.
i) Census method
ii) Sampling method
Census Method:
The procedure of collecting data from every member of the population i.e. from entire population is called the Census method or Complete Enumeration or 100 % Inspection.
This method is suitable when the scope of the inquiry is limited, when the universe contains units having different characteristics, or when the greatest accuracy is expected and the resources available to the investigator are sufficient.
Limitations of census method:
1) Census method provides reliable results; but due to voluminous work, it is expensive and time consuming. It requires a large amount of manpower.
2) There are some situations where a census is possible but impracticable. For example, when testing the blood of an individual, the entire blood cannot be tested, so a census cannot be used. Similarly, a census cannot be used in testing explosives, testing the average life of electric bulbs produced in a lot, or testing the strength of construction material.
3) If the population is infinite, census cannot be used.
Because of these limitations, we take the help of sampling.
Sample: A sample is any part or fraction of a population selected on some basis. The number of individuals in a sample is called the sample size and is denoted by n.
Sampling: The procedure of collecting data from a sample is called sampling or sampling method.
From the study of the sample, we draw inferences about the entire population. Sampling is quite often used in our day-to-day practical life. For example, in a shop we assess the quality of sugar, wheat or any other commodity by taking a handful of it from the bag and then decide whether or not to purchase it. Similarly, a cook tastes a small portion of a dish to check whether it is properly cooked and contains the proper quantity of salt.
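The idea of drawing an inference about a population from a sample can be sketched in Python. The population below is simulated: the household expenditure figures, the sizes N = 1000 and n = 50, and the fixed seed are all assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: monthly expenditure of N = 1000 households (simulated).
population = [rng.gauss(20000, 4000) for _ in range(1000)]
N = len(population)

# A census would use all N units; sampling uses only n units.
n = 50
sample = rng.sample(population, n)  # drawn without replacement

population_mean = statistics.mean(population)  # what a census would give
sample_mean = statistics.mean(sample)          # estimate based on the sample

print(f"N = {N}, population mean = {population_mean:.0f}")
print(f"n = {n}, sample mean     = {sample_mean:.0f}")
```

The sample mean is computed from only 5 % of the units, yet it will usually lie close to the population mean, which is the basis of the sampling method.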
The main advantages / merits of sampling method over census method are as follows:
(a) In destructive sampling, the quality of an article can be determined only by destroying it in the process of testing, e.g. testing crackers and explosives, testing the life of an electric bulb or tube, or testing the quality of milk or a chemical salt by analysis. If we used the census method, no article would remain available for sale or use. In such cases sampling is the only way.
(b) If the population is too large, infinite, or spread over a large geographical area, e.g. trees in a jungle or fish in the sea, it becomes very difficult to collect information about every unit. In this case sampling is the only practicable method.
b) Simple Random Sampling Without Replacement (SRSWOR):
There is another procedure of selecting units, in which units are selected one by one from the population without replacement, that is, units once drawn are not returned to the population. This method of selecting a sample is called simple random sampling without replacement (SRSWOR). In this method the number of units remaining in the population decreases at each draw. The number of possible samples of size n that can be drawn from N population units without replacement is NCn = N! / [n! (N - n)!].
SRSWOR overcomes the drawback of sampling with replacement, in which the same unit may be selected more than once. When the population is very large compared with the sample, the two methods give nearly the same results.
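A minimal Python sketch of SRSWOR follows. It assumes a small hypothetical population numbered 1 to 10; the helper name `srswor` and the seed are illustrative choices. The standard library function `random.sample` draws without replacement, and `math.comb` gives NCn.

```python
import math
import random

def srswor(units, n, seed=None):
    """Draw a simple random sample of n units without replacement."""
    rng = random.Random(seed)
    return rng.sample(units, n)  # no unit can appear twice

# Hypothetical population of N = 10 units numbered 1 to 10.
population = list(range(1, 11))
n = 3

sample = srswor(population, n, seed=1)
num_possible = math.comb(len(population), n)  # NCn with N = 10, n = 3

print("sample:", sample)
print("number of possible samples:", num_possible)
```

With N = 10 and n = 3 there are 120 possible samples, and under SRSWOR each of them is equally likely to be selected.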
When the population is composed of different groups or classes, simple random sampling may not give a properly representative sample. In this case stratified random sampling can be used.
Stratification means division into layers, groups or classes. Some of the factors used for stratification are age, sex, occupation, educational level, geographical area, economic status etc.
In stratified random sampling, items of each group are included in the right proportion. When the group totals are known, we divide the population of size N into k groups of sizes N1, N2, ..., Nk respectively, such that Σ Ni = N. Suppose we want a sample of n units. We then select simple random samples (without replacement) of sizes n1, n2, ..., nk from the respective groups, proportional to the group sizes (that is, ni = n × Ni / N), such that Σ ni = n. The resulting sample of n = Σ ni units is known as a ‘Stratified Random Sample’, and the technique of selecting such a sample is known as ‘Stratified Random Sampling’.
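Proportional allocation and the selection within each group can be sketched in Python as follows. The three strata, their sizes (500, 300, 200) and the function names are hypothetical. The sketch simply rounds n × Ni / N, so it assumes group sizes for which the rounded values add up to n.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Return sample sizes n_i = n * N_i / N, rounded to whole units.

    Assumption: sizes are chosen so that the rounded values sum to n;
    a real survey would need a rule for handling rounding differences.
    """
    N = sum(stratum_sizes)
    return [round(n * Ni / N) for Ni in stratum_sizes]

def stratified_sample(strata, n, seed=None):
    """Draw an SRSWOR of the allocated size from each stratum."""
    rng = random.Random(seed)
    sizes = proportional_allocation([len(units) for units in strata.values()], n)
    return {
        name: rng.sample(units, ni)
        for (name, units), ni in zip(strata.items(), sizes)
    }

# Hypothetical population of N = 1000 units split by geographical area.
strata = {
    "urban": list(range(0, 500)),         # N1 = 500
    "semi-urban": list(range(500, 800)),  # N2 = 300
    "rural": list(range(800, 1000)),      # N3 = 200
}

allocation = proportional_allocation([500, 300, 200], 50)
sample = stratified_sample(strata, 50, seed=3)
print(allocation)
print({name: len(units) for name, units in sample.items()})
```

For n = 50 the allocation is 25, 15 and 10 units, which keeps each group in the same proportion in the sample as in the population.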
The systematic sampling technique is operationally more convenient than simple random sampling. At the same time it ensures that each unit has an equal probability of inclusion in the sample. In this method of sampling, the first unit is selected with the help of random numbers and the remaining units are selected automatically according to a predetermined pattern. This method is known as systematic sampling.
Suppose the N units in the population are numbered 1 to N in some order. Suppose further that N is expressible as a product of two integers n and k, so that N = nk.
To draw a sample of size n,
- select a random number between 1 and k.
- Suppose it is i.
- Select the first unit whose serial number is i.
- Select every kth unit after the ith unit.
- The sample will contain the units with serial numbers i, i + k, i + 2k, ..., i + (n - 1)k.
So the first unit is selected at random and the other units are selected systematically. This systematic sample is called a kth systematic sample and k is termed the sampling interval. This method is also known as linear systematic sampling.
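The steps above can be sketched in Python. The function name `linear_systematic_sample` and the values N = 100, n = 10 are assumptions for illustration, and the sketch covers only the case N = nk described in the text.

```python
import random

def linear_systematic_sample(N, n, seed=None):
    """Return the serial numbers i, i + k, ..., i + (n - 1)k of a systematic sample."""
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k exactly")
    k = N // n                 # sampling interval
    rng = random.Random(seed)
    i = rng.randint(1, k)      # random start between 1 and k (inclusive)
    return [i + j * k for j in range(n)]

# Hypothetical population of N = 100 units, sample of n = 10, so k = 10.
sample = linear_systematic_sample(100, 10, seed=7)
print(sample)
```

Only the starting number i is random. Once it is chosen, the whole sample is fixed, which is why the method is so convenient in practice.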
Data: Data is organized information. It can be numbers, words, measurements, observations, or even just a description of things.
Types of data:
There are two types of data
1) Primary data
2) Secondary data
1) Primary data: Primary data are those collected directly from the units or individuals concerned, and which have never been used for any purpose earlier.
Methods of collecting the Primary data are,
i) Direct Personal Observation or Interview
ii) Indirect Personal Investigation
iii) Information from correspondents
iv) Mailed Questionnaire
v) Questionnaire sent through enumerators
2) Secondary data: Data that have already been collected by some individual or agency and statistically treated to draw certain conclusions are called secondary data when they are used again and analysed to extract some other information.
The main sources of secondary data are,
i) Government Publications
ii) Publications of Foreign Governments & International Bodies
iii) Publications of Private Organisations
iv) Bulletin, Journal, Magazines etc
v) Publications of Educational & Research Institutes
Further the variable can be divided into two categories
1) Discrete variable
2) Continuous variable
1) Discrete variable:
A variable that can take only certain isolated values is called a discrete variable.
e.g. Number of students in a class, number of articles produced by a machine, population of country, number of workers in a factory etc. are discrete variables.
Most of the discrete variables have integral values.
2) Continuous variable:
A variable that can take any value within a range is called a continuous variable.
e.g. Weight of a person, height of a person, length of a screw produced by a machine, temperature at a certain place, agricultural production, electricity consumption of a family and speed of a vehicle are examples of continuous variables.
Qualitative data tend to be reported on nominal and ordinal scales, while quantitative data are reported on interval and ratio scales.
1) Nominal scale: Nominal scale consists of two or more named categories into which the objects are classified.
e.g. a) Classification of individuals using blood groups constitutes a nominal scale.
b) Classification of students in various divisions of the same standard also represents nominal scale.
c) Classification of individuals using sex, caste, nationality etc. also represents nominal scale.
Remarks:- 1) If numbers are used in a nominal scale, they are allotted in a purely arbitrary manner. The numbers serve only for identification and are used in place of labels.
2) Those numbers are interchangeable.
2) Ordinal scale: The ordinal scale of measurement assigns numbers to groups of objects according to some quantitative characteristic. An ordered arrangement of the groups is therefore possible in this type of scale.
e.g. a) Groups of individuals according to income such as poor, middle class, rich.
b) Groups of students according to grades in examination such as fail, second class, first class, first class with distinction.
c) Groups using weight such as light, heavy
d) Groups using height such as short, medium, tall are all situations where ordinal scale can be used.
Remark:- In the ordinal scale, numbers given to groups as labels serve the purpose of rank. Hence those labels are not interchangeable.
4) Ratio scale: In this scale a variable is measured in a well-defined unit starting from a true zero.
For example, height, weight, time, volume, area, age etc. are measured on a ratio scale. In all these cases the measurement starts from zero. Unlike an interval scale, a ratio scale measures the quantity itself and not merely the level, so ratios of values are meaningful. A man of 120 kg is twice as heavy as a man of 60 kg. A man of 80 years is twice as old as a man of 40 years.
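The difference between ratio and interval scales can be checked with a short Python sketch. The weight and age figures are those used above. The temperature comparison is an assumed extra example, not taken from the presentation, with degrees Celsius serving as a typical interval scale.

```python
# Ratio scale: zero means "none of the quantity", so ratios are meaningful.
weight_ratio = 120 / 60  # weights in kg
age_ratio = 80 / 40      # ages in years

# Interval scale (assumed example): temperature in degrees Celsius.
# 0 degrees Celsius is an arbitrary reference point, not "no temperature",
# so 40 / 20 does not mean "twice as hot".
def celsius_to_kelvin(c):
    """Convert to kelvin, a temperature scale with a true zero."""
    return c + 273.15

naive_ratio = 40 / 20
true_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(weight_ratio, age_ratio)
print(naive_ratio, round(true_ratio, 3))
```

On the kelvin scale, which has a true zero, 40 degrees Celsius turns out to be only about 7 % hotter than 20 degrees Celsius, whereas the weight and age ratios of 2 hold in any unit.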
Thank you