## Statistics Assignment

•Chapter 2

•Types of data, data collection and sampling

•Chapter outline

2.1 Types of data

2.2 Methods of collecting data

2.3 Sampling

2.4 Sampling plans

•Learning objectives

LO1 Describe different types of data

LO2 Understand the primary and secondary sources of statistical data

LO3 Explain the various methods of collecting data

LO4 Explain the basic sampling plans

LO5 Identify the appropriate sampling plan for data collection in a particular experiment

•Introduction and re-cap…

Descriptive statistics involves arranging, summarising, and presenting a __set of data__ in such a way that useful __information__ is produced.

Its methods make use of graphical techniques and numerical descriptive measures (such as averages) to summarise and present the data.

•Populations and samples

The graphical and tabular methods presented here apply to both entire populations and samples drawn from populations.
•2.1 Types of data

Definitions

A variable is some characteristic of a population or sample.

E.g. student marks.

A variable is typically denoted with a capital letter: X, Y, Z…

The values of the variable are the range of possible values for a variable.

E.g. student marks (0,…,100)

Data are the observed values of a variable.

E.g. student marks: {67, 74, 71, 83, 93, 55, 48}

•Types of data…

Data (at least for purposes of Statistics) fall into three main groups:

Numerical data

Nominal Data

Ordinal Data

•Numerical data

Numerical data

The values of numerical data are real numbers.

E.g. heights, weights, prices, waiting time at a medical practice, etc.

Arithmetic operations can be performed on numerical data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.

Numerical data are also called quantitative or interval.

•Nominal data

Nominal Data

The values of nominal data are categories.

E.g. Responses to questions about marital status are categories, coded as:

Single = 1, Married = 2, Divorced = 3, Widowed = 4

These data are categorical in nature; arithmetic operations don’t make any sense (e.g. does Married ÷ 2 = Divorced?!)

Nominal data are also called qualitative or categorical.

•Ordinal data

Ordinal Data

Ordinal data appear to be categorical in nature, but their values have an order, or ranking:

E.g. University course evaluation system:

Poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

While it is still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like:

excellent > poor or fair < very good

That is, order is maintained no matter what numeric values are assigned to each category.

Ordinal data are also called ranked.

•Types of data – Examples

•Calculations for types of data

As mentioned above,
•All calculations are permitted on numerical data.

•No calculations are allowed for nominal data, except counting the number of observations in each category and calculating their proportions.

•Only calculations involving a ranking process are allowed for ordinal data.

This lends itself to the following ‘hierarchy of data’…

•Hierarchy of data

Numerical
•Values are real numbers.

•All calculations are valid.

•Data may be treated as ordinal or nominal.

Nominal
•Values are the arbitrary numbers that represent categories.

•Only calculations based on the frequencies of occurrence are valid.

•Data may not be treated as ordinal or numerical.

Ordinal
•Values must represent the ranked order of the data.

•Calculations based on an ordering process are valid.

•Data may be treated as nominal but not as numerical.
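The rules above can be illustrated with a minimal Python sketch. The marks and the category codes come from the examples earlier in this chapter; the lists of marital-status and course-evaluation responses are hypothetical values invented for illustration.

```python
from collections import Counter
from statistics import mean, median

# Numerical data: arithmetic is meaningful, so a mean is a valid summary.
marks = [67, 74, 71, 83, 93, 55, 48]
mean_mark = mean(marks)

# Nominal data: the codes are arbitrary labels, so only counts and
# proportions per category are valid.
marital_codes = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}
marital_status = [1, 2, 2, 4, 1, 3, 2, 1]  # hypothetical responses
counts = Counter(marital_codes[code] for code in marital_status)
proportions = {label: n / len(marital_status) for label, n in counts.items()}

# Ordinal data: order is meaningful, so rank-based summaries such as the
# median are valid, but a mean of the codes is not.
rating_codes = {1: "poor", 2: "fair", 3: "good", 4: "very good", 5: "excellent"}
ratings = [3, 5, 4, 2, 4, 3, 4]  # hypothetical responses
median_rating = rating_codes[median(ratings)]

print(f"Mean mark: {mean_mark:.2f}")
print(f"Marital status proportions: {proportions}")
print(f"Median course rating: {median_rating}")
```

Note that the median is taken over an odd number of ratings here, so it is always one of the observed categories.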

•Other forms of data

Cross-sectional data is collected at a certain point in time across a number of units of interest, e.g.
•marketing survey (observe preferences by gender, age)

•students’ marks in a statistics course exam

•starting salaries of graduates of an MBA program in a particular year.

Time-series data is collected over successive points in time, e.g.
•weekly closing price of gold

•monthly tourist arrivals in Australia.

•2.2 Methods of collecting data

Recall: statistics is a tool for converting data into useful information.

•Data quality

The reliability and accuracy of the data affect the validity of the results of a statistical analysis.

The reliability and accuracy of the data depend on the method of data collection.

There are many methods used to collect or obtain data for statistical analysis.

•Sources of data

Four of the most popular sources of statistical data are:
•Published data

•Data collected from observational studies (Observational data)

•Data collected from experimental studies (Experimental data)

•Data collected from surveys

•Published data

This is often a preferred source of data due to low cost and convenience.

Published data is found as printed material, tapes, disks, and on the Internet.

Types of published data

§Primary data

§Secondary data.

•Published data…

Primary data

Data published by the organisation that has collected it is called primary data.

E.g. Data published by the Australian Bureau of Statistics (ABS).

Secondary data

Data published by an organisation different from the one that originally collected and published it is called secondary data.

E.g. 1. The Yearbook of National Accounts Statistics (United Nations, New York) compiles data from the primary sources of various national departments of statistics, such as the ABS in Australia;

2. Compustat sells a variety of financial data tapes compiled from several primary sources.

•Observational and experimental data

When published data is unavailable, one needs to conduct a study to generate the data.
•An observational study is one in which measurements representing a variable of interest are observed and recorded, without controlling any factor that might influence their values

–e.g. measuring the height of a tree in the rainforest over time.

•An experimental study is one in which measurements representing a variable of interest are observed and recorded, while controlling factors that might influence their values

–e.g. measuring the yield of different types of rice using a certain amount of fertilizer (control factor).

•Surveys

A survey solicits information from survey participants; e.g. Gallup polls, pre-election polls, marketing surveys.

The response rate (i.e. the proportion of selected participants who completed the survey) is a key survey parameter.
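As a small illustration of this definition, the sketch below computes a response rate. The function name `response_rate` and the figures used are hypothetical, chosen only to show the arithmetic.

```python
def response_rate(completed: int, selected: int) -> float:
    """Proportion of selected participants who completed the survey."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    if not 0 <= completed <= selected:
        raise ValueError("completed must be between 0 and selected")
    return completed / selected


# Hypothetical figures: 1,200 people selected, 420 completed questionnaires.
rate = response_rate(completed=420, selected=1200)
print(f"Response rate: {rate:.1%}")  # prints "Response rate: 35.0%"
```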

Surveys may be administered in a variety of ways, e.g.

•Personal interview

•Telephone interview

•Self-administered questionnaire.

•Questionnaire design

Over the years, a lot of thought has been put into the science of the design of survey questions. Key design principles:
§Keep the questionnaire as short as possible.

§Ask short, simple and clearly worded questions.

§Start with demographic questions to help respondents get started comfortably.

§Use dichotomous and multiple choice questions.

§Use open-ended questions cautiously.

§Avoid using leading questions.

§Pretest a questionnaire on a small number of people.

§Think about the way you intend to use the collected data when preparing the questionnaire.

•2.3 Sampling

•2.4 Sampling plans

A sampling plan is just a method or procedure for specifying how a sample will be taken from a population.

The most commonly used sampling plans are:

•Simple random sampling

•Stratified random sampling

•Cluster sampling.

•Simple random sampling

A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.

For example, drawing three names from a hat containing the names of all the students in a class of 200 gives a simple random sample: any group of three names is as likely to be picked as any other group of three names.
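To make "every possible sample is equally likely" concrete, the following sketch counts the distinct groups of three that can be drawn from a class of 200 and then draws one such group. The student identifiers are hypothetical placeholders for the names in the hat.

```python
import math
import random

class_size = 200
sample_size = 3

# Number of distinct groups of three that can be drawn from 200 students.
n_possible_samples = math.comb(class_size, sample_size)

# Under simple random sampling each group has the same chance of selection.
prob_each_sample = 1 / n_possible_samples

# Hypothetical identifiers standing in for the names in the hat.
students = [f"student_{i}" for i in range(1, class_size + 1)]
drawn = random.sample(students, sample_size)  # sampling without replacement

print(f"Possible samples of size {sample_size}: {n_possible_samples:,}")
print(f"Probability of any one sample: {prob_each_sample:.3e}")
print(f"Sample drawn: {drawn}")
```

There are 1,313,400 possible groups, so each has a probability of 1/1,313,400 of being the one selected.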

•Simple random sampling…

To conduct simple random sampling…
•assign a number to each element of the chosen population (or use already given numbers),

e.g. Medicare card number of each Australian resident
•randomly select the sample numbers (members) using a random numbers table, or a software package.

•Example 1

(Example 2.3, p30)

A government income-tax auditor is responsible for 1,000 tax returns. The auditor will randomly select 30 returns to audit. Use Excel’s random number generator to select the returns.

Solution:

We generate 50 numbers between 1 and 1,000. Only 30 numbers are needed, but the extra numbers can be used if duplicate numbers are generated.

Use Excel to generate 50 random numbers between 1 and 1000.
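The same selection can be sketched in Python rather than Excel. This is an illustrative alternative, not the textbook's procedure: the function name `select_returns`, its parameters and the seed value are assumptions made for this sketch.

```python
import random


def select_returns(n_returns=1000, n_needed=30, n_draws=50, seed=None):
    """Pick n_needed distinct return numbers from 1..n_returns.

    Mirrors the worked solution: draw n_draws numbers with replacement,
    then keep the first n_needed distinct values.
    """
    rng = random.Random(seed)
    draws = [rng.randint(1, n_returns) for _ in range(n_draws)]
    selected = list(dict.fromkeys(draws))[:n_needed]  # drop duplicates, keep order
    if len(selected) < n_needed:
        raise RuntimeError("too many duplicates; increase n_draws")
    return selected


# The seed is an arbitrary choice so that the selection can be repeated.
audit_sample = select_returns(seed=2)
print(sorted(audit_sample))

# Shortcut: sampling without replacement never produces duplicates,
# so no spare draws are needed.
shortcut_sample = random.sample(range(1, 1001), 30)
print(sorted(shortcut_sample))
```

Both approaches give every return the same chance of selection; the first simply follows the "generate extras, discard duplicates" logic of the solution above.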
•Stratified random sampling

A stratified random sample is obtained by separating the population into mutually exclusive sets (or strata), and then drawing simple random samples from each stratum.
•Stratified random sampling …

After the population has been stratified, we can use simple random sampling within each stratum to generate the complete sample.
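A minimal sketch of this two-step procedure is given below. It assumes proportional allocation (each stratum contributes in proportion to its share of the population), which is one common choice and is not prescribed by the text; the age strata and member labels are hypothetical.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum.

    strata: dict mapping stratum name -> list of population members.
    Allocation is proportional to stratum size (an assumption of this sketch).
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_stratum = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample


# Hypothetical population of 1,000 people split into three age strata.
strata = {
    "under 30": [f"U{i}" for i in range(400)],
    "30 to 59": [f"M{i}" for i in range(450)],
    "60 and over": [f"O{i}" for i in range(150)],
}
sample = stratified_sample(strata, sample_size=100, seed=1)
print({name: len(members) for name, members in sample.items()})
# prints {'under 30': 40, '30 to 59': 45, '60 and over': 15}
```

With other stratum sizes the rounded allocations may not add up exactly to the requested sample size, so a real study would need a rule for handling the remainder.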
•Cluster sampling

A cluster sample is a simple random sample of groups or clusters of elements (whereas a simple random sample consists of individual objects).

This procedure is useful when

§it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure).

§the population members are widely dispersed geographically.

•Cluster sampling…

Cluster sampling may increase sampling error, because of probable similarities among cluster members.

For example, to draw a cluster sample of residents in the Brisbane city area, first select a number of streets in the Brisbane city area using a simple random sampling method and then include all residents in those selected streets to form the cluster sample.
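The street example can be sketched as follows. The street names and resident labels are hypothetical placeholders, and `cluster_sample` is a helper written for this illustration only.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical streets and their residents.
streets = {
    "Street A": ["A1", "A2", "A3"],
    "Street B": ["B1", "B2"],
    "Street C": ["C1", "C2", "C3", "C4"],
    "Street D": ["D1"],
    "Street E": ["E1", "E2", "E3"],
}
chosen_streets, residents = cluster_sample(streets, n_clusters=2, seed=0)
print(f"Streets selected: {chosen_streets}")
print(f"Residents in the cluster sample: {residents}")
```

Only the list of streets is needed to draw the sample, which is why the method helps when a complete list of residents is hard to obtain. The final sample size depends on which streets happen to be chosen.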

•Sample size

Numerical techniques for determining sample sizes will be described later, but it is sufficient to say that the larger the sample size, the more accurate we can expect the sample estimates to be.
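A short simulation can illustrate this claim. It is a sketch under stated assumptions: the population of exam marks is randomly generated rather than real data, and the sample sizes, number of repeats and seeds are arbitrary choices.

```python
import random
from statistics import mean, pstdev


def spread_of_sample_means(population, sample_size, n_repeats=200, seed=0):
    """Standard deviation of the sample mean over repeated simple random samples."""
    rng = random.Random(seed)
    means = [mean(rng.sample(population, sample_size)) for _ in range(n_repeats)]
    return pstdev(means)


# Hypothetical population of 10,000 exam marks between 0 and 100.
population_rng = random.Random(42)
population = [population_rng.randint(0, 100) for _ in range(10_000)]

# A smaller spread means the sample mean tends to land closer to the population mean.
spreads = {n: spread_of_sample_means(population, n) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"sample size {n:>4}: spread of sample means = {spread:.2f}")
```

The spread of the sample means shrinks as the sample size grows, which is the sense in which larger samples give more accurate estimates.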