SEMINARS


Surveys and sampling

Faisal Awartani

If someone wanted to assess the effects of a children’s immunization program or of a health education program (e.g., by measuring the percentage of people who have diabetes or cancer in Palestine), or to study the relationship between the Palestinian Legislative Council (PLC) and the Palestinian people, what sort of process would they follow?

There are several tools for assessing program impacts. Rapid surveys are used to assess whether a program is worth undertaking; they offer a first glance at the situation, for example by asking how many people suffer from diabetes and whether an awareness campaign would be useful.

Another common tool is the household survey, which requires the following preparatory and processing steps:

Specification of the objectives

Decision on what indicators should be used

Development of an outline for the survey report (reporting scheme)

Design of the data collection instrument

Collection of data

Data entry and verification

Analysis and interpretation of the results

Development of an action plan

Designing a Questionnaire

A household survey consists of different variables on which the research is based (main research variables):

Background variables

Research questions’ variables

Secondary questions

The number of questions varies depending on the topic under investigation and the research team, but as a general rule it can be said that there should be no more than 40-45 questions. The quantity of collected information is not as important as the quality.

Typical background variables for a household survey are as follows:

Demographic characteristics such as sex, age, educational level

Place of residence

Refugee/non-refugee status

Regions (West Bank, Gaza)

Occupation

Income and expenditure factor (gives an idea about the economic status)

Lifestyle

Number of rooms in the house

Number of people living in the household

Whether the house is rented or not

Electrical appliances available (gives an idea about the wealth index)

Research questions are direct questions, such as: "How do you evaluate the performance of the PLC?" This is a monitoring type of question and the person usually chooses from among a set of possible answers, for example:

Very good
Good
Middle
Bad
Very bad

The five-point scale is quite common and considered one of the best scales for evaluating goods and services.

In order to study the prevalence of diabetes, the researcher will have to go to different households and ask the people if they have diabetes or not. It is very important to know the target group that is addressed (in this particular instance usually people over 30 and under 65), not least because the questionnaire will be designed accordingly.

The questions in the questionnaire must be numbered (preferably in sequence, for example from Q01 to Q40). This is very important for the subsequent data entry, in which the possible answers to each question are also numbered.

The questionnaires themselves must also be numbered so that they can be traced and referred to. This is particularly important in cases where problems in the data itself occur. Two kinds of errors are possible:

- measurement errors, which are rather difficult to correct, and

- data entry errors, which are easier to correct.

Another important number is that of the research team member who filled in a particular questionnaire, because knowing this helps those in charge to control the quality of the results obtained.
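The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration: the codebook, the second question (Q02), the field names and the two sample records are hypothetical and do not come from an actual survey. The sketch shows how numbered questions and numbered answers make it possible to trace a data entry error back to a specific questionnaire.

```python
# Hypothetical codebook: question number -> {answer code: answer label}.
# Q01 uses the five-point scale from the text; Q02 is an invented yes/no item.
CODEBOOK = {
    "Q01": {1: "Very good", 2: "Good", 3: "Middle", 4: "Bad", 5: "Very bad"},
    "Q02": {1: "Yes", 2: "No"},
}


def find_entry_errors(record):
    """Return (questionnaire_id, question) pairs whose code is not in the codebook."""
    errors = []
    for question, code in record["answers"].items():
        if code not in CODEBOOK.get(question, {}):
            errors.append((record["questionnaire_id"], question))
    return errors


# Each record carries the questionnaire number and the number of the
# field researcher who filled it in (both invented for this example).
records = [
    {"questionnaire_id": 1, "interviewer_id": 7, "answers": {"Q01": 2, "Q02": 1}},
    {"questionnaire_id": 2, "interviewer_id": 7, "answers": {"Q01": 6, "Q02": 2}},
]

for record in records:
    for questionnaire_id, question in find_entry_errors(record):
        print(f"Check questionnaire {questionnaire_id}, question {question}")
```

A check of this kind can only catch codes that fall outside the allowed range, i.e., data entry errors; it cannot detect measurement errors made in the field.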

After the questionnaire is prepared and the questionnaire variables are defined, the survey team is almost ready to go to the field. Before that, however, the field researchers must be trained; this is very important. For example, if a survey on health services or other kinds of services is conducted, it is important that all field researchers interpret the questions in the same manner. Even when the questions are straightforward, field researchers might interpret them differently, so the meaning of each question must be clarified to them, and they should ask about anything they do not understand.

Once the field researchers have been trained and prepared, they will go into the field to do a pilot study, which has the following aims:

To identify whether there are any problems in the questionnaires

To adjust the instruments, based on the result of the pilot study, for the actual research project.

Developed countries conduct a comprehensive population census every eight to ten years; other, poorer countries conduct one less frequently due to the high costs this involves. The last census carried out by the Palestinian Central Bureau of Statistics (PCBS) aimed at gathering information on the socioeconomic structure in Palestine. Its cost amounted to some $8.5 million. After a census is concluded, the problem is the utilization of the collected information. The Palestinian census was a very difficult process because of the lack of cooperation and coordination among those involved, for example between the PCBS and the ministries. So the question was whether the money spent on the census was wasted or spent effectively.

Household surveys require a multistage sampling procedure consisting of the following different stages:

  1. Selecting one or more population location(s)

  2. Selecting one or more cluster(s) from the population location(s)

  3. Selecting one or more household(s) from the cluster(s)

  4. Selecting the respondent(s) within the household(s) (usually from among the members aged 18 and over)
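The four stages can be sketched as follows. This is a simplified illustration, not the procedure of any particular survey: the sampling frame is invented, the function name `multistage_sample` is hypothetical, and every stage uses a simple random draw for brevity, whereas in practice the first stage is usually drawn with probability proportional to size.

```python
import random

# Invented sampling frame: location -> cluster -> household -> members.
# Every household has two adults and two children, purely for illustration.
frame = {
    f"Location {l}": {
        f"Cluster {c}": {
            f"Household {h}": [
                {"name": f"L{l}C{c}H{h}-member{i}", "age": age}
                for i, age in enumerate([45, 40, 17, 9], start=1)
            ]
            for h in range(1, 6)
        }
        for c in range(1, 4)
    }
    for l in range(1, 5)
}


def multistage_sample(locations, n_locations, n_clusters, n_households, rng):
    """Draw locations, then clusters, then households, then one adult each."""
    respondents = []
    for loc in rng.sample(sorted(locations), n_locations):            # stage 1
        clusters = locations[loc]
        for cluster in rng.sample(sorted(clusters), n_clusters):      # stage 2
            households = clusters[cluster]
            for hh in rng.sample(sorted(households), n_households):   # stage 3
                adults = [m for m in households[hh] if m["age"] >= 18]
                if adults:                                            # stage 4
                    respondents.append((loc, cluster, hh, rng.choice(adults)["name"]))
    return respondents


sample = multistage_sample(frame, 2, 2, 3, random.Random(1))
print(len(sample))  # 2 locations x 2 clusters x 3 households = 12 respondents
```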

The accuracy of a survey (e.g., one evaluating health services) is determined by the margin of error, a statistical measure that is usually calculated by the computer. For a given population, the following two parameters are of relevance:

Sample information

Population information.

It can be said that a survey of a sample of 1,000 people would be representative of a population of two million with a margin of error of plus or minus 3 percent. With such a margin of error, results are considered valid. To be representative, the small sample of 1,000 people must have been selected by multistage sampling using a cluster sample. One could, for example, choose 100 people from each of ten different clusters (= 1,000 people). It has become standard to select 100 clusters and to choose 15 people from each in order to ensure that the survey results have a margin of error of plus or minus 3 percent.
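The figure of roughly 3 percent can be checked with the textbook formula for the margin of error of a proportion. The sketch below rests on assumptions that are not stated in the seminar: simple random sampling, a 95 percent confidence level (z = 1.96) and the most conservative proportion p = 0.5. The function name and its parameters are chosen for illustration only.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, population=None):
    """Margin of error of a proportion under simple random sampling.

    If a population size is given, the finite population correction is applied.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


# Sample of 1,000 from a population of two million, in percent.
print(round(100 * margin_of_error(1000, population=2_000_000), 1))  # 3.1
```

Under these assumptions the population size hardly affects the result; it is the sample size that matters. A cluster design generally has a somewhat larger margin of error than a simple random sample of the same size.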

The sample size of a cluster has nothing to do with the population size. The intra-class correlation helps in measuring the homogeneity within the clusters. For the selection of population locations, probability proportional to size (PPS) sampling is used. The following table illustrates this, supposing that there are six towns (T1, T2, ..., T6), of which three are to be chosen:

Town    Size of population    Cumulative sum
T1      1,000                 1,000
T2      1,500                 2,500
T3      2,000                 4,500
T4      2,500                 7,000
T5      3,000                 10,000
T6      3,500                 13,500

The process of selecting a sample from among a given total (here the towns) is as follows:

First, the cumulative sum must be calculated.

Then, in order to select a sample, the size of the population in each town must be considered, as well as the following:

  1. The increment size (sampling interval): in total, there are 13,500 people and three locations are to be chosen, i.e., 13,500 divided by three = 4,500.

  2. Then a random number between zero and 4,500 is chosen, for example by the computer or with another random number generator. If the resulting number is 1,700, one looks in the cumulative sum column for the first figure that is equal to or greater than 1,700, i.e., 2,500, which corresponds to T2, so T2 is chosen.

  3. Then 4,500 is added to 1,700 = 6,200; the first cumulative sum that reaches this figure is 7,000, so T4 is chosen.

  4. Then 4,500 is added to 6,200 = 10,700; the first cumulative sum that reaches this figure is 13,500, so T6 is chosen.

  5. Thus, the locations T2, T4 and T6 are selected.

This method of sampling helps in obtaining a self-weighting sample, i.e., the estimates are unbiased.
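The selection of the three towns can be reproduced with a short Python sketch. The function name `pps_systematic` and its parameters are chosen for illustration; the rule assumed here is that each selection point falls to the first town whose cumulative sum is equal to or greater than that point.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, n, start=None, rng=random):
    """Select n units with probability proportional to size.

    sizes maps each unit name to its population size. If start is None,
    a random starting point between zero and the interval is drawn.
    """
    names = list(sizes)
    cumulative = list(accumulate(sizes.values()))
    interval = cumulative[-1] / n            # 13,500 / 3 = 4,500 in the example
    if start is None:
        start = rng.uniform(0, interval)
    chosen = []
    for k in range(n):
        point = start + k * interval
        # Index of the first cumulative sum >= point (clamped for safety).
        index = min(bisect_left(cumulative, point), len(names) - 1)
        chosen.append(names[index])
    return chosen


towns = {"T1": 1000, "T2": 1500, "T3": 2000, "T4": 2500, "T5": 3000, "T6": 3500}
print(pps_systematic(towns, 3, start=1700))  # ['T2', 'T4', 'T6']
```

In this sketch a town whose population exceeds the interval could be selected more than once; that does not occur with the six towns in the table.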

Other types of sampling include:

Random sampling: There is a set of records from which a random sample is to be chosen.

Systematic sampling: Records are selected at a fixed interval. For example, if there is a set of 300 records in a clinic and 100 of them should be chosen (n = 100), the interval for the selection is calculated as 300 divided by 100 = 3, which means that one record is selected from every three.
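The selection of clinic records at a fixed interval can be sketched as follows. The random starting point within the first interval is an assumption added here, as are the function name and the use of the numbers 1 to 300 as stand-ins for the records.

```python
import random


def systematic_sample(records, n, rng=random):
    """Select n records at a fixed interval, starting at a random offset."""
    interval = len(records) // n           # 300 // 100 = 3 in the example
    start = rng.randrange(interval)        # assumed: random start in 0..interval-1
    return records[start::interval][:n]


clinic_records = list(range(1, 301))       # stand-ins for 300 clinic records
sample = systematic_sample(clinic_records, 100, rng=random.Random(0))
print(len(sample))  # 100
```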