A statistical simulation and analysis, aimed at A
level statisticians. Developed from an idea by Jo Tomalin
and Dave Cassell.
Two hundred dandelion seeds are scattered at random on a 10 by 8 patch of soil.
They all grow. The patch is then subdivided into 80 plots each measuring 1 by 1
and the number of dandelions in each plot is counted. How many plots would you
expect to contain no dandelions? How many just one dandelion? What kind of
distribution is involved? How could you model it?
The answer is... find out by growing them on your
Sharp Graphical Calculator, model EL9600! Through programming, the calculator
can simulate the planting of the seeds, the division of the patch into 80 plots
and the counting of seeds in each plot. The data can then be processed into a
frequency distribution and finally modelled by either the Binomial or the
Poisson probability distribution. From the student's point of view, there are
many aspects of this process to appreciate. The programming itself is worth
studying, as are the various stages of the simulation, since they are all key
elements in any statistics course. A good way of presenting this simulation to
a class is to use the OHP screen that is available for the EL9600.
First a program is required to simulate the
planting of the seeds.
ClrG
ClrDraw
DrawOFF
Input T
81->dim(L1)
81->dim(L2)
0->Xmin
10->Xmax
0->Ymin
8->Ymax
0->N
Label LOOPN
N+1->N
N->L1(N)
0->L2(N)
If N<80 Goto LOOPN
0->M
Label LOOPM
M+1->M
(10*random)->A
random->B
(8*random)->B
PntOn(A,B)
(10*(ipart B)+(ipart A)+1)->N
L2(N)+1->L2(N)
If M< T Goto LOOPM
Wait
0->N
Label LOOPP
N+1->N
Line(0,N,10,N)
If N<8 Goto LOOPP
0->N
Label LOOPQ
N+1->N
Line(N,0,N,8)
If N<10 Goto LOOPQ
Wait

To run the program, first enter PlotsOff, then
enter the PRGM screen and execute this program.
It will prompt you for a value for T, say 200, which
represents the number of dandelions to be planted. The program clears the graph
display and sets up the axes to match the 10 by 8 patch. It fills list L1 with
the plot numbers 1 to 80 and sets the tallies in list L2 to zero. A loop is then
run that creates and plots random points (the dandelions) in the 10 by 8 grid
and keeps a tally in L2 of how many points there are in each of the 80 plots.
(The EL9600 seems to have a kind of bug with its random number generator here,
which is why there is that extra random->B line. See Some oddities.)

When all the points have been displayed, the grid showing
the 80 plots is added.
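
For readers who want to cross-check the calculator on a desktop machine, the planting and tallying step can be sketched in Python. This is only an illustration, not part of the EL9600 program: the function name `plant_seeds` and its parameters are assumptions introduced here, and Python's own random number generator stands in for the calculator's.

```python
import random

def plant_seeds(t, width=10, height=8, seed=None):
    """Scatter t seeds uniformly on a width-by-height patch, tallying per unit plot.

    Returns a list of width*height tallies; index 0 corresponds to plot 1.
    """
    rng = random.Random(seed)
    tallies = [0] * (width * height)
    for _ in range(t):
        a = width * rng.random()    # x-coordinate, 0 <= a < width
        b = height * rng.random()   # y-coordinate, 0 <= b < height
        # Same numbering as the listing, 10*(ipart B)+(ipart A)+1,
        # shifted down by one because Python lists start at index 0.
        plot = width * int(b) + int(a)
        tallies[plot] += 1
    return tallies

tallies = plant_seeds(200, seed=1)
print(len(tallies), sum(tallies))   # 80 plots, 200 seeds in total
```

Passing a fixed `seed` makes a run repeatable, which is handy when a class wants to compare results; leaving it as `None` gives a fresh scatter each time.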

max(L2)+1->P
P->dim(L3)
P->dim(L4)
0->N
Label LOOPR
N+1->N
N-1->L3(N)
0->L4(N)
If N<P Goto LOOPR
0->N
Label LOOPS
N+1->N
L2(N)+1->Q
L4(Q)+1->L4(Q)
If N<80 Goto LOOPS
max(L4)+5->Ymax
max(L3)+1->Xmax
DrawON
Plt1(Hist,L3,L4)
DispG
Wait
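
This section of the program builds a frequency table from the tallies in list
L2. List L3 holds the possible numbers of dandelions per plot, from 0 up to the
largest tally, and list L4 counts how many of the 80 plots contain each number.
The data is then displayed as a histogram.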
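
The same frequency count can be sketched in Python. The helper name `frequency_table` and the short example list are assumptions made for illustration; the two returned lists play the roles of L3 and L4.

```python
from collections import Counter

def frequency_table(tallies):
    """Return (values, freqs): each count from 0 to the largest tally,
    and the number of plots holding exactly that many dandelions."""
    counts = Counter(tallies)
    values = list(range(max(tallies) + 1))       # role of L3
    freqs = [counts.get(v, 0) for v in values]   # role of L4
    return values, freqs

# A made-up set of eight tallies, just to show the shape of the output.
example = [0, 2, 1, 0, 3, 1, 1, 0]
values, freqs = frequency_table(example)
print(values)   # [0, 1, 2, 3]
print(freqs)    # [3, 3, 1, 1]
```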

T/80->W
dim(L3)->dim(L5)
dim(L3)->dim(L6)
max(L3)->N
W/N->P
80*pdfbin(N,P,L3)->L6
L3+.5->L5
Plt2(xyLine.,L5,L6)
DispG
Wait
80*pdfpoi(W,L3)->L6
Plt2(xyLine.,L5,L6)
DispG
Wait

Finally this section works out the theoretical
expected frequencies, based first on the Binomial distribution and secondly on
the Poisson distribution. The number of dandelions planted (T) divided by the
number of plots (80) gives the average number of dandelions per plot, stored as
W. For the Binomial distribution, the number of trials, N, is taken to be the
largest number of dandelions found in any plot, and the probability of success,
P, is set to W/N so that the mean NP equals W; these are then used in
calculating the binomial probabilities. W is used for the first parameter, the
mean, in calculating the Poisson probabilities. Each probability is multiplied
by 80 to give an expected number of plots. (Note that 0.5 is added to the values
in list L3 in order that the points are plotted in the middle of each bar of the
histogram.)
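
The expected frequencies can also be sketched in Python, using the same parameter choices as the listing: Binomial trials equal to the largest observed count, P = W/N, and Poisson mean W. The function names are assumptions introduced here, and `max_count` (at least 1) must be supplied from a simulation run; the value 8 below is only an example.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def expected_frequencies(t, max_count, plots=80):
    """Expected number of plots holding 0..max_count dandelions under each model."""
    w = t / plots      # average dandelions per plot (W in the listing)
    n = max_count      # Binomial trials, as in the listing: max(L3)
    p = w / n          # chosen so that the Binomial mean n*p equals w
    ks = range(max_count + 1)
    binom = [plots * binomial_pmf(k, n, p) for k in ks]
    poisson = [plots * poisson_pmf(k, w) for k in ks]
    return binom, poisson

binom, poisson = expected_frequencies(200, max_count=8)
for k, (b, q) in enumerate(zip(binom, poisson)):
    print(k, round(b, 2), round(q, 2))
```

With T=200 the Poisson model gives 80e^(-2.5), about 6.6, as the expected number of empty plots. The Poisson column stops at `max_count`, so it sums to slightly less than 80.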
 
The simulation can be run many times for various values of
T. The illustrations above show an example with T=400. Higher values should
provide a better "fit". If this does not happen, then you can ask why: either
the model is wrong or the data isn't randomly distributed. Here are some more
displays comparing the results with the expected Poisson frequencies:
The Poisson distribution is usually found to give a better "fit"
than the Binomial, particularly for large values of T. The quality of the fit
could, of course, be measured using the "chi-squared" distribution.
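
As a rough sketch of how such a measurement might start, the goodness-of-fit statistic sums (observed - expected)^2 / expected over the classes. The function name and the three-class data below are made up for illustration and are not results from the calculator. Deciding whether the statistic is "too large" still needs a chi-squared table with the appropriate degrees of freedom, and classes with small expected frequencies are usually pooled first.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes; every expected value must be positive."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up frequencies, purely for illustration.
observed = [10, 20, 30]
expected = [12.0, 18.0, 30.0]
statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 4))   # 0.5556
```
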
The simulation could be developed to include considerations
like:
- some dandelions dying; a probability of survival could be
associated with each seed
- a dandelion's death affecting its neighbours; dandelions
within, say, 0.1 units of a dead one could have an increased probability of
dying
- various types of soil in the different plots; a probability
of survival could be associated with each plot
- inclusion of the "chi-squared" test for goodness of fit.
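
The first of these ideas, a survival probability for each seed, might be sketched in Python as follows. The function name `plant_with_survival` and the parameter `survival` are assumptions for illustration; every seed is given the same chance of surviving, which is only the simplest reading of the suggestion.

```python
import random

def plant_with_survival(t, survival, width=10, height=8, seed=None):
    """Scatter t seeds on the patch; each one survives with probability `survival`.

    Returns the per-plot tallies of surviving dandelions only.
    """
    rng = random.Random(seed)
    tallies = [0] * (width * height)
    for _ in range(t):
        a = width * rng.random()     # x-coordinate of the seed
        b = height * rng.random()    # y-coordinate of the seed
        if rng.random() < survival:  # the seed grows only if it survives
            tallies[width * int(b) + int(a)] += 1
    return tallies

survivors = plant_with_survival(200, survival=0.7, seed=2)
print(sum(survivors), "of 200 seeds survived")
```

A per-plot survival probability (the soil idea) could be handled the same way by looking up `survival` from a list indexed by plot number instead of passing a single value.
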
(Thanks to
Prof. Dr. L. Paditz for checking the program listing. At
his site you will find a version of this program for the Casio CFX-9850G
PLUS. Jo Tomalin has adapted the program and teaching materials for a Casio
7400 and the TI80 and can be emailed for a copy.)