Statistical Analysis of Rainfall
Special Note: You should read the Statistical Analysis and Geometry section in the Water Management module before coming to the lab.
Each group will analyze rainfall data collected from a particular location in the US Virgin Islands. Your lab instructor will assign to your group the specific location. The data can be accessed through the links given below. The instructions below outline how to analyze the data using Corel Quattro Pro software. If you are familiar with and prefer Excel, it is also available.
Getting Started
If you are not presently reading this lab exercise from an Internet browser, you will need to do so in order to access the data. This lab is called "Rainfall Statistics Lab" and can be found from the Schedule link on the SCI 301 Website.
Open the Corel Quattro software. On most of the lab computers, this can be done by clicking on the calculator icon in the menu bar at the bottom of the screen. Once the program is running, you should have a standard spreadsheet before you. Save this sheet under a file name that unambiguously identifies the location of the data and your group members. (For example, if Deron and Kelolo are analyzing Wintberg data, Wint-DeKe would make a nice name.) You may save it directly onto the hard drive, but it is recommended that you also save it to your personal floppy disk.
Once your instructor specifies the data set for your group, you can click on the appropriate link to bring the data set into the browser window. The data before you will be in 3 column frames, one for the year, one for the month, and one for the rainfall.
- Drag the mouse down the first column and copy the data into the clipboard ("Ctrl C").
- Bring up Quattro Pro and paste ("Ctrl V") this data into the first column, which is column A.
- Return to the data in the browser and repeat the process for the other two columns (B and C).
- SAVE THE FILE. It is a good practice to periodically save your worksheet throughout the lab exercise. (You may now return and remain on this lab page in the browser for instructions for the remainder of the lab.)
Sorting and Cleaning the Data
First, you will need to do some data housecleaning. In the rainfall column will be at least a few rows with the entry 9999... These entries represents months in which no data was available and must be removed. In addition, every 13th row will be marked ANN in the month column that represents the total rainfall for the previous 12 entries. These must also be removed.
- Note the number of entries (rows) in the columns. (If the number is not the same for all three columns, you have made a cut and paste error and will need to re-do the previous section.) For this example, let's assume there are 300 entries.
- From the menubar choose "Edit" / "Fill" / "Fill Series". In the dialog box specify "d1..d300" for the field marked "Cells". Enter 1, 1, and 300 into the fields marked "Start", "Step", and "Stop" respectively. This will place the numbers 1, 2, 3, etc. sequentially into this column to be used later to re-sort the data chronologically.
- From the menubar choose "Tools" / "Sort". In the dialog box, enter "a1..d300" in the field marked "Cells". This determines the columns to be sorted. (Note that "a1" is interpreted as the 1st entry of column a, etc. The notation ".." means all entries in between.) The data will be sorted either numerically or alphabetically, depending upon which column you designate for sorting.
- We want to order the data in ascending order of rainfall, which is in column C. To do this, enter "c1" in the "1st key" field. Then click the "Sort" button. (The software will use column C for sorting, and it will carry all other columns specified in the "Cells" with this sorting.)
All three columns will now be re-ordered from least rainfall to greatest. Consequently, all the 9999 data will be at the end. The ANN data will probably also be at the end. But you will need to check all the data to be sure. All of the rows to be discarded that are together can be cleared at once by dragging the mouse down the far-left column with the row numbers. Under the Edit menu, select Clear Row(s).
(Note: There is an alternative method for selecting columns for the "Sort" option, that works for most other dialog box entries as well. Click the small box to the right of any field. The dialog box will contract into a small rectangle above the spreadsheet. Drag the mouse over all the data you wish to place in that field, then open the dialog box again. The proper entries should appear in the field. Your instructor can help you with this method.)
At this point, you may wish to save the file again as a safety precaution. Do so under a different name, such as "Wint-DeKe2". You do not want to overwrite the original file containing the unsorted data.
Descriptive Statistics
- From the menu bar, choose "Tools" / "Numeric Tools" / "Analysis" / "Descriptive Statistics".
- In the "Input Cells" of the dialog box you will enter the column that has the rainfall statistics. For example, if the data is in column C and now has 234 total entries, then enter "c1..c234" into the field.
- Enter "e1" into the "Output" field (or the first cell of whatever column you wish for the results to be displayed).
- Click "Finished" and the mean, range, STD and many other statistical quantities will appear directly on the spread sheet, starting at e1.
Frequency Distributions
The first frequency histogram you will make displays the number of months that have rainfall within specified ranges or "bins". Although you may eventually wish to change the bin size, start with bins of width 1". The idea is to count the number of months that have 0-1 inches of rain, 1-2 inches of rain, etc. To do this with the Quattro Pro software, you must first create a new column that specifies the upper limit of each bin. Each cell entry represents the upper limit of that bin, and previous cell is automatically taken as the lower limit.
- The first entry in the new "bin column" will be 1. (The lower limit will be 0 by virtue of the fact there are no rainfall entries less than 0.) The second entry will be 2, the third will be 3, etc. Continue this process until you have just exceeded the largest rainfall in your data. (That value can be found quickly from the Descriptive Statistics output.) It is best to locate the bin column where the column to the immediate right is empty.
- From the menu bar, choose "Tools" / "Numeric Tools" / "Frequency".
- In the "Value Cells" of the dialog box you will enter the column that has the rainfall statistics. As before, enter "c1..c234" into this field.
- Next, specify the bin column. If you put the bin values in column F and the last value was 14, then enter "f1..f14" into the field marked "Bin Cells".
- Click "Finished" and the frequency data should be placed in the column immediately to the right of the bin column, which would be g1 through g14 in this example. Note that each frequency value should correspond to the number of months with rainfall less than the bin value next to it, but greater than the previous bin value. Hence, the frequency value across from the bin value "4", should be the number of months with rain between 3 and 4 inches. Check a few entries to be sure it has been done correctly.
You are now ready to plot a histogram of the data. Note that there are many ways to plot this data. If you are not familiar with Quattro Pro, the following instructions should get you through the process.
- Click on the histogram icon in the upper menubar. (The icon looks like a histogram.) Move the mouse to an empty area on the spreadsheet. Hold the mouse down and drag it down and to the right until you have created a reasonably-sized box for the histogram. (The size of this box is not related to the final print size. So make it a readable size for you.)
- Click on the icon just below the Histogram icon that shows an chart-to-Histogram arrow. In the "x-axis" field of the dialog box you will enter the bin data column, "f1..f14".
- In the "vertical-axis" field of the dialog box you will enter the frequency data column, "g1..g14". The histogram should appear in the box.
- Once your are satisfied with that the histogram is correct, you can change the axes labels and title as needed by clicking on the icon to the right of the icon in the previous step. The options should be easy to follow. Don't forget, axes should be label with units and the title should reflect the axes labels, such as "Frequency vs Rainfall".
Printing the histogram is straighforward. It is recommended that you first select "Print Preview" under the File menu to be sure that the histogram is finalized and no changes or additions are needed. If you are satisfied, then print a copy for each member of your group.
Monthly Averages
In many regions of the Caribbean there are wet months and dry months. A plot of average rainfall vs the month can verify whether this is true for your data set.
- First, re-order the first four columns so that the months are grouped together. You will follow the same general procedure as before, but now the sorting column will be the months. That is, put b1 into the "1st key" field. The software automatically sorts alphanumeric fields alphabetically. (Note the months will be alphabetically sorted, not chronologically.)
- Create a new column with the entries "Jan", "Feb", ... "Dec". This will be used for the horizontal axis of the histogram.
- In the column to the right will be the average rainfall for each month. Note that there will probably be a different number of entries for each of the months. You may find the averages using your calculator and entering them into the column directly. But you can have the software calculate the averages for you using the following steps.
- If you wish the software to calculate the averages, you must first note the range of entries for a particular month. For example, once sorted, the first month in the list will be April. Let's assume that the April rainfall data runs from c1 to c11.
- In the cell to the right of the "April" entry of the new month column, enter the following expression "=avg(c1..c11)". Now hit "return" and the average value of the cells c1 though c11 will appear.
- Repeat the process for the remaining 11 months.
- Create and print a histogram of average rainfall vs month, following the procedure from the previous exercise.
Spreadsheet Water Management
In this section you will use the power of spreadsheets to investigate the effects of water usage, rainfall, roof size and cistern size on water management. Each month a family uses water, collects and stores water. You have data for monthly rainfall, but you will also need to make some assumptions for the other parameters. You will be provided with typical values for the first pass through the analysis. Following that, you will be able to make your own analysis with estimates based upon your own family or personal experience.
- The UVI Extension Office estimates water usage at about 25 gallons per day per person. For this example, we will assume a family of four. The monthly usage, denoted "Usage", will then be 25 x 4 x 30, or Usage ~ 3000 gallons/month.
- A typical 3-bedroom 2-bath house might have a roof area of approximately 40 ft by 40 ft = 1600 ft2. The volume of water collected in any month is the roof area times the "depth" of rainfall for that month. Noting that the rainfall data is given in inches, the total volume collected, in units of cubic feet, would be 1600 x (monthly rainfall /12). Since the usage is in gallons, we will use the conversion 1 ft3 = 7.48 gallons. The number of gallons collected in a month, denoted "Collected", would be 1600 x (monthly rainfall / 12) x 7.48, or
Collected ~ 1000 x (Rainfall in inches) gallons/month
For a month with 3" of rainfall, this would be 3,000 gallons, exactly equal to the usage.
- The Virgin Islands building code dictates that the cistern size should be at least 10 gallons for every square foot of roof area. For our example, that would be 16,000 gallons.
- And finally, there is the starting amount of water in the cistern, denoted "Start". In practice this could be any value from 0 to 16,000 gallons. You will try several values, but begin with a half-full cistern,
Start ~ 8,000 gallons
Using the above estimates, you can calculate the amount of water in the cistern at the end of each month, denoted "Final". It is the simple formula
Final = Start + Collected - Usage = Start + 1000 x (Rainfall in inches) - 3000
For the each month, "Start" will be the "Final" value from the previous month. Although you could do this by hand, the spreadsheet software will make quick work of it.
- First, re-order the first four columns to its original chronological order using Column D, the "Fill" column you created specifically for this purpose.
- Select column D, if it is not already highlighted. From the menu select "Insert" / "Column". This will insert a new column D, moving all others over one. You should now have an empty column D just after the rainfall data in Column C.
- Enter 8000 into the cell d1. This will be the starting cistern value. It represents 8000 gallons of water in the cistern after the first month.
- Into the cell d2, enter the following ... "=d1+1000*c2-3000" and hit return. Examine this formula to be sure you understand how it represents the "Final" formula above. The result will be the water in the cistern after the second month.
- Following this pattern, the next cell, d3, should contain "=d2+1000*c3-3000", and cell d4 should contain "=d3+1000*c4-3000", etc. With hundreds of entries, this would be a tedious task. But the software will do it for you!
- Click into the cell d2 and copy it into the clipboard. You may do this from the edit menu or by pressing the "ctrl" and "c" key simultaneously. Now click into the cell below (d3), hold the shift key down, hold the mouse down and drag all the way to the very last cell with rainfall data. Now paste using the edit menu or by pressing the "ctrl" and "v" key simultaneously. This should insert the formula with all cell references incremented according to their position in the column. Look at several of the entries to ensure that this has happened.
Column D should now be filled with the amount of water left in the cistern at the end of each month. Scan down the column and note the values. The cistern can never be less than empty or more than full. Hence, values outside 0 and 16,000 are not valid. To account for these limitations, you will need to add one more level of sophistication to the analysis. (If this is not the case for your data set, the instructor will make a change so you may proceed with the next section.)
Since the cistern will never contain more than 16,000, all calculated values above this should be set equal to 16,000. (Rain water coming into a full cistern simply overflows and is lost!) There is an easy way to do this. The "min" function, @min(a,b,c, ...) takes the minmum value of the numbers inside the parentheses. Here's how to use this function for our purposes.
Report
Be sure to include a copy of your spreadsheet. For each section of this lab, you should at a minimum comment on the following: