Data Presentation:
Please read the information below on data presentation. Make sure you read it all, but pay close attention to the sections on Tables, Graphs and Uncertainties.
Once you have finished reading, finalize your data tables and include all units with uncertainties in the cell headings.
Data presentation in IB ESS
Units
The international system of units should be used for quantitative data wherever possible, although the main consideration is that units should be fit for purpose. It is, for example, preferable to use hours rather than seconds in longer experiments such as assessing the effect of insulation on the electricity consumption of a thermostatic water heater, or dm³ rather than m³ for depicting the volume of carbon dioxide produced by burning fossil fuels. Non-metric units such as inches or cups should not be used.
Tables
Tables are designed to lay out the data ready for analysis. The table should have an explanatory title. “Table of results” is not an explanatory title, whereas “Table to show how frequently students in grade 12 at school x eat different types of fish” describes the nature of the data collected. Other points to note include the following.
- Units should only appear in cell headings rather than in the body of the table.
- The independent variable of a laboratory study should be in the first column.
- Where relevant, subsequent columns should show the results for the dependent variables.
- Decimal places should be appropriate and consistent throughout a column.
Graphs
Graphs should be clear, easy to read and interpret, and have an explanatory title. The data points should be clearly identifiable, even if the graph has been generated by computer software. All graphs and bar charts should have labelled axes of a suitable and demarcated scale.
Successive data points should be joined with straight lines, and the line should start at the first data point and end at the last, as there should be no extrapolation beyond these points. Lines of best fit are only useful if there is good reason to believe that intermediate points fall on the line between two data points. The usual reason for this is the collection of a large amount of data, which is often not possible given the time constraints of investigations at this level. Likewise, extrapolation of the line will only make sense if there is a large amount of data and a line of best fit is predicted, or if reference is made to literature values. Students should exercise caution when making assumptions.
Finally, the type of graph chosen should be appropriate to the nature of the data collected.
Error
There are sources of error at a number of stages of any investigation. The chosen method should try to address as many as possible but, despite this, many will remain. Students should not be discouraged by this because experimental results are only “snapshots” or samples of a complex system. Instead, students should be encouraged to take them into consideration when analysing the data and drawing conclusions. Where appropriate, a thorough evaluation of the sources of uncertainty and error will also help to gain perspective on the scope of the investigation in general and to suggest potential improvements and extensions.
Random variation or normal variation
In nearly all ESS laboratory or field investigations, errors can be caused by variation at the location of study (for example, owing to the time of day or season), by differences between sampled locations (for example, the slope or aspect of the locations) or variation in the material used. Living materials are subject to variation, even when controlled laboratory-based experiments are carried out. For example, when the gain in dry mass of seedlings grown in the presence of different types of fertilizers is measured, the seedlings will vary in their growth rate, even if variables such as light intensity and temperature are controlled, partly because of genetic differences between the seeds, and partly because it is impossible to completely control all the possible factors that might impact growth rate. Errors of this nature are described as random errors; they can be kept to a minimum by careful selection of locations and materials and by careful control of variables, where appropriate, but can never be eliminated entirely.
Human errors
Mistakes are not an acceptable source of error if they could easily have been avoided with more care and attention. Data loggers can be used when a large number of measurements needs to be made, to avoid errors arising from a loss of concentration by the student. Careful planning can help reduce this risk.
Systematic errors
Systematic errors can be reduced if equipment is regularly checked or calibrated to ensure that it is functioning correctly. For example, a thermometer should be placed in an electronic water bath to check that the thermostat of the water bath is correctly adjusted. A blank should be used to calibrate a colorimeter to compensate for the drift of the instrument.
Degrees of precision and uncertainty in data
Students must choose an appropriate instrument for making measurements of quantities such as length, temperature, pH and light intensity. Different instruments may be appropriate depending on whether the measurement is made in the field or in a school laboratory. This is partly because random errors in some field-based studies might far outweigh the systematic error that could be reduced by using a more sophisticated or precise measuring instrument. It does not mean that every piece of equipment needs to be justified, and it should be appreciated that in many situations encountered in school-based practical work, the most appropriate instrument may not be available.
For instruments with digital displays, the simplest rule is that the degree of precision is plus or minus (±) the smallest division on the instrument (the least count).
The limit of error for non-digital instruments is usually no greater than the least count and is often a fraction of the least count value. For example, a liquid-in-glass thermometer is often read to half of the least count division. This would mean that a temperature reading of 24.1°C becomes 24.10°C (±0.05°C). Note that the temperature is now cited to one extra decimal place so as to be consistent with the uncertainty.
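As an illustration of keeping the cited decimal places consistent with the uncertainty, the sketch below formats a reading so that it matches the precision of its uncertainty. The helper name and approach are our own, not part of any prescribed protocol.

```python
# Sketch: cite a reading to the same number of decimal places as its
# uncertainty (e.g. 24.1 with ±0.05 becomes "24.10 ± 0.05").
# The helper is illustrative; no specific protocol is required.
from decimal import Decimal

def format_with_uncertainty(value, uncertainty):
    # Number of decimal places implied by the uncertainty
    places = max(-Decimal(str(uncertainty)).as_tuple().exponent, 0)
    return f"{value:.{places}f} ± {uncertainty:.{places}f}"

print(format_with_uncertainty(24.1, 0.05))  # 24.10 ± 0.05
print(format_with_uncertainty(105, 1))      # 105 ± 1
```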
The estimated uncertainty takes into account the concepts of least count and instrument limit of error but also, where relevant, higher levels of uncertainty as indicated by an instrument manufacturer (which is usually obtainable online), or qualitative considerations such as parallax problems in reading a thermometer scale, reaction time in starting and stopping a timer, or random fluctuation in an electronic balance read-out. Students can enhance their ability to evaluate their methodology and their data if they attempt to quantify these observations into the estimated uncertainty.
Other protocols exist and no specific protocol is required. However, the recording of uncertainties of a sensible and consistent magnitude, where appropriate, can enhance the evaluation of an investigation.
Propagating errors
Propagating errors during data processing is not expected but it is accepted provided it is appropriate and the basis of the experimental error is explained.
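For students who do choose to propagate errors, one common protocol (not the only acceptable one) adds absolute uncertainties for sums and differences, and fractional uncertainties for products and quotients. A minimal sketch under that assumption, with invented readings:

```python
# Sketch of one common propagation protocol (not prescribed by the syllabus):
# absolute uncertainties add for sums/differences; fractional (relative)
# uncertainties add for products/quotients.

def propagate_sum(a, da, b, db):
    """(a ± da) + (b ± db) -> (value, absolute uncertainty)."""
    return a + b, da + db

def propagate_quotient(a, da, b, db):
    """(a ± da) / (b ± db) -> (value, absolute uncertainty)."""
    value = a / b
    return value, value * (da / a + db / b)

# e.g. a rate from (24.0 ± 0.5) cm3 of gas collected in (60.0 ± 0.5) s
rate, d_rate = propagate_quotient(24.0, 0.5, 60.0, 0.5)
print(f"rate = {rate:.2f} ± {d_rate:.2f} cm3 s-1")  # rate = 0.40 ± 0.01 cm3 s-1
```

Here the fractional uncertainty of the rate is roughly 3% (about 2% from the volume plus about 1% from the time).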
Replicates and samples in ESS studies
ESS studies, because of their complexity and normal variability, require replicate observations and multiple samples. As a rule of thumb for manipulated studies, the lower limit is five values of the independent variable, with three repeats for each; averaging the repeats produces five data points for analysis. So, in a study to see how lichen growth varies with distance from a road in an urban area, a measure of lichen growth would need to be taken at five different distances from the road, and repeated a minimum of three times at each distance. Obviously, this rule will vary within the limits of the time available for an investigation and, in field studies such as this, will be limited by the random error associated with the location. Some simple investigations will, however, permit a large number of measurements or a large number of runs. It is also possible to use class data to generate sufficient replicates to permit adequate processing of the data in group and non-assessed practical work.
The standard deviation measures the spread of the data around the mean: the larger the standard deviation, the wider the spread. Standard deviation is used for normally distributed data and can be calculated when many more than three replicates have been obtained, typically more than 10. For example, when data is collected on the weekly mass of household domestic waste for the families of students in a particular class, the mean can be calculated and compared with means from other parts of the world. However, it might be interesting and informative to calculate the standard deviation in order to know how representative the mean is. Calculation of standard deviation is not a requirement, but it can be useful for showing the general variation/uncertainty around a measurement; it is less helpful for identifying potential anomalies.
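As a sketch of how the mean and sample standard deviation might be computed for such class data, using only the Python standard library (the figures below are invented for illustration):

```python
# Sketch: mean and sample standard deviation for (invented) class data on
# weekly household waste mass, using only the Python standard library.
import statistics

waste_kg = [8.2, 7.5, 9.1, 8.8, 7.9, 8.4, 10.2, 7.1, 8.6, 9.0, 8.3, 8.0]

mean = statistics.mean(waste_kg)
sd = statistics.stdev(waste_kg)  # sample standard deviation (divides by n - 1)

print(f"mean = {mean:.2f} kg, s = {sd:.2f} kg")
```

A small standard deviation relative to the mean would suggest the mean is reasonably representative of the class.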
Error bars that plot the highest and lowest values for a test as a vertical line through the mean (the data point plotted on the graph) allow the variation/uncertainty of each data set to be assessed. If the error bars are particularly large, this may show that the readings taken are unreliable (although reference to the scale might be needed to determine what “large” actually means). If the error bars overlap with those of a previous or subsequent point, this shows that the spread of data is too wide to allow for effective discrimination between the points. If trend lines are possible, then adding the coefficient of determination (R²) can be helpful as an indication of how well the trend line fits the data.
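The error-bar extents and the coefficient of determination can both be computed directly from the replicate data before plotting. A sketch using only the Python standard library, with invented lichen data following a five-distances, three-repeats design:

```python
# Sketch: min-max error bar extents and the coefficient of determination (R^2)
# for a linear trend, using only the standard library. The lichen data are
# invented for illustration.
import statistics

distance = [5, 10, 15, 20, 25]                      # m from the road
repeats = [[12, 15, 13], [18, 20, 19], [24, 22, 26],
           [29, 31, 30], [35, 33, 36]]              # lichen cover (%)

means = [statistics.mean(r) for r in repeats]
# Asymmetric error bar extents: (mean - min, max - mean) for each point
bars = [(m - min(r), max(r) - m) for m, r in zip(means, repeats)]

# Least-squares trend line, then R^2 = 1 - SS_res / SS_tot
mx, my = statistics.mean(distance), statistics.mean(means)
slope = (sum((x - mx) * (y - my) for x, y in zip(distance, means))
         / sum((x - mx) ** 2 for x in distance))
intercept = my - slope * mx
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(distance, means))
ss_tot = sum((y - my) ** 2 for y in means)
r_squared = 1 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}")
```

The `bars` tuples can be passed to graphing software as the lower and upper error-bar lengths for each mean.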
Statistics
An effective presentation of the data goes a long way to assessing whether or not a trend is emerging. However, this is not the same as using statistics to assess the nature of such a trend and whether it is significant; in other words, whether a trend, judged subjectively from a graph, is actually valid. Students are encouraged to use relevant statistical tests to assess their data, but should briefly explain their choice of test, outline the working hypothesis and put the results of the test into the context of their investigation. For statistical tests, the correct protocol should be presented, including null and alternative hypotheses, degrees of freedom, critical values and probability levels.
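As one example of such a protocol, an unpaired two-sample t-test (which assumes roughly normal data and, in the pooled form below, equal variances) can be computed with the standard library alone. The data and the critical value are illustrative only:

```python
# Sketch: unpaired two-sample t-test (equal-variance, pooled estimate) using
# only the standard library. Data and critical value are illustrative.
import math
import statistics

def two_sample_t(sample_a, sample_b):
    """Return the t statistic and degrees of freedom for two independent
    samples, assuming equal variances (pooled estimate)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, na + nb - 2

# Invented lichen cover (%) on the sheltered vs exposed sides of six trees
sheltered = [22, 25, 19, 24, 21, 23]
exposed = [15, 17, 14, 18, 16, 13]

t, df = two_sample_t(sheltered, exposed)
# Critical value from tables for df = 10, p = 0.05, two-tailed: 2.228
print(f"t = {t:.2f}, df = {df}")  # reject H0 (no difference) if |t| > 2.228
```

The test result should then be interpreted in context: stating the null and alternative hypotheses, the degrees of freedom, the critical value and the probability level, as described above.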