Cedar Pollen Size Paper

Ben Bell, a PhD candidate, has just published his latest collaborative research on cedar pollen and climate variability. His research is focussed on ways in which Cedrus atlantica might be used as a proxy for (palaeo)climate in the Atlas. This paper examines a previously postulated link between pollen grain size and moisture availability, and concludes that moisture availability is not significantly related to grain size in this context.

The study makes use of a number of methods for determining grain size – light microscopy, scanning electron microscopy, and laser granulometry. I was involved in the laser granulometry aspect, which Ben demonstrates experimentally to be comparable to the microscopic methods. Laser granulometry is considerably less time-consuming than microscopic examination, which allowed for the large sample size used in this study.

The paper is published in Palynology, and is open access, available here.


A Shiny App for Very Simple Particle Size Diagrams

I knocked this app up for students who were struggling to draw diagrams for their reports on particle size. It’s my first app written and published with shiny, and I’m looking forward to using this interface for more of my code in the future.

How to Use This App:

  • Your data should be saved as a comma-separated values (*.csv) file – NOT an Excel file. Remember to select this option in “Save as…” if you are using Excel.
  • Your data should have sieve sizes in rows, and sample names in columns. The pan should have a size of zero (0). Here’s an example to download and use as a template.
  • Your sieve sizes should be expressed in microns.
  • Visit https://tombishop.shinyapps.io/histogramR/.
  • If you’ve included multiple samples (in columns), select the column you’d like to plot.
  • If you want to use units of phi, rather than mm, toggle that setting.
  • To copy the image, just right-click it and save it.
  • If you need statistics calculating for your data, you could use Regis Gallon’s excellent G2Sd program – just remember to set the “sep” parameter to “,” and the “dec” parameter to “.”.
  • Remember to label your axes as appropriate.
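For anyone curious about what the phi toggle does: phi is minus the base-2 logarithm of the grain diameter in millimetres. Here is a minimal sketch in R of that conversion, along with a data file in the layout described above. The column names and weights are made up for illustration, and this is not the app's own code.

```r
# Sieve apertures in microns, largest first; 0 is the pan (made-up values).
sieves <- data.frame(
  size_um = c(2000, 1000, 500, 250, 125, 63, 0),
  SampleA = c(1.2, 5.4, 12.8, 20.1, 9.7, 3.3, 0.9),
  SampleB = c(0.5, 2.1, 8.6, 25.4, 14.2, 5.0, 1.7)
)

# Krumbein phi scale: phi = -log2(diameter in mm).
# The pan has no aperture, so it gets NA rather than an infinite value.
microns_to_phi <- function(size_um) {
  size_mm <- size_um / 1000
  ifelse(size_mm > 0, -log2(size_mm), NA_real_)
}

sieves$phi <- microns_to_phi(sieves$size_um)

# Save the sizes and samples as a CSV file (sizes in rows, samples in columns).
csv_path <- file.path(tempdir(), "sieve_template.csv")
write.csv(sieves[, c("size_um", "SampleA", "SampleB")], csv_path, row.names = FALSE)
```

Coarser sieves have smaller (or negative) phi values, which is why a phi axis runs the opposite way to a millimetre axis.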

Itrax Data Manipulation in R

I’ve been working on ways to make Itrax data more useful to casual users – I figured one way to do this would be to provide some kind of standard report for each scan (or core sequence), with a stratigraphic diagram, some zonation and multivariate analysis. I’ve decided to do this in R, as it is freely available, cross-platform, handles large datasets and has some existing packages that are useful in manipulating scanning XRF data. At present the functionality is very basic (a bit like my understanding of R). I’ve made the following functions available on my Github repository:

  • Import: A function for importing Itrax data into R and cleaning it up a bit on the way. Can also plot the data.
  • Ordination: Performs correspondence analysis, with various options for preparing the data. Also provides biplots.
  • Correlation: Generates correlation matrices for Itrax data, and some visualisation.
  • Average: Averages Itrax data into a smaller dataset.

I’ll update as I add or modify functionality and documentation. I’m particularly interested to hear from others who are writing code for working with Itrax data, as I think it would make sense to collaborate and work towards a single, powerful suite of tools. Currently my plan is to begin to incorporate some of Menno Bloemsma’s methodology (parts of Itraxelerate) into R, whilst also working on a printable “standard” core data report that can be generated in batches from raw data.

PAST Counter Function

I’ve just discovered the very useful counter function in PAST. PAST is a statistical software package designed specifically for palaeontological data, and can do all sorts of tests and exploratory data processing. I’ve recently moved to version 3.14. One function I’ve just noticed is the counter – this enables you to input counts directly into a spreadsheet using the keys on your keyboard. It also provides auditory feedback and a total count. The software and instructions for its use are available from Øyvind Hammer’s website.

Analysis of Competing Hypotheses (ACH) in Palaeoenvironmental Research

Interpreting palaeoecological data can be an opaque process, differing considerably between workers, and oftentimes scholars have some difficulty describing their own decision-making process, or interpreting that of others, particularly in formal written formats like journal articles. This probably has a lot to do with the nature of multi-proxy palaeoecological investigations, where sources of information can be multiple, conflicting, incomplete, imprecise, and unreliable.

Often palaeoecological investigators don’t know exactly what information they will find in a palaeoecological archive before they analyse it, or what the quality of that information will be. This limits the use of statistical hypothesis testing (for example, defining a hypothesis and a null hypothesis to test for significance), although it has some limited application with quantitative data. Traditional hypothesis testing tends to focus on the most likely scenarios, rather than all of the proposed hypotheses. This got me thinking of other ways of testing hypotheses with palaeoecological data.

In many ways, palaeoecological data is a lot like intelligence, medical, or forensic data – information is derived from multiple, different, incomplete, unreliable sources, and can be interpreted in different ways. It is comprised of imperfect evidence preserved after some event or epoch, and it is up to the researcher(s) to compose different information sources into some coherent, plausible sequence of events, causal explanation and/or quantitative information about a past environment. This led me to take a look at methodical ways of testing hypotheses used in other fields.

For example, anyone who has seen the medical drama “House, M.D.” will be familiar with the fast-paced “differential diagnosis” sessions Gregory House (Hugh Laurie) holds with his team. The system is commonly taught in medical schools to assist medical practitioners to come to a diagnosis of a patient’s condition when the symptoms presented are similar. It also allows a medical practitioner to select an appropriate diagnostic if they are unable to discriminate between two or more diagnoses. The process can be broadly summarised as:

  1. Gather all information.
  2. List all possible causes.
  3. Prioritise the list by risk to the patient’s health.
  4. Working from the highest priority to the lowest, rule out each condition using the available information.

This simple model perhaps mirrors the approach informally adopted by many palaeoecological workers – collect data, hypothesise, rule out until settled on answers. This model fails to accommodate the possibility of competing hypotheses that cannot be adequately differentiated because of limitations in the information available. The intelligence analytical community has developed a way of reasoning and testing hypotheses called the “Analysis of Competing Hypotheses” (ACH). This approach can accommodate the various imperfections of the information available, and can indicate (qualitatively) the likelihood of a particular hypothesis being false. To summarise, the process goes something like this:

  1. Identify the possible hypotheses.
  2. List information and arguments (inc. assumptions and deductions) both for and against each hypothesis.
  3. Assess the relative “diagnosticity” of each piece of information.
  4. Prepare a matrix with hypotheses in columns, and all evidence and/or arguments in rows.
  5. Assess how consistent each piece of information or argument is with each hypothesis, attempting to refute each hypothesis.
  6. Reconsider the hypotheses, removing sources that don’t help discriminate, and identify further evidence required.
  7. Iterate steps 2-6 as required.
  8. Draw tentative conclusions about the relative likelihood of each hypothesis (rank them).
  9. Consider how sensitive your conclusion is to a few critical items of information, and the consequences thereof.
  10. Report conclusions, discussing all hypotheses.
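The matrix at the heart of steps 3 to 8 can be sketched in a few lines of R. This is only an illustration under my own assumptions: the evidence (E1 to E4), the hypotheses (H1 to H3) and the scores are invented, and the scoring scheme (-1, 0, 1) is a simplification rather than the scheme used by any particular ACH software.

```r
# Rows are items of evidence, columns are hypotheses (all hypothetical).
# Scores: -1 = inconsistent, 0 = neutral or not applicable, 1 = consistent.
ach <- matrix(
  c( 1,  1,  1,
     1, -1,  0,
    -1,  1,  1,
     0, -1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("E1", "E2", "E3", "E4"), c("H1", "H2", "H3"))
)

# Evidence that scores the same against every hypothesis cannot help
# discriminate between them, so it is set aside (E1 in this example).
is_diagnostic <- apply(ach, 1, function(scores) length(unique(scores)) > 1)
diagnostic <- ach[is_diagnostic, , drop = FALSE]

# ACH tries to refute hypotheses, so count inconsistencies, not support.
inconsistencies <- colSums(diagnostic == -1)

# Fewest inconsistencies first: a tentative ranking only.
ranking <- sort(inconsistencies)
```

Here H3 ends up with no inconsistent evidence, H1 with one item and H2 with two. Deleting one row and re-running is a quick way to see how sensitive the ranking is to a single item of information (step 9).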

ACH was introduced by Richards Heuer in “Psychology of Intelligence Analysis” (CIA) to combat confirmation bias in the field of intelligence analysis, to help multiple workers address a common problem with multiple lines of evidence, and to create an audit trail for intelligence decisions.

It’s clear that with multiple lines of evidence, weighting, and the iterations, this could quickly become more complicated than just muddling through the data. This is perhaps why there is a growing market for consultants marketing their software and services in intelligence, forensics and criminal investigation. Fortunately, both Richards Heuer’s treatise on the subject and some powerful software to assist are available gratis online.

I’d be interested to hear from anyone who’d like to try (or has tried) using ACH in their analysis of palaeoenvironmental data. I’ll happily configure a portable web-server if you’d like to try the software based version in a group meeting.


Itrax Table of the Elements

Here’s a poster I’ve designed to be used as a reference for people working with Itrax or other core scanning equipment. It is a table of the elements (with much of the usual information these traditionally contain), with the electron configurations, common X-ray emission spectra, and information on efficiency of detection using Mo and Cr source tubes. Hopefully you’ll find it helpful – if you use it in your lab I’d love to hear from you!


A high-resolution vector image file can be downloaded from the resources page.

A Beginner’s Guide to G2Sd for Particle Size Analysis

You may be familiar with the classic GRADISTAT for calculating particle size statistics. It is a set of macros written into a Microsoft Excel spreadsheet by Kenneth Pye and Simon Blott. At the time of writing it was last updated for use with Microsoft Excel 2007, and is becoming increasingly difficult to use with newer versions of Excel. After recently troubleshooting some odd GRADISTAT outputs for one of our lab users, I decided to see if there were alternatives available.

G2Sd, written by Regis Gallon and Jerome Fournier, does everything GRADISTAT did and a little more, and all as an easy-to-use package in R. Easy-to-use R? I hear you ask! Well yes, because it has the option of using a web-based interface (built using the “shiny” R package). I’m still (very much) starting out with R, and I’m not finding it easy! However, this package can be used with minimal knowledge – just follow my instructions below. Want to try it out first, or only need basic functionality? Why not use the hosted web-based version, available here for use with delimited text files (e.g. *.csv, *.tab).

Install R and G2Sd

  1. Assuming you are a Windows user, visit the R website, download and run the installation file, selecting the default options.
  2. Visit the repository for G2Sd and download the latest stable version.
  3. Open RGui, and from the top navigation menu, select Packages>Install Packages From Local Files.
  4. Navigate to the downloaded *.zip file that contains G2Sd files.
  5. G2Sd is dependent on some other packages. Install the first by typing install.packages("shiny").
  6. Repeat step five but replace shiny with xlsx, then rJava, xlsxjars, reshape2 & ggplot2.
  7. You only need to do this once!
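Steps five and six can also be done in one go. This is a minimal sketch using only base R functions. It skips anything that is already installed, and the interactive() guard is my own addition, so that nothing is downloaded if the lines are run as a script rather than typed at the console.

```r
# The packages named in steps five and six.
deps <- c("shiny", "xlsx", "rJava", "xlsxjars", "reshape2", "ggplot2")

# Work out which of them are not installed yet.
to_install <- setdiff(deps, rownames(installed.packages()))

# Only download when typed at the console and something is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```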

Loading and Using G2Sd

  1. Load G2Sd by typing library(G2Sd), or by navigating Packages>Load Packages>G2Sd from the top navigation menu.
  2. Run the web browser based GUI by typing granstat(web_interface=TRUE). A web-browser should appear!
  3. The data should be in the format:
    1. The first column contains the sieve mesh sizes in microns, in descending order (from largest to smallest aperture). Size “0” is the pan.
    2. Each subsequent column is a sample. The first row is the sample identifier.
  4. From here you can visualise and download the data.
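If the interface complains about your file, it can help to check the layout in R first. The function below is a hypothetical helper of my own (it is not part of G2Sd), and a minimal sketch at that: it only checks the two rules given above for the first column.

```r
# Return a character vector of problems; empty means the layout looks fine.
check_sieve_table <- function(data) {
  sizes <- data[[1]]
  problems <- character(0)
  if (!is.numeric(sizes)) {
    return("first column is not numeric")
  }
  # Reversed sizes should be strictly increasing if the original descends.
  if (is.unsorted(rev(sizes), strictly = TRUE)) {
    problems <- c(problems, "sizes are not in descending order")
  }
  if (sizes[length(sizes)] != 0) {
    problems <- c(problems, "last row is not the pan (size 0)")
  }
  problems
}

# Made-up example data: one correct table and one with the rows reversed.
good <- data.frame(
  size_um = c(2000, 1000, 500, 250, 125, 63, 0),
  SampleA = c(1.2, 5.4, 12.8, 20.1, 9.7, 3.3, 0.9)
)
bad <- good[rev(seq_len(nrow(good))), ]

good_problems <- check_sieve_table(good)
bad_problems <- check_sieve_table(bad)
```

With the reversed table, both rules fail, so bad_problems lists two messages.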

Using the Command Prompt Instead

  1. Load G2Sd as previously described.
  2. Load CSV data into an R dataframe by typing mydata <- read.table("mydata.txt", header=TRUE, sep=","). If you don’t have your own data, an example dataset is included in the package. Try loading it using data(granulo), and exporting it to a CSV datafile using write.table(granulo, "granulo.txt", sep=",").
  3. For statistics, try typing granstat(mydata). There’s a lot more functionality here – check out the package documentation for more.
  4. If you want to export these to a CSV file, just combine the two functions we met previously: type write.table(granstat(mydata), "mystats.txt", sep=","). Use a different file name from your input data, so that you don’t overwrite it.
  5. For graphics, try typing grandistrib(mydata). There are options for multiple samples to be plotted, and a number of different styles – try granplot(mydata) or granplot(mydata, xc=2:4) for a couple of examples. Check out the package documentation for more.
  6. To export a graphic you have generated, right-click the figure in the graphics window and save using the dialog box.
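It is worth seeing why the write.table() and read.table() pair in step two works. A minimal sketch in base R with made-up data follows. I am assuming here that the sieve sizes are held as row names, as I believe they are in the granulo example dataset, and the temporary file path is only for illustration.

```r
# Made-up data: sieve sizes (microns) as row names, one column per sample.
sieves <- data.frame(
  SampleA = c(1.2, 5.4, 12.8, 20.1, 9.7, 3.3, 0.9),
  SampleB = c(0.5, 2.1, 8.6, 25.4, 14.2, 5.0, 1.7),
  row.names = c("2000", "1000", "500", "250", "125", "63", "0")
)

# write.table() saves the row names as an unlabelled first column ...
path <- file.path(tempdir(), "mydata.txt")
write.table(sieves, path, sep = ",")

# ... and read.table() spots that the header is one field short,
# so it turns that first column back into row names.
mydata <- read.table(path, header = TRUE, sep = ",")
```

If your own file has a heading above the size column as well, read.table() will treat the sizes as an ordinary column instead, which may not be what you want.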


Blott, S.J. & Pye, K. (2001). “GRADISTAT: a grain size distribution and statistics package for the analysis of unconsolidated sediments”. Earth Surface Processes and Landforms v26, pp.1237-1248.

Fournier, J. & Gallon, R. (2014). “G2Sd: a new R package for the statistical analysis of unconsolidated sediments”. Géomorphologie : relief, processus, environnement v20(1), pp.73-78.