Itrax Data Manipulation in R

I’ve been working on ways to make Itrax data more useful to casual users – I figured one way to do this would be to provide some kind of standard report for each scan (or core sequence), with a stratigraphic diagram, some zonation and multivariate analysis. I’ve decided to do this in R, as it is freely available, cross-platform, handles large datasets and has some existing packages that are useful in manipulating scanning XRF data. At present the functionality is very basic (a bit like my understanding of R). I’ve made the following functions available on my GitHub repository; a sketch of the intended workflow follows the list:

  • Import: A function for importing Itrax data into R and cleaning it up a bit on the way. Can also plot the data.
  • Ordination: Performs correspondence analysis, with various options for preparing the data. Also provides biplots.
  • Correlation: Generates correlation matrices for Itrax data, and some visualisation.
  • Average: Averages Itrax data into a smaller dataset.

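Here’s a minimal sketch of the intended workflow. The file and function names are illustrative only – check the repository for whatever the scripts currently expose:

    # Source the functions from local copies of the repository files
    source("itrax_import.R")
    source("itrax_ordination.R")

    # Import and clean a raw Itrax result file, with a quick plot
    df <- itrax_import("result.txt", graph = TRUE)

    # Correspondence analysis of the cleaned data, with biplots
    itrax_ordination(df)
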
I’ll update as I add or modify functionality and documentation. I’m particularly interested to hear from others who are writing code for working with Itrax data, as I think it would make sense to collaborate and work towards a single, powerful suite of tools. Currently my plan is to begin to incorporate some of Menno Bloemsma’s methodology (parts of Itraxelerate) into R, whilst also working on a printable “standard” core data report that can be generated in batches from raw data.

PAST Counter Function

I’ve just discovered the very useful counter function in PAST. PAST is a statistical software package designed specifically for palaeontological data, and can do all sorts of tests and exploratory data processing. I’ve recently moved to version 3.14. One function I’ve just noticed is the counter – this enables you to input counts directly into a spreadsheet using the keys on your keyboard. It also provides auditory feedback and a total count. The software and instructions for its use are available from Øyvind Hammer’s website.

Analysis of Competing Hypotheses (ACH) in Palaeoenvironmental Research

Interpreting palaeoecological data can be an opaque process, differing considerably between workers, and scholars often have difficulty describing their own decision-making process, or interpreting that of others, particularly in formal written formats like journal articles. This probably has a lot to do with the nature of multi-proxy palaeoecological investigations, where sources of information can be multiple, conflicting, incomplete, imprecise, and unreliable.

Often palaeoecological investigators don’t know exactly what information they will find in a palaeoecological archive before they analyse it, or what the quality of that information will be. This limits the use of statistical hypothesis testing – for example, defining a hypothesis (and null hypothesis) and testing for significance – although it has some limited application with quantitative data. Traditional hypothesis testing tends to focus on the most likely scenarios, rather than all of the proposed hypotheses. This got me thinking about other ways of testing hypotheses with palaeoecological data.

In many ways, palaeoecological data is a lot like intelligence, medical, or forensic data – information is derived from multiple, different, incomplete, unreliable sources, and can be interpreted in different ways. It is composed of imperfect evidence preserved after some event or epoch, and it is up to the researcher(s) to compose different information sources into some coherent, plausible sequence of events, causal explanation and/or quantitative information about a past environment. This led me to take a look at methodical ways of testing hypotheses used in other fields.

For example, anyone who has seen the medical drama “House, M.D.” will be familiar with the fast-paced “differential diagnosis” sessions Gregory House (Hugh Laurie) holds with his team. The system is commonly taught in medical schools to help practitioners reach a diagnosis when the presenting symptoms are common to several conditions. It also allows a practitioner to select an appropriate diagnostic test if they are unable to discriminate between two or more diagnoses. The process can be broadly summarised as:

  1. Gather all information.
  2. List all possible causes.
  3. Prioritise the list by risk to the patient’s health.
  4. Working from the highest priority to the lowest, rule out each condition using the available information.

This simple model perhaps mirrors the approach informally adopted by many palaeoecological workers – collect data, hypothesise, rule out until settled on answers. However, it fails to accommodate the possibility of competing hypotheses that cannot be adequately differentiated because of limitations in the information available. The intelligence analytical community has developed a way of reasoning and testing hypotheses called the “Analysis of Competing Hypotheses” (ACH). This approach can accommodate the various imperfections of the information available, and can indicate (qualitatively) the likelihood of a particular hypothesis being false. To summarise, the process goes something like this (a toy sketch of the matrix stages follows the list):

  1. Identify the possible hypotheses.
  2. List information and arguments (inc. assumptions and deductions) both for and against each hypothesis.
  3. Assess the relative “diagnosticity” of each piece of information.
  4. Prepare a matrix with hypotheses in columns, and all evidence and/or arguments in rows.
  5. Assess how consistent each piece of information or argument is with each hypothesis, attempting to refute each hypothesis.
  6. Reconsider the hypotheses, removing sources that don’t help discriminate, and identify further evidence required.
  7. Iterate steps 2-6 as required.
  8. Draw tentative conclusions about the relative likelihood of each hypothesis (rank them).
  9. Consider how sensitive your conclusion is to a few critical items of information, and the consequences thereof.
  10. Report conclusions, discussing all hypotheses.
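
As a toy illustration of the matrix stages (steps 4–8), here is how they might look in R. The hypotheses, evidence and scores are all invented for the example:

    # ACH matrix: hypotheses in columns, evidence/arguments in rows
    # Scores: +1 consistent, 0 neutral or not diagnostic, -1 inconsistent
    ach <- matrix(c( 1, -1,  0,
                     1,  1,  1,
                    -1,  1,  0,
                     1, -1, -1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste("evidence", 1:4),
                                  c("H1: climate", "H2: land use", "H3: taphonomy")))

    # A row that scores identically against every hypothesis discriminates
    # nothing, so drop it before ranking (step 6)
    diagnostic <- apply(ach, 1, function(r) length(unique(r)) > 1)
    ach <- ach[diagnostic, , drop = FALSE]

    # Rank hypotheses by inconsistency count - ACH seeks to refute, so the
    # hypothesis attracting the fewest inconsistent scores survives best (step 8)
    sort(colSums(ach == -1))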

ACH was introduced by Richard Heuer in “The Psychology of Intelligence Analysis” (CIA) to combat confirmation bias in intelligence analysis, to enable multiple workers to address a common problem with multiple lines of evidence, and to create an audit trail for intelligence decisions.

It’s clear that with multiple lines of evidence, weighting, and the iterations, this could quickly become more complicated than just muddling through the data. This is perhaps why there is a growing market for consultants marketing their software and services in intelligence, forensics and criminal investigation. Fortunately Richard Heuer’s treatise on the subject, along with some powerful software to assist, is available gratis online.

I’d be interested to hear from anyone who’d like to try (or has tried) using ACH in their analysis of palaeoenvironmental data. I’ll happily configure a portable web-server if you’d like to try the software based version in a group meeting.


Itrax Table of the Elements

Here’s a poster I’ve designed to be used as a reference for people working with Itrax or other core scanning equipment. It is a table of the elements (with much of the usual information these traditionally contain), plus the electron configurations, common X-ray emission spectra, and information on the efficiency of detection using Mo and Cr source tubes. Hopefully you’ll find it helpful – if you use it in your lab I’d love to hear from you!

[Image: Itrax periodic table of the elements poster]

A high-resolution vector image file can be downloaded from the resources page.

A Beginner’s Guide to G2Sd for Particle Size Analysis

You may be familiar with the classic GRADISTAT for calculating particle size statistics. It is a set of macros written into a Microsoft Excel spreadsheet by Simon Blott and Kenneth Pye. At the time of writing it was last updated for use with Microsoft Excel 2007, and is becoming increasingly difficult to use with newer versions of Excel. After recently troubleshooting some odd GRADISTAT outputs for one of our lab users, I decided to see if there were alternatives available.

G2Sd, written by Regis Gallon and Jerome Fournier, does everything GRADISTAT did and a little more, all as an easy-to-use package in R. Easy-to-use R? I hear you ask! Well yes, because it has the option of a web-based interface (built using the “shiny” R package). I’m still (very much) starting out with R, and I’m not finding it easy! However, this package can be used with minimal knowledge – just follow my instructions below. Want to try it out first, or only need basic functionality? Why not use the hosted web-based version, available here for use with delimited text files (e.g. *.csv, *.tab).

Install R and G2Sd

  1. Assuming you are a Windows user, visit the R website, download and run the installation file, selecting the default options.
  2. Visit the repository for G2Sd and download the latest stable version.
  3. Open RGui, and from the top navigation menu, select Packages > Install package(s) from local files.
  4. Navigate to the downloaded *.zip file that contains G2Sd files.
  5. G2Sd is dependent on some other packages. Install the first by typing install.packages("shiny").
  6. Repeat step five, replacing shiny with xlsx, rJava, xlsxjars, reshape2 and ggplot2 in turn (or install everything at once – see the snippet below).
  7. You only need to do this once!
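
If you prefer the console, steps 3 to 6 can be done in one go. A minimal sketch – the path to the downloaded archive is an assumption, so point it at wherever your browser saved the file:

    # Install the packages G2Sd depends on, in one call
    install.packages(c("shiny", "xlsx", "rJava", "xlsxjars", "reshape2", "ggplot2"))

    # Install G2Sd itself from the downloaded archive (path is hypothetical)
    install.packages("C:/Users/me/Downloads/G2Sd.zip", repos = NULL)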

Loading and Using G2Sd

  1. Load G2Sd by typing library(G2Sd), or by navigating Packages>Load Packages>G2Sd from the top navigation menu.
  2. Run the web-browser-based GUI by typing granstat(web_interface=TRUE). A browser window should appear!
  3. Your data should be in the following format:
    1. The first column contains the sieve mesh sizes in microns, in descending order (from largest to smallest aperture). Size “0” is the pan.
    2. Each subsequent column is a sample; the first row holds the sample identifiers. A toy example follows this list.
  4. From here you can visualise and download the data.
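
To make the expected layout concrete, here is a toy dataset built by hand – the sample names and weights are invented. Note that once inside R, the sieve sizes sit most naturally as row names; if granstat() complains about your own layout, compare it against the bundled example via data(granulo); head(granulo):

    # Sieve apertures in microns, largest first; 0 is the pan
    sizes <- c(2000, 1000, 500, 250, 125, 63, 0)

    # One column per sample: weight retained on each sieve (invented values)
    mydata <- data.frame(sample_A = c(0.2, 1.5, 6.3, 12.1, 8.4, 2.2, 0.5),
                         sample_B = c(0.0, 0.8, 4.9, 10.7, 9.9, 3.1, 0.9),
                         row.names = sizes)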

Using the Command Prompt Instead

  1. Load G2Sd as previously described.
  2. Load CSV data into an R dataframe by typing mydata <- read.table("mydata.txt", header=TRUE, sep=","). If you don’t have your own data, an example dataset is included in the package. Try loading it using data(granulo), and exporting it to a CSV datafile using write.table(granulo, "granulo.txt", sep=",").
  3. For statistics, try typing granstat(mydata). There’s a lot more functionality here – check out the package documentation for more, and see the worked session sketched after this list.
  4. If you want to export these to a CSV file, just combine the two functions we met previously: type write.table(granstat(mydata), "mydata.txt", sep=",").
  5. For graphics, try typing grandistrib(mydata). There are options for multiple samples to be plotted, and a number of different styles – try granplot(mydata) or granplot(mydata, xc=2:4) for a couple of examples. Check out the package documentation for more.
  6. To export a graphic, right-click the figure in the R graphics window and save it using the dialog box.
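
Put together, a complete command-line session might look like the sketch below (the file names are placeholders for your own):

    library(G2Sd)

    # Write out the bundled example data as CSV, just to see the format
    data(granulo)
    write.table(granulo, "granulo.txt", sep=",")

    # Read your own delimited data back in
    mydata <- read.table("mydata.txt", header=TRUE, sep=",")

    # Statistics to the console, then exported to a CSV file
    granstat(mydata)
    write.table(granstat(mydata), "mydata_stats.txt", sep=",")

    # A couple of plots
    granplot(mydata)
    grandistrib(mydata)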

References:

Blott, S.J. & Pye, K. (2001). “GRADISTAT: a grain size distribution and statistics package for the analysis of unconsolidated sediments”. Earth Surface Processes and Landforms v26, pp.1237-1248.

Fournier, J. & Gallon, R. (2014). “G2Sd: a new R package for the statistical analysis of unconsolidated sediments”. Géomorphologie: relief, processus, environnement v20(1), pp.73-78.


Serial Interfaces on Lab Balances

I’ve hooked up one of our lab balances to a PC!

For users of the Geography Laboratory

In the main lab is a networked computer that I’ve connected to the Ohaus Adventurer AR-Series balance. To use it:

  • log in
  • open “C:\SPDC Data Collection V2.01\SPDC Data Collection V2.01.exe”
  • click “Browse”, find the Excel spreadsheet you want to import to, and click “Open”
  • click “Run”. It will automatically open the spreadsheet and connect to the balance
  • select the cell you want to populate, and press the “Print” button on the balance; it will populate the cell with the current reading.

How’s it Done and Can You Do This With Other Balances?

Almost all laboratory balances have RS232 communications built in as standard, but for some reason manufacturers seem to want to make it as hard as possible to use! This is unfortunate, because this functionality is really handy for connecting balances up to spreadsheet packages like Excel for automatic data entry. If you can get it working, it is an easy way to save all your lab users time and reduce transcription errors. I’ve had a good hack at the Ohaus balances in our lab, and here are some notes on my experiences. My next project will be to pipe these serial connections over IP, and then allow access via a Wi-Fi hotspot. Just connect to the hotspot, start the wedge software and measure away – well, that is the plan, anyway.

Ohaus Adventurer AR-Series (like above)

These should be straightforward – they have configurable RS232 communications and a standard male DB-9 connector. Of course, for reasons best known to the manufacturer, they have completely non-standard pin assignments. The pin assignments are shown in the manual, so you’ll have to build your own null modem cable to connect it, or the ground ends up connected to the CTS and you get nonsense. On the plus side, Ohaus supply the wedge software for free, which will allow direct input into a .xlsx, .csv, etc. The standard serial settings are 2400 7N2 (yeah, why not?), but these are configurable so you could change them to something this side of sensible (like 9600 8N1). Once it is all up and running, just press “Print” to fill a cell.

Ohaus Explorer (& Pro)

Now here’s a thing: although the pin assignments and default serial settings are the same as for the Adventurer series, RTS/CTS is implemented differently. No problem, just set the correct flow control in the wedge software, right? Well, we would if we could, but the Ohaus wedge software doesn’t implement hardware handshaking properly! Open a serial connection using Minicom or PuTTY and you’ll have no trouble sending and receiving (when you configure the handshaking properly), but try the Ohaus software and it won’t play. You’ll have to tie the CTS and DTR pins together to get them to work – it’s a bit ham-fisted, but it’ll do. This will be my next project.

Mettler Toledo 

Naturally, Mettler Toledo fit a serial interface without the familiar DB-9 connector. I’m working on sourcing the right connector, but I suspect it’ll be £££.

Sartorius

Haven’t started on these yet, because we don’t have many in the labs, although previous experience tells me they are usually good at implementing serial comms over RS232.

I’m on GitHub!

I’m on GitHub! At the moment I’m using it to develop, update and distribute the LaTeX templates for typesetting theses, associated with the courses I deliver on the subject. Expect the addition of some of the Python-based dGPS, ADS-B and other radio-science scripts I’ve written.

Raspberry Pi Headless Configuration

If, like me, you’ve been experimenting with Raspberry Pis, you’ll know that monitors, mice and keyboards cost far more than the units themselves and present some difficulty. Not just that, but running “headless” systems is normally preferable for the kind of applications I’m working on (servers, radios, desktop gadgets). If you can get away with only hooking up KVM (keyboard, video, mouse) once (or not at all!), great! There’s loads of guides knocking around, but here’s how I do it:

Using a KVM

First, I connect the Pi to the telly with a spare HDMI connection, plug in a wireless mouse and keyboard and run NOOBS to install Raspbian. I assume the systems are on the same local network, so I use hostnames here. If you are connecting over the internet, you’ll need to replace these with a public IP or domain name.

Using the Serial Interface

First I write a Raspbian image onto an SD card and connect up a TTL serial interface (find the pinouts for your Pi). Use Device Manager to find the COM number of the interface on your PC, then use PuTTY to configure it (the baud rate is 115200) and connect to a session on the Pi. Use the default login and follow the instructions below.

Enable SSH

First I run sudo raspi-config to enable SSH and change the hostname. I choose not to reboot, but quit raspi-config and run sudo shutdown now. Now simply log in from Windows using PuTTY, with the hostname and default settings. Using the hostname avoids the need to assign a static IP.

Enable RDP

If you want a desktop, RDP is built into Windows 10 and is an easy way to remote into the X Window System; all you need to do is install xrdp, just like any other package (sudo apt-get install xrdp). Now reboot the Pi, find the Remote Desktop Connection client on your Windows 10 system, and enter the hostname to connect.

Security

Remember, by opening up remote access servers like these, you open the Pi up to potential attacks, especially if you allow external access from the internet. Attacks might not be a big problem on your internal network, but if you choose to open these services up over the net, think about using random port numbers, public/private keys, and tunnelling RDP over SSH.