Comparing GC/MS data from different laboratories - same method

Basic questions from students; resources for projects and reports.

11 posts Page 1 of 1
Hello!

I am doing a project where I have to compare results obtained analyzing the same samples on different GC/MS systems in different laboratories (same conditions, same method), to see if there is a significant difference between them or not. I was planning to use peak areas for the comparison. I have to compare around 50 samples and each of them has 11 components (variables). Do you know what statistical/chemometric method to use? Or do you have any resources where I can read more about this issue? Any help or suggestions would be appreciated!
Hello!

I am doing a project where I have to compare results ...
Quantitative results?
obtained analyzing the same samples on different GC/MS systems in different laboratories (same conditions, same method), to see if there is a significant difference between them or not. I was planning to use peak areas for the comparison. ...
If the results are quantities, then comparing peak areas or heights is not a good idea. Just compare amounts.
The results are the peak areas of the detected compounds, I'm not determining quantities. I have to see if there is a significant difference between the peak areas obtained on different instruments. The main problem I have is how to compare that data, considering I have a large number of results - 50 samples with 11 compounds. I could easily do a PCA or PLS-DA, but with that I only get a graphical result. I would need a method that gives me exact numbers, to see if there is a difference between methods without comparing each sample on its own.
The results are the peak areas of the detected compounds, I'm not determining quantities. I have to see if there is a significant difference between the peak areas obtained on different instruments. ...
Then compare areas. However, if that is the case, remember that areas should be expressed in the same units. S/N might be a better estimate of sensitivity.

Good luck!
If the instruments are from different manufacturers - the answer without even looking is yes the areas are different. The electronics and signal conversion differ by manufacturer - and even between instrument models.

What do you want to do with the numerical result? Do you want to see if signal strength clusters between sets of laboratories? - If so, you want a cluster analysis.

If you want to see if one laboratory or another is an outlier on any particular compound, I would suggest univariate analysis. If samples are of varying concentrations, you may need to normalize to the mean area for the concentration to include all samples in the univariate analysis for a given compound. (Univariate analysis being the mean and measurements concerning the mean as well as the distribution, including establishment of percentiles.) If these are calibration curve data, you can apply univariate analysis to the slope and intercept values of the curves created (as the linearity may differ instrument to instrument).
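A minimal Python sketch of the normalization idea, assuming one compound measured in the same samples at each lab. The areas are invented for illustration and the function name is hypothetical; each area is divided by the mean area for that sample across labs, so samples of different concentration can share one univariate summary.

```python
import statistics

# Hypothetical peak areas for one compound; areas[lab][i] belongs to sample i.
# These numbers are invented for illustration, not real data.
areas = {
    "lab_A": [1200.0, 5300.0, 9800.0, 2500.0],
    "lab_B": [1500.0, 6100.0, 11900.0, 2900.0],
}

def normalize_to_sample_mean(areas_by_lab):
    """Divide each area by the mean area of that sample across all labs."""
    labs = list(areas_by_lab)
    n_samples = len(areas_by_lab[labs[0]])
    sample_means = [
        statistics.mean([areas_by_lab[lab][i] for lab in labs])
        for i in range(n_samples)
    ]
    return {
        lab: [areas_by_lab[lab][i] / sample_means[i] for i in range(n_samples)]
        for lab in labs
    }

normalized = normalize_to_sample_mean(areas)

# Univariate summary per lab: mean, standard deviation and quartiles
# of the normalized response.
for lab, values in normalized.items():
    quartiles = statistics.quantiles(values, n=4)
    print(lab, round(statistics.mean(values), 3),
          round(statistics.stdev(values), 3),
          [round(q, 3) for q in quartiles])
```

A lab whose normalized values sit consistently above or below 1 is responding higher or lower than the group for that compound.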

It is a bit hard to guess where you are going. And, it would help to know what design there is in the data collected. Design of an experiment and appropriate statistical treatment are very strongly tied.
Thank you both for trying to help! I appreciate it. I guess I have to apologize for not being clear from the start. Judging from the answers, I guess I should change my approach to this, because peak areas weren't a good choice, but that's the data I have.

So let's start from the beginning. The thing that I am trying to test is method reproducibility. The GC/MS analysis was performed in two different laboratories, on two GC/MS instruments from different manufacturers - both using the same method and conditions. I am trying to see if the method of analysis is reproducible no matter the instrument/laboratory.

I hope this clears it up a little!
The best way is to look at multiple replicates of individual samples run multiple times at the two laboratories. Do you have multiple replicates run at each laboratory?

If you are looking to compare computed results from the labs, use the computed results. Any differences in calibration computations (like weighting or curve order selected) are rolled into the result. And if we assume the best selections were made for the instrument, we let it ride. Otherwise you can go get raw areas and compute response ratios, calibration curves, etc.

I assume that the 50 samples are from some other experiment and just happened to be available for comparison? Or, were these selected to cover specific ranges?

I am going to guess that this was single replicates of 50 samples that were not selected for analyte range or made with spiked “clean” matrix for the study, but rather were samples that just "happened to be in the box" (worst kind of set for this kind of analysis). Use lab A as the "reference" and look at the percent difference (I’ll call it error) in the results of lab B from lab A. Look at the distribution of the error. Is it normally distributed? If all the values for a compound are close to each other (within one lab), just use the error rather than the percent error. If you do not have a normal (or log-normal) distribution of error, your problem may have become harder. Assuming a reasonable degree of normality, plot error vs. concentration as determined in lab A. What does this tell you about comparability of data across the analyte range? Perform a regression of results from A vs. results from B. Is the slope close to 1? Is there a significant intercept? Are your confidence intervals large enough that you can only say that the experiment does not really tell you anything?
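The percent-error and regression steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the values are invented results for one compound, the function names are hypothetical, and the fit is an ordinary least-squares line with no confidence intervals.

```python
import statistics

# Hypothetical results for one compound in six samples (invented numbers).
lab_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # treated as the reference
lab_b = [10.5, 19.8, 31.2, 41.0, 49.5, 62.1]

def percent_errors(reference, other):
    """Percent difference of `other` from `reference`, sample by sample."""
    return [100.0 * (o - r) / r for r, o in zip(reference, other)]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

errors = percent_errors(lab_a, lab_b)
slope, intercept = fit_line(lab_a, lab_b)

print("mean % error:", round(statistics.mean(errors), 2))
print("sd of % error:", round(statistics.stdev(errors), 2))
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))

# Error paired with the lab A value, ready for an error vs. concentration plot.
for a, e in zip(lab_a, errors):
    print(a, round(e, 2))
```

A slope near 1 and an intercept near 0 point toward agreement, but whether they differ significantly from 1 and 0 still needs the confidence intervals, which this sketch does not compute.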

If you have selected samples to give specific ranges for analytes or have run replicate injections, the design becomes a bit more powerful. An important lesson in statistical analysis of data: Select the treatment of the data as you design the experiment. In this way you have the right number of replicates across the proper range of results when you do the data analysis. Typically when I try to design a data analysis for data that have already been collected, my first statement of observation is: “Drat! I don’t have the right data to get the answer I really wanted!!” (I’ve been doing this for a number of years and still find myself attempting analysis of data on improperly designed experiments.)
Don_Hilton you are a great help! Thank you for your effort! The 50 samples, as you said, just "happened to be in the box" - they were leftovers from another experiment. If it was my decision, I would have done it differently. This is what gives me a headache; if they were "clean" matrices, it would have been much easier. But, on the plus side, each of the samples was run in triplicate, in both laboratories.

EDIT

Do you know where I can read more about this kind of experiment? Papers, books?
With replicates - three is not many, but you can compute variance for each analyte for each sample in each lab and then look at pooled variance to compare variability of results lab to lab. Look at what is going on in samples. If the matrix has an interference that distorts results in one lab in a sample, but not the other lab, you may want to do the analysis with and without that sample. While everything is the "same" between labs, the chromatography may not match exactly for a number of reasons - including differences in columns, even if each is new.
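A rough Python sketch of the pooled-variance calculation for one analyte, assuming triplicate results per sample in each lab. The numbers are invented and the function name is hypothetical. Each sample's variance is weighted by its degrees of freedom (n - 1).

```python
import statistics

def pooled_variance(replicate_sets):
    """Pool within-sample variances, weighting each by its degrees of freedom.

    Returns (pooled variance, total degrees of freedom).
    """
    weighted_sum = 0.0
    dof = 0
    for replicates in replicate_sets:
        n = len(replicates)
        weighted_sum += (n - 1) * statistics.variance(replicates)
        dof += n - 1
    return weighted_sum / dof, dof

# Hypothetical triplicates for one analyte in three samples (invented numbers).
lab_a = [[10.1, 10.3, 9.9], [20.4, 19.8, 20.1], [30.2, 30.9, 29.8]]
lab_b = [[10.6, 9.7, 10.2], [21.0, 19.5, 20.3], [31.5, 29.6, 30.4]]

var_a, dof_a = pooled_variance(lab_a)
var_b, dof_b = pooled_variance(lab_b)

print("lab A pooled variance:", round(var_a, 4), "dof:", dof_a)
print("lab B pooled variance:", round(var_b, 4), "dof:", dof_b)
```

Pooling raw variances like this assumes the variance is roughly constant across the samples being pooled. If it grows with concentration, pooling within narrow concentration bands, or pooling relative variances, is probably the safer choice.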

If your samples have a range of values for results, take narrow bands in the range and see how variance compares across concentration levels.

With a set of 50 samples, I hope that there were instrument controls or calibration checks in the set of samples. With 50 samples run in triplicate (150 injections) there needs to be a fair number of control samples or calibration checks run in the sequence - I hope. These should have many more than three replicates at a high, low and perhaps even medium level? If so, start the analysis on these. (These are the instrument QC samples which show if the instrument is even being consistent in the particular laboratory.)

Now, I have taken you off into digging for details in the data - but we do come back to the purpose of the experiment. With statistical techniques you can ask two questions - 1) can I say that there is no difference that can be determined between set A and set B (or a data set and a model)? And 2) (the one people forget to ask) with the data I have, can I detect a difference as small as I need to be able to find? If you cannot detect a difference as small as you need to detect and your experiment shows no significant difference between A and B - your experiment fails to answer the question.
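Question 2 can be roughed out before running any test. The Python sketch below is only an approximation under stated assumptions: paired A/B differences, a normal approximation in place of the t distribution, and default values for alpha and power that are my choice rather than anything from this thread. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_difference(sd_diff, n_pairs, alpha=0.05, power=0.80):
    """Approximate smallest mean paired difference detectable in a two-sided test.

    sd_diff is the standard deviation of the paired differences.
    Uses the normal approximation, so it is optimistic for small n_pairs.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return (z_alpha + z_power) * sd_diff / sqrt(n_pairs)

# Hypothetical case: paired percent differences with a standard deviation
# of 5 (in percent units) over 50 samples.
delta = min_detectable_difference(sd_diff=5.0, n_pairs=50)
print("smallest detectable mean difference:", round(delta, 2), "percent")
```

If the printed value is larger than the difference you care about, a "no significant difference" outcome does not answer the question.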

Be sure you have the question clearly stated. If you want to compare reproducibility of computed results within laboratories against each other, you are measuring variance within laboratories and doing a comparison of variance between laboratories (F test at the simplest). If you are testing to see if one laboratory reproduces the other, you are also testing mean values (t-test at the simplest). The trick is pooling data in a meaningful way. Because the mean values are expected to scatter across a box of samples, a regression may work, but you need to be aware of the distribution of means - this weights the fit.
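As a sketch of the two simplest tests named above, the Python below computes the F ratio of two variances and a Welch t statistic for two means. It assumes repeated results for one control sample at each lab; the numbers are invented and the function names are hypothetical. It returns only the test statistics, which then have to be compared against tabulated critical values (or passed to a statistics package) to get significance.

```python
import statistics
from math import sqrt

def variance_ratio(sample_1, sample_2):
    """F statistic as larger variance over smaller, with (numerator, denominator) dof."""
    v1 = statistics.variance(sample_1)
    v2 = statistics.variance(sample_2)
    if v1 >= v2:
        return v1 / v2, len(sample_1) - 1, len(sample_2) - 1
    return v2 / v1, len(sample_2) - 1, len(sample_1) - 1

def welch_t(sample_1, sample_2):
    """Welch t statistic for the difference in means (unequal variances allowed)."""
    m1, m2 = statistics.mean(sample_1), statistics.mean(sample_2)
    v1, v2 = statistics.variance(sample_1), statistics.variance(sample_2)
    standard_error = sqrt(v1 / len(sample_1) + v2 / len(sample_2))
    return (m1 - m2) / standard_error

# Hypothetical repeated results for one control sample (invented numbers).
lab_a_qc = [50.2, 49.8, 50.5, 50.1, 49.6, 50.3, 50.0]
lab_b_qc = [51.0, 49.2, 50.9, 52.1, 48.8, 51.4, 50.2]

f_stat, dof_num, dof_den = variance_ratio(lab_a_qc, lab_b_qc)
t_stat = welch_t(lab_a_qc, lab_b_qc)

print("F:", round(f_stat, 2), "dof:", dof_num, dof_den)
print("Welch t:", round(t_stat, 2))
```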

I would suggest that with the ideas you have from here, you start tumbling the data to see what it looks like. There are some good books that help with a look at experimental data. I have a couple at home - the one I reached around and picked up is John Mandel's "The Statistical Analysis of Experimental Data" - available from Dover books (old classic, but at Dover inexpensive!); the other has wandered off... And, at work my favorite is an older version of Box, Hunter and Hunter: "Statistics for Experimenters". The current edition is "Statistics for Experimenters: Design, Innovation, and Discovery" and you can get a used copy at an almost affordable price.
You were a tremendous help! Once again, thank you for all the answers and suggestions! I will try and do the work following your suggestions and see what works the best. You really cleared some stuff for me!
Another way to look at this: you could consider the 550 peak areas you have from 11 peaks in 50 samples as a giant paired t-test (pairing each sample/peak combination in lab A with its matching measurement from lab B). If you had 550 people who'd each been tested twice, before and after a treatment, you'd do it that way to see if the treatment had any effect (irrespective of the fact that each person/peak is completely independent of all the others and has their own personal baseline).

This would give you a single statistical test that would indicate whether values in lab A consistently vary in the same direction compared to lab B.

To do it on peak areas is almost completely meaningless, and is almost certainly going to give a significant result. To do it on calibrated results is far more valuable.
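A minimal Python sketch of that paired t-test, assuming the lab A and lab B values are lined up so that position i in both lists is the same sample/peak combination. The values are invented and the function name is hypothetical. The p-value uses a normal approximation, which is only reasonable with many pairs (such as the 550 mentioned above); with few pairs, compare t against a Student's t table with n - 1 degrees of freedom instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def paired_t(values_a, values_b):
    """Paired t statistic, degrees of freedom and approximate two-sided p-value."""
    diffs = [b - a for a, b in zip(values_a, values_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    # Normal approximation to the t distribution; fine for large n only.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, n - 1, p_approx

# Hypothetical calibrated results; each position is one sample/peak pair.
lab_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 15.0, 25.0]
lab_b = [10.4, 20.1, 30.9, 40.2, 50.8, 60.5, 15.3, 25.6]

t_stat, dof, p_value = paired_t(lab_a, lab_b)
print("t:", round(t_stat, 2), "dof:", dof, "approx p:", round(p_value, 4))
```

Raw differences are dominated by the largest values, so with results spanning a wide range it may be worth pairing relative differences or log ratios instead. That is a suggestion on my part, not something established in this thread.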