Field of Science

Sequencing data

The New York Times recently posted a piece on problem solving which asked readers to first solve a problem:

"We’ve chosen a rule that some sequences of three numbers obey — and some do not. Your job is to guess what the rule is. We’ll start by telling you that the sequence 2, 4, 8 obeys the rule."

You can test your hypotheses by typing sequences into three boxes to see if they follow the unstated rule. Once you think you know, you type in a description. Most people, it turns out, suggest an answer without ever trying a sequence that returns a firm "NO." Psychologists interpret this as evidence of confirmation bias: once we get a "yes" for our theory, we don't poke around trying to find a "no."

When I teach chemical kinetics, I point out to students that few experiments can prove a reaction goes in a particular sequence, only that the data are consistent with a proposed mechanism. A "no" can be as critical to problem solving as a "yes," or more so.

I failed to 'correctly' solve the puzzle, [SPOILER ALERT] though I did get several no answers. One rule I tried was a^n: 2^1, 2^2, 2^3 = 2, 4, 8. The sequence 1, 1, 1 follows that rule (1^1, 1^2, 1^3 are all one), but yielded a no. The rule a_n = 2 x a_(n-1): 2, 2x2=4, 2x4=8 worked for every sequence I tried, but is not 'the' answer. The answer is that correct sequences have each number larger than the last.

The study suggests I failed not only because of confirmation bias, but because I complicated the problem, assuming that there was some sort of trick to the rule. Actually, I assumed the technical mathematical meaning of sequence held, in that there was a rule that uniquely specified each number in the sequence given the starting value(s). An ordered list of numbers, each of which is larger than the previous value, is not a sequence in that sense.

In retrospect, I should have tried the sequence 0, 0, 0. It follows the rule I proposed as the correct one (a_n = 2 x a_(n-1)), but returns a "no." It would have ruled out my proposed rule, a useful "no". (I might also have tried non-integer numbers.) I failed in part because I didn't understand the question they were asking; we didn't have the same definition of "sequence." In some sense I fell prey to the "when all you have is a hammer, everything looks like a nail" scheme.
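The hunt for a useful "no" can be sketched in a few lines of Python. This is only an illustration: the function names and the handful of test sequences are made up for the example, and the "puzzle" here is simply the stated answer (each number larger than the last).

```python
def increasing(seq):
    """The puzzle's actual rule: each number is larger than the last."""
    return all(a < b for a, b in zip(seq, seq[1:]))

def doubling(seq):
    """The second guess: a_n = 2 x a_(n-1)."""
    return all(b == 2 * a for a, b in zip(seq, seq[1:]))

def powers(seq):
    """The first guess: successive powers a^1, a^2, a^3 of the first number."""
    base = seq[0]
    return all(x == base ** n for n, x in enumerate(seq, start=1))

def disconfirming(tests, hypothesis, truth):
    """Test sequences where the hypothesis says yes but the puzzle says no."""
    return [seq for seq in tests if hypothesis(seq) and not truth(seq)]

# Hypothetical test sequences, chosen for illustration only.
tests = [(2, 4, 8), (1, 2, 4), (3, 6, 12), (1, 1, 1), (0, 0, 0), (-1, -2, -4)]

print(disconfirming(tests, doubling, increasing))  # [(0, 0, 0), (-1, -2, -4)]
print(disconfirming(tests, powers, increasing))    # [(1, 1, 1), (0, 0, 0)]
```

Every sequence that fits a hypothesis and also increases tells you nothing new; only the ones in these two lists can rule a guess out.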


There are more than 2500 rules that would give you the mathematical sequence 2, 4, 8.  See Sloane's encyclopedia of integer sequences.  My first proposed sequence is A000079 in the collection.

For more about sequences and Sloane's encyclopedia, read this article at AT&T.

Eating periodically (not a quantum diet)

What elements are in chocolate?

Answer #1

Carbon (Chocolate)
Hydrogen (cHocolate)
Oxygen (chOcolate)
Holmium (cHOcolate)
Cobalt (choCOlate)
Lanthanum (chocoLAte)
Astatine (chocolATe)
Tellurium (chocolaTE)

So you could have: CHoCoLaTe or CHOCOLaTe or....

Answer #2

(Presuming the letters are not required to be used in order, and yes, I wrote a piece of code to give me this for any word.)
All of the above and
aluminum (Al), chlorine (Cl), calcium (Ca), cerium (Ce), helium (He), actinium (Ac), technetium (Tc), thorium (Th), thallium (Tl) and tantalum (Ta)
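The code mentioned above isn't shown, so here is a minimal sketch of how such a search might be written in Python. The function names are made up for the example, and the symbol list is the current 118-element table rather than whatever list the original code used.

```python
from collections import Counter

# Symbols of the 118 named elements, in periodic-table order.
SYMBOLS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og
""".split()

def in_order(word):
    """Symbols that appear as consecutive letters of the word (Answer #1)."""
    w = word.lower()
    return [s for s in SYMBOLS if s.lower() in w]

def any_order(word):
    """Symbols whose letters can all be drawn from the word's letters (Answer #2)."""
    letters = Counter(word.lower())
    # Counter subtraction keeps only letters the word cannot supply.
    return [s for s in SYMBOLS if not Counter(s.lower()) - letters]

print(in_order("chocolate"))
print(any_order("chocolate"))
```

For "chocolate" the first call finds the eight symbols of Answer #1, and the second finds those eight plus the ten listed in Answer #2.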

Answer #3

Elements that have been detected in chocolate (in this case dark chocolate, rough percent of my recommended dietary allowance in parentheses assuming I eat only a 100 gram bar).

Carbon, hydrogen, nitrogen, oxygen, potassium (almost a gram, 20%, and the reason cocoa is detectably radioactive), calcium (about 5% of my RDA), iron (125%), magnesium (70%), phosphorus, sodium, zinc (40%), nickel, sulfur, silicon, cadmium, lead (yep, lead, mostly from dust contamination during transport), mercury, arsenic, uranium (trace amounts, but yes, more radioactivity), aluminum, copper (from pesticides, but on the plus side gives you your RDA for this element), and manganese.

Nearly one fifth of the known elements have been detected in chocolate, which clearly should be the backbone of any periodic diet.

What other elements are you eating?


Just in case your chocolate doesn't have enough radioactivity for you:

Chemists are wildly polysemous

STO-3G//STO-3G calculated Raman spectrum of arsole
A few months ago this BBC news report about the evacuation of a building because of a volatile compound got chemists on Twitter talking about language, particularly those words that mean one thing to chemists and something quite different to the rest of the world. (Thanks @NatalieFey_NLS, @stephengdavey and @stuartcantrill!) Like volatile (high vapor pressure vs. explosive) or, to my mind the most overexposed chemical example and the inspiration for far too many t-shirts, mole. One thing led to another, or at least, one comment by @stuartcantrill led to my Thesis column in this month's Nature Chemistry.
Is RT retweet or 2.5 kJ/mol?

This piece was pure fun to write.  I enjoyed crowdsourcing examples of chemical double meanings. (List of 200 examples is here.) By far the favorite mechanism of formation for chemists is polysemy, where words share a common ancestor, but the meanings have drifted apart.  Take flush, as in flush a column, or flush a toilet or  flush game or even a straight flush.  All these senses derive from the Latin fluxus for flow.  (Don't see the connection to poker? The OED suggests you think of a flush as a "run" or flow of cards.)

Sometimes the two meanings sit close to the surface for chemists, other times we are pretty blind to the lexical ambiguity.  My youngest son is toying with the idea of a chemistry major, and when I read him examples from the list, he was quick to note both senses for many words: cell, salt, aromatic.  But when I got to molar, he wanted to know what else it meant beyond the concentration of a solution.  "Teeth?" I suggested.  He face palmed.  Whether he majors in chemistry or not, we've already messed with his mind.

Polysemy is productive, as the linguists would say, not just in terms of the language, but of new chemistry. We ought not to discourage lexical play in chemists (not that one has much control over language in any case, IUPAC's Gold Book notwithstanding). It gives us a rich set of images to draw on and, as I said in the essay, "we can't look for what our language doesn't let us imagine."


Read the essay here. ($)

What is the half-life of a tweet?

My tweets apparently have a half-life of about two hours, but I have no idea if that's unique to me.  My spouse is new to Twitter and as I was showing him how he could see some data about his tweets, I noticed that the graph of the data looked familiar.  Probably because I taught chemical kinetics twice last year (in pchem and general chemistry).

Over lunch today, while waiting for my car to be serviced, I decided to explore the kinetics of my tweets. I used data from the first 10 hours after I posted a tweet, and used tweets that had several hundred total impressions and few retweets. Using five data sets from the past month, I fit the data to linear models for 0th, 1st and 2nd order kinetics. R^2 values suggest that a 1st order model is most appropriate, with a rate constant of 0.35/hour, which translates to a half-life of 2.0 ± 0.4 hours. I'm curious if that's relatively constant for me, or whether it's characteristic of other parameters, but time is up.
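The fitting procedure described above can be sketched in Python. This is an illustration under stated assumptions, not the analysis actually run: the data below are synthetic (generated from a 0.35/hour decay), the quantity standing in for "concentration" is assumed to be the impressions a tweet has yet to receive, and the function names are invented for the example. Each integrated rate law is linear in time after a transform: A for 0th order, ln A for 1st order, 1/A for 2nd order.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns slope, intercept and R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)

def kinetic_fits(times, remaining):
    """Fit A, ln A and 1/A against time (0th, 1st and 2nd order)."""
    transforms = {0: lambda a: a, 1: math.log, 2: lambda a: 1.0 / a}
    return {order: linear_fit(times, [f(a) for a in remaining])
            for order, f in transforms.items()}

def first_order_half_life(k):
    """Half-life of a first-order process with rate constant k."""
    return math.log(2) / k

# Synthetic, hypothetical data: hourly values for the first 10 hours.
times = list(range(11))
remaining = [500 * math.exp(-0.35 * t) for t in times]

fits = kinetic_fits(times, remaining)
for order, (slope, intercept, r2) in fits.items():
    print(f"order {order}: slope = {slope:.4f}, R^2 = {r2:.4f}")

k = -fits[1][0]  # for 1st order the slope of ln A versus t is -k
print(f"k = {k:.2f}/hour, half-life = {first_order_half_life(k):.2f} hours")
```

With real data the three R^2 values would be closer together than they are for this idealized curve, which is why several data sets are worth comparing.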




Perhaps because I'm writing this outside in a park, I'm reminded of an infamous problem about the temperature dependence of the chirp rate of male snowy tree crickets in many general and physical chemistry texts. A discussion of the phenomenon (first recorded in the late 19th century, and not true of crickets everywhere) can be found in Thomas Walker and Nancy Collins. “New World Thermometer Crickets: The Oecanthus Rileyi Species Group and a New Species from North America.” Journal of Orthoptera Research 19 (2010): 371–376.

Molecular Jek-yls and -hydes

Like Jekyll and Hyde, changing a functional group changes 
a molecule's behavior. Image from Library of Congress.
Chains of pure carbon and hydrogen, called hydrocarbons by chemists, are notoriously hard to get a chemical handle on.  One of the major driving forces in chemical reactions is "opposites attract" — in this case opposite charges.  Since carbon and hydrogen have essentially the same desire for electrons (negative charges), there is not much difference in charge around to drive a reaction. Swap out a hydrogen for something else that does have a relative charge —  chlorine, fluorine, oxygen, nitrogen — and suddenly you have something to react with.  Chemists call these riffs on a basic carbon framework "functional groups" - they are often the parts of a molecule's structure that drive its function.

Change up the functional group, and you change the molecule's behavior. Like Jekyll and Hyde. Ethanol is something to drink on a Friday night; ethanal is found in the coffee you drink for the hangover the next morning (in an ironic twist, it's also produced as your body metabolizes the ethanol).

The first part of a chemical name tells the size of the carbon framework, the ending tells you about its function — or lack thereof.  Names that end in -yl or -ane mean a hydrocarbon chain without any fancy functionality.  Propane, a popular fuel, is a three carbon hydrocarbon chain.  Methyl mercaptan (added to odorless natural gas to make it smell, and make leaks quickly noticeable), has a one carbon long "chain" in it. Change -yl to -ol and you have made an alcohol, a chain with an -OH group on it (Ethanol is CH3CH2OH, sometimes written EtOH, a 2 carbon chain with an OH group on it.)

Knowing the functional groups means knowing something about the kinds of things a molecule can do.  Esters smell floral, carboxylic acids can remove a layer of skin, and are found in many lotions.

So to decode:
-ol means an alcohol (functional group = -OH) but not necessarily the kind of alcohol you drink
-al means an aldehyde (-CHO); these often smell sweetish
-oxy means an ether (an oxygen sandwiched between two carbon chains)
-oic acid or -ic acid means a carboxylic acid (pronounced "car-box-sill-ick"); salicylic acid is often found in face washes
-oate means an ester (a COO group sandwiched between two chains); ethyl nonanoate smells like grape, and its functional group sits between a 2 carbon chain (ethyl) and a nine carbon chain (nona)
-one means a ketone, a CO group sandwiched in between two chains
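The decoding rules above can be turned into a toy lookup in Python. This is a sketch, not a real nomenclature parser: the function name and table are invented for the example, it looks only at how a name ends, and it leaves out -oxy because that piece usually sits in the middle of a name. Plenty of real names will fool it (ozone is not a ketone).

```python
# Endings are checked in this order so that longer endings win.
ENDINGS = [
    ("oic acid", "carboxylic acid"),
    ("ic acid", "carboxylic acid"),
    ("oate", "ester"),
    ("one", "ketone"),
    ("ol", "alcohol"),
    ("al", "aldehyde"),
    ("ane", "plain hydrocarbon"),
    ("yl", "plain hydrocarbon"),
]

def functional_group(name):
    """Guess the functional group from the ending of a chemical name."""
    cleaned = name.strip().lower()
    for ending, group in ENDINGS:
        if cleaned.endswith(ending):
            return group
    return None  # no ending from the list matched

for name in ["ethanol", "ethanal", "propane", "ethyl nonanoate",
             "salicylic acid", "propanone", "methyl mercaptan"]:
    print(name, "->", functional_group(name))
```

Ethanol and ethanal differ by one letter and land in different groups, which is the Jekyll and Hyde point in miniature.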

Check out Andy Brunning of Compound Interest's great graphic on functional groups and their names and Practically Science's map of molecules in food and their smells.

Getting at the truth: gender in the lab

Nobel prize winning biochemist Tim Hunt made an unfortunate series of remarks at a luncheon for women science writers and journalists at the World Conference of Science Journalists in Seoul, South Korea: “Let me tell you about my trouble with girls … three things happen when they are in the lab … You fall in love with them, they fall in love with you and when you criticise them, they cry.”

Today he's said he's sorry for having made those remarks to that particular audience, suggesting first that it was a misunderstood attempt at irony, but he stands by his comments: "I just meant to be honest, actually."

He went on to say that, "It's terribly important that you can criticise people's ideas without criticising them and if they burst into tears, it means that you tend to hold back from getting at the absolute truth....Science is about nothing but getting at the truth and anything that gets in the way of that diminishes, in my experience, the science."

What I'm thinking about is how the documented tendency of men (or should I say boys?) to be overconfident in their self-assessment of ability in science and math might diminish the effective functioning of a research group. Shelley Correll's work showing that "males assess their mathematical competence higher than females who perform at the same ability level and who receive the same feedback about their mathematical competence" makes me wonder if, when Tim Hunt criticizes a boy's ideas, the boy discounts the criticism because he is overconfident. [Amer. J. Soc. 106 (2001): 1691–1730.] #justbeinghonest

Hunt's remarks should come as no surprise, given what he said in this interview:
Labtimes: In your opinion, why are women still under-represented in senior positions in academia and funding bodies? 
Hunt: I'm not sure there is really a problem, actually. People just look at the statistics. I dare, myself, think there is any discrimination, either for or against men or women. I think people are really good at selecting good scientists but I must admit the inequalities in the outcomes, especially at the higher end, are quite staggering. And I have no idea what the reasons are. One should start asking why women being under-represented in senior positions is such a big problem. Is this actually a bad thing? It is not immediately obvious for me... is this bad for women? Or bad for science? Or bad for society? I don't know, it clearly upsets people a lot.
If he wants a hint, it's bad for science.  Restricting the pool means you get fewer breakthroughs. Last fall I built a simple Monte Carlo simulation of "science" to find:

"I wonder if framing the issue of women in science as one of equity to individuals — it's not fair to deny women the opportunity to play the game — blinds us to the costs to science as a whole of unwittingly perhaps, but systematically regardless, hampering the participation of women in science. We see science as a meritocracy, where the best people and the best ideas bubble up and we fear efforts to play fair could undermine the overall quality of science. But are 'fair' and 'best' necessarily at odds with each other in the arena of scientific discovery? Stated another way, at any given time do discoveries go unmade because the person who might make them is not in the scientific workforce?

In an attempt to roughly quantify the answer to this question, I built a simplistic computational model of scientific discovery. The model used a Monte Carlo approach to create a scientific community from a larger population of one million. Inherent scientific ability was assumed to correspond to a single integer variable, with values ranging from a low of zero to a maximum of 200 and to follow a normal distribution (σ = 30); potential scientists were assumed to have a score above 140 on this measure. The parameters were set such that one discovery was expected per thousand potential scientists. Discoveries were not uniformly distributed throughout, but weighted such that higher ability scores were more likely to have the potential to make a breakthrough.

A model scientific community was selected from the full population using a weighted random selection procedure, which again favoured the 'best' end of the pool, and the number of 'discoveries' made by this select group were added up. The simulation was run for a total of one thousand trials. Models that limited the selection of women to 10% of the pool incurred a 10 to 15% average penalty on the number of discoveries made, compared with pools with roughly equal numbers of men and women.

Having 10% of potential scientific breakthroughs go undiscovered may sound insignificant, not worth the bother of figuring out how to bring more women into a field. That is, until you are asked to take a 10% pay cut, or if I ask which of the top-ten organic reactions you would prefer to do without. Heck? Diels–Alder? Within the limits of my model, choosing fairly with respect to gender does not compromise the quality of the scientific community, in fact, the opposite is true." [Nature Chemistry 6 (2014): 842–844.]
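A model along the lines described in the excerpt above might be sketched in Python as follows. This is not the code behind the Nature Chemistry essay, and nearly every specific is an assumption made for illustration: a mean ability of 100, a 50/50 split of men and women, weights of (ability - 140) squared for both discovery and selection, a fixed number of discoveries (one per thousand potential scientists) rather than an expected number, a community of half the pool, and a smaller population and fewer trials so it runs quickly. The weighted draw without replacement uses random keys of the form u^(1/w), the Efraimidis-Spirakis method.

```python
import random

def make_pool(population, rng, mean=100.0, sigma=30.0, cutoff=140):
    """Return (ability, is_woman) for everyone scoring above the cutoff."""
    pool = []
    for _ in range(population):
        ability = min(200, max(0, round(rng.gauss(mean, sigma))))
        if ability > cutoff:
            pool.append((ability, rng.random() < 0.5))  # assumed 50/50 split
    return pool

def weighted_order(weights, rng):
    """Indices in weighted-random order, without replacement."""
    keys = [rng.random() ** (1.0 / w) for w in weights]
    return sorted(range(len(weights)), key=keys.__getitem__, reverse=True)

def run_trial(pool, community_size, max_women_fraction, rng, cutoff=140):
    """One trial: returns (discoveries found, women selected, community size)."""
    weights = [(ability - cutoff) ** 2 for ability, _ in pool]  # assumed weighting
    n_discoveries = max(1, len(pool) // 1000)
    discoverers = set(weighted_order(weights, rng)[:n_discoveries])
    women_cap = int(max_women_fraction * community_size)
    community, women = [], 0
    for i in weighted_order(weights, rng):
        if len(community) == community_size:
            break
        if pool[i][1]:
            if women >= women_cap:
                continue  # cap reached: this woman is passed over
            women += 1
        community.append(i)
    found = sum(1 for i in community if i in discoverers)
    return found, women, len(community)

def average_discoveries(pool, community_size, max_women_fraction, trials, rng):
    """Mean number of discoveries found over repeated trials."""
    total = 0
    for _ in range(trials):
        found, _, _ = run_trial(pool, community_size, max_women_fraction, rng)
        total += found
    return total / trials

rng = random.Random(2014)       # fixed seed so a run is repeatable
pool = make_pool(50_000, rng)   # far smaller than one million, for speed
size = len(pool) // 2           # assumed community size
open_mean = average_discoveries(pool, size, 1.0, 50, rng)
capped_mean = average_discoveries(pool, size, 0.10, 50, rng)
print(f"open selection: {open_mean:.2f}, women capped at 10%: {capped_mean:.2f}")
```

How large a penalty the cap produces depends heavily on the assumed weighting and community size, so the numbers from this sketch should not be read as a reproduction of the 10 to 15% reported in the essay.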



Correll, Shelley J. “Gender and the Career Choice Process: The Role of Biased Self‐Assessments.” American Journal of Sociology 106 (2001): 1691–1730.  See also the discussion in Cordelia Fine's Delusions of Gender pp 48-50.

Francl, Michelle. “Seeding Crystallography.” Nature Chemistry 6 (2014): 842–844.  ($)

The Secret Language of Chemists: Why does butter make us think of four?

Butter and why it means "four" to chemists.
c. Michelle Shrank CC license
Every time I take a stick of butter out of the 'fridge I think of the number four. No, it's not some odd form of synesthesia, but a side effect of being a chemist.

Names of molecules and their structures are (sometimes) related to each other. You can think of organic molecules (molecules that are principally built from carbon, hydrogen, oxygen and nitrogen) as constructed like Lego buildings. There are blocks, each block has a name and you click them into place (that last isn't so simple in practice) to build a molecule. So knowing the secret language of chemistry gives you a window into the structure, which in turn is a clue to how the molecule works and what it might be good for.

So why does butter make a chemist think of four? The stem but (pronounced like "butte," the land formation) is used to indicate a four carbon building block. It is a back-formation from butyric acid, responsible for the smell of rancid butter, which has four carbons in it. (Butane, the flammable gas stored as a liquid in lighters, is a four carbon chain.)

The rest of the secret code:

meth- 1 carbon
another back-formation, this time from methanol (wood alcohol) from the Greek root for wine (μέθυ ≡ methy)

eth- 2 carbons
from the Greek, ether, the uppermost reaches of the atmosphere; as seen in ethylene (the sweet smelling flammable gas produced by ripening fruit, particularly bananas.  It's technically a hormone!)

prop- 3 carbons
This one also comes from the Greek (surprise!) for proto and fat, as propionic acid was the first "fatty acid" (acid molecules that also behave like fats or oils); propane gas used in stoves and grills has three carbon atoms and 8 hydrogen atoms per molecule.

but- 4 carbons
From the rancid butter!

After four, the prefixes are derived directly from the numbers in the chain:
pent- 5
hex- 6
hept- 7
oct- 8
non- 9
dec- 10
undec- 11
dodec- 12

So when you see references to the food additive BHA, which stands for butylated hydroxyanisole, one thing you can say about it is that it has a four-carbon unit in it somewhere. Though, I admit, that's not much help in answering the important questions: What does it do, and how will it affect me?
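The prefix code above is small enough to write down as a lookup table. The Python below is a toy, with names invented for the example: it only checks how a name begins, so it will happily report that "butter" has four carbons.

```python
# Carbon-count stems from the list above.
STEMS = {
    "meth": 1, "eth": 2, "prop": 3, "but": 4, "pent": 5, "hex": 6,
    "hept": 7, "oct": 8, "non": 9, "dec": 10, "undec": 11, "dodec": 12,
}

def carbon_count(name):
    """Return the carbon count suggested by the start of a name, or None."""
    cleaned = name.strip().lower()
    # Try longer stems first so the most specific match wins.
    for stem in sorted(STEMS, key=len, reverse=True):
        if cleaned.startswith(stem):
            return STEMS[stem]
    return None

for name in ["methanol", "ethylene", "propane", "butane",
             "undecane", "butylated hydroxyanisole"]:
    print(name, "->", carbon_count(name))
```

For butylated hydroxyanisole it reports 4, which is exactly as much as the name's first syllable can tell you.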



Chemists' Magic Decoder Ring

What if we gave out chemical name
decoders instead of periodic tables?
Vintage magic decoder ring.
Used under CC license. Source.

Earlier this week the Royal Society of Chemistry released a report on the public perceptions of chemistry.  It's a great set of data for those of us who write and talk about chemistry outside of the classroom environment. This infographic sums up the key findings, one of which is that people lack confidence in talking about chemistry.

Stuart Cantrill, chief editor of the journal Nature Chemistry (full disclosure, I contribute regularly to the editorial content of the journal), noted in the discussion which followed the presentation that chemistry uses a very "specific technical language...if you're not talking the same language as someone you are talking to, they can't engage with you...it's almost like a secret language that only chemists know." (Listen here starting at 25:45)

It made me wonder if we should hand out a cheat sheet on how to decode chemical names and functionality instead of the traditional and iconic periodic tables at events. It might make for less splashy t-shirts or shower curtains, but then again, Andy Brunning of Compound Interest makes amazing graphics on all sorts of chemical themes.

Next post:  the secret language of chemists and why butter makes me think of four!

Say that again? Why chemical names tangle on the tongue

Michael Pollan's Food Rules famously advises not eating anything with an ingredient a 3rd grader can't pronounce.  The rule is more about eating closer to the production point, about consuming things that are familiar to 3rd graders (like broccoli and eggs), than it is that chemicals that are hard to pronounce are inherently hazardous, though in some corners it's taken on just that sort of magical thinking.

Why are chemical names so weird looking? Take 2-Methyl-5-(6-methylhept-5-en-2-yl)cyclohexa-1,3-diene for example. It certainly doesn't sound like anything you would want to eat, but it is just the formal name for the compound that is the main component of ginger oil, and responsible for much of ginger's characteristic bite. Like crystallized ginger, ginger tea, or a good stir fry? You've eaten this compound in significant quantities.

Chemical names can look like alphabet soup, but they are a way for chemists to paint a compact picture of the structure, or at least to point out key structural features.  Why is it so important to know what a molecule looks like?  The structure of a chemical is what determines its behavior, how it will react, in the body and in the environment.  It's key to understanding how things work on the molecular level:  structure determines function.  Period.

Formal chemical names, called IUPAC names (for the International Union of Pure and Applied Chemistry, the body that decides on everything from what new elements will be called to the standards for drawing molecules), are in fact a code from which the full structure of the molecule can be unraveled. Most of the time chemists call chemicals by a common name, which also gives clues to the structure, though not so many that the molecule could be unambiguously drawn.

So back to 2-Methyl-5-(6-methylhept-5-en-2-yl)cyclohexa-1,3-diene, which looks like


The "methyl"s (METH-ill) in the name refer to a CH3 group. What, you don't see any CH3's here?  This is a chemical line structure, where each intersection point (or end of a line) is a carbon atom, and the hydrogen atoms have almost all been left off.  A chemist sees this structure as 

with the methyls at either end.  The little red dots count off a seven membered chain, the "hept" in the name. The "cyclohexa" (sigh-clo-HEX-uh) points to a six membered ring, while "diene" (DIE-een) means it has two double bonds in it. The numbers tell you where to attach methyls and draw the double bonds.  The little "2-yl" (too-ill) means the seven membered chain is linked to the six membered ring at the second carbon in line.

So these tangled names to a chemist are codes, and once you can read the code, even a bit, you can begin to see a molecule taking shape in your mind when you read its name.

This is pronounced 2-METH-ill / 5, 6-METH-ill-hept 5 een 2 ill cyclo HEX uh 1 3 DIE-een.

There's probably a reason this is better known as zingiberene, which suggests its common origin (ginger or zingiber), but not much about its structure.

Formaldehyde: not just for dead things

Next spring I'm teaching a course on the physical chemistry of food while a colleague is teaching a course on the analytical chemistry of foodstuffs.  Among other science texts we'll be using John Coupland's Introduction to the Physical Chemistry of Food, but I'm also collecting short pieces to put some of the work into a historical and social context.

These aren't actual biological specimens preserved
in formaldehyde, but Halloween decorations.  
Though these days we tend to think of chemists as the untrustworthy creators of toxic, artificial everything, the systematic training of chemists was driven in part by the desire for the public to know what was in their food and water. In 19th century Britain, hundreds of chemists made their living testing the purity of everything from butter to well water. So when the Food Babe tells you there is something "yucky" in your food, the reason we know it is there is that some chemist developed a careful protocol for its analysis, and other chemists tested the material.
Molecular structure
of formaldehyde


I've been thinking about formaldehyde, one of the simplest organic molecules (to a chemist, organic means made up mostly of carbon and hydrogen atoms, and has nothing to do with whether the molecule is synthetic or natural or...). Last year, formaldehyde, which is a preservative, was in the news because Johnson & Johnson had agreed to remove it from baby shampoo, though as Matt Hartings and Tara Haelle clearly pointed out in a piece at Slate, it was in such low concentrations that it posed no risk to babies (who, they point out, themselves contain substantial amounts of formaldehyde.)

Pepsi is reformulating Diet Pepsi to take out the artificial sweetener aspartame. The Food Babe is crowing that she and her army have forced Kraft to remove the so-called coal tar dyes (e.g. tartrazine/FD&C Yellow 5), to be replaced by natural colorings from spices. What does all this have to do with formaldehyde?

From the Food Babe's 'campaign' literature.

To start with those natural colorings: at least one of them, the beta-carotene used in the UK version of mac and cheese, isn't extracted from natural sources but synthesized from petroleum feedstocks (just like those coal-tar dyes). One of the starting materials: formaldehyde. The other natural colorings on the table (annatto, turmeric and paprika) are not quite what you might think either. While you might imagine shaking in some spices from a quaint bottle, the spices themselves are not used as colorants; the colorants are extracted using organic (not that kind of organic, the chemist's kind of organic) solvents, such as ethyl acetate. It's unclear to me why these colorants, particularly beta-carotene, pass muster with the Food Babe.


Aspartame is sometimes vilified because it is metabolized into methanol and formaldehyde in the body. Which it is. You already contain a lot of formaldehyde, about 12 milligrams per liter of fluid in your cells. One source is metabolism of the amino acids, particularly serine and glycine (in naturally occurring proteins), from which your body scavenges methyl groups (CH3) to pop on to various structures. Aspartame is a very tiny protein, so the same pathways that produce methanol and formaldehyde from natural sources dismantle aspartame to yield methanol and formaldehyde, though the amounts produced are tens of times lower than what comes from eating apples and fish.

Because formaldehyde occurs naturally in foods (about 5 mg per serving in some fruits; fish is also high, and pectin-containing fruits such as apples add significantly to the amount of formaldehyde ingested), our bodies have a mechanism for dealing with it. We process about 60 to 100 grams of formaldehyde a day and do so quickly: formaldehyde has a half-life of about 1 to 2 minutes in the body.



Why are those spices colored?  What does it have to do with quantum mechanics, flamingos and canaries?  Read this post, the very first one written for the blog,  to find out.

References
EFSA report on endogenous versus exogenous sources of formaldehyde.
EFSA review of curcumin, a component of turmeric, which had been suspected of being genotoxic.