
Rationally Speaking is a presentation of New York City Skeptics, dedicated to promoting critical thinking, skeptical inquiry, and science education. For more information, please visit us at nycskeptics.org. Welcome to Rationally Speaking, the podcast where we explore the borderlands between reason and nonsense. I'm your host, Julia Galef, and with me today is our guest, Professor James Evans. James is a professor of sociology at the University of Chicago. He's also the director of the Knowledge Lab at the Computation Institute, where he's a senior fellow.


They use big data and high-performance computing to solve a wide range of scientific and social-scientific problems. So that's the area we're going to be focusing on today. One of the main things that James studies is what's called metaknowledge, that is, knowledge about knowledge. So he's asking questions like: Has the pace of scientific discovery changed, and if so, why? How do scientists decide what to research? How well do different scientific fields communicate with each other?


And he's approaching these questions in a very empirical way, by cashing out these concepts and questions in concrete terms: things that we can measure and get data on. So, James, welcome to Rationally Speaking. Thank you, Julia. First off, I'm curious about how unusual your approach is within sociology: mining the data that gets produced in the process of doing science. Because my impression was that sociologists have asked questions about how science works, how the scientific process works, for decades.


But I've mostly heard them asking these questions in qualitative ways, for example, looking at case studies. So how new or strange is your approach?


Well, I would say there is a field called Science Studies, composed not just of sociologists but also anthropologists and historians and philosophers of science, that has been interested in these questions for quite some time. It has tended toward mostly qualitative inquiry, and that qualitative turn really took place in the 1970s and early '80s, in response to a kind of more quantitative sociology of science and of scientists, which really wasn't focused on the ideas of science but on the institutions of science.


And this new science studies was focused on the ideas and the epistemology, and it tended to be qualitative in response to that. That being said, I would say that just in the last few years there's been a much wider reception of and interest in this work from within that community, partly because, in the world of science, one of the things that's changing is the emergence of ubiquitous digital data, big data, and massive computation. So in some sense, more and more people have come to study privacy and big data and all these things in ways that really can only be, or often can only be, approached with computation and big data themselves.


And so I would say there's been a growing receptiveness within that community to exploring different kinds of representations, and to using a really wide range of data to ask deep questions about where knowledge comes from, how questions and answers emerge, and how certainty comes about.


So let's talk about what kind of data you're using. I briefly defined metaknowledge in my intro as knowledge about knowledge. So maybe you could elaborate on whether that's a good way to describe what you mean by metaknowledge, and on what kinds of data you're collecting that help us learn about knowledge. Yeah, it is knowledge about knowledge: How can we understand, from the way in which knowledge has been approached socially and technologically, how to think about the future, and how to reconfigure the future in ways that solve our problems and answer our questions?


And so we're using really all of the data that's available to us. Some of that data is archival. In some mature fields, like physics and chemistry, that involves hundreds of years of publication data; it involves hundreds of years of patents. It involves prepublication data, like conference proceedings and transactions. The first scientific journal, quote unquote, wasn't really a journal at all, but a compilation of meeting transactions: the Philosophical Transactions of the Royal Society, which began in 1665, was really a kind of conference proceedings plus transactions, or letters.


So we use all those things, but more now as well. With the emergence of really powerful machine-learning technologies that allow things like your iPhone to recognise your speech, we're also able to recognise and extract things like images and equations and other concepts from papers, more and more systematically. And so all of that becomes available data, and much of it is shared informally on blog sites, on the Web, and in other ways that we also extract.


So we take formal text from there, too. Also, there's this enormous explosion of prepublication archives, where people share early manuscripts and actually update them over time; things like the arXiv for the physical sciences, where much of mathematics and physics kind of takes place and is updated over time. So all these data. But we're also interested in the opinions and intuitions of individuals. And so we've been working to create intelligent, adaptive surveys and ranking mechanisms, so people can share their maybe not completely articulated intuitions about which things they're more certain of, or more interested in, or attend more to.


So we're interested in putting all these data together, through a kind of massive integration, into models that allow us to understand how it is that science as a system thinks.


James, one of the questions I know you've investigated, and that I've wondered about myself over the years, is whether the pace of scientific progress has changed over time, and if so, how and why. I've had sort of off-the-cuff debates with people about this, and what I think is so interesting is that there are these different, contradictory a priori models one might have about this question. On the one hand, someone might say we should expect the pace of scientific progress to slow down over time. This is basically a low-hanging-fruit model, where we solve the easy questions early on.


And as we use up the easy questions, we're left with harder and harder questions that take longer and longer to solve, so the pace would slow down. But on the other hand, there's this other a priori model someone could have that says, well, we should expect the pace to speed up, because each new scientific discovery is the product of the pre-existing knowledge that we have. And so the more knowledge we accumulate, the more we should be able to discover. So maybe it's not fully exponential, but there's some kind of exponential or synergistic component there that determines the pace of progress.


And so far, for me, these debates have all been kind of a priori, maybe with a little bit of anecdotal squinting at examples. But I'd be curious to hear what you've discovered about this from the data, and how you've defined scientific progress, because that's not really a straightforward thing to define. So, yeah.


So let me maybe rephrase that question back to you in terms of the way in which people have talked about it. In my field, "the burden of knowledge" is the way most people talk about that first question, the low-hanging-fruit theory: that we've picked the low-hanging fruit, and we have to climb higher and higher into the boughs of knowledge before we can gather more.


And it turns out that there have been some very interesting papers in the last few years by Benjamin Jones at Northwestern University, Bruce Weinberg at Ohio State University, and colleagues, who have explored this possibility. And they do show, over the course of the last 100 years, a progression in the age of scientists: the age at which they write their first big paper; for medical scientists, the age at which they get their first big grant; for Nobel Prize winners, the age at which they do the work that gets recognized by the prize.


It turns out that they don't find that for invention, however. And the other effect that you're talking about, I would say, people often discuss as what Stuart Kauffman at the Santa Fe Institute labeled "the adjacent possible."


So the idea is that every time you discover new bits of knowledge, they become components that can be combined with the things that were available before. And so what happens in the next period is only one step from what takes place in the current period, but everything available to you at that period can be combined in new ways. And I would say there are more questions than answers in this domain for me; maybe that's depressing.
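[Editor's note: a toy simulation can make the combinatorial intuition behind the adjacent possible concrete. This is an illustrative sketch of the general idea, not a model from Kauffman or Evans: if every pair of existing components can combine into something new, the space of candidate discoveries grows far faster than the stock of knowledge itself.]

```python
def adjacent_possible_growth(initial=4, periods=5):
    """Track how many one-step combinations are reachable each period.

    Toy upper bound: assume every pairwise combination of known
    components is realized and becomes a new component next period.
    """
    history = []
    known = initial
    for _ in range(periods):
        candidates = known * (known - 1) // 2  # pairs one step away
        history.append((known, candidates))
        known += candidates
    return history

for known, possible in adjacent_possible_growth():
    print(f"known components: {known:>15}  adjacent possible: {possible:>20}")
```

Even this crude upper bound shows why an adjacent-possible model predicts accelerating discovery, the counterpoint to the low-hanging-fruit model discussed above.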


Hey, a lot of my episodes are about philosophy, so that's nothing new for me. Yeah. And the more we learn, it's kind of like a growing sphere: the surface area on the outside, the questions that we haven't answered, is continuously growing with the size of what we know. So one thing it appears that we do know is that, yes, it's certainly taking people longer to produce professional science in some ways.


Now, what we don't know is whether that's because we've exhausted all of the low-hanging fruit, or whether we've exhausted all the low-hanging fruit on the trees that we've chosen to pick from. Which is to say: disciplines or fields are like nexuses of questions and methods, or approaches to solving those questions. And by designing a field, like physics or sociology or anthropology or ecology, we're basically taking a class of questions and mixing it with a class of approaches.


And that's a discipline. And so whether that burden of knowledge is the result of an exhaustion of all simple things, or just an exhaustion of those simple things we've chosen to focus on recently, is a big question. Because it seems that sometimes you see interdisciplinary work yield outsized benefits, but it tends to yield those outsized benefits when there's been a focus on the disciplines prior. So as more focus goes into these nexuses of questions and approaches, it provides more opportunities for outsized benefit from linkages, and kind of arbitrage, from questions and answers that take place outside.


So one other aspect of that that we've actually studied recently is how, as fields mature, the questions and answers of science change. So, for example, early in the life cycle of a field, it turns out to be really efficient for people to exploit the structure of the knowledge they already have. Things, molecules, diseases that look central in this kind of space of scientific questions and answers end up getting disproportionately focused on, and that ends up being a really efficient strategy.


This is a paper that I published in the Proceedings of the National Academy of Sciences last November. We find that as fields become more mature, the most efficient question for the reconfiguration of scientific knowledge ends up being a very risky question.


And scientists have only so large a budget of experiments or investigations that they can engage in. And so what we find is that the most efficient way of discovering the chemical relationships that have in fact been discovered over the last 50 years is very different from the pathway by which science as a system has actually investigated them.


Can you explain a little more what you mean by efficient?


If you were to create a strategy for asking questions, and you were to unleash that strategy on a system of relationships, it would use its strategic logic to ask the next question. We basically find that the strategy the scientific community has pursued, both in patents and publications over the last 50 years, ends up being conservative, and increasingly so. Which is to say, they pick really central things, like really central biochemicals and molecules, and they investigate things that are already very close to them in that scientific space.


OK, so I'm trying to make the analogy to fields I'm a little more familiar with, which tend to be in the social sciences instead of the hard sciences. So would it be something like, in, say, cognitive science or behavioral economics, taking a phenomenon that's already well established, like... well, I was going to say priming. That's a bad example, maybe, because it's been undermined recently. But let's pretend that priming hasn't been undermined.


Or prospect theory, or something like that.


The conservative, non-efficient thing would be to just do more experiments looking for more examples of priming, in slightly different contexts than had already been studied, whereas the more efficient but riskier thing would be to try to discover a whole new phenomenon that's not priming. Right.


And to connect things that are less dominant than priming with other less dominant things, in distant parts of the scientific system.


So it turns out that if you just want to discover what was in fact discovered, you can discover it much more efficiently through a strategy that goes precisely in the opposite direction from the one science has followed. Which is to say, you start off by asking conservative questions that exploit the knowledge you have, but then you recognize the diminishing marginal returns to studying priming in the next context, for example. Right. And so then you start looking at other, maybe less pronounced, heuristics or biases.


If we were taking the psychological example, you'd look at those in other contexts. And because this network of scientific claims ends up being fractal, the most efficient strategy is basically one that begins conservative and becomes really quite risky later on. Versus the strategy we actually see, which is the most efficient for discovering about 15 percent of what we know, and unbelievably inefficient at discovering 100 percent of it, because scientists need to stay in the game.


You know, they can't amortize really risky experiments, because each scientist is like a little entrepreneur, and if they have enough failures, then they go out of business. And so basically, the strategy of science seems to optimize getting tenure and job security while also exploring the scientific space. But it certainly isn't optimal, at the level of the system, for exploring the space of scientific possibilities.


So I understand the logic of incentives there. But why has that increased over the years? Has something changed about the way we award tenure to scientists, or about the job market, or something like that? Or is it something about the content itself, that it's gotten easier to investigate conservative questions than efficient ones?


Yeah, I think that's a great question. It certainly appears as though there's an increasing preference for focusing on research within fields and not across fields. And that could be the result of the enlargement of science, which naturally creates competition in a new way. It could also, many have argued (and I would say the evidence really isn't completely in), have something to do with the decrease in U.S. federal largesse for science.


And so basically there are more bodies asking for relatively fewer dollars, and that puts pressure on the system of advancement. So I think it's definitely some mixture of those two forces. I think the jury's out on exactly what that mixture is. But we're very interested in exploring it, and have been, by looking at things like competition and funding more directly. But the jury's still out.


Yeah, I also wonder whether some element of the increased conservatism might be a thing that we should have been doing all along but weren't. You could say there are diminishing marginal returns to discovering priming in one additional new context, and in a sense that's true. But given the many problems with science, and especially social science, that we've been uncovering lately, or that have been getting more public attention lately, maybe all of those minute, more conservative experiments are playing the role of replications: the more we look at a given phenomenon in different contexts, the more we're going to notice if that phenomenon was on shaky ground to begin with.


And so maybe the stuff that looked like efficient risk-seeking, or at least risk-neutral, exploration early on was just sort of sloppy, because we weren't actually trying to be meticulous and make sure that what we were discovering was, in fact, real.


I think that's a very generous interpretation of the system, and I think it's actually quite doubtful. And the reason I say that is that we've been engaged in a large number of quasi-replication studies, both in the social sciences and in the natural sciences, exploring things like drug-gene interactions and gene-gene interactions. We've been scraping many millions of these from text and then looking at how they stack up against intensively replicated, high-throughput experiments. And there are very few cases in which people are precisely replicating work that was done in the past.


So they're engaging in work that is really exploring this kind of adjacent possible, but in a very local way.


There are very few of these kinds of streams of claims that end up being really independently replicated multiple times by different kinds of research groups, because the scientific returns to those replications are very low. It's very difficult to publish a replication of a finding that's not controversial, or that's not seen as really substantively critical for understanding a particular system. So I agree with this move that we see in psychology and the social sciences; I think it really began in the biomedical sciences and medical genetics.


I think it's a good thing. I think that really trying to decrease our false positives is good for certain kinds of problems, but not for all kinds of problems.


I mean, there are a lot of problems where we might want to do the opposite, which is to say, we want to decrease false negatives. And this is not only true in tracking the sources of terrorism; there are other classes of problems where we don't just worry about wrong positives, we also don't want to miss hidden positives. And so I think both of those forces matter. I would say this replication movement is an exciting one, and I think big data, exploiting high-throughput experiments, and this enormous archive of history can allow us to get a handle on that.


But I think we also need to be able to model where scientific attention has gone, to figure out what classes of questions systematically haven't been asked, that could have been asked, and that could yield valuable and interesting and promising results. Cool.


Well, shifting tracks a little bit. Another question that you've investigated that I think is really interesting is the cultural landscape of different scientific fields. We're used to thinking of culture as being literal culture: a shared language, shared customs, and thereby ease of interaction, et cetera. But as you've noted, there are also cultures within science. There's jargon that some fields use that is incomprehensible to other fields. There are ways of analyzing data that are common in some fields and not in others, et cetera.


So how did you approach the question of where the cultural boundaries lie across the different scientific fields?


I think there are a number of ways in which we've approached it.


I'm guessing the article you're referring to, in which we talk about finding these "cultural holes" between scientific fields, is one where we trace out the jargon, the frequency of phrases in different fields of science, and then use those to trace, for fields that cite one another and may be relatively close in their apparent attention to the other field:


What they would have to know, what they would have to learn from the kind of codebook of science, to be able to read the work in their neighboring disciplines. And we find that some disciplines, like molecular biology, have an enormously large shared vocabulary.


So it's shared among those scientists, but it's very different from the rest of science, so they can speak with one another relatively easily. Whereas other areas, like the ecological sciences, have very little shared vocabulary, and so there's enormous balkanization between, say, people who study bears and people who study rats. And the social sciences are somewhere in between: we share a fair bit of vocabulary, but there are boundaries between certain domains of social-scientific inquiry, especially between those that are intensively invested in statistical and causal inference and those that take a more discursive, interpretive, qualitative tack.


And when you mined the language used by these different fields, were you using something kind of like Amazon's "statistically improbable phrases," where you looked at phrases that were disproportionately common in one field compared to others? Or how did you do it? Right.


So you can imagine, across the space of all possible phrases, what we'll call a probability distribution. And there are ways of assessing the distance between two probability distributions. One culture will very frequently talk about some things, very infrequently talk about others, and never talk about still others. And so there are metrics, or divergences, like the Kullback-Leibler divergence and the Jensen-Shannon distance, that calculate the distance between these probability distributions, these probabilities of attention to different phrases.
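[Editor's note: the divergence measure Evans names can be sketched in a few lines. The phrase counts below are hypothetical stand-ins for two fields' literatures, not data from the paper.]

```python
import math
from collections import Counter

def js_distance(counts_a, counts_b):
    """Jensen-Shannon distance between two phrase-frequency distributions.

    Phrases absent from one field contribute through the mixture
    M = (P + Q) / 2, so the divergence stays finite even when the
    vocabularies barely overlap (unlike raw Kullback-Leibler divergence).
    """
    vocab = set(counts_a) | set(counts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    jsd = 0.0
    for w in vocab:
        p = counts_a.get(w, 0) / total_a
        q = counts_b.get(w, 0) / total_b
        m = (p + q) / 2
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return math.sqrt(jsd)  # square root of the JS divergence is a metric

# Hypothetical phrase counts for two fields:
ecology = Counter({"habitat fragmentation": 30, "trophic cascade": 20, "population": 50})
molbio  = Counter({"gene expression": 40, "transcription factor": 35, "population": 25})
print(js_distance(ecology, molbio))  # closer to 1 the less the vocabularies overlap
```

With base-2 logarithms the distance runs from 0 (identical phrase distributions) to 1 (completely disjoint vocabularies), which makes it a natural measure of the "cultural hole" between two fields.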


So what consequences follow from there being these varying degrees of shared versus non-shared culture? Why does it matter? What does it mean?


Well, I'm interested in this study both instrumentally and humanistically. For me, it doesn't have to matter. But I think it does matter.


I think we have a situation where there are ideas and phenomena that end up being highly relevant across a linguistic boundary, and the boundary makes it very difficult for those ideas to actually pass across. And this is just one of the dimensions; it's not the only dimension that shapes that kind of flow of ideas. In earlier work, I studied plant biology and applied biology. And molecular plant biology receives only about a tenth as much research attention as mammalian biology, which is, of course, relevant to medicine and health and all these other kinds of things.


And mammalian biology is also showered with more resources and more awards. And so the consequence of things like language boundaries, but also, in this case I'm describing, status boundaries, is the way ideas flow from one area to the other.


They flow down the status hierarchy, but they don't flow up the status hierarchy. And they tend not to flow across these language boundaries, especially when the boundaries involve very special, esoteric language. And so I think it highlights the fact that language and jargon do lots of things. On the one hand, they do very useful things: they make it so that we can talk about things of common interest in a very shorthand, efficient way.


But they also exclude other people and they make it very difficult for somebody outside the field to understand that short form code.


And so we're trying to exploit that, to suggest hypotheses and propose questions for large-scale experiments that would likely not have been asked by a scientist perusing the literature, as a function of these kinds of cultural holes.


Hmm. Now that you talk about this, I know I've observed this anecdotally in the sort of area of rationality research and theory that I've been most focused on. It's theoretically situated at the intersection of a bunch of different fields, including cognitive science and economics and philosophy, that are all, in different ways, asking questions about how people form their preferences: In what sense are people being irrational if their actions don't seem to match their preferences? How should we interpret it if people's preferences seem inconsistent? That kind of thing.


And there is a lot of non-shared language, and a lot of questions that I would have expected one researcher to have heard about but they haven't, in part because they're being discussed in these kinds of little enclaves, in fields that use different language. And I think the same thing happens in the study of, say, consciousness, where you have computer scientists and philosophers and cognitive scientists and neuroscientists all approaching the same questions, but from different angles, with different language.


And that makes collaboration and interdisciplinary work much harder. At the same time, as I was complaining about that, I was thinking about a potential upside: maybe if different cultures develop different approaches to the same problem, you get this diversity of approach that's actually a valuable richness, meaning maybe we're more likely to hit something great because we're covering more of the space than we would if everyone was working together.


Yeah. And I think that's a very interesting, and in a number of ways validated, intuition. So I have a colleague named Karim Lakhani at Harvard Business School who's been studying group problem-solving behavior.


So he runs a kind of code-competition, crowdsourcing environment for NASA and for a number of other organizations. And he recently performed a study in which he compared competition approaches to solving a certain programming problem with what I'll call a wiki approach. In the competition approach, individuals or individual teams each try to solve a certain problem; in the wiki approach, everybody posts every interim solution along the way, and everyone is aware of everybody else's solutions.


And what happens in that setting is that basically the entire community hill climbs.


So whoever has the best partial solution, everybody flocks to it and then pursues that. If the problem is simple (and in this case the problem was simple), that ends up being the most efficient approach. If the problem is complex, then you have what we often call, in this kind of high-dimensional universe of possible problems and solutions, a local maximum. You kind of reach the hilltop, but you can't go higher there.


You have to travel to another part of the landscape before you can find the real peak. And we've found this in modern science: with the Internet, people are increasingly becoming less independent. They're becoming more aware of other people's research; they're flocking together. And it does appear to decrease the amount of local knowledge and the independence of various investigations. You can imagine that five different cultures of research, if they're pursuing different research programs, could be thought of as five independent experiments.


Right. And if you had them all talking constantly with one another, and they converged on a winning measure and a winning approach at every stage of the game, then you'd have one independent experiment.
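[Editor's note: the dynamic Evans describes can be sketched with deterministic greedy hill climbing on a toy landscape. The landscape and starting points are invented for illustration, not Lakhani's actual experimental setup: a "wiki" community that flocks to the best-looking starting point climbs one hill together, while independent groups climb several.]

```python
def hill_climb(landscape, i):
    """Greedily move to the higher neighbor until no neighbor is higher."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        best = max(neighbors, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return landscape[i]  # stuck on a (possibly local) peak
        i = best

# A rugged landscape: a local peak of height 6, plus higher peaks elsewhere.
landscape = [2, 6, 3, 1, 4, 9, 5, 0, 3, 7, 2]
starts = [0, 3, 7]

# "Wiki" mode: everyone flocks to the best-looking starting point and
# climbs together, yielding a single trajectory.
shared = max(starts, key=lambda i: landscape[i])
wiki_peak = hill_climb(landscape, shared)

# Independent "cultures": three separate climbs, three chances to escape
# the basin of a local maximum.
independent_peak = max(hill_climb(landscape, i) for i in starts)

print(wiki_peak, independent_peak)  # prints: 6 9
```

The best-looking start leads the whole "wiki" community to a local maximum of 6, while the independent climbers, starting from worse-looking points, find the global peak of 9. That is the sense in which insulated cultures act as independent experiments.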


So, yeah, there are certainly services that independence provides.


And language is one of the ways to insulate people from that. There's one other thing that I wanted to make sure to ask you about. I forget which paper I read this in; maybe it was the Science paper. But you were talking about one benefit of looking at metaknowledge as being the potential discovery of what you called ghost theories: these unstated assumptions, or premises, or paradigms that exist in the background of our thinking, that maybe we're not always explicitly pointing at, but that shape the way we approach problems.


So I was hoping you could give some examples of ghost theories, and talk about whether your investigation of metaknowledge has uncovered any of these, or, if not yet, how it might.


So there is a famous paper in the history of science by a historian named Paul Forman, called "Weimar Culture, Causality, and Quantum Theory: Adaptation by German Physicists and Mathematicians to a Hostile Intellectual Environment," in which he makes the claim that certain achievements in quantum physics in Weimar Germany in the 1920s ended up being partially the result of the atmosphere in which those scientists found themselves.


And so you have certain approaches, in this case rejections of things like causality, determinism, and materialism, that are really common in the air, and that end up shaping their intuitions, he argues. That led to a particular scientific approach. I mean, if you look back at Isaac Newton's time, he is producing deterministic systems in a state that is largely an absolutist monarchy; everything kind of has an absolutist flavor. And today, even though deterministic models may sometimes perform just as well as stochastic, probabilistic models, the probabilistic and stochastic models capture more of the mood that everything is contingent.


And so I think there are a number of ways in which certain ideas and theories can feel more or less natural, because of a certain intellectual climate, because other things, other institutions, behave that way or appear to behave that way.


And so it ends up acting like a kind of soft or subtle prior, a ghost theory, that you end up kind of confirming, in the sense that if you come toward a theory or a model that looks the same way, then it feels right for reasons completely independent of its experimental validation.


Yeah, well, that's a very compelling model to me, and it sort of rings true. But one reason I was interested to ask about this is that it feels like the kind of thing that would be really hard to pin down in a way that you could test. So I was curious: are there quantitative approaches to uncovering ghost theories?


Well, this is hard, right? Precisely because people don't articulate it. And it requires a phenomenon where you have multiple cultures that are focusing on the same class of phenomena, who are themselves unaware, in some sense, of the diversity that surrounds them. So this is where I think publications and patents and other articulated, inscription-based data fail us. And we're using ranking activities and sorting activities and other kinds of intelligent, adaptive survey and information tasks that allow people to reveal how they would think about things.


So, for example, I published a paper a few years ago with some colleagues, including the biologist Andrey Rzhetsky, in which we explored how scientists study metastasis, the cancer process by which a mature cancer ends up spreading to other tissues and inflaming them with cancer. And we asked people who existed in very different parts of the intellectual and scientific space: cancer physicians and surgeons, who are really, you know, kind of holding a 3-D cancer in their hands as they extract it.


And epidemiologists, who are looking at scans and reams of data, and geneticists, who are thinking about drivers that might turn on or, you know, kind of radicalize abnormal growth in the cell.


And what we did was give all these scientists a kind of textbook, canonical pathway of the various stages of metastasis. And we asked them if they would add anything we'd missed, or if they would rearrange the steps of the process. And we found that every one of the roughly 30 physician-scientists reordered that sequence in a different way. So there was enormous diversity, much more diversity than appeared in the published literature, and in some ways each of them assumed that everybody else agreed with them.


So it's kind of like the opposite of a ghost theory. People actually assume that there's agreement on these things, whereas there's only agreement within their very local group of people who study this particular problem from their particular angle. And that certainty, that feeling that other people share that opinion, ends up again suggesting that their assumption about agreement is itself a kind of ghost theory, which is, in this case, wrong. Right.


Well, maybe the ghost theory here is that there is a correct ordering, right? Which the actual data calls into question. Well, I don't know, maybe one of those small groups is actually correct and the others are wrong. But that wasn't the sense that I got, right?


Well, some of them were. Some of the differences were small, but some were large, like: is it possible for a metastasized cancer to then subsequently metastasize? There were really two big groups, some who believe that it can't and others who believe that it can.


So exactly: the ghost theory here is that there's agreement and correctness. And true, it may be that one of these 30 people is absolutely correct, and it may be that there are as-yet-undiscovered elements in that process that will reconfigure our view of it for everybody. But that's one example of how you can tease out intuitions about how probable people think certain kinds of explanations are, even if that doesn't come out in their writing, because maybe they're not able to test that thesis to the epistemological standards of their field, or maybe they just assume that everybody agrees with it, when they may or may not be correct.


Yeah, that's interesting. I had sort of assumed that the study of ghost theories would have to involve a lot of qualitative, creative hypothesis-generation work at the outset, and that the quantitative, more rigorous hypothesis testing would come in only later, after you'd uncovered some promising potential ghost theories. But it seems like you're pointing at a general approach that may actually be pretty fruitful in yielding ghost theories, which is asking people not just object-level questions about their field, but also their impressions of other people's answers to those questions, to uncover a sort of false consensus.




There are also traces, though softer ones, in approaches like the one I described earlier. So, for example, where you're able to identify a systematic, field-level strategy for asking certain classes of questions and following questions with other questions, that suggests a kind of hidden theory that those next questions are the fruitful ones to ask. And those trajectories trace out what you imagine would be the class of scientific theories worth developing, testing, and so on.


So you can kind of see, from the trajectory of scientific questions, what assumptions would need to have been held for people to have made the inference to move from this thing to that thing. Does that make sense?


Hmm, I think so. Do you have any examples from the social sciences, either of ghost theories you've uncovered, or that others have uncovered, or of your own hypotheses about what kinds of ghost theories might be operating? That's a good question.


I mean, I am a social scientist, but most of what I've studied is in the natural sciences, in the biological and the physical sciences.


Is that because the data is just more concrete and quantifiable in the hard sciences?


Well, that's very interesting. So the place in which I started studying this was chemistry, molecular biology, and elsewhere. And it turns out that, yes, it's because the nouns and the verbs are so well behaved.


But, you know, a chemical and a reaction are described in very similar ways, systematically, over large corpora of text. And you can actually quantify this.


I have a recent paper where we explore how ambiguity works across fields, and we measure ambiguity as a function of how frequently synonyms with slightly different patterns of meaning end up being substituted for each other, conditional on context.


And what we find is that, indeed, chemistry, biology and medicine use natural language in the most templated, precise ways, while the humanities and the social sciences use it in much more ambiguous ways. And actually, the paper is about this ambiguity game. It shows that, regardless of which field you're in, conditioning on the amount of ambiguity that's there, the more ambiguous the claims are in your abstract, in your title, in your paper, the more likely it is for people who build on your work to engage with others who are also building on your work. So basically, really important, ambiguous works, like Darwin's Origin of Species and Kuhn's Structure of Scientific Revolutions in the humanistic and social sciences, end up setting up powerful debates. And those debates are the things that fuel scientific fields, not canonical answers that settle them; those are the ways to kill a field.


So, yes, I think ambiguity games are taking place in the social sciences, but they're also taking place in the natural sciences. I mean, really important work often ends up being important because it has many interpretations and fuels debates for generations to come.


So it's probably important for me to hear that, because the prejudice I had in mind when asking this question concerned some fields that have taken a kind of postmodernist turn in the last few decades, like comparative literature, probably some aspects of sociology as well, and continental philosophy for sure. And I've always had this sort of emperor-has-no-clothes sneaking suspicion in the back of my mind about a lot of these fields: that it's not just that they're dense and hard for outsiders to understand, but that even the insiders within the fields aren't actually communicating with each other.


They're giving the semblance of communicating with each other while just sort of performing obscurantism at each other, or something. But that's a very hard thing to prove in any kind of conclusive way, because the response can always be, "Well, you just don't understand. We say we understand each other, so how can you really argue with that?" And in fact, one of my earliest blog posts was this kind of half-baked idea about how you might demonstrate this using information theory.


If you could show that words in a field like comparative literature, or some other more postmodern field, were being used in extremely different ways within the field, perhaps that would be a way to show that people weren't effectively communicating, because they were using words so differently.


Yes. So this is actually precisely how we measure it. We basically take the synonym substitution entropy over context, which is exactly what you were describing.


Yeah. So this is our measure of ambiguity, because that's precisely what it means: conditional on the context, you have no idea what a word means, because it's substituted with every other possible meaning equally.
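[Editor's note: the measure described here can be illustrated with a short sketch. This is not the paper's actual pipeline; the function, the toy contexts, and the word lists are invented for illustration. It simply computes the Shannon entropy of which synonym fills a shared context: uniform substitution means maximal ambiguity, while one dominant word means precise, templated usage.]

```python
from collections import Counter
from math import log2

def substitution_entropy(substitutions):
    """Shannon entropy (in bits) of which synonym fills one context.

    `substitutions` lists the words observed filling a single shared
    context. Equal substitution across many words -> high entropy
    (high ambiguity); one dominant word -> entropy near 0 (precision).
    """
    counts = Counter(substitutions)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A "templated" hard-science context: one word dominates.
print(substitution_entropy(["reaction"] * 9 + ["process"]))  # ~0.47 bits

# An ambiguous context: four near-synonyms interchange freely.
print(substitution_entropy(["discourse", "text", "narrative", "work"]))  # 2.0 bits
```

In these terms, a field whose words show higher average substitution entropy, conditional on context, is using language more ambiguously.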


And it does turn out that that's more intense in the humanities. We haven't really looked at the time trend, I think. I mean, I certainly believe it's possible for a field to have an excess of obscurantism. But it certainly appears that there is a kind of integrating benefit to some level of ambiguity, for things that end up being more ambiguous than their peers, wherever their peers are. And this ends up being true not just as a whole, but within every particular field that we've studied.


Those things that are more ambiguous end up drawing together the things that build on them into a field, if you will.


OK, so I got some validation there, and also some correction, in that the thing I was assuming was wholly bad has an upside as well. So that's probably good, right?


Well, I mean, when politicians speak as ambiguously as they do, we have a cynical take on it: that basically they're just trying to curry favor with many audiences and garner as many votes as they possibly can, so that they can gain the privileges of office. But that function of drawing people together through ambiguity, around shared symbols, may actually perform some function that's integrative.


And so we can leave it as an exercise to the listeners, if they want, to think about whether postmodernism is doing the good kind of ambiguity or the bad kind.


It's the next study. Yes.


All right, well, I think this concludes this section of the Rationally Speaking podcast, so let's move on now to the Rationally Speaking pick.


Welcome back. Every episode, we invite our guest on Rationally Speaking to introduce a pick for the episode: a book or article or website or something that has influenced his or her thinking in an interesting way. So, James, what's your pick for today's episode?


OK, I have two picks. One is a Science magazine article from 2009 by Michael Schmidt and Hod Lipson called "Distilling Free-Form Natural Laws from Experimental Data." And the second is an article in the Proceedings of the National Academy of Sciences by Charles Kemp and Joshua Tenenbaum called "The Discovery of Structural Form," from 2008. And the way in which these articles are similar is that they both basically take kind of raw data from the world and throw it at a machine learning algorithm, a Bayesian thinker.


They each take different forms of data, and then they discover enormous amounts about the structure or the form of those things. So in the case of Schmidt and Lipson, they take these quasi-random double pendulum movements and basically throw them at a machine that induces and produces things that look like the manifold equations, the ways in which a physicist might draw those out themselves if they were theoretically articulating them. Kemp and Tenenbaum take association data: email data from the Bush administration, association data between the characteristics of various animals, and voting data from judges on the Supreme Court.


And they induce the form that characterizes that data: is it a continuum, in the case of the Supreme Court, or is it a tree, in the case of the Bush administration, or is it another kind of organization? And what's interesting about these pieces, at least for me, the way they've influenced my thinking, is this: because of the presence of these assumptions and ghost theories and heuristics, there's now an explosion of increasing physical and social data, from sensors and also from extractable archives.


I'm really interested in approaches that weaken the assumptions we put into our analysis, and that in some sense allow us to reveal, based on last year's data, what the assumptions were that led people to find the structures they found in the data available to them. So in some sense, these kinds of automated approaches to analysis also allow us to reveal our biases to ourselves and, to some degree, overcome them.
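[Editor's note: the idea behind the first pick, inducing the form of a law from raw measurements, can be conveyed with a toy sketch. This is emphatically not Schmidt and Lipson's genetic-programming method; the three-expression hypothesis space and the synthetic data below are invented. It only shows what it means for a machine to select the functional form that best explains observations.]

```python
import math

# A tiny, invented hypothesis space of one-parameter candidate "laws" y = a*g(x).
CANDIDATES = {
    "a*x":      lambda x: x,
    "a*x**2":   lambda x: x**2,
    "a*sin(x)": lambda x: math.sin(x),
}

def discover_law(xs, ys):
    """Return (form, a, error) for the candidate that best fits the data."""
    best = None
    for name, g in CANDIDATES.items():
        gx = [g(x) for x in xs]
        # Least-squares estimate of the scale a for this functional form.
        a = sum(y * v for y, v in zip(ys, gx)) / sum(v * v for v in gx)
        err = sum((y - a * v) ** 2 for y, v in zip(ys, gx))
        if best is None or err < best[2]:
            best = (name, a, err)
    return best

xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [3 * x**2 for x in xs]       # synthetic "observations" of an unknown law
form, a, err = discover_law(xs, ys)
print(form, round(a, 2))          # prints: a*x**2 3.0
```

Real systems search a vastly larger expression space and trade fit against complexity, but the selection principle is the same.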


Interesting. Wouldn't there still be biases built into the way we set up these algorithms that mine data? Or is it just that they're going to be different biases, and so it's useful to have different approaches to investigating the world that are biased in different ways, so that we can notice those differences?


Well, I mean, when you have more data, you can have weaker models; when you have less data, you require stronger models. So I would say that these are weaker models: they really exploit the data and allow a much wider range of possible answers to emerge than many of the models from before. So I think it's not just that they're different, although the diversity of questions is important; I think it's also important for us to systematically try, where possible, to create the explanations that give us the most insight with the fewest assumptions.


Excellent. Cool. Well, we'll link to both of those papers. I like that you came up with a thematically related pair of texts; that doesn't happen often enough on Rationally Speaking. We'll link to both of those, as well as to your website. And maybe I'll even throw in a link to my old blog post in which I came up with the idea of measuring entropy in different fields in order to expose postmodernism.


Please do. I'll give you some credit. That's excellent.


Thank you, James. Thank you so much for joining us. It's been a fascinating conversation.


My pleasure, Julia. Thanks.


This concludes another episode of Rationally Speaking. Join us next time for more explorations on the borderlands between reason and nonsense. The Rationally Speaking podcast is presented by New York City Skeptics. For program notes, links, and to get involved in an online conversation about this and other episodes, please visit rationallyspeakingpodcast.org. This podcast is produced by Benny Pollak and recorded in the heart of Greenwich Village, New York. Our theme, "Truth," by Todd Rundgren, is used by permission.


Thank you for listening.