About the podcast
Real stories. Honest reflections. Conversations about leadership, change, and the work that shapes our world.
Hosted by Sowmya Mahadevan, Chief Orchestrator at Kriyadocs, The Publisherspeak Podcast features interviews with people making meaningful moves—inside and outside the world of scholarly publishing. From academic leaders and publishing professionals to coaches and change-makers, each episode goes beyond job titles to explore personal journeys and the mindsets behind impactful work.
In this episode
Host Sowmya Mahadevan sits down with Adam Day, founder and CEO of Clear Skies, to talk about research integrity and the tools reshaping how publishers safeguard trust in the scholarly record.
Adam shares his career journey from editorial roles to data science and entrepreneurship, the story behind creating Papermill Alarm and Oversight, and what it takes to build tools that tackle misconduct at scale. He offers a candid look at the frontline of research integrity and why the future of publishing depends on innovation, vigilance, and collaboration.
Dive into the conversation below or watch the episode here.
Full conversation
Sowmya: Hi, Adam. It's wonderful to have you on The Publisherspeak Podcast. Thank you for taking the time to be on our podcast today. I'm really looking forward to hearing a lot about you. But first of all, I was looking at your LinkedIn profile, and you started off your career as a marketing professional, then you went into publishing. You've done a bunch of things in publishing. I see that you've been an editor, you've been a publisher, and then you became a data scientist, and then you're a founder, an entrepreneur, solving problems for publishing. Talk a little bit about how this happened. How did this career journey actually come about for you?
Adam: Yeah, well, thank you very much for having me. It all sort of ties together to one thing, which is machine learning. As a physics undergraduate, I studied machine learning as an undergraduate project. So, I spent a huge amount of time really working deeply with data and working with code. Then I went into this marketing role, which was essentially being part of a small team running a small company, which I really loved. I really loved being able to see the entire operation that was going on and be a part of influencing it. But one thing I realized at this company, where we worked in a little factory making scientific components, is that there was a pattern to how they were made and how they were costed. If you just put certain inputs in, you could work out what it was going to cost to make very easily, without having to do any manual sums or try to cost it out by hand or anything like that. And I built this sort of simple machine learning model that would do that. And I think that was maybe a bit of a weird thing to do.
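As an aside for readers who like to see the mechanics: a model like the one Adam describes can be as simple as a regression from component specifications to cost. This is a minimal sketch only; the features and numbers below are invented for illustration, not the model he actually built.

```python
# Minimal sketch: learn a mapping from component specs to manufacturing cost.
# All feature names and numbers here are invented for illustration.
from sklearn.linear_model import LinearRegression

# Hypothetical features per component: [diameter_mm, coating_layers, quantity]
X = [[25, 1, 10], [50, 2, 5], [25, 3, 20], [75, 1, 2], [50, 1, 8]]
y = [120.0, 340.0, 260.0, 410.0, 220.0]  # invented historical costs

model = LinearRegression().fit(X, y)

# Quote a new, unseen component from its specs.
estimate = model.predict([[60, 2, 4]])[0]
print(f"estimated cost: {estimate:.2f}")
```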
But then I went into publishing, and one of the first things I did was work out that you could just match up similar papers using machine learning, and then you wouldn't have to search for referees. You could just find similar papers and work out that the author of a similar paper should be a referee for whatever paper you're looking at. And I don't think anyone was very excited by the idea at that time. Machine learning was not something that people really talked about in those days. I think gradually, over time, I was proven right that machine learning is a good thing and does work. But that was what got me into data science. And I did a lot of machine learning as a data scientist at Sage. And that's a really big part of what Clear Skies does as well: we take data which describes a problem, we teach an algorithm to recognize whatever that particular thing is, and that's what we use to build our services.
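For the technically curious, here is a minimal sketch of that idea: represent papers as TF-IDF vectors and rank the authors of the most similar published papers as candidate referees. The toy corpus and scoring below are illustrative assumptions, not the system Adam built.

```python
# Minimal sketch: suggest referees by matching a submission to similar
# published papers using TF-IDF and cosine similarity (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus of published papers (title + abstract + author).
published = [
    {"title": "Spin transport in graphene nanoribbons",
     "abstract": "We measure spin relaxation lengths in narrow ribbons.",
     "author": "A. Researcher"},
    {"title": "Thermal conductivity of layered materials",
     "abstract": "Phonon scattering dominates transport in these systems.",
     "author": "B. Scientist"},
]
submission = "Spin relaxation and transport in bilayer graphene devices."

corpus = [p["title"] + " " + p["abstract"] for p in published]
vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform([submission])

# Rank published papers by similarity; their authors are referee candidates.
scores = cosine_similarity(query_vec, doc_matrix).ravel()
ranked = sorted(zip(scores, published), key=lambda pair: pair[0], reverse=True)
for score, paper in ranked:
    print(f"{score:.2f}  {paper['author']}  ({paper['title']})")
```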
Sowmya: Fantastic. I mean, that's an incredible journey. But I can see that you're rooted in machine learning and in making sure that you're able to get a machine to make sense out of data. That seems to have been a very common thread through your entire career. Is there any story behind the name Clear Skies? Why did you name your company Clear Skies?
Adam: Well, I used to fly gliders as a hobby. And there's one thing about flying gliders, which actually has sort of changed since I used to fly them. But it used to be that you didn't require any kind of paperwork. You didn't actually even need a license to fly one. It did help to learn, though. It's worth learning before just having a go at it. But once you're up in the air, as long as there are no clouds, as long as you've got clear skies, you can fly anywhere you want. And that felt really great. That's a really great feeling, being able to do that. And I think that was kind of the feeling I wanted people to have when they thought about the company as well. And so I think it's just got a very positive kind of vibe about it.
Sowmya: That's wonderful. I think there's a really nice story to Clear Skies and the background to it, and it kind of means that you can see a lot more clearly. But in your journey, you said you dabbled in machine learning for a while, and it wasn't necessarily something that took off on day one. Was there a point where you thought, okay, you know what, now I have to step out of what I'm doing and actually do this full time and establish Clear Skies? What gave you that sort of confidence? Did you see some sort of transformation in the publishing world, some problem that was coming at it? What was it that gave you that extra push to say, okay, you know what, now it's my time to go ahead and establish this, because I do see a market for it?
Adam: Yeah, it's actually a common misconception people have that Clear Skies was started to do research integrity, paper mill detection, things like that. That's certainly what we're known for and something that we do extremely well.
But the reason it was started was because, actually, the whole time I've been involved in publishing, I felt like there was something missing, which is really being able to understand peer review and how well it's being done. If you ask any publisher, well, what are your peer review standards like? Oh, great, fantastic, we're the best at peer review. And the thing is that they don't actually know, because they've never been able to compare their data with someone else's data. And they never would have been able to do that, because the data is generally considered to be quite highly sensitive and not something that you would necessarily share, especially not with a competitor.
So, the original idea behind Clear Skies was: well, if we can make that comparison, we don't have to share the data. We can do the analysis and give people consistent feedback on how well they're doing with peer review. So, things like receipt-to-first-decision time, that would be a really standard metric that a lot of publishers use. But you can calculate that metric in tons of different ways.
There's all sorts of different ways you can do that. And I thought, well, even if we do something that simple, and we just calculate that one metric very simply but consistently across everyone, then we've given everyone something they can use that is comparable to other publishers and other journals. Then they can say, okay, this is the receipt-to-first-decision time, and therefore we know how we're doing in terms of processing times. We can actually measure ourselves. So I think that was really where it started. It was: we need to have a better understanding of the peer review system. And from a publisher's point of view, they need to have feedback on how their processes are working and understand when there might be an opportunity to improve.
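To make that concrete, here is a minimal sketch of computing one fixed, consistent definition of receipt-to-first-decision time. The record structure and date fields are assumed for illustration; real submission systems vary, which is exactly why a single shared definition matters.

```python
# Minimal sketch: one consistent rule for receipt-to-first-decision time.
# The submission records and field names are assumed for illustration.
from datetime import date
from statistics import median

submissions = [
    {"received": date(2024, 1, 10), "first_decision": date(2024, 3, 2)},
    {"received": date(2024, 2, 1),  "first_decision": date(2024, 2, 25)},
    {"received": date(2024, 2, 14), "first_decision": None},  # still in review
]

# One fixed rule: calendar days from receipt to first decision,
# counting only submissions that have actually received a decision.
days = [
    (s["first_decision"] - s["received"]).days
    for s in submissions
    if s["first_decision"] is not None
]

print(f"decided submissions: {len(days)}")
print(f"median receipt-to-first-decision: {median(days)} days")
```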
Sowmya: You have worn a lot of hats within the publishing space, right? You've been on the editor side, you've been analyzing data for a while, and I think you've also been a publisher in one of your previous roles. How much of that experience do you feel has enriched you in understanding this domain a little bit more? Was it helpful to see some of those challenges close at hand? Have you dealt with such problems yourself? And how much of that really guided you when you were trying to branch off and say, okay, let's solve these problems at Clear Skies?
Adam: Yeah. So I'd say there are two really key skills at the heart of what Clear Skies does, and one of them is actually understanding the way that misconduct investigation works. That was something I did a lot as an editor. But there are some real nuances in there that are very difficult to understand from the outside if you haven't actually done this. Probably one of the biggest surprises for people is that we spend more time thinking about how to treat fairly the people who actually have done something wrong. Because it's very easy to just say, somebody has done something which constitutes research fraud, therefore they're a bad person, a bad actor, we don't want to support them and we don't care if they get into trouble. And actually, really, we do care.
We want to make sure that everyone is treated with respect and treated fairly. And there can be all sorts of reasons why someone does something that would breach our standard interpretation of research ethics. And we really have to try and be understanding. So that's one of the skills: really understanding all of that nuance, which is very difficult to do if you haven't actually done the groundwork. The other side, though, is the data, because the kind of data that exists is also extremely nuanced. For stores of scholarly metadata, we have great services, things like Crossref.
Crossref is incredible. I mean, how many industries have a service like that, where they can get metadata for everything that's happening? Then you've got other things like OpenAlex, which kind of builds on Crossref and brings in other data sources. You've got things like ORCID. But all of those data sources have certain nuances that you have to understand. For example, Crossref is usually fed by publishers. So publishers decide what goes in there, and different publishers decide to put different things in. So there is inconsistency across the whole data set. Sometimes references are missing, sometimes abstracts are missing. And so understanding the limitations those put on the tools that you can build is really critical. So, I think that all of that has really helped to bring things together and make something that works. That's what's been really critical.
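As a small illustration of those nuances, the sketch below pulls one record from the public Crossref REST API and checks which fields are actually present. The DOI is a placeholder, and the field handling is only an example of how missing abstracts and reference lists have to be tolerated by anything built on this data.

```python
# Minimal sketch: fetch one work from the public Crossref REST API and see
# which publisher-supplied fields are present. Abstracts and reference lists
# are deposited inconsistently, so tools must tolerate gaps.
import requests

doi = "10.1234/example-doi"  # placeholder; replace with a real DOI to run
resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]

title = (work.get("title") or ["<no title>"])[0]
has_abstract = "abstract" in work
n_refs = len(work.get("reference", []))

print(f"title:      {title}")
print(f"abstract?   {has_abstract}")
print(f"references: {n_refs} deposited")
```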
Sowmya: Yeah, no, I think that's very, very true. Publishing is a very detail-oriented and very nuanced field. No two publishers have the same format, and every publisher brings in something a little different. But that also means there's a lot of variation in what happens, and there isn't an industry standard. I think that's one of the challenges we face as an industry. But I think your experience within the publishing ecosystem probably really enriched you as a problem solver for the publishing world as well. Just for our listeners, I know you have the Papermill Alarm as one of your products and Oversight as another. Talk to us a little bit about what these products do. I know Clear Skies is all around research integrity, but for the sake of our listeners, give us a little bit of background about what problems you actually solve for publishers.
Adam: Yeah, so I think one of the really key things with the Papermill Alarm is that there were various ways that were known to try and identify paper mills. Some of them were better than others. This is going back to 2022 or so. I had a lot of concerns, because people would sometimes say, oh well, we see these papers coming in with webmail addresses on them. Now, it's true that paper mill papers often have webmail addresses on them. However, a lot of perfectly good papers have webmail addresses on them too. And also, we've seen cases where paper mills have been able to mint institutional addresses and use those. So it's really, really weak as a way to determine whether there's a problem or not, and it can cause all sorts of problems to try and use that. Then you have different methods that will work, say, in one field and not another field, or maybe in one journal and not another journal.
The great thing that the Papermill Alarm did was build a method that works generally across everything. The initial first pass at it worked on PubMed, so it wouldn't do certain fields. It wouldn't do physics, for example. But a few months later, we updated that to cover everything. And I think that was a really important step, to be able to say, okay, we have a general tool that will work on anything. That said, we had, I think, about 60 different methods that had either been tested or were on a backlog for testing at the time, and that was one of them. And we've now added at least a couple of dozen more. They're all being worked through different stages, so it's not just that this was a good thing to do. It was also something that could be built on, as we can add these other methods on top and build something that's a much more complete service. It now covers a lot of different things that are useful in different ways. So, it's not all paper mills. Also, the definition of paper mill, I think, is a little bit vague. Different people imagine different things when you say paper mill. So, I think we need to start being a bit more precise about what we mean. Yeah, it was a good place to start, because it was a good thing to build on. We could get to where we are now from that as a basis.
Sowmya: And I think your product Oversight won gold in the EPIC Awards Integrity Tools category. Congratulations on that. That's a pretty impressive accomplishment. What does Oversight do? What problem does that solve?
Adam: So, Oversight again goes back a very long way, to this same point about needing to understand peer review from a more neutral position. But the basic idea was to have an index of research integrity. And I might be a bit more precise about that. We should say it's not really about integrity. I think of integrity as a character trait. It's about doing the right thing. And so when we talk about research integrity, we're talking about, or at least I'm talking about, because different people mean different things by it.
What I'm talking about is: is somebody doing this honestly, openly, clearly, transparently, those sorts of things. What Oversight is much more about is actually research standards. It's about how well peer review is being done by this journal, how well research is being done by this institution. And what we're not looking at is the best work that appears in a journal or comes out of an institution.
What we're looking for is an absence of bad things. And I think that's an idea that people weren't very receptive to some time ago. But now it's very clear that this is necessary and important, because we need to be able to see where standards are being upheld. And where we can see that, it's a really valuable thing to know, because we know that the standards at an institution are really good, or the standards at a publisher are really good.
One funny story goes back to the start of Clear Skies. I went around a lot of publishers and just spoke to them, just had chats with them and said, what do you think about these kinds of problems we have now? What do you think about paper mills? And the really funny thing was that there were some publishers who didn't know anything about paper mills. And the usual reason for that was they would say, oh, well, we've got really good peer review, and we think that would probably put anyone off. If they've got a fake manuscript or whatever, they won't be able to get in here, because our peer review is so strong that we will kick anything like that out, and they'll be wasting their time even trying. And I would look at the data, because I had essentially the data that's in Oversight now. I'd have data on everyone, and the publishers who seemed to know the least about paper mills were the ones that were handling them the best, because they were really strong on peer review and they were keeping them out. And I like to think of something like paper mills as a symptom of a problem, not the problem.
The problem is that bad science is happening for whatever reason. There are loads of reasons why it's happening, but one symptom of that is that we see paper mills. Paper mills are a bit like a canary in the coal mine. If we find them in a journal, then it's a sign that the journal has problems with peer review. It's not that the paper mill itself is the problem; the problem is that the journal is unable to keep them out. And so that's really what Oversight's about. We want to be able to help the journal to manage that standard and keep those problematic articles out, see how they're doing, and be able to actually see all the data around it: here's how we're performing, here's how much we're keeping these things out, and potentially here's a lot of the other data that we always wanted to show, like how fast we're doing peer review and things like that.
Sowmya: Let's talk a little bit about AI now, because it's always a fun topic, especially when it comes to publishing. I know there's a lot of talk around whether AI should be allowed in research and science or not. I think this conversation has gone way past whether it should or shouldn't be allowed. AI has now become a way of life. Six months back, I don't think I even had a ChatGPT account, but now my day starts with having some conversation or the other with ChatGPT, figuring something out or putting a presentation together. So, AI has become super ubiquitous in terms of how we operate, and I'm sure that's going to spill over into how science happens. So, this whole conversation around AI-generated text, should that be allowed or not allowed? What is your take on it? I feel like, is that even an appropriate conversation to have? But on the other side, AI can also fabricate, and it can make this whole concept of bad science a lot easier. That's the danger of it. So, what is your take on AI-generated science and AI-generated text creeping into publishing?
Adam: It's difficult to have just one thought about it. So, what I will say is that I do a lot of coding, and coding has changed enormously since LLM-based tools became part of the process. And I'd say the best way I like to think about that is like autocomplete. Autocomplete used to be: you're typing a sentence and it would say, here's the rest of your sentence, and you go, yeah, that's what I was going to type anyway. So, you saved me a few seconds. And the autocomplete powers of GenAI tools for coding are really incredible. Absolutely incredible.
So, I've had the experience of being about to start typing something, not even having pressed anything, and it will give me the next half hour of my work and say, this is what you were going to type. And I'll look at it and think, no, that's better than what I was going to type. It's incredibly useful. However, there is a massive caveat to that, which is that sometimes it's wrong. So, if you're an experienced coder, you can look at it and say, oh, you got it wrong, and it's fine, you can handle it, you can fix it. If you are not an experienced coder, that's a massive danger area, because it can hallucinate garbage into your work. So, we are really careful with how those tools get used.
That makes me think, though, about everything else. When we're talking about publishing, we're talking about the scientific process. It's not just the article, it's the whole thing, and somebody can go all the way through it with these tools. You can do ideation with an LLM: you can say, I've been looking for ideas for my research paper, and you can get those ideas from ChatGPT. Then you do your literature search; you can do all of that with GenAI tools. Then you can do your summarization of that literature, and everything you learn from the literature can actually come through the lens of an AI tool. Then you can go and do your research and do the coding that comes up in that research. And if you don't know how to do the coding and you get the AI to do it for you, you're going to have all those kinds of issues.
But those sorts of issues can come in anywhere along that chain. So, when we get to the point of writing up the paper, the risks from AI are a small part of the whole. The whole process of making science now has these risks around it, which we all have to understand and learn how to deal with. When it comes to the article itself, I actually just put out a blog post today about a tool that we've been using called Pangram. And Pangram claims to be GenAI detection that actually works. And based on the work that I've done testing it, I would say that's right. It does seem to work incredibly well.
It has a very low false positive rate, where we're able to measure that. So, I ran a random sample of abstracts from articles that were published in 2025 year to date. Do you want to take a guess at what percentage of them had signs of GenAI in them?
Sowmya: I would hazard a guess of upwards of, say, 60%?
Adam: Okay, well that's higher than what we found. We found 35%.
Sowmya: Okay. Especially from an abstract standpoint, because I would have assumed that even if the paper was written by a human, LLMs are really good at generating the abstract. Right?
Adam: Yeah. So, that might also be one source of bias in that check. But it seems that a very, very high proportion of articles have some sort of GenAI input. And what I would conclude from that is that GenAI is being used to help write the article, and the article itself is perfectly genuine. I think GenAI use by itself is not going to be a sign of research misconduct. But again, we can come back to nuance. There could be a nuanced way to look at that. It could be that we find certain GenAI patterns appear when we have other problems as well. For the moment, the tools that we have are generally based around understanding an article in context. Mostly this is network analysis. It's about saying, this is what this article is connected to, and that's a very difficult thing to fake your way around. With GenAI, we don't want to focus on the content. Content's useful, but if we focus on content, then we're focusing on something that a fraudster can control. So, our methods are all about understanding the context around an article. That's a really hard thing to control, really hard to game your way around. So, I think that for the future, understanding that context is going to remain the most important thing for detecting problems. However, GenAI detection as of right now is actually looking quite bright. This tool from Pangram that we blogged about looks very good. It seems to have very good success at detecting GenAI. However, I don't think that is a strong sign of misconduct in itself, but it could still be useful context, part of that overall picture.
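To give a flavour of what context-based analysis can mean, here is an illustrative sketch, emphatically not Clear Skies' method: build a small citation graph and measure how densely a paper's references cite each other, since an unusually tight, self-contained reference cluster is one kind of contextual signal. The edges below are toy data; the point is only that the signal comes from an article's connections rather than its text.

```python
# Illustrative sketch only: look at an article's *context* by asking how
# tightly its references cite each other in a citation graph.
import networkx as nx

# Directed edges: (citing paper, cited paper); toy data for illustration.
edges = [
    ("P1", "A"), ("P1", "B"), ("P1", "C"),
    ("A", "B"), ("B", "C"), ("C", "A"),    # P1's references all cite each other
    ("P2", "D"), ("P2", "E"), ("P2", "F"), # P2's references are unrelated
]
G = nx.DiGraph(edges)

def reference_density(graph: nx.DiGraph, paper: str) -> float:
    """Fraction of possible citation links among a paper's references."""
    refs = list(graph.successors(paper))
    if len(refs) < 2:
        return 0.0
    sub = graph.subgraph(refs)
    possible = len(refs) * (len(refs) - 1)
    return sub.number_of_edges() / possible

for paper in ["P1", "P2"]:
    print(paper, f"reference-cluster density: {reference_density(G, paper):.2f}")
```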
Sowmya: No, absolutely. I look forward to your further blogs and posts about how Pangram shapes up. It was wonderful to have a very spirited conversation at Publisherspeak earlier this year, and thank you for joining us at that conference as well. And I think one of the topics that came up there was the future of publishing.
Adam: Right.
Sowmya: So, with so many headwinds coming, both in terms of AI and with respect to the funding situation and things like that, what are your thoughts on where the publishing industry is headed? What are you excited about, and what are you sort of worried about with respect to this industry that we are all a part of?
Adam: I mean, it's all about people, and that's kind of what we've always thought from the start. We wanted to come up with a company slogan for Clear Skies, and the one that we kept coming back to was "your partner in research integrity."
So, it's about people, it's about people working together. The thing the publishing industry has always had, that I think has supported it all this time, is that publishers can say, when they've got a piece of research, that a human being has looked at that research and said that it looks sound. And this is an imperfect way to determine whether research is sound or not. However, it is the best way that we have or are likely to have in the future.
Now, we are running into an era where I think we can expect large amounts of garbage generated by AI, and it will be everywhere, right? It will be in our news, it will be in our social media, and of course it will be in science as well. As we move into that space, that function of being able to extract the truth from that slop is going to be increasingly important. And I think publishers have a lot to worry about right now. There are worries about disintermediation, which have been discussed at length. There are worries about library funding. There are worries about the sustainability of business models. But I do think that this one thing has been incredibly powerful at making this a resilient industry. So, I think sticking with that and focusing on that is going to work out very well.
There was a point made some years ago in The Scholarly Kitchen about how the web was invented at CERN for the purpose of communicating scientific research. That's what the web was for originally. And the web has disrupted every single industry on the planet except the one for communicating scientific research. And there's something really profound in that. There's something that makes this industry resilient. It's being able to qualify this is the truth, as distinct from whatever else there is. It's peer review, and it's that editorial function that exists in publishers. I think it's very easy to look at this current time and say there's a whole load of uncertainty, and there absolutely is, but that thing is very fundamental, and I can only see the importance of it increasing over time. So, that's what I think. I think there's a lot to be optimistic about, certainly a lot to be careful about, but we're building the tools to help with a lot of that.
Sowmya: That's a very, very positive note to end this podcast on, because I think you're absolutely right. It's the humans, the authors, the people who are part of the community that make publishing what it is, and it's the peer review. With more content being generated easily, I think the value of that human interaction is only going to go up, not down. That's a very valuable point you make there. Any last words for our audience, Adam?
Adam: I guess the one thing I would say is that, going into building something like this, I started out entirely on my own. I just had things that I wanted to build, and I just went ahead and built them. But the reality of building anything is that it's impossible to do it on your own, and it requires having other people who support you. So, I'm very grateful to, for example, Kriyadocs for supporting us in a number of ways, both in partnerships and of course in inviting us to this podcast, but also to a lot of other people, people like Adrian Stanley, who's been a great supporter of Clear Skies and a very valued advisor as well.
But I think the main message I've always wanted to get across is that fundamentally all of this is about people. We build AI tools, but they're there to support people. And one of the meanings of the word Oversight is human oversight. We mean we want a tool that allows a human being to really get a grip on a huge volume of data without having to understand all those nuances, and really just be able to understand it intuitively. So, yeah, I guess that would be the final thing. We're really very grateful to everyone who's helped.
Sowmya: On that note, thank you so much, Adam, for joining The Publisherspeak Podcast. Some really interesting takeaways for me here, and I think you shed light on what's coming up and what we as an industry can look forward to. And it is the human in the loop that is really going to make the difference. It's all about the people, and that's what publishing is all about. Thank you so much for joining us today.
Adam: Thank you.
Explore The Publisherspeak Podcast on Spotify here.
