While NotebookLM was until now available only on the web, the release of apps for Android phones and the Apple iPhone is also well timed: it coincides with the announcement of a Video Overviews option for a user's individual notebooks within the app, a feature that arrives soon after Audio Overviews, both use cases that work better on a phone than on a typical desktop. Johnson spoke with HT on the sidelines of the I/O 2025 conference about user adoption of this still one-of-its-kind artificial intelligence (AI) platform, the thinking behind Video Overviews, training NotebookLM's underlying models and countering hallucinations, as well as the roadmap for the months ahead. Edited excerpts.
Q. NotebookLM’s much awaited Android and iOS apps arrived yesterday. How do you see this development having a bearing on user adoption, and basically how it’ll be used on a phone? Any specific features you feel users now have the chance to fully explore?
Steven Johnson: You’d be surprised to know that from the second we released Audio Overviews in September, people were asking for an app so that they could play the audio on their phone when they’re going somewhere, and also handle quick questions about something. We had a powerful web app that was fine, but people clearly wanted to be able to get all the features in a proper app. There was just a lot of interest from our users. But when Audio Overviews launched, we had seven full-time engineers. We were tiny by Google standards. We had nobody to do mobile. We basically built a V1 of the app that was ready by the end of last year, but we decided not to rush it and to make sure it was fully feature complete.
There are a lot of things you can do on the website that you can’t do in the mobile app. But we wanted it to be really stable, a really nice-looking, functional app. So we took a little more time to get it right. We recently started supporting images, which means users can actually have images as a source. Say I’m an architect and I want to take a picture of a building I’m designing, and then be able to talk to Notebook about the design of that building; basically all those things that you can do with multimodal models. It is really exciting in terms of the overall roadmap.
We also announced that the new Video Overviews feature is coming, which is built by the same team that did Audio Overviews. It’s sort of a mini lecture with slides, images and charts taken from your sources, and a single person speaking. Since it’s not a two-person conversation format, it’s an efficient way of getting you the information, where the visual component adds to it. We’re also starting to think about creating notebooks for people in advance, such as curating notebooks on different topics. One of the problems with NotebookLM is that when you come on day one, you don’t have any notebooks, and you have to create them yourself before the product becomes useful. We’re going to start rolling out pre-packaged notebooks on various topics. For NotebookLM as a platform for sharing information, there’s a lot of room to explore.
Q. The latest announcement is about Video Overviews landing on NotebookLM soon. Can you tell us a bit more about this, and which models underpin this feature?
SJ: It’s basically all Gemini, because it’s not generating any actual motion video; it takes static images from your slides and creates a video in the sense of showing images over a timeline with a voiceover. But we’re very interested in the amazing things that are possible. I believe there’s a lot in this current Video Overviews platform, and there are a lot of features that can develop even before we get to full motion video. It’s a little different from Audio Overviews in that way, since it can generate a timeline, generate a sort of pull quote from your sources, generate an agenda, generate a contrasting visual and more.
Part of what we’re asking the model to do is: given these data sources and these building blocks, what’s the best story you can tell, or explanation you can curate, to help someone understand this material? We can just keep adding more building blocks. We won’t just take a chart from your sources, but we’ll actually spontaneously generate a chart based on the facts in your sources. We can’t quite do that yet, but you know that’s coming. And so there may be ways to generate images or visuals like that, which will get more and more powerful over time. I believe we’re going to find that this video platform is actually the foundation of a lot of really important stuff that we do.
Q. NotebookLM is source-centric, unlike broad, knowledge-based AI. Can you tell us about the technical mechanisms and strategies employed within NotebookLM’s architecture to effectively curb hallucinations, especially when dealing with complex or contradictory source materials?
SJ: The core proposition of NotebookLM from the beginning has been that you give us your sources and we will restrict the model to those. Basically, we restrict the model to the information in those sources, which we think is valuable for a variety of reasons. It’s also valuable to talk to a general-interest, general-purpose model, obviously. But we think there are a lot of use cases where you want to limit the knowledge of the model to the documents you give it. That can be because you want to personalise it, or because you want a specific kind of expertise. It can also be because you’re working on a project that you want the model to be an expert in, and you want to ground it in that. It also has the added advantage of reduced hallucinations: when you actually put something directly into the context of the model, it’s just more accurate.
We have a state-of-the-art citation system, and a user can always see where the information came from. You can use that as a way of exploring complex information. Simply follow the citation into the text and then you can read it right there. Among the things we’ve done from the beginning is to make sure your original documents are part of what you’re doing, and not just off in the background somewhere or ignored altogether. But it does leave open the possibility that if a user puts in a series of documents explaining why the earth is flat, depending on how strong the statements are, the model will either report those things according to the sources, or it will say something like: according to your sources, the earth is flat, although that is not generally considered to be true. But in general, we let the user define the truth in those sources, and we’ve decided that’s the most responsible way to do it.
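NotebookLM’s internal pipeline is not public, but the idea Johnson describes, restricting the model to user-supplied passages and asking it to cite them, can be sketched at the prompt level with the publicly available Gemini API. The snippet below is a minimal, hypothetical illustration, not NotebookLM’s actual implementation; the model name, the grounding instruction and the [n] citation format are assumptions made for the example.

```python
# Minimal sketch of prompt-level source grounding with citations.
# NOT NotebookLM's implementation; it only illustrates the idea of restricting
# answers to user-supplied passages and asking the model to cite them.
# Assumptions: the google-generativeai SDK, the "gemini-1.5-flash" model name,
# and the [n] citation convention are illustrative choices.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder

GROUNDING_INSTRUCTION = (
    "Answer ONLY from the numbered source passages provided by the user. "
    "Cite every claim with the passage number in square brackets, e.g. [2]. "
    "If the passages do not contain the answer, say so instead of guessing."
)

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction=GROUNDING_INSTRUCTION,
)

def grounded_answer(question: str, passages: list[str]) -> str:
    """Embed the user's sources as numbered passages, then ask the question."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    prompt = f"Source passages:\n{numbered}\n\nQuestion: {question}"
    return model.generate_content(prompt).text

# Example: the answer should cite [1] and decline anything outside the sources.
print(grounded_answer(
    "When did NotebookLM's Audio Overviews launch?",
    ["Audio Overviews launched in September 2024.",
     "The mobile apps for Android and iOS arrived around Google I/O 2025."],
))
```

In this sketch the grounding lives entirely in the prompt; the quality Johnson attributes to Gemini’s training and the system prompts his team writes would determine how faithfully a real model sticks to those passages.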
Q. How difficult is it to train and define this sort of model, especially since the data set is not in your control?
SJ: There was just a lot of training in the underlying Gemini model before it got to be used in NotebookLM. Because NotebookLM was part of Google’s AI program, there was a lot of attention on source grounding from the beginning with Gemini. But that’s something that happens at the model level, as part of the training process. The model must be good at sticking to the facts of documents that have been put into the context of the model. In general, from what I’ve seen, Gemini is the best model at this. We’re building Notebook at the right time, at the right tech company, and I’m very happy we have this model and not other models.
And that’s just something that goes into the training runs of the models, and the reinforcement learning process teaches the model how to stick to the facts of the documents. By the time it gets to us on the Notebook team, what we are doing is basically writing system prompts to make sure that behaviour comes out and is, you know, as powerful as it can be.
Q. Given that users are uploading their personal and potentially sensitive research materials, data security and privacy are paramount. Can you give us a sense of the specific security measures and protocols NotebookLM has in place to protect user data?
SJ: Right now we have no ‘keep my data private’ toggle to switch on or off, because it’s all private. We currently do not use any of the user data to train the model in any way. If we ever change that, we would obviously give the user control over it. The only thing that can occasionally happen is that if you ask a question and give negative feedback on the response, a human might look at the question-and-answer pair to figure out what went wrong there. But there’s no training. We understand that people are putting their journals, even books that they’re writing, into notebooks on NotebookLM. We want them to be assured that that information is not going to end up in the training set of models.
Q. What next for NotebookLM? Are there key areas or functionalities you’d like to focus on, and is integration with other Google apps on the roadmap?
SJ: In the early days, people were asking us internally whether NotebookLM is for students or for authors. We insisted that yes, it is for students, and yes, it is for authors, but it’s not just for them. This is a bigger platform, for anybody who’s working on a project that involves multiple documents, where they have to synthesise information across those documents and make sense of things. We deliberately avoided focusing on one particular user, and decided to try and see this as a bigger thing. But that means that there are just so many different directions we can go.
I personally would love to expand the tools for writing inside of NotebookLM. You can do some really clever writing in the chat, and we do have notes that you can copy and paste in there. But I’d like it to be a kind of notes editor with my sources in the background. With different kinds of studio outputs, like video overviews and audio overviews, you can imagine things such as interactive tutors that help you learn different topics. Maybe we can create a marketplace where people could sell notebooks on different topics. There are just so many things we want to do. The team has grown a lot, so we have more capacity.
In the last couple of months, we shipped a lot of new features such as mind maps, source discovery, and now the mobile apps. But it is hard to prioritise because we just have a lot of ambition. In terms of integrations with other Google apps, it obviously would be nice to be able to jump back and forth between Docs, Drive and NotebookLM. We could do a much better job of it, but a year ago, we were a very small team and most people hadn’t heard of us. So it was hard to get the attention of anybody on the Drive team, for example. Why would they spend time helping this little seven-person startup inside of Google? But now people are answering our calls, so I think you’ll see more NotebookLM features. The connection between Gemini and Notebook will strengthen as well.
I think that people have really embraced this idea of having a tool dedicated to helping you understand things from complex documents, for example. It’s a new thing. Word processors help you organise your words and pages, and convey your ideas visually. But a tool that is really trying to help you understand whatever you’re trying to understand is a new class of software, in a way. Whether you’re a student or a knowledge worker or an author, users seem to get that and see the value. Our slogan right now is ‘understand anything’: whatever it is, throw it in there and we’ll help you understand it.