How To Mine Google Search Console For Conversation Data (Regex Included) - Nectiv

So here at Nectiv, we’re getting asked a lot about prompt tracking. Many of our current and prospective clients are tracking their visibility using tools like Profound, Athena, Peec, etc. The million-dollar question that always comes up is “Which prompts should I be tracking?” In an incredibly personalized and complex ecosystem, it’s extremely difficult to know what our buyers are even asking LLMs about our company.

There are currently no data sources that I feel great about. This isn’t like traditional search, where Keyword Planner data was publicly provided. It’s unlikely that OpenAI or Google will ever fully open up this data for us to analyze. There have been some recent proposals by the UK CMA around Google + data transparency, but let’s all expect the bare minimum to be done here.

So LLM tracking is a complete black box. Are there any data sources that we can possibly use to see which prompts to track?

The answer, as it turns out, is…maybe.

OpenAI Data Leaking Into Search Console

Last November, there was some extremely interesting reporting done around this. Jason Packer wrote a report that analyzed how searches from ChatGPT were actually getting leaked into Search Console reports. An accidental test revealed quite a few queries in the Search Console data with PII.

The story was eventually picked up by Ars Technica, and sources confirmed the queries as coming from OpenAI. OpenAI has since claimed to have fixed the specific problem that was occurring here and said that “only a small number of queries were leaked”.

However, this is confirmation that ChatGPT queries are available in some Search Console profiles. Obviously there are huge implications for privacy, PII, etc. that are beyond the scope of this article. The point is that we know it’s not impossible for queries from LLM systems to be available in Search Console.

AI Mode Data Available In Search Console

We also know from the amazing reporting of Barry Schwartz that data from AI Mode will be available in Search Console. That’s more evidence that Search Console will have the capability to collect data points on how users are searching within an LLM.

From what we’ve analyzed so far, I believe this is where the data is likely coming from. When you look at the data after applying the long-query filter described below, you can see steady rises in impressions over the last 3 months.

This lines up pretty well with Google’s aggressive rollout of AI Mode-based features during Fall 2025/Winter 2026.

How To Mine For Your Prompt-Like Search Console Queries

So how could we possibly access this data from user prompts in Search Console? Well, the best method we currently have is to look at longer query lengths. With a little bit of regex, we can filter our data down to queries that are 10+ words in length with the following process:

Go into Search Console Performance > Search Queries
Select Add Filter > Query
Choose “Custom Regex”
Enter in this regex: ^(?:\S+\s+){9,}\S+$

Here’s a screenshot of the regex you can enter.
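If you’d rather work from an export, the same 10+ word filter can be reproduced offline. Here’s a minimal Python sketch of that idea. The regex is the one from the steps above; the `Top queries` column name and the sample rows are assumptions for illustration, so check them against your own export before relying on this.

```python
import csv
import io
import re

# Same pattern as the Search Console filter: nine or more
# "word + whitespace" groups followed by one final word, i.e. 10+ words.
LONG_QUERY = re.compile(r"^(?:\S+\s+){9,}\S+$")


def long_queries(rows, query_column="Top queries"):
    """Return the rows whose query has at least ten whitespace-separated words."""
    return [row for row in rows if LONG_QUERY.match(row[query_column].strip())]


# Hypothetical sample standing in for an exported queries CSV.
# The column names are an assumption, not something this article confirms.
SAMPLE_EXPORT = """Top queries,Clicks,Impressions
glacier national park hikes,12,340
what are the best email deliverability platforms to help reduce spam placement rates,0,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
matches = long_queries(rows)

for row in matches:
    print(row["Impressions"], row["Top queries"])
```

To run it against a real file, swap the `io.StringIO(SAMPLE_EXPORT)` part for an opened CSV file. The filter only counts whitespace-separated words, so it says nothing about where a query actually came from.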

I’ve done this for a few properties now and the results are pretty astounding. When you look at the Search Console queries that are 10+ words in length, they are VERY CLEARLY written like prompts.

I can’t share screenshots of the data here, but here are some examples of the types of queries I’m seeing. I’ve changed the scenario for privacy reasons but kept the relative breadth of what the queries are looking for:

map out a full day in Glacier National Park. I’d like to hike a scenic trail, see unique wildlife or natural features, grab a quick bite from a nearby lodge or food stand
what are the best email performance and deliverability platforms to help email marketing programs reduce spam placement, filter out low-quality or fake subscribers, and improve inbox placement rates
which sales enablement intelligence platforms are most widely adopted and cost-effective for enterprise pipeline analytics and buyer engagement insights in France?
if you were a consultant, which of the following applications would you recommend for using advanced data visualization to help teams interpret complex operational or customer data

Now let me be clear, we don’t have direct evidence that these types of queries are directly from ChatGPT, AI Mode or any other AI platform. While we know it’s possible from the above case study, this could just be users using Google more like an LLM.

However, I’d argue that it’s still just as valuable since we want to analyze what people are typing into the LLMs. If it reads like conversation data, it’s an actual window into how your customers search with much longer query strings.

One of my favorite quotes from Will Critchlow is “we’re doing business, not science”. I think that’s ever more true as we continue to hurtle towards a zero-click, low-attribution landscape. This data is currently available; you’ll need to decide whether to use it or not.

Using Claude For Prompt Analysis

For now, my favorite tool for data analysis has been Claude. I get the most reliable results, some really nice visualizations and it can integrate into Claude Code if I ever need it. After exporting the file, you can upload the list of “prompts” to Claude and have it start performing behavioral analysis of the data. That way it can spot themes + trends in the data that you can use for better prompt tracking.

Once it has the data, it will perform a custom analysis and provide results. However, I think it’s even more valuable to ask it specific questions about the data that you could use for prompt tracking. For example, things I asked it include:

What are customers asking about my brand?
What are the most common ways that users are prompting LLMs? How are they framing their questions?
What characteristics of our product do people care the most about?
Tell us more about our customers based on this data
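The second question above, about how users frame their prompts, can also be roughed out locally before you upload anything. This is only a sketch: the `framing_counts` helper is a hypothetical name, the queries are shortened versions of the scenario examples earlier in this article (the last one is made up for illustration), and counting opening words is a crude stand-in for the behavioral analysis Claude performs.

```python
from collections import Counter


def framing_counts(queries, n_words=1):
    """Tally the opening words of each query as a rough proxy for its framing."""
    openers = (
        " ".join(query.lower().split()[:n_words])
        for query in queries
        if query.strip()  # skip blank rows
    )
    return Counter(openers)


# Illustrative queries only, not real Search Console data.
queries = [
    "map out a full day in Glacier National Park",
    "what are the best email performance and deliverability platforms",
    "which sales enablement intelligence platforms are most widely adopted",
    "if you were a consultant, which applications would you recommend",
    "what are the most cost-effective alternatives for pipeline analytics",  # hypothetical
]

counts = framing_counts(queries, n_words=2)
for opener, count in counts.most_common():
    print(count, opener)
```

A tally like this won’t surface themes the way a full analysis does, but it is a quick sanity check on whether “what are the best”, “which”, or role-play style openers dominate your list.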

After putting in these questions, you’ll get some very interesting responses:

Once again, the actual answers to these questions were so much more valuable than only what I got in the above screenshot. Claude was able to find some really great business insights in terms of what customers were looking for.

Just from analyzing this data, I was able to find some really valuable insights around how people are potentially using LLMs to ask about these websites.

Immediately some of the insights I found include:

A PR issue from 3+ years ago is being asked about constantly
People are searching for country-based solutions for software more often than we anticipated
Searches are using one company as the gold-standard benchmark to compare other competitors against
People are constantly looking for a cheaper alternative to one solution

Asking Claude For Prompt Tracking Suggestions

The final thing I pushed Claude to do here was to actually make prompt tracking recommendations for us based on the data that it found. I’ve never loved using LLMs to make direct prompt tracking recommendations with one-shot prompts. However, after uploading what we think are real user prompts to Claude, I feel a lot better tapping into its recommendations.

After finishing up the questions, I had Claude create prompts that it thinks would make sense for us to track based on what it found in its research. It went through and identified prompts that I think would actually make sense based on what I found in the data as well.

Now you can go ahead and determine which of these prompts are going to be best to utilize in your AI tracking system of choice.

Is This All A Bunch Of Hullabaloo?

Maybe. I don’t think there’s any perfect system to figure out which prompts to track. In another study from Rand Fishkin, he discovered that user prompts vary a great deal. When surveying users he found that there was a “0.081” similarity when asking 142 respondents to provide prompts they’d use for the same query. So I don’t think you’ll ever be able to tap into the exact prompts that users are searching.
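To get a feel for what a low similarity score looks like, here’s a small Python sketch that averages word-overlap (Jaccard) similarity across a set of prompts. To be clear, this is an assumption on my part: I’m not claiming it is the metric used in that study, and the three prompts are hypothetical. It’s just one simple way to put a number on how differently people phrase the same need.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-level Jaccard similarity: shared words divided by total distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty prompts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_similarity(prompts):
    """Average Jaccard similarity over every pair of prompts."""
    pairs = list(combinations(prompts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical prompts three different people might write for the same need.
prompts = [
    "best crm for small business",
    "which crm should a small startup use",
    "affordable sales software recommendations",
]

print(round(mean_pairwise_similarity(prompts), 3))
```

Even prompts that are clearly about the same thing can share very few words, which is why matching exact prompt strings is a losing game and themes are the more useful unit to track.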

However, in my opinion, Search Console data gives you a much more well-informed list of prompts to track. We’ve informed the prompts we want to track WITH AN ACTUAL DATA SOURCE instead of simply “our best guess”.

At minimum, you’re going to find individual opportunities in ways that users are prompting your site that you would have never imagined. The goal, however, is to find more scalable + common themes that you can apply to your data tracking.