Do you want/like AI bots crawling your Elements-built sites?

Hi Elements Hosters,

I wanted to gauge your opinion on AI bots.

Currently we default to allowing AI bots to crawl your sites. This is mainly because a lot of people are using AI instead of traditional search nowadays to get answers to their questions, and AI wouldn’t be able to reference your site(s) as an authority if they are blocked, thus potentially reducing traffic to your site(s).

I’m wondering if we should switch our default to opt-in instead of opt-out though.

What do y’all think, should we stick with allowing AI bots by default and block them on request, or block AI bots by default and allow them on request?

:thinking:

I’m not pro AI or anti AI but I would prefer to have AI blocked by default and allowed on request.

Jeff

I believe it should be open by default. If you’re selling a product or service, you 100% want AI bots to know about you. If a customer were unaware that AI bots were blocked on their site, it could really screw up the potential of their business.

I think AI Bots should be on by default, with an easy way for users to opt-out.

Are these the same bots that steal photos for uncredited image creation?

Along the same lines, I do not believe any settings will stop AI bots from crawling a website.

No. I’ve noticed that AI hasn’t done so well at summarizing my site on search engines (it often mixes up my site with other, completely unrelated sites). I’m not so sold on the idea, given how it digs up info on my projects or other people’s work.

AI is the new search engine. Block them at your own detriment. If you are not performing well in the AI summaries, I would suggest investigating what the bots want. I’m not going to lay it out, but what works is the same thing as has always worked and a brochure site is not going to cut it anymore for search or AI.

EDIT: I did add some things below. When I went back to look at my previous posts there have been some significant developments in the last 12 months.

@Flash 's post from last summer, SEO vs AI: is all different—but not, is a good read for those who are interested.

Thanks for the feedback everybody, I appreciate it. :heart_hands:

We’ll go ahead and leave the default policy as-is (Allow all AI Bots, Block on customer request), so this will be an opt-out policy (not an opt-in policy).

I’m working on opening up permissions on the managed Cloudflare accounts so customers can block these AI Bots by themselves if they want. In the meantime feel free to let me know if you want to block AI Bots on your site(s) and I can do that for you. :stop_sign: :robot:

Thanks for keeping the default open, Daniel — that lines up with what helps most small site owners right now. Adding some context for anyone reading this thread and wondering what running a website even looks like in the AI generation. Here’s the honest picture from where things sit in May 2026.

@chet Thanks for the mention. It’s funny, even that has changed. This is the current landscape.

The Current Best Practice

Google’s AI Overviews now appear on roughly 48% of tracked searches, up from about 31% a year ago. ChatGPT has around 800 million weekly users asking questions instead of running searches. Google traffic to publishers fell about 33% between November 2024 and November 2025. If your site only gets seen when someone types into Google and clicks a blue link, you’ve already lost ground.

The good news: site visitors who arrive from AI answers convert at much higher rates than typical organic visitors. The AI has pre-qualified them. They show up ready.

Whether to let AI bots in

Two camps, both reasonable:

  • Block AI bots if your content is your livelihood and you don’t want it used for training without payment. Cloudflare now offers a pay-per-crawl option as a middle ground.

  • Allow AI bots if you want to be cited inside ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews. Citations are the new front page.

For most folks here, allowing them in is the right call. Being invisible to AI search in 2026 is close to being invisible, full stop.

What a modern site needs

  1. Clean HTML that loads without JavaScript. AI crawlers do not render JS the way a browser does. Anything hidden behind tabs, accordions, or client-side rendering is invisible to them. Elements is solid here — static HTML output works in your favor.

  2. A working robots.txt that says what you mean. Many default setups block AI bots without the owner knowing. Audit yours.

  3. An llms.txt file at the root of your site. This is a Markdown file with a curated list of your most useful pages. The standard is young, but the cost to add one is near zero.

  4. Schema markup (JSON-LD) for articles, FAQs, how-tos, and products. This helps AI parse what each page is about.

  5. Semantic HTML — real H1s, H2s, H3s, lists, and strong tags. Question-based headings get cited about twice as often as statement headings.

  6. Fresh content. About 65% of AI bot hits go to pages updated within the past year. Anything older than six months without a real refresh tends to fade from citations.

  7. Named authorship and original content. Authors with real credentials and firsthand data get cited far more than anonymous SEO content.

Brochure, blog, or evergreen?

This is the question most people should ask themselves before building.

Brochure sites still have a place. If your goal is a calling card for clients who already know your name — a portfolio, a service page, a phone number, hours, a contact form — a brochure site works. It will get almost no AI visibility, and that’s fine if AI visibility is not your goal. Maintenance is light. Hosting is cheap. The site does its small job well.

Blog sites earn AI citations only when the writing has real substance. Thin SEO blogs full of generic FAQ answers are now what people call “AI slop” — AI can produce that same content faster than a person, so AI engines stopped citing it heavily. Listicles still do well (about 22% of AI Mode citations) when they offer real comparisons and named recommendations.

Evergreen content sites are the strongest play if you have expertise to share. Evergreen used to mean “write it once, walk away.” That model is gone. The new version is a living library: deep, original pieces on durable topics, refreshed every quarter or so, with current data and new examples added. Citations compound, giving the site owner credibility. Pair the evergreen depth with your own point of view (your data, your experience, your name on it) and you build the kind of authority AI engines look for.

A practical recommendation for most Elements users: a small brochure-style core (home, about, contact, services) paired with a modest evergreen content section. Five to ten well-written, well-maintained pieces will do more than fifty thin ones. Quality is the whole game now.

We’re in a season where the old web and the new one sit side by side. People still type into Google. They also ask Claude or ChatGPT and never visit your site at all. Building for both is pretty much required. Besides, building for AI is better for traditional SEO anyway; it’s simply best practice.

Also, there is a technical side to this.

For webmasters and designers, the code makes a real difference. Two sites with the same content but different markup can end up with very different AI visibility.

TL;DR

Semantic HTML + JSON-LD schema + open robots.txt + llms.txt + accurate sitemap. That covers about 80% of the technical work. The rest is content depth, freshness, and named authorship — the human side that no code trick can fake.

The basics every page needs

  • A clear <title> tag. AI uses this to label the page in citations.

  • A meaningful <meta name="description">. Still pulled into snippets and answer cards.

  • <html lang="en"> (or your language). AI uses this to route queries.

  • <meta charset="UTF-8"> and the viewport tag for mobile.

  • A canonical URL: <link rel="canonical" href="...">. This stops duplicate-content confusion.

  • HTTPS sitewide. Bots downgrade trust on plain HTTP.
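If you want to spot-check an exported page against the list above, here is a minimal sketch using only Python's standard library. The names `HeadAudit` and `missing_head_tags` are made up for this example, and it only checks that the tags exist, not that their contents are any good. It does not check HTTPS, since that is a server setting rather than markup.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Records which of the basic page-level tags appear in the HTML."""

    def __init__(self):
        super().__init__()
        self.found = {
            "title": False,
            "description": False,
            "lang": False,
            "charset": False,
            "viewport": False,
            "canonical": False,
        }
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "html" and a.get("lang"):
            self.found["lang"] = True
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = a.get("name", "").lower()
            if "charset" in a:
                self.found["charset"] = True
            if name == "description" and a.get("content", "").strip():
                self.found["description"] = True
            if name == "viewport":
                self.found["viewport"] = True
        elif tag == "link":
            if a.get("rel", "").lower() == "canonical" and a.get("href"):
                self.found["canonical"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # A <title> only counts if it actually contains text.
        if self._in_title and data.strip():
            self.found["title"] = True


def missing_head_tags(html):
    """Return the names of the basic tags that were not found, sorted."""
    audit = HeadAudit()
    audit.feed(html)
    audit.close()
    return sorted(name for name, ok in audit.found.items() if not ok)


page = (
    '<!doctype html><html lang="en"><head>'
    '<meta charset="UTF-8"><title>Hi</title>'
    "</head><body></body></html>"
)
print(missing_head_tags(page))
```

Run it against the HTML of a published page and anything it prints is worth a second look.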

Semantic HTML over div soup

AI parses your page by reading the tags. Real semantic markup helps it find the actual content:

  • One <h1> per page, then <h2> and <h3> in order.

  • <article> for the main content piece on a blog post.

  • <main>, <header>, <footer>, <nav>, <aside> for page regions.

  • <ul> and <ol> for actual lists, never styled divs pretending to be lists.

  • Use <strong> for important text and <em> for emphasis. Bold or italic styling applied only through CSS carries no semantic meaning for crawlers.

  • <time datetime="2026-05-13">May 13, 2026</time> for any date that should be machine-readable.

  • <figure> and <figcaption> for images that have a story.

If your Elements stacks output <div> for everything, the AI sees a wall of soup. Semantic tags tell it where the real content lives.
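The heading rule (one h1, then h2 and h3 in order) is easy to check mechanically. This is a rough sketch with hypothetical names (`HeadingOutline`, `heading_problems`), using Python's built-in HTML parser. It only looks at heading order, so treat a clean result as a starting point, not proof that the markup is semantic.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable problems with the heading outline."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    problems = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected one <h1>, found {h1_count}")
    # Going deeper by more than one level at a time skips part of the outline.
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:
            problems.append(f"<h{prev}> is followed by <h{cur}> (skipped a level)")
    return problems


print(heading_problems("<h1>Classes</h1><h3>Pricing</h3>"))
```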

Schema markup (JSON-LD)

This is the single biggest win after clean HTML. Drop a <script type="application/ld+json"> block in the <head> of each page with the right schema for that page type:

  • Blog posts: Article or BlogPosting with headline, author, datePublished, dateModified, and image.

  • FAQs: FAQPage with each Q/A pair. These get cited heavily.

  • Tutorials: HowTo with steps.

  • Sitewide: Organization or Person schema in the footer or homepage.

  • Local services: LocalBusiness with address and hours.

  • Products: Product with price, availability, and reviews.

Validate with the Schema Markup Validator (validator.schema.org) or Google’s Rich Results Test before publishing.
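For anyone generating pages from a script, here is one way the blog-post case could be produced. It is a sketch, not the only valid shape: the function name `blog_posting_jsonld` and the sample values are invented, and the properties are just the ones listed above for blog posts.

```python
import json


def blog_posting_jsonld(headline, author_name, date_published, date_modified, image_url):
    """Build a JSON-LD <script> block for a blog post (illustrative fields only)."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
        "dateModified": date_modified,
        "image": image_url,
    }
    # Escape "<" so a value containing "</script>" cannot end the block early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(
    blog_posting_jsonld(
        "Glazing basics",
        "Jane Doe",
        "2026-01-10",
        "2026-05-13",
        "https://example.com/glaze.jpg",
    )
)
```

Paste the output into the page head, then run the page through a validator before trusting it.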

Open Graph and Twitter cards

When someone pastes your URL into ChatGPT or Claude, these tags often get read first:

<meta property="og:title" content="Page Title">
<meta property="og:description" content="One-sentence summary">
<meta property="og:image" content="https://...">
<meta property="og:url" content="https://...">
<meta property="og:type" content="article">
<meta name="twitter:card" content="summary_large_image">

Robots and crawler control

Your robots.txt at the site root is where you tell bots what they can do. To allow the main AI crawlers:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

To block any of them, swap Allow: for Disallow:. A few worth knowing:

  • GPTBot — OpenAI’s training crawler

  • OAI-SearchBot — OpenAI’s live search (ChatGPT browsing)

  • ClaudeBot — Anthropic’s training crawler

  • Claude-Web / Claude-User — Anthropic’s live fetch

  • PerplexityBot — Perplexity

  • Google-Extended — Google’s AI training (separate from regular Googlebot)

  • CCBot — Common Crawl, which feeds many models

You can also block specific bots at the Cloudflare level, which is the dashboard work Daniel mentioned in this thread.
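To audit what a robots.txt actually says to these bots, Python ships a parser in the standard library. The sketch below feeds it the file's text and asks about each crawler name from the list above. The helper name `ai_bot_access` and the sample file are made up for the example. One caveat: Python's parser applies rules in file order, and individual crawlers may resolve conflicting rules differently, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the list in this post.
AI_BOTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "CCBot",
]


def ai_bot_access(robots_txt, url="https://example.com/"):
    """Map each AI crawler name to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical robots.txt: block OpenAI's training crawler, allow the rest.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for bot, allowed in ai_bot_access(sample).items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

This only tells you what the file requests. Whether a given bot honors it is up to the bot, which is where the Cloudflare-level block comes in.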

llms.txt

A new convention, simple to add. Place a Markdown file at /llms.txt listing your most useful pages:

# Your Site Name

> One-sentence description of what your site is about.

## Main Content
- [Page Title](https://yoursite.com/page): Brief description.
- [Another Page](https://yoursite.com/other): Brief description.

## About
- [About Us](https://yoursite.com/about): Who runs this site.

No solid evidence yet that crawlers fetch this on their own. The real use case is humans pasting your URL into Claude or ChatGPT — the tool often follows the llms.txt link from there.
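If you keep your page list in a script or spreadsheet, the file can be generated instead of hand-edited. A small sketch that follows the layout shown above; `build_llms_txt` and the sample pages are placeholders, not part of any standard tooling.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt body.

    sections maps a heading to a list of (title, url, description) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}"]
    for heading, pages in sections.items():
        lines += ["", f"## {heading}"]
        for title, url, description in pages:
            lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"


text = build_llms_txt(
    "Your Site Name",
    "One-sentence description of what your site is about.",
    {
        "Main Content": [
            ("Page Title", "https://yoursite.com/page", "Brief description."),
        ],
        "About": [
            ("About Us", "https://yoursite.com/about", "Who runs this site."),
        ],
    },
)
print(text)
```

Save the output as llms.txt and upload it to the site root.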

Sitemap and freshness

  • An XML sitemap at /sitemap.xml.

  • Real <lastmod> dates that reflect when content was actually updated.

  • Submit it to both Google Search Console and Bing Webmaster Tools. Copilot and Meta AI use Bing’s index, and ChatGPT has used Bing in the past.

Bots use the lastmod field to decide what to recrawl. Refreshing dates without changing content is a known anti-pattern and gets pages demoted.
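For hand-rolled sites, a sitemap with honest lastmod values can be built with the standard library. This is a sketch under the assumption that you track each page's real last-edit date yourself; `build_sitemap` and the sample URLs are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as UTF-8 bytes.

    pages is an iterable of (url, last_modified) pairs, where last_modified
    is the date the content really changed, not the date of this export.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap(
    [
        ("https://example.com/", date(2026, 5, 13)),
        ("https://example.com/classes", date(2026, 2, 1)),
    ]
)
print(xml_bytes.decode("utf-8"))
```

Write the bytes to sitemap.xml at the site root, then submit that URL in the two webmaster tools.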

Image handling

  • Real alt text on every meaningful image. AI reads alt text directly.

  • Descriptive file names (pottery-wheel-class-2026.jpg, not IMG_4582.jpg).

  • A clear <figure>/<figcaption> pattern when the caption adds context.
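A quick way to find images with no alt text on an exported page, again as a sketch with made-up names (`ImageAudit`, `images_missing_alt`). It also flags `alt=""`, which is legitimate for purely decorative images, so expect a few false positives.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Collects the src of every <img> whose alt is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        if not (a.get("alt") or "").strip():
            self.missing_alt.append(a.get("src") or "(no src)")


def images_missing_alt(html):
    """Return image sources that have no usable alt text, in page order."""
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.missing_alt


page = (
    '<img src="pottery-wheel-class-2026.jpg" alt="Students at the pottery wheel">'
    '<img src="IMG_4582.jpg">'
)
print(images_missing_alt(page))
```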

Performance basics

Core Web Vitals still count. AI engines treat slow, broken sites as low quality. Fast TTFB, no layout shift, no render-blocking JS above the fold. Elements handles most of this for you if you stick to native stacks.

One pitfall to watch

If you use JavaScript-loaded content — review widgets, dynamic FAQs, lazy-loaded sections that appear after the page renders — AI crawlers usually do not see it. Either render it server-side or accept that those parts are invisible to AI.
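One way to test for this pitfall: take the raw HTML the server sends (view source, not the rendered page) and check whether your key phrases are in it. The sketch below does that with the standard library, skipping script and style contents because text that only exists inside a script is exactly what a non-rendering crawler misses. The names `VisibleText` and `phrases_missing_from_static_html` are invented for the example.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def phrases_missing_from_static_html(html, phrases):
    """Return the phrases that do not appear in the page's static text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace and compare case-insensitively.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in phrases if p.lower() not in text]


raw = (
    "<main><h1>Pottery classes</h1>"
    '<div id="reviews"></div>'
    '<script>render("Great class!")</script></main>'
)
print(phrases_missing_from_static_html(raw, ["Pottery classes", "Great class!"]))
```

Any phrase it reports is content that only exists after JavaScript runs.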

Now go make a website!

Thanks for taking the time to post all that, Flash. Perhaps I should finally make the effort to understand h1 and h2 etc…

| Aspect | Search Bots | AI Training Bots |
| --- | --- | --- |
| Main Use | Indexing for search results | Training/fine-tuning AI models |
| Storage | Metadata & structured index | Large raw text corpora for ML |
| Benefit to Site | Usually drives traffic & visibility | Limited or no direct traffic |
| Output | Search snippets with links | AI-generated responses (summaries, etc.) |

Search bots = “Help people discover my content.”
AI training bots = “Help build/train an AI model with my content.”

“Many publishers are happy for their content to appear in Google Search (traffic benefit) but unhappy about it being used to train competitors’ AI models without compensation or attribution.”

The above is from Grok.