Sitemap and robots in Next.js 15: Indexing Landings Without Hand-Maintained XML
Your Next.js 15 landing scores well on Core Web Vitals, and JSON-LD and Open Graph are wired, yet Search Console still shows “Discovered – currently not indexed” for half your URLs. The content is often fine; crawlers simply lack a reliable URL inventory, or they waste crawl budget on /api routes, preview deployments, and tracking-query duplicates.
A static public/sitemap.xml goes stale after every blog post or CMS case study. A robots.txt with Disallow: / can ship to production by mistake. In the App Router, Next.js 15 generates both files from app/sitemap.ts and app/robots.ts: type-safe, build-integrated, and deployed to Vercel Edge with the rest of the app.
This guide covers a production setup for SEO landings and bilingual sites: environment-aware robots, CMS-driven sitemaps, hreflang entries, Google limits, and Search Console checks.
Why Crawl Hints Still Matter in 2026
Google treats sitemaps as hints for discovery and freshness, and robots.txt as host-level crawl rules. Neither forces indexing, but gaps and mistakes produce repeatable GSC patterns.
| GSC symptom | Common technical cause |
|---|---|
| New URLs slow to appear | No sitemap, weak internal links |
| /api/* or /_next/* indexed | robots missing service paths |
| Staging in the index | open preview + weak protection |
| ?utm= duplicates | canonical helps; crawl noise remains |
For a 5–15 page service landing plus a 20–50 post blog, a dynamic sitemap pays off immediately: one source of truth, lastModified from the CMS, sensible changeFrequency per content type.
robots.txt is not a substitute for page-level noindex. Thank-you pages and CMS drafts need robots: { index: false } in their metadata; robots.txt only fences off paths that should never be fetched.
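As a minimal sketch of the page-level half, a thank-you page could export metadata like this. The file path and title are hypothetical, and in a real project you would likely annotate the object with the `Metadata` type imported from `next`.

```typescript
// app/thank-you/page.tsx (hypothetical path): metadata export only.
// In a Next.js project, annotate this object with `Metadata` from 'next'.
export const metadata = {
  title: 'Thank you',
  robots: {
    index: false, // keep this page out of the index
    follow: false, // assumption: its links need not be followed either
  },
};
```

The page stays fetchable, which is what lets a crawler see the noindex signal in the first place.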
robots.ts: Environments and the Sitemap Directive
app/robots.ts default-exports a function that returns a MetadataRoute.Robots object. Next serves the result at /robots.txt with the correct content type.
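import type { MetadataRoute } from 'next';

// Canonical origin without a trailing slash.
const BASE = process.env.NEXT_PUBLIC_SITE_URL ?? 'https://example.com';

export default function robots(): MetadataRoute.Robots {
  // VERCEL_ENV is set by Vercel to 'production', 'preview', or 'development'.
  const isProduction = process.env.VERCEL_ENV === 'production';

  // Previews and local builds: block all crawling.
  if (!isProduction) {
    return { rules: { userAgent: '*', disallow: '/' } };
  }

  return {
    rules: {
      userAgent: '*',
      allow: '/',
      // Note: /_next/static/ serves the JS and CSS Googlebot uses to render pages;
      // drop '/_next/' if URL Inspection shows broken rendering.
      disallow: ['/api/', '/admin/', '/_next/', '/private/'],
    },
    sitemap: `${BASE}/sitemap.xml`, // absolute URL
    host: BASE.replace(/^https?:\/\//, ''), // bare hostname, protocol stripped
  };
}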
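Any VERCEL_ENV value other than 'production' returns a full disallow, which covers preview deployments. Because VERCEL_ENV is only set on Vercel, a production build on another host would also be fully disallowed unless you switch to an environment variable that host provides. Pair the rule with Vercel Deployment Protection: robots alone won’t stop shared preview links.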
Use an absolute sitemap URL in the Sitemap: directive. Keep /api/ disallowed; sensitive handlers should still return non-HTML responses or auth failures.
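Since an accidental Disallow: / in production is the costliest mistake here, it can help to pull the environment check into a pure function that a unit test can pin down. The sketch below is one way to do that; `buildRobots` and the local `RobotsConfig` type are hypothetical names, and the type is only a stand-in for `MetadataRoute.Robots`.

```typescript
// Minimal local shape; in a Next.js project use MetadataRoute.Robots from 'next'.
type RobotsConfig = {
  rules: { userAgent: string; allow?: string; disallow?: string | string[] };
  sitemap?: string;
};

// `vercelEnv` is the value of process.env.VERCEL_ENV (undefined off Vercel).
// `base` is the canonical origin without a trailing slash.
export function buildRobots(vercelEnv: string | undefined, base: string): RobotsConfig {
  // Anything that is not an explicit production deployment gets a full disallow.
  if (vercelEnv !== 'production') {
    return { rules: { userAgent: '*', disallow: '/' } };
  }
  return {
    rules: {
      userAgent: '*',
      allow: '/',
      disallow: ['/api/', '/admin/', '/_next/', '/private/'],
    },
    sitemap: `${base}/sitemap.xml`,
  };
}
```

In app/robots.ts the default export would then just return `buildRobots(process.env.VERCEL_ENV, BASE)`, and a test can assert both branches without deploying anything.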
sitemap.ts: Static Routes, Blog, and CMS
app/sitemap.ts powers /sitemap.xml. Sites beyond 50,000 URLs have to be split across several sitemap files with generateSitemaps; typical landings need a single array.
import type { MetadataRoute } from 'next';

// Canonical origin without a trailing slash.
const BASE = process.env.NEXT_PUBLIC_SITE_URL ?? 'https://example.com';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  // Hand-listed routes that do not come from the CMS.
  const staticRoutes: MetadataRoute.Sitemap = [
    { url: `${BASE}/`, changeFrequency: 'weekly', priority: 1 },
    { url: `${BASE}/services`, changeFrequency: 'monthly', priority: 0.9 },
  ];

  // getBlogPosts is your own CMS or filesystem helper (not shown here).
  const posts = await getBlogPosts();

  return [
    ...staticRoutes,
    ...posts.map((post) => ({
      url: `${BASE}/blog/${post.slug}`,
      lastModified: post.updatedAt, // real edit date from the CMS
      changeFrequency: 'weekly' as const,
      priority: 0.6,
    })),
  ];
}
Google largely ignores priority, but lastModified matters when prices or case studies change often. Pull dates from the CMS (_updatedAt in Sanity), not new Date() on every build — otherwise you signal a false sitewide refresh.
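One way to enforce that rule is to set lastModified only when the CMS actually supplied a parseable date, and to leave the field out otherwise. The sketch below assumes a hypothetical `CmsPost` record with an ISO `updatedAt` string; `toSitemapEntry` and the local types are illustrative stand-ins for your own mapper and for `MetadataRoute.Sitemap`.

```typescript
// Local stand-in; a real project would use MetadataRoute.Sitemap from 'next'.
type SitemapEntry = { url: string; lastModified?: Date };

// Hypothetical CMS record: `updatedAt` is an ISO string such as Sanity's _updatedAt.
type CmsPost = { slug: string; updatedAt?: string };

export function toSitemapEntry(base: string, post: CmsPost): SitemapEntry {
  const entry: SitemapEntry = { url: `${base}/blog/${post.slug}` };
  const parsed = post.updatedAt ? new Date(post.updatedAt) : undefined;
  // Emit lastModified only for a valid CMS date; never fall back to build time.
  if (parsed && !Number.isNaN(parsed.getTime())) {
    entry.lastModified = parsed;
  }
  return entry;
}
```

An entry without lastModified is still valid, so a missing date costs less than a misleading one.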
Align revalidate with your CMS on-demand flow so sitemap regeneration matches page updates. Cache slug lists with unstable_cache or fetch tags so dev doesn’t hammer the CMS on every /sitemap.xml hit.
For markdown in src/content/blog/en/*.md, read the filesystem at build time with gray-matter — the sitemap stays in sync with Git, no manual export step.
hreflang in the Sitemap
For ru / en prefix routes, wire alternates in each entry:
{
  url: `${BASE}/en/services`,
  alternates: {
    languages: {
      ru: `${BASE}/ru/services`,
      en: `${BASE}/en/services`,
      'x-default': `${BASE}/en/services`,
    },
  },
}
Mirror the same locales in alternates.languages inside generateMetadata. Wrong x-default is a frequent “International targeting” issue in GSC.
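Writing these blocks by hand for every route is where x-default mistakes tend to creep in, so a small generator can help. The following is a sketch that assumes locale-prefixed routes such as /ru/services and /en/services; `localizedEntries` and its types are hypothetical, shaped to match the `alternates.languages` field used above.

```typescript
type Alternates = { languages: Record<string, string> };
type LocalizedEntry = { url: string; alternates: Alternates };

// Builds one sitemap entry per locale, each carrying the full set of alternates.
export function localizedEntries(
  base: string, // canonical origin without a trailing slash
  path: string, // e.g. '/services', or '' for the locale home page
  locales: string[],
  defaultLocale: string, // the locale x-default should point to
): LocalizedEntry[] {
  const languages: Record<string, string> = {};
  for (const locale of locales) {
    languages[locale] = `${base}/${locale}${path}`;
  }
  languages['x-default'] = `${base}/${defaultLocale}${path}`;

  return locales.map((locale) => ({
    url: `${base}/${locale}${path}`,
    alternates: { languages: { ...languages } },
  }));
}
```

Calling `localizedEntries(BASE, '/services', ['ru', 'en'], 'en')` yields two entries whose alternates are identical, which keeps the hreflang annotations reciprocal.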
Omit noindex URLs from the sitemap: thank-you pages, drafts, internal search. Allowing a URL in robots.txt while marking it noindex in metadata is a valid combination; listing that same URL in the sitemap is not.
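If the same flag drives both the page metadata and the sitemap, the two cannot drift apart. A minimal sketch, assuming a hypothetical `PageRecord` shape with a `noindex` flag:

```typescript
// Hypothetical page record; `noindex` is the same flag that sets the page's robots metadata.
type PageRecord = { path: string; noindex?: boolean };

// Returns absolute URLs for indexable pages only.
export function sitemapUrls(base: string, pages: PageRecord[]): string[] {
  return pages
    .filter((page) => !page.noindex)
    .map((page) => `${base}${page.path}`);
}
```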
Vercel Deploy and Search Console
Google caps sitemaps at 50,000 URLs and 50 MB uncompressed. Pick one trailing slash policy in next.config to avoid duplicate URLs in the index and sitemap.
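For sites that do cross the 50,000 URL cap, the split comes down to simple arithmetic. The sketch below shows that arithmetic as two pure helpers: in Next.js, `generateSitemaps` returns an array of `{ id }` objects and the sitemap function receives that `id`. The helper names `sitemapIds` and `urlsForSitemap` are hypothetical.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

// One { id } per sitemap file, the shape generateSitemaps() is expected to return.
export function sitemapIds(totalUrls: number): { id: number }[] {
  const count = Math.max(1, Math.ceil(totalUrls / MAX_URLS_PER_SITEMAP));
  return Array.from({ length: count }, (_, id) => ({ id }));
}

// The slice of the full URL list that belongs to sitemap `id`.
export function urlsForSitemap<T>(all: T[], id: number): T[] {
  const start = id * MAX_URLS_PER_SITEMAP;
  return all.slice(start, start + MAX_URLS_PER_SITEMAP);
}
```

In practice you would page through the CMS by offset instead of loading every URL into memory; the slice here only illustrates the boundaries.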
Post-release checklist:
- Hit /robots.txt and /sitemap.xml: both return 200, and the sitemap is valid XML.
- GSC → Sitemaps → submit → success.
- URL inspection on fresh sitemap entries.
- Pages report: no accidental “Blocked by robots.txt” on money pages.
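The first item on the checklist can be partly automated. The sketch below checks a production robots.txt body for the two mistakes this guide warns about: a blanket disallow and a missing absolute Sitemap directive. `checkRobotsTxt` is a hypothetical helper, and its checks are deliberately narrow.

```typescript
// Returns a list of problems found in a production robots.txt body.
export function checkRobotsTxt(body: string): string[] {
  const lines = body.split(/\r?\n/).map((line) => line.trim());
  const problems: string[] = [];

  // "Disallow: /" on its own blocks the whole site; "Disallow: /api/" does not match.
  if (lines.some((line) => /^disallow:\s*\/$/i.test(line))) {
    problems.push('blanket "Disallow: /" found');
  }
  // The Sitemap directive should carry an absolute URL.
  if (!lines.some((line) => /^sitemap:\s*https?:\/\/\S+/i.test(line))) {
    problems.push('no absolute Sitemap: directive');
  }
  return problems;
}
```

A post-deploy script could fetch the live /robots.txt, pass the response text to this function, and fail the pipeline when the returned array is not empty.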
Fully dynamic product sitemaps need revalidate or short CDN TTL; static builds only refresh sitemap on deploy. A thrown error in sitemap() can break the entire file — worth a build alert.
Need help? Telegram → or vic.kell@ya.ru
FAQ
Is a sitemap necessary for fewer than ten pages?
Not strictly: Google discovers small sites through links. It is still worth adding as insurance for when a blog or campaign landings grow, and a sitemap speeds up reporting of new URLs.
public/sitemap.xml vs sitemap.ts?
Static files rot under CMS and i18n. TypeScript sitemaps are reviewable in Git and regenerate on deploy.
Does Disallow block indexing?
Not fully — external links can still lead to indexed URLs without fetched content. Use metadata noindex to forbid indexing.
Multiple sitemaps in robots?
Point to the index file when you split; child sitemaps are discovered from there.
Core Web Vitals impact?
No direct impact. Keep junk URLs out of the sitemap so crawlers don’t stress preview origins.
Conclusion
robots.ts and sitemap.ts are the low-effort crawl layer every Next.js 15 landing should ship. Robots guards previews and service paths; sitemaps give Google an accurate, hreflang-aware URL list with real lastModified values.
Combined with CMS revalidation, edge deploy, and the metadata work you already did for canonicals and structured data, indexing stops depending on someone remembering to export XML after each release. An hour in Search Console beats weeks waiting for new case studies to surface on their own.