Office Scripts URL Status Checker for Power Automate
If you have a column of URLs in Excel and want Power Automate to mark each one as active, taken down, or removed, do not point a browser at them. A simple HTTP check from an Office Script does the same job in a fraction of the time and a fraction of the breakage.
- Use an Office Script with fetch(), not Power Automate Desktop browser automation. 200ms per URL vs 5-10 seconds.
- Trigger the script from Power Automate using the 'Run script' action; it reads and writes the Excel sheet directly.
- Status code alone is not enough for news article takedowns: many sites return 200 with a 'this story has been removed' page or redirect to homepage. Add a body keyword check.
- Set a real User-Agent header. Many sites return 403 to default bot agents and you will get false 'unreachable' results.
- Reach for a headless browser (Playwright) only when the 'removed' state renders client-side. PAD is the worst of both worlds for this job.
Someone on r/PowerAutomate asked a version of the question I see weekly: they have a list of URLs in Excel, they want each one checked, and the obvious answer feels like 'open a browser, paste the link, see what loads.' That is the wrong tool. A list of 500 URLs becomes a 50-minute job with browser automation and a 90-second job with HTTP.
This is the script I send people. It runs as an Office Script, gets triggered from Power Automate, and writes the status back to your sheet. The whole thing is one file.
The script
Drop this into Excel under Automate -> New Script. The sheet is expected to have URLs in column A starting on row 2 (row 1 is the header). The status writes into column B.
async function main(workbook: ExcelScript.Workbook) {
  const sheet = workbook.getActiveWorksheet();
  const range = sheet.getUsedRange();
  const data = range.getValues();
  const results: string[][] = [];
  // Phrases editorial teams commonly leave behind on soft-404 pages.
  const REMOVED_PATTERNS = /(removed|retracted|not found|no longer available|page does not exist|article unavailable)/;

  // Row 0 is the header; URLs start in column A, row 2.
  for (let i = 1; i < data.length; i++) {
    const url = data[i][0] as string;
    let status = 'Unreachable';
    try {
      const res = await fetch(url, {
        method: 'GET',
        // Surface redirects instead of silently following them.
        redirect: 'manual',
        headers: { 'User-Agent': 'Mozilla/5.0 (compatible; LinkChecker/1.0)' }
      });
      const code = res.status;
      // Only read the body when it can carry a removal notice.
      const body = code === 200 ? (await res.text()).toLowerCase() : '';
      if (code === 404 || code === 410) status = 'Taken down';
      // Some fetch runtimes report a manually handled redirect as an
      // opaque response with status 0, so treat 0 like a 3xx.
      else if (code === 0 || (code >= 300 && code < 400)) status = 'Likely removed';
      else if (code === 200 && REMOVED_PATTERNS.test(body)) status = 'Likely removed';
      else if (code === 200) status = 'Active';
      else status = `HTTP ${code}`;
    } catch {
      status = 'Unreachable';
    }
    results.push([status]);
  }

  // Write statuses into column B, starting at row 2.
  if (results.length > 0) {
    sheet.getRangeByIndexes(1, 1, results.length, 1).setValues(results);
  }
}
Triggering it from Power Automate
- In Power Automate, create a flow with whatever trigger you want (manual, scheduled, on file change).
- Add the 'Excel Online (Business)' connector and the 'Run script' action.
- Pick the workbook, then pick the script you just saved.
- The script reads the active sheet directly. You do not need to pass the URLs as parameters unless you want to; a parameterized sketch follows.
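If you do want parameters, Office Scripts expose anything declared after `workbook` in `main` as input fields on the 'Run script' action. A minimal sketch, assuming a hypothetical `sheetName` input that falls back to the active sheet:

async function main(workbook: ExcelScript.Workbook, sheetName?: string) {
  // `sheetName` is illustrative; when declared, it appears as an
  // input field on the Power Automate 'Run script' action.
  const sheet = sheetName
    ? workbook.getWorksheet(sheetName)
    : workbook.getActiveWorksheet();
  if (!sheet) {
    throw new Error(`No worksheet named '${sheetName}'`);
  }
  // ...run the same URL-checking loop against `sheet`...
}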
Status logic, in plain language
- 200 with no removal keywords -> Active. The page is live and content looks intact.
- 404 or 410 -> Taken down. The server is explicitly saying the resource is gone.
- 3xx redirect (301, 302, 307, 308) -> Likely removed. News and corporate sites often redirect retracted articles to the homepage or a category page rather than returning a 404. (Some fetch runtimes surface a manually handled redirect as status 0; the script treats that the same way.)
- 200 with 'removed', 'retracted', 'not found', 'no longer available' in the body -> Likely removed. This is the soft 404 case.
- Connection failure or timeout -> Unreachable. Could be the site is down, blocking your request, or the domain is gone.
Why news article takedowns need the body check
If you are checking misinformation links or retracted news articles specifically, status code alone will mislead you. I have looked at a lot of takedowns and the common patterns across major news sites are:
- The article URL stays at 200, but the body is replaced with editorial language: 'This story has been retracted', 'The content you requested is no longer available', 'This article has been removed pending review.'
- The article URL 301-redirects to the section homepage (e.g. /world/, /politics/) so the user lands on something coherent rather than a 404 page.
- The article URL 301-redirects to a corrections page that explains what was wrong with the original story.
- On smaller sites, the URL legitimately returns 404 or 410. Treat this as the cleanest signal you will get.
If you have the original article titles stored somewhere, an even cleaner signal is to fetch the page, parse the <title> tag, and compare. A title that has changed from 'Senator X charged with fraud' to 'Page not found' or 'Section homepage' is a high-confidence removal regardless of status code.
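A minimal sketch of that comparison; the `extractTitle` helper and the substring match are my own construction, not part of the script above:

// Pull the text between <title> tags out of raw HTML.
// A regex is crude but adequate for a yes/no comparison.
function extractTitle(html: string): string {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : '';
}

async function titleStillMatches(url: string, expectedTitle: string): Promise<boolean> {
  const res = await fetch(url, {
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; LinkChecker/1.0)' }
  });
  if (res.status !== 200) return false;
  const liveTitle = extractTitle(await res.text()).toLowerCase();
  // Substring match tolerates suffixes like ' | Site Name'.
  return liveTitle.includes(expectedTitle.toLowerCase());
}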
The User-Agent gotcha
Most production websites return 403 Forbidden to requests with no User-Agent or a default scripting User-Agent. If you skip the header, your status column fills up with false 'Unreachable' entries on perfectly live URLs.
Use any plausible browser UA. The script above sends 'Mozilla/5.0 (compatible; LinkChecker/1.0)', which is honest about what you are doing without tripping bot filters. Some sites still block this; if a specific domain returns 403 reliably, swap to a full Chrome UA string for that one.
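One way to handle those holdouts without changing the default everywhere; the domain in this table is purely illustrative:

// Hypothetical per-domain overrides; everything else gets the honest UA.
const UA_OVERRIDES: { [domain: string]: string } = {
  'strict-example.com':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
};

function userAgentFor(url: string): string {
  const match = url.match(/^https?:\/\/(?:www\.)?([^/:]+)/i);
  const host = match ? match[1].toLowerCase() : '';
  return UA_OVERRIDES[host] ?? 'Mozilla/5.0 (compatible; LinkChecker/1.0)';
}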
When this approach is wrong
HTTP-first works for 90 percent of link checking. Reach for something heavier when:
- The site is a single-page app and the 'removed' state only renders after JavaScript runs. Server-side response is 200 with the same shell HTML for every URL. You need a headless browser. Use Playwright in Python or Node, not Power Automate Desktop.
- The URL requires authentication (intranet docs, paywalled content). You need to either authenticate the request or wire in the cookie. Office Scripts fetch() supports custom headers; getting the cookie there is the hard part.
- You are checking thousands of URLs against the same domain. Add a delay between requests or you will hit rate limits and get throttled (see the sketch after this list). The script above runs sequentially, which is naturally polite. Parallelizing makes it fast but rude.
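A delay helper, assuming your Office Scripts runtime exposes setTimeout; that is worth verifying, since the sandbox omits some browser globals:

// Pause between requests to stay under per-domain rate limits.
// Assumes setTimeout is available in your runtime.
function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Inside the URL loop, after each fetch:
//   await sleep(500); // roughly 2 requests/second per domain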
Why not Power Automate Desktop browser actions?
PAD opens a real browser, navigates, waits for the page to load, looks for an element, then closes. Per URL, that is 5-10 seconds and a chance of breaking every time the target site changes its layout. For a list of 500 URLs you are looking at 40-80 minutes of runtime and a flaky outcome.
HTTP fetch is 100-300 milliseconds per URL, no UI dependency, no fragility. Same list runs in 1-2 minutes.
Use PAD when there is no other option (legacy desktop apps, Windows-only line-of-business tools). For anything that speaks HTTP, talk HTTP.
Where to go from here
If you are running this regularly, three improvements pay for themselves:
- Log the HTTP status code in a third column so you can audit decisions later. 'Likely removed' from a 301 is different from 'Likely removed' from a 200-with-keyword.
- Cache results for 24 hours so reruns over the same list do not re-hit live sites. A timestamp column plus an 'if older than 24h' guard is enough.
- Fan out the work. If you genuinely have thousands of URLs and need them in minutes, replace the for-loop with Promise.all in batches of 10, as sketched below. Watch out for per-domain rate limits.
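A sketch of that batching, where the `check` callback stands in for the fetch-and-classify logic from the main script:

async function checkInBatches(
  urls: string[],
  check: (url: string) => Promise<string>,
  batchSize = 10
): Promise<string[]> {
  const results: string[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // Requests within a batch run concurrently; batches run one
    // after another, capping concurrency at batchSize.
    const statuses = await Promise.all(batch.map(check));
    results.push(...statuses);
  }
  return results;
}

Ten concurrent requests spread across many domains is usually fine; ten against a single domain may not be.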
I build packaged automation kits at WorkflowKits. The 'use a browser when HTTP would do' anti-pattern is the single most common time-sink I see in low-code workflow projects. The fix is almost always one Office Script.