To get a better understanding of the waitAndScreenshot function, let’s take a look at the log of the function in action:
After the page is completely loaded, all interactive elements are highlighted and a screenshot is taken.
export const waitTillHTMLRendered = async (
  page: Page,
  timeout: number = 30000,
  checkOnlyHTMLBody: boolean = false
) => {
  const waitTimeBetweenChecks: number = 1000;
  const maximumChecks: number = timeout / waitTimeBetweenChecks; // assuming the check itself does not take time
  let lastHTMLSize = 0;
  let stableSizeCount = 0;
  const COUNT_THRESHOLD = 3;

  const isSizeStable = (currentSize: number, lastSize: number) => {
    if (currentSize !== lastSize) {
      return false; // still rendering
    } else if (currentSize === lastSize && lastSize === 0) {
      return false; // page remains empty - failed to render
    } else {
      return true; // stable
    }
  };

  for (let i = 0; i < maximumChecks; i++) {
    const html = await page.content();
    const currentHTMLSize = html.length;
    const currentBodyHTMLSize = await page.evaluate(
      () => document.body.innerHTML.length
    );
    const currentSize = checkOnlyHTMLBody
      ? currentBodyHTMLSize
      : currentHTMLSize;

    // logging
    console.log(
      "last: ",
      lastHTMLSize,
      " <> curr: ",
      currentHTMLSize,
      " body html size: ",
      currentBodyHTMLSize
    );

    stableSizeCount = isSizeStable(currentSize, lastHTMLSize)
      ? stableSizeCount + 1
      : 0;
    console.log(`Stable size count: ${stableSizeCount}`);

    // if the HTML size remains the same for 3 consecutive checks (one per second), it assumes the page has finished loading
    if (stableSizeCount >= COUNT_THRESHOLD) {
      console.log("Page rendered fully..");
      break;
    }

    lastHTMLSize = currentSize;
    await page.waitForTimeout(waitTimeBetweenChecks);
  }
};
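To see the stability rule in isolation, here is a minimal sketch that replays a recorded sequence of HTML sizes through the same comparison, without a browser. The helper name `checksUntilRendered` and the sample sizes are assumptions made for illustration; they are not part of the agent's code.

```typescript
// Hypothetical helper (not part of the original agent): replays a list of
// HTML sizes, one per check, and returns the 1-based number of the check at
// which the page would be treated as rendered, or null if it never stabilizes.
const checksUntilRendered = (
  sizes: number[],
  countThreshold: number = 3
): number | null => {
  let lastSize = 0;
  let stableSizeCount = 0;
  let checks = 0;

  for (const currentSize of sizes) {
    checks += 1;
    // Same rule as isSizeStable: the size must be unchanged and non-zero.
    const isStable = currentSize === lastSize && lastSize !== 0;
    stableSizeCount = isStable ? stableSizeCount + 1 : 0;

    if (stableSizeCount >= countThreshold) {
      return checks;
    }
    lastSize = currentSize;
  }
  return null;
};

// Made-up sample: the size grows for two checks, then stops changing.
const sampleSizes = [1200, 5400, 9800, 9800, 9800, 9800];
console.log(checksUntilRendered(sampleSizes)); // 6

// A page that stays empty never counts as rendered.
console.log(checksUntilRendered([0, 0, 0, 0, 0])); // null
```

One consequence of this rule is that even a page that is already complete on the first check is only reported as rendered on the fourth check, so with one check per second the wait adds roughly three seconds per screenshot.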
Step 2 (cont.) - Click Response Flow: The clickNavigationAndScreenshot function
This function clicks on a specific element on the page, waits for the page to load completely, and then takes a screenshot. For the click action, it uses another function called clickOnLink.
export const clickNavigationAndScreenshot = async (
  linkText: string,
  page: Page,
  browser: Browser
) => {
  let imagePath;

  try {
    const navigationPromise = page.waitForNavigation();
    // The Click action
    const clickResponse = await clickOnLink(linkText, page);

    if (!clickResponse) {
      // if the link triggers a navigation on the same page, wait for the page to load completely and then take a screenshot
      await navigationPromise;
      imagePath = await waitAndScreenshot(page);
    } else {
      // if the link opens in a new tab, ignore the navigationPromise as there won't be any navigation
      navigationPromise.catch(() => undefined);
      // switch to the new tab and take a screenshot
      const newPage = await newTabNavigation(clickResponse, page, browser);
      if (newPage === undefined) {
        throw new Error("The new page cannot be opened");
      }
      imagePath = await waitAndScreenshot(newPage);
    }

    return imagePath;
  } catch (err) {
    throw err;
  }
};
The clickOnLink function
This function loops through all the elements with the gpt-link-text attribute (the unique identifier assigned during element annotation) and clicks on the one that matches the link text provided by the LLM.
const clickOnLink = async (linkText: string, page: Page) => {
  try {
    const clickResponse = await page.evaluate(async (linkText) => {
      const isHTMLElement = (element: Element): element is HTMLElement => {
        return element instanceof HTMLElement;
      };

      const elements = document.querySelectorAll("[gpt-link-text]");

      // loop through all elements with `gpt-link-text` attribute
      for (const element of elements) {
        if (!isHTMLElement(element)) {
          continue;
        }

        // find the element that contains the targeted link text
        if (
          element
            .getAttribute("gpt-link-text")
            ?.includes(linkText.trim().toLowerCase())
        ) {
          // This if statement is to handle the case where the link opens in a new tab
          if (element.getAttribute("target") === "_blank") {
            return element.getAttribute("gpt-link-text");
          }

          // highlight and perform the click action
          element.style.backgroundColor = "rgba(255,255,0,0.25)";
          element.click();
          return;
        }
      }

      // only reached if the loop ends without returning
      throw new Error(`Link with text not found: "${linkText}"`);
    }, linkText);

    return clickResponse;
  } catch (err) {
    if (err instanceof Error) {
      throw err;
    }
  }
};
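The matching rule inside clickOnLink can be tried out on plain strings. The sketch below is only an illustration: the helper name `findAnnotatedMatch` and the sample values are assumptions, and the real function works on DOM elements inside the browser context.

```typescript
// Hypothetical helper (not part of the original agent): mirrors the matching
// rule used inside clickOnLink, applied to plain strings instead of DOM nodes.
const findAnnotatedMatch = (
  annotatedTexts: string[],
  linkText: string
): string | undefined => {
  // The LLM's text is trimmed and lower-cased before comparison.
  const needle = linkText.trim().toLowerCase();
  // The first annotated text that contains the needle wins.
  return annotatedTexts.find((text) => text.includes(needle));
};

// Made-up gpt-link-text values, already trimmed and lower-cased,
// which is how the annotation step stores them.
const annotated = ["pricing", "contact sales", "contact us"];

console.log(findAnnotatedMatch(annotated, "  Contact Us ")); // "contact us"
console.log(findAnnotatedMatch(annotated, "contact")); // "contact sales"
console.log(findAnnotatedMatch(annotated, "careers")); // undefined
```

Because the comparison is a substring match and the first hit wins, a short link text such as "contact" can select a different element than intended. The more specific the text returned by the LLM, the more reliable the click.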
Element Annotation Service
Let’s look deeper into the highlightInteractiveElements function that is called inside waitAndScreenshot.
It is a service that annotates the interactive HTML elements for the agent. It highlights elements with a red bounding box and adds unique identifiers to them.
Imagine giving your AI agent a special pair of glasses that lets it see the interactive spots on a website (the buttons, links, and fields) like glowing treasures on a treasure map.
That’s essentially what the highlightInteractiveElements function does. It is like a highlighter for the digital world, sketching red boxes around clickable items and tagging them with digital nametags.
With the annotation, the accuracy of the agent’s interpretation of the image is largely improved. This concept is called Set-of-Mark Prompting.
Here is an example of the annotated screenshot:
There is a research paper discussing the importance of this topic in detail: Set-of-Mark Prompting.
Here’s how it works:
- It starts by removing any old digital nametags (the HTML attribute gpt-link-text) that might confuse our AI.
- Then, it lights up every clickable element it finds with a red outline to help the AI spot where to "click".
- Each interactive element gets a unique nametag. This tag/attribute is used to identify the element that Puppeteer can later interact with.
One key detail to remember when working with Puppeteer, or any other testing framework that programmatically interacts with the web, is that an element with link text may not be visible. Here is a simple example:
<div style="display: none">
  <a href="https://www.example.com">
    <span>Click me</span>
  </a>
</div>
The parent div is hidden, so the link is not visible. This element should be excluded. Recursively checking the parent elements is necessary to ensure the element is visible. See the graph below for the logic:
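The recursive check can also be sketched on plain objects, which makes the logic easy to run outside a browser. The `SketchNode` type and the sample tree are hypothetical stand-ins for DOM elements and their computed styles; the agent's own code calls window.getComputedStyle instead.

```typescript
// Hypothetical stand-in for a DOM element: only the computed style values
// the check needs, plus a link to the parent node.
interface SketchNode {
  display: string;
  visibility: string;
  opacity: string;
  parent: SketchNode | null;
}

// Looks at the node's own style only.
const isStyleVisible = (node: SketchNode): boolean =>
  node.display !== "none" &&
  node.visibility !== "hidden" &&
  node.opacity !== "0";

// Walks from the node up to the root; one hidden ancestor hides the node.
const isVisible = (node: SketchNode): boolean => {
  let current: SketchNode | null = node;
  while (current !== null) {
    if (!isStyleVisible(current)) {
      return false;
    }
    current = current.parent;
  }
  return true;
};

// Same shape as the HTML example: a hidden div wrapping a link and a span.
const visibleStyle = { display: "block", visibility: "visible", opacity: "1" };
const hiddenDiv: SketchNode = { ...visibleStyle, display: "none", parent: null };
const link: SketchNode = { ...visibleStyle, display: "inline", parent: hiddenDiv };
const span: SketchNode = { ...visibleStyle, display: "inline", parent: link };

console.log(isStyleVisible(span)); // true: the span's own style looks fine
console.log(isVisible(span)); // false: the div two levels up is hidden
```

Checking only the element's own style is not enough here, because the display value of a child is its own value (for example inline) even when an ancestor is set to display: none.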
Code implementation of the highlightInteractiveElements function
import { Page } from "puppeteer";

const INTERACTIVE_ELEMENTS = [
  "a",
  "button",
  "input",
  "textarea",
  "[role=button]",
  "[role=treeitem]",
  '[onclick]:not([onclick=""])',
];

/**
 * Reset the unique identifier attribute and remove previously highlighted elements
 * @param page
 */
const resetUniqueIdentifierAttribute = async (page: Page): Promise<void> => {
  await page.evaluate(() => {
    const UNIQUE_IDENTIFIER_ATTRIBUTE = "gpt-link-text";
    const elements = document.querySelectorAll(
      `[${UNIQUE_IDENTIFIER_ATTRIBUTE}]`
    );
    for (const element of elements) {
      element.removeAttribute(UNIQUE_IDENTIFIER_ATTRIBUTE);
    }
  });
};

/**
 * This function annotates all the interactive elements on the page
 * @param page
 */
const annotateAllInteractiveElements = async (page: Page) => {
  // $$eval method runs Array.from(document.querySelectorAll(selector)) within the `page` and passes the result as the first argument to the pageFunction.
  // If no elements match the selector, the first argument to the pageFunction is [].
  await page.$$eval(
    INTERACTIVE_ELEMENTS.join(", "), // the selector can be defined outside the browser context
    // the argument `elements` can be an empty array if no elements match the selector
    function (elements) {
      // any console.log will not be visible in the node terminal
      // instead, it will be visible in the browser console

      // handle empty array
      if (elements.length === 0) {
        throw new Error("No elements found");
      }

      //======================================VALIDATE ELEMENT CAN INTERACT=================================================
      // This run-time check must be defined inside the pageFunction as it is running in the browser context. If defined outside, it will throw an error: "ReferenceError: isHTMLElement is not defined"
      const isHTMLElement = (element: Element): element is HTMLElement => {
        // this assertion is to allow Element to be treated as HTMLElement and has `style` property
        return element instanceof HTMLElement;
      };

      const isElementStyleVisible = (element: Element) => {
        const style = window.getComputedStyle(element);
        return (
          style.display !== "none" &&
          style.visibility !== "hidden" &&
          style.opacity !== "0" &&
          style.width !== "0px" &&
          style.height !== "0px"
        );
      };

      const isElementVisible = (element: Element | undefined | null) => {
        if (element === null || element === undefined) {
          throw new Error("isElementVisible: Element is null or undefined");
        }

        // walk up the tree: a hidden ancestor hides the element as well
        let currentElement: Element | null = element;
        while (currentElement) {
          if (!isElementStyleVisible(currentElement)) {
            return false;
          }
          currentElement = currentElement.parentElement;
        }
        return true;
      };

      //========================================PREPARE UNIQUE IDENTIFIER================================================
      const setUniqueIdentifierBasedOnTextContent = (element: Element) => {
        const UNIQUE_IDENTIFIER_ATTRIBUTE = "gpt-link-text";
        const { textContent, tagName } = element;
        // if the node is a document or doctype, textContent will be null
        if (textContent === null) {
          return;
        }

        element.setAttribute(
          UNIQUE_IDENTIFIER_ATTRIBUTE,
          textContent.trim().toLowerCase()
        );
      };

      //========================================HIGHLIGHT INTERACTIVE ELEMENTS================================================
      for (const element of elements) {
        if (isHTMLElement(element)) {
          // highlight all the interactive elements with a red bounding box
          element.style.outline = "2px solid red";
        }

        // assign a unique identifier to the element
        if (isElementVisible(element)) {
          // set a unique identifier attribute to the element
          // this attribute will be used to identify the element that puppeteer should interact with
          setUniqueIdentifierBasedOnTextContent(element);
        }
      }
    }
  );
};

/**
 * This function highlights all the interactive elements on the page
 * @param page
 */
export const highlightInteractiveElements = async (page: Page) => {
  await resetUniqueIdentifierAttribute(page);
  await annotateAllInteractiveElements(page);
};
In this article, we have gone through the architecture of the AI agent, the code implementation of each step, and some concepts behind the design, such as Set-of-Mark Prompting. The agent is an elegant system that requires careful orchestration of different services to work effectively, and it currently has plenty of issues and limitations. If you have any questions or suggestions, please feel free to reach out to me. I would be happy to discuss this topic further.
Jason Li (Tianyi Li, LinkedIn) is a Full-stack Developer working at Mindset Health in Melbourne, Australia. Jason is passionate about AI, front-end development and space-related technologies.
Selina Li (Selina Li, LinkedIn) is a Principal Data Engineer working at Officeworks in Melbourne, Australia. Selina is passionate about AI/ML, data engineering and investment.
Jason and Selina would love to explore technologies to help people achieve their goals.
Unless otherwise noted, all images are by the authors.