Core workflows

glass gives an agent a closed loop over a real GUI app. The tools fall into four moves: build, see, interact, and debug.

build

glass_start optionally builds the app, then launches it (sandboxed by default) and captures its logs. Only one session is active at a time, and the backend is chosen per session: x11 or wayland on Linux, windows on a Windows host, or android to drive an app in an emulator.

see

glass_screenshot returns the window image. To check a change cheaply, save a known-good frame with glass_baseline_save, act, then call glass_diff. It returns changed_pct and a bounding box as text, so the check costs no vision tokens.
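To make the shape of a diff result concrete, here is a minimal Python sketch of how a changed percentage and bounding box could be computed from two frames. It is not glass's implementation: the function diff_frames, the bbox key, and its x/y/w/h fields are hypothetical names chosen for illustration, and frames are modeled as plain lists of grayscale values. Only changed_pct comes from the description above.

```python
# Illustrative only: diff_frames and the "bbox" layout are assumptions,
# not the actual glass_diff output format.
Frame = list[list[int]]  # rows of grayscale pixel values


def diff_frames(baseline: Frame, current: Frame) -> dict:
    """Compare two same-sized frames and summarize where they differ."""
    same_shape = len(baseline) == len(current) and all(
        len(a) == len(b) for a, b in zip(baseline, current)
    )
    if not same_shape:
        raise ValueError("frames must have identical dimensions")

    height = len(baseline)
    width = len(baseline[0]) if height else 0

    # Collect the coordinates of every pixel whose value changed.
    changed = [
        (x, y)
        for y in range(height)
        for x in range(width)
        if baseline[y][x] != current[y][x]
    ]
    if not changed:
        return {"changed_pct": 0.0, "bbox": None}

    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return {
        "changed_pct": 100.0 * len(changed) / (width * height),
        # Smallest rectangle that contains every changed pixel.
        "bbox": {
            "x": min(xs),
            "y": min(ys),
            "w": max(xs) - min(xs) + 1,
            "h": max(ys) - min(ys) + 1,
        },
    }


baseline = [[0] * 4 for _ in range(4)]
current = [row[:] for row in baseline]
current[1][2] = 255  # pixel at x=2, y=1
current[2][3] = 255  # pixel at x=3, y=2
result = diff_frames(baseline, current)
print(result)  # changed_pct is 12.5; bbox covers x=2, y=1, w=2, h=2
```

A result like this is a few dozen characters of text, which is why checking a diff is cheaper than sending a second screenshot through a vision model.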

interact

glass_click, glass_type, glass_key, glass_scroll, and glass_drag inject input at window-relative coordinates (0,0 is the window’s top-left). glass_do batches a sequence into one call.
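The window-relative convention can be sketched in a few lines of Python. glass performs this mapping itself; the WindowGeometry type and to_screen helper below are hypothetical and exist only to show what "(0,0) is the window's top-left" means in terms of absolute screen positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowGeometry:
    """Hypothetical description of where a window sits on the screen."""
    left: int    # screen x of the window's top-left corner
    top: int     # screen y of the window's top-left corner
    width: int
    height: int


def to_screen(win: WindowGeometry, x: int, y: int) -> tuple[int, int]:
    """Translate a window-relative point into absolute screen coordinates."""
    return (win.left + x, win.top + y)


win = WindowGeometry(left=100, top=50, width=800, height=600)
origin = to_screen(win, 0, 0)     # the window's top-left corner
button = to_screen(win, 10, 20)   # a point 10 px right, 20 px down
print(origin, button)  # (100, 50) (110, 70)
```

Because coordinates are relative to the window, a click target read from a screenshot stays valid even if the window is moved on screen.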

debug

glass_logs returns the app's stdout and stderr. glass_wait_for_element, glass_wait_for_region, and glass_wait_for_log block until their condition is met. They time out softly: on timeout they return {matched:false}.
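The soft-timeout behavior follows a common polling pattern, sketched below in Python. This is an assumption about the general technique, not glass's code: wait_for, timeout_s, and interval_s are hypothetical names, and only the {matched:false} result on timeout is taken from the description above.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
) -> dict:
    """Poll `condition` until it holds or the deadline passes.

    A timeout is reported as data ({"matched": False}) instead of an
    exception, so the caller can decide what to do next.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return {"matched": True}
        if time.monotonic() >= deadline:
            return {"matched": False}
        time.sleep(interval_s)


# Hypothetical captured log lines standing in for the app's output.
log_lines = ["starting", "listening on port 8080"]

found = wait_for(lambda: any("listening" in line for line in log_lines))
missing = wait_for(
    lambda: any("crashed" in line for line in log_lines),
    timeout_s=0.1,
    interval_s=0.01,
)
print(found, missing)  # {'matched': True} {'matched': False}
```

Returning a result instead of raising lets an agent branch on the outcome, for example by reading the logs or taking a screenshot when the condition never appeared.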

For the complete tool list and semantic (accessibility-tree) addressing, see the glass README.