Core workflows
glass gives an agent a closed loop over a real GUI app. The tools fall into four moves.
build
glass_start optionally builds the app, launches it (sandboxed by default), and captures its
logs. Only one session is active at a time, and the backend is chosen per session: x11 or
wayland on Linux, windows on a Windows host, or android to drive an app in an emulator.
see
glass_screenshot returns the window image. To check a change cheaply, save a known-good
frame with glass_baseline_save, act, then call glass_diff. It returns changed_pct and a
bounding box as text, so the check costs no vision tokens.
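To make the diff step concrete, here is a minimal Python sketch of how a changed percentage and a bounding box could be derived from two frames. It is an illustration only, not glass's implementation: the function name diff_frames, the in-memory frame representation, and the bbox field names (x, y, w, h) are assumptions. Only changed_pct comes from the text above.

```python
# Illustrative sketch only; not glass's actual diff algorithm.
# Frames are assumed to be equal-sized grids of RGB tuples (Python 3.9+).
Pixel = tuple[int, int, int]
Frame = list[list[Pixel]]


def diff_frames(baseline: Frame, current: Frame) -> dict:
    """Return the percentage of changed pixels and their bounding box."""
    height = len(baseline)
    width = len(baseline[0]) if height else 0
    same_shape = len(current) == height and all(
        len(b_row) == width and len(c_row) == width
        for b_row, c_row in zip(baseline, current)
    )
    if not same_shape:
        raise ValueError("frames must have identical dimensions")

    # Collect the coordinates of every pixel that differs.
    xs, ys = [], []
    for y, (b_row, c_row) in enumerate(zip(baseline, current)):
        for x, (b_px, c_px) in enumerate(zip(b_row, c_row)):
            if b_px != c_px:
                xs.append(x)
                ys.append(y)

    if not xs:
        return {"changed_pct": 0.0, "bbox": None}

    # Hypothetical bbox layout: top-left corner plus width and height.
    bbox = {
        "x": min(xs),
        "y": min(ys),
        "w": max(xs) - min(xs) + 1,
        "h": max(ys) - min(ys) + 1,
    }
    return {"changed_pct": 100.0 * len(xs) / (width * height), "bbox": bbox}
```

The result is a small dictionary of numbers, which is why a text-only diff of this kind is cheap compared with sending a full screenshot to a vision model.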
interact
glass_click, glass_type, glass_key, glass_scroll, and glass_drag inject input at
window-relative coordinates (0,0 is the window’s top-left). glass_do batches a sequence
into one call.
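As an illustration of what window-relative means, the Python sketch below maps a window-relative point to absolute screen coordinates. The WindowRect type and to_screen helper are hypothetical names introduced for this example. How glass performs the mapping internally is not described here.

```python
# Illustrative sketch only; WindowRect and to_screen are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowRect:
    """Position and size of a window in screen coordinates."""
    x: int
    y: int
    width: int
    height: int


def to_screen(win: WindowRect, rel_x: int, rel_y: int) -> tuple[int, int]:
    """Convert a window-relative point, where (0, 0) is the window's
    top-left corner, to absolute screen coordinates."""
    if not (0 <= rel_x < win.width and 0 <= rel_y < win.height):
        raise ValueError("point lies outside the window")
    return (win.x + rel_x, win.y + rel_y)
```

Because coordinates are relative to the window, the same click position stays valid when the window moves on screen.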
debug
glass_logs returns the app’s stdout/stderr. The glass_wait_for_element,
glass_wait_for_region, and glass_wait_for_log tools block until their condition is met, or
time out softly and return {matched:false}.
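The soft-timeout behaviour can be pictured as a polling loop that reports failure in its return value instead of raising. The Python sketch below is a generic version of that pattern under stated assumptions: wait_for, timeout_s, and poll_s are hypothetical names, and only the matched field comes from the text above.

```python
# Illustrative sketch only; wait_for and its parameters are hypothetical.
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_s: float = 0.1,
) -> dict:
    """Poll `condition` until it holds or the timeout elapses.

    A timeout is reported as {"matched": False} rather than raised.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return {"matched": True}
        if time.monotonic() >= deadline:
            return {"matched": False}
        time.sleep(poll_s)
```

A caller can then branch on the result, for example by fetching logs when matched is false, without wrapping every wait in exception handling.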
For the complete tool list and semantic (accessibility-tree) addressing, see the glass README.