I tweeted at OSUNLP, and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs and instructions for running it locally: https://github.com/theredsix/abp-online-mind2web-results
giancarlostoro 2 hours ago [-]
Interesting. I wonder if this would help with other projects too. One that comes to mind is ArchiveBox: I don't know if they still have the issue I'm thinking of, but its Chrome instances would eventually (as the meme goes) consume all available RAM. If freezing execution could stop that, this could be useful for more than just AI agents.
theredsix 33 minutes ago [-]
Yeah, I noticed CPU use goes to near zero during the pausing phase. You can also trigger pause via REST/MCP so a script can take advantage of these abilities as well.
agent-browser's biggest selling point is that it's a CLI wrapper around CDP/Puppeteer for context management. It'll have mostly the same pros/cons as CDP on the table.
And what does Opus score with "regular" browser harnesses?