jesse_the_k

from someone who's a realist-for-now yet also wants to believe.

Adam Engst on "Can Agentic Web Browsers Count?"

tl;dr: No. Given a readily available data set on a webpage, they can't.

The sweetest and scariest part was his sympathy for Copilot's very anxious inner monologue as it tried to come up with answers while working to a deadline that nobody had created.

When it comes to system prompts, the anxious tone of Copilot’s internal responses suggests a “ship now, apologize later, if you’re caught” system prompt that, if reflected in a real-world workplace, would be problematic. Obviously, AIs don’t have feelings that can be hurt and won’t complain to HR, but such a culture tends to encourage people to cut corners and make poor decisions that compromise quality and customer service. If Copilot is any indication, the same is true for AIs.


(no subject)

Date: 15/11/2025 12:22 am (UTC)
From: merrileemakes
This was an interesting article, thanks for sharing.

I didn't know you could now see the internal monologue of some AIs. It helps to combat the black box problem, but it opens up a whole bunch of new ones. Poor Copilot.

(no subject)

Date: 15/11/2025 02:53 am (UTC)
From: jadelennox

I am absolutely floored by this:

The only problem was that when I checked the Google Sheet against the reported numbers, the results for the Strides of March meet were wrong (196 instead of 173). So I trimmed the data in the spreadsheet to just that meet and used BBEdit to compare it against the actual confirmation list, which is when I discovered that ChatGPT Atlas had gone full cuckoo with the Strides of March registration list. It had replaced numerous people with completely fabricated names and ages and increased the total number of registrants by four.

Since the spreadsheet contents for both the January Jicker and February Flash Dash meets were completely correct, I would guess this is another case of the chatbot losing its context window due to too much data. Had ChatGPT Atlas not hallucinated data in the spreadsheet, it would have received an A.

So: "it turned out that when I double-checked it, it was utterly full of incorrect made-up nonsense, and it's mostly by chance that the results were close to right. But if that hadn't happened, it would have been perfect. Ah, well, B."

Edited Date: 15/11/2025 02:54 am (UTC)

(no subject)

Date: 15/11/2025 09:39 pm (UTC)
From: yourlibrarian
And this also assumes that whoever is working with or checking the AI's work actually knows anything about the task or issue, which, if current students are any indication, they won't.

(no subject)

Date: 15/11/2025 07:28 pm (UTC)
From: davidgillon
When Google searches started adding AI summaries at the top, the first one I saw said something along the lines of "the 2024 budget is $183m, by 2026 this will have increased to $163m".

Page generated Saturday, 4 April 2026 09:58 pm