[00:00:00.000 --> 00:00:04.540]   you should have security and privacy first and so on.
[00:00:04.540 --> 00:00:07.940]   So if you read everything into this key,
[00:00:07.940 --> 00:00:10.540]   you can perfectly understand what the purpose
[00:00:10.540 --> 00:00:15.540]   of Mozilla AI is and how it differs from Mozilla Firefox
[00:00:15.540 --> 00:00:18.940]   or Mozilla as a foundation in general.
[00:00:18.940 --> 00:00:23.380]   On the bottom right here, you will find a QR code
[00:00:23.380 --> 00:00:25.820]   that brings you directly to our GitHub repository
[00:00:25.820 --> 00:00:28.620]   where you can see the open source that we are building.
[00:00:28.620 --> 00:00:33.400]   So this is a short parable.
[00:00:33.400 --> 00:00:36.120]   It's a little bit of a story about myself
[00:00:36.120 --> 00:00:37.700]   that starts quite some time ago.
[00:00:37.700 --> 00:00:42.680]   And the first thing I'm gonna show you is my desktop image.
[00:00:42.680 --> 00:00:46.520]   This is a picture drawn by David Devoy
[00:00:46.520 --> 00:00:49.220]   and it's about Ada from Adam Zangeman.
[00:00:49.220 --> 00:00:52.300]   So this is a story by Matthias Kirchner
[00:00:52.300 --> 00:00:54.040]   and Sandra Brandtstatter.
[00:00:54.040 --> 00:00:56.340]   I don't remember which of the two is the illustrator
[00:00:56.340 --> 00:00:58.200]   and which one is the writer, sorry about that.
[00:00:58.200 --> 00:00:59.540]   But you can look for it.
[00:00:59.540 --> 00:01:01.900]   There's a video on YouTube about these
[00:01:01.900 --> 00:01:04.120]   and it's actually linked in the slides.
[00:01:04.120 --> 00:01:08.760]   It's a story of a little girl who likes to tinker
[00:01:08.760 --> 00:01:10.700]   with the hardware and software.
[00:01:10.700 --> 00:01:15.700]   And there's another guy in the story, a man who's a genius.
[00:01:15.700 --> 00:01:21.940]   He is a super smart and he can build any kind of tool
[00:01:21.940 --> 00:01:23.460]   and he builds them for the people.
[00:01:23.460 --> 00:01:25.580]   And of course it makes them commercial too
[00:01:25.580 --> 00:01:28.340]   because he has to be sustainable in his job.
[00:01:28.340 --> 00:01:32.120]   But at the same time, it also applies some of his own twist
[00:01:32.120 --> 00:01:33.560]   to every single tool.
[00:01:33.560 --> 00:01:36.000]   And whether it is because he cares about people,
[00:01:36.000 --> 00:01:39.760]   cares about security, has some biases in making decisions
[00:01:39.760 --> 00:01:41.720]   or likes or dislikes things.
[00:01:41.720 --> 00:01:43.240]   All the things that he builds are things
[00:01:43.240 --> 00:01:45.800]   that cannot be really easily modified.
[00:01:45.800 --> 00:01:49.300]   And what this girl does is taking parts of these things
[00:01:49.300 --> 00:01:51.160]   from the dumpster, putting them together
[00:01:51.160 --> 00:01:54.000]   and building super custom tools.
[00:01:54.000 --> 00:01:57.980]   So as you might guess, I strongly related with this girl.
[00:01:57.980 --> 00:02:00.740]   When I was about her age, 10, 11 year old,
[00:02:00.740 --> 00:02:04.660]   I had my first business car seeing an inventor
[00:02:04.660 --> 00:02:09.180]   and I very much relate and very much want what we do
[00:02:09.180 --> 00:02:12.260]   with AI today to be very similar to that.
[00:02:12.260 --> 00:02:16.460]   Let's fast forward from when I was 10 year old
[00:02:16.460 --> 00:02:18.820]   to about 20 years later.
[00:02:18.820 --> 00:02:21.140]   In 2005, which is still 20 years ago
[00:02:21.140 --> 00:02:22.660]   because I'm not that young,
[00:02:22.660 --> 00:02:24.780]   I was talking about power browsing.
[00:02:24.780 --> 00:02:27.600]   The main idea for me was starting from this metaphor,
[00:02:27.600 --> 00:02:30.460]   like we can look at the reality,
[00:02:30.460 --> 00:02:33.420]   these very nice flowers in different ways.
[00:02:33.420 --> 00:02:36.980]   So let's say we have some eye issue,
[00:02:36.980 --> 00:02:38.600]   we have myopia for instance,
[00:02:38.600 --> 00:02:40.640]   and what we see is kind of blurred.
[00:02:40.640 --> 00:02:43.180]   We can correct it with some kind of lenses
[00:02:43.180 --> 00:02:46.960]   that are built by us after we find
[00:02:46.960 --> 00:02:49.100]   how the system can be improved
[00:02:49.100 --> 00:02:51.300]   and we can actually see the reality as it is.
[00:02:51.300 --> 00:02:53.500]   But we can also improve reality somehow
[00:02:53.500 --> 00:02:56.780]   if there's too much light, we can put sunglasses
[00:02:56.780 --> 00:02:58.500]   and we can see things in focus,
[00:02:58.500 --> 00:03:02.060]   but at the same time, not be blinded by lights.
[00:03:02.060 --> 00:03:04.900]   And I wanted to try apply the same paradigm
[00:03:04.900 --> 00:03:08.540]   to some more technological things.
[00:03:08.540 --> 00:03:11.580]   And at the time, early 2000s,
[00:03:11.580 --> 00:03:16.420]   websites were completely overwhelmed by pop-ups and ads.
[00:03:16.420 --> 00:03:20.460]   And I would say we passed to better times.
[00:03:20.460 --> 00:03:23.580]   Now maybe we have gone back to ads and pop-ups
[00:03:23.580 --> 00:03:25.780]   and things we would like to block.
[00:03:25.780 --> 00:03:29.260]   And my metaphor here was the eye
[00:03:29.260 --> 00:03:30.940]   without a very good eyesight,
[00:03:30.940 --> 00:03:33.780]   was Internet Explorer at the time.
[00:03:33.780 --> 00:03:36.780]   And already 20 years ago, I was a Firefox fan.
[00:03:36.780 --> 00:03:37.940]   And I said, if you install it,
[00:03:37.940 --> 00:03:40.040]   you can already put some ad blocker
[00:03:40.040 --> 00:03:42.580]   and reduce part of the content that you're seeing
[00:03:42.580 --> 00:03:45.740]   to get at least the content you're interested in
[00:03:45.740 --> 00:03:48.840]   more emphasized inside of the page.
[00:03:48.840 --> 00:03:52.100]   And the equivalent of the sunglasses in my case
[00:03:52.100 --> 00:03:54.260]   was having some kind of bot,
[00:03:54.260 --> 00:03:55.660]   some kind of automatic tool
[00:03:55.660 --> 00:04:00.440]   which could just crawl a website, extract its contents
[00:04:00.440 --> 00:04:02.140]   and just make them available
[00:04:02.140 --> 00:04:05.440]   as just the content that you're interested in.
[00:04:05.440 --> 00:04:07.660]   And I developed part of these tools.
[00:04:07.660 --> 00:04:09.120]   If you look for power browsing,
[00:04:09.120 --> 00:04:11.540]   you're gonna find probably a super old Wiki of mine.
[00:04:11.540 --> 00:04:13.900]   It's actually 20 years old.
[00:04:13.900 --> 00:04:17.500]   And some of those tools were built at the time in the Pearl
[00:04:17.500 --> 00:04:21.260]   and they ran on the laptop you see on the picture here.
[00:04:21.260 --> 00:04:25.800]   It was an old laptop, I think a Compaq Armada
[00:04:25.800 --> 00:04:27.900]   that ran inside the drawer
[00:04:27.900 --> 00:04:31.440]   connected to my, at the time, very slow DSL
[00:04:31.440 --> 00:04:36.440]   and always on so I could contact it via a Symbian phone
[00:04:36.440 --> 00:04:39.540]   and just get information that was pre-scraped for me.
[00:04:39.540 --> 00:04:41.740]   So I didn't have to download too much stuff.
[00:04:41.740 --> 00:04:44.200]   So this was a nice experiment.
[00:04:44.200 --> 00:04:47.160]   And I learned a lot while I was tinkering with these things.
[00:04:47.160 --> 00:04:50.580]   And if you fast forward now 20 more years,
[00:04:50.580 --> 00:04:52.300]   it's just last year,
[00:04:52.300 --> 00:04:56.200]   I wanted to retry this power browsing experiment.
[00:04:56.200 --> 00:04:59.520]   And what I did was trying to use a cloud
[00:04:59.520 --> 00:05:02.320]   and do some, let's say crossover
[00:05:02.320 --> 00:05:04.340]   between vibe coding and reversing.
[00:05:04.340 --> 00:05:07.300]   So I wanted to apply the same techniques
[00:05:07.300 --> 00:05:10.800]   I used 20 years before using cloud code.
[00:05:10.800 --> 00:05:12.280]   So the problem was the following.
[00:05:12.280 --> 00:05:17.280]   There was the Italian railways website called Trenitalia
[00:05:17.280 --> 00:05:20.700]   that had train time tables.
[00:05:20.700 --> 00:05:25.700]   And I wanted to be able to read those train time tables
[00:05:25.700 --> 00:05:27.840]   without having to connect to the website,
[00:05:27.840 --> 00:05:30.840]   without having to follow all its plethora of menus
[00:05:30.840 --> 00:05:33.280]   and see potential advertisements.
[00:05:33.280 --> 00:05:35.040]   And when I did it the first time,
[00:05:35.040 --> 00:05:36.680]   again, I had to learn the website.
[00:05:36.680 --> 00:05:38.880]   I had to learn PERM, regular expression,
[00:05:38.880 --> 00:05:42.500]   how to write the crawler and do all from scratch.
[00:05:42.500 --> 00:05:45.400]   This time I just open cloud and I said,
[00:05:45.400 --> 00:05:47.920]   I would like to see how you can help me with that.
[00:05:47.920 --> 00:05:50.600]   And cloud started searching for information.
[00:05:50.600 --> 00:05:54.280]   It found a lot of good data sources.
[00:05:54.280 --> 00:05:56.640]   Actually in these 20 years, you might guess
[00:05:56.640 --> 00:06:00.120]   a lot of people have learned the same things
[00:06:00.120 --> 00:06:02.340]   that I learned in time and wrote documents
[00:06:02.340 --> 00:06:05.960]   and shared how they reverse engineered this API.
[00:06:05.960 --> 00:06:08.760]   And I just had to share a screenshot
[00:06:08.760 --> 00:06:13.320]   from the Firefox network browser to,
[00:06:13.320 --> 00:06:16.700]   and ask which do you think is the JSON file
[00:06:16.700 --> 00:06:18.760]   across all of these things that I have downloaded?
[00:06:18.760 --> 00:06:22.240]   Like where do I have to take information from
[00:06:22.240 --> 00:06:25.220]   to find information about trains?
[00:06:25.220 --> 00:06:29.720]   And I got everything, all the suggestions, all the tools.
[00:06:29.720 --> 00:06:33.280]   I actually asked cloud to build a small UI for me
[00:06:33.280 --> 00:06:36.120]   to automatically see timetables and it worked.
[00:06:36.120 --> 00:06:38.740]   Basically we're just out of the box.
[00:06:38.740 --> 00:06:41.440]   So is reverse engineering dead?
[00:06:41.440 --> 00:06:44.040]   Should we just ask cloud to do things?
[00:06:44.040 --> 00:06:47.320]   Well, it worked, but I noticed a few things
[00:06:47.320 --> 00:06:50.960]   which kind of give me also the motivation
[00:06:50.960 --> 00:06:53.020]   for the class that we're having today.
[00:06:53.020 --> 00:06:56.720]   So, well, first of all, we need to of course,
[00:06:56.720 --> 00:06:58.000]   add some caveats.
[00:06:58.000 --> 00:07:00.600]   This is a task that I had already done in the past.
[00:07:00.600 --> 00:07:03.920]   So of course I think I added some bias here
[00:07:03.920 --> 00:07:06.120]   into how to solve the problem, right?
[00:07:06.120 --> 00:07:08.800]   So I already had slight idea at least
[00:07:08.800 --> 00:07:11.200]   whether the suggestions that cloud was giving me
[00:07:11.200 --> 00:07:12.500]   were correct or not.
[00:07:12.500 --> 00:07:15.680]   So I could tell in advance whether I had to direct it
[00:07:15.680 --> 00:07:17.480]   in one direction or another.
[00:07:17.480 --> 00:07:19.040]   So first question is,
[00:07:19.040 --> 00:07:21.800]   would I have I been able to make sure it worked
[00:07:21.800 --> 00:07:24.720]   if I hadn't done it already in the past?
[00:07:24.720 --> 00:07:26.360]   Then the second thing that I realized
[00:07:26.360 --> 00:07:29.480]   was all the artifacts that were generated by cloud
[00:07:29.480 --> 00:07:31.120]   were on the platform.
[00:07:31.120 --> 00:07:33.320]   So yes, I could download them,
[00:07:33.320 --> 00:07:36.080]   but of course I need to ask cloud to create them first.
[00:07:36.080 --> 00:07:39.360]   If I had just asked cloud to tell me at what time
[00:07:39.360 --> 00:07:40.840]   a given train would have left,
[00:07:40.840 --> 00:07:42.720]   cloud would have been able to do that,
[00:07:42.720 --> 00:07:46.040]   but then I would have been dependent on cloud
[00:07:46.040 --> 00:07:48.400]   for all the subsequent questions, right?
[00:07:48.400 --> 00:07:51.760]   So you need to be explicit in that case
[00:07:51.760 --> 00:07:54.960]   and ask for some artifact that you can bring home
[00:07:54.960 --> 00:07:57.700]   and you can use autonomously the following times.
[00:07:58.760 --> 00:08:01.360]   And then I got very good learning references this time,
[00:08:01.360 --> 00:08:03.480]   which is something I didn't have the first time.
[00:08:03.480 --> 00:08:05.720]   I had to look for everything that I needed.
[00:08:05.720 --> 00:08:09.020]   Well, this time, well, of course I knew them already,
[00:08:09.020 --> 00:08:12.280]   but if I had been only interested in the quick answer,
[00:08:12.280 --> 00:08:14.980]   like what time does this train leave,
[00:08:14.980 --> 00:08:19.000]   I would probably have just skipped these references anyway.
[00:08:19.000 --> 00:08:20.480]   And last but not least,
[00:08:20.480 --> 00:08:23.200]   and I think this is probably the most important thing,
[00:08:23.200 --> 00:08:25.440]   I wrote zero lines of code
[00:08:25.440 --> 00:08:28.180]   and I learned nothing out of this experience,
[00:08:28.180 --> 00:08:31.340]   except, well, maybe prompting cloud.
[00:08:31.340 --> 00:08:33.880]   But if I have to check it out
[00:08:33.880 --> 00:08:37.240]   from what happened like 20 years ago,
[00:08:37.240 --> 00:08:39.520]   the main difference is that 20 years ago,
[00:08:39.520 --> 00:08:42.760]   I learned about HTTP protocol.
[00:08:42.760 --> 00:08:44.320]   I learned about curl.
[00:08:44.320 --> 00:08:46.440]   I learned about regular expressions.
[00:08:46.440 --> 00:08:49.040]   And of course it was not just this single crawler
[00:08:49.040 --> 00:08:50.800]   that I built, but many of them,
[00:08:50.800 --> 00:08:53.720]   but all the knowledge I had accumulated in time
[00:08:53.720 --> 00:08:57.580]   is knowledge that I could reuse all the years after that.
[00:08:57.580 --> 00:09:01.320]   And make those things into a profession.
[00:09:01.320 --> 00:09:03.880]   And I could tell you many times
[00:09:03.880 --> 00:09:07.820]   in which knowing a regular expression helped me afterwards.
[00:09:07.820 --> 00:09:11.720]   And of course, moving forward to today,
[00:09:11.720 --> 00:09:13.920]   I think there are some skills that you could learn
[00:09:13.920 --> 00:09:16.760]   that will be useful in the future.
[00:09:16.760 --> 00:09:19.920]   And they can be related, of course, to agents and cloud.
[00:09:19.920 --> 00:09:22.760]   What I believe is it's probably not just prompting
[00:09:22.760 --> 00:09:24.760]   and part of the contents of this class
[00:09:24.760 --> 00:09:28.560]   are actually equivalent skills to what 20 years ago
[00:09:28.560 --> 00:09:31.460]   were regular expressions, HTTP, and so on,
[00:09:31.460 --> 00:09:35.160]   that I hope you will be able to bring over in the future
[00:09:35.160 --> 00:09:38.400]   to next things you will want to build with AI.
[00:09:38.400 --> 00:09:42.300]   Another note was the following one that I got
[00:09:42.300 --> 00:09:47.080]   when I wrote the blog post about what I did.
[00:09:47.080 --> 00:09:51.500]   I asked Claude to pretend to be
[00:09:54.640 --> 00:09:56.620]   very critical about my post.
[00:09:56.620 --> 00:09:59.640]   And what Claude wrote was you're a technical person
[00:09:59.640 --> 00:10:03.340]   telling not technical people to make their lives harder
[00:10:03.340 --> 00:10:06.500]   to solve problems that mostly exist in your head.
[00:10:06.500 --> 00:10:11.000]   So if it weren't for elements hallucinating
[00:10:11.000 --> 00:10:13.700]   from time to time and me trying to believe that
[00:10:13.700 --> 00:10:15.880]   as much as I could just not to be very offended
[00:10:15.880 --> 00:10:17.200]   about this response,
[00:10:17.200 --> 00:10:21.100]   I would have at least taken this a bit personally.
[00:10:21.100 --> 00:10:24.660]   But I want to share some of the concerns I had
[00:10:24.660 --> 00:10:28.020]   and ask you whether you are also concerned
[00:10:28.020 --> 00:10:29.720]   about these or not.
[00:10:29.720 --> 00:10:33.160]   So are these things just in my head or not?
[00:10:33.160 --> 00:10:35.460]   If I can have thumbs up or thumbs down,
[00:10:35.460 --> 00:10:38.100]   depending on whether you agree with me or not,
[00:10:38.100 --> 00:10:40.220]   these are some of the experiences that I had
[00:10:40.220 --> 00:10:41.660]   in the last, let's say, year,
[00:10:41.660 --> 00:10:44.260]   playing a bit more with LLMs.
[00:10:44.260 --> 00:10:47.860]   So are you concerned about user experience
[00:10:47.860 --> 00:10:49.580]   which changes continuously?
[00:10:50.840 --> 00:10:52.540]   Inconsistent model performance.
[00:10:52.540 --> 00:10:54.380]   So you use the same system.
[00:10:54.380 --> 00:10:55.900]   Sometimes it works greatly.
[00:10:55.900 --> 00:11:00.900]   And sometimes it has a much worse performance than usual.
[00:11:00.900 --> 00:11:02.440]   Changes in pricing.
[00:11:02.440 --> 00:11:05.160]   So something that costed $20 before,
[00:11:05.160 --> 00:11:09.520]   now you can do unless you pay the next monthly payment.
[00:11:09.520 --> 00:11:17.740]   So let's say instead of 20, 50 or $100.
[00:11:17.740 --> 00:11:21.800]   Sustainability, do we always need to use a tool
[00:11:21.800 --> 00:11:25.040]   that can do everything instead of something
[00:11:25.040 --> 00:11:28.340]   that just works ad hoc for our specific problem?
[00:11:28.340 --> 00:11:30.560]   Do we need to contact something
[00:11:30.560 --> 00:11:34.300]   that runs on a huge data center
[00:11:34.300 --> 00:11:36.700]   rather than just contacting our own laptop?
[00:11:36.700 --> 00:11:40.600]   Are you concerned about sharing your personal information
[00:11:40.600 --> 00:11:43.800]   or the fact that you need to be always alive
[00:11:43.800 --> 00:11:45.500]   for things to work?
[00:11:45.500 --> 00:11:47.980]   Are you concerned about ads
[00:11:47.980 --> 00:11:50.280]   or more generally about having lack of control,
[00:11:50.280 --> 00:11:53.540]   like not being in full control of what you're running?
[00:11:53.540 --> 00:11:57.540]   So if you answered yes to at least one
[00:11:57.540 --> 00:12:02.380]   of these many questions, probably this class is for you.
[00:12:02.380 --> 00:12:05.820]   To lack of control, I would like to add the fact
[00:12:05.820 --> 00:12:07.580]   that just yesterday the news came out
[00:12:07.580 --> 00:12:12.020]   that Elon Musk is offering compute to Anthropic.
[00:12:12.020 --> 00:12:14.180]   And I see these as one of the examples
[00:12:14.180 --> 00:12:18.420]   where the dependencies that you have on the technology
[00:12:18.420 --> 00:12:21.380]   and that one technology has over other technologies
[00:12:21.380 --> 00:12:24.300]   and people in the world and different companies
[00:12:24.300 --> 00:12:27.460]   makes these things very unpredictable in the future.
[00:12:27.460 --> 00:12:29.420]   So you can't really tell whether something
[00:12:29.420 --> 00:12:31.740]   is always gonna be there for you.
[00:12:31.740 --> 00:12:34.940]   And once again, I feel much more comfortable knowing
[00:12:34.940 --> 00:12:38.220]   that at least part of the tasks that I want to accomplish,
[00:12:38.220 --> 00:12:41.140]   I can do them 100% under my control.
[00:12:43.900 --> 00:12:45.980]   Short part about philosophy.
[00:12:45.980 --> 00:12:49.260]   The main reason I introduced that is when I studied AI,
[00:12:49.260 --> 00:12:53.960]   my AI professor preached, David, I preached.
[00:12:53.960 --> 00:12:55.060]   Oh, thank you so much.
[00:12:55.060 --> 00:13:00.960]   Marta Somalvico, the professor who taught me AI first
[00:13:00.960 --> 00:13:04.340]   was so much into the philosophy of artificial intelligence.
[00:13:04.340 --> 00:13:07.940]   And while at the time, again, about 20 years ago,
[00:13:07.940 --> 00:13:12.380]   I was not sure I would have reused those learnings in time.
[00:13:12.380 --> 00:13:15.060]   I realized some of them are things that I find
[00:13:15.060 --> 00:13:16.540]   so useful every day.
[00:13:16.540 --> 00:13:18.660]   I apply them almost on a daily basis.
[00:13:18.660 --> 00:13:21.700]   And I wanted to share two things with you.
[00:13:21.700 --> 00:13:23.860]   The first one is one thing he told us,
[00:13:23.860 --> 00:13:26.880]   he repeated this to us almost every single class
[00:13:26.880 --> 00:13:31.040]   of the course that we had, that is the machine is a place.
[00:13:31.040 --> 00:13:34.580]   And what he meant is it's like a physical place
[00:13:34.580 --> 00:13:39.300]   where you are located and you solve a problem,
[00:13:39.300 --> 00:13:42.720]   you perform a task inside this machine.
[00:13:42.720 --> 00:13:45.300]   For us now, it's probably a metaphor.
[00:13:45.300 --> 00:13:47.460]   At the time of the Mechanical Turk,
[00:13:47.460 --> 00:13:49.260]   it was not such a metaphor.
[00:13:49.260 --> 00:13:53.140]   There was actually a person moving this automaton
[00:13:53.140 --> 00:13:56.060]   and playing chess against other people.
[00:13:56.060 --> 00:13:57.540]   But you can still think,
[00:13:57.540 --> 00:14:00.100]   even if just metaphorically right now,
[00:14:00.100 --> 00:14:04.820]   that when you work, when you create an AI agent,
[00:14:04.820 --> 00:14:06.740]   when you run an AI tool,
[00:14:06.740 --> 00:14:09.340]   there always is, if not you, somebody else,
[00:14:09.340 --> 00:14:11.460]   a person inside the system
[00:14:11.460 --> 00:14:13.820]   that is doing these things for you.
[00:14:13.820 --> 00:14:15.980]   They can have more or less autonomy,
[00:14:15.980 --> 00:14:18.440]   but it's very good to understand this principle
[00:14:18.440 --> 00:14:20.820]   and try to apply it also in your favor.
[00:14:20.820 --> 00:14:24.500]   Like, do you want to be the person inside the machine
[00:14:24.500 --> 00:14:27.880]   or do you want something else or somebody else to be there?
[00:14:27.880 --> 00:14:31.860]   This is the first concept and very related to this,
[00:14:31.860 --> 00:14:35.500]   so being enclosed inside some kind of container,
[00:14:35.500 --> 00:14:38.700]   there is John Searle's Chinese room argument,
[00:14:38.700 --> 00:14:42.600]   1980 at the time, so it was a bit later,
[00:14:42.600 --> 00:14:44.780]   but the main idea is the following.
[00:14:44.780 --> 00:14:46.740]   You have a room.
[00:14:46.740 --> 00:14:48.340]   Inside this room, there's a person
[00:14:48.340 --> 00:14:50.580]   who doesn't know the Chinese language,
[00:14:50.580 --> 00:14:53.360]   but they have an index that kind of maps
[00:14:53.360 --> 00:14:58.360]   every possible question to every possible answer in Chinese.
[00:14:58.360 --> 00:15:00.500]   And if they have a very good index
[00:15:00.500 --> 00:15:04.180]   and if they're very good at finding things in this index,
[00:15:04.180 --> 00:15:06.500]   what happens is that you could have a person
[00:15:06.500 --> 00:15:10.020]   outside of this room sending questions in
[00:15:10.020 --> 00:15:13.380]   and this guy checked the answer very quickly,
[00:15:13.380 --> 00:15:16.100]   providing you the answer as an output
[00:15:16.100 --> 00:15:19.880]   and you not realizing whether whatever's inside this room
[00:15:19.880 --> 00:15:22.740]   actually understands the language or not.
[00:15:22.740 --> 00:15:27.860]   And this can be interpreted in different ways.
[00:15:27.860 --> 00:15:30.980]   This can be used to talk about capabilities of a model.
[00:15:30.980 --> 00:15:32.780]   For instance, I don't really care
[00:15:32.780 --> 00:15:36.180]   if this thing is intelligent, if we have AGI,
[00:15:36.180 --> 00:15:40.140]   as long as it does exactly what I need.
[00:15:40.140 --> 00:15:42.540]   But it is also something that's very interesting
[00:15:42.540 --> 00:15:45.980]   from the point of view on like what you think
[00:15:45.980 --> 00:15:48.820]   about what this machine is capable of,
[00:15:48.820 --> 00:15:53.080]   which means can you be betrayed
[00:15:53.080 --> 00:15:54.960]   by just the answers that you get?
[00:15:54.960 --> 00:15:59.960]   Can you be made to think that the machine is better
[00:15:59.960 --> 00:16:01.820]   than what it is actually?
[00:16:01.820 --> 00:16:04.940]   And so I think even if you work with a black box,
[00:16:04.940 --> 00:16:09.640]   having an attitude of like probing this room
[00:16:09.640 --> 00:16:13.940]   to try and see exactly what the understanding is,
[00:16:13.940 --> 00:16:17.500]   if there's any understanding and what limitations you have
[00:16:17.500 --> 00:16:19.340]   is something that could be very useful.
[00:16:19.340 --> 00:16:23.340]   Otherwise you risk to have an approach like this one,
[00:16:23.340 --> 00:16:28.180]   Ronald Reagan in 1983 watched the premiere
[00:16:28.180 --> 00:16:33.180]   of the "War Games" movie and saw these war operation plan
[00:16:33.180 --> 00:16:36.140]   response, this was the AI of the time,
[00:16:36.140 --> 00:16:38.780]   the one that was playing tic-tac-toe
[00:16:38.780 --> 00:16:41.340]   against Matthew Broderick, if I remember well.
[00:16:41.340 --> 00:16:44.780]   And he was so concerned about the possibility
[00:16:44.780 --> 00:16:47.540]   of something like this happening of the security
[00:16:47.540 --> 00:16:52.540]   of military labs being compromised by teenagers
[00:16:52.540 --> 00:16:57.500]   that he was the first president having some ruling,
[00:16:57.500 --> 00:17:00.020]   having some laws about computer security
[00:17:00.020 --> 00:17:01.940]   in the United States.
[00:17:01.940 --> 00:17:04.660]   So this is also a way for us to interpret
[00:17:04.660 --> 00:17:09.660]   whatever comes out as a new AI tool.
[00:17:09.660 --> 00:17:12.960]   For instance, right now, when we talk about comparing
[00:17:12.960 --> 00:17:17.060]   commercial AI services with open source AI models,
[00:17:17.060 --> 00:17:19.100]   what happens very often is the following,
[00:17:19.100 --> 00:17:22.180]   that is you have an LLM, you try to serve it
[00:17:22.180 --> 00:17:25.700]   with a open source tool, let's say,
[00:17:25.700 --> 00:17:30.700]   or Llama or LM Studio or LlamaCPT or our own Llama file.
[00:17:30.700 --> 00:17:34.420]   And what you try to do is you talk with this LLM,
[00:17:34.420 --> 00:17:37.140]   you open a chat system, you try and ask some questions
[00:17:37.140 --> 00:17:39.500]   and you see how it performs for you.
[00:17:39.500 --> 00:17:40.800]   But then you try and compare it
[00:17:40.800 --> 00:17:42.580]   with a commercial AI service,
[00:17:42.580 --> 00:17:46.580]   which at least according to this description by Anthropic,
[00:17:46.580 --> 00:17:49.680]   this is almost two years old already,
[00:17:49.680 --> 00:17:53.980]   is not limited to just a single language model.
[00:17:53.980 --> 00:17:56.380]   So you've already seen how to train a language model,
[00:17:56.380 --> 00:17:59.420]   but what's in these boxes when you chat with them
[00:17:59.420 --> 00:18:03.000]   as services, as systems that you have available offline.
[00:18:03.000 --> 00:18:05.520]   There's still your input, there's still your output,
[00:18:05.520 --> 00:18:09.980]   but you don't just have an LLM, you have a retrieval,
[00:18:09.980 --> 00:18:13.460]   you have extra tools that are called, you have a memory,
[00:18:13.460 --> 00:18:16.060]   and you have a plethora of extra engineering
[00:18:16.060 --> 00:18:19.820]   that makes it a bit unfair to just compare the AI service
[00:18:19.820 --> 00:18:22.140]   with the single LLM.
[00:18:22.140 --> 00:18:24.620]   So one of the things that we try to do today
[00:18:24.620 --> 00:18:28.500]   is take an open source LLM or any kind of LLM,
[00:18:28.500 --> 00:18:30.800]   because you can actually switch the one you use
[00:18:30.800 --> 00:18:32.680]   with any system you want,
[00:18:32.680 --> 00:18:35.940]   and add to it all the different components
[00:18:35.940 --> 00:18:39.260]   that make it into something that's way more comparable
[00:18:39.260 --> 00:18:40.560]   to a commercial system.
[00:18:40.560 --> 00:18:43.680]   So this is it for my introduction.
[00:18:43.680 --> 00:18:45.300]   Now it's David's turn.
[00:18:45.300 --> 00:18:48.060]   I'm gonna stop sharing and leave the work to him.
[00:18:48.060 --> 00:18:49.880]   - Thank you.
[00:18:49.880 --> 00:18:50.720]   - Okay.
[00:18:50.720 --> 00:19:04.600]   Can you see the screen?
[00:19:04.600 --> 00:19:09.080]   Yes, all right.
[00:19:09.080 --> 00:19:14.080]   So the idea is we want to show
[00:19:14.080 --> 00:19:16.320]   how you can build your own agent.
[00:19:18.040 --> 00:19:21.800]   Hopefully you should leave this talk
[00:19:21.800 --> 00:19:24.400]   with the impression that it's not that complicated.
[00:19:24.400 --> 00:19:28.440]   All these different tools that you use,
[00:19:28.440 --> 00:19:32.120]   you understand better what they are doing under the hoods.
[00:19:32.120 --> 00:19:36.560]   So the idea that I want to share
[00:19:36.560 --> 00:19:38.760]   is that every agent that you use
[00:19:38.760 --> 00:19:43.760]   is based on around five core components, five things.
[00:19:43.760 --> 00:19:47.860]   Even so, if from the outside, they look really different,
[00:19:47.860 --> 00:19:50.880]   they are all based around these core components.
[00:19:50.880 --> 00:19:55.700]   Where I come from trying to share this
[00:19:55.700 --> 00:19:58.080]   is we have been working in Mozilla AI
[00:19:58.080 --> 00:19:59.920]   on a couple of projects.
[00:19:59.920 --> 00:20:01.660]   One of them was AnyAgent.
[00:20:01.660 --> 00:20:04.140]   So trying to build a single interface
[00:20:04.140 --> 00:20:05.720]   for different agent frameworks
[00:20:05.720 --> 00:20:07.980]   that were popping up last year.
[00:20:07.980 --> 00:20:11.000]   There was like OpenAI agent framework.
[00:20:11.000 --> 00:20:14.020]   There was the Google agent SDK,
[00:20:14.020 --> 00:20:18.600]   LanChain, Ahno, Llama Index, so a lot of frameworks,
[00:20:18.600 --> 00:20:20.080]   agentic frameworks were coming up,
[00:20:20.080 --> 00:20:22.240]   open-source agentic frameworks.
[00:20:22.240 --> 00:20:24.740]   And we just tried to build an interface on top
[00:20:24.740 --> 00:20:26.880]   to discover what was common across them
[00:20:26.880 --> 00:20:28.340]   and what was different.
[00:20:28.340 --> 00:20:32.080]   Turns out that there was not much difference between them
[00:20:32.080 --> 00:20:36.200]   to the point that we built a very simple
[00:20:36.200 --> 00:20:38.780]   Python implementation that was called TinyAgent,
[00:20:38.780 --> 00:20:42.680]   based on an idea by the Haging phase.
[00:20:43.840 --> 00:20:47.640]   CTO, I think, he made a side project one weekend
[00:20:47.640 --> 00:20:49.800]   that was like a TinyAgent, a simple agent
[00:20:49.800 --> 00:20:54.800]   that was just in TypeScript and just use MCP to call tools.
[00:20:54.800 --> 00:20:58.560]   And it was like 400 lines of TypeScript.
[00:20:58.560 --> 00:21:01.440]   So we took that idea, we implemented it in Python,
[00:21:01.440 --> 00:21:04.560]   and then we started experimenting ourselves in our products.
[00:21:04.560 --> 00:21:07.760]   And it turned out that something as simple
[00:21:07.760 --> 00:21:12.300]   as 400 lines of Python was more than enough
[00:21:12.300 --> 00:21:14.560]   to build an agentic system around it
[00:21:14.560 --> 00:21:16.680]   if you have the right components.
[00:21:16.680 --> 00:21:18.860]   Then another project that I did recently
[00:21:18.860 --> 00:21:22.640]   was like porting this idea to C++.
[00:21:22.640 --> 00:21:27.640]   This is AgentCPP, it's an even simpler agent loop
[00:21:27.640 --> 00:21:29.120]   with core components,
[00:21:29.120 --> 00:21:31.680]   and we will use it later for the example.
[00:21:31.680 --> 00:21:35.840]   So the thesis is that an agent is a loop.
[00:21:35.840 --> 00:21:38.560]   This is the most straightforward implementation
[00:21:38.560 --> 00:21:39.660]   that I can think of.
[00:21:41.780 --> 00:21:44.800]   You have a first phase where you are sending inputs
[00:21:44.800 --> 00:21:46.800]   to the model.
[00:21:46.800 --> 00:21:51.400]   This model is a large language model or a language model.
[00:21:51.400 --> 00:21:54.200]   It can be small if you run it locally,
[00:21:54.200 --> 00:21:55.320]   but it works the same.
[00:21:55.320 --> 00:21:59.840]   You send it a text or potentially other type of inputs,
[00:21:59.840 --> 00:22:01.320]   and it generates text.
[00:22:01.320 --> 00:22:03.920]   Based on the response,
[00:22:03.920 --> 00:22:07.600]   you check whether the agent wants to execute
[00:22:07.600 --> 00:22:10.480]   any of the tools, and then you need to take control.
[00:22:10.480 --> 00:22:13.080]   So your program needs to takes control,
[00:22:13.080 --> 00:22:15.820]   execute the tools, and put the results back
[00:22:15.820 --> 00:22:18.160]   into the input for the model.
[00:22:18.160 --> 00:22:22.120]   You just repeat this loop until the model decides
[00:22:22.120 --> 00:22:26.160]   to not call any tools, so you consider the loop broken.
[00:22:26.160 --> 00:22:28.560]   And this is when it stops.
[00:22:28.560 --> 00:22:29.700]   And this is very simple,
[00:22:29.700 --> 00:22:31.320]   but this is actually what's happening
[00:22:31.320 --> 00:22:32.820]   when you're using cloud code,
[00:22:32.820 --> 00:22:37.060]   when you're using any of the agents
[00:22:37.060 --> 00:22:40.440]   that you might be using for coding or other tasks.
[00:22:40.440 --> 00:22:44.320]   So the five components are basically the model
[00:22:44.320 --> 00:22:45.880]   that we already mentioned.
[00:22:45.880 --> 00:22:47.160]   This is the brain.
[00:22:47.160 --> 00:22:51.400]   It takes the inputs and it outputs other text
[00:22:51.400 --> 00:22:54.380]   or responses or potentially tool calls.
[00:22:54.380 --> 00:22:55.920]   Then there are the tools,
[00:22:55.920 --> 00:22:59.680]   which is actually the functions that you expose the agent
[00:22:59.680 --> 00:23:01.880]   so it can perform actions.
[00:23:01.880 --> 00:23:04.440]   So depending on what tools you give the agent,
[00:23:04.440 --> 00:23:07.120]   the agent might be able to do different things.
[00:23:07.120 --> 00:23:10.640]   The instructions that are what defines
[00:23:10.640 --> 00:23:14.160]   how the agent should behave, how the tools should be used.
[00:23:14.160 --> 00:23:18.280]   You can complicate each of these components
[00:23:18.280 --> 00:23:21.040]   as much as you want, as we will see,
[00:23:21.040 --> 00:23:24.160]   but at the core, they are the same.
[00:23:24.160 --> 00:23:28.380]   They are just part of the input that you give the agent
[00:23:28.380 --> 00:23:32.080]   that takes hopefully higher priority
[00:23:32.080 --> 00:23:33.520]   than the rest of the messages,
[00:23:33.520 --> 00:23:37.560]   because it's defining how the agent should behave.
[00:23:37.560 --> 00:23:40.320]   Then there are callbacks, also called hooks,
[00:23:40.320 --> 00:23:45.320]   which are like, it's a way of inject deterministic code
[00:23:45.320 --> 00:23:49.080]   at different stages of the loop.
[00:23:49.080 --> 00:23:51.560]   We will see later what are these different stages,
[00:23:51.560 --> 00:23:56.120]   but this is just a way of like have a intervention
[00:23:56.120 --> 00:23:59.120]   in a deterministic way at different stages of the loop.
[00:23:59.120 --> 00:24:02.380]   And finally, there is the loop itself.
[00:24:03.380 --> 00:24:07.700]   So a model is like anything that implements a method
[00:24:07.700 --> 00:24:09.860]   that receives an input,
[00:24:09.860 --> 00:24:12.620]   which is usually formatted as a list of messages.
[00:24:12.620 --> 00:24:16.960]   That is like a user message, the assistant message,
[00:24:16.960 --> 00:24:20.220]   which is the response from the model, the tool results.
[00:24:20.220 --> 00:24:22.800]   So all of these are usually just a list of things
[00:24:22.800 --> 00:24:24.720]   that you give to the model.
[00:24:24.720 --> 00:24:27.100]   The model then needs to take that
[00:24:27.100 --> 00:24:29.220]   and convert it into tokens.
[00:24:29.220 --> 00:24:31.360]   Each model does this differently.
[00:24:31.360 --> 00:24:33.520]   And it doesn't matter whether you're running the model
[00:24:33.520 --> 00:24:36.660]   locally or whether you're running it through an API,
[00:24:36.660 --> 00:24:41.180]   like if you are using the OpenAI, Anthropic or Gemini API,
[00:24:41.180 --> 00:24:43.560]   it's, you can wrap it with the same interface,
[00:24:43.560 --> 00:24:44.840]   which is the same.
[00:24:44.840 --> 00:24:49.840]   You receive a list of messages and a list of tools available
[00:24:49.840 --> 00:24:54.300]   and you give that to the core language model
[00:24:54.300 --> 00:24:56.540]   and the model decides what to do with those inputs,
[00:24:56.540 --> 00:24:58.080]   what to generate from that.
[00:24:58.800 --> 00:25:03.800]   Then the tools themselves basically need to have three things.
[00:25:03.800 --> 00:25:07.100]   One is the identifier.
[00:25:07.100 --> 00:25:11.360]   So when the agent says, when the model responds saying,
[00:25:11.360 --> 00:25:14.000]   I want to call the tool, get weather,
[00:25:14.000 --> 00:25:17.200]   you need to have a way to identify that this is the tool
[00:25:17.200 --> 00:25:19.160]   that the agent is calling.
[00:25:19.160 --> 00:25:21.680]   Then you need a description and a schema.
[00:25:21.680 --> 00:25:24.280]   So this is something that you provide the agent,
[00:25:24.280 --> 00:25:28.000]   the model to, in order for the model to understand
[00:25:28.000 --> 00:25:30.360]   how to use the tool, what the tool can do
[00:25:30.360 --> 00:25:33.280]   and what are the arguments, what are the expected outputs.
[00:25:33.280 --> 00:25:35.680]   This is something that you can consider
[00:25:35.680 --> 00:25:37.500]   part of the instructions.
[00:25:37.500 --> 00:25:41.060]   And they basically guide the agent on deciding
[00:25:41.060 --> 00:25:43.820]   when to use the tool and how to use this tool.
[00:25:43.820 --> 00:25:46.040]   And finally, you need to actually have some code
[00:25:46.040 --> 00:25:48.480]   that implements the tool itself.
[00:25:48.480 --> 00:25:52.940]   So the agent says, get weather for Paris.
[00:25:52.940 --> 00:25:55.600]   And you actually need some code somewhere
[00:25:55.600 --> 00:25:58.980]   that takes that function call or tool call
[00:25:58.980 --> 00:26:03.580]   and actually executes, find the results
[00:26:03.580 --> 00:26:05.880]   and returns it to the model.
[00:26:05.880 --> 00:26:09.600]   And usually, not usually, almost always,
[00:26:09.600 --> 00:26:12.680]   this code lives outside the model.
[00:26:12.680 --> 00:26:14.440]   So it's important to know that the model
[00:26:14.440 --> 00:26:17.000]   can directly code your code.
[00:26:17.000 --> 00:26:19.760]   How this usually works is that the model
[00:26:19.760 --> 00:26:22.280]   emits an special JSON format.
[00:26:22.280 --> 00:26:24.640]   Each model has their own format.
[00:26:24.640 --> 00:26:26.400]   Unfortunately, they don't agree.
[00:26:26.400 --> 00:26:29.080]   They don't use the same chat template.
[00:26:29.080 --> 00:26:32.560]   But it's basically a JSON where the model says the tool
[00:26:32.560 --> 00:26:36.000]   identifier, the arguments, and then your code
[00:26:36.000 --> 00:26:41.040]   needs to take care of receiving that, applying the function,
[00:26:41.040 --> 00:26:43.800]   calling the function, and then giving the result back
[00:26:43.800 --> 00:26:45.200]   to the model.
[00:26:45.200 --> 00:26:46.780]   So that's tools.
[00:26:46.780 --> 00:26:49.960]   And then we have instructions.
[00:26:49.960 --> 00:26:51.880]   You can consider kind of the simplest one
[00:26:51.880 --> 00:26:56.640]   because it's usually what people refer to as system prompt.
[00:26:56.640 --> 00:26:58.120]   I like instructions.
[00:26:58.120 --> 00:27:00.360]   It's a simple concept to me.
[00:27:00.360 --> 00:27:04.920]   It's just you tell the agent how it should behave,
[00:27:04.920 --> 00:27:06.920]   how it should use the tools.
[00:27:06.920 --> 00:27:10.700]   And it's usually the very first message
[00:27:10.700 --> 00:27:14.840]   that the agent receives before any other user messages.
[00:27:14.840 --> 00:27:17.560]   And this is a very simple primitive.
[00:27:17.560 --> 00:27:21.120]   But you can complicate it as much as you want
[00:27:21.120 --> 00:27:27.120]   because there are new paradigms like the skills,
[00:27:27.120 --> 00:27:29.040]   just these things that keep coming up.
[00:27:29.040 --> 00:27:31.320]   But essentially, they are just instructions.
[00:27:31.320 --> 00:27:35.280]   They are just trying to find smarter ways of loading
[00:27:35.280 --> 00:27:39.200]   these instructions into the agent, into the model context,
[00:27:39.200 --> 00:27:40.800]   without overwhelming it.
[00:27:40.800 --> 00:27:43.160]   But at the end of the day, they are instructions.
[00:27:43.160 --> 00:27:45.960]   There are something-- there are strings
[00:27:45.960 --> 00:27:51.100]   that help the model decide what to do and how to do it.
[00:27:51.100 --> 00:27:56.560]   So callbacks-- interesting part of what makes a good agent
[00:27:56.560 --> 00:27:59.320]   loop or not.
[00:27:59.320 --> 00:28:02.680]   This is where-- so usually in the loop,
[00:28:02.680 --> 00:28:07.080]   it's up to the model to decide when to stop
[00:28:07.080 --> 00:28:09.560]   and when to call a tool.
[00:28:09.560 --> 00:28:12.820]   But because you own the loop in code,
[00:28:12.820 --> 00:28:16.320]   you can actually inject callbacks or hooks
[00:28:16.320 --> 00:28:18.480]   at different points of the loop.
[00:28:18.480 --> 00:28:22.740]   So you can execute code based on certain conditions.
[00:28:22.740 --> 00:28:24.580]   So you can always execute some code
[00:28:24.580 --> 00:28:28.800]   before the agent loop starts, before the model gets actually
[00:28:28.800 --> 00:28:32.020]   called, after the model gets actually called,
[00:28:32.020 --> 00:28:35.220]   before a tool gets executed, after a tool gets executed,
[00:28:35.220 --> 00:28:40.060]   and finally, after the loops is complete.
[00:28:40.060 --> 00:28:41.220]   Sorry.
[00:28:41.220 --> 00:28:43.800]   The important part of this is that you
[00:28:43.800 --> 00:28:49.600]   should be able to mutate, alterate the inputs that
[00:28:49.600 --> 00:28:56.940]   get flowing that go to the model on the next iteration.
[00:28:56.940 --> 00:28:59.640]   So later, I'm going to show a couple of examples of what you
[00:28:59.640 --> 00:29:02.280]   can do with callbacks like here.
[00:29:02.280 --> 00:29:06.080]   So with this basic primitive-- and this is just six points
[00:29:06.080 --> 00:29:07.280]   where you can put--
[00:29:07.280 --> 00:29:09.440]   you can implement basic logging.
[00:29:09.440 --> 00:29:12.240]   So just logging what the agent is doing.
[00:29:12.240 --> 00:29:15.880]   You can implement more elaborated telemetry tracing
[00:29:15.880 --> 00:29:17.200]   with open telemetry.
[00:29:17.200 --> 00:29:19.640]   For this, you just need to inject a callback
[00:29:19.640 --> 00:29:23.280]   before and after the tool calls and the model calls
[00:29:23.280 --> 00:29:26.720]   and just send this to a server in a specific format.
[00:29:26.720 --> 00:29:29.000]   You can do context engineering.
[00:29:29.000 --> 00:29:32.520]   You can have some code that checks
[00:29:32.520 --> 00:29:34.880]   if you are excelling a number of tokens
[00:29:34.880 --> 00:29:38.080]   or if the history is getting too long.
[00:29:38.080 --> 00:29:40.200]   And you can summarize it and replace
[00:29:40.200 --> 00:29:41.920]   the history with your summary.
[00:29:41.920 --> 00:29:46.200]   You can do this with callbacks.
[00:29:46.200 --> 00:29:47.920]   You can implement gas rails.
[00:29:47.920 --> 00:29:51.440]   So for example, checking before any tool call
[00:29:51.440 --> 00:29:54.880]   if the agent is trying to do something suspicious, something
[00:29:54.880 --> 00:29:57.040]   that could be dangerous.
[00:29:57.040 --> 00:29:59.480]   And you can do a deterministic check
[00:29:59.480 --> 00:30:02.520]   and prevent the agent from executing that tool.
[00:30:02.520 --> 00:30:05.800]   You can put human in the loop to approve tool calls
[00:30:05.800 --> 00:30:08.000]   to edit the tool arguments.
[00:30:08.000 --> 00:30:12.640]   And you can also decide how to recover from a failure
[00:30:12.640 --> 00:30:14.280]   after a tool execution.
[00:30:14.280 --> 00:30:17.840]   So all those things, you can do that with callbacks.
[00:30:17.840 --> 00:30:19.520]   Finally, we have the loop.
[00:30:19.520 --> 00:30:21.520]   I have here a pseudocode.
[00:30:21.520 --> 00:30:23.480]   But if you write it in Python, it's
[00:30:23.480 --> 00:30:26.320]   going to look very much similar to this.
[00:30:26.320 --> 00:30:30.800]   So it's exactly the same loop as I described before,
[00:30:30.800 --> 00:30:34.440]   but now kind of like trying to make explicit
[00:30:34.440 --> 00:30:39.120]   at which points of the loop each callback should be called
[00:30:39.120 --> 00:30:42.320]   and how you should stop the loop if there are no tool calls.
[00:30:42.320 --> 00:30:45.760]   And if not, you just execute all the tool calls,
[00:30:45.760 --> 00:30:49.640]   append the results, and go back to the beginning.
[00:30:49.640 --> 00:30:51.440]   So that's the loop.
[00:30:51.440 --> 00:30:56.800]   And the loop can get a little bit more complicated,
[00:30:56.800 --> 00:31:03.120]   like if today you use copilot or cloth code.
[00:31:03.120 --> 00:31:06.440]   It's all still using these basic components,
[00:31:06.440 --> 00:31:09.720]   but there are different ways that you
[00:31:09.720 --> 00:31:13.200]   can alter the default loop.
[00:31:13.200 --> 00:31:16.960]   For example, before a tool gets executed,
[00:31:16.960 --> 00:31:20.120]   the users can approve or reject that.
[00:31:20.120 --> 00:31:23.240]   And based on how you implement your loop,
[00:31:23.240 --> 00:31:25.760]   you can decide to completely stop the loop,
[00:31:25.760 --> 00:31:30.000]   or you can decide to continue and just skip this tool call.
[00:31:30.000 --> 00:31:33.520]   Also, when a tool execution fails,
[00:31:33.520 --> 00:31:37.560]   either because the agent passed a wrong argument
[00:31:37.560 --> 00:31:40.000]   or just there is another error, you
[00:31:40.000 --> 00:31:42.560]   can also decide I want to stop the loop,
[00:31:42.560 --> 00:31:45.680]   or maybe I want to give the agent the opportunity
[00:31:45.680 --> 00:31:46.600]   to recover.
[00:31:46.600 --> 00:31:50.520]   So I just inject the error as a message,
[00:31:50.520 --> 00:31:52.680]   and I just initiate a new iteration.
[00:31:52.680 --> 00:31:55.520]   So hopefully, the agent in the same loop
[00:31:55.520 --> 00:32:01.080]   can read the error and recover itself from it.
[00:32:01.080 --> 00:32:08.600]   And this is something that each of the agentic frameworks that
[00:32:08.600 --> 00:32:11.800]   are there try to do differently.
[00:32:11.800 --> 00:32:14.680]   I recommend that you try to build your own agent loop
[00:32:14.680 --> 00:32:18.520]   and play yourself with this.
[00:32:18.520 --> 00:32:20.360]   It cannot be a better solution than the one
[00:32:20.360 --> 00:32:22.640]   that you built specifically for yourself.
[00:32:22.640 --> 00:32:26.840]   So this is an example simplified from agent CPP.
[00:32:26.840 --> 00:32:29.080]   There are a bunch of examples there.
[00:32:29.080 --> 00:32:34.400]   And the code, even if it's C++, it should not be that hard.
[00:32:34.400 --> 00:32:38.120]   But you can directly map these five components very clearly.
[00:32:38.120 --> 00:32:40.720]   You can see what are instructions, what is a model,
[00:32:40.720 --> 00:32:42.960]   what are tools, and what are callbacks,
[00:32:42.960 --> 00:32:47.160]   and how the loop works.
[00:32:47.160 --> 00:32:50.640]   And you can do that exercise to implement
[00:32:50.640 --> 00:32:54.160]   you can do that exercise to many different open source
[00:32:54.160 --> 00:32:56.120]   projects that are out there.
[00:32:56.120 --> 00:33:02.400]   You can go there and try to find these primitives in their loops.
[00:33:02.400 --> 00:33:07.280]   So even if you check Watson Agents, a project by Davide,
[00:33:07.280 --> 00:33:09.680]   which is a very simple loop, it still
[00:33:09.680 --> 00:33:15.880]   has the same core components as Pi, Hermes, OpenClaw,
[00:33:15.880 --> 00:33:17.280]   and Cloud Code.
[00:33:17.280 --> 00:33:20.000]   That was open source only by accident recently.
[00:33:20.000 --> 00:33:22.960]   But you can verify that it's still the same.
[00:33:22.960 --> 00:33:26.120]   Still the five core components, they all look the same.
[00:33:26.120 --> 00:33:29.480]   It just depends how many callbacks they produce,
[00:33:29.480 --> 00:33:32.960]   they introduce by default, how many tools they expose,
[00:33:32.960 --> 00:33:34.440]   how those tools work.
[00:33:34.440 --> 00:33:38.000]   But the core components remain the same.
[00:33:38.000 --> 00:33:43.640]   And now let me try to share a terminal for a demo.
[00:33:43.640 --> 00:33:49.000]   You can find the demo in the agent CPP repo.
[00:33:49.000 --> 00:33:53.320]   I'm going to try to run this one very quickly on my machine
[00:33:53.320 --> 00:33:56.760]   using a local model just to showcase
[00:33:56.760 --> 00:33:58.920]   how simple you can start.
[00:33:58.920 --> 00:34:02.240]   But from that very simple foundations,
[00:34:02.240 --> 00:34:06.640]   you can evolve to a much larger and complex projects.
[00:34:06.640 --> 00:34:08.760]   So this is a memory example.
[00:34:08.760 --> 00:34:12.440]   So a lot of these agentic frameworks
[00:34:12.440 --> 00:34:14.400]   introduce the concept of memory.
[00:34:14.400 --> 00:34:17.960]   In practice, memory can be implemented as a tool,
[00:34:17.960 --> 00:34:20.600]   as a callback, or as both.
[00:34:20.600 --> 00:34:23.320]   In this example, they are just three tools.
[00:34:23.320 --> 00:34:25.640]   So we have some very basic instructions
[00:34:25.640 --> 00:34:29.320]   where we tell the agent or the assistant
[00:34:29.320 --> 00:34:32.440]   that there is some memory available about the user
[00:34:32.440 --> 00:34:34.640]   and how it can be used.
[00:34:34.640 --> 00:34:38.880]   And we expose to the agent three very simple tools.
[00:34:38.880 --> 00:34:40.320]   They can list memories.
[00:34:40.320 --> 00:34:44.040]   So check what is currently available in the memory.
[00:34:44.040 --> 00:34:47.440]   They can read the memory, and they can write to the memory.
[00:34:47.440 --> 00:34:51.720]   So those are three tools, instructions, no callbacks,
[00:34:51.720 --> 00:34:52.920]   just the loop.
[00:34:52.920 --> 00:34:56.080]   And I'm going to try to share now my terminal.
[00:34:56.080 --> 00:35:08.760]   So I am here at the examples memory inside the agent CPP
[00:35:08.760 --> 00:35:09.600]   repo.
[00:35:09.600 --> 00:35:12.760]   I have already compiled this example.
[00:35:12.760 --> 00:35:15.160]   And if I run it, I am just loading
[00:35:15.160 --> 00:35:20.040]   a small open-waste model, Granite 4.0.
[00:35:20.040 --> 00:35:21.960]   This is already a couple of months old,
[00:35:21.960 --> 00:35:23.800]   so it's not the best model out there.
[00:35:23.800 --> 00:35:26.760]   But it runs really fast on my laptop.
[00:35:26.760 --> 00:35:29.440]   So the memory agent is ready.
[00:35:29.440 --> 00:35:34.800]   I can tell it to just some fact about me.
[00:35:34.800 --> 00:35:36.640]   So first, I just say hi.
[00:35:36.640 --> 00:35:39.960]   And you can see here we are logging that immediately
[00:35:39.960 --> 00:35:43.120]   the agent is checking if there is something
[00:35:43.120 --> 00:35:45.000]   important in the memory.
[00:35:45.000 --> 00:35:49.240]   Using the list memory, there is nothing in the memory
[00:35:49.240 --> 00:35:51.000]   right now.
[00:35:51.000 --> 00:35:53.440]   You can see the tool result here.
[00:35:53.440 --> 00:35:56.360]   And the agent just responds hello.
[00:35:56.360 --> 00:35:58.000]   It looks like I don't have any memories.
[00:35:58.000 --> 00:36:00.000]   How can I assist you today?
[00:36:00.000 --> 00:36:05.600]   And I can say I really love Galician cuisine,
[00:36:05.600 --> 00:36:08.400]   for example, which is where I am from.
[00:36:08.400 --> 00:36:10.680]   And you can see the agent takes this information.
[00:36:10.680 --> 00:36:16.080]   And in the loop, it says, OK, let's call the write memory
[00:36:16.080 --> 00:36:20.600]   tool with the arguments fabric cuisine and Galician.
[00:36:20.600 --> 00:36:22.800]   So I am storing that information.
[00:36:22.800 --> 00:36:25.520]   And then the agent responds and breaks the loop.
[00:36:25.520 --> 00:36:28.280]   So each time I can write, it's because the agent
[00:36:28.280 --> 00:36:30.880]   broke the loop.
[00:36:30.880 --> 00:36:34.800]   And the thing about memory is that the idea
[00:36:34.800 --> 00:36:39.760]   is that I am storing this in an external source.
[00:36:39.760 --> 00:36:42.800]   In this case, it's just a very simple JSON, right?
[00:36:42.800 --> 00:36:47.000]   But you can use a database or an external server or whatever.
[00:36:47.000 --> 00:36:51.000]   But now if I go ahead and start a new conversation,
[00:36:51.000 --> 00:36:55.120]   so a complete different chat, I come back here tomorrow.
[00:36:55.120 --> 00:36:58.400]   And I am like, hi, what do you know about me?
[00:36:58.400 --> 00:37:03.880]   So the agent called the list memory.
[00:37:03.880 --> 00:37:15.640]   And I think the model is a little bit not super smart
[00:37:15.640 --> 00:37:20.800]   because it should have decided to read the memory.
[00:37:20.800 --> 00:37:24.080]   But now I give it a hint.
[00:37:24.080 --> 00:37:27.920]   So first, it check if there was available memories.
[00:37:27.920 --> 00:37:30.680]   They were available memories, but it didn't really
[00:37:30.680 --> 00:37:33.560]   understood that it can use another tool to actually read
[00:37:33.560 --> 00:37:34.640]   the memory.
[00:37:34.640 --> 00:37:37.200]   Now it did because I give it a hint.
[00:37:37.200 --> 00:37:42.160]   And it can retrieve that my favorite cuisine is Galician.
[00:37:42.160 --> 00:37:45.880]   So this is a very basic example.
[00:37:45.880 --> 00:37:49.320]   You can check the code here.
[00:37:49.320 --> 00:37:51.040]   And it's not super crazy.
[00:37:51.040 --> 00:37:52.280]   It's just the fine tools.
[00:37:52.280 --> 00:37:55.800]   And maybe it looks a little bit scary because it's C++.
[00:37:55.800 --> 00:38:02.760]   But you should be able to map and identify the five core
[00:38:02.760 --> 00:38:05.280]   components that I described before.
[00:38:05.280 --> 00:38:09.760]   And what I like about building basic examples
[00:38:09.760 --> 00:38:13.920]   with these core components is that then these more complex
[00:38:13.920 --> 00:38:17.160]   tools or agents that you are using, like cloud code
[00:38:17.160 --> 00:38:20.400]   or whatever, you can then start to understand
[00:38:20.400 --> 00:38:24.680]   how the memory feature is working on cloud code.
[00:38:24.680 --> 00:38:26.440]   How does it work?
[00:38:26.440 --> 00:38:29.200]   You can start thinking how you could improve this memory
[00:38:29.200 --> 00:38:30.360]   implementation.
[00:38:30.360 --> 00:38:33.360]   Like right now, it's all based on tools.
[00:38:33.360 --> 00:38:35.400]   But you could maybe have a callback
[00:38:35.400 --> 00:38:38.440]   that before the agent loop starts,
[00:38:38.440 --> 00:38:42.080]   it just automatically reads whatever memory is available
[00:38:42.080 --> 00:38:45.040]   and injects it into the context.
[00:38:45.040 --> 00:38:49.440]   So you can start exploring how to build your own agent loop
[00:38:49.440 --> 00:38:54.040]   the way that works the best for you, always around those five
[00:38:54.040 --> 00:38:55.840]   components.
[00:38:55.840 --> 00:39:03.800]   And with that, I think it was the final slide from my side.
[00:39:03.800 --> 00:39:05.840]   And I can go back to David.
[00:39:05.840 --> 00:39:10.960]   Sure I am.
[00:39:10.960 --> 00:39:11.920]   I didn't go away.
[00:39:11.920 --> 00:39:13.920]   Yeah, thanks David.
[00:39:13.920 --> 00:39:15.560]   First of all, I want to ask you all
[00:39:15.560 --> 00:39:18.360]   if you have any specific questions to this.
[00:39:18.360 --> 00:39:20.040]   Otherwise, we can go forward.
[00:39:20.040 --> 00:39:23.160]   In the meantime, you can add them anytime on Slack
[00:39:23.160 --> 00:39:24.120]   if you want.
[00:39:24.120 --> 00:39:27.520]   And David can answer them to you while I continue.
[00:39:27.520 --> 00:39:29.080]   Don't worry about this.
[00:39:29.080 --> 00:39:30.960]   I'm going to share my screen again
[00:39:30.960 --> 00:39:36.600]   and go to the next stage here.
[00:39:36.600 --> 00:39:37.640]   Know your tools.
[00:39:37.640 --> 00:39:43.160]   So we have seen one example, one example which was in CPP
[00:39:43.160 --> 00:39:46.960]   and relied on agent CPP, the tool that David built.
[00:39:46.960 --> 00:39:50.840]   He also referred to another tool that
[00:39:50.840 --> 00:39:53.760]   was called Wasm browser agents.
[00:39:53.760 --> 00:39:57.040]   And I'm going to let you know a little more about that.
[00:39:57.040 --> 00:39:59.560]   But the thing I want you to stress the most
[00:39:59.560 --> 00:40:01.520]   is one thing that David said before,
[00:40:01.520 --> 00:40:04.960]   which I believe is super important, which is it's not
[00:40:04.960 --> 00:40:08.200]   necessarily important to know how to solve these
[00:40:08.200 --> 00:40:10.480]   in one language or another.
[00:40:10.480 --> 00:40:13.360]   I think especially now that you can
[00:40:13.360 --> 00:40:15.760]   have Cloud Code choose one language or another
[00:40:15.760 --> 00:40:18.440]   and help you with whatever you're doing right now.
[00:40:18.440 --> 00:40:20.000]   I think the most important thing is
[00:40:20.000 --> 00:40:22.320]   to know what are the bricks that allow
[00:40:22.320 --> 00:40:25.800]   you to build these components.
[00:40:25.800 --> 00:40:30.000]   So I'm going to move from one tool to the other,
[00:40:30.000 --> 00:40:32.040]   from one language to another.
[00:40:32.040 --> 00:40:34.200]   I hope this is not too confusing for you.
[00:40:34.200 --> 00:40:36.480]   Please stop me anytime if you think there's
[00:40:36.480 --> 00:40:38.880]   something that is not clear.
[00:40:38.880 --> 00:40:41.000]   But I'm going to show you something that's been written
[00:40:41.000 --> 00:40:44.360]   in Python that has some JavaScript that
[00:40:44.360 --> 00:40:46.200]   runs in browsers.
[00:40:46.200 --> 00:40:48.280]   I'm also going to show you other code that
[00:40:48.280 --> 00:40:50.560]   runs directly on the terminal.
[00:40:50.560 --> 00:40:52.640]   And I'm going to show you one of the tools
[00:40:52.640 --> 00:40:56.240]   that we referred to before that was Py, which is basically
[00:40:56.240 --> 00:40:59.080]   an agentic software that just works
[00:40:59.080 --> 00:41:01.760]   with extensions and plugins.
[00:41:01.760 --> 00:41:05.480]   All of these to try and find the same components
[00:41:05.480 --> 00:41:09.200]   that David has talked about before.
[00:41:09.200 --> 00:41:14.480]   So the first thing, even before knowing the tools per se,
[00:41:14.480 --> 00:41:19.920]   is getting an experience with one agent that runs.
[00:41:19.920 --> 00:41:25.480]   And this one is called Wasma Agents.
[00:41:25.480 --> 00:41:27.200]   It's a blueprint by Mozilla.
[00:41:27.200 --> 00:41:29.160]   The URL is over here.
[00:41:29.160 --> 00:41:34.560]   Let me just copy that and put it in here in Slack.
[00:41:34.560 --> 00:41:40.560]   So the reason why I built this tool in the first place,
[00:41:40.560 --> 00:41:43.320]   it was about one year ago, or one year and a couple of months
[00:41:43.320 --> 00:41:44.000]   ago, I guess.
[00:41:44.000 --> 00:41:47.280]   And the idea was I would like to have something
[00:41:47.280 --> 00:41:50.240]   that runs everywhere.
[00:41:50.240 --> 00:41:52.080]   And by everywhere, it means everywhere there
[00:41:52.080 --> 00:41:54.560]   is a browser that supports it.
[00:41:54.560 --> 00:42:00.680]   And that relies on JavaScript and Wasma WebAssembly
[00:42:00.680 --> 00:42:03.080]   to run things without having the need
[00:42:03.080 --> 00:42:05.520]   to install anything specific.
[00:42:05.520 --> 00:42:07.000]   So at the time, I kind of failed,
[00:42:07.000 --> 00:42:11.080]   meaning there was no good support for models directly
[00:42:11.080 --> 00:42:13.160]   running in the web browser.
[00:42:13.160 --> 00:42:18.280]   But if you go and look for WebGPU on Hiking Face,
[00:42:18.280 --> 00:42:22.000]   you're going to find some excellent examples of Wasm
[00:42:22.000 --> 00:42:26.520]   and WebGPU used together to run models locally
[00:42:26.520 --> 00:42:28.240]   inside your browser without having
[00:42:28.240 --> 00:42:29.720]   to install any application.
[00:42:29.720 --> 00:42:32.520]   And I don't recall the link right now.
[00:42:32.520 --> 00:42:34.760]   I'm going to check for it later and send that
[00:42:34.760 --> 00:42:36.440]   to you in the Slack channel.
[00:42:36.440 --> 00:42:38.920]   I think it's going to be useful for you.
[00:42:38.920 --> 00:42:41.360]   So this is super simple code.
[00:42:41.360 --> 00:42:44.880]   There are many demos source files.
[00:42:44.880 --> 00:42:46.720]   This one is called local model.
[00:42:46.720 --> 00:42:51.200]   And you're going to find it in the Blueprint repository.
[00:42:51.200 --> 00:42:53.280]   And it has a few tools.
[00:42:53.280 --> 00:42:56.000]   One of them is count character or currencies,
[00:42:56.000 --> 00:43:00.000]   which is a super simple few lines Python code that
[00:43:00.000 --> 00:43:03.080]   counts the currencies of a given character inside the word.
[00:43:03.080 --> 00:43:08.840]   And if you have ever read about how many Rs are in strawberry,
[00:43:08.840 --> 00:43:12.880]   you know exactly why this thing has been built for.
[00:43:12.880 --> 00:43:16.880]   The second one is a tool that is called Visit Webpage.
[00:43:16.880 --> 00:43:21.080]   And the idea is it allows you to basically visit the web page,
[00:43:21.080 --> 00:43:24.520]   download it as Markdown, converting it to Markdown,
[00:43:24.520 --> 00:43:27.160]   and feeding it to the LLM.
[00:43:27.160 --> 00:43:30.680]   The final one relies on a service called Tabili.
[00:43:30.680 --> 00:43:32.480]   It's a paid API.
[00:43:32.480 --> 00:43:36.800]   It has, I think, a relatively large,
[00:43:36.800 --> 00:43:39.560]   I would say, just for the simple experiments,
[00:43:39.560 --> 00:43:42.000]   a number of calls that you can do without having
[00:43:42.000 --> 00:43:42.880]   to pay anything.
[00:43:42.880 --> 00:43:46.600]   It still asks you for your credit card just to unlock it.
[00:43:46.600 --> 00:43:51.240]   So I'm going to show you a few examples that use this.
[00:43:51.240 --> 00:43:54.160]   After you run it the first time, there's
[00:43:54.160 --> 00:43:56.280]   a possibility of setting up the environment, which
[00:43:56.280 --> 00:43:58.680]   is all loaded into the browser.
[00:43:58.680 --> 00:44:02.880]   And then you can configure your local LLM server access.
[00:44:02.880 --> 00:44:06.120]   So this works out of the box with the Ollama, MM Studio,
[00:44:06.120 --> 00:44:08.280]   and Summer Camp is kind of a custom configuration
[00:44:08.280 --> 00:44:10.720]   for a workshop that I did previously.
[00:44:10.720 --> 00:44:12.960]   But you can also make it work with the LLama file.
[00:44:12.960 --> 00:44:18.360]   And these are the parameters to connect to your own LLama file.
[00:44:18.360 --> 00:44:23.440]   So if you don't know what LLama file is, it's LLama file.
[00:44:23.440 --> 00:44:26.280]   I'm pretty sure, yes, that's in my history
[00:44:26.280 --> 00:44:29.200]   because I'm working on this every single day of my life
[00:44:29.200 --> 00:44:30.120]   lately.
[00:44:30.120 --> 00:44:33.160]   And LLama file is a way to distribute and run
[00:44:33.160 --> 00:44:34.920]   LLMs with a single file.
[00:44:34.920 --> 00:44:36.800]   So if you have had a chance already
[00:44:36.800 --> 00:44:42.480]   to play with LLama CPT or, again, LM Studio or LLama,
[00:44:42.480 --> 00:44:45.120]   you probably already have an idea about what it means
[00:44:45.120 --> 00:44:47.760]   to serve a model locally.
[00:44:47.760 --> 00:44:50.120]   If you have not tried it yet, you
[00:44:50.120 --> 00:44:53.920]   can either try one of those services or those applications.
[00:44:53.920 --> 00:44:55.560]   Sorry, they're not services.
[00:44:55.560 --> 00:44:59.680]   Or you can try LLama file, which is our own interpretation
[00:44:59.680 --> 00:45:02.200]   of how a local LLM should work.
[00:45:02.200 --> 00:45:07.400]   And to be fair, this relies 95% on LLama CPT.
[00:45:07.400 --> 00:45:09.240]   But the main idea is that we wanted
[00:45:09.240 --> 00:45:12.360]   to have a tool that would work with the lowest
[00:45:12.360 --> 00:45:14.240]   friction possible for users.
[00:45:14.240 --> 00:45:16.480]   So this tool is a single file.
[00:45:16.480 --> 00:45:20.840]   You can download it from Hugging Face, Hugging Face,
[00:45:20.840 --> 00:45:27.240]   Mozilla AI, LLama file 0.10.0 is the latest URL.
[00:45:27.240 --> 00:45:30.720]   And I'm going to share the link immediately in case any of you
[00:45:30.720 --> 00:45:35.480]   wants to try and download this on the fly.
[00:45:35.480 --> 00:45:39.000]   On the main page, you can find a huge variety
[00:45:39.000 --> 00:45:41.600]   of LLama files from the smallest one, which
[00:45:41.600 --> 00:45:46.560]   is a 1.7 billion model with 1-bit quantization, which
[00:45:46.560 --> 00:45:48.280]   is 292 megabytes.
[00:45:48.280 --> 00:45:50.200]   It's not going to be great for everything,
[00:45:50.200 --> 00:45:53.120]   but it's funny to have because it's sunny and runs basically
[00:45:53.120 --> 00:45:54.240]   everywhere.
[00:45:54.240 --> 00:46:00.440]   Up to gemma 4, 31 billion, 24 gigabytes file to download.
[00:46:00.440 --> 00:46:05.000]   And this requires a bit more powerful hardware.
[00:46:05.000 --> 00:46:07.960]   All of these files are just simple executables
[00:46:07.960 --> 00:46:11.320]   that you can just download and run on your system.
[00:46:11.320 --> 00:46:12.960]   If you are on a Unix-like system,
[00:46:12.960 --> 00:46:17.080]   you need to add the executable attribute to the file
[00:46:17.080 --> 00:46:18.080]   to start it.
[00:46:18.080 --> 00:46:21.880]   If you are on Windows, you just add the .exe extension
[00:46:21.880 --> 00:46:23.960]   and then double click on this.
[00:46:23.960 --> 00:46:26.880]   These files, regardless of the operating system you are,
[00:46:26.880 --> 00:46:28.480]   should run out of the box.
[00:46:28.480 --> 00:46:33.440]   They should, by default, use CPU if you have no GPU,
[00:46:33.440 --> 00:46:36.160]   or they can use acceleration.
[00:46:36.160 --> 00:46:38.880]   All of these LLama files are prepackaged
[00:46:38.880 --> 00:46:41.760]   to work with GPU acceleration on Linux,
[00:46:41.760 --> 00:46:44.280]   but you can separately download libraries
[00:46:44.280 --> 00:46:47.760]   to let them run on Windows too.
[00:46:47.760 --> 00:46:51.160]   And I'm going to run some LLama files in the background
[00:46:51.160 --> 00:46:53.720]   to show you how things run in real time
[00:46:53.720 --> 00:46:57.000]   so you can get an idea about how to run these tools.
[00:46:57.000 --> 00:46:58.440]   So let me go back to this page.
[00:46:58.440 --> 00:47:01.480]   Let me start the LLama file.
[00:47:01.480 --> 00:47:05.640]   So here I have a directory with a few of them.
[00:47:05.640 --> 00:47:13.080]   I'm going to take this model, .3.5.9 billion, and start it.
[00:47:13.080 --> 00:47:16.280]   I'm going to also use --server so you can see
[00:47:16.280 --> 00:47:19.120]   what is happening in real time.
[00:47:19.120 --> 00:47:19.960]   Oh, of course.
[00:47:19.960 --> 00:47:26.560]   OK, now it's ready.
[00:47:26.560 --> 00:47:28.280]   And we're going to connect to this,
[00:47:28.280 --> 00:47:31.520]   and we're going to ask it the simplest question ever.
[00:47:31.520 --> 00:47:35.520]   How many times does the letter R occur in the word strawberry?
[00:47:35.520 --> 00:47:39.080]   And please note that here it's not the regular word,
[00:47:39.080 --> 00:47:41.680]   so we're not expecting the usual answer.
[00:47:41.680 --> 00:47:45.240]   It's purposely misspelled, so you really
[00:47:45.240 --> 00:47:48.440]   have to either have a very good tokenizer that
[00:47:48.440 --> 00:47:51.680]   allows you to get exactly how many Rs are there,
[00:47:51.680 --> 00:47:53.920]   or to call a tool.
[00:47:53.920 --> 00:47:55.480]   In particular, the tool that we made
[00:47:55.480 --> 00:47:58.520]   available, which is the one called
[00:47:58.520 --> 00:48:01.200]   count character occurrences.
[00:48:01.200 --> 00:48:04.080]   So below here, you already see that I called the tool
[00:48:04.080 --> 00:48:06.440]   in the past, and I saw it running before,
[00:48:06.440 --> 00:48:08.240]   but I'm going to clear everything here
[00:48:08.240 --> 00:48:11.120]   and just run it on the fly.
[00:48:11.120 --> 00:48:17.400]   So let me clear the agent output here and run it again.
[00:48:17.400 --> 00:48:23.440]   So the first hit to the model is to the completion endpoint,
[00:48:23.440 --> 00:48:27.200]   and the request is the following one.
[00:48:27.200 --> 00:48:28.720]   You are a helpful agent.
[00:48:28.720 --> 00:48:29.800]   Use available tools.
[00:48:29.800 --> 00:48:32.320]   These are the instructions, one of the few components
[00:48:32.320 --> 00:48:34.960]   that David told you about before.
[00:48:34.960 --> 00:48:36.400]   Then there is the user prompt.
[00:48:36.400 --> 00:48:38.600]   How many times does the letter R occur
[00:48:38.600 --> 00:48:42.120]   in the word strawberry, which is another one of the components.
[00:48:42.120 --> 00:48:47.160]   Then there is the model that we chose, the 3.59 billion.
[00:48:47.160 --> 00:48:49.720]   The tools, another component that David told you
[00:48:49.720 --> 00:48:52.200]   about before, and each of the tools
[00:48:52.200 --> 00:48:56.640]   is passed as JSON in the request that you're sending to the LLM.
[00:48:56.640 --> 00:49:00.720]   So the LLM does not need to have any knowledge about the tools.
[00:49:00.720 --> 00:49:02.960]   The tools are not implemented in the LLM.
[00:49:02.960 --> 00:49:07.720]   There's something external that we are making available to it.
[00:49:07.720 --> 00:49:10.040]   And there are a few tools.
[00:49:10.040 --> 00:49:12.200]   One of them is count character occurrences,
[00:49:12.200 --> 00:49:16.600]   which uses these parameters, in particular, the character,
[00:49:16.600 --> 00:49:18.720]   the one that we want to look for,
[00:49:18.720 --> 00:49:23.200]   and the word, the one we are looking into.
[00:49:23.200 --> 00:49:24.920]   And then there is another function,
[00:49:24.920 --> 00:49:26.920]   the one to visit web pages.
[00:49:26.920 --> 00:49:28.560]   And then there is another function
[00:49:28.560 --> 00:49:31.600]   to do the tabular web search.
[00:49:31.600 --> 00:49:34.040]   And this is all our request.
[00:49:34.040 --> 00:49:36.000]   So we're basically asking the LLM
[00:49:36.000 --> 00:49:39.280]   that we specify to follow the instructions
[00:49:39.280 --> 00:49:42.840]   by using all the tools that are available-- well, not all.
[00:49:42.840 --> 00:49:45.400]   One, at least, of the tools that are available--
[00:49:45.400 --> 00:49:47.400]   to answer the user prompt.
[00:49:47.400 --> 00:49:49.680]   And the response we get from the model
[00:49:49.680 --> 00:49:53.960]   is you should do a tool call.
[00:49:53.960 --> 00:49:56.360]   And the tool call is to a function
[00:49:56.360 --> 00:49:58.920]   that's called count character of currencies
[00:49:58.920 --> 00:50:01.240]   to which we pass this string, which
[00:50:01.240 --> 00:50:07.640]   is the misspelled strawberry, to look for the character r.
[00:50:07.640 --> 00:50:08.680]   I just saw a question.
[00:50:08.680 --> 00:50:10.840]   Thank you so much for trying Lambda File on the fly.
[00:50:10.840 --> 00:50:14.640]   I really like that you're tinkering with this already.
[00:50:14.640 --> 00:50:16.980]   I'm sorry I didn't show you it clearly enough,
[00:50:16.980 --> 00:50:18.560]   because I just came through this.
[00:50:18.560 --> 00:50:20.720]   You add the minus minus server.
[00:50:20.720 --> 00:50:24.920]   And let me show that to you.
[00:50:24.920 --> 00:50:27.240]   I can block this for now.
[00:50:27.240 --> 00:50:30.480]   And I can just rerun it again.
[00:50:30.480 --> 00:50:35.480]   You just add this minus minus server parameter to that.
[00:50:35.480 --> 00:50:39.200]   So let me run it again, in case we want to see other stuff
[00:50:39.200 --> 00:50:40.600]   running in real time.
[00:50:40.600 --> 00:50:42.080]   Oh, yeah, sure, good idea.
[00:50:42.080 --> 00:50:44.040]   I'm going to paste the command there.
[00:50:44.040 --> 00:50:46.920]   So in this case, the command is for that specific model
[00:50:46.920 --> 00:50:48.180]   the 9 billion model.
[00:50:48.180 --> 00:50:50.420]   But don't worry, it's going to work for any of them.
[00:50:50.420 --> 00:50:54.020]   If you don't, you just have what was pasted there
[00:50:54.020 --> 00:50:57.020]   as a screenshot, so like a terminal user interface
[00:50:57.020 --> 00:51:00.140]   chat that you can use.
[00:51:00.140 --> 00:51:03.080]   Another thing I should tell you about knowing your tools,
[00:51:03.080 --> 00:51:05.820]   unfortunately, the smallest models
[00:51:05.820 --> 00:51:08.420]   are not working for tool calling.
[00:51:08.420 --> 00:51:12.240]   The first model working for tool calling,
[00:51:12.240 --> 00:51:14.320]   let's say, that has been trained to do
[00:51:14.320 --> 00:51:17.480]   tool calling in this web page.
[00:51:17.480 --> 00:51:18.760]   Let me check.
[00:51:18.760 --> 00:51:22.480]   Is the QAN 3.5 0.8 billions?
[00:51:22.480 --> 00:51:25.000]   It's this one over here.
[00:51:25.000 --> 00:51:27.760]   Just so you know, if you run one of the Bonsai model,
[00:51:27.760 --> 00:51:30.480]   you won't be able to do tool calling.
[00:51:30.480 --> 00:51:32.000]   And oh, that's great.
[00:51:32.000 --> 00:51:34.640]   You also opened the localhost 8080,
[00:51:34.640 --> 00:51:37.040]   and you could directly connect to the llama CPP UI.
[00:51:37.040 --> 00:51:37.760]   Perfect.
[00:51:37.760 --> 00:51:40.560]   Great.
[00:51:40.560 --> 00:51:41.220]   OK, cool.
[00:51:41.220 --> 00:51:44.000]   So you saw how to run the model.
[00:51:44.000 --> 00:51:44.880]   Please try.
[00:51:44.880 --> 00:51:46.880]   Please tell us if it breaks because it's
[00:51:46.880 --> 00:51:48.560]   very good feedback for us.
[00:51:48.560 --> 00:51:51.600]   And I'm just going to keep this running and move back
[00:51:51.600 --> 00:51:55.920]   to the examples that we have in the browser.
[00:51:55.920 --> 00:51:57.920]   Oh, also, it runs on Windows out of the box.
[00:51:57.920 --> 00:51:59.640]   That's great.
[00:51:59.640 --> 00:52:03.720]   You know, I'm developing these for Windows too.
[00:52:03.720 --> 00:52:06.440]   I have one Windows machine on which I tested.
[00:52:06.440 --> 00:52:08.400]   We have different colleagues testing them
[00:52:08.400 --> 00:52:10.900]   on different Windows machines with different GPUs.
[00:52:10.900 --> 00:52:13.960]   But one of the biggest issues we have with llama files
[00:52:13.960 --> 00:52:16.920]   is we just don't have all the hardware in the world.
[00:52:16.920 --> 00:52:20.200]   So the more feedback we get from people both today
[00:52:20.200 --> 00:52:22.600]   during the class and in general, if you
[00:52:22.600 --> 00:52:24.480]   want to play with this in the future,
[00:52:24.480 --> 00:52:26.720]   is getting to know more about how
[00:52:26.720 --> 00:52:29.080]   it performs on your own systems.
[00:52:29.080 --> 00:52:31.760]   Don't feel bad about giving us negative feedback.
[00:52:31.760 --> 00:52:32.920]   Like we are super happy.
[00:52:32.920 --> 00:52:35.520]   It's a win-win because it will work better for you
[00:52:35.520 --> 00:52:38.320]   when we fix it, and we will have something which is more stable.
[00:52:38.320 --> 00:52:41.200]   So thank you so much for the feedback that you're giving me.
[00:52:41.200 --> 00:52:43.760]   And sorry if I turn on the other side when I say thank you,
[00:52:43.760 --> 00:52:46.140]   but it's where the chat is, the select chat.
[00:52:46.140 --> 00:52:49.000]   So I will face the monitor only when looking
[00:52:49.000 --> 00:52:51.800]   at the slides and the pages.
[00:52:51.800 --> 00:52:52.280]   OK.
[00:52:52.280 --> 00:52:57.640]   So first call, it was getting how many characters we had,
[00:52:57.640 --> 00:52:59.840]   how many occurrences of the R character
[00:52:59.840 --> 00:53:02.760]   we had in the misspelled strawberry word.
[00:53:02.760 --> 00:53:05.680]   And then you can see from the follow completions
[00:53:05.680 --> 00:53:10.440]   that there is a new request that pastes everything we had back.
[00:53:10.440 --> 00:53:15.360]   So the system instructions, the user query, the assistant.
[00:53:15.360 --> 00:53:20.440]   Now there's a new section where the assistant, which
[00:53:20.440 --> 00:53:25.840]   is basically your LLM when it's called the tool, that said,
[00:53:25.840 --> 00:53:31.880]   I'm counting these occurrences of the character
[00:53:31.880 --> 00:53:33.760]   are in the strawberry.
[00:53:33.760 --> 00:53:36.040]   And the tool that you called--
[00:53:36.040 --> 00:53:39.880]   the tool called that you made has a very specific ID.
[00:53:39.880 --> 00:53:43.040]   And you also now have the answer from that tool called.
[00:53:43.040 --> 00:53:45.600]   The tool called with that ID answered
[00:53:45.600 --> 00:53:50.000]   that the result is 5, which also means
[00:53:50.000 --> 00:53:53.160]   that you can take any word, not necessarily one word which
[00:53:53.160 --> 00:53:57.280]   is in the vocabulary, not something which you can easily
[00:53:57.280 --> 00:53:57.800]   tokenize.
[00:53:57.800 --> 00:54:00.400]   You can just take something like this,
[00:54:00.400 --> 00:54:04.240]   and you can still ask how many Rs appear there.
[00:54:04.240 --> 00:54:06.380]   Honestly, I don't know.
[00:54:06.380 --> 00:54:09.320]   If you want to check it live, I'm just going to trust it.
[00:54:09.320 --> 00:54:11.760]   When I see that the output of the function
[00:54:11.760 --> 00:54:13.640]   is the correct one.
[00:54:13.640 --> 00:54:16.840]   And just think that we have moved
[00:54:16.840 --> 00:54:22.440]   from trusting an LLM, which is a stochastic predictor
[00:54:22.440 --> 00:54:26.080]   of the next token, to trusting a tool.
[00:54:26.080 --> 00:54:29.200]   Because now we can check in our output
[00:54:29.200 --> 00:54:33.440]   and see that after the assistant calls the tool,
[00:54:33.440 --> 00:54:36.440]   it has this answer from the tool.
[00:54:36.440 --> 00:54:39.880]   So it's not the LLM anymore which has generated
[00:54:39.880 --> 00:54:44.040]   how many times these are occurred.
[00:54:44.040 --> 00:54:47.360]   It's a tool that ran exactly how we expected because we
[00:54:47.360 --> 00:54:49.040]   know the code that it ran.
[00:54:49.040 --> 00:54:50.520]   And to me, this is a very good way
[00:54:50.520 --> 00:54:52.960]   to be, again, more confident about what
[00:54:52.960 --> 00:54:55.040]   the model answers us.
[00:54:55.040 --> 00:54:57.040]   There was a short moment in time.
[00:54:57.040 --> 00:54:59.040]   We're running a tiny nine--
[00:54:59.040 --> 00:55:03.760]   well, not tiny-- everywhere, but a relatively small 9 billion
[00:55:03.760 --> 00:55:07.080]   model on my laptop provided a result
[00:55:07.080 --> 00:55:12.400]   that was more correct than GPT service online.
[00:55:12.400 --> 00:55:15.280]   Because you know there was this moment in which GPT
[00:55:15.280 --> 00:55:17.560]   was not very good at answering this kind of question.
[00:55:17.560 --> 00:55:19.160]   Well, this tool did.
[00:55:19.160 --> 00:55:22.200]   So this is what I mean when I say
[00:55:22.200 --> 00:55:24.040]   when you have these kind of tools
[00:55:24.040 --> 00:55:26.440]   and you are in complete control of them,
[00:55:26.440 --> 00:55:29.360]   you can understand how they work, where they break,
[00:55:29.360 --> 00:55:31.360]   and how to make them better.
[00:55:31.360 --> 00:55:36.320]   So I hope you kind of have the same feeling,
[00:55:36.320 --> 00:55:38.920]   will have the same feeling when playing with this,
[00:55:38.920 --> 00:55:43.720]   that you are more in control of what is happening.
[00:55:43.720 --> 00:55:46.560]   I think that was very helpful for us in this phase
[00:55:46.560 --> 00:55:51.520]   was being able to check out these locks.
[00:55:51.520 --> 00:55:53.800]   These locks, I didn't tell you.
[00:55:53.800 --> 00:55:57.800]   If you, on Mac at least, hit Command-Option and I,
[00:55:57.800 --> 00:55:59.760]   you can turn them on and off.
[00:55:59.760 --> 00:56:04.200]   Or I think you can go on the menu, whatever your system is.
[00:56:04.200 --> 00:56:06.560]   And oh, this is Search Tabs.
[00:56:06.560 --> 00:56:10.200]   Sorry, this is the menu.
[00:56:10.200 --> 00:56:11.520]   OK, menu.
[00:56:11.520 --> 00:56:17.520]   And then you can go in, I think, More Tools and Web Developer
[00:56:17.520 --> 00:56:18.240]   Tools.
[00:56:18.240 --> 00:56:19.800]   OK, yes, that's it.
[00:56:19.800 --> 00:56:24.360]   More Tools, Web Developer Tools, and you can enable this.
[00:56:24.360 --> 00:56:26.400]   So I find this super convenient.
[00:56:26.400 --> 00:56:29.520]   And I think this is still one of the components
[00:56:29.520 --> 00:56:32.080]   that Dadit was talking about before,
[00:56:32.080 --> 00:56:34.720]   that he's having callbacks which allow
[00:56:34.720 --> 00:56:37.280]   you to get extra information.
[00:56:37.280 --> 00:56:39.960]   In this case, we don't have an explicit callback
[00:56:39.960 --> 00:56:43.080]   in the agent, but we have an extra way
[00:56:43.080 --> 00:56:45.400]   of checking whatever passes through the network,
[00:56:45.400 --> 00:56:47.720]   because it's every call that is sent to the LLM
[00:56:47.720 --> 00:56:50.280]   and every response that the LLM gives us.
[00:56:50.280 --> 00:56:52.560]   And we can take advantage of these
[00:56:52.560 --> 00:56:55.800]   to get more information about that.
[00:56:55.800 --> 00:56:59.760]   So let's go to the second example that was connecting
[00:56:59.760 --> 00:57:01.800]   to a web page in this case.
[00:57:01.800 --> 00:57:05.560]   So I will also try and rerun these on the fly.
[00:57:05.560 --> 00:57:07.760]   So let me delete this.
[00:57:07.760 --> 00:57:11.760]   And same setup, same model.
[00:57:11.760 --> 00:57:13.520]   Let me just check that I left it running,
[00:57:13.520 --> 00:57:14.600]   because I don't remember.
[00:57:14.600 --> 00:57:15.880]   OK, yes.
[00:57:15.880 --> 00:57:18.840]   And the question is, how many stars
[00:57:18.840 --> 00:57:23.080]   does the Mozilla AI Any Agent project have on GitHub?
[00:57:23.080 --> 00:57:25.960]   So in this case, the model wouldn't be able to answer
[00:57:25.960 --> 00:57:28.640]   unless it had access to the web.
[00:57:28.640 --> 00:57:31.080]   So let us try and run this agent.
[00:57:31.080 --> 00:57:33.680]   We already have a response, which was the correct one,
[00:57:33.680 --> 00:57:35.400]   but let me try and run it again.
[00:57:35.400 --> 00:57:41.120]   So the first thing the agent does
[00:57:41.120 --> 00:57:47.400]   is it tries to search Mozilla AI Any Agent GitHub stars using
[00:57:47.400 --> 00:57:47.960]   Tableau.
[00:57:47.960 --> 00:57:51.000]   So it uses an API to do a web search
[00:57:51.000 --> 00:57:54.600]   and connect to get information about this project.
[00:57:54.600 --> 00:57:58.320]   Then it probably doesn't find all the information it needs.
[00:57:58.320 --> 00:58:01.760]   And then it goes on GitHub to get information
[00:58:01.760 --> 00:58:03.520]   about the Any Agent project.
[00:58:03.520 --> 00:58:08.400]   So in this case, it's the actual HTML page from GitHub.
[00:58:08.400 --> 00:58:11.160]   Then it parses the output.
[00:58:11.160 --> 00:58:14.200]   And then it gives you the information
[00:58:14.200 --> 00:58:16.360]   that the project has 1,200 stars.
[00:58:16.360 --> 00:58:18.000]   Let me just show it as markdown.
[00:58:18.000 --> 00:58:21.440]   It's going to be more readable.
[00:58:21.440 --> 00:58:25.240]   And if you go and check, I think I have it here.
[00:58:25.240 --> 00:58:27.560]   The project actually has 1,200 stars.
[00:58:27.560 --> 00:58:30.880]   So we are doing pretty good here.
[00:58:30.880 --> 00:58:33.520]   Two examples, two running examples.
[00:58:33.520 --> 00:58:36.880]   I think this is already pretty good.
[00:58:36.880 --> 00:58:40.080]   OK, the third one is a bit more advanced.
[00:58:40.080 --> 00:58:46.680]   Let me clear this and try and run it on the fly again.
[00:58:46.680 --> 00:58:51.800]   So what is the title of the latest post on this website?
[00:58:51.800 --> 00:58:53.880]   That is my personal blog post.
[00:58:53.880 --> 00:58:55.560]   When was it published?
[00:58:55.560 --> 00:58:57.000]   What is it about?
[00:58:57.000 --> 00:58:59.960]   And what is the absolute URL of the image
[00:58:59.960 --> 00:59:02.440]   at the beginning of the post?
[00:59:02.440 --> 00:59:06.240]   So let me just show you the website before.
[00:59:06.240 --> 00:59:08.200]   Let me just try and do this.
[00:59:08.200 --> 00:59:13.480]   I think I have it here.
[00:59:13.480 --> 00:59:17.760]   No, I don't have the full one, so this is the website.
[00:59:17.760 --> 00:59:20.840]   The main page is just a list of posts.
[00:59:20.840 --> 00:59:25.000]   You can see there is my Vive reversing post here.
[00:59:25.000 --> 00:59:28.720]   And the latest one is this one from August last year.
[00:59:28.720 --> 00:59:34.200]   I hate to see my blog posting, which I need to catch up with.
[00:59:34.200 --> 00:59:35.720]   This is the post.
[00:59:35.720 --> 00:59:38.240]   It's called Chest.
[00:59:38.240 --> 00:59:42.880]   And this is the first image that you see in the blog post.
[00:59:42.880 --> 00:59:44.640]   So let us get that here.
[00:59:44.640 --> 00:59:47.640]   And what we're trying to do here is a tiny crawler,
[00:59:47.640 --> 00:59:52.720]   a custom crawler, but we are not writing a line of code
[00:59:52.720 --> 00:59:54.040]   to write this crawler.
[00:59:54.040 --> 00:59:56.720]   What we do is we just take a 9 billion model, which is
[00:59:56.720 --> 00:59:59.440]   something that runs on cheap hardware.
[00:59:59.440 --> 01:00:04.000]   We are providing it with both a search engine and a visit
[01:00:04.000 --> 01:00:05.480]   web page tool.
[01:00:05.480 --> 01:00:07.880]   But in this case, you will see it will likely only
[01:00:07.880 --> 01:00:10.400]   use the visit web page tool because we already
[01:00:10.400 --> 01:00:12.040]   provided the URL.
[01:00:12.040 --> 01:00:13.480]   And then we give a question.
[01:00:13.480 --> 01:00:17.520]   So let's see how it works this time.
[01:00:17.520 --> 01:00:19.880]   It does the first call.
[01:00:19.880 --> 01:00:23.360]   The answer is you should try and connect
[01:00:23.360 --> 01:00:26.640]   tool calls, function, visit web page,
[01:00:26.640 --> 01:00:28.920]   and the URL that was provided.
[01:00:28.920 --> 01:00:31.080]   Then it connects to the web page.
[01:00:31.080 --> 01:00:33.240]   Then it follows the link to go to the post.
[01:00:33.240 --> 01:00:34.880]   And then it provides this answer,
[01:00:34.880 --> 01:00:37.440]   which I'm going to format as markdown,
[01:00:37.440 --> 01:00:40.080]   which is this is the title of the post.
[01:00:40.080 --> 01:00:41.880]   This was the date.
[01:00:41.880 --> 01:00:43.760]   And this is what it is about.
[01:00:43.760 --> 01:00:46.120]   And this is the link for the image.
[01:00:46.120 --> 01:00:49.640]   And the link for the image is exactly this one,
[01:00:49.640 --> 01:00:51.800]   which is the one I showed you before by mistake.
[01:00:51.800 --> 01:00:59.920]   So what the tool did was find--
[01:00:59.920 --> 01:01:00.680]   sorry, the tool.
[01:01:00.680 --> 01:01:04.280]   What the LLM did was check out the list of tools,
[01:01:04.280 --> 01:01:08.880]   find that the tool it had to call was the fetch web page,
[01:01:08.880 --> 01:01:14.280]   connect to the main website, parse the list of posts,
[01:01:14.280 --> 01:01:18.040]   understanding what was the first post out of all of them,
[01:01:18.040 --> 01:01:21.600]   and then following that link, opening the web page,
[01:01:21.600 --> 01:01:24.320]   parsing it, and getting all the information that
[01:01:24.320 --> 01:01:27.200]   was related to that.
[01:01:27.200 --> 01:01:31.360]   All of these in a few lines of code,
[01:01:31.360 --> 01:01:36.720]   which I'm going to show you now, and a model that
[01:01:36.720 --> 01:01:41.360]   runs on even small hardware.
[01:01:41.360 --> 01:01:43.720]   So I was going to tell you, I'm going
[01:01:43.720 --> 01:01:45.560]   to show you the code for this.
[01:01:45.560 --> 01:01:48.240]   Let's just do that from within the browser.
[01:01:48.240 --> 01:01:53.040]   This is mostly HTML, HTML, HTML, HTML.
[01:01:53.040 --> 01:01:57.080]   Then we get to a point where--
[01:01:57.080 --> 01:01:58.800]   this is still HTML, sorry.
[01:01:58.800 --> 01:02:04.400]   OK, this part over here is the part that is going to start.
[01:02:04.400 --> 01:02:11.160]   And decide when to call our Python code.
[01:02:11.160 --> 01:02:13.880]   Here, we just install some Python dependencies,
[01:02:13.880 --> 01:02:16.920]   which is whatever we need to install
[01:02:16.920 --> 01:02:20.560]   before running our agents.
[01:02:20.560 --> 01:02:23.120]   And then, I believe I left a comment.
[01:02:23.120 --> 01:02:27.360]   Yes, your Python agent code goes here.
[01:02:27.360 --> 01:02:33.200]   So this is agent code that uses the default OpenAI client
[01:02:33.200 --> 01:02:37.560]   and the OpenAI agents library.
[01:02:37.560 --> 01:02:39.480]   So in this case, we are not using any agent,
[01:02:39.480 --> 01:02:42.280]   but the OpenAI-specific code.
[01:02:42.280 --> 01:02:45.440]   The reason is that OpenAI-specific code
[01:02:45.440 --> 01:02:47.800]   pinned to a very particular version that's
[01:02:47.800 --> 01:02:48.680]   quite back in time--
[01:02:48.680 --> 01:02:53.200]   I think I have not updated it in months, but it still works--
[01:02:53.200 --> 01:02:57.240]   can be interpreted by PyOdide, which
[01:02:57.240 --> 01:02:59.840]   is a WebAssembly interpreter.
[01:02:59.840 --> 01:03:03.400]   And the reason I did this is that this way, you
[01:03:03.400 --> 01:03:08.720]   can write your own Python agent with very simple tools.
[01:03:08.720 --> 01:03:13.080]   This is the count character occurrences.
[01:03:13.080 --> 01:03:19.880]   You just have a one line telling you word.count char.
[01:03:19.880 --> 01:03:21.640]   And this is how you count occurrences
[01:03:21.640 --> 01:03:22.680]   of a character in a word.
[01:03:22.680 --> 01:03:25.280]   You don't need anything more complex than this.
[01:03:25.280 --> 01:03:27.800]   And you're already better than last year's GPT,
[01:03:27.800 --> 01:03:29.920]   just with a single function tool.
[01:03:29.920 --> 01:03:32.960]   Visit Web page is this simple.
[01:03:32.960 --> 01:03:35.200]   Search Stavily is probably a bit more complex,
[01:03:35.200 --> 01:03:39.000]   but still is relatively few lines of code, you see?
[01:03:39.000 --> 01:03:41.400]   And then that's it.
[01:03:41.400 --> 01:03:44.760]   There's, of course, other functions that are called.
[01:03:44.760 --> 01:03:48.400]   But think about having all your code, your agent,
[01:03:48.400 --> 01:03:52.680]   stored into an HTML file that other people can take and just
[01:03:52.680 --> 01:03:54.400]   run on their browsers.
[01:03:54.400 --> 01:03:56.200]   This was the main reason why I decided
[01:03:56.200 --> 01:04:01.480]   to build this Wasm agents blueprint, because I thought,
[01:04:01.480 --> 01:04:04.280]   for me, it's important not just that I can run this agent,
[01:04:04.280 --> 01:04:07.000]   but that other people can run this agent in an easy way.
[01:04:07.000 --> 01:04:09.120]   I think a lot of stuff happened in the last year.
[01:04:09.120 --> 01:04:13.040]   There are way better ways to run code and make it very portable.
[01:04:13.040 --> 01:04:16.040]   But if you want to experiment, this is available.
[01:04:16.040 --> 01:04:17.080]   It's open source code.
[01:04:17.080 --> 01:04:20.920]   You can play, change it, create your own agents, and so on.
[01:04:20.920 --> 01:04:22.800]   There's one last example, I think.
[01:04:22.800 --> 01:04:23.440]   Let me see.
[01:04:23.440 --> 01:04:24.400]   No, not here.
[01:04:24.400 --> 01:04:26.360]   So I'm not going to show it to you right now.
[01:04:26.360 --> 01:04:27.920]   I'm going to jump to another part,
[01:04:27.920 --> 01:04:32.800]   still very much related to these tools,
[01:04:32.800 --> 01:04:39.040]   like this Wasm agents tool, and in general,
[01:04:39.040 --> 01:04:42.080]   a way of checking out from the logs
[01:04:42.080 --> 01:04:45.600]   how a single tool among the ones you use is behaving.
[01:04:45.600 --> 01:04:50.000]   So know your tools.
[01:04:50.000 --> 01:04:51.440]   We have already started.
[01:04:51.440 --> 01:04:53.680]   Oh, inspire me to revert code locally.
[01:04:53.680 --> 01:04:55.480]   How good is this framework for coding,
[01:04:55.480 --> 01:04:57.200]   and which model would you suggest?
[01:04:57.200 --> 01:04:59.280]   Ooh, that's such a great question.
[01:04:59.280 --> 01:05:00.960]   So before knowing your tools, I'm
[01:05:00.960 --> 01:05:04.000]   going to tell you a bit more about that.
[01:05:04.000 --> 01:05:06.360]   And also because it's very much related,
[01:05:06.360 --> 01:05:09.720]   like which model is best for coding,
[01:05:09.720 --> 01:05:12.400]   which agentic framework is best for coding.
[01:05:12.400 --> 01:05:15.440]   And the idea to reinvent code locally
[01:05:15.440 --> 01:05:19.200]   is so great because there is one project called TIE,
[01:05:19.200 --> 01:05:21.720]   the one I told you I would have shown you later,
[01:05:21.720 --> 01:05:24.280]   that's actually being built exactly with that purpose.
[01:05:24.280 --> 01:05:27.720]   Like the developer of PIE was so frustrated
[01:05:27.720 --> 01:05:30.120]   about using Cloud Code and having
[01:05:30.120 --> 01:05:33.800]   to keep up with all the changes the user experience they had,
[01:05:33.800 --> 01:05:38.120]   that he developed PIE, which is another agentic tool
[01:05:38.120 --> 01:05:42.040]   for coding, but actually could be used for anything
[01:05:42.040 --> 01:05:46.640]   that you can use with whatever model you want.
[01:05:46.640 --> 01:05:47.840]   Yes, I agree.
[01:05:47.840 --> 01:05:50.320]   Cloud Code gets very expensive fast.
[01:05:50.320 --> 01:05:52.440]   In addition to that, and that becomes
[01:05:52.440 --> 01:05:56.760]   part of like not being able to have full power of what you run
[01:05:56.760 --> 01:05:59.160]   or not having control of what you run,
[01:05:59.160 --> 01:06:01.880]   just think that recently--
[01:06:01.880 --> 01:06:04.400]   oh, thanks a lot for sharing the link, David.
[01:06:04.400 --> 01:06:07.880]   Recently, Cloud-- oh, well, Antropic
[01:06:07.880 --> 01:06:12.840]   decided that if you are not using Cloud Code, the tool,
[01:06:12.840 --> 01:06:16.640]   you cannot just use the subscriptions that you have,
[01:06:16.640 --> 01:06:18.920]   but you have to pay per token.
[01:06:18.920 --> 01:06:21.960]   So if you're using OpenClaw, if you're using PIE,
[01:06:21.960 --> 01:06:27.200]   if you're using your own agent backed up by Antropic models,
[01:06:27.200 --> 01:06:29.080]   you will have to pay them by token.
[01:06:29.080 --> 01:06:31.360]   So you don't have this kind of flat rate
[01:06:31.360 --> 01:06:33.520]   that the subscriptions give you.
[01:06:33.520 --> 01:06:36.160]   And this is very frustrating, especially
[01:06:36.160 --> 01:06:38.800]   if you have already started paying for a subscription
[01:06:38.800 --> 01:06:41.400]   that you thought would have lasted for a while
[01:06:41.400 --> 01:06:45.520]   and cover you for your usage.
[01:06:45.520 --> 01:06:47.000]   So I agree with you.
[01:06:47.000 --> 01:06:50.360]   Getting started with something custom is the best.
[01:06:50.360 --> 01:06:51.960]   I can tell you already one thing.
[01:06:51.960 --> 01:06:55.400]   The model I heard about that people said
[01:06:55.400 --> 01:06:59.240]   was the best for relatively small hardware
[01:06:59.240 --> 01:07:07.920]   is QAN 2.6, $27 billion--
[01:07:07.920 --> 01:07:14.280]   3.6, sorry, QAN 3.6, $27 billion, which, by the way,
[01:07:14.280 --> 01:07:16.760]   is available as a LAMA file, if I remember well.
[01:07:16.760 --> 01:07:20.280]   Let me just double check.
[01:07:20.280 --> 01:07:21.200]   Yes, this one.
[01:07:21.200 --> 01:07:26.400]   If you look for this on LinkedIn,
[01:07:26.400 --> 01:07:31.040]   you will see Hugging Face CTO showing
[01:07:31.040 --> 01:07:33.560]   that it was using this model on a flight,
[01:07:33.560 --> 01:07:36.160]   and he could code while on a flight.
[01:07:36.160 --> 01:07:37.720]   I think he was using that, at least.
[01:07:37.720 --> 01:07:43.360]   I'm not 100% sure now, because I saw Gemma also being shared.
[01:07:43.360 --> 01:07:47.720]   So I would say these models over here, the ones above,
[01:07:47.720 --> 01:07:52.480]   let's say, $20-something billions are kind of comparable.
[01:07:52.480 --> 01:07:55.120]   I heard well about all of them.
[01:07:55.120 --> 01:07:57.520]   I think it very much depends on which kind of code
[01:07:57.520 --> 01:07:58.640]   you're developing.
[01:07:58.640 --> 01:08:02.600]   So can we have a back of the envelope estimation
[01:08:02.600 --> 01:08:06.760]   about how these things run, how fast they run,
[01:08:06.760 --> 01:08:08.800]   how good they are?
[01:08:08.800 --> 01:08:11.120]   I can give you some examples, and this
[01:08:11.120 --> 01:08:13.120]   is part of knowing your tools.
[01:08:13.120 --> 01:08:15.360]   So this is a super-silic sample.
[01:08:15.360 --> 01:08:20.080]   Oh, before we have this one, I'm going to skip it.
[01:08:20.080 --> 01:08:21.560]   I'm going to show it afterwards.
[01:08:21.560 --> 01:08:23.160]   This is a super-silic sample.
[01:08:23.160 --> 01:08:28.720]   I'm just asking different agents for my age.
[01:08:28.720 --> 01:08:31.400]   So what I would expect is that they don't know,
[01:08:31.400 --> 01:08:33.120]   because I'm not on Wikipedia.
[01:08:33.120 --> 01:08:37.640]   My information is not available in large language models training
[01:08:37.640 --> 01:08:38.720]   sets.
[01:08:38.720 --> 01:08:41.560]   It's available somewhere on some PDF document online,
[01:08:41.560 --> 01:08:44.960]   but probably nobody has used it, and definitely nobody cares.
[01:08:44.960 --> 01:08:49.640]   So if you ask for my birth date, you will probably not find it.
[01:08:49.640 --> 01:08:51.880]   And this is, for me, a very simple--
[01:08:51.880 --> 01:08:53.560]   because I know the answer--
[01:08:53.560 --> 01:08:58.160]   but a good example to show how different models, depending
[01:08:58.160 --> 01:09:02.880]   on how large they are, tackle the same problem
[01:09:02.880 --> 01:09:05.040]   in different ways.
[01:09:05.040 --> 01:09:06.760]   One could say they're--
[01:09:06.760 --> 01:09:08.880]   take what you say more literally,
[01:09:08.880 --> 01:09:11.560]   or one could say they are dumb.
[01:09:11.560 --> 01:09:13.480]   I don't want to humanize them.
[01:09:13.480 --> 01:09:17.640]   I just think they have a different way of predicting
[01:09:17.640 --> 01:09:19.520]   what the next token is, right?
[01:09:19.520 --> 01:09:22.840]   And probably it's less precise than what you would expect.
[01:09:22.840 --> 01:09:24.240]   So here's the example.
[01:09:24.240 --> 01:09:25.960]   I'm going to show you the code first,
[01:09:25.960 --> 01:09:27.640]   which is available online.
[01:09:27.640 --> 01:09:30.040]   So let me do things in the right order.
[01:09:30.040 --> 01:09:34.600]   First of all, where is the code?
[01:09:34.600 --> 01:09:41.120]   This one, this is the repository where
[01:09:41.120 --> 01:09:43.320]   I left some Python files with examples
[01:09:43.320 --> 01:09:46.480]   of how to test these models.
[01:09:46.480 --> 01:09:50.760]   And then I'm going to go here, and then I'm
[01:09:50.760 --> 01:09:52.680]   going to share these two.
[01:09:52.680 --> 01:09:54.720]   This is the code that we are running now,
[01:09:54.720 --> 01:09:57.560]   which is called agent_birthday.
[01:09:57.560 --> 01:09:59.160]   And this is another agent.
[01:09:59.160 --> 01:10:01.040]   It's all Python.
[01:10:01.040 --> 01:10:08.000]   And the agent code is this.
[01:10:08.000 --> 01:10:09.400]   It uses any agent.
[01:10:09.400 --> 01:10:11.680]   This is a library that we developed.
[01:10:11.680 --> 01:10:14.400]   It uses tiny agent as the agentic framework,
[01:10:14.400 --> 01:10:18.160]   let's say the code that implements the agentic loop.
[01:10:18.160 --> 01:10:22.040]   It uses llama file with a given model.
[01:10:22.040 --> 01:10:23.320]   This is just a string.
[01:10:23.320 --> 01:10:26.000]   I left it here, but I have examples
[01:10:26.000 --> 01:10:31.560]   running with all models from quen 3, 0, 6 billion to quen 3.5,
[01:10:31.560 --> 01:10:32.680]   9 billion.
[01:10:32.680 --> 01:10:35.440]   This is just the last name that I left.
[01:10:35.440 --> 01:10:38.520]   As you can guess, it doesn't need an API key.
[01:10:38.520 --> 01:10:40.960]   It runs locally, as you've already seen.
[01:10:40.960 --> 01:10:42.760]   These are the instructions, and these
[01:10:42.760 --> 01:10:45.680]   are the tools that are provided.
[01:10:45.680 --> 01:10:48.120]   So there are two tools available to this agent.
[01:10:48.120 --> 01:10:50.520]   One is scan the current directory.
[01:10:50.520 --> 01:10:56.440]   And what it does is exactly-- not read file, this.
[01:10:56.440 --> 01:10:59.720]   So it looks for the current path, so the directory
[01:10:59.720 --> 01:11:02.600]   where these source files are to.
[01:11:02.600 --> 01:11:08.080]   And it returns a list of files that satisfy
[01:11:08.080 --> 01:11:10.120]   the pattern that is provided.
[01:11:10.120 --> 01:11:13.600]   So you can search for everything, everything.txt,
[01:11:13.600 --> 01:11:14.960]   everything.py.
[01:11:14.960 --> 01:11:17.480]   These are just some examples.
[01:11:17.480 --> 01:11:22.600]   And instead, read file, opens a file, and reads the content
[01:11:22.600 --> 01:11:24.320]   and returns it to the LLM.
[01:11:24.320 --> 01:11:25.640]   So it's super simple.
[01:11:25.640 --> 01:11:29.800]   There's way more documentation of your code
[01:11:29.800 --> 01:11:31.120]   than the code itself.
[01:11:31.120 --> 01:11:34.200]   And this is required because all of these docs
[01:11:34.200 --> 01:11:37.720]   are going to the LLM as instructions
[01:11:37.720 --> 01:11:41.080]   on how to run these tools.
[01:11:41.080 --> 01:11:44.160]   And the final code is agent run prompt.
[01:11:44.160 --> 01:11:45.880]   When was Davidei not born?
[01:11:45.880 --> 01:11:48.440]   And I can also provide different prompts on the command line,
[01:11:48.440 --> 01:11:54.600]   and you will see them shown in the slides afterwards.
[01:11:54.600 --> 01:12:02.240]   So first experiment, quad3-06-pilion is quite fast.
[01:12:02.240 --> 01:12:04.320]   To give you an idea about how fast this is,
[01:12:04.320 --> 01:12:07.320]   I'm not sure I have it, but let me quickly check.
[01:12:13.560 --> 01:12:17.640]   So I'm connecting to this computer.
[01:12:17.640 --> 01:12:22.600]   If it's available, it should be.
[01:12:22.600 --> 01:12:25.440]   Otherwise, it's offline right now.
[01:12:25.440 --> 01:12:28.080]   OK, I'm not connecting to this computer.
[01:12:28.080 --> 01:12:29.200]   I will run this locally.
[01:12:29.200 --> 01:12:34.760]   Let me see if I can zip align our files.
[01:12:34.760 --> 01:12:39.960]   OK, I have it here, 06 billion.
[01:12:39.960 --> 01:12:45.280]   I already have another model running on the same.
[01:12:45.280 --> 01:12:46.560]   So let me just run here.
[01:12:46.560 --> 01:12:49.080]   OK.
[01:12:49.080 --> 01:12:51.600]   Tell me everything you know about Palanturi.
[01:12:51.600 --> 01:12:57.080]   This is how fast it goes on this hardware.
[01:12:57.080 --> 01:12:59.440]   So it's very, very fast.
[01:12:59.440 --> 01:13:02.400]   It's a very, very tiny model.
[01:13:02.400 --> 01:13:05.160]   We are not 100% sure that all the information
[01:13:05.160 --> 01:13:07.920]   it knows is correct, but it's definitely
[01:13:07.920 --> 01:13:10.560]   something super responsive.
[01:13:10.560 --> 01:13:13.000]   And I can tell you, the experiment I wanted to show you
[01:13:13.000 --> 01:13:15.520]   was trying to connect to my Raspberry Pi file
[01:13:15.520 --> 01:13:18.680]   and show you that it actually runs on a Raspberry Pi.
[01:13:18.680 --> 01:13:21.160]   The fact that I cannot connect there is kind of concerning,
[01:13:21.160 --> 01:13:23.200]   because then Raspberry Pi is in Italy,
[01:13:23.200 --> 01:13:25.360]   and I cannot easily turn it off and on.
[01:13:25.360 --> 01:13:26.200]   But that's all right.
[01:13:26.200 --> 01:13:28.400]   Let's not worry about this right now.
[01:13:28.400 --> 01:13:30.680]   Let's get back here and see what happens.
[01:13:30.680 --> 01:13:33.560]   So when was David ANR born?
[01:13:33.560 --> 01:13:37.440]   All that you see here are the different calls
[01:13:37.440 --> 01:13:39.560]   that are shared.
[01:13:39.560 --> 01:13:43.360]   And they appear as callbacks that have
[01:13:43.360 --> 01:13:45.600]   been implemented in any agent.
[01:13:45.600 --> 01:13:48.920]   So whenever you run those few lines of code
[01:13:48.920 --> 01:13:57.000]   in your Python file, like these few lines of code, by default,
[01:13:57.000 --> 01:14:00.920]   you also have all of these logging available.
[01:14:00.920 --> 01:14:01.640]   OK.
[01:14:01.640 --> 01:14:03.600]   So you know the system prompt.
[01:14:03.600 --> 01:14:05.320]   You know the user prompt.
[01:14:05.320 --> 01:14:06.840]   And you know the final answer.
[01:14:06.840 --> 01:14:08.920]   So the model didn't do anything here.
[01:14:08.920 --> 01:14:09.880]   It had the tools.
[01:14:09.880 --> 01:14:12.600]   It just ignored them and said, I don't have access
[01:14:12.600 --> 01:14:15.200]   to David ANR's birth date information, which
[01:14:15.200 --> 01:14:18.400]   to some extent is not a bad answer,
[01:14:18.400 --> 01:14:20.440]   because it actually doesn't know.
[01:14:20.440 --> 01:14:23.160]   But it completely ignored my tools.
[01:14:23.160 --> 01:14:25.560]   So I tried to run it again.
[01:14:25.560 --> 01:14:29.520]   And this time, it hallucinates.
[01:14:29.520 --> 01:14:31.880]   So it still doesn't run any tool.
[01:14:31.880 --> 01:14:34.960]   And it tells I run way later than I am.
[01:14:34.960 --> 01:14:37.000]   So I'm very happy that it made me younger.
[01:14:37.000 --> 01:14:44.040]   But I was not very happy that it just made up an answer.
[01:14:44.040 --> 01:14:47.120]   I run it again saying, look for the birthdays file
[01:14:47.120 --> 01:14:50.800]   to find out, just to make it a bit more clear that there
[01:14:50.800 --> 01:14:53.840]   is a connection between what I ask and the tools that
[01:14:53.840 --> 01:14:55.480]   are made available.
[01:14:55.480 --> 01:14:58.280]   And it kind of gets the hint.
[01:14:58.280 --> 01:15:04.160]   And it looks for David_ANR_Birthdate.txt,
[01:15:04.160 --> 01:15:06.120]   which kind of makes sense, because there's
[01:15:06.120 --> 01:15:09.720]   no instructions given to it about what it has to search.
[01:15:09.720 --> 01:15:10.840]   So it looks for it.
[01:15:10.840 --> 01:15:12.720]   It has an empty output.
[01:15:12.720 --> 01:15:17.880]   And after that, it just says, I don't have this information.
[01:15:17.880 --> 01:15:22.040]   Sorry, I skipped that part, because it's redundant.
[01:15:22.040 --> 01:15:25.720]   Next one, look for the birthdays CSV file to find out.
[01:15:25.720 --> 01:15:27.920]   At that point, it understands that it
[01:15:27.920 --> 01:15:31.640]   has to open a birthdays.csv file.
[01:15:31.640 --> 01:15:34.640]   It gets the information that was saved in this file,
[01:15:34.640 --> 01:15:39.400]   as you see all very popular computer engineers.
[01:15:39.400 --> 01:15:42.320]   And there's me too, of course, among them.
[01:15:42.320 --> 01:15:46.120]   And then it provides me with the right answer.
[01:15:46.120 --> 01:15:47.520]   It took a while.
[01:15:47.520 --> 01:15:49.400]   So do you want an agent that is not
[01:15:49.400 --> 01:15:53.560]   even able to understand which tools it has to call
[01:15:53.560 --> 01:15:55.920]   until you tell it exactly?
[01:15:55.920 --> 01:15:57.800]   Probably not.
[01:15:57.800 --> 01:16:01.680]   So we can say QAN306 billion, perhaps, is not
[01:16:01.680 --> 01:16:04.080]   the ideal choice for this kind of agent.
[01:16:04.080 --> 01:16:06.360]   It might be good for other things.
[01:16:06.360 --> 01:16:09.240]   I don't know, summarize some text
[01:16:09.240 --> 01:16:12.480]   or translate some text in relatively small amount
[01:16:12.480 --> 01:16:13.360]   of languages.
[01:16:13.360 --> 01:16:16.160]   But probably it's not ideal to do tool calling,
[01:16:16.160 --> 01:16:20.520]   even if it has been trained to do tool calling.
[01:16:20.520 --> 01:16:24.400]   I decided to then move to another tool, another model,
[01:16:24.400 --> 01:16:27.560]   QAN3.5, 0.8 billion.
[01:16:27.560 --> 01:16:29.680]   It came out months later.
[01:16:29.680 --> 01:16:32.840]   It's just slightly larger, but you can already
[01:16:32.840 --> 01:16:34.520]   see it's a bit more verbose.
[01:16:34.520 --> 01:16:36.400]   The first time it didn't find information,
[01:16:36.400 --> 01:16:40.400]   it said, well, you might want to search by keywords like these
[01:16:40.400 --> 01:16:41.620]   if you want to see.
[01:16:41.620 --> 01:16:43.240]   And of course, some of them are made up
[01:16:43.240 --> 01:16:45.240]   because history and biology are not really
[01:16:45.240 --> 01:16:47.360]   something that concerns me.
[01:16:47.360 --> 01:16:49.240]   But it's interesting.
[01:16:49.240 --> 01:16:54.840]   You see, kind of has a different way of talking to the user.
[01:16:54.840 --> 01:16:59.160]   And when I asked again, saying, look for birthdays on disk,
[01:16:59.160 --> 01:17:04.120]   it started looking for txt first, found nothing.
[01:17:04.120 --> 01:17:08.220]   And then it moved to everything, birthday, everything.
[01:17:08.220 --> 01:17:13.320]   And then eventually found CSV files,
[01:17:13.320 --> 01:17:15.980]   in addition to agent birthday, py.
[01:17:15.980 --> 01:17:18.920]   But it decided that CSV was probably
[01:17:18.920 --> 01:17:23.120]   a better source of information rather than .py,
[01:17:23.120 --> 01:17:25.600]   and then provided the correct information
[01:17:25.600 --> 01:17:30.040]   after getting the actual content of the file.
[01:17:30.040 --> 01:17:31.720]   So this is a bit better.
[01:17:31.720 --> 01:17:33.000]   It's still not perfect.
[01:17:33.000 --> 01:17:34.920]   You see, there are some attempts.
[01:17:34.920 --> 01:17:37.320]   It kind of recovered from these attempts,
[01:17:37.320 --> 01:17:40.840]   which is still part of the agentic loop and the fact
[01:17:40.840 --> 01:17:43.640]   that the errors themselves are fed back to the LLM
[01:17:43.640 --> 01:17:49.480]   and not used as a way to break the loop itself.
[01:17:49.480 --> 01:17:51.920]   This is the 9 billion model instead.
[01:17:51.920 --> 01:17:54.120]   Well, it was Davide and Acorn.
[01:17:54.120 --> 01:17:56.120]   It looks for MD files first.
[01:17:56.120 --> 01:17:57.360]   It doesn't find anything.
[01:17:57.360 --> 01:18:00.360]   And then it just looks for everything.
[01:18:00.360 --> 01:18:07.280]   Without being even asked, look on the disk, open this file,
[01:18:07.280 --> 01:18:10.160]   it automatically decides that searching
[01:18:10.160 --> 01:18:14.620]   for the most general substring and then looking
[01:18:14.620 --> 01:18:16.760]   for something that was called birthdays
[01:18:16.760 --> 01:18:22.720]   was the best path to answer this kind of question.
[01:18:22.720 --> 01:18:24.000]   Sorry.
[01:18:24.000 --> 01:18:30.800]   Yes, but I also provided with wrong data,
[01:18:30.800 --> 01:18:32.880]   earned a guess to the right file,
[01:18:32.880 --> 01:18:35.240]   and shows you the right answer.
[01:18:35.240 --> 01:18:42.600]   And then I tried to challenge it with wrong data.
[01:18:42.600 --> 01:18:45.400]   So I asked, when was Alan Turing born?
[01:18:45.400 --> 01:18:47.880]   And here, I'm not sure you noticed,
[01:18:47.880 --> 01:18:51.600]   I didn't because I realized I did a mistake by pasting
[01:18:51.600 --> 01:18:56.520]   the death date and not the birth date of Alan Turing here.
[01:18:56.520 --> 01:19:01.400]   And I realized that only when 23.59 billion
[01:19:01.400 --> 01:19:05.000]   refused to answer with the date that was
[01:19:05.000 --> 01:19:07.200]   coming from the CSV file.
[01:19:07.200 --> 01:19:11.640]   And then I said, trust the tools answers.
[01:19:11.640 --> 01:19:15.640]   And once I said that, it provided the date that I had
[01:19:15.640 --> 01:19:20.200]   in there, but decided to add according to the birthdays CSV
[01:19:20.200 --> 01:19:25.320]   file because for the tool itself, for the LLM itself,
[01:19:25.320 --> 01:19:27.440]   it kind of felt weird.
[01:19:27.440 --> 01:19:29.840]   So there was a very low probability
[01:19:29.840 --> 01:19:34.440]   that that date was associated to Alan Turing birth
[01:19:34.440 --> 01:19:36.720]   and not Alan Turing death.
[01:19:36.720 --> 01:19:39.280]   So I found this interesting because it's not just
[01:19:39.280 --> 01:19:42.720]   the model that is more general, more expressive
[01:19:42.720 --> 01:19:44.480]   than the smaller ones.
[01:19:44.480 --> 01:19:49.160]   It's also a model that has some knowledge
[01:19:49.160 --> 01:19:52.200]   and will fight to continue providing you
[01:19:52.200 --> 01:19:55.280]   that knowledge, which can be a pro or a con
[01:19:55.280 --> 01:19:57.320]   if that knowledge is not the correct one.
[01:19:57.320 --> 01:20:00.560]   Assume you wanted to know when Alan Turing was born,
[01:20:00.560 --> 01:20:04.080]   but the Alan Turing you referred to is not the Alan Turing
[01:20:04.080 --> 01:20:05.040]   that everyone knows.
[01:20:05.040 --> 01:20:07.120]   It's just the harmony.
[01:20:07.120 --> 01:20:10.520]   In that case, you really want to get it from the CSV file
[01:20:10.520 --> 01:20:13.480]   and the model will insist to always provide you
[01:20:13.480 --> 01:20:15.600]   with other information.
[01:20:15.600 --> 01:20:21.040]   So what does it mean to have a nine billion parameter model
[01:20:21.040 --> 01:20:22.440]   running on your hardware?
[01:20:22.440 --> 01:20:25.480]   So nine billion parameters are something
[01:20:25.480 --> 01:20:31.560]   that runs on as few as, I would say, 16 gigabytes of RAM
[01:20:31.560 --> 01:20:34.240]   relatively easily.
[01:20:34.240 --> 01:20:39.120]   So one thing you can play with is the context size.
[01:20:39.120 --> 01:20:41.200]   So you can make it larger or smaller
[01:20:41.200 --> 01:20:43.280]   as a parameter when you run llama file
[01:20:43.280 --> 01:20:48.040]   or any other of these local inference servers.
[01:20:48.040 --> 01:20:51.400]   And I think by playing with these,
[01:20:51.400 --> 01:20:53.920]   you will also realize how much you
[01:20:53.920 --> 01:20:57.640]   can push a system to use a more advanced with more parameters
[01:20:57.640 --> 01:21:00.760]   model rather than one that is way slower.
[01:21:00.760 --> 01:21:03.120]   Before going here, I want to show you
[01:21:03.120 --> 01:21:08.400]   one of the limits that is exactly the context size.
[01:21:08.400 --> 01:21:13.080]   So at least until some time ago, the default context size
[01:21:13.080 --> 01:21:16.400]   on tools such as llama, for instance,
[01:21:16.400 --> 01:21:21.760]   was 4K tokens for 1,096 tokens.
[01:21:21.760 --> 01:21:25.840]   And the outcome was that even with simple questions
[01:21:25.840 --> 01:21:27.600]   such as this, like how many stars
[01:21:27.600 --> 01:21:30.720]   does Mozilla AI any agent have on GitHub,
[01:21:30.720 --> 01:21:33.440]   the result you had was completely gibberish just
[01:21:33.440 --> 01:21:36.200]   because everything went out of the context.
[01:21:36.200 --> 01:21:40.040]   Imagine the agent downloading a web page,
[01:21:40.040 --> 01:21:43.240]   not being able to store all of that web
[01:21:43.240 --> 01:21:45.280]   page inside its own context.
[01:21:45.280 --> 01:21:49.120]   So dropping part of this, maybe dropping the question itself,
[01:21:49.120 --> 01:21:50.720]   and then the agent says like, yeah, I
[01:21:50.720 --> 01:21:52.600]   should answer something about this web page.
[01:21:52.600 --> 01:21:54.760]   But I don't really remember what exactly.
[01:21:54.760 --> 01:21:56.320]   And the first thing it comes out with
[01:21:56.320 --> 01:21:59.480]   is, OK, I'm just going to talk about the installation
[01:21:59.480 --> 01:22:00.240]   of any agent.
[01:22:00.240 --> 01:22:02.840]   But the question was completely different.
[01:22:02.840 --> 01:22:06.800]   So this, to me, is still part of knowing the tools
[01:22:06.800 --> 01:22:07.840]   that you are using, right?
[01:22:07.840 --> 01:22:10.560]   So whenever you use an inference engine,
[01:22:10.560 --> 01:22:16.800]   whether it's llama file or llama or llama CPP or LM Studio,
[01:22:16.800 --> 01:22:21.800]   always look for the context size because if that's too small,
[01:22:21.800 --> 01:22:24.000]   your agent is just not going to work.
[01:22:24.000 --> 01:22:27.040]   For some tools, it's going to crash because it just goes,
[01:22:27.040 --> 01:22:30.000]   let's say, out of memory or out of context.
[01:22:30.000 --> 01:22:32.480]   It doesn't break in a bad way.
[01:22:32.480 --> 01:22:34.400]   It will just tell you, no, no, no, no.
[01:22:34.400 --> 01:22:36.480]   I just went over the context size.
[01:22:36.480 --> 01:22:38.920]   I'm not going to add new stuff to the context.
[01:22:38.920 --> 01:22:40.760]   In other cases, it will tell you nothing
[01:22:40.760 --> 01:22:43.320]   like what happened here with llama.
[01:22:43.320 --> 01:22:48.480]   And it will just go on simply working worse than it could.
[01:22:48.480 --> 01:22:51.760]   And this, again, is not fair when you compare it
[01:22:51.760 --> 01:22:54.480]   to a commercial service because in that case,
[01:22:54.480 --> 01:22:57.080]   they already have all the engineering made
[01:22:57.080 --> 01:23:00.040]   to work with these things properly.
[01:23:00.040 --> 01:23:01.920]   So we did the context.
[01:23:01.920 --> 01:23:04.160]   We did the model expressiveness.
[01:23:04.160 --> 01:23:06.880]   Next step is strawberry fields forever.
[01:23:06.880 --> 01:23:11.680]   This is something that happens in the background when
[01:23:11.680 --> 01:23:15.400]   you run the same question we ran before about how many
[01:23:15.400 --> 01:23:19.360]   hours are in strawberries with the older Q3 8 billion
[01:23:19.360 --> 01:23:21.280]   with think mode activated.
[01:23:21.280 --> 01:23:25.400]   So some models are set to overthink.
[01:23:25.400 --> 01:23:27.960]   This is one example of what it means to overthink.
[01:23:27.960 --> 01:23:31.040]   So if you ever read about a model that thinks a lot,
[01:23:31.040 --> 01:23:33.000]   this is what happens in the background.
[01:23:33.000 --> 01:23:34.640]   And it's terrible.
[01:23:34.640 --> 01:23:37.800]   This is one example that I highlighted.
[01:23:37.800 --> 01:23:40.880]   So strawberry breaks down as ba, ba, ba, ba.
[01:23:40.880 --> 01:23:45.480]   So that's one R in straw and one R in arbery.
[01:23:45.480 --> 01:23:48.200]   And wait, maybe I should split it properly.
[01:23:48.200 --> 01:23:51.400]   And the only reason why we got the right answer
[01:23:51.400 --> 01:23:54.640]   was that at some point, the think mode ended
[01:23:54.640 --> 01:23:57.000]   and the tool decided to call the tool
[01:23:57.000 --> 01:24:01.080]   and agreed that it had to trust the tool answer and return 5.
[01:24:01.080 --> 01:24:03.920]   Otherwise, the answer would have been bad again.
[01:24:03.920 --> 01:24:06.440]   So the tools are a good kind of guardrail
[01:24:06.440 --> 01:24:11.880]   to allow you to get better results whenever you are asking
[01:24:11.880 --> 01:24:13.040]   a question to your agent.
[01:24:13.040 --> 01:24:17.480]   These tools can also be used to get
[01:24:17.480 --> 01:24:20.680]   better model-specific insights.
[01:24:20.680 --> 01:24:23.240]   I told you about Q3, the small Q3s.
[01:24:23.240 --> 01:24:24.520]   They're a bit dumb.
[01:24:24.520 --> 01:24:28.720]   I told you about the bigger countries overthink.
[01:24:28.720 --> 01:24:30.760]   This one is on GPT-OSS.
[01:24:30.760 --> 01:24:32.720]   It always searches on the web.
[01:24:32.720 --> 01:24:36.640]   So if GPT-OSS, which is a pretty powerful model, especially
[01:24:36.640 --> 01:24:41.480]   for the time when it came out, it was about one year ago,
[01:24:41.480 --> 01:24:44.840]   it always searches for information on the web.
[01:24:44.840 --> 01:24:47.080]   It feels like it has been trained
[01:24:47.080 --> 01:24:50.800]   to use search as much as possible
[01:24:50.800 --> 01:24:53.120]   to make sure that the information it gives you
[01:24:53.120 --> 01:24:56.240]   is always up to date and not hallucinated.
[01:24:56.240 --> 01:24:58.360]   So first of all, it needs context,
[01:24:58.360 --> 01:25:01.960]   because everything it gets from a search,
[01:25:01.960 --> 01:25:04.360]   it has to be put back into its outputs,
[01:25:04.360 --> 01:25:06.600]   and so it has to pass through the context.
[01:25:06.600 --> 01:25:09.400]   But also, even for simple information,
[01:25:09.400 --> 01:25:14.720]   like here we were asking for TV shows and the exact release
[01:25:14.720 --> 01:25:17.080]   date, genre, and so on.
[01:25:17.080 --> 01:25:19.720]   So the first query was training TV shows.
[01:25:19.720 --> 01:25:22.520]   And then once it got a list of TV shows, one of them
[01:25:22.520 --> 01:25:25.120]   was "The Last of Us" season 2, it
[01:25:25.120 --> 01:25:28.720]   looked for the release date of that specific show.
[01:25:28.720 --> 01:25:32.360]   Then probably not finding it or willing to look for the others,
[01:25:32.360 --> 01:25:35.880]   it was searching for training TV shows list release date.
[01:25:35.880 --> 01:25:39.360]   So you can see how much that model depended on search
[01:25:39.360 --> 01:25:40.480]   for every single thing.
[01:25:40.480 --> 01:25:43.120]   Everything it didn't find explicitly,
[01:25:43.120 --> 01:25:46.280]   it was searching for this, and again and again.
[01:25:46.280 --> 01:25:49.840]   What happens if you take away the search tool?
[01:25:49.840 --> 01:25:51.720]   This is the result.
[01:25:51.720 --> 01:25:54.240]   It hits Wikipedia like crazy.
[01:25:54.240 --> 01:25:56.320]   So it could still access websites
[01:25:56.320 --> 01:25:58.480]   by doing fetch web page.
[01:25:58.480 --> 01:26:01.160]   It couldn't search, so it just made up
[01:26:01.160 --> 01:26:03.360]   a lot of Wikipedia page titles.
[01:26:03.360 --> 01:26:05.800]   Some of them were valid, some others weren't.
[01:26:05.800 --> 01:26:08.720]   Like these 404s are made up Wikipedia page
[01:26:08.720 --> 01:26:10.760]   that are not existing.
[01:26:10.760 --> 01:26:12.760]   And for each of them, it was just
[01:26:12.760 --> 01:26:17.040]   trying to get release date or extra information.
[01:26:17.040 --> 01:26:20.280]   This is kind of important, right?
[01:26:20.280 --> 01:26:22.880]   You have an agent that runs stuff for you automatically,
[01:26:22.880 --> 01:26:25.560]   and for one question you ask, it's
[01:26:25.560 --> 01:26:29.960]   hitting Wikipedia like 20 something times.
[01:26:29.960 --> 01:26:31.600]   Is Wikipedia happy about this?
[01:26:31.600 --> 01:26:32.680]   Of course not.
[01:26:32.680 --> 01:26:34.640]   Like about one year ago, Wikipedia
[01:26:34.640 --> 01:26:38.600]   decided to close its access to user agents that were not
[01:26:38.600 --> 01:26:40.160]   the browser, basically.
[01:26:40.160 --> 01:26:44.240]   So if you have an agent that tries to directly hit Wikipedia,
[01:26:44.240 --> 01:26:47.520]   you're going to have a hard time making it work.
[01:26:47.520 --> 01:26:50.320]   And there's an extra information, which is--
[01:26:50.320 --> 01:26:53.880]   not information, but thought you should think about,
[01:26:53.880 --> 01:26:57.760]   which is who owns these search tools.
[01:26:57.760 --> 01:27:00.080]   In this case, it was Ollama.
[01:27:00.080 --> 01:27:03.520]   So when OpenAI and Ollama made a deal
[01:27:03.520 --> 01:27:10.680]   to release GPT-OSS as at last an OpenAI, open-weights model that
[01:27:10.680 --> 01:27:14.800]   runs locally, well, the thing stopped to be local.
[01:27:14.800 --> 01:27:17.760]   So Ollama added an extra parameter
[01:27:17.760 --> 01:27:21.120]   in its configuration that was called the airplane mode that
[01:27:21.120 --> 01:27:24.720]   was off by default. So the offline inference server
[01:27:24.720 --> 01:27:28.800]   was not offline anymore, was online by default all the time.
[01:27:28.800 --> 01:27:32.120]   And it offered a built-in web search
[01:27:32.120 --> 01:27:34.480]   that, in that case, can be optionally enabled
[01:27:34.480 --> 01:27:37.200]   to augment the model with the latest information that
[01:27:37.200 --> 01:27:41.920]   was functional to making GPT-OSS work because they themselves
[01:27:41.920 --> 01:27:47.280]   knew that GPT-OSS hit search engines like crazy.
[01:27:47.280 --> 01:27:51.440]   At that point, though, you might wonder,
[01:27:51.440 --> 01:27:53.960]   where does my information go?
[01:27:53.960 --> 01:27:55.760]   Where do my queries end up?
[01:27:55.760 --> 01:27:57.760]   And then you have to look into the code.
[01:27:57.760 --> 01:28:01.400]   And sadly, the Ollama part of the code
[01:28:01.400 --> 01:28:03.960]   that dealt with these things was not open source.
[01:28:03.960 --> 01:28:07.080]   So you couldn't really know where your information went.
[01:28:09.840 --> 01:28:13.200]   OK, I don't have conclusions yet.
[01:28:13.200 --> 01:28:15.880]   Meaning, yes, I want to leave you some time for questions.
[01:28:15.880 --> 01:28:18.680]   But I want to show you a couple of things more.
[01:28:18.680 --> 01:28:22.360]   It's like, what else can I do with these tools, right?
[01:28:22.360 --> 01:28:25.640]   I told you I would have shown you a couple more things.
[01:28:25.640 --> 01:28:30.240]   One thing I did--
[01:28:30.240 --> 01:28:35.200]   and this is kind of related to hitting Wikipedia like crazy--
[01:28:35.200 --> 01:28:38.320]   was I checked out a file format called Zim.
[01:28:38.320 --> 01:28:39.880]   I don't know if you know that.
[01:28:39.880 --> 01:28:42.280]   If you connect to the Wikix.
[01:28:42.280 --> 01:28:46.560]   Oh, Kiwix.
[01:28:46.560 --> 01:28:49.160]   OK, I was misspelling it, sorry.
[01:28:49.160 --> 01:28:52.280]   Kiwix is an application and a format
[01:28:52.280 --> 01:28:58.360]   that allows you to access Wikipedia offline.
[01:28:58.360 --> 01:29:01.280]   And there's a no-profit working on this.
[01:29:01.280 --> 01:29:04.320]   And it basically allows you-- let
[01:29:04.320 --> 01:29:06.920]   me just check if I can find more information about the Zim
[01:29:06.920 --> 01:29:10.520]   format itself on the fly.
[01:29:10.520 --> 01:29:12.960]   And you can download--
[01:29:12.960 --> 01:29:15.440]   of course, not right now.
[01:29:15.440 --> 01:29:19.160]   And oh, this time at least it managed to hit Wikipedia.
[01:29:19.160 --> 01:29:20.600]   I'm going to just tell you.
[01:29:20.600 --> 01:29:25.400]   So you can download the whole dump of Wikipedia.
[01:29:25.400 --> 01:29:26.800]   If you take the full version, it's
[01:29:26.800 --> 01:29:29.200]   like 50-something gigabytes.
[01:29:29.200 --> 01:29:30.720]   But you can have a smaller version
[01:29:30.720 --> 01:29:33.600]   of that with just the text and no images, for instance,
[01:29:33.600 --> 01:29:37.600]   for, I would say, a dozen gigabytes probably.
[01:29:37.600 --> 01:29:40.680]   And you can make it available locally on your disk.
[01:29:40.680 --> 01:29:47.120]   And there's an MCP tool that's called Zim MCP Server that
[01:29:47.120 --> 01:29:49.640]   looks for Zim files in the directory
[01:29:49.640 --> 01:29:51.880]   and that can give you information that's
[01:29:51.880 --> 01:30:03.080]   taken live from [INAUDIBLE] on your disk to [INAUDIBLE]
[01:30:03.080 --> 01:30:05.640]   This is a friend of mine who's on Wikipedia
[01:30:05.640 --> 01:30:09.520]   and whose birthdate is not known by models usually
[01:30:09.520 --> 01:30:12.720]   because it's not as famous as Alan Turing, I would say.
[01:30:12.720 --> 01:30:14.680]   But at the same time, whenever you
[01:30:14.680 --> 01:30:17.760]   make this agent available with access to Wikipedia,
[01:30:17.760 --> 01:30:20.360]   it will find the right answer.
[01:30:20.360 --> 01:30:22.120]   So this is one example of things that you
[01:30:22.120 --> 01:30:25.240]   can run offline even with a small model
[01:30:25.240 --> 01:30:27.840]   with a pretty good performance.
[01:30:27.840 --> 01:30:31.200]   Another thing is an agent that I built on my note-taking tool,
[01:30:31.200 --> 01:30:32.360]   which is called Joplin.
[01:30:32.360 --> 01:30:34.400]   I don't know if you know that.
[01:30:34.400 --> 01:30:37.040]   It's not important that you use Joplin specifically.
[01:30:37.040 --> 01:30:41.720]   I just wanted you to show what happens when you run this
[01:30:41.720 --> 01:30:43.400]   together with some information that's
[01:30:43.400 --> 01:30:49.360]   been nurtured and collected to provide a good knowledge base.
[01:30:49.360 --> 01:30:52.200]   So this is Joplin, the note-taking tool.
[01:30:52.200 --> 01:30:57.720]   And I built a tiny wiki inside it all speaking about--
[01:30:57.720 --> 01:31:01.040]   not all, mostly speaking about the LLama file project.
[01:31:01.040 --> 01:31:04.360]   So I can go on the LLama file section,
[01:31:04.360 --> 01:31:08.080]   check information about the GPU backends or the features
[01:31:08.080 --> 01:31:10.440]   that I'm implementing and everything.
[01:31:10.440 --> 01:31:14.720]   All of this has been built using a pattern that has been shared
[01:31:14.720 --> 01:31:17.000]   recently by Andrej Karpathy.
[01:31:17.000 --> 01:31:20.360]   And I think I have the actual link for that, which I think
[01:31:20.360 --> 01:31:21.680]   is super interesting.
[01:31:21.680 --> 01:31:25.800]   And I'm going to paste it in the chat.
[01:31:25.800 --> 01:31:28.440]   This is just a gist.
[01:31:28.440 --> 01:31:32.080]   And you can copy-paste this into Cloud, for instance,
[01:31:32.080 --> 01:31:35.880]   and ask Cloud to help you to build this LLM wiki, which
[01:31:35.880 --> 01:31:40.960]   is a wiki populated by LLMs accessing your notes
[01:31:40.960 --> 01:31:43.760]   and information and restructuring knowledge
[01:31:43.760 --> 01:31:47.600]   in a way that makes it more easily accessible to an agent.
[01:31:47.600 --> 01:31:49.120]   So I took this.
[01:31:49.120 --> 01:31:51.560]   I created a wiki about my project LLama file
[01:31:51.560 --> 01:31:53.960]   just by collecting all the notes that I had previously.
[01:31:53.960 --> 01:31:55.480]   And this is the wiki.
[01:31:55.480 --> 01:32:00.200]   And then I asked an agent to connect to Joplin,
[01:32:00.200 --> 01:32:02.320]   looking into the wiki, and tell which
[01:32:02.320 --> 01:32:06.320]   are the main issues that I fixed that relate to LLama file GPU
[01:32:06.320 --> 01:32:07.800]   acceleration.
[01:32:07.800 --> 01:32:10.440]   And the result is this one, which
[01:32:10.440 --> 01:32:14.920]   I believe is a pretty good summary of what I worked on
[01:32:14.920 --> 01:32:19.600]   with the sections that relate to the part of the documentation
[01:32:19.600 --> 01:32:20.920]   that I wrote.
[01:32:20.920 --> 01:32:24.320]   And all of this is stuff that you can run locally.
[01:32:24.320 --> 01:32:30.240]   In this case, I think I still use the LLama file QEM 3.59
[01:32:30.240 --> 01:32:31.000]   billion model.
[01:32:31.000 --> 01:32:32.280]   So it's a tiny model.
[01:32:32.280 --> 01:32:36.920]   You don't need a 30 billion parameter model for that.
[01:32:36.920 --> 01:32:40.240]   And I think this is a pretty decent result.
[01:32:40.240 --> 01:32:41.840]   Last but not least, because I don't
[01:32:41.840 --> 01:32:46.120]   want to keep you more than allowed or required,
[01:32:46.120 --> 01:32:47.600]   this is Spike.
[01:32:47.600 --> 01:32:51.000]   And I added it just one single extension,
[01:32:51.000 --> 01:32:53.960]   which is the CirX-NG extension.
[01:32:53.960 --> 01:32:56.600]   And I really wanted you to know about this,
[01:32:56.600 --> 01:33:00.360]   because before I told you who is owning your CirX engine.
[01:33:00.360 --> 01:33:02.720]   And I showed you the example with LLama,
[01:33:02.720 --> 01:33:05.160]   where you don't really know anything about that.
[01:33:05.160 --> 01:33:07.120]   I showed you another example, which
[01:33:07.120 --> 01:33:10.080]   is the Tabili example, where you know it's an API,
[01:33:10.080 --> 01:33:12.640]   but still you have to pay for that.
[01:33:12.640 --> 01:33:15.200]   And still, it's something you don't
[01:33:15.200 --> 01:33:17.760]   know how open or closed it is.
[01:33:17.760 --> 01:33:21.080]   There is this project called CirX-NG,
[01:33:21.080 --> 01:33:30.680]   which is CirX-NG, and it's on a GitHub repo.
[01:33:30.680 --> 01:33:34.400]   I'm going to paste this in the Slack channel too.
[01:33:34.400 --> 01:33:37.920]   And it's a free internet meta search engine,
[01:33:37.920 --> 01:33:41.600]   which aggregates results from various services and databases.
[01:33:41.600 --> 01:33:44.000]   Exactly my default search engine here,
[01:33:44.000 --> 01:33:48.880]   if I write David Einard, all this information
[01:33:48.880 --> 01:33:53.000]   is hitting one Raspberry Pi, another one, not the one
[01:33:53.000 --> 01:33:55.000]   that I couldn't access before.
[01:33:55.000 --> 01:34:01.440]   And these are the services that have been called right now.
[01:34:01.440 --> 01:34:03.560]   Right now, I have ran too many requests
[01:34:03.560 --> 01:34:05.760]   in the last short amount of time.
[01:34:05.760 --> 01:34:07.800]   So these two are suspended, but I
[01:34:07.800 --> 01:34:11.320]   hit .taco, Wikipedia, Startpage, and Google,
[01:34:11.320 --> 01:34:14.040]   got mixed information, aggregated all of them,
[01:34:14.040 --> 01:34:15.520]   and have them available.
[01:34:15.520 --> 01:34:18.520]   All of these runs even in a Docker image, if you want.
[01:34:18.520 --> 01:34:23.480]   And you can use an extension that runs in Pi, which
[01:34:23.480 --> 01:34:24.960]   you run from your terminal.
[01:34:24.960 --> 01:34:29.640]   And then you can say, OK, I'm going to spin up a model.
[01:34:29.640 --> 01:34:31.320]   Let me see what I'm running right now.
[01:34:31.320 --> 01:34:32.840]   This-- oh, nothing.
[01:34:32.840 --> 01:34:39.200]   So let's start Q3 9 billion again, 3.5 9 billion,
[01:34:39.200 --> 01:34:42.440]   which should be fast enough for us to do this thing.
[01:34:42.440 --> 01:34:45.800]   And then I'm asking, what are five qubit receivers
[01:34:45.800 --> 01:34:48.880]   to watch in the second half of 2026,
[01:34:48.880 --> 01:34:50.800]   just to make sure this is not part
[01:34:50.800 --> 01:34:55.080]   of any pre-trained data of the model?
[01:34:55.080 --> 01:34:58.400]   And these are searches it is running.
[01:34:58.400 --> 01:35:00.720]   It actually seems like it's not connecting
[01:35:00.720 --> 01:35:06.280]   to SuxNG at the moment.
[01:35:06.280 --> 01:35:09.000]   This is interesting.
[01:35:09.000 --> 01:35:11.640]   OK, let's drop it for now.
[01:35:11.640 --> 01:35:13.800]   The good thing is that it provides
[01:35:13.800 --> 01:35:16.640]   some interesting information about where to get it.
[01:35:16.640 --> 01:35:19.320]   Otherwise, I think it's mostly due to how
[01:35:19.320 --> 01:35:21.400]   I configure SuxNG in Pi.
[01:35:21.400 --> 01:35:23.560]   And so I don't want to mess with this now.
[01:35:23.560 --> 01:35:26.160]   But if you have any questions or follow-ups to this,
[01:35:26.160 --> 01:35:28.240]   please let me know in Slack.
[01:35:28.240 --> 01:35:30.760]   And I can provide you a working configuration for this,
[01:35:30.760 --> 01:35:31.920]   because I tested it before.
[01:35:31.920 --> 01:35:34.600]   And of course, before it was working.
[01:35:34.600 --> 01:35:36.080]   Let me go to the conclusions now.
[01:35:39.520 --> 01:35:45.600]   So we saw a variety of different approaches to creating agents.
[01:35:45.600 --> 01:35:48.480]   It's not just building agents from scratch with code.
[01:35:48.480 --> 01:35:51.760]   It's also using tools that rely on a genetic code.
[01:35:51.760 --> 01:35:53.760]   But the most important thing is that we
[01:35:53.760 --> 01:35:56.680]   knew some of the common characteristics
[01:35:56.680 --> 01:35:59.360]   between these different agents.
[01:35:59.360 --> 01:36:03.040]   So the callbacks, the logging, the choice
[01:36:03.040 --> 01:36:07.360]   of the model, the agentic loop, how the agent deals with errors
[01:36:07.360 --> 01:36:11.800]   by never breaking but providing the error back, and so on.
[01:36:11.800 --> 01:36:15.240]   So I think there's tinkering with AI.
[01:36:15.240 --> 01:36:17.880]   That is, tinkering by using AI, which
[01:36:17.880 --> 01:36:20.880]   is very similar to what I did before at the very beginning
[01:36:20.880 --> 01:36:23.680]   by trying to use Cloud to do reverse engineering
[01:36:23.680 --> 01:36:26.640]   of that Dremitalia API.
[01:36:26.640 --> 01:36:30.040]   And there's tinkering with AI, which is AI
[01:36:30.040 --> 01:36:32.200]   is the object of your tinkering.
[01:36:32.200 --> 01:36:35.600]   And while I haven't provided you with a one-size-fits-all
[01:36:35.600 --> 01:36:39.560]   solution for coding or writing any kind of agent,
[01:36:39.560 --> 01:36:43.280]   I think at least the approach of using the tools that you have,
[01:36:43.280 --> 01:36:46.640]   trying to know them better, to learn more
[01:36:46.640 --> 01:36:50.160]   is definitely the one that pays off in the longer term.
[01:36:50.160 --> 01:36:53.000]   And my comment here is you can solve things very quickly
[01:36:53.000 --> 01:36:54.040]   with a former.
[01:36:54.040 --> 01:36:55.800]   Just open Cloud, ask a question.
[01:36:55.800 --> 01:36:58.640]   It will very likely help you.
[01:36:58.640 --> 01:37:01.600]   It will very likely be very happy to help you,
[01:37:01.600 --> 01:37:02.960]   even for free right now.
[01:37:02.960 --> 01:37:06.440]   But at some point, you will hit a wall where it costs too much,
[01:37:06.440 --> 01:37:08.960]   or you depend on it too much, and so on.
[01:37:08.960 --> 01:37:11.800]   While with a later approach, you will learn more,
[01:37:11.800 --> 01:37:14.200]   and you will be more free in terms of the choices
[01:37:14.200 --> 01:37:17.280]   that you can make it at a later stage.
[01:37:17.280 --> 01:37:20.160]   Also, you have seen there's no one-size-fits-all solution.
[01:37:20.160 --> 01:37:24.160]   Depending on your task, the compute you have available,
[01:37:24.160 --> 01:37:26.920]   how the model was trained, whether it has tool support
[01:37:26.920 --> 01:37:30.320]   or not, you will see that choosing one model or another
[01:37:30.320 --> 01:37:32.760]   might be better.
[01:37:32.760 --> 01:37:35.440]   Easy-shmeasy, meaning very often you
[01:37:35.440 --> 01:37:38.120]   find tools which are friendly, but maybe they
[01:37:38.120 --> 01:37:40.440]   are not actually useful to you.
[01:37:40.440 --> 01:37:42.360]   Their UX might evolve too fast.
[01:37:42.360 --> 01:37:44.680]   Things might break.
[01:37:44.680 --> 01:37:47.640]   The defaults you have in the tool are not the best ones.
[01:37:47.640 --> 01:37:50.760]   Sometimes, not always, but at least sometimes,
[01:37:50.760 --> 01:37:53.280]   especially if you want to use this as a learning tool
[01:37:53.280 --> 01:37:56.120]   and not just as a tool that brings you the solution out
[01:37:56.120 --> 01:38:00.280]   of the box, devoting some time into setting things up
[01:38:00.280 --> 01:38:03.720]   from the bottom up or from scratch,
[01:38:03.720 --> 01:38:07.960]   even if it sounds a bit drastic, but let's say bottom up,
[01:38:07.960 --> 01:38:11.280]   is probably the approach that pays
[01:38:11.280 --> 01:38:14.200]   the most in the longer term.
[01:38:14.200 --> 01:38:19.040]   Think small, which means limit the growth of your contacts,
[01:38:19.040 --> 01:38:22.040]   having some memories that you can pick up at the later stage,
[01:38:22.040 --> 01:38:27.400]   having a compaction of your context might help.
[01:38:27.400 --> 01:38:30.720]   Starting with few tools brings you a very, very long way.
[01:38:30.720 --> 01:38:32.160]   Just think about how many things I
[01:38:32.160 --> 01:38:36.880]   showed you with just having a search and a fetch web page.
[01:38:36.880 --> 01:38:40.040]   This already does a lot.
[01:38:40.040 --> 01:38:44.840]   And there was a post by Corey Doctorow about the fact
[01:38:44.840 --> 01:38:47.120]   that LLMs are slot machines.
[01:38:47.120 --> 01:38:50.200]   And the main idea is that, as with slot machines,
[01:38:50.200 --> 01:38:52.360]   very often you remember the success,
[01:38:52.360 --> 01:38:54.760]   but you tend to forget the failures.
[01:38:54.760 --> 01:38:57.200]   So we are super excited about the things that work.
[01:38:57.200 --> 01:39:00.960]   And try and forget about the times it didn't work.
[01:39:00.960 --> 01:39:02.600]   I would extend these to agents.
[01:39:02.600 --> 01:39:04.240]   So it's not just about LLMs.
[01:39:04.240 --> 01:39:07.440]   Agents can make things a bit more reliable,
[01:39:07.440 --> 01:39:09.920]   but they are definitely relying so much on LLMs
[01:39:09.920 --> 01:39:14.080]   that we cannot consider them always fully reliable.
[01:39:14.080 --> 01:39:18.320]   So my suggestion here is to try and make them a little bit less
[01:39:18.320 --> 01:39:19.320]   like slot machines.
[01:39:19.320 --> 01:39:22.840]   So the successes you get are more frequent.
[01:39:22.840 --> 01:39:25.520]   So try and tune them with respect
[01:39:25.520 --> 01:39:29.160]   to the kind of problem you have so that it's not
[01:39:29.160 --> 01:39:32.160]   like every time you run them, they behave differently,
[01:39:32.160 --> 01:39:34.960]   but you have some baseline or some grounding
[01:39:34.960 --> 01:39:38.960]   you can use to make them behave more predictably.
[01:39:38.960 --> 01:39:41.760]   And last but not least, prevent ancientification.
[01:39:41.760 --> 01:39:44.320]   I know it's not a great term to use in talks,
[01:39:44.320 --> 01:39:47.280]   but I think it's probably now in everyone's vocabulary.
[01:39:47.280 --> 01:39:50.080]   Still by Corey Doctorow, if you haven't heard it before,
[01:39:50.080 --> 01:39:50.840]   just look for it.
[01:39:50.840 --> 01:39:53.040]   There's plenty of material about that.
[01:39:53.040 --> 01:39:58.040]   But in general, the approach is, which kind of data or freedoms
[01:39:58.040 --> 01:40:01.920]   or control are you giving away with each choice you make?
[01:40:01.920 --> 01:40:03.920]   And the choice could be the tools
[01:40:03.920 --> 01:40:09.760]   you're adding to your agents, like the search engine tool,
[01:40:09.760 --> 01:40:13.640]   or the model that you're using, or the code, the libraries
[01:40:13.640 --> 01:40:15.200]   that you're relying on.
[01:40:15.200 --> 01:40:17.200]   Try and understand whether at some point
[01:40:17.200 --> 01:40:20.160]   somebody will just turn that tab off
[01:40:20.160 --> 01:40:22.960]   and will make you unable to continue running your tools
[01:40:22.960 --> 01:40:25.680]   or not.
[01:40:25.680 --> 01:40:28.960]   Then this is just for you to think about in the longer term.
[01:40:28.960 --> 01:40:34.160]   What happens when you have an agent that just,
[01:40:34.160 --> 01:40:37.760]   without the need of REST, can go on the web
[01:40:37.760 --> 01:40:41.440]   and search for information until it finds it?
[01:40:41.440 --> 01:40:42.960]   This was one of the many examples
[01:40:42.960 --> 01:40:46.320]   I tried to run about getting somebody's birthdate.
[01:40:46.320 --> 01:40:49.440]   Sometimes I didn't give it my birthdate in a CSV file.
[01:40:49.440 --> 01:40:52.400]   I just said, go on the web and look for it.
[01:40:52.400 --> 01:40:55.040]   And I'm perfectly fine with this being around
[01:40:55.040 --> 01:40:57.680]   because I put it in my CD, which is online.
[01:40:57.680 --> 01:40:59.920]   But how much information can someone
[01:40:59.920 --> 01:41:03.600]   get with a tool that never gets tired and just goes on the web
[01:41:03.600 --> 01:41:06.000]   and looks for something?
[01:41:06.000 --> 01:41:08.720]   So I leave it to you as an open question.
[01:41:08.720 --> 01:41:10.720]   And I try to conclude with this.
[01:41:10.720 --> 01:41:14.800]   This is a choose your own adventure book-like choice.
[01:41:14.800 --> 01:41:22.160]   So I wanted to cite this power law of engagement
[01:41:22.160 --> 01:41:25.360]   that I found exactly 20 years ago on a book.
[01:41:25.360 --> 01:41:29.120]   Power law of participation, sorry.
[01:41:29.120 --> 01:41:33.920]   The idea is you can start with--
[01:41:33.920 --> 01:41:36.480]   and this was related especially to Wikipedia.
[01:41:36.480 --> 01:41:38.480]   You can read and you're already contributing
[01:41:38.480 --> 01:41:39.920]   because you are one of the many users.
[01:41:39.920 --> 01:41:42.800]   You can favorite something, tag, comment, subscribe,
[01:41:42.800 --> 01:41:45.600]   and so on until you go and lead projects.
[01:41:45.600 --> 01:41:49.840]   So I would say following this idea,
[01:41:49.840 --> 01:41:53.840]   you can just check out what we do at Mozilla AI GitHub org.
[01:41:53.840 --> 01:41:55.520]   You can play with different agents,
[01:41:55.520 --> 01:41:57.840]   even tools that we don't own.
[01:41:57.840 --> 01:42:00.320]   It's perfectly fine as long as they're open source.
[01:42:00.320 --> 01:42:01.280]   We're happy about that.
[01:42:01.280 --> 01:42:05.760]   Write your own any agent, test different models,
[01:42:05.760 --> 01:42:08.240]   try different tools and MCP servers,
[01:42:08.240 --> 01:42:11.680]   and finally host tools or services for your community.
[01:42:11.680 --> 01:42:12.800]   Anything works.
[01:42:12.800 --> 01:42:14.640]   Of course, you can also do none of those.
[01:42:14.640 --> 01:42:18.160]   Already being here has been a great experience for me,
[01:42:18.160 --> 01:42:20.400]   and I'm very happy that you joined this class.
[01:42:20.400 --> 01:42:24.080]   And these are some of the tools that we make available
[01:42:24.080 --> 01:42:28.240]   to allow people to tackle agentic coding
[01:42:28.240 --> 01:42:30.480]   at different levels of abstraction
[01:42:30.480 --> 01:42:33.680]   from agentic frameworks to choosing LLMs
[01:42:33.680 --> 01:42:36.960]   to hosting MCP tools, adding guardrails,
[01:42:36.960 --> 01:42:38.800]   or just run encoder models.
[01:42:39.360 --> 01:42:42.480]   And the last message I will give you all is like,
[01:42:42.480 --> 01:42:45.360]   be like Ada and tinker with stuff.
[01:42:45.360 --> 01:42:48.720]   Let us know how it works and have fun with this.
[01:42:48.720 --> 01:42:50.720]   And thanks a lot.
[01:42:50.720 --> 01:42:52.320]   I'm leaving the word to David,
[01:42:52.320 --> 01:42:55.360]   which maybe wants to greet you too.
[01:42:55.360 --> 01:42:57.120]   And thanks again for being here.
[01:42:57.120 --> 01:43:00.560]   I have nothing more to add.
[01:43:00.560 --> 01:43:03.280]   I am surprised by how long you can talk without drinking.
[01:43:06.400 --> 01:43:11.440]   Thanks, David. Now I will.
[01:43:11.440 --> 01:43:13.120]   I know we don't have a lot of time,
[01:43:13.120 --> 01:43:15.840]   but if you have any questions, we still have a few minutes.
[01:43:15.840 --> 01:43:22.480]   Sounds good.
[01:43:22.480 --> 01:43:28.880]   Sorry, just to check if there are any remaining questions from Slack.
[01:43:28.880 --> 01:43:43.040]   Okay, yes, I think your questions have been answered by David as well.
[01:43:43.040 --> 01:43:48.560]   So feel free to raise your hand and ask questions
[01:43:48.560 --> 01:43:50.480]   during before the end of the session.
[01:43:50.480 --> 01:43:56.400]   And just a reminder that we will have another office hour next week.
[01:43:56.400 --> 01:44:00.320]   So please feel free to practice through these tutorials
[01:44:00.320 --> 01:44:04.240]   and come back with more in-depth questions.
[01:44:04.240 --> 01:44:08.480]   I think there is one raised the hand by everyone.
[01:44:08.480 --> 01:44:11.920]   So please unmute to talk us through your questions.
[01:44:11.920 --> 01:44:12.560]   Hi David, can you hear me?
[01:44:12.560 --> 01:44:13.860]   Yeah.
[01:44:13.860 --> 01:44:17.120]   Well, first of all, thank you for the very great presentation.
[01:44:17.120 --> 01:44:23.680]   I have a question regarding memory as I'm finding it now kind of confusing.
[01:44:23.680 --> 01:44:26.160]   There are way too many options.
[01:44:26.160 --> 01:44:28.240]   I saw Karbati's wiki.
[01:44:28.240 --> 01:44:31.280]   I saw there are people using knowledge graphs.
[01:44:31.280 --> 01:44:34.640]   There are some people who don't use specialized tools
[01:44:34.640 --> 01:44:36.160]   and just leave everything in the context.
[01:44:36.160 --> 01:44:38.320]   So like, can you comment on this?
[01:44:38.320 --> 01:44:40.240]   Which tools do you think are better?
[01:44:40.240 --> 01:44:44.640]   Like what's the most efficient way to store memory in Asians?
[01:44:44.640 --> 01:44:48.240]   Really hard question.
[01:44:48.240 --> 01:44:54.560]   I guess the best answer that is not very helpful is you need to try it.
[01:44:54.560 --> 01:44:58.400]   That's why I like about building from scratch.
[01:44:58.400 --> 01:45:01.920]   Because in many times someone might recommend something
[01:45:01.920 --> 01:45:05.040]   that works great for the use case and you adopt it.
[01:45:05.040 --> 01:45:09.760]   But that system might be over complicated for your use case.
[01:45:09.760 --> 01:45:14.800]   So my personal approach is to start from the most simple approach.
[01:45:14.800 --> 01:45:17.280]   Like what I show, just tools.
[01:45:17.280 --> 01:45:19.760]   And then as I discover where are the gaps,
[01:45:19.760 --> 01:45:22.560]   where are the things that I need to improve, I improve that.
[01:45:22.560 --> 01:45:25.920]   And if I reach the point where I need a knowledge graph,
[01:45:25.920 --> 01:45:28.800]   then I will reach it and I will discover it.
[01:45:28.800 --> 01:45:32.480]   But I think probably the worst thing that you can do
[01:45:32.480 --> 01:45:34.400]   is to start with the most complex system.
[01:45:34.400 --> 01:45:38.480]   Yeah, sorry, I don't know if that helps.
[01:45:38.480 --> 01:45:39.920]   It helps a lot.
[01:45:39.920 --> 01:45:41.440]   Okay, I got you what you mean.
[01:45:41.440 --> 01:45:45.200]   I have one more small question.
[01:45:45.200 --> 01:45:46.480]   It's about costs.
[01:45:46.480 --> 01:45:48.560]   Like if I'm working with an element system,
[01:45:48.560 --> 01:45:53.280]   I'm always afraid that the users might abuse the system
[01:45:53.280 --> 01:45:56.960]   and I will have very high costs that I wouldn't expect.
[01:45:56.960 --> 01:45:58.960]   So what are some ways that I can limit this
[01:45:58.960 --> 01:46:00.640]   or control the cost of the system?
[01:46:00.640 --> 01:46:11.040]   So one way is depending on how much you want to invest
[01:46:11.040 --> 01:46:14.800]   in your own infrastructure, you can host your own models, right?
[01:46:14.800 --> 01:46:19.360]   So you fully own the infrastructure
[01:46:19.360 --> 01:46:20.960]   and you can optimize that.
[01:46:20.960 --> 01:46:24.640]   And then I guess depending on the use case,
[01:46:24.640 --> 01:46:27.600]   for example, now in one of the products that I am working,
[01:46:27.600 --> 01:46:31.040]   one way that we are figuring out how to reduce cost
[01:46:31.040 --> 01:46:34.480]   is actually use more specialized sub-agents.
[01:46:34.480 --> 01:46:39.680]   So give the user a really good default powerful agent
[01:46:39.680 --> 01:46:40.800]   that is the main driver.
[01:46:41.360 --> 01:46:46.160]   But then for a specialized task, like, I don't know, do stuff on a Slack,
[01:46:46.160 --> 01:46:50.960]   for example, we have a specialized agent that uses a much smaller model.
[01:46:50.960 --> 01:46:55.520]   And this is, I know, I don't know, this is one way of reducing cost,
[01:46:55.520 --> 01:46:58.560]   but it's an open problem.
[01:46:58.560 --> 01:47:03.120]   And this is also one of the reasons why we advocate for open source models,
[01:47:03.120 --> 01:47:09.920]   because the costs of today LLM providers like Anthropic and OpenAI
[01:47:09.920 --> 01:47:15.840]   can rise tomorrow because they are the ones fully owning it.
[01:47:15.840 --> 01:47:19.280]   And then if you are too attached to that, you have no alternative, right?
[01:47:19.280 --> 01:47:22.640]   If Anthropic starts charging whatever they want tomorrow,
[01:47:22.640 --> 01:47:24.560]   a lot of people will struggle.
[01:47:24.560 --> 01:47:29.200]   So the best option is to not be attached to any specific provider.
[01:47:29.200 --> 01:47:33.680]   Yeah, I don't have more advice.
[01:47:33.680 --> 01:47:36.800]   Okay, thank you very much. That was very helpful.
[01:47:39.360 --> 01:47:46.640]   And just to add one thing on this, I think that David raised something in both answers
[01:47:46.640 --> 01:47:48.480]   that, in my opinion, is quite important.
[01:47:48.480 --> 01:47:56.080]   Like, look and see what's best for you is, to me, is not an empty answer or too general.
[01:47:56.080 --> 01:47:59.120]   Like, it's really where it boils down to.
[01:47:59.120 --> 01:48:05.280]   And a process that's often taken by companies, especially startups,
[01:48:05.280 --> 01:48:10.080]   is let's apply the 80/20 rule, so let's try and find a solution
[01:48:10.080 --> 01:48:16.160]   that gets us there as soon as possible, and then try and reduce the costs, right?
[01:48:16.160 --> 01:48:22.800]   But one of the issues we have now is that when we do that with the current AI systems,
[01:48:22.800 --> 01:48:30.560]   it will work, but eventually it's going to be very hard to avoid being locked in
[01:48:30.560 --> 01:48:34.160]   unless you know very well exactly what you need.
[01:48:34.880 --> 01:48:40.800]   And by delving deeper and doing more about your use cases, what you need,
[01:48:40.800 --> 01:48:45.280]   what is the best model for that specific task that you need to do
[01:48:45.280 --> 01:48:47.360]   is going to be a good solution for you.
[01:48:47.360 --> 01:48:50.080]   And to some extent, Cloud itself is applying that.
[01:48:50.080 --> 01:48:56.800]   I have one example, which is if you look at the Cloud skill for code reviewing,
[01:48:56.800 --> 01:48:59.760]   you go and just look at the source code of it.
[01:48:59.760 --> 01:49:02.720]   It's going to be downloaded when you activate the plugin.
[01:49:02.720 --> 01:49:05.920]   And in the instructions, you will see that
[01:49:05.920 --> 01:49:09.520]   not all the tasks are assigned to the most powerful models.
[01:49:09.520 --> 01:49:16.000]   Like very often Haiku, which is much less powerful, is used to perform some of the tasks.
[01:49:16.000 --> 01:49:22.800]   So I think that is telling about the fact that even they try to customize models for the tasks.
[01:49:22.800 --> 01:49:26.880]   That's a brilliant idea.
[01:49:26.880 --> 01:49:28.160]   Thank you a lot.
[01:49:28.160 --> 01:49:29.120]   Thanks a lot for your session.
[01:49:29.120 --> 01:49:31.120]   Thank you very much.
[01:49:31.120 --> 01:49:44.640]   Can you hear me?
[01:49:44.640 --> 01:49:46.400]   Yes, I can.
[01:49:46.400 --> 01:49:49.680]   Hey, guys.
[01:49:49.680 --> 01:49:50.800]   Thank you for the session.
[01:49:50.800 --> 01:49:54.080]   Just one question or a thought that I thought to discuss.
[01:49:54.080 --> 01:49:57.440]   So we discussed certain tools and implementing like memory and other things.
[01:49:58.080 --> 01:50:02.640]   What we kind of think about the new SDKs or libraries that are in market as an open source
[01:50:02.640 --> 01:50:06.640]   core where these all things, specifically the contextual data, all your custom workflows
[01:50:06.640 --> 01:50:09.440]   can be implemented using those libraries or SDKs.
[01:50:09.440 --> 01:50:14.320]   So you don't have to take care about all the implementation of an orchestrator or your
[01:50:14.320 --> 01:50:14.800]   workflows.
[01:50:14.800 --> 01:50:22.560]   And so, for example, I'm talking about like recently taken or taken a line by line graph
[01:50:22.560 --> 01:50:27.120]   where you can have an orchestrator and multiple workflows that can act as your tools and the
[01:50:27.120 --> 01:50:32.080]   orchestrator can decide what workflow to follow with respect of maintaining all the audit
[01:50:32.080 --> 01:50:34.800]   frames and even the history, even retry mechanisms.
[01:50:34.800 --> 01:50:36.320]   So what do you think about all these tools?
[01:50:36.320 --> 01:50:42.800]   Like are they sufficient enough to include them when working on systems like this?
[01:50:42.800 --> 01:50:43.200]   Thank you.
[01:50:43.200 --> 01:50:46.640]   Thanks a lot for the question.
[01:50:46.640 --> 01:50:51.840]   I think it's very much everyone's choice, right?
[01:50:51.840 --> 01:50:57.520]   So I'm not against using tools that make your life easier, and I'm perfectly fine using
[01:50:57.520 --> 01:50:58.020]   any.
[01:50:58.020 --> 01:51:05.600]   I think the main reason for us for like really trying to do things from scratch in this class
[01:51:05.600 --> 01:51:09.520]   is because otherwise you probably wouldn't have many chances to do that on your own,
[01:51:09.520 --> 01:51:10.080]   right?
[01:51:10.080 --> 01:51:11.920]   And I understand that it's perfectly fine.
[01:51:11.920 --> 01:51:18.720]   Like to me, it makes a lot of sense, especially in your work, not to have to record everything
[01:51:18.720 --> 01:51:20.880]   from scratch, but to know like how things work.
[01:51:21.600 --> 01:51:31.600]   So I would say just go ahead with this and just keep an eye on whether making those decisions
[01:51:31.600 --> 01:51:37.520]   is kind of locking you into any particular decision or solution in the longer term.
[01:51:37.520 --> 01:51:40.880]   This is the general approach that I would apply.
[01:51:40.880 --> 01:51:48.720]   Otherwise, just go with it and from time to time review your decision, check if anything
[01:51:48.720 --> 01:51:54.560]   changed, check if anything is working better, or maybe if you see any pattern emerging that
[01:51:54.560 --> 01:51:55.920]   doesn't really work well for you.
[01:51:55.920 --> 01:52:00.640]   Especially if there are libraries which are open source libraries, you can be a part of
[01:52:00.640 --> 01:52:05.200]   the community and try and give your feedback, and so you can even make sure that whatever
[01:52:05.200 --> 01:52:07.760]   you use is improving together with how you use it.
[01:52:07.760 --> 01:52:16.000]   But again, it very much depends on your use case, so I would summarize it with no judgment.
[01:52:16.000 --> 01:52:22.080]   I use whatever works for you, and just try and apply the more general approach of checking
[01:52:22.080 --> 01:52:26.640]   out from time to time and being mindful about the choices that you're making, how they
[01:52:26.640 --> 01:52:30.880]   power you, and how they lock you somehow.
[01:52:30.880 --> 01:52:36.720]   Thanks John.
[01:52:36.720 --> 01:52:54.700]   [ Silence ]