
Block by Block: A Show on Web3 Growth Marketing
Each week, I sit down with the innovators and builders shaping the future of crypto and web3.
Growth isn’t a sprint; it’s a process—built gradually, step by step, block by block.
Let’s build something incredible, together. All onchain.
Jean Herelle - CrunchDAO, Alpha-generating Insights from Decentralized Machine Learning, AI, and Data Scientists
In this conversation, Jean Herelle of CrunchDAO shares his journey into the Web3 space, from his background in econometrics and computer science to the inception of CrunchDAO as a two-sided marketplace connecting data providers and machine learning engineers. He discusses privacy-enhancing technologies, the role of coordinators, and how machine learning models can create recurring income for data scientists, and he highlights the importance of decentralization, funding strategy, and the unique community CrunchDAO fosters, emphasizing real-world demand for its solutions. Jean also covers new ways to contribute to the platform, how data quality is enforced through incentive mechanisms, the design of effective coordination mechanisms, engagement with healthcare organizations, the challenges of data preparation, lessons from the closed beta, marketing strategies for growth, the future of AI agents on Crunch, and the breadth of predictive use cases. He emphasizes the need to build a new category in the data science space, positioning Crunch as a leader in this emerging field.
Takeaways
- Jean's background in econometrics and computer science led him to Web3.
- The need for decentralized currency was evident in Taiwan in 2014.
- CrunchDAO connects data monopolies with skilled data scientists.
- Privacy-enhancing techniques allow data sharing without revealing sensitive information.
- The role of coordinators is crucial in building products on CrunchDAO.
- Machine learning models can create recurring revenue for data scientists.
- Decentralization is key to scaling the CrunchDAO protocol.
- Funding from VCs helped build the CrunchDAO protocol.
- Building trust in a two-sided market is essential for success.
- The community plays a vital role in the development and signaling of the protocol.
- Crunch is focused on identifying future value creators.
- Integrating LLMs with unstructured data (Pi) lets non-coders contribute time series.
- User engagement has already produced roughly 36,000 questions through Pi.
- Data coordinators play a crucial role in data usability.
- Quality assurance is incentivized through financial penalties: coordinators lock rewards up front, so poor data costs them directly (see the sketch after this list).
- Coordination mechanisms are designed based on internal experience.
- The beta period will help refine the coordination process.
- Data preparation challenges require skilled personnel.
- AI agents could enhance productivity in data science.
- Predictive tasks are essential for autonomous decision-making.
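To make the "skin in the game" takeaway concrete, here is a minimal Python sketch of how a locked-reward payout of that kind could work. It is only an illustration under simplified assumptions; the class, field names, and numbers are hypothetical and are not CrunchDAO's actual smart-contract interface.

```python
# A minimal sketch of the "skin in the game" mechanism described in the episode:
# a coordinator locks a reward pool up front, and a slice of it is spent every
# scoring period regardless, so unusable data wastes the coordinator's own money.
# Class and field names are illustrative assumptions, not CrunchDAO's contracts.
from dataclasses import dataclass


@dataclass
class Crunch:
    locked_reward: float      # total reward the coordinator locked when launching
    payout_per_period: float  # slice distributed each scoring period (week or day)

    def distribute(self, scores: dict[str, float]) -> dict[str, float]:
        """Pay this period's slice to crunchers in proportion to their scores.

        The slice leaves the locked pool either way: if the data was too poor to
        produce useful models, the coordinator still pays, which is the penalty.
        """
        amount = min(self.payout_per_period, self.locked_reward)
        self.locked_reward -= amount
        total = sum(scores.values())
        if total <= 0:
            return {}  # nothing usable was produced; the slice is simply spent
        return {name: amount * s / total for name, s in scores.items()}


pool = Crunch(locked_reward=10_000.0, payout_per_period=1_000.0)
print(pool.distribute({"alice": 0.8, "bob": 0.2}))  # usable data: rewards flow to modelers
print(pool.distribute({"alice": 0.0, "bob": 0.0}))  # unusable data: the locked money is still spent
```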
Follow me @shmula on X for upcoming episodes and to get in touch with me.
We are rolling. Welcome, Jean Herelle from CrunchDAO.

Hi Peter.

For those who have not heard of CrunchDAO, we'd love to hear a little bit about it. But before we do, we'd love to hear your origin story. Like most folks who enter the Web3 space with a strong quantitative background, I think yours will be very interesting to the rest of us. Tell us your origin story.

Yeah, how far should I go? My background is in econometrics and statistics, and then I went into computer science, which I studied in Taiwan, in Asia. That's where I bought my first Bitcoin, in 2014. In Taiwan the use case for a decentralized currency was very obvious back then, because of China's position toward the island and the possibility that China could attack Taiwan at any time. If you were someone whose ideas went against that, you'd better have an option so that, if China blocked your bank account, you could continue to survive and maybe even leave the country. So there was a real use case for crypto. That's how I started to get super interested in it, because it made a lot of sense, and it also made a lot of sense for me, having done econometrics and stats on a lot of papers and a lot of data provided by the university, where there is not much transparency in how that data was created and collected. In this type of decentralized economy, where we have a decentralized currency, everything is on the record. So you have access to untouched, unprocessed, very pure data sets, which is very good when you work on building models or when you try to understand what happened. I found that just super interesting.

So the story started there. After that, I went back to France and worked for an insurance company, building machine learning models on their risk portfolio: trying to model the risk of different types of customers depending on their background, their history, where they come from, what they want to insure, et cetera. Those models were then used to build a price that took all of these parameters into account. That was 2016. The company got acquired by the European leader in that field, and the founder of that company had seen what I could do with the models, so he proposed that we build a quant hedge fund using similar techniques. He was himself a banker, he had worked at Lehman Brothers, and he always wanted to become a quant. He saw that these techniques could change the way things were done. So I started to implement them for what back then was called DataCrunch, which still exists and which we still work with.

And it was a big mistake. I quickly realized that building models on the stock market, on very noisy data, is completely different. It was a very bad idea. I ended up completely out of my depth: very complicated, very easy to overfit, and kind of overwhelming. So I decided to organize a first, let's say, hackathon, to see if I could find someone who could do better than me. In that first session, about 14 people joined.
Those are the first 14 people of the DAO, and if we combined all of their models into one, they were already building something better than what I had built alone over the previous six months. So I had to accept that I was not that good a modeler, and that I'd better find a way out of the position I was in. In the following weeks, for maybe 50 or 60 weeks in a row, we came up with new data sets and tried to gather more people. It was really one by one. We went from 14 to almost 2,000 people, and we finally got an algorithm that was stable and strong enough to build a quant hedge fund on top of it.

What people don't realize about quant hedge funds is that they don't scale as well as people believe. Either there is not enough volume to scale the fund and invest enough to live off it, or you need a lot of AUM, a lot of money under management, to finally make a living, because you live off 20% of your performance, so you need a substantial amount of capital to actually make a good living out of it. Hedge funds just don't scale as well as people say. So while scaling that fund, we decided to offer the tool to more people: the platform we had built, which was capable of gathering the predictions of all these participants, the crunchers, and turning them into a product. And it worked pretty well. As soon as we opened the platform, we were contacted by ADIA Lab, the research lab of the Abu Dhabi sovereign wealth fund in the UAE, which was looking into leveraging the same kind of crowdsourced machine learning models. That's how the story really went from zero to one hundred. Now we have about 7,000 machine learning engineers and data scientists on the platform, including 1,200 PhDs. We work with several different customers and we are starting to see real traction, especially in finance, where people need better models and are really looking for more accuracy in their work.

Great. Now tell us about CrunchDAO. On its face it's a two-sided market, right? You've got data modelers on one end and... well, tell us more about CrunchDAO. Let's try it this way: how would you explain CrunchDAO to your mother?

Yeah, I think she still doesn't understand, actually, but I'm going to try my best. So, how things work is that you have two kinds of people in Crunch. On one hand, you have people who have very expensive data but cannot share it. Why is it so expensive? Because it has predictive power. When a data set can tell you something about the future, it becomes very valuable. You can even call them data monopolies. On the other side, you have researchers with a lot of skill who cannot access data with predictive capability, because as soon as data has predictive capability, it becomes very expensive. What Crunch does is enable these two parties to meet, by allowing the data monopolies to obfuscate their data, using privacy-enhancing techniques that hide the meaning of the data so they can share it openly.
And on the other side, the crunchers — the data scientists and machine learning engineers — can finally work on very clean data that has predictive power, so they can use the techniques they are really good at to help these monopolies get more out of the data they hold or acquire at a very high price. We are a kind of trusted third party sitting in the middle, enabling this new kind of market, and we can build all sorts of new products on top of this interaction. Often the people holding the data also have a lot of difficulty attracting people with these skills, and that's something Crunch solves too. And on the other hand, some very skilled people struggle to find opportunities, maybe because they don't work in the US or in New York where the hedge funds are, so they don't get to work on this kind of data or create this kind of revenue. Crunch sits in the middle. We collect the code: when someone wants to participate, they push their code, and the customer can then, through our API, trigger that code and receive the model's predictions. So the customer never gets the model, and on the other side CrunchDAO and the cruncher never know what the data is about, because it's obfuscated. Nobody has the full equation, nobody has all the pieces, so nobody can do anything alone; it's only by meeting together that the value gets created.

That's very interesting. For the data providers, the data is obfuscated through privacy-enhancing technologies. Is it something like ZK or multi-party computation? Maybe share a little bit about that.

It could be, but for the moment it's very simple: we use differential privacy, which is the simplest privacy-enhancing technique you'll find. Let me give you a concrete example from one of the products we're currently building, a mid-market price for forex. The way that company's data is obfuscated is that we turn the actual problem of orders and a mid-market price into what we call Birds. The game is called Birds: you have doves and hawks traveling along a wire, and your goal is to use machine learning to predict where the doves and hawks are going to go. You completely destroy the real meaning of the data. You transform it into a game that cannot be reverse-engineered to figure out which currency you're working with, or at what granularity, which makes it almost impossible to reverse-engineer the data or use it for anything else. It's very robust. It's actually used by all the large tech companies, from LinkedIn to Meta: as soon as you work with, say, personal data and you still want to compute on it, you'll find this kind of technique everywhere. I know it's also used at Apple; they have a manifesto about it. It's very simple and very lightweight, because you just transform the data, discretize it, maybe add a bit of noise. It allows very fast computation, so you don't lose speed. You just lose the understanding of what the data is about, and that's a deliberate trade-off.
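As a rough illustration of the kind of transform Jean describes (rescale, discretize, add a bit of noise, strip the meaningful labels), here is a small Python sketch. It assumes a generic differential-privacy-style obfuscation; the epsilon value, bin count, and column names are made up, and this is not CrunchDAO's actual pipeline.

```python
# Generic illustration of "discretize, add noise, hide the labels".
# Not CrunchDAO's pipeline; epsilon, bins, and names are illustrative assumptions.
import numpy as np
import pandas as pd


def obfuscate(df: pd.DataFrame, epsilon: float = 1.0, bins: int = 50) -> pd.DataFrame:
    rng = np.random.default_rng(seed=42)
    out = {}
    for i, col in enumerate(df.columns):
        x = df[col].to_numpy(dtype=float)
        # Rescale so the original units (prices, volumes, ...) are no longer recoverable.
        x = (x - x.mean()) / (x.std() + 1e-9)
        # Discretize into coarse bins.
        x = np.digitize(x, np.linspace(x.min(), x.max(), bins)).astype(float)
        # Add Laplace noise calibrated by a privacy budget epsilon (sensitivity ~ 1 bin).
        x = x + rng.laplace(scale=1.0 / epsilon, size=x.shape)
        # Replace the real column name with a meaningless one ("feature_0" instead of "EURUSD_mid").
        out[f"feature_{i}"] = x
    return pd.DataFrame(out, index=range(len(df)))


raw = pd.DataFrame({"EURUSD_mid": [1.081, 1.083, 1.079], "order_imbalance": [0.2, -0.1, 0.4]})
print(obfuscate(raw))
```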
And in our case, we really like it because if you work on financial data and remove all the meaning from it, it actually lowers the barrier to using the data. Everyone understands a bird flying along a wire. It would be much more complex if we were talking about what is actually behind the data, what kind of maturities and contracts are involved. It creates a layer, and this layer can be used in building the model. We really want people to use only quantitative techniques and only machine learning to build their models, so they are not influenced by some non-quantitative bias that gives them the impression they understand the data. That's not what we want. We want pure machine learning.

That makes sense. So on the one hand you've got data providers — what are they called in the marketplace? Just data providers?

They have a new name now: coordinators. Coordinators are the people responsible for bringing customers onto the network. Sometimes the coordinator is the customer himself, if he can onboard himself; sometimes it's a third party that helps the customer onboard.

Got it. And they have a specific question they want answered. Or maybe they're just curious: "I have this data set, I wonder what kind of predictions could be made from it." Is that another type of question they might ask?

Maybe in the future — that would be super interesting, by the way — but for the moment it's more specific. You would say: this is a data set, you don't know what it is, and this is a target, you don't know what it is either. Let's say it's a classification problem: you have to classify these rows, you can train on all of this data, and you need to be good at it. Then once you've built a model, your model faces live data, the customer's live data, and that's where the scoring happens and where the rewards get distributed.

I see. So it's a two-sided market: data providers, or coordinators, and then data scientists on the other side who apply models to the data to help answer the question. And it's a competition, is that correct?

Yeah, absolutely. It's super important to have this competitive framework. But at the same time, we see it as a new way for machine learning engineers and data scientists to create wealth. These guys can push some code to the platform and create side revenue that is recurring, because once your model is being used by a customer, your IP is being monetized. If you have one model, it's nice; but if you have ten models running for ten different customers, it starts to get very interesting. We have some crunchers who actually do this full time now: they watch their models run and make sure they run properly. What's fascinating for us is that I believe we have the opportunity here to create a new kind of work, a new way to earn a living, which doesn't depend on going to work somewhere, doing the politics in the company, taking the metro, doing a nine-to-five.
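Here is a minimal sketch of the live-scoring loop described above: submitted models are trained on an obfuscated history, then triggered on data that arrives after submission and scored against realized targets, so only genuinely predictive models keep earning. The metric, function names, and toy models are illustrative assumptions, not CrunchDAO's actual API.

```python
# Minimal live-scoring sketch: submitted models are scored on out-of-sample data
# each period. Metric and names are assumptions, not CrunchDAO's actual interface.
import numpy as np


def score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Spearman-style rank correlation between predictions and realized targets."""
    ranks_true = y_true.argsort().argsort()
    ranks_pred = y_pred.argsort().argsort()
    return float(np.corrcoef(ranks_true, ranks_pred)[0, 1])


def weekly_round(models: dict, live_features: np.ndarray, live_target: np.ndarray) -> dict:
    """Trigger every submitted model on this period's live data and return its score."""
    return {name: score(live_target, predict(live_features)) for name, predict in models.items()}


# Two toy "submitted models": closures standing in for code pushed to the platform.
models = {
    "momentum_cruncher": lambda X: X[:, 0],  # bets on the first (obfuscated) feature
    "noise_cruncher": lambda X: np.random.default_rng(0).normal(size=len(X)),
}
X_live = np.random.default_rng(1).normal(size=(100, 4))
y_live = X_live[:, 0] * 0.3 + np.random.default_rng(2).normal(scale=0.5, size=100)
print(weekly_round(models, X_live, y_live))
```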
All of that is removed, and you organize yourself the way you want. The protocol removes all the middle layers of management because it's purely "you have to predict this." It's purely quantitative: you're good or you're bad, and if you're good, you get remunerated. The closest analogy is what you find in the white hat community in cybersecurity. You have these bounties out there, people go after them, and they get paid if they're good enough to claim the bounty. It works super well — I think something like 99% of zero-day vulnerabilities have been found by white hats. It's super efficient. It's the same thing, but for machine learning: these people become so good and so important that they can afford to live in this new kind of model, where they set their own rules and don't have to operate under the legacy framework of working on a team.

What is the decentralized aspect? There have been marketplaces in the past. I remember Kaggle — I don't know if you were around when Kaggle started, but they were acquired by Google — that was another marketplace. What's the decentralized aspect of CrunchDAO that makes it different?

Well, first, Kaggle is not really a marketplace. Kaggle is more a platform for hackathons: you participate in a hackathon, and at the end you basically give the model away to the company. I think that is pretty inefficient. For example, the first and most famous challenge of that kind was the Netflix prize — yeah, you remember that, it was super cool. I think the prize was a million dollars for those models. But how much did Netflix make with its recommendation system? Probably a whole lot more. We believe the people who build this IP deserve a share of the upside of the value that gets created. And on the other hand, a lot of companies actually want to outsource. The companies we're speaking with say: do we really want the model? Do we want to integrate it ourselves, clean the code, run the hardware and the infrastructure, monitor the model and the feature drift, et cetera? Do we even want to hire, if that's possible, the person who built the model? Maybe they don't want to work for that company, period. So they give you the model and that's it — good luck maintaining it. And the tech is moving super fast. Maybe this model works this year, but in six months... I think two years ago there were around 300,000 machine learning papers published on arXiv — we'd have to check the number this year, with LLMs it's through the roof. There is no way one team can keep up with that level of activity, and there is no way a model built today is still peak performance in six months. It's already going to be legacy.

Let's back up a little. How long has CrunchDAO been around, and did you raise money from VCs to build it?

Yeah, we did. Crunch was created in 2020, so it's been four years now. We raised money in two rounds. First we raised from business angels in 2021, and we had some amazing people.
We had, for example, Rand Hindi, the CEO of Zama — you've probably heard of them. We had people like Julien Bouteloup, who back then was running Stake Capital and Stake DAO. We had some high-profile people from finance and hedge funds who were already super interested in what we were building. And then we raised $3.5 million from Multicoin in May.

Interesting.

That was in order to build the protocol fully. The goal of the first round was to have people who could help us build the protocol, understand the needs of finance, and have the connections to do a go-to-market with these institutions. And the raise with Multicoin was to have a very strong Web3 partner, to make sure we were making the right choices, that we were surrounded by the right founders, and that the tokenomics made sense and we weren't making mistakes in the design.

That sounds great. A question: whenever you're building a two-sided market, one complication is that you have to communicate the value proposition to each side. On the coordinator side, what is the messaging and positioning you're providing them? And the same on the data modeling side, to the data scientists — how are you communicating value to them? And whenever a two-sided market is new, how do you ensure trust, or build trust, with these two very different audiences? Maybe you could share a little bit about that.

Well, first, they are not that different, to be honest. We have people in the community who already work at these companies. And it's a bit linked to the answer I gave before. On one hand, the value proposition is: you will never be able to build this internally. There is no chance you can access this kind of firepower for building your models. Building a model with an internal team of four data scientists is a legacy model; it doesn't work. The field is moving so fast that you're going to waste time hiring, waste time building a model, and then those people have to monitor it. You'd have to grow a very large team to build something that really performs, and then it won't perform in the next months anyway. So the value is there, and those people say: we want to outsource. That's often what we hear on the first call — and I'm talking about large banks — they want to outsource, because they need so many data sources, so much data, so many insights, that they don't want to build it internally. And what we also see now, when the coordinator is not the customer, is that coordinators have spotted the opportunity to build a product. Coordinators come to build products on Crunch. Instead of doing a one-shot with a single customer, we build a product: we solve the problem once with the whole community, we always keep an open track for new models to keep improving that product, and then the coordinator can sell that product many times. That's what's happening with the mid-market price: it can be sold to many banks and organizations at the same time, but you only have to solve the problem once. And this is where the business model of a coordinator can scale —
when it is dissociated from the customer. We're in this situation where those people come, they have data, they want to build a product, they use Crunch to build the models on top of the data, and then they sell the product to banks or to many customers. For them it's a no-brainer: they don't have to hire, they don't have to think too much, they just bring the data, maybe hire one data scientist to craft the data set and prepare the problem, and then plug it into Crunch. That's it — all the models come from the crowd. And on the other hand, from the data scientist's side, the value proposition is: it's not like Kaggle, where you try to reach the top of the leaderboard to get a job. You never have to get a job if you're good at Crunch. You just build a model, and if that model is good and connected to the right customer, the customer is going to use it for ten to fifteen years, maybe more, and your life is set.

That makes sense. What's the decentralized aspect? We haven't talked about this — I imagine CrunchDAO is going to have a token at some point. What is the role of the token in the marketplace?

The decentralized aspect now comes from the fact that it's becoming permissionless. That's the release we're doing now, with the white paper we're publishing at the end of the year: anyone will be able to come with a new use case, as a coordinator. And the way you do that as a coordinator is simply by staking some CRUNCH, to prove your good faith. As soon as you do that, you can launch a crunch on the protocol. To launch a crunch, you need to define a reward, a reward function, and a data source. And that is permissionless. The foundation then incentivizes the right behavior, so we're also implementing a staking function that lets the community do some signaling for the foundation's rewards — so we know who is actually creating revenue, who is actually generating buyback and burn of the token, which is the tokenomics we operate on. The staking helps us understand who is good on the data scientist side and on the coordinator side. And this is needed because we couldn't scale in a centralized manner: the job of a coordinator is a lot of work for every use case, because you need to prepare the data and obfuscate it. You can do it with one employee, but it's time-consuming. And every industry is different. So now we see coordinators specializing in an industry. If you have experience in finance, you'll be able to coordinate financial use cases. If you have experience in energy, you'll do that — say you want to predict wind speed for an offshore wind farm customer who will pay for those predictions. Or say you work in sports and you do sports prediction: some people love using these algorithms for betting, and it can also be used by teams and trainers to understand the players and the game better. Every vertical is different, and we couldn't be experts in all of them. So we had to open up this network
so it can scale, and we had to define all the rules and all the smart contracts that allow this protocol to be autonomous and run by itself.

I think that's fascinating. I shared the example of Kaggle, but it's not a close analogy. It really feels like CrunchDAO is building a new category in the space. Do you have a name for this category?

That's a good question. We need to figure it out. I really like what I see on the DePIN side of the tokenomics world, because it's anchored in the real world, and I think that applies particularly to Crunch: we're solving problems for people in the real world. But at the same time, it's not physical infrastructure — it's humans being coordinated. So sometimes I'm tempted to say "ML DePIN," sometimes I'm tempted to say Web3 AI. I don't really know. We're still crafting that, and I think the community will place us at some point. That's also why it's fascinating to talk with people like you who see all these projects and can help us understand where we fit, because as a founder you're heads down — you have your vision, you have your community — and even more so when you're in your own niche. Like I said, there isn't much interaction; we don't need other protocols to operate. Well, that's not entirely true anymore, because we're starting to have some needs, and we have on the roadmap another aspect of decentralization, which is decentralizing the compute we'll need to run the models on decentralized infrastructure. So now we're starting to speak with a lot of the protocols that provide that kind of thing. But we need to find protocols that are mature enough to work at the level of latency we're looking for in finance, and the level of availability you need when you work for a bank, because they will cut your contract if the coordinator cannot run at 99.9% availability. That's the real world: you cannot just cut the network and say, "Sorry, bank, we can't serve you right now." It doesn't work like that.

It's so refreshing, and also unusual, to be talking with a project like CrunchDAO that has real-world demand. I say that because many Web3 projects don't — they're really a solution looking for a problem. Whereas in your case there is a problem, you built a solution to solve it, it's actually working, and it has been for the last four years. That's very refreshing.

Yeah, and at the same time it's a problem I faced myself, and you should never build a project only around a problem you have — you need to make sure other people have it too. I think in a way we got a bit lucky, and we also changed shape along the way: it evolved from the closed version I told you about, to the open version, to now the protocol, and now we see people building products on it and the scalability question coming back. Crypto projects are sometimes on longer roadmaps, and they start by making a lot of noise around the idea — they start the journey only after making a lot of noise. Often we see a protocol make a lot of noise, then we don't hear about them for five years, and only then do they finally have a product that works. We built it a bit differently.
We started by making sure there were people who wanted this problem solved, beyond just me. And we got lucky there, because we managed to land one of the largest customers we could find in that market, which then really helped with awareness among other people who could need or want this solution. So we started from that. And it's only from the moment we saw that demand — the moment we felt comfortable crafting the product around it — that we decided to put the effort and the resources into building a completely decentralized version. And actually, that came from another problem, which is scaling. If we didn't have that issue, maybe we could have stayed a bit more centralized. But now we have this scaling issue, which is a good problem to have, when more people come to you and want to build more on the protocol.

Yeah. So on the one hand you've got this protocol that serves the needs of data providers, or coordinators, who have a question; and on the other hand you've got machine learning scientists who have models they can apply to that data set. There are real-world problems that the CrunchDAO marketplace is solving on behalf of these two related but different parties. We've talked about the token: at some point it will be released and it will play an important role as an incentive mechanism for both sides. Tell us about the community, because in Web3, community is quite important. What's the role of community in Crunch right now?

First, I think there are really two communities in Crunch. The first is the community of machine learning engineers, quants, scientists, statisticians, and so on. These people are not Web3-native. They are very pragmatic, and they are not inclined to like Web3 by default, especially since there is a lot of noise and you only occasionally see use cases that actually make sense. So that part of the community is quieter: they're not people who hang out on Twitter all day, go on quests, and make a lot of noise in the Discord. But that part of the community is the important one, because that is Crunch — the crunchers, the machine learning community. And alongside that, the question is: how can we leverage collective intelligence, and how can we orchestrate the Web3 community so it becomes a utility for the overall protocol? We've found different ways. The first, which I just described, is using this community for signaling — for directing rewards to the people who are useful to the protocol. Say a coordinator comes with a use case, describes it, and is generating revenue for the protocol. If you believe that use case will generate much more revenue in the future, which is good for Crunch, then you can signal to the foundation that you really believe in it by staking on top of that coordinator's use case. I believe one thing that is very strong in the Web3 community is its capability to project itself far ahead and grasp some very futuristic concepts. So I really like that signaling function that we want to have.
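As a rough sketch of the permissionless flow and signaling described here — a coordinator stakes CRUNCH to launch a use case, the community stakes to signal belief in it, and the foundation splits incentives according to that signal — here is a small Python simulation. The minimum stake, class names, and allocation rule are assumptions for illustration, not the published smart contracts or tokenomics.

```python
# Illustrative simulation of "stake to launch a crunch" plus community signaling.
# All names, thresholds, and the allocation rule are assumptions, not the real protocol.
from dataclasses import dataclass

MIN_COORDINATOR_STAKE = 1_000.0  # hypothetical good-faith stake


@dataclass
class CrunchUseCase:
    name: str
    coordinator_stake: float
    data_source: str
    community_stake: float = 0.0


class Protocol:
    def __init__(self) -> None:
        self.crunches: dict[str, CrunchUseCase] = {}

    def launch_crunch(self, name: str, stake: float, data_source: str) -> CrunchUseCase:
        if stake < MIN_COORDINATOR_STAKE:
            raise ValueError("coordinator must stake enough CRUNCH to prove good faith")
        crunch = CrunchUseCase(name, stake, data_source)
        self.crunches[name] = crunch
        return crunch

    def signal(self, name: str, amount: float) -> None:
        """Community members stake on a use case they expect to generate future revenue."""
        self.crunches[name].community_stake += amount

    def allocate_incentives(self, budget: float) -> dict[str, float]:
        """Split the foundation's incentive budget in proportion to community signal."""
        total = sum(c.community_stake for c in self.crunches.values()) or 1.0
        return {n: budget * c.community_stake / total for n, c in self.crunches.items()}


protocol = Protocol()
protocol.launch_crunch("fx_mid_market", stake=5_000, data_source="obfuscated order book")
protocol.launch_crunch("wind_forecast", stake=1_500, data_source="obfuscated turbine telemetry")
protocol.signal("fx_mid_market", 20_000)
protocol.signal("wind_forecast", 5_000)
print(protocol.allocate_incentives(budget=10_000))
```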
That way we make sure we spot the people who will create value in the future, and we don't miss gems that maybe aren't generating revenue at the beginning but can be helped by the community to join the protocol. I really like that. Then we also found a second entry point, which is called Pi. Pi is a new way to contribute to Crunch: instead of contributing a machine learning model, you can contribute time series. And the way to build a time series is simply by asking a question to an LLM. What we did is plug an LLM into tons of unstructured data, and when you ask a question, that question is asked against all the documents we could find in the historical record. Every time it's asked, it produces a value between zero and five, for every document and every date, which creates new data points, which form a time series. These time series can then be piped back into the models and reused by the machine learning models to generate more products and more value for the DAO. And it works pretty well: about 36,000 questions have been asked by people who don't know how to code but who know how to ask a question about the economy. That's a very interesting use case for us. So now we're plugging more data into Pi so that more people can join the community even if they don't know how to code. They are still very valuable, and they can generate a lot of value through these two channels: signaling and generating time series for the models.

Interesting. What about other types of workers on the data coordinator side? Much data needs to be labeled, for example. Is data labeling part of a coordinator's job?

Well, yeah. In the end, the coordinator needs to propose data that can be used by the data scientists on the other side. The data needs to be clean; it needs to be at least minimally usable. Coordinators can choose not to do it well, but then I suspect not many people will engage with the data, because people are used to Crunch having very clean, very institutional-grade data. You really only come to build machine learning models on top of it, which is not a developer skill — it's more stats, mathematics. A lot of physicists and astrophysicists, people like that, are used to building models; they are not developers. It's a different skill set. You have data engineers who can build data pipelines, clean up the data, and do all that work, and once that work is done, you can have proper modeling activity on top of it. So it's somewhat separated. The coordinator sometimes has to do a lot of work to get to a place where modelers can work properly.

Yeah. On the data side, how do you ensure quality? Are there penalties if I, as a coordinator, provide a data set that isn't prepared well enough and is poorly labeled? Am I penalized in some way?

Yeah — you're penalizing yourself, because you're paying for the service, paying for access to the protocol. That's really the entry point at Crunch: when you want to propose a data set on the protocol, you need to lock the reward on top of it. And then every week, or every day, depending on your reward function, part of that reward is distributed. So if your data is poor,
then you're just distributing money for something that is not usable. You're penalizing yourself. That's how we ensure the quality is good: a skin-in-the-game mechanism where you have money at stake, so you'd better stop your crunch if you see it isn't working, fix the data, and then restart it. But once you've locked value into the protocol, it's locked — you have to distribute it. So you will have to clean that data and make sure it's usable.

That kind of incentive mechanism makes so much sense, because as a free agent I am incentivized to make sure I provide good data that is prepared and usable. There's no outside party forcing me to do that; I do it on my own because it's to my benefit. That's a very interesting design — you don't see many coordination mechanisms like that, with a self-reinforcing feedback loop. As a team, how did you come up with that specific design? Was it something you iterated on, or did you arrive at it over time?

So, that's actually the first coordinator design, the one being released now; until now, we were doing this internally. So we have the experience of doing it internally as the core team, and based on that experience, and on the discussions we had with people who want to become coordinators in the beta, we crafted from our experience and their needs a series of smart contracts and dynamics that are safe for them, safe for the crunchers on the other side, and safe for the protocol. But it is indeed the first version, so hopefully we got it right. Worst case, you can always fine-tune this kind of mechanism; nothing is set in stone in our world. It's not like operating an automated market maker with a liquidity pool holding a lot of value. In our case, it's pretty easy to publish an announcement and adjust the function a bit so that coordinators are not impacted. It's easier for us. And we want to use the beta period, starting in February, where we launch with a small number of coordinators. We don't fully open the protocol in a permissionless manner; we start with a few coordinators in this private beta, make sure we can operate and deliver the service, that both sides of the marketplace are happy, and that things are working. Once we get there, we will slowly open the gates so anyone can come, do the job, and start a coordination use case.

I see. So in this beta period you vet the coordinators and the data they'll provide, as you work out the kinks and improve the system. Say I were a large healthcare organization with a lot of patient data, subject to regulatory requirements like HIPAA in the United States, and I had a couple of burning questions I needed machine learning help on. What would that engagement look like if I approached Crunch and said, hey, I've got this data set, I need help answering these questions?

At the moment we just have the closed beta, so it's simply: contact us, reach out to the core team if you want to be part of the beta.
So it's super centralized for now, but that's the beta; we're creating this new role. In the future, there are two possibilities. Either they get in touch with someone who is already coordinating in healthcare, someone with experience in their field, and go to that team which has skin in the game, has experience, and has been using the protocol for a while. Or they are motivated enough to prepare the data themselves, start as a coordinator, lock some money, and so on. So two scenarios: either they are their own coordinator, or they go through someone who knows the industry and is capable of doing the work. Maybe in the future there will be several coordinators in the same industry, and you'll be able to go to that kind of provider, who will help you with the data processing if you don't know how to clean the data yourself. But for the moment it's either in-house or through coordinators.

I see. So there's almost another layer: I'm a data provider, but if I don't have anyone in my organization who can help me prepare the data, I might look to Crunch for that. Or do you have access to data preparers who can help? Say I have really valuable proprietary data, I just don't have the people to prepare it, but that data could be incredibly valuable to the Crunch ecosystem and to the machine learning scientists on the other side.

Yeah, that's the dilemma. As soon as you need to work with some data, you could have data preparers, but that only generalizes to a point. It depends on your CRM, how you stored the data, what format it's in, whether you use some proprietary mechanism. It's only generalizable so far. So what we want to encourage, and really have to build, are modules and pieces of code that can be reused — a kind of modular system that coordinators can reuse to make their work easier, and eventually that clients can reuse when they are their own coordinators. We need to create these building blocks of knowledge to help the onboarding. But in the end, someone will need to look at the data if you want to prepare it. So either you do it internally with some advice, or you trust a coordinator under an NDA — like when you work with a consultant — who will prepare your data before it's pushed onto Crunch. In the end, I think there is no miracle solution. At Crunch, we're good at building models, and we want to stick to that. So make sure you come with something we can build a model on — that's where we can create the best outcome.

I see. In this closed beta, I'm sure you and the team are learning a lot of things that need improvement. What are some challenges you've discovered during the closed beta, and what improvements are you planning before you open it up?

Well, it's actually starting in February. For the moment, what the team has been focused on is developing all the smart contracts, and they are now working locally. By the end of the month, they're going to be on the devnet.
And once we are on the devnet, we'll be able to onboard these coordinators and start simulating what we want to do. Once that is working properly by the end of February, then we can move forward — so the learning period will probably be February itself. For the moment, we just have Q&A sessions with them: what do you need to do, what kind of data are you going to work with, what latency do you need, what availability are you looking for, is replicability interesting for you, that kind of thing.

When the beta period begins in February, what are —

Yeah, sorry, go ahead.

No worries. When the beta period begins in February, what are your marketing plans to get more data scientists on the one hand, and more coordinators on the other?

Well, it's quite organic right now on the data scientist side. There is also the question of whether we even want more people on the data scientist side, and that's a question we're asking ourselves. It's a choice. You mentioned Kaggle before — I'm just going to put this aside, because this light is killing my head — but the thing is, at some stage I want Crunch to be able to do a bit of what Kaggle does: okay, you can be trained on Crunch, you can take some lessons, you can become a better cruncher and join the group of people actually doing the job. But if you do that — there are 21 million users on Kaggle. It's massive. Do you actually want that? There's a saying I love: beware what you wish for. Having a lot of users also adds a lot of noise, and then you need even more sophisticated coordination mechanisms to figure out who is doing what, who is useful and who is not. Then people complain that they don't get paid, that they don't receive points, and it gets messy. So the question is: do we want more? Right now we're in a place where the people using our product are happy and paying for it, so we're in a good place. I want to preserve that, and we'll see how it evolves. We monitor the quality and make sure it stays where it is. If we see the quality going down, we'll have to put measures in place, maybe even close CrunchDAO to new entrants and only let in people who are capable of doing the job. So the marketing plan on the data scientist side is: keep the quality, and build partnerships with universities so we get exposure to the talent emerging from the education system every year. Keep a focus on the countries where we have a lot of traction, which is mostly Asia and the US. And craft the right partnerships with people who can make our signal, our ideas, resonate in the ML community. That's the data scientist side. On the customer side, we're fortunate to have thousands of data scientists who actually work in these companies. Sometimes they work in small teams and face the same problem I had: their models are drifting and they're getting tired. So it's a bit organic as well. Most of the contracts we've found actually came from people who came to Crunch and said, okay, maybe you should talk to my company, they could be interested, we're struggling with this and that. So that's one way.
And the second way is really about being in the right vertical, which is crucial, because when you have a product that works in finance, for example, word gets around — people talk a lot, people know each other, and it's actually a pretty small world. So when you manage to perform as well as we did with ADIA Lab, for example, people hear about us and reach out, or we're speaking at a conference and they come up to us there because they've already heard about us.

Is there a future in which an AI agent uses Crunch? An AI agent could be a layer on top of Crunch: a question comes to the agent, the agent does something with Crunch, runs inference, and returns an answer to the user. Is there a future like that?

That's a fascinating future. I think that's a coordination use case, to be honest — that's the beauty of what we do. Maybe someone comes up with that use case and builds it on top of the community, and the community is able to reinforce that agent. On another path, you also have a future where the crunchers themselves use AI agents to build their models. They already do — almost every developer on the planet is using some version of Copilot or coding with an LLM. Now imagine being able not only to deliver one model every two hours to Crunch, because you have to code it, but to communicate your idea to an agent that builds the model and submits it to Crunch for you. Maybe you even have an independent AI agent crunching our data and making money on Crunch on its own. We're completely open to that. I remember back then some of the AutoML solutions out there, like H2O or DataRobot, were actually competing on Crunch — some people were using them, and that was super interesting. So we're very interested. The humans, for the moment, are still winning, and I think it's because we operate on use cases that are edge cases, even extreme ones. If you want to tell a cat from a dog or do facial recognition, those algorithms are 99.9% accurate — there's no need to work with Crunch; you just need a data engineer who can deploy the model properly and put it in production. You don't need extra modeling power. But for the very hardcore use cases — like the one I faced when I tried to apply models to finance: very, very noisy, billions of humans at work compressed into one number, very hard to predict — that's where you want people. Those edge cases are where Crunch is going to be most useful. And using an AI agent is just a way to multiply and augment your productivity. You can tell the agent: I want to try this CNN, with this type of normalization, and I want to plug an optimizer on top so I can do non-convex hyperparameter optimization to maximize this part of the objective function. That's the sophisticated, creative thinking the data scientist brings, and the agent can just write the code behind it. But that's the data scientist side of the marketplace. On the other hand, I strongly believe that Crunch can power AI agents that need predictions — agents
that are also facing this kind of problem. I mean, if I'm an AI agent working in decentralized finance — DeFi AI, which is now coming onto the market — maybe that agent opens a coordination use case. It says: I need predictions, and I need people who can model better than me. If it's smart enough, it will find Crunch, open a coordinator, set up the rules, provide the data, and put some of its money into the contract. It will be able to do all of that in a completely autonomous way. But then humans will, in a sense, be working for the AI agent, which is a slightly scary future. It will eventually happen, though.

So you've shared some use cases in finance. What other use cases could there be where Crunch would be useful?

Well, there are plenty, and that's why I'm so excited about Crunch: you have data science problems everywhere. Everything is mathematics. Every decentralized protocol, every L1, we could work for. I mentioned wind prediction; there's also weather forecasting and solar activity — being able to predict it better helps you rebalance the energy grid so it isn't impacted by unpredictable solar activity, and the same goes for satellites. There are amazing use cases like that, some with even more value. For example, the algorithm UPS uses for all its delivery trucks, called ORION: can you imagine all of UPS's trucks running on one algorithm? That algorithm uses traffic predictions, among other things. So if you can improve traffic prediction for UPS, imagine how much fuel you could save for that company. Or someone operating a fleet of tankers year-round, whom you could help with better weather or ocean-current forecasts. I remember we even spoke with someone who worked on submarines — underwater current prediction that submarines can use to move without running their engines. If you can predict those currents 10% better, you increase the distance a submarine can travel without its engine, which can save lives and sometimes change the course of history. That's massive. It's everywhere: healthcare, logistics, energy, finance of course, city planning. It's everywhere. And what people are realizing is that we want more and more AI everywhere, but you need the proper algorithms, because an AI that can only do analytical things is not really autonomous. You always have to project forward: is this kid going to cross the road? Is this concrete going to hold if the wind goes above this level? If we want truly autonomous agents, they need to know what is going to happen next. For humans, we call that intuition. It makes sense to us — we have this intuition that a thing is going to fall. We understand gravity somehow, even if we're not physicists; we understand it and we know it's going to fall. We call it intuition. Having that in a model is not trivial, and it needs to be there. And there are even more sophisticated forms of intuition. If
you think about what some people are doing — Warren Buffett, say, making billion-dollar decisions just because: he reads a lot of things, a lot of analysis, and on top of all that analysis he makes a call based on something that is more of an intuition, a gut feeling. When we're able to develop that kind of gut feeling in models, it's going to be amazing, because they have access to a much larger amount of data, so that intuition will be very, very strong. So working on predictive tasks is, for us, super interesting. And it's somewhat left aside right now, because we have very strong analytical use cases with LLMs — they can analyze enormous amounts of data, and the generative use cases are very impressive — so we tend to forget prediction. Prediction is often stuck in a very old state, where people use very expensive data with linear regression on top of it, because the data is so expensive that nobody else has it, so you don't have to get very fancy to extract a lot of information. But that's 200-year-old tech; there is much more we could do with that data if you manage to share it and reach the people who can apply better, modern techniques to it.

Well, you know, I began my career at Amazon a long time ago, and before that I did an internship at Toyota, where my area of research was queuing theory. My first job was to reduce, or eliminate, time traps at the receiving plant in Georgetown, Kentucky for Toyota. So I did a lot of queuing theory and linear programming. Then at Amazon I was part of what was called the space management team: if you have a bunch of stuff coming into an Amazon warehouse, where should you put it? That was the big question, because wherever you put it also has to be optimized for the picker on the other side, so they can reduce the number of steps and hand touches, so it can flow into the outbound. It's a fascinating problem. But the tools we had back then were basically linear programming, plus a lot of lean manufacturing. Machine learning as we know it didn't exist. I studied computational linguistics in grad school, and we just didn't have what we have now. What you're doing at Crunch is fascinating, and I think it could open up and enable so much innovation. So I want to give you kudos for starting Crunch and for having built it over the last four years, and I'm really excited about where you're going. I also want to encourage you to think about a category, because one of the most powerful ways to build a brand is to create a category. And you are creating one — I can't think of another project like Crunch. If you're the first, you define the category and you'll be the most memorable project in it. So, good job.

Yeah, thank you so much for the kind words. It's also a lot of pressure on the team, because we see all of this and we want to make sure we get it right as soon as possible. It's not a problem if you don't get it right the first time, but the goal is to get onto the right path as soon as possible.
So it's great to have people like you, and more people who understand the space, around us to help guide our decisions and find the right shape for the protocol and for that category, because we have a lot of things to define.

You've made me a fan, and I'm excited for where you're going. Best of luck in February when you begin the beta. Before we end, is there anything you'd like to share with the audience — how they can learn more, contribute, or be part of the community?

Yeah, absolutely. The best entry points right now are our Discord server and, of course, our X (Twitter) account, which you can follow for all the news. You can also follow my account — I've spent a lot of time heads-down building, so now I'm getting out there a bit and talking about what we're doing. We also run office hours every week, Peter and I, plus a weekly roundup and AMAs where anybody can come with questions that we answer live, and we keep the community updated on everything we've achieved during the week. So I'd say those are the best entry points: X and our Discord.

We'll put those links in the episode summary as well.

Yeah, thank you so much. That's super helpful. And of course, I'd be happy to come back — maybe we do another session, another deep dive, once the category is fully defined and we're live. That would be nice.

My pleasure. Jean Herelle from CrunchDAO, thank you so much.

Thank you, Peter. See you.