#6 - David Mytton - Bringing reliable and transparent information to Green IT

In this episode, we go to London and meet David Mytton, the co-founder and CEO of Console, who is also an academic doing his PhD on sustainable computing at Oxford University 🎓. All David’s work follows a consistent thread that could be described as “Effective sustainability requires reliable and transparent information”. From this angle, we discussed the environmental impact of data centers and why the estimates vary so much, the accuracy of carbon footprint calculators for websites, as well as the choice of language and CI/CD when we aim to develop greener code 👩‍💻.
❤️ Subscribe, follow, like, ... stay connected the way you want to never miss an episode!

Learn more about our guest and connect:


📧 You can also send us an email at greenio@duez.com to share your feedback and suggest future guests or topics.

David's sources and other references mentioned in this episode:



Transcript


Gaël: Hello, everyone. For this episode, we go back to London to have an amazing talk with David Mytton, the co-founder and CEO of Console, an angel investor, and an academic doing his PhD on sustainable computing at Oxford University. Now, let's be honest: interviewing David is both a privilege and a challenge. A privilege, because he ranks super high on my list of world-class experts in digital sustainability, among people like Gauthier Roussilhe, for instance, but also a challenge, because he covers so many angles in green IT that I would like to record a 4-hour-long episode. Still, there is a consistent thread among all David's work that could be labeled as “Effective sustainability requires reliable and transparent information”. From this angle, we discussed the environmental impact of data centers and why the estimates vary so much, the accuracy of carbon footprint calculators for websites, as well as the choice of language and CI/CD when we aim to develop greener code. But let's talk a bit about David first. From 2009 to 2018, David was co-founder and CEO at Server Density, a London-based SaaS infrastructure monitoring startup which raised successive rounds of VC funding and attracted more than 700 customers. When StackPath acquired the company, he joined it for two years as their lead product engineer. Then he completed an environmental technology MSc at Imperial College London, where he developed his interest in sustainable computing. He continued that research on the sustainability team at Uptime Institute. Eventually, he launched Console in early 2021 to provide the best tools for developers, while starting a part-time PhD at Oxford University. Welcome, David. Thanks a lot for joining Green IO today!

David: Hello. That's a very kind introduction, thank you very much.

Gaël: First of all, what did I miss in your bio?

David: It's a good question because my hobbies overlap very much with what I do on a daily basis, which is the console.dev newsletter to help engineers find tools. And I just really enjoy playing around with computers. And so I do that all day playing with dev tools and Cloud infrastructure and then at the weekends - and sometimes during the week as well - I'm also investigating the sustainability of everything I've been playing with during the week because I think it's important for us to be able to continue using all these digital services whilst reducing the environmental impact. And so my hobby is the academic side of things, which is why I'm starting a PhD at Oxford to continue the work that I first got interested in at Imperial in sustainable computing.

Gaël: So you could describe yourself as a sustainability geek.

David: Yes, I think so. Technology broadly, and then sustainable technology specifically.

Gaël: How did this interest in sustainability, and especially sustainability in ICT, first come about? Did you have some kind of aha moment?

David: It came once I sold my company in 2018 and joined the acquirer, StackPath, as you mentioned, and ran their product engineering team for a bit. But then I was thinking about what challenges we're going to face as a civilization over the coming decades, and how I could apply my time and my skills to contribute to solving some of those issues. And I decided that climate change was a major challenge that really needed to be fixed. This was pre-Covid, of course, so that was the biggest challenge that I saw at the time, and I decided I needed to bootstrap my knowledge in environmental technology and environmental sustainability in general, because I had absolutely no background in it prior to that. And so I did a master's degree at Imperial College here in London, in environmental sustainability and technology, and that covered everything from fisheries management through to clean energy and vectors for disease control with mosquitoes. But I specialized in energy technology, and I chose that because I thought it was the most tangible and applicable area currently in motion: the move to renewable and clean energy, essentially. So, specializing in energy, I was then able to link my experience in cloud and software to start getting into the sustainability aspects of the usage of IT globally.

Gaël: So, we mentioned reliable metrics being a pillar of your work, both as an academic and as an entrepreneur. And I'd like to have your opinion on the carbon calculators for websites which are blossoming across the world. Sorry if this question is a bit controversial, but are they reliable? All of them? Only some of them? And what could be their use?

David: Generally they're not very reliable, and they're not much use. In my opinion, the problem comes down to the methodology, and I would distinguish general website calculators from the carbon footprint calculators that are provided by the cloud providers - all three of them, Amazon, Google, and Microsoft, now have their own calculator. I put those in a separate category. The ones that I think you're referencing are just general calculators available from a number of different places online.

Gaël: and dedicated for websites, not not the full infrastructure. Most of the time.

David: That's right, yeah. So the way that they work is they generally scan your website and look at all the assets - CSS, images, JavaScript, all that kind of stuff - and calculate the total page size. Then they apply an energy factor, which is typically taken from academic literature, and multiply the energy factor by the page size, and that gives you the total amount of energy consumed on a single page load. That gives you a number you can then scale by the number of visitors to your website, which gives you the total energy consumption. And then governments around the world publish carbon factors, which allow you to convert that energy consumption into carbon emissions, and that's the number that usually comes out of these calculators. But it makes a fundamental assumption around the energy factor, and typically you see these being taken from academic literature that provides a single number. It's often kilowatt hours per gigabyte, and then they multiply that by the data transfer. And that has some major limitations, which means it's essentially inaccurate and not much use. Compare this to the carbon footprint calculators you get from the cloud providers, which will go into your full infrastructure and all of the details. They look at the actual energy consumption from all of the data centers and the full footprint of the infrastructure of whoever it is that you're using, and then they apportion it across all their customers. It's far more challenging with those, because some companies are more transparent than others. But because they're using actual energy consumption, rather than just multiplying factors like the website calculators do, they're significantly more accurate.
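To make that methodology concrete, here is a minimal sketch of the page-size-times-factor calculation David describes. All the factor values are illustrative placeholders, not figures from any particular calculator or paper:

```python
# Minimal sketch of the naive page-size x energy-factor methodology.
# Every value here is a hypothetical placeholder for illustration.

PAGE_SIZE_GB = 2.5 / 1024           # a hypothetical 2.5 MB page, in GB
ENERGY_FACTOR_KWH_PER_GB = 0.65     # single average factor from the literature
CARBON_FACTOR_KG_PER_KWH = 0.23     # government-published grid carbon factor
MONTHLY_VISITORS = 100_000

energy_per_load_kwh = PAGE_SIZE_GB * ENERGY_FACTOR_KWH_PER_GB
monthly_energy_kwh = energy_per_load_kwh * MONTHLY_VISITORS
monthly_emissions_kg = monthly_energy_kwh * CARBON_FACTOR_KG_PER_KWH

print(f"{monthly_emissions_kg:.1f} kg CO2e per month")
# Note what is missing: browser and CDN caching, client- and server-side
# processing, mobile networks - the limitations discussed next.
```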

Gaël: We had an episode dedicated to Cloud Carbon Footprint and the different methodologies, and the pros and cons regarding the big three - I would say - carbon calculators, with Cameron Casher and Benjamin Davy. It was a fascinating discussion. But going back to the website calculators, is there any other bias?

David: The first challenge, I would say, is that it drastically simplifies how website infrastructure works and how websites are actually delivered to users. So if you're just looking at the page size and using that as the total amount of data transferred, then that misses a huge part of what's going on behind the scenes. For example, is that data downloaded every single time you load the page? It is the first time, but on the second and third times it's not; it's cached by the browser. And then you've got caching happening on the infrastructure behind the scenes, because there are content delivery networks. There's also processing that can happen in the browser: if you're executing JavaScript or playing a video, then there's processing happening in the browser. But there's also processing happening on the server side if you're doing any kind of server-side rendering or querying databases. So you're missing some major components of just how the page is rendered. And then the energy factor, this number in kilowatt hours per gigabyte: it's an average, so it doesn't really capture any of the key differences between different types of applications. It's just looking at the average amount of energy per unit of data, and it misses things like web browsing versus video streaming, for example, which is much more intensive on the data volumes. Often you'll see a number cited as a linear factor from a paper by Aslan from a couple of years ago. Commonly, you'll see it as 0.65 kilowatt hours per gigabyte. That's assuming the number has been correctly adjusted, because the number they publish in the paper is actually for 2015, and you have to make adjustments for projected improvements in efficiency: they project it's going to fall by half every two years, so you have to project that out to get the current year. The further out you make these projections and adjustments, the less accurate they are, because it's very difficult to predict changes in infrastructure and improvements in efficiency. Also, that number from Aslan's paper only refers to fixed-line networks, and that excludes a major source of traffic, which is mobile.
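The adjustment David describes can be written down directly. A minimal sketch, assuming the halving-every-two-years projection he mentions and a purely hypothetical 2015 baseline of 1.0 kWh/GB:

```python
# Sketch of the 2015-baseline adjustment described above, assuming the
# "falls by half every two years" efficiency projection.

def projected_energy_factor(factor_2015: float, year: int) -> float:
    """Project a kWh/GB factor forward from its published 2015 baseline."""
    return factor_2015 * 0.5 ** ((year - 2015) / 2)

# Hypothetical baseline of 1.0 kWh/GB, for illustration only.
for year in (2015, 2017, 2019, 2021):
    print(year, round(projected_energy_factor(1.0, year), 3))
# The further from 2015 the target year, the shakier the projection.
```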

Gaël: Which is crazy, because - and I speak about something I experienced when I was working in the proptech industry - very often platforms see 70 or 80% of their traffic coming via mobile.

David: That's right, yeah, so the application is really important. And then finally, I think it only refers to infrastructure in countries with modern equipment, and that's relevant for countries in Western Europe. But as you have more infrastructure being deployed in countries that are not as advanced, you're missing potentially significant components of the environmental impact. So all of these things combine to mean that the number you get from the carbon calculators is basically useless. Actually, it's more useful to look at something like Google Lighthouse and use performance as a proxy for energy consumption, because if the page is loading faster, and you reduce asset sizes and tweak your caching and all those kinds of things, then you're making more efficient use of the infrastructure.

Gaël: Just to close the topic of website carbon calculators: I understand you don't believe in the accuracy of the numbers provided, but are they consistent? Meaning that the number is wrong, but the evolution could still be interesting to follow and track, just to know if you're going in the right direction.

David: I don't think so, because many of the things that you change as a result of, say, Google Lighthouse would not be reflected in the website carbon calculators. For example, if you improve the caching and the calculator doesn't look at caching - it's just looking at the full page load - you've made an improvement that is invisible to those calculators, and they would still show the exact same carbon number. The challenge is that it's somewhat unintuitive that an increased amount of data would have minimal or no effect on the amount of energy consumed, because you assume that if you're using more data then it needs more energy, and generally that's not the case. This is an emerging area, I suppose, in academic research, and there's a limited number of papers to point to. But over the last two years or thereabouts, there have been some papers that look into this in more detail. The approach I described, using a factor multiplied by the data transferred, is known as an allocation method, where you're allocating the data to the energy or vice versa, and that has its uses. It's useful for reporting, because if you know the amount of data that you transferred, and the amount of energy that you've consumed by looking at your electricity meter too, then you can divide them and get an intensity figure, and you can look at those two numbers over time and hopefully see improving energy efficiency reducing the intensity factor. So over time, if you're transmitting more data but using less energy, your intensity would decrease, and that is useful to track. But it assumes you've got access to the data volume, which generally isn't that difficult to get, and to that energy number, which is more challenging because it's not exposed by the cloud providers; unless you are racking your own equipment in a data center, you're not going to be able to get access to it. And the unintuitive part comes down to understanding how those networks are deployed. As it turns out, when you deploy network equipment, it has a baseline amount of energy consumption which is a function of the capacity of that piece of equipment, but it is almost completely disconnected from the actual usage. It's not correlated - certainly not linearly correlated. Networks are provisioned for peak load, and they're provisioned for redundancy, and when you add that new equipment, that's when the energy increase occurs - certainly the energy increase that is significant in terms of measuring. It's not when you push more data through the equipment, at least not on a linear basis. And that is the common error made by these website carbon calculators: they assume that energy consumption is linear and proportional to the amount of data that is transmitted, which isn't the case.
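Here is a minimal sketch of the allocation ("intensity") method David describes, with hypothetical readings; real energy numbers would need your own metered equipment, since cloud providers don't expose them:

```python
# Allocation method: divide measured energy by measured data volume and
# track the ratio over time. All readings below are hypothetical.

readings = [
    # (year, data transferred in GB, metered energy in kWh)
    (2020, 120_000, 9_000),
    (2021, 180_000, 9_400),
    (2022, 260_000, 9_800),
]

for year, data_gb, energy_kwh in readings:
    intensity = energy_kwh / data_gb  # kWh per GB that year
    print(f"{year}: {intensity:.4f} kWh/GB")
# Data grows faster than energy, so intensity falls over time - useful
# for reporting, but it says nothing about the marginal energy cost of
# pushing one more gigabyte through largely fixed-draw network equipment.
```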

Gaël: Thanks a lot for your honesty and straightforwardness about these tools. We could already start grasping it, and I mentioned it in my introduction: you're a true infrastructure expert as well. You wrote several academic articles on the environmental footprint of data centers, both cloud and private. I'd like to start by not talking about energy for the moment and focus on water. Thought leaders in digital sustainability like Gerry McGovern, for instance, in his New Thinking newsletter, are more and more vocal about it. Several hyperscalers experienced severe pushback in the Netherlands or in Denmark, for instance, regarding their water consumption. And I rediscovered recently that you actually wrote something about it back in February 2021. So what could you tell us about the water consumption of the infrastructure, and maybe more specifically, the data centers?

David: Water consumption of data centers is an interesting topic, because it is a lot more challenging to explain than energy consumption and carbon emissions. When you're thinking about carbon emissions, our goal is zero - net zero, essentially - and that's a very simple marketing message. It's a very simple goal, and everyone can understand what we're trying to do. So when we're transitioning to clean energy, we mean zero-carbon energy, and the goal is obvious. Water consumption is not so clear cut, because the goal is not always zero water, and you have to understand the context of the location of the data center to decide what the goal should be. If you have a data center deployed in a region that has abundant water - perhaps it's next to a lake or a source of water that is not stressed in any way and is well maintained - then the water consumption doesn't need to be zero. You should certainly pay for access to water, and you should consider efficiency and make sure your usage is reasonable, but you can invest your capital resources elsewhere. That is completely different from building a data center in a region that is very hot and has challenges with the governance of the water supply, or has other users of the water supply that are potentially of higher importance, and where there might be drought and common problems with stress to the water system. In those regions, zero water might be the goal you use, but it depends on what you're trading off, because sometimes reducing water consumption can mean an increase in energy consumption, and so you have to understand which one you're trying to optimize for. There are also some challenges around what you mean by water consumption, because in a data center there are two different types. There's direct water consumption, which is typically what people think of when they're thinking of what data centers do with water: the use of water for cooling, typically in air conditioning and chiller units within the data center, to keep the environment at a set temperature so that the IT equipment can operate efficiently. But that's actually the smaller part of water consumption. Indirect water consumption is quite significantly larger, and that is the water consumption associated with the power generation that produces the electricity for a data center. Around the world we are still using majority fossil fuels, and fossil fuels are part of thermoelectric power generation. Water is a key component in how our electricity is generated, because the fuel is burnt to turn water into steam, the steam turns the turbine, and that is what generates the electricity. Or, if you're using hydropower in some regions, then there can be even higher water consumption there, because the water is being used to generate the power as part of the hydroelectric dam, for example. Whether that is true consumption depends on the downstream users: the water might be used somewhere else, so you've kind of got a dual use for it. But sometimes the water is discarded or is not used again, and that can be actual consumption. And so you have to combine these two factors to think about what the data center is actually drawing and where it is drawing it from, and also where you're generating the electricity for the data center - which could be quite a long way away from the data center itself - and what the water footprint is in that region.
And understanding the relative stress of both of those locations is important to understanding the true water footprint of a data center.
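A minimal sketch of combining the two water-consumption types described above. Both input values and the water-intensity figure are hypothetical placeholders; see the papers mentioned just below for measured numbers:

```python
# Total water footprint = direct (cooling) + indirect (power generation).
# Every number here is a hypothetical placeholder for illustration.

direct_water_l = 500_000      # cooling water drawn on site, litres/year
energy_kwh = 2_000_000        # electricity consumed, kWh/year
gen_water_l_per_kwh = 1.8     # litres of water per kWh of the local generation mix

indirect_water_l = energy_kwh * gen_water_l_per_kwh
total_water_l = direct_water_l + indirect_water_l
print(f"direct {direct_water_l:,} L + indirect {indirect_water_l:,.0f} L"
      f" = {total_water_l:,.0f} L/year")
# The stress context matters twice: where the site draws its cooling
# water, and where (possibly far away) its electricity is generated.
```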

Gaël: And you say that a significant part of the water footprint is actually caused by electricity consumption. Could you give a rough estimation?

David: Off the top of my head, I think it's 2 to 3 times more water consumption from the power generation, but I would have to refer you to the paper you mentioned that I wrote. There's a graph in there. There's also a graph in the 2016 Data Center Energy Report by Shehabi and Masanet, which includes a good breakdown of the water consumption of data centers as well. So for the actual numbers, have a look at those two papers.

Gaël: I will. And actually, my listeners know that every article and document mentioned in an episode is put in the episode show notes. So now that we've talked a bit about the water footprint - and thanks a lot for sharing this bit of counterintuitive information once again, the second one in the show - let's go back to energy. You wrote very recently an article about - I'm sorry to put it in plain English - the mess around data center energy usage estimates. Could you tell us a bit more about why? Which ones should we rely on? And are we on the verge of a catastrophic increase in electricity consumption in data centers that will imperil the energy transition, or not? [laughing] Scary music

David: [laughing] So you're referring to a paper that I co-authored with my colleague at Oxford, Masaō. We looked at all of the data center energy estimates from 2007 to 2021 - this was just published in July as a review article - and we looked at 46 papers, or at least 46 were suitable for inclusion. We actually looked at hundreds and hundreds of papers, but we had some quite strict inclusion criteria, because we wanted to ensure that they were original calculations, that the calculations were transparent enough that we could have a look at the sources and the methodology, and that they applied to a specific region that was large enough to be an estimate for a particular geography. So, for example, it could be a country, or it could be a region of a larger geographical area such as the EU. We extracted 258 individual estimates and looked into almost 700 different sources, because the key to creating accurate estimates is the variables that are used in the methodology and the values that you put in for those variables. Our results showed that around 30% were from peer-reviewed publications, but the majority were not: 38% were from non-peer-reviewed reports. A large number of those lacked a clear methodology, and it was difficult to find where the data had come from. IDC and Cisco together made up almost 60% of all of the sources - just two companies provided the majority of the sources. And a particular challenge was that many of the sources - about 11% - had broken web links, and 10% we just couldn't find. Whilst the goal of the review was not to highlight any individual publication, we did give some examples of publications with good methodology, and we made some recommendations to the people who are actually using these numbers and producing them. Three groups in particular. First, the general end users - journalists, people working at nonprofits or advocacy groups who don't necessarily have a scientific background - who look at the academic articles and need to be able to trust that the published numbers are actually correct so they can do their wider work. Then there's the research community that is actually producing these, which is typically academics. And the third group, which is particularly important and growing in importance, is policymakers and regulators, and the grid operators - people within the energy sector who are planning the infrastructure that delivers electricity to data centers and to homes and offices and cities, because it takes a very long time to build out energy infrastructure. The challenge that we spotted originally was just the wide variance in the numbers you mentioned. You can find energy consumption estimates ranging from 200 terawatt hours a year to almost 8,000 terawatt hours a year, depending on the year the estimate is for. And that range is completely absurd. Because it's so wide - multiple orders of magnitude - it's just completely useless. Rebuilding our energy infrastructure to deal with 200 or 300 terawatt hours of total global energy consumption is completely different from building something out to deal with 8,000 terawatt hours. And so the motivation behind the article was to figure out where all these numbers are coming from, and to try to understand why there might be such variance in the literature.

Gaël: These extreme numbers - do they come from, I would say, non-academic, non-peer-reviewed articles and studies, or from well-documented papers following a replicable methodology with open data, et cetera? Do the latter also agree to disagree, and also find a very wide range of energy consumption?

David: It depends on the methodology of the paper. There are several different approaches to creating data center energy estimates. You've got bottom-up modeling, which looks at the equipment that is deployed and the energy characteristics of that equipment, from the manufacturers and from databases like SPECpower, which do testing of equipment. That is then combined with market data, to try to calculate how many of these pieces of equipment - how many servers, how many drives, all those kinds of things - have actually been sold and deployed, and how that is going to change over time. So by creating these models of how much has been sold and what the power consumption of that equipment is, you can start creating some numbers around what the energy consumption of that equipment would be. The next method is a top-down methodology, and this is where you get numbers from an organization or a government that has actually looked at the amount it's consuming. A classic example is a telecoms operator or an owner of a large-scale data center: they look at all of their energy consumption and then publish it. By taking that single number of total energy consumption, you can apportion it across users, or data transferred, or something like that, to provide a historical estimate of what was actually used, expressed per user or per subscriber. And then there's a third type of methodology, which uses either of those two - top-down or bottom-up - as a baseline, and then applies an extrapolation out into the future based on one or several scenarios of what the expected growth will be. This is useful for making projections a couple of years into the future. The bottom-up and top-down methodologies do this too: they take their historical figure, apply a growth factor, and consider what it might look like a couple of years into the future. The difference with the extrapolation-based models is that they project very far out into the future. And as I said earlier, the further out into the future you project, the less accurate your projections are going to be, because it's very difficult to account for all of the potential changes in infrastructure, energy efficiency, government policies, and all those kinds of things. What we see is that the very large range of estimates typically comes from the papers and reports that use this extrapolation-based methodology. They say: if these trends continue, or if something happens, then energy will increase significantly over time. But the common characteristic of all of these is that you never see that massive growth in the past. You see it gradually growing, sometimes plateauing, but there is never a sudden massive increase. Yet in these projections, at a certain point the growth factor is applied, the graph in the paper goes exponential, and you see crazy growth. That's when you get numbers as high as 10,000 terawatt hours. These are just not realistic because, A, we haven't seen that happen in the past, and B, that is not how technology tends to evolve: there are efficiencies that you get from new technologies, and things change. That means these values just don't make sense.
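To illustrate the difference between the approaches, here is a toy bottom-up model in the spirit described above. Every number is a hypothetical placeholder, not a figure from any cited study:

```python
# Toy bottom-up model: installed-base counts times per-unit power draw.
# All fleet figures below are invented for illustration.

HOURS_PER_YEAR = 8_760

fleet = {
    # equipment class: (installed units, average power draw in watts)
    "servers": (1_000_000, 250),
    "storage": (2_500_000, 10),
    "network": (300_000, 150),
}

total_watts = sum(units * watts for units, watts in fleet.values())
annual_twh = total_watts * HOURS_PER_YEAR / 1e12  # W*h -> TWh
print(f"{annual_twh:.1f} TWh/year")

# Extrapolation-based estimates start from a baseline like this and then
# compound a growth factor far into the future: e.g. 20% a year for a
# decade multiplies the result by 1.2**10 (about 6x), which is where the
# implausible multi-thousand-TWh projections tend to come from.
```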

Gaël: Fair point. So although the energy consumption of data centers is very important, and should be well taken into consideration in the energy mix to meet the Paris Agreement targets on greenhouse gas emissions, it should not be covered with an apocalyptic tone.

David: I think what generally happens is that the person who is writing or creating a project has already decided what they want the number to show, and they then go and find the academic research that supports those numbers. You can see this in several examples we use in the paper of reports that publish value judgments based on whether data is useful or not, or whether we should cut back on wasteful behavior. And whilst there is something to be said for changing user behavior, my view is that's a waste of time; it should be down to the user to decide what is useful to them, and nobody else should be making judgments about how they're using technology. Actually, our goal should be to make technology sustainable broadly, so that regardless of what the user does, they can benefit from sustainable infrastructure, because I don't think we're going to change user behavior sufficiently to have an impact. We should make that change on behalf of the user by changing the infrastructure, so essentially they have no choice about it. But when you make value judgments about whether technology is good or bad, or certain technologies are good or bad, then I think that gets into too much of a nanny-state type approach, where you're controlling what people are doing based on your own perspective rather than on what the users want to do. And like with anything, you can find statistics to back up any point you'd like to make. That has been a particular challenge, I think, in the realm of data center energy consumption, because you can use these papers that have since been quite significantly debunked - there have been challenges made to the methodologies - but you continue to see people citing the older papers just because they back up the point they had already decided they wanted to make.

Gaël: Thanks a lot for all these insights regarding the environmental footprint of data centres. I wanted to ask you a very simple question, which might take a bit of time to answer. Let's say I'm a CTO, I'm a lead engineer, or I'm simply a concerned software developer. How should I get started writing greener code, not taking the infrastructure into consideration? Because sometimes you're in an organization where the infrastructure is as it is, not necessarily in a devops environment. And I've got this crazy question as well: do you believe that at some point we will see, in good CI/CD pipelines, sustainability variables being taken into account?

David: There's a good paper on this by Pereira et al., which was published in 2021 as a follow-up to a paper they published a couple of years before that, which looks at the energy efficiency of all the major programming languages. And it gives some advice on how engineers and developers should think about this, because it does come down to language choice. But it also comes down to what you're trying to optimize for. Often we think that, because energy is basically power multiplied by time, you can reduce the time of the execution to reduce the energy consumption, and that does have a part to play. But there are challenges with how code is deployed that mean that might not necessarily be true. For example, you can reduce the amount of time it takes to complete a calculation by deploying that calculation across 10,000 servers, assuming the operation can be parallelised. That doesn't reduce the energy consumption, but it does reduce the time. Then you have to think about: okay, so what are you trying to optimise? Are you trying to optimise for just the raw energy consumption? Are you trying to optimise for the time it takes to complete the processing? Or are you trying to optimise for memory? That third one is the one that's often forgotten, because there is an energy impact to memory consumption, and you can kind of see that in how the cloud providers bill for serverless functions like AWS Lambda. There's a compute component and there's a memory component, and you pay for more memory to be allocated to your serverless function. And whilst it's not always the case that price is a direct proxy for energy consumption, it stands to reason that the price of energy is a component in how AWS decide to price their services. So understanding how these different components of your code play together will inform what you're trying to do. When it comes to which language is the best, again you have to ask: for what? But as the two papers I mentioned show, C is generally the best language across the board, although not quite for memory. In the top two or three languages you've always got C and Rust, and Go is pretty good as well when it comes to execution time and memory. But if you're writing everything in C, then you're trading off other things, like memory safety and the maintainability of the code. Writing in C is very low level and very high performance, but writing in memory-unsafe languages is the cause of a large number of the security issues we've seen over the last few decades. Those are somewhat mitigated by switching to Rust, but Rust is a very difficult language to learn. That's certainly what I found when I was learning it, and I think most people say there is quite a steep learning curve; once you get it, it's a good language, but the learning curve is there. These are the challenges a CTO has to think through: what are they trading off? Maybe it's developer productivity, and it takes a bit longer to get code into production because you've got to onboard new engineers into Rust. At a large company, that might not matter; at a small startup, where weeks or months of additional development time could mean the difference between success and running out of money, maybe that is more of a consideration. And writing web services in C - I don't see that very often. It's more TypeScript and JavaScript.
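As a worked illustration of the power-multiplied-by-time point, with made-up numbers: parallelising a job cuts wall-clock time but not the joules consumed, assuming perfect speedup and equal per-server power draw.

```python
# energy = power x time: parallelism reduces latency, not energy.
# Both constants are hypothetical values chosen for illustration.

POWER_W = 200          # average draw of one server while computing
WORK_SECONDS = 10_000  # total work the job needs, in server-seconds

def job_energy_kwh(servers: int) -> float:
    wall_clock_s = WORK_SECONDS / servers          # time the user waits
    joules = servers * POWER_W * wall_clock_s      # power x time, summed
    return joules / 3.6e6                          # J -> kWh

print(job_energy_kwh(1))       # ~0.56 kWh, takes ~2.8 hours
print(job_energy_kwh(10_000))  # still ~0.56 kWh, but takes 1 second
```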
As a product develops, then maybe you split things out, and you might have a particular component that's rewritten in a more performant language. Writing algorithms in C certainly makes a lot of sense compared to writing them in JavaScript, but it really depends on what you're trying to do. So thinking about these things, I think, is important, and having those numbers appear in the CI builds would be interesting. You might want to see code coverage, you might want to see the cost of your infrastructure, and you might also want to see the carbon impact or the energy impact of your infrastructure. Then you can see which of those you want to optimize for, and which you want to change. I think it's more likely that it's going to be about trying to avoid sudden spikes, certainly with cost: you want to see that your code didn't inadvertently deploy something that's going to cost you 10 or 20 times more than you expected. And I think potentially we'll see the same thing with the carbon impact - you don't want to see a sudden spike of carbon impact that you weren't expecting. Removing those unintended consequences, I think, is probably going to be the first step. But as soon as you get into the optimization of very small numbers, or you're starting to spend a lot of time optimizing for things like energy consumption, you'll probably also optimize for memory and try to understand how to make that particular operation more performant, and that comes with scale. So these are all things that are considered as part of a project, rather than just focusing on a single number.
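A minimal sketch of the kind of CI guard David imagines: fail the pipeline when the estimated carbon impact of a build jumps against a stored baseline. The estimate and baseline would come from a hypothetical reporting hook; no real CI product or API is implied here.

```python
# Hypothetical CI step: treat a carbon-impact spike like a failing test.

import sys

SPIKE_FACTOR = 1.5  # fail if the impact grows more than 50% vs baseline

def check_carbon_budget(baseline_g: float, current_g: float) -> None:
    """Exit non-zero on an unexpected carbon spike, failing the build."""
    if current_g > baseline_g * SPIKE_FACTOR:
        sys.exit(f"FAIL: {current_g:.0f} gCO2e vs baseline {baseline_g:.0f} gCO2e")
    print(f"OK: {current_g:.0f} gCO2e (baseline {baseline_g:.0f} gCO2e)")

if __name__ == "__main__":
    # Placeholder values; real ones would come from a measurement hook.
    check_carbon_budget(baseline_g=120.0, current_g=140.0)
```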

Gaël: Those are very actionable tips that you shared right now! Being mindful of time, my last question would be: what would be your recommendations to learn more about digital sustainability and sustainable computing, your area of expertise? It can be books, videos, experts to follow, articles, et cetera.

David: I think the best way to learn more about this is to get into the fundamental science of how sustainable energy, and energy science in general, works. So I would highly recommend the master's degree that I did at Imperial College. But in the absence of being able to spend a full year full time, and the cost of that, there are two books which were on the reading list for the course. The first one is a book simply called “Energy Science”. It's an Oxford University Press publication, by Andrews and Jelley, I think. It's the fundamental science of how energy systems work, and a brand new edition was just published this year, I think, which is good. More broadly, in terms of sustainable energy, there's a book by MacKay called “Sustainable Energy - Without the Hot Air”, and that is a very pragmatic and realistic look at what it would take to solve the energy challenges that we have, and the transition to clean energy. Often there's a lot of talk around big projects like “well, let's just plant hundreds of millions of trees” or “let's just shift to solar energy” without actually understanding the big challenges with “just” doing that thing; the emphasis is often on the word “just”, rather than trying to think through what it practically means. That book actually does all the math behind it. It's slightly out of date, but it is still generally valid, and it gives you a very good understanding of how sustainable energy should work and can work.

Gaël: Thanks a lot. So, go back to basics - that would be your main message, with these two books.

David: I think so, yeah. Understanding the science, because that's the foundation of everything.

Gaël: Which is very consistent with everything you shared with us in this episode. So thanks a lot for attending. I feel there are a few dozen questions that I would have loved to ask you, to go a bit more in depth, but maybe another time! So, for this episode, thanks a lot for being with us today.

David: Excellent. Thanks for having me on.

❤️ Never miss an episode! Hit the subscribe button on the player above and follow us the way you like.
📧 Our Green IO monthly newsletter is also a good way to be notified, as well as getting carefully curated news on digital sustainability packed with exclusive Green IO contents.
  continue reading

53 에피소드

Artwork
icon공유
 
Manage episode 339427646 series 3346125
Gaël DUEZ에서 제공하는 콘텐츠입니다. 에피소드, 그래픽, 팟캐스트 설명을 포함한 모든 팟캐스트 콘텐츠는 Gaël DUEZ 또는 해당 팟캐스트 플랫폼 파트너가 직접 업로드하고 제공합니다. 누군가가 귀하의 허락 없이 귀하의 저작물을 사용하고 있다고 생각되는 경우 여기에 설명된 절차를 따르실 수 있습니다 https://ko.player.fm/legal.
In this episode, we go to London and meet David Mytton the co-founder and CEO of Console and also an academic doing his PhD on sustainable computing in Oxford University 🎓. All David’s work follow a consistent thread that could be described as “Effective Sustainability requires reliable and transparent information”. Under this angle, we discussed the environmental impact of datacenters and why the estimates vary so much, the accuracy of carbon footprint calculators for websites as well as the choice of language and CI/CD when we aim to develop greener code 👩‍💻.
❤️ Subscribe, follow, like, ... stay connected the way you want to never miss an episode!

Learn more about our guest and connect:


📧 You can also send us an email at greenio@duez.com to share your feedback and suggest future guests or topics.

David's sources and other references mentioned in this episode:



Transcript


Gaël: Hello, everyone. For this episode, we go back to London to have an amazing talk with David Mitton, the co-founder and CEO of Console, an Angel investor and an academic doing his PhD on sustainable computing in Oxford University. Now, let's be honest. Interviewing David is both a privilege and a challenge, a privilege, because he ranks super high on my list of world class experts in digital sustainability among people like Gauthier Roussilhe, for instance, but also a challenge because he covers so many angles in green IT that I would like to record a 4 hours long episode. Still, there is a consistent threat among all David's work that could be labeled as “Effective sustainability requires reliable and transparent information”. Under this angle, we discussed the environmental impact of data centers and why the estimates vary so much, the accuracy of carbon footprint calculators for websites, as well as the choice of language and CI/CD when we aim to develop greener code. But let's talk a bit about David first from 2009 to 2018, David was co-founder and CEO at Server Density, a London based SaaS infrastructure monitoring startup which got subsequent VC funding and achieved to attract more than 700 customers. When StackPath acquired the company, he joined it for two years as their lead product engineer. Then he completed an environmental technology MsC at Imperial College, London, where he developed his interest in sustainable computing. He continued that research on the sustainable team at Uptime Institute. Eventually, he launched Console in early 2021 to provide the best tools for developers while starting a part-time PhD in Oxford University. Welcome, David. Thanks a lot for joining Green Area today,

David: Hello. It's very kind introduction, Thank you very much.

Gaël: First of all, what did I miss in your bio?

David: It's a good question because my hobbies overlap very much with what I do on a daily basis, which is the console.dev newsletter to help engineers find tools. And I just really enjoy playing around with computers. And so I do that all day playing with dev tools and Cloud infrastructure and then at the weekends - and sometimes during the week as well - I'm also investigating the sustainability of everything I've been playing with during the week because I think it's important for us to be able to continue using all these digital services whilst reducing the environmental impact. And so my hobby is the academic side of things, which is why I'm starting a PhD at Oxford to continue the work that I first got interested in at Imperial in sustainable computing.

Gaël: So you could describe yourself as a sustainability geek.

David: Yes, I think so. Technology broadly and then sustainable technology, specifically

Gaël: How this interest in sustainability, and especially the sustainability in ICT came first. Did you have some kind of ha ha moment?

David: It came once I sold my company in 2018, and I joined the acquirer as you mentioned StackPath and ran their product engineering team for a bit. But then I was thinking about what challenges we're going to face as a civilization over the coming decades and how I could apply my time, my skills to contribute to solving some of those issues. And I decided that climate change was a major challenge that really needed to be fixed. This was pre-covid, of course, so that was the biggest challenge that I saw at the time, and I decided I needed to bootstrap my knowledge in environmental technology and environmental sustainability in general because I have absolutely no or had no background in it prior to that. And so I did a masters degree at Imperial College here in London, in environmental sustainability and technology, and that really covers, covered, everything from fisheries management through to clean energy and vectors for disease control with mosquitoes. But I specialize specifically in energy technology and that I chose that because I thought that was the most tangible and applicable area that was currently in motion at the move to renewable and clean energy essentially. So specializing in energy, I then was able to link my experiencing cloud and software to start getting into the sustainability aspects of the usage of I t globally.

Gaël: So we mentioned reliable metrics being a pillar of your work, both as an academic and as an entrepreneur. And I'd like to have your opinion on the carbon calculators for websites which are blossoming across the world. Sorry if this question might be a bit controversial but are they reliable? All of them? Only some of them? And what could be the use?

David: Generally they're not very reliable, and they're not much use. In my opinion, the problem comes down to the methodology and I would distinguish general website calculators from the Cloud Carbon footprint calculators that are provided by the cloud providers, which all three of them Amazon, Google, Microsoft, they now will have their own calculator. I put those in a separate category, the ones that I think you're referencing just general Calculators are available from a number of different places online

Gaël: and dedicated for websites, not not the full infrastructure. Most of the time.

David: That's right. Yeah, So the way that they are working is they generally scan your website and look at all the assets, CSS, images, JavaScript, all that kind of stuff and calculate the total page size. And then they apply an energy factor, which is typically taken from academic literature and then multiply the energy factor by the page size, and that gives you the total amount of energy that is consumed on a single page load. And then that can give you a number that you can use to calculate based on the number of visitors to your website, and that gives you the total energy consumption and then governments around the world they all published carbon factors, which allows you to convert that energy consumption into carbon emissions. And that's the number that usually gets out of these calculators. But it makes a fundamental assumption around the energy factor, and typically you see these being taken from academic literature that provides a single number. It's often kilowatt hours per gigabyte, and then they multiply that by the data transfer. And that has some major limitations, which means it's essentially inaccurate and not much use. And this is compared to the carbon footprint calculators you get from the cloud providers, which will go into your full infrastructure and all of the details. And they look at their actual energy consumption from all of the data centers and the full footprint of the infrastructure, whoever it is that you're using. And then they are portioning it across all their customers, so far more challenging with those because different companies are more transparent than others. But because they're using actual energy consumption rather than the calculators which are just using factors multiplying them, they're significantly more accurate.

Gaël: We had an episode dedicated to Cloud Carbon Footprint and the different methodologies and pro and cons regarding the big three - I would say - carbon calculators with Cameron Kasher and Benjamin Davy. It was a fascinating discussion. But going back to the website calculator, is there any other bias?

David: The first challenge, I would say, is that it drastically simplifies how website infrastructure works and how websites are actually delivered to users. So if you're just looking at the page size and using that as the total amount of data transferred, then that misses out a huge part of what's going on behind the scenes. So, for example, is that data downloaded every single time you load the page? It is the first time, but on the second and the third times it's not. It's cached by the browser. And then you've got caching happening on the infrastructure behind the scenes because there are content delivery networks. There's also processing that can happen in the browser. So if you're executing JavaScript or playing a video, then there's processing happening in the browser. But there's also processing happening on the service side if you're doing any kind of service side rendering or querying databases, so you're missing out some major components in just how the page is rendered. And then the energy factor, this number that you get kilowatt hours per gigabyte.This is an average, so it doesn't really capture any of the key differences between different types of applications. It's just looking at the average amount of energy per unit of data and misses out things like Web browsing versus video streaming, for example, which is much more intensive on the data volumes. Often you'll see a number that has been cited from a paper by Aslan from a couple of years ago as linear. Commonly, you'll see it as 0.65 kilowatt hours per gigabyte. That's assuming the number has been correctly adjusted because the number that they publish in the paper is actually for 2015. And you have to make adjustments for projected improvements in efficiency where they project. It's going to fall by half every two years, so you have to project that out to get the current year. The further out you make projections and make these changes, the less accurate they are, because it's very difficult to predict changes in infrastructure and improvements in efficiency and also that number from the Aslan’s paper. It only refers to fixed line networks. And that excludes a major source of traffic, which is mobile,

Gaël: Which is crazy because - and I speak about something that I have experienced when I was working in the prop tech industry - very often platforms reach 70 or 80% of traffic being done via mobile.

David: That's right, yeah, so the application is really important. And then finally, I think it only refers to infrastructure in countries with modern equipment, and that's relevant for countries in Western Europe. But as you have more infrastructure being deployed in countries that are not as advanced, then you're missing out potentially significant components of the environmental impact of that. So all of these things combine to mean that the number is basically useless, that you get from the carbon calculators. And actually it's more useful to look at something like the Google Lighthouse to use performance as a proxy for energy consumption. Because if the page is loading faster and you reduce asset sizes and you tweak your cashing and all those kind of things, then you're making more efficient use of the infrastructure just

Gaël: To close the topic of website carbon calculators. I understand you don't believe in the accuracy of the numbers provided but are they consistent? Meaning that the number is wrong but the evolution could still be interesting to follow and to track just to know if you're going into the right direction.

David: I don't think so, because many of the things that you change as a result of, say, the Google Lighthouse project would not be reflected in the carbon calculators, the website carbon calculators. So, for example, if you improve the caching and the calculator doesn't look at the caching - it's just looking at the full page load - you've made an improvement that is invisible to those calculators, and it would still show the exact same carbon number. The challenge is that it's somewhat unintuitive that increased amount of data would have minimal or no effect on the amount of energy consumed. Because you assume that if you're using more data then it needs more energy and generally that's not the case. This is an emerging area, I suppose, in academic research, and there's a limited amount of our papers to point to. But over the last two years, thereabouts, there have been some papers that look into this in more detail and the approach I describe around using the factor multiplied by the data transferred is known as kind of an allocation method where you're allocating the data to the energy or vice versa, and that has its uses. It's useful for reporting because if you know the amount of data that you transferred, the amount of energy that you've consumed by looking at your electricity meter too, then you can divide them and allocate them and then you get an intensity figure and you can look over time at your those two numbers and hopefully see a reducing energy. Improving energy efficiency to reduce intensity factor. So over time, if you're transmitting more data but you're using less energy, then your intensity would decrease over time, and that is useful to track. But it assumes you've got access to the data volume, which generally isn't that difficult to get access to. But it's more challenging to get access to that energy number because it's not exposed by the cloud providers. And unless you are racking your own equipment in a data center, you're not going to be able to get access to it. And the challenges that this unintuitive nature of how networks work, it comes down to understanding how those networks are deployed and, as it turns out, the way that networks work as you deploy the network equipment. And it has a baseline amount of energy consumption, which is a factor of the capacity of that piece of equipment. But it is almost completely disconnected from the actual usage. It's not correlated. Certainly not linearly correlated. So networks are provisioned for peak load and they’re provisioned for redundancy. And when you add that new equipment, that's when the energy increase occurs.Certainly the energy increase that is significant in terms of measuring. It's not when you push more data through the equipment, at least not on a linear basis, and that is the common error that is made with these carbon calculators. The website Carbon Calculators is that they assume that energy consumption is linear and it's proportional to the amount of data that is transmitted, which isn't the case.

Gaël: Thanks a lot for your honesty and straightforwardness about these tools. We could already start grasping it, and I mentioned it in my introduction, you’re a true infrastructure expert as well. You wrote several articles, academic articles on the environmental footprint of data centers both Cloud and Private Data Center. I'd like to start by not talking about energy for the moment and focus on water. Thought leaders in digital sustainability like Gerry McGovern, for instance, in his New Thinking newsletter are more and more vocal about it. Several hyperscalers experienced severe pushbacks in the Netherlands or in Denmark, for instance, regarding the water consumption. And I rediscovered recently that actually you wrote something about it back in February 2021. So what could you tell us about the water consumption of the infrastructure and maybe more specifically, the data centers?

David: Water consumption of data centers is an interesting topic because it is a lot more challenging to explain than energy consumption and carbon emissions. So when you're thinking about carbon emissions our goal is zero - net zero essentially - for carbon emissions, and that's a very simple marketing message. It's a very simple goal, and everyone can understand what we're trying to do. So when we're transitioning to clean energy, we mean zero carbon energy, and the goal is obvious. Water consumption is not so clear cut because the goal is not always zero water and you have to understand the context of the location of the data center to decide what the goal should be. So if you have a data center that is deployed in a region that has abundant water, perhaps it's next to a lake or a source of water that is not stressed in any way and is well maintained, then the water consumption - you should pay certainly for access to water, and you should consider the efficiency and making sure that your usage is reasonable - but it doesn't need to be zero, and you can invest your resources, your capital resources elsewhere. That is completely different from building a data center in a region that is very hot and has challenges with the governance of the water supply, or has other users of the water supply that are potentially of higher importance and where there might be drought and common problems with stress to the water system and in those regions, then zero water might be a goal that you use, but it depends on what you're trading off, because sometimes reducing water consumption can mean an increase in energy consumption and so you have to understand which one you're trying to optimize for. There are also some challenges of what you mean by water consumption, because in a data center there are two different types. There's direct water consumption, which is typically what people think of when they're thinking of what data centers do with water. And that is the use of water for cooling, typically in air conditioning and chiller units within the data center to keep the environment at a set temperature so that the IT equipment that can operate efficiently. But that's actually the smaller part of water consumption. Indirect water consumption is quite significantly larger, and that is the water consumption associated with the power generation that goes to produce the electricity for a data center. Because around the world we are still using majority fossil fuels and fossil fuels are part of thermo electric power generation. Water is a key component in how our electricity is generated because the fuel is burnt to turn water into steam and the steam turns the turbine and that is what generates the electricity. Or if you're using hydropower in some regions, then there can be even higher water consumption there because the water is being used to generate the power as part of the hydroelectric dam. For example, whether that is a true consumption depends on the downstream users. You might use that water somewhere else. Um and so you kind of got a dual use for it. But sometimes the water is discarded, or, um is not used again, and that can be actual consumption. And so you have to combine these two factors to think about what the data centers actually drawing, and where is it drawing it from, and also where you're generating your electricity for the data center, which could be quite a long way away from where that data center is and what is the water footprint in that region? 
And understanding the relative stress of both of those locations is important to understanding the true water footprint of a data center.
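
To make the direct/indirect split concrete, here is a minimal back-of-the-envelope sketch in Python. All of the figures - the IT energy, the PUE, the on-site water usage effectiveness (WUE) and the water intensity of the grid (EWIF) - are illustrative placeholders, not measurements from any real facility:

```python
# Back-of-the-envelope data center water footprint (illustrative values only).

IT_ENERGY_KWH = 1_000_000  # annual IT equipment energy (assumed)
PUE = 1.5                  # power usage effectiveness (assumed)
WUE_SITE = 1.8             # on-site litres per kWh of IT energy (assumed)
EWIF_GRID = 3.0            # litres per kWh of generated electricity (assumed)

facility_energy_kwh = IT_ENERGY_KWH * PUE           # total energy drawn from the grid
direct_water_l = IT_ENERGY_KWH * WUE_SITE           # cooling water used on site
indirect_water_l = facility_energy_kwh * EWIF_GRID  # water embedded in generation

print(f"Direct (cooling):      {direct_water_l / 1e6:.2f} ML")
print(f"Indirect (generation): {indirect_water_l / 1e6:.2f} ML")
print(f"Indirect / direct:     {indirect_water_l / direct_water_l:.1f}x")
```

With these placeholder values, the indirect, generation-side water comes out a few times larger than the on-site cooling water, which is the counterintuitive point David makes above; for real figures, see the papers referenced below.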

Gaël: And you said that a significant part of the water footprint is actually caused by electricity consumption. Could you give a rough estimate?

David: Off the top of my head, I think it's 2 to 3 times more water consumption from the power generation, but I would have to refer you to the paper that you mentioned that I wrote - there's a graph in there. There's also a graph in the 2016 United States Data Center Energy Usage Report by Shehabi and Masanet, which includes a good breakdown of the water consumption of data centers as well. So for the actual numbers, have a look at those two papers.

Gaël: I will. And actually, my listeners know that every article and document mentioned in an episode is put in the episode show notes. So now that we've talked a bit about the water footprint - and thanks a lot for sharing this bit of counterintuitive information once again, the second one in the show - let's go back to energy. You very recently wrote an article about - I'm sorry to put it in plain English - the mess around estimates of data center energy usage. Could you tell us a bit more about why? Which ones should we rely on? And are we on the verge of a catastrophic increase in the electricity consumption of data centers that will imperil the energy transition, or not? [laughing] Scary music!

David: [laughing] So you're referring to a paper that I co-authored with my colleague at Oxford, Masaō Ashtine. We looked at all of the data center energy estimates from 2007 to 2021, and this was just published in July as a review article. We looked at 46 papers - or at least, 46 were suitable for inclusion. We actually looked at hundreds and hundreds of papers, but we had some quite strict inclusion criteria, because we wanted to ensure that they were original calculations, that the calculations were transparent enough that we could look at the sources and the methodology, and that they applied to a specific region that was large enough to be an estimate for a particular geography - so, for example, a country, or a region of a larger geographical area such as the EU. We extracted 258 individual estimates and looked into almost 700 different sources, because the key to creating accurate estimates is the variables that are used in the methodology and the values that you put in for those variables. Our results showed that around 30% were from peer-reviewed publications, but the majority were not: 38% were from non-peer-reviewed reports, a large number of those lacked a clear methodology, and it was difficult to find where the data had come from. And just two companies, IDC and Cisco, made up almost 60% of all of the sources. There were particular challenges in that about 11% of the sources had broken web links, and 10% we just couldn't find. And whilst the goal of the review was not to highlight any individual publication - although we did give some examples of publications with good methodology - we made some recommendations to the people who are actually using and producing these numbers. Three groups in particular. First, general end users: journalists, people working at nonprofits or advocacy groups, who don't necessarily have a scientific background, and who need to be able to trust that the numbers published in academic articles are actually correct so they can do their wider work. Then there's the research community actually producing these estimates, which is typically academics. And the third group, which is particularly important and growing in importance, is policymakers and regulators, along with the grid operators - people within the energy sector who are planning the infrastructure that delivers electricity to data centers and to homes and offices and cities, because it takes a very long time to build out energy infrastructure. And so the challenge that we spotted originally was just the wide variance in the numbers, as you mentioned. You can find energy consumption estimates ranging from 200 terawatt hours a year to almost 8,000 terawatt hours a year, depending on the year that the estimate is for. And that range is completely absurd: it's so wide - well over an order of magnitude - that it makes the numbers useless. Rebuilding our energy infrastructure to deal with 200 or 300 terawatt hours of total global energy consumption is completely different from building something out to deal with 8,000 terawatt hours. So the motivation behind the article was to figure out where all these numbers come from, and to try to understand why there might be such variance in the literature.

Gaël: These extreme numbers - do they come from, I would say, non-academic, non-peer-reviewed articles and studies? Or do well-documented papers, written following a replicable methodology with open data, et cetera, also agree to disagree and also find a very wide range of energy consumption?

David: It depends on the methodology of the paper. There are several different approaches to creating data center energy estimates. You've got bottom-up modeling: that looks at the equipment that is deployed and the energy characteristics of that equipment, from the manufacturers and from databases like SPECpower, which do testing of equipment. That is then combined with market data to try to calculate how many of these pieces of equipment - how many servers, how many hard drives, all those kinds of things - have actually been sold and deployed, and how that is going to change over time. By creating these models of how much has been sold, and what the power consumption of that equipment is, you can start putting some numbers around what the energy consumption of that equipment would be. The next method is a top-down methodology, and this is where you get numbers from an organization or a government that has actually looked at the amount it is consuming. A classic example is a telecoms operator or an owner of a large-scale data center: they look at all of their energy consumption and then publish it. By taking that single number for total energy consumption, you can apportion it across users, or data transferred, or something like that, to provide a kind of historical estimate of what was actually used, applied per user or per subscriber. And then there's a third type of methodology, which uses either of those two - top-down or bottom-up - as a baseline and then applies an extrapolation out into the future, based on one or several scenarios of what the expected growth will be. This is useful for making projections a couple of years into the future. The bottom-up and top-down methodologies do this too: they take their historical figure, apply a growth factor, and consider what things might look like a couple of years out. The difference with the extrapolation-based models is that they project very far into the future. And as I said earlier, the further out into the future you project, the less accurate your projections are going to be, because it's very difficult to account for all of the potential changes in infrastructure and energy efficiency and government policies and all those kinds of things. What we see is that the very large range of estimates typically comes from the papers and reports that use this extrapolation-based methodology. They say: if these trends continue, or if something happens, then energy will increase significantly over time. But the common characteristic of all of these is that you never see that massive growth in the historical data. You see it gradually growing, sometimes plateauing, but there is never a sudden massive increase. Then, in these projections, at a certain point the growth factor is applied, the graph in the paper goes exponential, and you see crazy growth. That's when you get numbers as high as 10,000 terawatt hours. And these are just not realistic: first, because we haven't seen that happen in the past; and second, because that is not how technology tends to evolve - there are efficiencies that you get from new technologies, and things change - which means that these values just don't make sense.
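
As a rough illustration of the bottom-up approach David describes - and of how an extrapolation step can run away - here is a minimal sketch in Python. The fleet sizes, per-unit power draws, PUE and growth rate are all invented for the example; real studies derive them from market data and measured equipment characteristics:

```python
# Minimal bottom-up data center energy model (all figures invented for illustration).

HOURS_PER_YEAR = 8760

# Installed base and average power draw per unit, of the kind a real study
# would derive from market data and SPECpower-style measurements.
fleet = {
    #               (units_installed, avg_watts_per_unit)
    "servers":        (10_000_000, 200),
    "storage_drives": (50_000_000, 8),
    "network_ports":  (20_000_000, 5),
}

PUE = 1.6  # assumed average power usage effectiveness (facility/IT energy)

it_energy_twh = sum(
    units * watts * HOURS_PER_YEAR / 1e12  # Wh -> TWh
    for units, watts in fleet.values()
)
total_energy_twh = it_energy_twh * PUE

print(f"IT equipment energy:   {it_energy_twh:.1f} TWh/year")
print(f"Total incl. overheads: {total_energy_twh:.1f} TWh/year")

# Naive extrapolation - the step where estimates tend to run away:
for years in (1, 5, 10):
    print(f"+{years}y at 15%/yr: {total_energy_twh * 1.15**years:.0f} TWh")
```

Even a modest-sounding 15% compound growth rate quadruples the total within a decade, which is exactly the kind of exponential curve David cautions against taking at face value.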

Gaël: Fair point. So although the energy consumption of data centers is very important, and should be well taken into consideration in the energy mix if we are to meet the Paris Agreement targets on greenhouse gas emissions, it should not be covered with an apocalyptic tone.

David: I think what generally happens is that the person writing or creating a report has already decided what they want the number to show, and they then go and find the academic research that supports those numbers. You can see this in several examples that we use in the paper - reports that publish value judgments about whether data is useful or not, or about whether we should cut back on wasteful behavior. And whilst there is something to be said for changing user behavior, my view is that that's a waste of time: it should be down to the user to decide what is useful to them, and nobody else should be making judgments about how they're using technology. Actually, our goal should be to make technology broadly sustainable, so that regardless of what the user does, they benefit from sustainable infrastructure - because I don't think we're going to change user behavior sufficiently to have an impact. We should make that change on behalf of the user by changing the infrastructure, so essentially they have no choice about it. But when you make value judgments about whether technology is good or bad, or whether certain technologies are good or bad, then I think that gets into too much of a nanny-state type of approach, where you're controlling what people are doing based on your own perspective rather than on what the users want to do. And as with anything, you can find statistics to back up any point you'd like to make. That has been a particular challenge, I think, in the realm of data center energy consumption, because there are papers that have since been quite significantly debunked - challenges have been made to the methodologies - but you continue to see people citing the older papers, just because they back up the point they had already decided they wanted to make.

Gaël: Thanks a lot for all these insights regarding the environmental footprint of data centers. I wanted to ask you a very simple question, which might take a bit of time to answer. Let's say I'm a CTO, a lead engineer, or simply a concerned software developer. How should I get started writing greener code, not taking the infrastructure into consideration? Because sometimes you're in an organization where the infrastructure is as it is, and not necessarily in a DevOps environment. And I've also got this crazy question: do you believe that at some point we will see sustainability variables being taken into account in good CI/CD pipelines?

David: There's a good paper on this by Pereira et al., which was published in 2021 as a follow-up to a paper they published a couple of years before that, and which looks at the energy efficiency of all the major programming languages. It gives some advice on how engineers and developers should think about this, because it does come down to language choice. But it also comes down to what you're trying to optimize for. Often we think that, because energy is basically power multiplied by time, you can reduce the time of the execution to reduce the energy consumption - and that does have a part to play. But there are aspects of how code is deployed that mean that might not necessarily be true. For example, you can reduce the amount of time it takes to complete a calculation by deploying that calculation across 10,000 servers, assuming the operation can be parallelized. That doesn't reduce the energy consumption, but it does reduce the time. Then you have to ask: okay, so what are you trying to optimize? Are you trying to optimize for raw energy consumption? Are you trying to optimize for the time it takes to complete the processing? Or are you trying to optimize for memory? That third one is the one that's often forgotten, because there is an energy impact to memory consumption, and you can kind of see that in how the cloud providers bill for things like serverless functions, such as AWS Lambda. There's a compute component and there's a memory component, and you pay for more memory to be allocated to your serverless function. And whilst it's not always the case that price is a direct proxy for energy consumption, it stands to reason that the price of energy is a component in how AWS decides to price its services. So understanding how these different components of your code play together will inform what you're trying to do. When it comes to which language is best, again you have to ask: best for what? But as the two papers I mentioned show, C is generally the best language across the board, although not quite for memory. In the top two or three languages you've always got C and Rust, and Go is pretty good as well when it comes to execution time and memory. But if you're writing everything in C, then you're trading off other things, like memory safety and the maintainability of the code. Writing in C is very low level and very high performance, but writing in memory-unsafe languages is the cause of a large number of the security issues we've seen over the last few decades. Those are somewhat mitigated by switching to Rust, but Rust is a difficult language to learn - that's certainly what I found when I was learning it, and I think most people say there is quite a steep learning curve. Once you get it, it's a good language, but the learning curve is there. These are the challenges a CTO has to think through: what are they trading off? Maybe it's developer productivity, and it takes a bit longer to get code into production because you've got to onboard new engineers into Rust. At a large company, that might not matter; at a small startup, where weeks or months of additional development time could mean the difference between success and running out of money, maybe that is more of a consideration. And writing web services in C - I don't see that very often; it's more TypeScript and JavaScript.
As a product develops, then maybe you split things out, and you might have a particular component that's rewritten in a more performant language. Writing algorithms in C certainly makes a lot of sense compared to writing them in JavaScript, but it really depends on what you're trying to do. So thinking about these things is important, and having those numbers appear in the CI builds would be interesting. You might want to see code coverage; you might want to see the cost of your infrastructure; you might also want to see the carbon impact or the energy impact of your infrastructure. Then you can see which of those you want to optimize, and which you want to change. I think it's most likely going to be about trying to avoid sudden spikes. Certainly with cost, you want to see that your code didn't inadvertently deploy something that's going to cost you 10 or 20 times more than you expected, and I think we'll potentially see the same thing with the carbon impact: you don't want a sudden spike of carbon impact that you weren't expecting. Removing those unintended consequences is probably going to be the first step. But as soon as you get into the optimization of very small numbers, or you're starting to spend a lot of time optimizing for things like energy consumption, you'll probably also optimize for memory, and try to understand how to make that particular operation more performant - and that comes with scale. So these are all things to consider as part of a project, rather than just focusing on a single number.
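
To picture what such a CI check could look like, here is a hypothetical sketch in Python: a build step that fails when the estimated carbon impact of a benchmark run spikes against a stored baseline. The threshold, the grid-intensity default and the baseline file are all invented for illustration; nothing here is an existing CI feature.

```python
# Hypothetical CI step: fail the build if the estimated carbon impact of a
# benchmark run spikes against the previous baseline (all values invented).

import json
import sys
from pathlib import Path

BASELINE = Path("carbon_baseline.json")  # hypothetical baseline file
MAX_INCREASE = 1.2  # fail if the impact grows more than 20% in one change

def estimate_gco2(energy_kwh: float, grid_gco2_per_kwh: float = 400.0) -> float:
    """Convert measured benchmark energy into grams of CO2e.
    The default grid intensity is a placeholder, not a real figure."""
    return energy_kwh * grid_gco2_per_kwh

def main(measured_kwh: float) -> None:
    current = estimate_gco2(measured_kwh)
    if BASELINE.exists():
        previous = json.loads(BASELINE.read_text())["gco2"]
        if current > previous * MAX_INCREASE:
            # A non-zero exit fails the CI job, like a broken test would.
            sys.exit(f"Carbon check failed: {current:.1f} g vs baseline {previous:.1f} g")
    BASELINE.write_text(json.dumps({"gco2": current}))
    print(f"Carbon check passed: {current:.1f} gCO2e")

if __name__ == "__main__":
    main(float(sys.argv[1]))  # energy for the run, measured externally
```

In a pipeline you might run it as `python carbon_check.py 0.42`, feeding in energy measured for the benchmark run, for example via RAPL counters or a cloud provider's reported usage.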

Gaël: Those are very actionable tips you shared right there! Being mindful of time, my last question would be: what would be your recommendations to learn more about digital sustainability and sustainable computing, your area of expertise? It could be books, videos, experts to follow, articles, et cetera.

David: I think the best way to learn more about this is to get into the fundamental science of how sustainable energy, and energy science in general, works. So I would highly recommend the master's degree that I did at Imperial College. But in the absence of being able to spend a full year full-time, and given the cost of that, there are two books which were on the reading list for the course. The first is a book simply called “Energy Science”. It's an Oxford University Press publication by Andrews and Jelley. It covers the fundamental science of how energy systems work, and a brand-new edition was just published this year, I think. More broadly, in terms of sustainable energy, there's a book by MacKay called “Sustainable Energy - Without the Hot Air”, and that is a very pragmatic and realistic look at what it would take to solve the energy challenges that we have and the transition to clean energy. Often there's a lot of talk around big projects - like “well, let's just plant hundreds of millions of trees” or “let's just shift to solar energy” - without actually understanding the big challenges behind “just” doing that thing. The emphasis is often on the word “just”, rather than on thinking through what it practically means. That book actually does all the math behind it. It's slightly out of date, but it is still generally valid, and it gives you a very good understanding of how sustainable energy should work and can work.

Gaël: Thanks a lot. So going back to basics - that would be your main message, with these two books.

David: I think so, yeah - understanding the science, because that's the foundation of everything.

Gaël: Which is very consistent with everything you shared with us in this episode. So thanks a lot for joining us. I feel there are a few dozen more questions I would have loved to ask you, to go a bit more in depth - but maybe another time! So for this episode, thanks a lot for being with us today.

David: Excellent. Thanks for having me on.

❤️ Never miss an episode! Hit the subscribe button on the player above and follow us the way you like.
📧 Our Green IO monthly newsletter is also a good way to be notified, as well as to get carefully curated news on digital sustainability, packed with exclusive Green IO content.