The Moodle Podcast

Moodle's scalability: A conversation with Jon Miles, Head of Tech at Titus Learning

April 26, 2023 Moodle Podcast Season 1 Episode 10

In this episode, Marie Achour, Moodle Global Head of Product, interviews Jon Miles, Head of Technology from Titus Learning, Moodle Premium Certified Partners.

In this podcast, Jon Miles shares the Titus Learning approach to scalability and experience in scaling Moodle for large installations.

Learn more about Titus Learning: https://moodle.com/partners/titus-learning-uk/
Find out more about Moodle LMS here: https://moodle.com/solutions/lms/

Visit Moodle at Moodle.com

Speaker 1:

Hello and welcome to the Moodle Podcast.

Speaker 2:

Hello everyone, and welcome to this edition of the Moodle Podcast. My name is Marie Achour and I am the Global Head of Product here at Moodle HQ. I'm delighted to be joined today by Jon Miles, Head of Technology at Titus Learning, one of our biggest Moodle Premium Certified Partners. Today, Jon and I will be discussing Moodle at scale. I can't wait to learn from Jon's experience in this domain and his approach to scaling Moodle for large installations. Hi Jon, and welcome to the podcast.

Speaker 3:

Hi, Marie. Thanks very much for having me.

Speaker 2:

So, firstly, can you tell us a little bit about who you are and what it is that you do?

Speaker 3:

Well, hello everyone. I'm Jon. I head up the technology team here at Titus in the UK. Uh, I've been lucky enough to work in tech for tech organizations for the last 30 years. 15 of those were specializing in open source technologies, and obviously Moodle is a key open source technology, which we're huge supporters of. I've seen a huge amount of change in that time, but essentially my role here is to drive innovation into the technology solutions that we deliver for our customers. Mm-hmm.

Speaker 2:

Sounds like a big job <laugh>. So Titus is an award-winning Moodle certified premium partner. Can you tell us more about Titus and your team, please?

Speaker 3:

Sure. Well, I'm talking to you from headquarters here in Saltaire Village. This is in, uh, Yorkshire, in the UK. Now, this area that we're in actually is one of only 33 UNESCO World Heritage sites here in the UK. So it's a really cool place to be. Wow. We've developed a tech hub here and we've got a whole bunch of our team that are based here. All of our tech guys are all here. And, uh, it's an old mill and we have the sort of top floor of the mill where we all collaborate and hang out. My team comprises a bunch of solution architects, a bunch of designers, engineering teams, and we design and we build and operate innovative e-learning platforms. And we do that with Moodle, uh, at the core.

Speaker 2:

Okay. Yeah. Sounds like your offices would be worth a visit. So one of the things that Titus specializes in is, um, creating secure, reliable and stable Moodle hosting environments. Can you tell us a little bit about that type of service? What kind of service does it include, and what is it that you offer your customers?

Speaker 3:

Sure. Well, you know, times have changed. We've moved away from having on-premise comms rooms or data centers in our customers' environments. So all of our hosting is now managed, uh, in the cloud, and we partner with AWS, so Amazon Web Services, and 99% of all of our customer solutions are hosted within, uh, AWS. And the reason for this is basically agility and the speed and the flexibility to be able to spin up Moodle platforms and operate those at scale 24/7. And we use AWS's footprint around the world. So yes, we're here in the UK, but a lot of our customers are global. So we provide the, uh, technical skills, the people, the expertise to complement what our customers have themselves in-house. It enables us to create Moodle environments and support those environments for our customers. And that includes things like security, the underpinning technology stack, um, which is sort of all open source. So it's Ubuntu operating systems, it's Nginx web servers, it's MySQL, PHP. Of course, we then make sure that we not only build that for our customers, but we run and we maintain that and we underpin that with, uh, service levels throughout their contract. Um, and it is a partnership with our customers, and their administrators will reach out to us on a support arrangement if they need any help. And of course, if we see anything proactively, uh, that may be wrong or impacting the performance of their services, we would reach out to our customers. Uh, and that's really what we call the managed, uh, service.

Speaker 2:

Mm-hmm <affirmative>. Yeah. And so talking about those customers, can you give us a couple of brand names or examples of some of those big Moodle installations that you support through this model? Yeah.

Speaker 3:

Well, we've got about a hundred and thirty or so customers, and they range from traditional sort of schools and universities and colleges all the way through to large enterprise and multinational organizations and government organizations. One of the largest ones, and one of the ones that we're most proud of, is the, uh, national rail operator here in the UK. It's called Network Rail. This is still one of Europe's largest Moodle Workplace installations. We have very aggressive SLAs as part of their requirements. Uh, they have an interesting model in that they have a bunch of in-house expertise, a hundred thousand or so in-house users, but because it's a rail operator, they heavily depend on third party contractors. So Moodle Workplace was ideally positioned with its multi-tenancy. So we have thousands and thousands of tenants, uh, on the platform as well as all the in-house users. And we had to build this platform to scale significantly. We had very high, uh, usage demands coming through. We've used what we call AWS's multi availability zone. So we've effectively created a footprint across multiple zones for resiliency reasons, and also to make sure that we can deliver the service levels that are demanded of us. There's three of those zones in total, and each one operates at a third capacity all the time, taking traffic. So it's a really big operation. We've been very proud of the service levels that we've achieved for Network Rail, and we've run the service now for those guys for three years. But it's been a really good test for us to really get to grips with how do we scale something that is beyond, I guess, just having a standard bunch of servers. Yeah. Um, how do we scale it to deal with huge capacity spikes and large demand.

Speaker 2:

Super impressive. And it's really a great example, because there is a bit of a misconception in the market sometimes that Moodle isn't as scalable as some of our competitors, and we both know that's not true. And the example you just gave just kind of demonstrates that. But I'd love to hear a little bit more from you about, you know, you're the one actually creating these success stories on the ground. So how do Titus and your team provide a detailed Moodle solution designed to provide that kind of scalability? What's your process like?

Speaker 3:

Globally, there are instances of Moodle that are running right now with well over a million users configured against them. Yeah. So Moodle can be scalable if you're using the right tools. So at Titus, we don't believe in one size fits all. So instead of just stamping out, you know, standard tech stacks and expecting them to perform, we really try and focus on what is the architectural design required to meet a particular use case. Now, in the case of Network Rail or one of our very large global customers, they often have huge capacity spikes at a particular time of day. And the user journey is very specific. For example, it might be that they have to complete a particular certification and submit that certification on a particular day, which of course creates a huge sort of spike. So what we try to do is to understand what that user journey is and what that looks like. And then we build every customer, uh, into what we call a solution design, that is, a document with a whole bunch of visual architectural, uh, drawings on it, which show how the traffic is gonna flow through the systems. And we use capability that is provided by AWS. And AWS have a fantastic set of products, in fact, that enable us to use things like, uh, multi availability zones, which I mentioned before; caching, which is storing frequently used database results in memory; security; uh, and automation, to really help us to build a very scalable environment for our customers.

Speaker 2:

This question of, of availability, um, of course that is the benchmark. What kind of reliability percentages do you achieve with your solutions at Titus? I'm, I'm assuming you've got KPIs or SLAs that guide them. What kind of numbers are we talking about?

Speaker 3:

The gold standard for any organization, actually, not just in e-learning, but the gold standard for availability is 99.99%. Mm-hmm <affirmative>. So that gives you something in the region of, like, you know, 20 to 40 minutes' worth of downtime a month. So when we design and build the platforms, we design and build them specifically for that. So if the customer has a requirement for four nines availability, that's what we build for them. So if we take our estate, one of the things we look at is our data. So we've roughly got about a hundred and twenty customers. That equates to about 230 platforms, because a lot of customers have test environments that we use to do testing on, including load testing. Um, so what we've seen is, of all of those customers, if you add all the users up, we've got about 2 million users on our platform. And what we found was interesting was that the data tells us that the concurrency of all of those users on those customer platforms is between five and 10%. Mm-hmm <affirmative>. What that means is we've got all this capacity there, but actually there's only up to sort of a hundred or so thousand users on the platform at any one time. So the data is really, really important for us. So in order to position an availability figure, we really need to understand how those users are connecting into our system. So we will architect the system accordingly, and we will ensure that we understand the design limits of that system before we commit to putting an SLA around that. But every one of our customers, every one of those hundred customers, has an SLA written into their contract. And I'm really pleased to say that if we look at our data for the last 12 months and we take the averages, we have been hitting 99.99% every month.
And of course there's always outages for customers, and sometimes they can be due to a bunch of content that is put onto those environments and they quickly run out of storage, which can cause an outage. Or occasionally we see situations where a customer is heavily loaded for a particular, uh, small moment in time. So we put auto scaling in place to cater for that. Mm-hmm <affirmative>. There are situations where outages happen, but when we see an outage or we see an operational issue, we get onto that very, very quickly to keep those availability figures, uh, as high as we can.
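When reading availability targets like the ones discussed here, it can help to turn the percentage into a downtime budget. The short Python sketch below does only that arithmetic. The function name, the 30-day month and the 365-day year are assumptions made for illustration; the exact allowance always depends on the window an SLA is measured over. Under those assumptions, 99.99% works out to roughly 4.3 minutes in a 30-day month or about 52.6 minutes in a year, while 99.9% is about 43 minutes a month.

```python
def downtime_budget_minutes(availability_percent: float, window_days: float) -> float:
    """Return the downtime, in minutes, that an availability target allows.

    availability_percent: the target, e.g. 99.99 for "four nines".
    window_days: length of the measurement window (an assumption; SLAs differ).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_minutes = window_days * 24 * 60
    # The budget is simply the unavailable fraction of the window.
    return window_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        per_month = downtime_budget_minutes(target, 30)
        per_year = downtime_budget_minutes(target, 365)
        print(f"{target}%: {per_month:.1f} min per 30 days, {per_year:.1f} min per year")
```

A figure like this is only a ceiling: planned maintenance, how an outage is defined, and whether partial degradation counts are all contract details that the arithmetic does not capture.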

Speaker 2:

Yeah. And if you're achieving those numbers, then you've definitely got it right. We were talking about those short moments of high burst volume usage. What kind of things can you do, and I think you gave us a couple of hints in your answers already, but, um, to accommodate those kinds of traffic spikes, when there's this big, significant volume at a specific period of time?

Speaker 3:

This is always challenging. Having a partner relationship with the customers is really important. So the customer sort of says, look, you know, we're doing this big piece of marketing activity and we're gonna drive all of the learners into our LMS, and we want them to do this, this, this, and this. We have some insight into what's gonna happen, but the majority of the time, of course, we don't. So in effect, we're having to sort of react. So the first thing is that we always perform what we call performance benchmarking. So when we build a solution, we basically load test it, we stress test it, and we look at the design limits. So we sort of know what it can do and what it can't do. And of course, one size does not fit all here, because every customer's got a different learner journey. So for example, we may see a school, uh, behave differently from an enterprise. For example, if a student has to submit a bunch of coursework, let's say, you can guarantee the student's gonna leave that right to the last minute. So you'll see a big, big spike in traffic. So we test the system to that point. We know the concurrent database connections that it can handle. For example, we might have 10,000 concurrent database connections at the backend, or we might have 50,000 users at the front end. You know, that's OK. You know, we can manage that. So it's really about understanding behavior and making sure that we can scale for that. And there is something that I can sort of talk a little bit about, uh, around the auto scaling, which is a capacity on demand function, which has been really, really beneficial for us. So we can cater for a spike in traffic by throwing more web servers, for example, or more processing power into that. And that's part of the automation that we have within AWS.
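The idea of benchmarking a design limit and then adding web servers on demand can be sketched as simple arithmetic. This is not how AWS Auto Scaling is configured in practice (that is done through scaling policies, not application code), and it is not Titus's implementation. The function name, the per-server limit of 2,500 users and the fleet bounds are all hypothetical values chosen for illustration; only the 50,000 front-end users figure echoes the conversation.

```python
def desired_web_servers(
    concurrent_users: int,
    users_per_server: int,
    min_servers: int = 2,
    max_servers: int = 24,
) -> int:
    """Estimate how many web servers a given concurrent load needs.

    users_per_server is the benchmarked design limit of one server
    (a hypothetical number here; it would come from load testing).
    min_servers and max_servers bound the fleet, as a scaling group would.
    """
    if concurrent_users < 0:
        raise ValueError("concurrent_users must not be negative")
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    if min_servers > max_servers:
        raise ValueError("min_servers must not exceed max_servers")
    # Integer ceiling division: a partly used server still has to exist.
    needed = -(-concurrent_users // users_per_server)
    # Never drop below the floor, never exceed the ceiling.
    return max(min_servers, min(needed, max_servers))


if __name__ == "__main__":
    # Quiet period: the floor of two servers applies.
    print(desired_web_servers(1_000, 2_500))
    # Deadline-day spike of 50,000 users at the front end.
    print(desired_web_servers(50_000, 2_500))
```

Keeping a minimum above one server reflects the resilience point made elsewhere in the conversation, and the maximum acts as a cost guard if a spike is larger than anything that was load tested.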

Speaker 2:

Great. And so talking about AWS and, you know, those three letters have come up a few times during our conversation so far. Can you tell me a little more about how they enable, um, you guys to really increase that flexibility, that scalability and the reliability of your solutions?

Speaker 3:

Well, it's really critical, actually. I know many of our customers will have been involved, I'm sure, with cloud providers, whether that's Azure or Google or AWS or Rackspace. Uh, they all offer a very similar level of product capability themselves. But the reason that we use AWS is because of that reliability. They are investing heavily in their product capability. So they have a whole portfolio, basically, there of products, everything from storage through to, uh, security, through to on-demand, uh, EC2 capabilities, compute capability, to automation and data. So we have a bunch of AWS trained engineers in-house, and we'll look at the capability that AWS have, and we'll try to apply that capability wherever possible to quickly manage any particular situation that we may be suffering from. So for example, uh, it might be that we have a big customer that always has a very spiky sort of load profile. So we will design the system to deliver, you know, the majority of their service, but then we'll make sure that we can call on additional, uh, load capability to manage their spikes. Um, and that works really, really well for us once we've got it set up, once we've got it configured. I think without having AWS, we would be in a situation where we're having to throw hardware constantly at the problem. So we may have a hugely high spec, very expensive server, but 99% of the time we're not using any of that. It's just there to cater for that 1%. So yeah, there's a whole bunch of architecting that we would do here, and that's why Amazon Web Services has been a really important partner for us.

Speaker 2:

And I think you said you have some of their engineers, um, that work with you or for you at Titus?

Speaker 3:

AWS is a special skillset, and managing systems and operating systems and storage arrays and security is a specialist skillset. And what we've done is we've adapted. We have a team which we call a DevOps team, mm-hmm <affirmative>, and these guys basically are our engineering team. They look after all of our products that we build for our customers, but they also look after AWS, because AWS is a product set. That's how we treat it. We hire the best that we can, and then we train all of these guys in the AWS certification program, which I would really recommend looking at, if you just jump on AWS dot com and have a look through that. So all of our engineers are trained to what we call an AWS Cloud Practitioner, uh, level, and then they have an option to go on and skill themselves further as cloud architects or as cloud security specialists, depending on, you know, their desired route. But we engage with that training program. We've had a lot of success with it. And it ensures that all of our DevOps engineers have got the basic level of technical skills to support operating systems such as Ubuntu, for example, uh, web servers which run Nginx, and databases such as Postgres or MySQL. So we always have a very close alignment to the minimum level of technical skillset that we need, and that then complements the skillsets that our customers have, which are generally a Moodle skillset, generally at a Moodle administration sort of level. So they complement each other, and it works really well.

Speaker 2:

Nice. So there might be some listeners to the podcast who still have some doubts around the, the security of cloud-based solutions. And I'd love to hear from you a little bit, if you can talk to us about that correlation between cloud-based security and physical on premise type security and that those two models, and why is this such a hot topic anyway?

Speaker 3:

It's a really interesting one. We take security really seriously here. I mean, our customers are entrusting us with their data. That's what that means. In terms of physical and cloud security, if you think about it, it's the same thing. It's connected into a network. So whether that's hosted physically in a customer's data center or whether that's in the cloud, it's still connected to a network. So at any point that you've got, uh, a physical connection into a network, of course there is the ability for, uh, third parties to try and attack that or compromise that. And they're looking for ways in all the time. You know, we have security capability that's been established that we use to monitor, within AWS, all of our customer environments. So if there is any suspicious activity, we see that. It's not just about the technical ability to protect our data. We rely on best practice accreditation and certification standards to help us with that. So obviously we've got to factor in things like GDPR. We've got to factor in requirements based on different geographic locations globally and different government requirements for data. So we have an ISO accreditation, and this is really important for us. So we have ISO 27001, mm-hmm <affirmative>, which really ensures that we focus on data security. It's not just our tooling, but it's our processes, and it's our people, and it's how we connect into our environments, for example. So if I try and give you a working example, even here at Titus, even our AWS engineers who support our customer platforms, those guys have no direct access to any customer platform. Yeah. Which might sound a bit strange to your listeners, because we support those platforms, right?
What ISO 27001 helps us achieve is actually we can still connect into those platforms, but we authenticate against those platforms in a very secure way, and we track and log everything so that we know exactly who's connecting when, and have no direct access. And of course, if we have no direct access, anyone that's trying to compromise those systems has no direct access either. Yeah. So we've been very fortunate that we haven't experienced major security breaches or anything significant. But this is a complex subject, and it does require us to maintain all of our software and identify any vulnerabilities very, very quickly and remove them for our customers. But it's a never ending, ongoing project. Yes. Uh, it never stops.

Speaker 2:

Yeah. We do a bit of the same in designing the Moodle product itself, of course. Um, and so we understand the continuous evolution required and the continuous focus that we need to put onto it. So security's just one aspect of resilience. Uh, a truly resilient solution is a secure solution. But can you maybe share a little bit more about, um, the resilience of your designs for Moodle solutions?

Speaker 3:

Failure does happen. Um, we can experience failure, and there can be lots and lots of reasons for that, particularly where you've got cloud hosted environments where the network between the user and the actual platform, we don't control any of that. Right? So what we do is we use capability to hedge our bets. So we very rarely would put, uh, a tech stack in one location. What we would try to do is to spread that tech stack around. And there's capability, just to use an example, which we call, uh, availability zones. Mm-hmm <affirmative>. So we would have multiple instances of our customers' environments across a number of availability zones. So that in effect spreads any risk out. And it also ensures that we've got better performance, actually, because we can use those availability zones. Back in the old days, there used to be this concept of, uh, this thing called disaster recovery. And you used to have a data center with a whole bunch of production services in it, and you'd have another data center down the road, in a different part of the country, which would have the same infrastructure in it, but it would be switched off. It would be what would be called a cold standby. So if something happened, you'd send all your tech guys in, and you'd get them all to start up all this infrastructure in the new environment, there'd be an outage, and eventually you'd have all your services running. Fantastic. So what we're seeing in this model is that these availability zones are all operating online all the time, and traffic is being sent to all of them. They're all linked together. So if there's any problem in any part of the world or any part of the infrastructure, traffic would effectively recognize that, and it would just simply route to an environment that happens to be in a different zone.
So in fact, if anything happens to, uh, the network, or, uh, some sort of security or denial of service attack into one of these environments, or whatever it might be, the experience for the end user of that is effectively transparent. So they don't see it. Obviously our guys are running around and we're trying to fix stuff, but the service is maintained for our customer. So in effect, it's a seamless, uh, handover. So this is a really, really important feature for us for resilience. And it's that design, actually, that ensures that our Moodle platforms are, uh, available 99.99% of the time.
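The contrast with a cold standby can be illustrated with a toy router: every zone takes traffic while healthy, and a failed zone is simply skipped. This is a minimal Python sketch of the idea, not a description of AWS load balancing or of Titus's architecture. The class name, zone names and round-robin policy are assumptions made for illustration.

```python
class ZoneRouter:
    """Toy round-robin router across always-on availability zones."""

    def __init__(self, zones):
        self._zones = list(zones)
        if not self._zones:
            raise ValueError("at least one zone is required")
        self._healthy = {zone: True for zone in self._zones}
        self._counter = 0

    def mark(self, zone, healthy):
        """Record the result of a health check for one zone."""
        if zone not in self._healthy:
            raise KeyError(zone)
        self._healthy[zone] = healthy

    def healthy_zones(self):
        return [zone for zone in self._zones if self._healthy[zone]]

    def route(self):
        """Pick the next healthy zone for an incoming request."""
        candidates = self.healthy_zones()
        if not candidates:
            raise RuntimeError("no healthy zone available")
        zone = candidates[self._counter % len(candidates)]
        self._counter += 1
        return zone

    def share_per_zone(self):
        """Fraction of traffic each healthy zone carries under even spreading."""
        candidates = self.healthy_zones()
        return {zone: 1.0 / len(candidates) for zone in candidates}


if __name__ == "__main__":
    router = ZoneRouter(["zone-a", "zone-b", "zone-c"])
    print(router.share_per_zone())   # each zone carries a third
    router.mark("zone-b", False)     # simulated zone failure
    print([router.route() for _ in range(4)])
    print(router.share_per_zone())   # the two survivors carry half each
```

One design consequence worth noting: with three zones each carrying a third of the traffic, losing one zone pushes each survivor to a half, so each zone needs headroom beyond its normal share for the handover to stay seamless.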

Speaker 2:

Nice. I am ashamed to admit that I am old enough to remember those old, um, data centers and have spent a little bit of time in them too. One of the other really important aspects of Moodle as a platform is its propensity to be able to be integrated with other things. So I'd love to hear a little bit from you about integrations and relevant open API and Moodle standards that you work with and how you manage the concept of integrations in your solutions to your customers.

Speaker 3:

Sure. So we wanna make, um, it easy for customers to move data between systems. We believe that data entry should only be done once, so you shouldn't have to replicate data. And of course, we also need to identify where the single source of the truth is in our customers' environments. One of the things that we've done to try and tackle that is to develop a unified capability where our customers' systems can connect into their Moodle environments without any need to do development work. Mm-hmm <affirmative>. So previously, if we go back a few years, uh, we could integrate, but it would be very, very point to point integration, and it would require a whole bunch of developers coming along on both sides to do a bunch of development work. It's quite slow and painful, and we would get some sort of integration operational. What we've now done is we've said, all of the interactions that Moodle can support, we have made those effectively available via an API. And we've called this Titus Connect. Mm-hmm <affirmative>. So this means that any customer can come along and integrate their HR system or their Salesforce system, uh, or their Oracle platform or whatever they happen to have into Moodle without any need to do development work. So an example would be, let's say, uh, one of our customers has a new starter that joins their organization. They register that new starter on their system, their Oracle system, their HR system. And what that integration can do is, for example, that can then send a message to Moodle to say, there's a new starter here, or to enroll them in a whole bunch of courses because they need to do their minimum courses. So that's a really good example of how simple that can be. Hmm. So this is really, really important. No longer can a customer, uh, accept this principle of what I call swivel chairing.
So going to one system and getting some information and then spinning their chair around to another system and getting a bunch of information, it's all gotta be together. So what we've gotta do is simplify that process for, uh, for our customers, and that's what Titus Connect does for us. And that's why integration is so important, and it's important that we make it really, really simple and easy to do.
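The new starter example can be sketched against Moodle's standard web services, since the Titus Connect API itself is not described in this conversation. The Python sketch below only builds the request for the core `enrol_manual_enrol_users` function on Moodle's REST endpoint; it does not send anything. The site URL, token, user ID, course IDs and helper names are hypothetical, and role ID 5 is assumed to be the student role, which is the default on a standard install but can differ per site.

```python
def flatten_params(value, prefix=""):
    """Flatten nested lists and dicts into Moodle REST keys like a[0][b]."""
    flat = {}
    if isinstance(value, dict):
        for key, sub in value.items():
            name = f"{prefix}[{key}]" if prefix else str(key)
            flat.update(flatten_params(sub, name))
    elif isinstance(value, (list, tuple)):
        for index, sub in enumerate(value):
            flat.update(flatten_params(sub, f"{prefix}[{index}]"))
    else:
        flat[prefix] = str(value)
    return flat


def build_enrol_request(base_url, token, user_id, course_ids, role_id=5):
    """Build the URL and form fields to enrol one user in several courses.

    role_id=5 is assumed to be the student role (site-specific in practice).
    """
    url = base_url.rstrip("/") + "/webservice/rest/server.php"
    payload = {
        "wstoken": token,
        "wsfunction": "enrol_manual_enrol_users",
        "moodlewsrestformat": "json",
        "enrolments": [
            {"roleid": role_id, "userid": user_id, "courseid": course_id}
            for course_id in course_ids
        ],
    }
    return url, flatten_params(payload)


if __name__ == "__main__":
    # Hypothetical values: an HR system has just registered user 42.
    url, fields = build_enrol_request(
        "https://moodle.example.com", "REPLACE_WITH_TOKEN", 42, [101, 102]
    )
    print(url)
    for key, val in sorted(fields.items()):
        print(f"{key}={val}")
```

In a real integration the fields would be sent as a POST request by the HR system or a middleware layer, and the web service token would need the matching capability enabled on the Moodle site.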

Speaker 2:

Sounds like a bit of magic. How difficult is it? I'm, I've heard you speak, I want the resilience, I want the security, I want the availability, I want Titus Connect. How difficult is it to switch to a hosting based solution and, and does the fact that a lot of this is built on an open source construct make that a little bit easier to do?

Speaker 3:

We're big open source, uh, advocates here, and we always have been. Moodle is a hugely successful open source project, but there's lots of open source technologies that underpin what we see in cloud environments. A lot of our customers have now got infrastructure in the cloud. It's very rare that we find customers who are still using on-premise solutions. But either way, what we've had to do is to become specialists, I suppose, at moving data around. So we've become sort of experts at how do we do that in a secure way and how do we do that quickly? And if you think about it, the data itself is effectively what's held in a database. And many database technologies are very similar, whether that's Postgres or MySQL or anything else. So we've had to become experts at doing this. So whether that is Moodle to Moodle or whether that is, uh, non-Moodle to Moodle, moving that data across has now become very much part and parcel of any standard migration. So we use technology to help with that. We use Docker, for example, to build images. We use, mm-hmm <affirmative>, Kubernetes to help us with version control. But in effect, we are able to move data, content, user data from one place to another very, very quickly. And if we're very lucky, a customer happens to be in AWS, and that's even easier because we can do an AWS to AWS data transfer, which keeps things even more simple. So it's recognized as something we would do that's part and parcel of an implementation that we would undertake for our customers.

Speaker 2:

Yeah. And so we've done the migration, we're on the hosting platform. Uh, how do you support your customers once you're set up and ready to go?

Speaker 3:

Well, I'm pleased to say we have a really excellent team of service analysts, and we have a service desk operation, which is a single point of contact for our customers, and our customers get access to that. So any queries that they've got, whether that's a change they wanna make, or whether something's gone wrong, or they need some advice, or they want some training, or whatever it might be, everything comes through that model. We have portals that enable customers to engage. Uh, we have telephone services, uh, we have email capability, we have chat capability coming along, but we try to make sure that a customer has, uh, a really good engagement with us. Every customer will receive, potentially, an account manager, and also potentially even, uh, what we call a customer success manager, which we've developed. Um, so we can really try and understand their business models. We use a whole bunch of tooling to help us manage requests that come through to the team, and we wrap an SLA around that as well. So we recognize that the service that we provide is unique to us. You know, you could argue, well, Moodle's the same, right, no matter who provides it. Well, arguably true, but all these things we've talked about are the very specific differences that a Moodle partner can offer. But importantly, the service wrap is unique. Mm-hmm <affirmative>. Um, and of course we measure that service on behalf of our customers for the length of the, uh, contract.

Speaker 2:

We've gone through a lot of material. Is there anything else you think we really should hear about Titus or Moodle in a scalable construct that you think our listeners would enjoy and, and get to learn a little something through?

Speaker 3:

A lot of what we talked about today often gets forgotten about. You know, we tend to take for granted all the hosting, we tend to take for granted the security, the maintenance, you know, the migration of the data, the content production. We tend to refer to it as Moodle and how they learn. But all this, of course, is really, really key. And all this has to knit together as part of that solution. I also recognize as well that a lot of our customers have maybe an older infrastructure, and they may have a bunch of technical debt and stuff they may have done, and they think, actually, we can't really move, we're stuck with what we've got. I think I'd just like to say that I'd urge listeners to really think about reaching out to a Moodle partner. We have a fantastic Moodle partner network, whether that's here or whether that's around the world. The Moodle partners globally have answers to these problems. They have skillsets in-house, they have expertise that can really help customers to make the journey over to Moodle and make that journey a success.

Speaker 2:

Jon, thank you so much. You shared so many of your insights. Um, I know I learned a lot about Titus, in particular the services that you offer, and I'm sure that our listeners will have learned a lot about Moodle scalability and Titus as well. So we really appreciate it. To all our listeners, if you want to find out more about Titus, you can find them at moodle.com/partners. I've been your host, Marie Achour, the Global Head of Product at Moodle HQ. Thanks for listening to the Moodle Podcast. If you enjoyed this podcast, please subscribe and leave us a review, and you can follow us on any podcast streaming service. We'd also love to continue this conversation on our social media channels, which you can find on moodle.com or in the show notes. So Jon, thank you again, and I wish you a great day.

Speaker 3:

Thanks, everyone.