[MUSIC] [MUSIC] Good morning to the Mini Bear stage, to the opening session on day four. It's nice of you to be so early and attend this session. We will look into the past for present dangers. Welcome Richard for peeking over the tape mode. Wonderful, thank you. And thank you all for being here indeed this bright and early. Enjoy the shade while it lasts. So yes, for the next 20 or so minutes I will be speaking about magnetic tape storage. The title of this talk is Peeking Over the Tape Mode. So the idea is we'll be spending half the time looking at how modern large-scale magnetic tape storage systems work. And then the second half we'll be looking at what this implies for the security. The motivation behind this talk is that we're storing more data now than ever before. And still in this sort of ecosystem of various storage media, tape is still a very central pillar. At the same time, tape is taking on a dual purpose, besides being a cheap and efficient way of storage. It's also becoming a sort of safeguard against many cyber attacks that are happening against organizations. For instance, ransomware, where tape is an important protector nowadays. So before we proceed into the details, why should you listen to me? I don't have this, you know, the long gray beard usually associated with your local tape wizard. Instead I've worked with magnetic tape for only two years, but I've had the pleasure of doing so at CERN, the European Center for Nuclear Research, where I've been, well, in the IT group, specializing in it. This, of course, warrants a small disclaimer. It's not a way, shape, or form organized by, approved by, or associated with the CERN, the organization. This talk does not represent CERN's moral values. This talk is my own. So, no. Yes. Why are we speaking about magnetic tape? Magnetic tape is still very important for data storage. And the data on magnetic tape is oftentimes very interesting. There's a bit more of an overhead usually to using these tapes compared to your disk, which you just plug into your PC. And so people tend to be, you know, curating the data that is on the tape. Tape storages may include, you know, important data that you want to keep for future generations, you want to keep it for the next decade or so. It may also include personal information, you know. Odds are your government is using tape archives somewhere in their storage ecosystem. As do, you know, your friendly hyperscaler cloud providers, either through services which are directly exposed to the customer or in the back, where they keep their, you know, the personal data they've accumulated over the decades. There's also backup data, you know. If you want to do a backup, we're better to put it than magnetic tape storage, which lasts for longer than the life expectancy of a hard drive, and where it is less accessible to mean hackers. Tape is also becoming popular again because it is less bad for the environment. It has a lower impact both energy-wise and for the whole life cycle of the tape and the related infrastructure than spinning disks or other tape storage systems, or other storage systems. So there is a sort of push to use more tape in various institutions because of its environmental friendliness. And then, as I mentioned, tape is becoming this sort of last line of defense against ransomware attacks. Data written onto a magnetic tape cartridge is, in this state right now, inaccessible to an attacker. So if your ransomware attack is going on and you make this thing hard to get to, then you have data that is safe from the whole thing. And, well, it's not just that this is a nice thing for organizations to have as a protection. It's also becoming mandated. So here, for instance, for the European Digital Operation Resilience Act, the DOOR Act, it makes it a requirement for financial institutions to have the appropriate safeguards in place in their system. And then magnetic tape will be one of these. Finally, magnetic tape is still storing a lot of data at the end of the day. Just to hammer this home, here I have some... there we are. Just an estimate from back home at CERN how much data will be needed. So here we see, for the Atlas experiment alone, you just need to look at the y-axis. You see that we are looking at the exabytes, units, for the coming decade. Times this by more experiments, by more institutions which will have even higher requirements, we'll be in set-abytes of data storage. And so we need to be efficient and scrape all the bits or put all the bits on media where we can. So what does a magnetic tape system look like? At scale or at medium-ish scale, it looks something like this. There is data input on the left side, where the data first hits what we call a disk buffer. This is because tape can be fiddly to interact with and most external systems know how to deal with disks instead of tape. So a disk buffer is a nice interface for those. Then the disk buffer servers will be speaking the protocol, the transfer protocol that you want to use. They'll be an intermediate buffer for both data going in and out. And then that makes the whole operation smoother. It's generally a bad idea to write to tape systems directly through the Internet. So this is a nice buffer that smooths things out. However, this is a filthy disk system, so we won't be speaking about it anymore. So for the rest of this talk, we won't dedicate more time to it. After the disk buffer, the data goes to the tape server and the tape drive. The tape server is the intermediate for connecting a drive to your local IT system. So it'll be connected to the Internet or the local network at least. It will have the appropriate drivers. It will have some disk storage as well to also buffer the files going in and out locally. And then it will have the tape drive connected somehow, usually as a SCSI device and by way of fiber channel for the data transfer. The tape drive is responsible for writing to these things. So as you may note, in a normal hard drive you have both your storage media and the reading infrastructure in one unit. For tape this is not the case. So the drive is a separate I/O device. A tape drive is housed within the tape library and as are the cartridges. The tape library is a container. It can have many shapes. It can be round. It can be container shaped. And inside of there you have shelves of these magnetic tape cartridges and the drives for reading. In addition to that you have some robotics for moving the cartridge to the drive and getting them into contact. And yes, so now we can look at these components a bit more in detail individually. The media. The media or tape cartridge is what in the end stores your data. It is, if you open one up, well it's a plastic container and inside you have this wonderful tape all coiled up, which is composed of magnetic media and some polymers to keep it all together. The media comes in two flavors, Enterprise and LTO. Enterprise is the more expensive option, usually a bit more feature rich and more capacity rich. And then LTO or Linear Tape Open, which is a more open standard by the LTO consortium. It's usually more focused on cost effectiveness, so a bit cheaper, but you get more bang for your buck. Around this time, the time of writing or speaking, a tape cartridge of the present generation holds about 18 to 20 terabytes of data each. And it has this nifty property that this doubles every couple of years. So whereas your disk storage at this time is sort of struggling to scale because it has reached its paramagnetic limit, where the bits are becoming so small that it's really hard to keep them magnetized and to have the technology around it to actually magnetize more stable media, Tape Media still has lots of easy victories ahead. The bits are relatively big in contrast to disk or optical storage, but it makes up for that by virtue of having a big surface area. So the band inside the tape cartridge is one centimeter wide, but one kilometer long, which gives it a lot of surface area to work with. This however introduces a cost in that it may take time to seek to your files. So we'll get back to that. So once you actually have the drive mounting the tape and are reading or writing it, it takes some time to get to where you want to go. The media may be formatted, similar to how you would format a disk file system. The baseline sort of is the LTFS file system where your tape is sort of like a disk file system with all the metadata attached. But at scale you will find lots of systems which don't use this. This means that the metadata such as the file name will not be on the tape itself, which also has implications. Moving on to the tape drive. As mentioned, it's the input/output device used for reading and writing tapes. These also come in the flavors of Enterprise and LTO, Enterprise being more feature rich and LTO being more bang for your buck. And also they come into shapes. They come in full height and half height. Full height is the sort of standard size that you would put into your tape library. And half height is the sort of smaller desktop or desk sized edition. It can go up to 400 megabytes per second, about there about at present. But this is under ideal conditions. If you write your data stupidly and you need to seek back and forth, you will of course incur a great cost to your read times. The drive is connected to the tape server as mentioned. This is also known as the data mover in the lingo. And yes, connected by SCSI. The drive is capable of performing encryption and compression. Which makes it nice so then you don't need all that hardware in the tape server. You can offload that to the drive. What does writing to tape look like? It's something like this. This won't be on the test, don't worry. You mount a tape drive onto your tape server. You get these ST0, ST1, NST0, NST1 and so forth devices showing up. These are your destinations or your ways to interact with the drive. For bigger systems you would then have some sort of command to get the correct cartridge from your library to your drive. And then you can operate it on it using actual tools or Linux tools which may or may not already be present on your machine. Fun fact, TAR was used to interact with tape storage. The MT command is maybe not installed on your laptop but it's easy to install on most distros. This allows you to interact with the drive, to read its status, to load tapes into an active state and so forth. Yes. Do note for the two devices, one of them rewinds, one of them doesn't. So once you have written a file to your tape, STX will go back to the start of the tape. So if you go, I'll write my file number one, thank you. And then later on go, I'll write file number two, it will overwrite the first one. So beware. And then there are some more examples on Red Hat site and on IBM's pages for more details. The library, very quickly done. It's the physical container. The library maintains the air gap between the reading and writing device. This is what gives you this nice property of defense against, for example, ransomware attacks. The attacker has more troublesome time getting to your storage than the disks which have to be connected all the time. There may be robots inside, usually one. There may be two for redundancy. And these, together with the rest of the internals, all communicate by way of CAN bus or ethernet, depending on the model of library. Very important, a library may be partitioned into logical libraries, similar to how you would partition your disk. So a logical library would show up to your software as a separate library to interact with. So then you can split your physical one into multiple logical ones which is useful, as we will see. Alright, finally, security, the part you've been waiting for. What does this all mean for the security of the tape system? Overwriting takes time. As we've mentioned, seeking takes time, getting to the right place, also just getting the tape from the library to the drive takes time. There's usually a queuing system because you have not so many drives in contrast to how many tapes you have. So it's a system of patience. You may have to wait a day to get to your data. Conversely, the deletion process is not as you would expect it, perhaps. You may know that on a disk you may delete a file and then it's still like someone scraping the data from the disk and after the fact may recover it, depending on how you've actively been writing. It's similar on tape. Usually the deletion process is just a metadata operation. So wherever you keep your metadata is where you just do the deletion and you don't actually touch the tape. That means the data is still on the tape. So if someone steals the tape after the fact, after you thought you deleted everything, data might still be fine. To remedy that, it's important, use encryption by default on your tapes so that if someone runs off with them, they still can't read them. Also use this key management system called Leather. I don't have the time to go into the details, but it's basically a neat way to organize encryption keys so that you can on a more fine-grained level delete files by just deleting encryption keys, which is then compliant with, for example, GDPR and other data management systems. Also be sure to wipe and destroy used tapes. Don't just throw them in the trash. Someone might be able to recover them. In addition, if you overwrite the start of the tape, your data may be completely gone. I mentioned this NST0 and NST, these devices which may rewind and not rewind. If you by accident go, "Oh, I'll write my 18 terabytes of data and now I'll just add another file and whoops, I wrote them to the wrong device. It's now at the beginning." You've overwritten the tape. The data is no longer accessible to you because these tracks on the tape media don't line up. They will no longer line up and so without special equipment you will not be able to read any successive data on the tape without special equipment, which you can get to you by your local three-letter agency or which you will have to ask your tape manufacturer or distributor for, which is usually a costly operation. An attacker may exploit this by very quickly just overwriting as many tapes as they can, so be aware of that. It just takes the time to queue in the system, the time to mount the tape into the drive and then just to write a few bits. Remember to protect yourself against that by restricting access, of course, as much as possible. Monitor access patterns, so log and monitor the log files for suspicious activity. Is someone mounting 2,000 tapes in my library and just writing the first bits? That's odd. Beware of that. Also, put in logical libraries with read-only access. I mentioned these partitionings, these logical libraries. You can put, for instance, replicas of your tapes. If you have data on one tape and you want to be really sure that it's safe, you create a double copy. Put the replica into one of these logical libraries, which in the tape library does not have a drive associated with it. Then you suddenly have a separate partition where the data cannot be easily read from or overwritten. Tapes can be moved in between usually by way of special administrator commands or command line tools. I believe REST APIs are on their way as well, so this will be the way to protect against sudden overwrites. The patient attackers encryption key switch attack. Tape is often used for backups where you will have to encrypt for your users. A patient attacker can compromise a host and then just wait, observe. Then they may switch the encryption keys if they find them. Maybe you have put the encryption keys onto a file on disk or you have put the encryption key into a database. The attacker can switch it. Just, you know, they look at the cron job, they see the backup starts at 1.45 each day. We switch the key just before and just after so that the backup takes place with the wrong encryption key. Do that for the duration of the data retention policy and then suddenly if an incident happens, if an attacker pulls the switch on the ransomware attack, then the backups are inaccessible to the horror of the system administrator. So to remedy that, try to outsource key management to your users. It's nice to not centralize everything in your tape service to the best of your abilities. If the software you use for the backups allows the user to manage the keys, that's great. Also monitor, of course, make your users take samples of their backups, read them back, and do the same yourself for the case where you manage the keys. And then finally distribute your architecture. If you don't make everything central, then it is certainly easier to avoid attacks on these files. Finally, one more. As we mentioned, you may have tape which is not self-describing, so your metadata may not be on the tape. What does this mean? It may mean that the metadata is on a central database somewhere. The central database is a very juicy target for an attacker. So remember to back it up and make sure you are not keeping the backup of the database. If you are speaking to your local database guy and ask him what's your database policy, don't worry about it, we put it on tape, they will say. And if your tape is then not self-describing, you will not be able to recover in the time of an incident. So that puts me at the end. To conclude, it's good to understand tape systems because they are still part of our mix of data storage media. Attackers may understand these by just observing, so you should understand them too. And then I move on to questions. You can ask me questions here, or later on I will be at the Cold North area most likely, and I have about ten of these souvenir CERN LHC tapes for the interested people to claim. Thank you for listening. Thank you, Richard. Okay, there's a microphone. One or two questions. Thank you for your tape and for a blast from the past, which I understand is also the future. You talked about an act, a European act, that requires using tapes, which is a nice thing because tape is a great media that has been forgotten. However, do you know if large companies have actually started implementing it and going back to tape? I think they never left tape actually. Sorry to hear that. You should tell your management about the security benefits. But tape is for sure used still at scale. I know that CERN used to be big players in the market back in the day. Now we have been sort of eclipsed by the hyperscalers. Before we were the big buyers for the vendors of tape, IBM, the producers of our Sony and Fujifilm nowadays. But the use of tape has probably never been greater. It's just that as more people move to the cloud, I suppose they implicitly use it through the cloud services more and more. Of course, maintaining a tape system by yourself is hard. There's lots of proprietary software around. And robots keep breaking. Exactly. And you have to take responsibility for the breakage. We're trying to reduce the cost somewhat by, well, we develop our own open source software for the data management on tape. So this reduces the cost of running it yourself. But yeah, I see for medium sized companies it can be tough. Thank you Richard for your view on our tape storage solutions. One more time? One more? No. Alright. Speak to me after. Thank you all. [Applause] [Music]