
1: Jon May: Artificial Intelligence for HA/DR Operations - LORELEI

41 min • 8 April 2018

Please welcome Jon May, Research Assistant Professor of Computer Science at the University of Southern California.

Dr. May describes his work on a DARPA-funded artificial intelligence project called Low Resource Languages for Emergent Incidents (LORELEI) and its connections with HA/DR operations for Civil Affairs.

One CA is sponsored by the Civil Affairs Association.

Hosted and edited by John McElligott.

---

Transcript

00:01:00    Introduction
John McElligott: ...and welcome to the One CA podcast. My name is John McElligott. We're joined today by Jonathan May. He received his PhD in computer science from USC in 2010. Prior to rejoining USC and the Information Sciences Institute in 2014, he was a research scientist at SDL Language Weaver. Jon's research areas include natural language processing, specifically machine translation and semantic parsing, and formal language theory. Dr. May, thank you very much for your time.

Jon May: Thanks very much for having me. It's great to be here.

John McElligott: Sir, before we dive into the program that you're working on and how it relates to humanitarian assistance and disaster response and the civil affairs branch of the military, we want to go through some of the basics of what your field entails. Could you go into more detail about your background and the natural language processing field?

Jon May: Sure, great. I was a computer science major in college, and I started to become very interested in artificial intelligence.

00:02:09    SPEAKER_04
I thought it was really cool that we could build systems that could, you know, sort of mimic the brain, or play games against humans. And in particular,

00:02:23    SPEAKER_04
I discovered this field called natural language processing, which is really about how humans and computers can talk to each other: how computers can understand human language, produce human language, and everything that that entails.

00:02:44    SPEAKER_04
And today you see a lot of natural language processing, which is also sometimes known as computational linguistics, in your day-to-day life. If you're just using, say, Google and typing a search query there, you're using your own words to try to figure out what you want,

00:03:00    SPEAKER_04
and then a computer algorithm somewhere is trying to find a web page that's responsive to you. So that's natural language processing right there. Other areas include determining when you've misspelled a word. A classic example is Siri, which is listening to you speaking,

00:03:19    SPEAKER_04
understanding the speech patterns and turning those into words, understanding what those words are supposed to mean, and then trying to give you an answer. Then there's automatic translation, where you've got some Chinese web page and you want to figure out what it means.

00:03:36    SPEAKER_04
Maybe it's a train ticket booking page. You need to figure out how to buy your tickets, and they don't have the data: somebody didn't write a translation, so you have to automatically translate these words. Then you can actually engage in commerce there, even though they don't speak your language and you don't speak theirs. So I love all that stuff. It really is... It seems to me like a great way, particularly translation, to unify the world, so we're all kind of speaking one language together. There have been lots of great accomplishments over the past 20 years or so, and I think there's a lot more still to be done.

John McElligott: It seems to be a field that's advancing at a rapid pace right now.

Jon May: Yes, yes. The field has really been around about as long as computers have. The early computers developed at the end of the Second World War were first used for calculating missile trajectories,

00:04:31    SPEAKER_04
but then the second use was trying to do automatic translation. In the early '50s in particular, the U.S. was, of course, keen on translating Russian. That was way back when, and it wasn't very good for a very long time. But in the modern era, we have volumes of data available to us and really sophisticated, fast hardware that's able to process this data. So we're able to take advantage of all this data and learn statistics about it, and that has led to lots of gains, really practical gains, in the past five to seven years in particular.

00:05:21    SPEAKER_04
You've probably heard about the advent of deep learning, which is the use of a particular kind of technology called neural networks. They have led to some really stunning developments. Now, sometimes it can be hard to tell whether you're talking to a computer or a human.

John McElligott: Wow. It's fascinating. I wanted to ask you about a question that was included in a brief you recently provided to some civil affairs troops. The question was: can we leverage artificial intelligence, or AI, to respond to disasters around the world? What inspired you to ask that question?

Jon May: I want to give credit to DARPA for really asking that question before I did. But I saw,

00:06:07    SPEAKER_04
well, I think they saw, and we all saw it together. I was working for this machine translation company after graduating in 2010.

00:06:16    SPEAKER_04
This was a company providing translation in many different languages to companies, for some government projects, and also to help human translators do their jobs better. And I remember there was the earthquake, I believe, in Haiti. It was a big humanitarian crisis. Most of the people in Haiti, of course, speak Haitian Creole, which isn't a language that we've historically spent much effort building automatic translation systems for. There's not a lot of data. There aren't too many people who actually speak Haitian Creole, since the population of Haiti,

00:06:56    SPEAKER_04
is relatively small. But I asked my boss at the time, "Is there anything that we could do? I feel like maybe we could be of some service." And he said, "Well, I don't think there's much we can do. These people are in a crisis situation right now. It takes us quite a bit of time to gather enough data to build a system, and even building the systems takes some time. By the time we're ready to deploy a translation system to, say, connect USAID providers with the people on the ground who are texting out their requests, it's going to be too late." So we didn't do anything, but there were people who did. There was a program where they went down,

00:07:37    SPEAKER_04
and there was a team of people who did what I do. But they also brought in native Haitians and expats, and they were trying their best to use what technology they could and also just scrambling to translate these things as fast as possible. But it would have been better if they had prepared this sort of thing ahead of time.

00:08:00    SPEAKER_04
Well, prior to that, I worked on a team back in, I want to say, 2003. We were looking into what happens if we need to develop a system in a new language for translation. Sometimes translation is fine, but you typically get lots and lots of data thrown at you all at once. I think analysts can receive tens of thousands of documents a day that they have to sift through, and just translating them all isn't necessarily going to be that helpful. There are other techniques in natural language processing: understanding the most important parts of a document, trying to provide a summary, or just identifying the names of the people, the places, and maybe the events that are happening, giving a big picture that allows some triage to happen. So we wanted to know, could we build those systems? If we had just learned about a language, and somebody said, "Okay, go, build a system," what could we do in 30 days? Back in 2003, we tried

00:08:58    SPEAKER_04
doing this. And I was really kind of taken by how surprisingly well we were able to do with the language at the time, Cebuano,

00:09:06    SPEAKER_04
which is... Where is that spoken? I think it's in the Pacific, in the Pacific Islands region,

00:09:15    SPEAKER_04
and I should look that up. Give me a second,

00:09:19    SPEAKER_04
if that's all right. Maybe Papua New Guinea or someplace like that? I'm sorry, it's the Philippines. Yes, it's an Austronesian language native to the Philippines. It's the second most spoken language in the Philippines after Tagalog.

00:09:40    SPEAKER_04
That should have been fresh in my mind. But anyway, yes, it's spoken in the Philippines. I hadn't studied it before, and most of our team hadn't, but we did a pretty good job. It was kind of surprising how well we were able to do without much Cebuano-specific data, and we didn't talk to any Cebuano experts. So I think this idea was sort of stirring around, and then after 2010, DARPA came out with this program, which was about,

00:10:12    SPEAKER_04
well, the program was called LORELEI, and it was about trying to be responsive to humanitarian aid and disaster relief needs when you don't have a lot of resources available, in terms of data and in terms of time. So given very limited data in the language that you need to build a system for, and given a very limited amount of time,

00:10:34    SPEAKER_04
ideally 24 hours is what they're aiming for, what kind of systems can we build? What kind of technology can we build? That's been a major focus for me, and for a number of researchers around the world, over the past few years. And it's been great because we get to work with people who speak the language but aren't experts in linguistics or computer science,

00:10:57    SPEAKER_04
and they teach us about their language in this really limited time frame. And we're able to build surprisingly sophisticated systems. That surprised me at first, actually. If you have a little more time, you do a little better, but even when you don't have a lot of time, you can still do pretty well. There's also been some nice interest in deployment in various agencies. So it's been a pretty nice

00:11:28    SPEAKER_04
story.

John McElligott: Right. I think 24 hours is very fast for anyone, but especially for civil affairs and the military. Unless we happen to be on the ground or in country already, if there was a natural disaster, an outbreak, or some kind of man-made event, it would take a little longer for most teams to respond. But if USAID or other assets were already on their way, as a DART team, for example, then we would be coordinating with them, and having a system like this in place would be very helpful.

Jon May: Well, it's really great to hear that 24 hours is a little too fast, because, to be honest, if you wait a week, it's a lot better. We can do some early triage, but the more we see how we're doing at the beginning, the better our systems can get. In our early days, we gave ourselves up to a month, and by the time you're done with a month of training, you've actually got a fairly usable system. It's still not at the same level as, say, a French-English translation system, where we've got 100 billion words of French and English and we've been studying the problem for years and years.

00:12:47    SPEAKER_04
We do pretty well, and we learn more about the language over time, too. In our first year, we were working with Uyghur, which I'm actually kind of pronouncing wrong. I think it's more like Uyghur. This is a language spoken in China, in the Xinjiang region in the northwest, by an ethnic minority. It's actually a Turkic language; it has no relationship to Mandarin. So we were working with Uyghur, and we realized after a few days,

00:13:22    SPEAKER_04
maybe a week, of working with it that, hey, this language is actually quite similar to a language that we've already got data for. We had a lot of Uzbek data. So we were able to develop techniques for pretending that the Uzbek was Uyghur, actually transforming the Uzbek into Uyghur, and that increases the amount of data that you've got available. This is a major part of this program: even though you don't have a lot of resources in the language that you care about, if there are a lot of resources in other related languages, and you can figure out what those related languages are, can you leverage those? Right. And furthermore, to some degree, all languages have things in common, right? So even though...

00:14:09    SPEAKER_04
Chinese and English might seem very, very far apart from each other, and in many ways they are, there are still common understandings that underlie all languages, and you can take advantage of these things too. So there are kind of language-universal ideas.

00:14:25    SPEAKER_04
So say you have a bunch of news data in some language that you don't know at all. Maybe you're not even told what the language is. You can still assume that people are probably going to be talking about dates at some point, right? Days of the week, or months, or years. Right. And we do tend to segment our...

00:14:51    SPEAKER_04
calendar into roughly four-week chunks, so there are between 28 and 31 days in every month. And so you can kind of pick up on these common regularities...
