Spectrum: Cathryn Carson & Fernando Perez, Part 2 of 2

Update: 2014-04-18

Description

Cathryn Carson is an Associate Professor of History and the Operational Lead of the Social Sciences D-Lab at UC Berkeley. Fernando Perez is a research scientist at the Henry H. Wheeler Jr. Brain Imaging Center at UC Berkeley. Both are involved with the Berkeley Institute for Data Science.


Transcript


Speaker 1:        Spectrum's next. 


Speaker 2:        Mm MM. 


Speaker 3:        Uh Huh [inaudible]. 


Speaker 4:        [00:00:30 ] Welcome to Spectrum, the science and technology show on KALX Berkeley, a biweekly 30-minute program bringing you interviews featuring Bay Area scientists and technologists as well as a calendar of local events.


Speaker 3:        [inaudible].


Speaker 1:        Hello and good afternoon. My name is Renee Rao and I'll be hosting today's show. This week [00:01:00 ] on Spectrum, we present part two of our two-part series on big data at Cal. The Berkeley Institute for Data Science, BIDS, is only four months old. Two people involved with shaping the institute are Cathryn Carson and Fernando Perez. They are today's guests. Cathryn Carson is an associate professor of history, associate dean of social sciences, and the operational lead of the Social Sciences Data Lab at UC Berkeley. Fernando Perez is a research scientist at the Henry H. Wheeler [00:01:30 ] Jr. Brain Imaging Center at UC Berkeley. He created the IPython project while he was a graduate student in 2001 and continues to lead the project today. In part two they talk about teaching data science. Brad Swift conducts the interview.


Speaker 7:        On the teaching side of things, does data science just fold into the domains and the fields, where some faculty embrace it and others don't? How does the teaching of data science move [00:02:00 ] forward at an undergraduate level?


Speaker 5:        Yeah, there've been some really interesting institutional experiments in the last year or two here at Berkeley. Think about last semester, fall of 2013: Stat 157, which was Reproducible and Collaborative Data Science, pitched at statistics majors simply because you have to start with a size that can fit in a classroom, [00:02:30 ] and training students in the practices of scientific collaboration around open source production of software tools. Or look at Josh Bloom's course, Astro 450. It's listed as Special Topics in Astrophysics just because Josh happens to be a professor in the astronomy department, and so you have to list it somewhere. The course is actually called Python for Science,


Speaker 6:        [00:03:00 ] and it's a course that Josh has run for the last, I think this was its fourth iteration. That course is completely interdisciplinary and open to students in any field. The examples and the homework sets really do not privilege astronomy in any way. I lecture a fair bit in that course as a guest lecturer, and we see students from all departments participating. This last semester it was packed to the gills. We actually had problems because we couldn't find a room large enough to accommodate everyone. So word of mouth is working in terms of students finding these [00:03:30 ] courses.


Speaker 5:        It's happening. I wouldn't say it's working, in part because it's very difficult to get visibility across this campus landscape. I am sure there are innovations going on that even the PIs in BIDS aren't aware of, and one of the things we want to do is stimulate more innovation in places like the professional schools, which will be training students who need to be able to use these tools as well. What do they have in mind? Or there [00:04:00 ] are other formats of instruction beyond traditional semester courses. What would intensive training stretched out over a much shorter time look like? What gaps are there in the undergraduate or graduate curriculum that can effectively be filled in that way? The Python bootcamp is another example of this that's been going on for


Speaker 6:        for about four years. Josh and I teach a bootcamp, also on Python for data science, that is immediately before the beginning of the fall semester, literally the weekend before. [00:04:30 ] It's kind of a prerequisite for the semester-long course, but it's three days of intensive, hands-on scientific Python: basically programming, data analysis, and computing for three days. We typically try to get a large auditorium and we get 150 to 200 people, a combination of undergrads, grad students, postdocs, folks from LBL, campus faculty, and also a few folks from industry. We always leave a few slots available for people from outside the university to come. That one, (a), has been very popular, [00:05:00 ] it tends to have very good attendance, and (b), it serves as an on-ramp for the course, because we advertise the semester course during the bootcamp. That one has been fairly successful so far and I think it has worked well.


Speaker 6:        We see issues with it too that we would like to address. Three days is probably not enough. Um, because it's a single environment, we have to have examples that can accommodate everyone, but that means they're not particularly interesting for any one group. I think it would be great to have [00:05:30 ] things of this nature that might be a little bit better focused on the life sciences, the social sciences, and the physical sciences, so that the examples are more relevant for a given community, and that may be better targeted at the undergraduate and the graduate level, so that you can kind of select and tune the requirements or the methodological base a little bit better to the audience. But so far we've had to kind of bootstrap with what we have.


Speaker 6:        There's another interesting course on campus offered by the I School, taught by Raymond Yee, a lecturer at the I School, called Working with Open Data. [00:06:00 ] It is very much aimed at folks who are the constituency of the I School, who combine a technical background with the broader interdisciplinary skills that are the hallmark of the I School. They work with openly available data sets that exist on the Internet to create interesting analysis projects out of them, and that's a course that I've seen come up with some very, very successful and compelling projects at the end of the semester.


Speaker 7:        About the teaching and preparation in universities: in [00:06:30 ] the course of doing interviews on Spectrum, a number of people have said that really the only way to tackle the big issues of science is with an interdisciplinary approach, but that that's not being taught in universities as the way to do science. Is there a way to break that down using data science as a vehicle?


Speaker 5:        I can speak about that as a science and technology studies scholar. The practice of interdisciplinarity, what makes it actually work, is one of the [00:07:00 ] most challenging social questions that can be asked of contemporary science. Add to that the fact that scientists get trained inside this existing institution that we've inherited from, let's roughly say, the Middle Ages, with a set of disciplines that have been in their current form since roughly the late 19th century. That is the interface where I expect, in the next, oh, two to five decades, major transformations in research universities. [00:07:30 ] We don't yet know what a research institution will look like that does not take disciplines as its sort of zeroth-order, ground-level approximation to the way to encapsulate truth. But we do see, and I think BIDS, like data science in general, is an example of this, continual pressure to open up the existing disciplines and figure out how to make connections across them. It's [00:08:00 ] not been particularly easy for Berkeley to do that, in part because of the structure of academic planning at our institution and in part because we have such disciplinary strengths here. But that word keeps coming back, invitation. The invitation for the future for us is to understand what we mean by practicing interdisciplinarity and then figure out how to hack the institution so that it learns how to do it better. [inaudible]


Speaker 8:        [inaudible] [00:08:30 ] You're listening to Spectrum on KALX Berkeley. Cathryn Carson and Fernando Perez are our guests. They're part of the Berkeley Institute for Data Science, or BIDS. [inaudible]


Speaker 7:        It seems that data science has an almost unlimited [00:09:00 ] application. Are you feeling limits?


Speaker 6:        I don't know about limits specifically, because I think in principle almost any discipline can have some of its information, and whatever the concepts and constructs of that discipline are, represented in a way that is amenable to quantitative analysis of some sort. In that regard, probably almost any discipline can have a data science aspect to it. I think it's important not to sort of [00:09:30 ] over-fetishize it, so that we don't lose sight of the fact that there are other aspects of intellectual work in all disciplines that are still important: that theory still has a role, that model building still has a role, that knowing what questions to ask is still important, that hypotheses still matter. I'm not so sure that it's so much an issue of dr
