Bringing Genomic Medicine into Focus by Eric Green

>> Welcome, everyone, to the 2013 Penn State
lectures on the frontiers of science. These lectures are a free mini course for
the general public that we were pleased to be able to provide to you through the financial
support of the Penn State Eberly College of Science and through
the generosity of our speakers, who volunteer their time to give these lectures to you. The theme of our lecture series this year
is “Your Genes: How They Contribute to Who You Are.” Today’s event is the third of the six lectures
in this year’s series. Our speaker today is Dr. Eric Green. He is the Director of the National Human Genome
Research Institute at the National Institutes of Health. An award-winning scientist and a highly accomplished
research director, Dr. Green was at the forefront of the game-changing Human Genome Project from start
to finish, work that has since grown into a field that encompasses the structure, function, and
evolution of genomes. He is also an influential scientific leader:
as a series editor of a widely used laboratory manual of genome analysis, as the coeditor
of the Annual Review of Genomics and Human Genetics, as a founding editor of the journal
Genome Research, and as the holder of director-level positions at the National
Institutes of Health since 1996. The title of his lecture today is “Bringing
Genomic Medicine into Focus.” Please give a warm welcome to Dr. Eric
Green. [ Applause ] >> Dr. Eric Green: Well, thank you very much
for that introduction. It’s a pleasure to be here, and I’m delighted
to see so many of you come out on a Saturday morning, especially a brisk, cold Saturday
morning, to learn about genomics and to give me a chance to tell you a little bit about
these exciting developments that are taking place. Now, as a starting point for thinking
about genomics and its relevance, I would tell you that for a long time the
field of genomics, which I’m going to describe to you, was very much the purview of
biomedical researchers. This was a research endeavor, a research exploration. I think in recent years that has
expanded, and we are seeing genomics, as I’ll describe to you, becoming relevant to healthcare
professionals; actually, to almost all healthcare professionals. But the reason I’m so delighted to see all
of you in the audience is that genomics
is increasingly becoming relevant to patients and to the friends and relatives of patients, which
means it’s becoming relevant to all of us. So what I want to do during the course of
this lecture is cover three major things that describe this landscape
of human genomics, which is both exciting and challenging at the same time. In doing so, I want to start by describing
a little bit of the past; I think some historical perspective will be very important. I want to frame the major part of my talk
around the present, using as a framework a recent strategic plan for genomics that we published
a couple of years ago, which I think very much organizes our thinking about where we are in genomics.
A copy was provided to you; if you didn’t get one when you came in, you can get one on your way out.
And then at the end of the talk what I want
to do is to take you into the future and to share with you some of my thoughts on where
I think all of this is leading. Well, now, recognizing that this is a very heterogeneous
audience (I’ve been told that people come to these gatherings
with different types of backgrounds), I realize not all of you are trained in genetics
and genomics, and so we have to start somewhere. I want to provide a foundation for everyone. Let’s just think about a few basics. The human body, like all living systems, is made
up of cells. The human body has about ten trillion cells. Of course, what orchestrates those cells is
a blueprint, if you will; it’s the human genetic blueprint that is carried in the nucleus of
all ten trillion or so of those cells. Interestingly, each of those different cell
types uses that blueprint in different ways. That’s why we have brain cells and we have
blood cells, yet they all have in their nucleus the same exact genetic blueprint, called
the “genome.” In the nucleus of those cells are these structures
called “chromosomes.” And it is of course these chromosomes that
are basically the suitcases that carry our genetic material from one cell to the next,
from one generation to the next. And the information molecule that is responsible
for carrying that biological information — that blueprint if you will — of course is this
incredibly interesting molecule called “DNA.” Now, if I had to think about the history,
I would say that the most important historical moment that really set up what
I’m going to describe to you in genomics would have to be in 1953, when
Jim Watson and Francis Crick discovered the structure of DNA. That double helical structure,
which has become the icon for DNA, was really pivotal in gaining an understanding
of how DNA was the information molecule, the blueprint molecule if you will,
that encodes all the information necessary for life. And in fact, it was that discovery of DNA’s
double helix that really set into motion a whole series of scientific discoveries in
the decades that followed, which in many ways set up everything I’m going to describe to
you today. For example, we learned in the 1960s how it was
that DNA encoded the information necessary for making proteins. Proteins are the bricks and mortar of our
cells. And from what we learned about
the translation scheme that came to be called the genetic code, we now know how the letters within DNA
actually encode information for making those proteins, those building blocks, the bricks
and mortar. The 1970s and the 1980s saw advances in our
ability to study DNA in the laboratory, and the 1980s in particular brought about the molecular
biology revolution, when we learned how to clone DNA, how to isolate DNA in the laboratory,
and how to manipulate it so that we could study how DNA actually
functioned. And along the way we also figured out the
kinds of methods that would allow us to determine the actual composition and sequence of the
building blocks within DNA; we learned how to sequence DNA. And of course the reason we wanted to sequence
DNA is that we recognized that the DNA alphabet is actually incredibly simple. DNA basically consists of four different chemicals. We don’t even say the full names of those
chemicals; we just abbreviate them “A, T, G, and C.” And it is the order of those chemicals
that encodes the information necessary for life. It’s the GATTCCCAAA and so forth that carries
the code that a cell interprets to make proteins and to do various other things. The other thing we recognized by the 1980s
was that the human genome itself is a finite problem. It basically consists of 3 billion of these
Gs, As, Ts, and Cs. That’s a finite problem: it’s just 3 billion. So could we determine the order of the 3 billion letters across
the human genome and have that as reference knowledge for then figuring out all other
aspects of human biology? And we had developed the technical tools for
isolating DNA and for sequencing DNA, and it became apparent that, yes, it was possible
that we could develop an approach that would allow us to determine the complete sequence
of the 3 billion letters that make up the human genetic blueprint. And that led to the launching in October of
1990 of the Human Genome Project. In this large, audacious effort, which was basically
biology’s equivalent of the moon shot (putting a person on the moon), we would go in and
we would sequence the human genome. This was an effort that was regarded at the
time as a bit controversial. It would involve big science and big collaborative
approaches, international in scope with many countries involved, and basically developing
teams of scientists to go in and industrialize the process of sequencing DNA to be able to
figure out “What is the sequence of the human genome?” On a personal level, I had
just graduated from medical school, actually an MD-PhD program. I had just started my residency in pathology,
and I had the opportunity to join the Human Genome Project on the front line as a trainee,
and I did it. And I will tell you it was one of those things
where it was very clear you were involved in something incredibly unique and exhilarating. And I will also tell you that it was something
that was absolutely terrifying because we really had no idea what we were doing. We just had this audacious goal, to sequence
the human genome; we had some rudimentary tools, and we had to figure out how to do it. But it was so motivating to be a part of it
that in fact lots of people got together, great minds put their ideas together, and
the strategies came forth for mapping and sequencing the human genome. And remarkably, when you get people together
they do incredible things. And in June of 2000, just shy of ten years
after that, came an announcement. It was a big-deal announcement involving the
President of the United States and people over in the UK (Tony Blair was involved
in the announcement at the time) that we had in fact developed a draft sequence of the
human genome. It wasn’t perfect, but it was pretty darn good,
and it was regarded as a great celebratory moment. A few months later came a publication that
basically described the first view of the draft sequence of the human genome. And this issue of Nature will largely
be regarded as one of the most important publications ever to come out of the biomedical
research enterprise. It wasn’t done, by the way; I would
certainly be the first to tell you that this was just a draft sequence, and in fact, there
was still more work to be done. But we went back to the laboratories, finished
up the sequence, and made it really high quality. And then in April of 2003 (precisely, actually,
fifty years after the Watson-Crick discovery of the double helical structure of DNA) came
the completion of the sequence of the human genome, and with it came the end of the Human
Genome Project. That was about nine and three quarters years
ago. Actually, it was 3,579 days ago that the Human
Genome Project ended, for those of you who, like me, keep track of these things. It was truly remarkable that this had been
accomplished, and it had been accomplished not in the fifteen years that were
originally envisioned for the project; rather, we had finished the project in thirteen
years. Now, a lot has happened in genomics because
the tools that have been developed for studying the genetic blueprint of humans could be used
in various other ways. And I would say there have been many, many applications
of genomics in the last nine and three quarters years. I will just show you a subset of these,
fields where genomics has certainly found its way into having a significant
impact. That’s not what I’m going to describe today,
although these are all very interesting. In fact, each of these areas could be a talk
in and of itself. Coming from the National Institutes of Health
and being trained as a physician, you might imagine that my emphasis is going to
be on health, disease, and medicine. And in fact, I’ve been at the National Institutes
of Health now for eighteen-plus years. We are funded by the Federal
Government (we’re part of the Federal Government), and we have funding to really focus on using
the discoveries that we make to improve human health. And I’m now the Director of the National Human
Genome Research Institute. Our institute was originally created to
lead the US effort in the Human Genome Project, but we now see our mission as advancing human
health through genomics research. And that’s the focus of what I want to describe
to you today. And I would tell you in particular as the
director of this institute (and this institute is really the largest funding agency in the
world dedicated to genomics research), I take very seriously what our mission needs
to be. And having been wildly successful with the Human
Genome Project, I think what we very much need to be focusing on is operationalizing
this to actually improve human health. And the focus of what I’m going to describe
to you is on genomic medicine: putting together the notion of genomics and clinical medicine,
and really encouraging and facilitating what I regard genomic medicine to be, which is
an emerging medical discipline that involves using an individual’s genomic information
as part of their clinical care. In other words, most of us trained as physicians treat
patients fairly generically, one patient to the next, one human being to the next, but
in fact, each of us has a unique genetic blueprint. And now, with the tools of genomics, we have
the potential to use information about a patient’s genome to eventually
help tailor their medical care. So the way I look at this, it is a journey, if
you will, toward accomplishing something such as creating a medical discipline and
really advancing human health through it. And in many ways that journey begins with
the Human Genome Project; that was the starting line. It wasn’t the end of anything; it was really
the beginning of everything. And someday we want to see the realization
of genomic medicine, broadly defined. But this is a journey. It’s going to involve many, many steps. I don’t even pretend to know what all those
steps are going to be at the moment. But we embark on a journey like this reasonably
optimistic, because we were so successful with the Human Genome Project. We’ve got to believe we’re going to realize
genomic medicine. And the reason we need to do this is that it will
fulfill the promise of why we sequenced the human genome in the first place. We convinced the funding agencies and we convinced
the scientific community of the reason to have the Human Genome Project: it would provide
foundational information about the human genetic blueprint that would allow us to eventually
improve the way we treat patients, and it would improve human health as a result. And that is the promise that I think we need
to fulfill now. So how do we make this journey? What do we imagine it to be? Well, I will tell you that in thinking about
this journey from the base pairs of the Genome Project to the bedside of patients (or, if
you prefer the metaphor, “from the double helix to health”), the day the Genome Project ended,
literally the day the Genome Project ended in 2003, our institute published a strategic
vision for what’s next, what’s next for genomics now that we have the first sequence of the human
genome. And this was a remarkably valuable document
that laid out a series of ideas about what needed to be done. And that was nine and three quarters years
ago. And I will tell you that as a co-author of that
strategic plan, I thought it would be forever before we actually accomplished some
of the audacious things we described in that document. But what’s happened in the last nine and three
quarters years has been just truly remarkable. Once upon a time we had graphic
artists who would render illustrations like this, marrying the idea of clinical
medicine with the double helix and the genome, and we would think that someday, someday, someday these
images would be a reality. But I don’t think nine and three quarters
years ago any of us thought it was going to take anything less than twenty or thirty years. And yet we developed new technologies, we
generated remarkable amounts of data, and with that came incredible opportunities to
actually make this a reality. And in fact, a few years ago we found ourselves
poised to make this a reality in a far shorter period of time than we had ever envisioned
nine and three quarters years ago. And so, as a result of that, we recognized
that it was time to put out a new vision, because we had almost outlived our 2003 vision. And so a couple of years ago, we completed
another strategic planning process that yielded a new strategic vision for genomics, one that
we published in February of 2011. Actually, we published it in the issue of
Nature that appeared precisely ten years after the publication of that first draft sequence of the human
genome. Now, a reprint of this strategic plan was
available when you came in. I hope some of you picked it up. If you didn’t, maybe you could pick it up
on the way out. For any of you who are interested in reading this
but didn’t get a reprint, or who want to send this to somebody, this URL
is where you can freely download a PDF version of this and share it. And I really encourage you to read this. This was not written for specialists in the
field; this was written for a very, very broad audience. And it really does describe in many, many
ways the exciting opportunities of operationalizing genomics to make it a reality en route to
a realization of genomic medicine. So I thought what I would do is to describe
to you some of the key elements of the strategic plan, and you’ll see I will use it as a frame,
because it will give me an opportunity to review what’s taken place over the last nine
and three quarters years. And then we’ll go into the future when I’m
done with that. Interestingly, what we heard several
years ago, before we wrote this new strategic plan in 2011, was that it was time to be more
sophisticated and more specific in spelling out the steps that were going to be needed to
realize genomic medicine. And at the end of the day, we found it very
useful to organize our thinking, and increasingly our scientific programs, around
five major domains of research activities that were going to be required. Let me introduce them to you. The first domain is to do research to understand
the structure of genomes, how genomes are put together. Sounds a lot like the Human Genome Project,
and in fact, this is what we’re most familiar with. But you don’t want to just know how the genome’s
put together; you want to know how it works. So you also then need to have research activities
that help you understand the biology of genomes, how it is that genomes orchestrate that information
that then confers biological function. Well, once you understand how genomes work,
you might want to know how genomes can influence health and disease, and so you want to have
research that helps you understand the biology of disease. And when we get to it, you’ll hear about how
virtually all disease has a genetic and genomic underpinning. Of course, if you know more about disease,
it might give you an opportunity to advance medical science. So you want to do research to be able to demonstrate
that you can use genomics to advance the science of medicine. And of course, you can’t just stop there,
because just because you have a medical advance doesn’t mean that when you go to operationalize it in the
healthcare delivery system it will actually do anything. So you also need to do research that actually
demonstrates and proves that it improves the effectiveness of healthcare. So these five domains in essence become a
framework by which we think about really everything we’re doing en route to genomic medicine now. The other way these five domains are
very helpful is that they provide us an organizational framework for thinking about the last twenty
years. And then later I’ll be able to show you how
we think about the next twenty years, because we can sort of inventory where we’ve
been successful and where all this is going. What do I mean by that? Well, for example, we could think about different
eras of time and think about where there have been accomplishments and how they are distributed
across these five domains. Take, for example, the Human Genome Project. During the era of the Human Genome Project,
if I simply put down a blue dot for each accomplishment
(a scientific accomplishment, defined very generically), where would those blue dots pile
up? And as they piled up on top of each other
they might even change color because they’d be so dense. Basically, the bottom line is that the Human Genome
Project was almost exclusively about this first domain of activity. We used genomics to understand how genomes
were put together. We sequenced the human genome and other genomes. Now, we learned a little bit about how genomes
worked, but fundamentally the center of gravity was around the first domain. Well, what’s happened in the last nine and
three quarters years since the Genome Project ended? Well, we continued to explore genomes. We’ve continued to learn a lot about how genomes
are put together. But increasingly, as I’ll describe to you
shortly, we’ve begun to understand how genomes work. Oh, sure, there have been some opportunities
to look at diseases, but those have produced far fewer accomplishments
compared to these first two domains. So let me then drill down in detail and focus
in for a while on the last nine and three quarters years. What are these accomplishments that have characterized
the field of genomics since the end of the Human Genome Project, and what do these blue
dots really represent? Well, what I want to tell you about are five
major steps that we’ve taken, going from the base pairs of the Genome Project en route
to genomic medicine. What are those steps?
Let’s start with the first step. Well, the first step is to understand the
function of the human genome sequence, understanding how the human genome works; really, research
activities within these first two domains if you will. But why do we need to figure out the function
of human genomes? Isn’t that what the Human Genome Project did? No. What the Human Genome Project did was to produce
this (actually, this is only 0.001 percent of what the Human Genome Project produced):
this string of Gs, As, Ts, and Cs. I am fairly certain that none of you staring at
this can immediately interpret the biological information that it encodes, but it encodes
biological information. We just don’t know the grammar, we don’t know
the words, we don’t know the syntax. But it’s all there. We’ve just got to figure it out. Well, I could also tell you that the day the
Genome Project ended, our ability to actually interpret this incredibly complicated text
was very, very infantile. We barely knew how to really interpret this
at all. We had to work very hard. That’s what we’ve done over the last nine and three
quarters years. What have we done? Well, the first thing we did was
what we did well: we sequenced genomes. And we realized that there would be remarkable
power in comparing our genome sequence (our genetic blueprint) to the genetic and genomic
blueprints of other creatures. Let’s remember that humans are just one little
twig off this very complicated tree of life. But we’re all interrelated, and the evolutionary
process has been such that we all share a tremendous amount in common in our genetic
blueprints. But the fact of the matter is we are unique,
and wouldn’t it be cool if we could get the genomic blueprints of these other creatures,
compare them to ours, and then recognize that things that are the same must be functionally
important, because otherwise evolution would have changed those sequences, since that’s
what evolution does? And so off we went, immediately sequencing
a series of other animals, including laboratory animals, companion animals, our closest relative,
but we also realized it was very important to sample across this very complicated tree
of life. And so we started sequencing other critters
and recognized that there’s a lot of statistical power associated with sampling across different
parts of this tree of life and then using computer programs to analyze and reanalyze
all the different sequences to find those sequences that have never changed over large
evolutionary distances, because those sequences for sure are functionally important. And that would start to teach us the words that
we needed to interpret the human genome sequence and understand how the genome actually functions. What did we learn from some of those things? By the way, we’ve now sequenced the genomes of probably about
three dozen other vertebrates. And we have lots of computer tools that help
us interpret and reinterpret. We learned from that, for example, that across
our three billion letters of the human genome, about five percent of them or so are highly
conserved across virtually all mammals. In fact, they’re incredibly highly conserved,
and almost certainly those are functionally important. And that allows us to highlight the parts
of the genome that we want to study more intensively, the ones
that are likely biologically important and that we want to look at more carefully. So for example, we could pull out one color of
highlighter, which turns out to highlight about 1.5 percent of the letters in our genome. And that highlighter corresponds to those
sequences that directly code for protein: protein-coding genes. In other words, these are the sequences that
actually encode the bricks and mortar in our cells. But it’s only 1.5 percent of our genome. Now it turns out that if 1.5 percent are these
genes and five percent is highly conserved, that leaves about three and a half percent that
we had better highlight in another color, and we are highlighting that in another color. These sequences are as conserved through evolution
as genes in many cases, but they’re doing things other than directly coding for proteins. Well, what are they doing? Well, we’re learning. We do know that some of them are determining
where and when genes get turned on and how much. I’ll give you an example, and I sort of alluded
to it earlier: Why is it that every one of our ten trillion cells contains the gene
for insulin, but insulin is only made in a subset of the cells in the pancreas? Well, that’s because there’s a switch that
determines, “Hey, I’m an islet cell in the pancreas. I will turn on this one gene.” And that switch is some of this purple stuff. So there’s all this circuitry that determines
where and when genes get turned on. Similarly, hemoglobin — every one of our
cells contains the gene for hemoglobin, but hemoglobin only gets made in a subset of blood
cells. Why is that? Because of the switches that determine when
a gene gets turned on and so forth. There’s a lot of complexity there, and we’re
learning about that complexity. Oh, but by the way, it turns out that we now
know from the last nine and three quarters years that it’s not just the primary sequence
of DNA that confers function; there are other aspects of DNA that confer function. And that involves something called epigenomics. It turns out that our DNA gets decorated
with chemicals and gets associated with other molecules, and those little decorations
that take place on our DNA also influence how our DNA works; that’s called epigenomics,
and it’s one of the ways that the environment influences us, by influencing our DNA. And we now are learning a lot about the epigenomic
landscape. And increasingly, we’re developing better
and better methods that allow us to read the second code, the epigenomic code, the decoration
code on our DNA, and that’s a whole additional code besides the primary sequence code. And a lot is happening in that arena as well. And in fact, we recognized the need to have
a very concerted effort in interpreting the human genome sequence, both the primary sequence and
the epigenomic information. And our institute, for example, launched a
project called the Encyclopedia of DNA Elements, or ENCODE, which basically was a large international
effort to get investigators together who focused on developing laboratory methods and
computational methods to critically interpret all of these information clues within the
human genome: to get catalogues of all the genes, catalogues of all these other switches
and circuits, and catalogues of all the epigenomic changes in our DNA, and to use that as a resource,
if you will, to allow other scientists to understand how the human genome works. Last year was an exciting year for ENCODE. In one flurry of papers in these journals,
they published about thirty papers that together provided really the first comprehensive view of the
functional catalogue of elements in the human genome. And increasingly, as shown on this little
poster here, we are getting views. I’ll show you a blowup on this slide: here are
two regions of the human genome, and you can see overwhelming amounts of information from all
these little marks and all these lines, which are called “tracks.” The details aren’t important. I guess the analogy I would make is that many of
us have GPS units in our cars, and those GPS units might tell us where the roads are; that’s like
the sequence. But increasingly, what ENCODE is giving us
is information about what things are associated with different parts of our DNA, just like
the GPS might say, “Oh, here’s a road, but here’s a gas station. Oh, and over here, here’s a hotel, and here’s
a McDonald’s,” and so forth. So just as we are annotating additional
information on a GPS that tells us about our surroundings, ENCODE is increasingly
giving us information about what’s functionally relevant across different stretches of human DNA. And this will continue into the future because
it’s actually even getting more complicated. We’re also now learning that it’s not just
the linear sequence of letters in DNA that encodes information, and it’s not just the
decoration on DNA that encodes information, but we also now know that the DNA that sits
in the nucleus of all of our cells is not so innocent. It actually has a three-dimensional conformation,
and different stretches of DNA are interacting with other stretches of DNA as it’s floating
around. And in fact, that also confers biological
information. And increasingly, we are learning how to read
out that information and to interpret this three-dimensional information that also
exists in our DNA. The fact of the matter is we will be working
on this for a long time; for decades, I predict. My children and then my grandchildren will
still be interpreting and reinterpreting the human genome. Today, if you ask me, “Where are we?” I’d say we have at best a CliffsNotes view
of the human genome sequence, okay? It’s not like we have the final scholarly
reading of this great novel; it is just a superficial first view. And for a long time we will be trying to understand
the complexities that reside in our three billion letters. So that’s where we are on that first step. How about the second step? Well, the truth of the matter is we’re not
just interested in how a hypothetical human genome works; we want to know how each of
our genomes works, because each of our genomes is just a teeny bit different. So we’re interested in variation among our
genomes. We’re interested in knowing how we differ,
and therefore how each of our genomes functions in a slightly different way. Over the last nine and three quarters years, these research activities
have still mostly been in these first two domains of research
activities. Now, what do I mean by that? Okay, well, to explain what I mean by that, let’s just
sort of back up for a minute. Each of us has two genomes in us, right? We got three billion letters from Mom, we
got three billion letters from Dad. Six billion letters make up a true individual’s
genome. Now, look to the person
on your left and the person on your right, and I will tell you that you differ from each
person at about one out of every thousand places along your genome. Okay? And if you add it up, it’s about three million
to five million differences between you and the person sitting next to you. And these are all variants (I’ll indicate
them here by Vs), the great, great, great, great, great majority of which really
have no biological consequence whatsoever. They’re completely silent in the great majority
of cases, but a subset of them are biologically important. Some of those, for example, might be problematic. They might be a ticking time bomb for some
disease you might get or some other detrimental attribute. Some of those variants you have might be good
variants. They might mean that you won’t get a disease
or that there will be something else that will be more positive. But by the way, each of you does not have
a private set of three to five million variants that nobody else has. In fact, many hundreds of thousands of the
variants that you have, somebody else in this auditorium has as well. And in fact, the problem of genetic variants
is actually relatively finite we believe, and in fact, one might imagine that one could
catalogue at least the more common variants that exist in different human populations. And that idea occurred to us. And so several initiatives were launched to
try to understand human genetic variation; in other words, variants that exist within
our genomes. One such effort, the Human HapMap Project,
which had three major publications, started this process, and it aimed to do several things. First of all, it collected DNA from well-defined
populations all across the globe because we want to get this to be sort of a variant catalogue
of humanity, not just of any one population. The second thing it did was to use various
fancy methods to figure out some of the most common variants that exist in all of those
populations and make those available on the Internet for all scientists to use. And then the third thing that this particular
project did was to basically start to develop an architectural map, if you will, about how
these variants relate to one another physically along the chromosome. What I mean by that is it turns out that all
of your variants, even when they’re sitting really near each other on a chromosome, they
don’t sort of just go off in different directions every generation, but rather, a given stretch
of a human chromosome, a block of maybe a hundred thousand bases, might have a hundred
places in it that vary, but all one hundred of those, that whole block, tends to be
inherited as a block, as a neighborhood, from one generation to the next. And these are called haplotype blocks, and
that is why this was called the HapMap Project: it set out to characterize these haplotype blocks,
for reasons that will become apparent a few slides from now. There’s a little scientific trick I want
to teach you. Well, this was pretty good, but the fact is
there was great interest in getting even more information about rarer and rarer variants. And so when better technologies became available
another international consortium was formed called the Thousand Genomes Project, which
initially aimed to catalogue all the variants in a thousand people. But of course, as always in genomics, we are
overachievers, and now we’re doing over 2,500 genomes. Increasingly we are getting
a richer and richer view of the genomic variation that exists, again with samples from across
the globe. Through an initial pilot effort that was published in 2010 and then, more recently,
the first real critical look at the initial thousand individuals, we are getting remarkably
deep catalogues of genomic variation. And with that are coming some insights about
what individual human genomes look like. So let me digress for a minute, and let me
tell you a few things about your genome. Your genome by the numbers. So I already told you your genome has about
six billion nucleotides, six billion letters in it. I already told you that you have about three
to five million places where a single nucleotide is different compared to, let’s say, the person
sitting next to you. There are also about 150,000 of your three to
five million variants that we have never seen before; those would be sort of new variants if we were to
sequence your genome. Most of them we’ve seen, but a few of them
are relatively rare, and so we haven’t seen them yet. And it turns out that about sixty of your
variants are ones that neither of your parents has. Those are new oopses. Those are new typographical changes that took
place in creating you. And so those are new. And then at about another 10,000 to 20,000 places,
not only do you just have a single nucleotide that’s different, you might have a stretch
of DNA that’s missing, or a stretch that’s inverted, or a stretch where there are a couple
of copies of something where in other people there’s only one copy. Those are known as
structural variants. Now, you may also be asking, “Well, across
my genome how many places am I broken? How many places are things maybe not working
the way they should? Just in aggregate?” There are probably about 100 places where a
gene has been disrupted and should not be working. By the way, we have about 20,000 genes in
our genome, and you have two copies of every gene. So about a hundred times,
one of your two copies of a gene might be disrupted; one of your
two copies is broken. Oh, and by the way, each one of us is probably
carrying about twenty genes where both copies are broken. Just sort of cool to think about. We have 20,000 genes, and for each of us, on average,
neither copy works for about twenty of those. That’s why we’re all a little different. And some of this has biological consequence,
and some of it has less biological consequence. So that’s a little bit of digression about
your genome. But it makes you think about, “Well, which
of these changes in my genome might have relevance to my health? Which ones predispose me to a disease? Which ones might influence which diseases
I might get?” Which brings me to the third step of accomplishment
that I want to share with you. And here we’re starting to get closer to genomic
medicine because we’re starting to deal with human disease, and we’re dealing with this
middle research domain of understanding the biology of disease. There’s great interest in understanding the
genomic basis for human disease. So to describe to you what’s happened over
the last nine and three quarters years, I need to drill down a little bit more to explain
to you about genetic diseases. All diseases are genetic. There’s practically no disease
you can name for which I won’t tell you there’s at least a genetic influence on the severity
of the disease, and many diseases have an overt genetic cause. But diseases can be largely divided into two
broad categories, if you will, whose underlying architecture is a little bit different. Let me explain. And it’s important to see the distinction
because what’s gone on in the last decade or so in research in these areas has been
very different. On the one hand, there are rare genetic diseases. These are diseases like sickle-cell disease,
cystic fibrosis, Huntington’s disease. Now, rare genetic diseases are rare, but they’re
also simple. And the reason why they’re simple is because
there’s basically a defect in a single gene — one of the 20,000 genes in our genome — that
is the major risk factor for getting the disease. These are otherwise known as “monogenic,”
one gene. They’re also known as “Mendelian disorders,”
named after that famous geneticist Gregor Mendel and his pea plants. He figured out how single genes are inherited, so they’re
called Mendelian disorders. Now, there can be other genetic variants that
might influence the severity of a disease, and there might even be some environmental
influence on severity of diseases. But by and large, rare genetic diseases are
caused by defects in a single gene. Now, rare genetic diseases are devastating
to patients and families, but they’re rare. And they don’t in aggregate represent the
major healthcare burdens in the world. The major healthcare burden in the world comes from
these diseases, because these are common diseases. These are diseases like hypertension, cardiovascular
disease, diabetes, Alzheimer’s disease, mental illness, many forms of cancer, and so forth. And common diseases, besides placing a huge
burden on human health, turn out to be genetically complex. And the reason they’re genetically complex
is that none of the diseases I just named is caused by a defect in a single gene;
rather, it is variants in multiple different places in the genome that together
conspire with what is typically a larger influence of the environment to confer risk for getting
the disease. It’s not an absolute; it’s an influence, and
it’s a risk, and therefore it’s more complicated. These diseases are multigenic, otherwise known
as non-Mendelian disorders. When the Genome Project was being envisioned,
we always believed that having the sequence of the human genome would greatly accelerate
the ability to identify the genetic basis of rare diseases. And there was always a raging debate about:
Would we ever get good enough at figuring out the genetic basis of complex diseases? Let me tell you now for each of these categories
what has happened over the last nine and three quarters years. So shown here is a cumulative histogram of
all of the single gene disorders or traits for which we have identified the genomic basis,
so genomic basis known. And this is a cumulative histogram. Let me remind you. The Human Genome Project began there. The day the Human Genome Project began there
were only 68 diseases or traits for which we knew the genomic basis — 68. And you can’t argue with the fact that the
moment the Genome Project began, shortly thereafter, we saw a massive upswing in the number of diseases
— single gene diseases — for which we figured out the defective gene. And the figure as of when I made this slide
a couple weeks ago was 4,800. We went from knowing the genomic basis of
68 single gene disorders to now knowing the genetic basis of over 4,800. Clearly that has been a remarkable advance
in our understanding of the genetic basis of rare diseases. Now, that’s pretty exciting, and it is a remarkable
achievement. But you’ll quickly learn that I’m a glass half full/glass half empty
kind of guy. The glass half full part of me says, “This
is remarkable. We now know the genomic basis of 4,800 monogenic
diseases and traits.” The glass half empty part of me, though, is
a reminder that there are still about two thousand more single gene disorders for which we don’t yet
know the gene. And on top of that, there are another two thousand
or so for which we think a single gene is involved, and we haven’t figured out that gene, either. So this is only about half done. Memorize
this pie chart. I’m going to come back to it when I take you
into the future. So that was single gene disorders. What about complex disorders, those multigenic
disorders? Well, here I want to introduce you to a strategy
and then tell you about the remarkable advances that have taken place. There’s something called Genome-Wide Association
study. That’s a mouthful. Let me explain to you a few things. Shown here is a stretch of a human chromosome,
and every one of these little black marks along the top represents a place that varies
in the genome among people. These are known as “single nucleotide polymorphisms,”
basically a letter that is different at that given position. Now, shown here in these inverted red triangles
are these haplotype blocks I talked to you about. So again, this is a stretch of a chromosome. And let’s say this block right there probably
has maybe (I’m making it up) a hundred variants that exist across that stretch. But all hundred of those variants tend to
be inherited together as a block, as a neighborhood, from one generation to the next. And the same for all the variants on this
block, and the same for this one, this one, and this one, and so forth. And so the idea was to do the following study. This is the study one wants to do. What you want to do is you want to take a
thousand people with a disease — let’s say hypertension — and thousand people without
that disease, without hypertension. And then you want to know: for the thousand people
with hypertension, which variants do they have compared to those thousand people without
hypertension? Now, you could type all thousand people for
millions and millions of variants, but that would be incredibly expensive and impractical. But imagine if you just took a representative
variant from each of these haplotype blocks knowing that it was a surrogate for all other
variants in that particular neighborhood. So what’s the example I show here? All right. Well, let’s take the people with hypertension,
and let’s type them for a marker that’s up there, and let’s say that this particular
variant comes in two flavors: it’s either green or purple, just for simplicity. Well, you can see, in the people with hypertension,
the distribution of green and purple, and in the people without hypertension, green or purple, and one
might argue that there’s really no correlation between whether you have hypertension and
whether you have the green variant or the purple variant. But let’s keep going. Let’s go and do another of these haplotype blocks.
Let’s go over here. We’ll do that one. Now, all of a sudden the variant we’re testing
is orange and blue. And you can immediately see, “Wow, the people
that got the orange variant are the people that get hypertension more, and the people
that get the blue variant tend not to have hypertension.” So it doesn’t mean that the orange variant causes
hypertension, but it does mean that there’s probably a variant somewhere within that neighborhood,
somewhere within that block that might be conferring risk for getting hypertension. This is what is known as “association,” associating
a variant with a disease. If you survey across the whole genome and
all the blocks, you’re doing it genome-wide. And that is why it is called a Genome-Wide
Association study. So the question was — this sounded great,
and you can make really cute PowerPoint slides that show the strategy. The question was: Would it work to the point
that you could truly rule out certain regions of the genome and truly rule in other parts
of the genome and have it make any sense at the end of the day? And the answer was yes, this actually would
work and you could survey across the entire human genome as we’re doing here and find
places where there is high statistical correlation between having a particular variant and having
that particular disease. And we were excited to see by 2005 the first
successful example of this. It came when some of the earliest
data from the HapMap Project came out: a study of age-related macular degeneration,
in which a region of chromosome one was found to be associated. And they even figured out what the gene was. And we started to catalogue the successes. People always wondered, “Would it work?” And this paper demonstrated that it did work. So we actually started cataloguing them. Shown
here are all the human chromosomes. And what we did for that first paper was to
show, with a little lollipop, where in the genome the association with that disease was. And then we just started following the scientific
literature. And by 2006 there were a couple other successful
Genome-Wide Association studies, and you can see other lollipops were stuck on the genome. Well, by 2007 this just took off like mad. And all of a sudden, every single time we
looked up in 2007 and in 2008, more and more and more and more papers were being published
that described successful Genome-Wide Association studies. These were for diseases like hypertension,
diabetes, cardiovascular disease, all the things we care about. And all of a sudden we were going from
not knowing where to look in the genome to now having very discrete regions
of the genome labeled by these lollipops that clearly indicated where one might search in
greater detail to find specific variants conferring risk for disease. And that trend just continued and continued
and continued, and here’s the latest graphic. In 2005 the first successful study — fast
forward to now, 2013: over 1,400 successful studies that have demonstrated where there
are regions of the genome we want to look at in greater detail to find the variants that
are conferring risk for specific human diseases. That has been just truly remarkable. Now, the fact of the matter is it doesn’t
mean we have the answers yet. In fact, all we’ve done is we now have it
down to a neighborhood that we need to now search door-to-door to figure out which is
the variant that’s the one conferring risk. But that has greatly changed the face of many
studies in human genetics and has been really exhilarating to watch the potential this might
have for the next phase of trying to figure out the molecular basis of human disease,
especially complex diseases. Now, that was the glass half full. Let me always bring you back to the glass
half empty. Guess what? We’ve learned something else in the past three
or four or five years. We are now learning that increasingly, when
it comes to rare genetic diseases, most of those changes — those mutations, those typographical
differences — are in the protein coding portions of the genome, the part of the genome we understand. We understand how to break the genome and
have it not make a protein. The problem is over here, and this is the great medical
challenge for the future: for common, complex diseases, most of those variants are sitting outside
of protein coding regions. It turns out that those are probably changes
that are affecting our circuitry. And we don’t know how to interpret that as
well yet because we don’t know how all of that works. And so this is going to be a huge technical
challenge and a knowledge challenge. But it’s one that we are very motivated to
figure out because we need to figure out what variants are conferring risk for these complex
diseases. And then we want to understand what we might
be able to do with that knowledge. And so it’s going to be a hard road ahead,
but at least we’re in a much better position to study
it. In order to study it, what we clearly are
going to have to do with our thousand people with hypertension, our thousand people with
Alzheimer’s disease, our thousand people with diabetes, is get complete inventories
of all of their variants, because we’re going to really need to tease
out which ones are truly conferring the risk. And in order to do that, we’re not going to be able to
just do Genome-Wide Association studies the way we’ve done them in the past;
we’re really going to have to sequence lots and lots of people’s genomes completely to
be able to get complete inventories of all those variants, which leads me to this fourth
step along the way, and that’s routinely being able to sequence whole genomes. Now, here I once again have some pretty exciting
things to tell you because it has been truly remarkable what has happened in this field. So far, what I’m going to tell you about has
mostly affected these first three domains, but the fact of the matter is that with time, as
you’ll see, it will be affecting these other domains as well. Well, what’s the origin of these advances
I’m about to tell you about? Well, remember I told you on the day the Genome
Project ended nine and three quarters years ago we published this strategic vision for
what was needed beyond the Human Genome Project, and we wrote a lot in this article. But one of the things that we wrote, and it
is in many ways audacious and in many ways remarkable that we were willing to put this
into print, is that we called for technological leaps that seemed so far off as to be almost
fictional but which, if they could be achieved, would revolutionize biomedical research and
clinical practice. And the example that we gave in 2003 was the
ability to sequence DNA at costs that were lower by four to five orders of magnitude
than the current cost, allowing the human genome to be sequenced for a thousand dollars
or less. Now, why was that such an audacious thing
to put with your name on it — my name was on that — into the scientific literature? Well, the reason it was so audacious is the
day this came out, we had just finished sequencing our first human genome at a cost of $1 billion
roughly. And here we were proposing to take on the
challenge of lopping six zeros off of that figure by incrementally improving technologies
for sequencing DNA and delivering on something that was quickly referred to as a “thousand-dollar
genome.” Well, a thousand-dollar genome became a battle
cry for the scientific community. Our institute gave out grants to all sorts
of people to come up with crazy and wild ideas for sequencing DNA. And the notion was we wanted to replace the
factories that sequenced the first human genome as part of the Human Genome Project with
some fancy schmancy new technology — a nano this, a microwhatever, something
that would reduce the cost substantially and allow us to generate a sequence of the human
genome for a thousand dollars. Now, I will tell you I’ve been involved in
genomics for 25 years. I could tell you about all sorts of remarkable
technological advances, but no doubt the number one technological advance that’s happened
in genomics in 25 years has been not the development of one or two or three or four or five, not
six, seven, eight, but nine or more new technologies. These are instruments available today
that allow you to sequence DNA at remarkably reduced costs and with remarkable speed. It has been truly revolutionary what has transpired,
and these technologies are completely changing the face not only of genomics, but are changing
the face of all biomedical research. Now, have these new technologies reduced the
cost of sequencing? Absolutely. It has been truly mind-boggling. Let me show you some real data. If you asked me the question, “What was the
cost for sequencing a human genome during different periods of time?” We actually collect that data because we support
groups that do nothing but sequence human genomes. Let me remind you of a concept, Moore’s Law. Probably many of you have heard of Moore’s
Law. Moore’s Law is the law of the computer industry
that basically says that compute power doubles about every eighteen months. And if you talk to experts in technology development,
they say nobody keeps up with Moore’s Law except for the computer industry. They have set the standard, and nobody beats
them. Let me show you some data. So shown here is a graph that depicts the
cost for sequencing a human genome, starting at about 2001 in green. And notice that the y-axis is logarithmic. And notice in white is Moore’s Law. So if you follow Moore’s Law, you rock. Okay? Well, our sequencing centers, for a long time,
kept up with Moore’s Law. They rocked. And then right here, they switched to these
new technologies that have been developed for sequencing DNA, and ever since then they
have blown Moore’s Law out of the water — truly, truly remarkable. And in fact, if you asked me today, “What
does it cost to sequence a human genome?” I will give you a very accurate answer. As of today, it costs exactly that amount
to sequence a human genome. It’s not quite at a thousand dollars, but
we’re going to get there really soon. Couple thousand dollars, maybe a few thousand,
depends exactly how you calculate it. But it’s not just the cost of sequencing;
it’s actually the speed at which you can sequence a human genome. To put it in context, when we sequenced that
first human genome as part of the Human Genome Project, we were actively sequencing for six
to eight years and it cost a billion dollars. The day the Human Genome Project ended, we
did some calculations. If we had said to all these sequencing
groups, “Go back and sequence a second human genome,” they came back and said, “If you
gave us the money, it would take us about three to four months. It would cost about $10 to $15 million.” Pretty good — an improvement but still slow. Today, using various technologies — I hold
in my hand one of the fancy little chips that could be used for sequencing a human genome
with one of those instruments. And with one of those instruments and a chip
like this, you can now sequence a human genome. It would take you about two to three days. I actually know for a fact that that will
be reduced probably to a day within about the next six months and would cost something
on the order of $4,000, $5,000, $6,000 — in some places maybe even less than that. Truly, truly remarkable. Completely game-changing. And you know what? It’s only going to get better. We already know there are new technologies coming
on board. For example, there’s a fancy new thing called
a “nanopore,” which is basically a molecule that sits in a lipid bilayer, and the DNA
strand gets pulled through it and letter by letter gets read off as it gets pulled through. It’s just completely way cool. All right. And it even gets better than that. Allegedly — and I don’t know if this is true
— but allegedly a company is about to commercialize instruments for using this technology. And one form of the instrument will be a little
USB device that plugs into a laptop, and in a day it will read out a human genome sequence
— if it’s true. Which is truly remarkable. And do you know what else is incredibly remarkable? I asked the company, and this USB device
will work equally well in a Macintosh or a PC. So they’ve thought of everything. So stay tuned. I do not worry anymore about getting to the
thousand-dollar genome; it is not what keeps me up at night. But you might ask me what keeps me up at night,
and I will tell you. This is what keeps me up at night: Routinely
being able to analyze this sequence. We have gone from a situation where
we were data poor and analysis rich, before we had the first genome sequence,
to now a circumstance where we can generate so much data that we can’t analyze
it fast enough. And this is affecting every single area across
all these domains. You know, the current bottleneck in all of
biology, but certainly in genomics, is caused by our own successes. We are victims of our own success. We develop technologies for sequencing genomes,
and our ability to analyze the data hasn’t kept pace. This is what it’s like. It’s like trying to assimilate these data as they
come spewing out of these machines. We just — we can’t capture it all. We’re overwhelmed by this. We are big data people. You know, in the past it was the particle
physicists, it was the climatologists, it was the astronomers — they were the big data
people. Biologists? We have little data. Now we’ve got big data. And this big data’s now a bottleneck. We don’t have enough servers and processors
for handling all the data. We don’t have enough computer tools that allow
us to analyze the data robustly. And for any students in the audience, pay
attention. We know that for the workforce, the next generation,
we need many, many new people trained in new ways to be biologists, clinicians,
physicians, scientists, and so forth. And so we’re tackling this at NIH and other
places. But it’s not the only bottleneck. It’s not just all a compute issue. There’s also an informational problem here. I want to be very candid and clear. We have fancy methods for sequencing a genome. I could sequence any one of your genomes in
two to three days. And I could even get past the computational
bottleneck, and I could even read out the list of your three to five million variants. But the truth of the matter is if you were
a patient in a hospital, or you were a subject in a clinical research study and I had those
three million to five million variants, when I rounded on you in the morning, you know,
I would be puzzled by most of those variants. I don’t know what most of them mean yet. We don’t have that information. And most of the time we’re going to feel like
this, like, “Wow, there’s the list, but I’m really not sure.” So we got a challenge ahead of us, but we’re
going to deal with it. It is interesting. Harold Varmus, Nobel Prize winner, previous
director of the NIH, now head of the National Cancer Institute, wrote an editorial piece a few
years ago where he talked about how physicians are still a long way from submitting their
patients’ full genomes for sequencing, not because the price is high, but because the
data are difficult to interpret. That’s really where we are now. A colleague of mine wrote a piece, also recently,
talking about the thousand-dollar genome, but the hundred-thousand-dollar analysis. And you sort of laugh, and then you say, “Yeah,
but it’s actually not very funny. We got to solve this, too.” And we will. So those are the five steps I wanted to tell
you about. I recognize those aren’t the only five steps. There are other things you’re thinking about,
especially the more clinically-oriented domains, whether it’s new diagnostics, or it’s new
therapeutics, or it’s new preventive measures — all these are going to happen. And there are probably things that none of us
have thought about at all, that are unanticipated and that are profoundly exciting. That now leads me to sort of the last segment
of my talk. I promised you I would take you to the future,
and I want to do that now. And I will tell you that the future is
very much going to be driven by technology. And maybe that’s obvious to all of you, but
I do want to remind you that technologies drive science. You know, whether it’s, you know, when we
came up with the telescope — that’s what drove our understanding of astronomy. We came up with a microscope — that’s what
drove our understanding of cell biology. We came up with imaging technologies, like
CAT scans and PET scans — that’s what drove radiology. And these new sequencing instruments for sequencing
genomes are driving genomics at an incredible, remarkable pace and they will for the foreseeable
future. Now, the first thing I want to
do is to tell you about the next twenty years at a very high level, and then I’m going to
give you some specific examples. At a very high level, let’s fill in the rest
of this chart I showed you earlier. How are things going to advance over the next
twenty years? Well, what’s the rest of this decade going
to look like, up to 2020? I want to be very clear: We’re not going to
change the face of medicine by the end of this decade. The center of gravity is going to move rightward. I think the next decade is going to be very
much about understanding the biology of disease, as well as the remarkable continued advances
in understanding how the genome works. With that will come some clinical advances. But clinical — real clinical advances — are
hard, and they take a long time to fully operationalize. They’re coming beyond 2020. I just want to manage expectations by reminding
you that I think they’ll be there, but I don’t think they’ll necessarily be there just before
the end of the decade. But there will be considerably more achievements
in these clinical domains before the end of the decade. How is that going to be accomplished? Well, we’re going to sequence a lot of people. Oh no, no, no, not just the people on this
slide. No, no, not even the people on this slide. In fact, not even just the people on this
slide. We now believe that before the end of the next few years,
over 100,000 people’s genomes will have been
sequenced. Most of these are taking place as part of
major clinical research studies. But increasingly, these will be done as we
figure out how to operationalize this for practicing clinicians. And there are many studies that we’re doing
to start to look at that. Critical to this is that as these genome sequences
get generated, they need to be shared widely with the whole research community, because the greatest power of genomics and
our understanding — especially of the genetic basis of common diseases — is going to be
by having lots of data available and having a lot of the scientists looking at that data. And so increasingly, sequence will be streaming
into public databases for scientists to be able to access and to be able to utilize in
a fashion to allow them to enhance their studies. What are some specific areas that particular
a funding agency like ours is focused on? Well, let me remind you what some of the key
ones are. Using these powerful new sequencing technologies,
we need to fill in this pie chart. And we’re going to do that. We have a program that we’ve just launched. The groups are going to focus on accelerating
the rate at which we can identify the genetic basis of these rare diseases and fill in these
remaining sectors of this pie chart. We similarly have programs where we’re going
to go from these lollipop regions of the genome to drilling down and really figuring out which
variants are the ones conferring risk for Alzheimer’s disease; for diabetes; for mental
illness; for asthma; for cardiovascular disease and so forth. But those are all big, long-term projects,
and I’m not claiming that they’re [inaudible] other than a considerable amount of additional
work. A logical and fair question some of you
might ask me is, “What is some of the lowest-hanging fruit out there? What are the things that might actually change
medicine before the end of the decade? What are some of the greatest things you think
are going to happen the soonest?” If you ask me that, I will give you one immediate
answer because I firmly believe it, and that comes in the area of cancer. The reason I picked cancer is — let me remind
you — cancer is a disease of the genome. The reason cancer forms is because the genetic
blueprint has broken in some way, making cells grow out of control. And guess what? We now have the ability to open up a cancer
genome and see where the misspellings are, where the changes have taken place. And through efforts such as the Cancer Genome
Atlas, which is NIH’s big effort in sequencing tumors (cancer specimens), and similar efforts
going on around the world, at a ferocious pace, we are now at a point
in time where many, many different kinds of cancers are having their genomes sequenced,
and it is revealing remarkable things and changing our understanding of cancer, and
in some cases leading to new approaches to treating cancer. So today cancer diagnosis mostly takes
place by having a pathologist look under a microscope and say, “Okay, this is this kind
of cancer.” But many times, from that view of the cancer alone, we can’t tell
whether to give a patient this therapy, or that therapy, or
some other therapy. And in the future (and by “future” I
mean it’s already happening for some cancers and absolutely will be happening for others in
the next five years), we’ll continue to have that pathologist look at that specimen and
try to figure out exactly what kind of cancer it is. But more importantly, we will also use a sequencing
instrument to make a genomic map of that cancer, and from that we will get information about
the best course of treatment; and in some cases we’re learning about new avenues for
developing new treatments. This is going to be game-changing in cancer,
and I think it will be one of the first areas of true genomic medicine being implemented. What’s a second example of low-hanging fruit? Things that we’re already seeing? It’s in the area of therapeutics, medicines. You know, I’m not sure most physicians think
a lot about genes, but they’re going to start to now. And one of the ways they’re going to start
to think more and more and more about the genome is because they are recognizing that
every patient is different, that every patient has his or her own unique genomic information. And the fact of the matter is everyone responds
differently. And everyone responds differently not
only to things in life; they also respond differently to medications. Because the fact of the matter is, every single
medication that comes to market, that you can get in your pharmacy, every one of them
works; they just don’t work in everyone. And the fact of the matter is, a major reason
why they don’t work in everyone is that each of us metabolizes drugs a little bit differently. And guess what? How we metabolize drugs is partially an effect
of genetic variants that we have in our genomes. And so the idea of pharmacogenomics — pharmaceuticals
and genomics coming together — is going to create a new era for many kinds of medications. It’s already true for a handful — but that
list is going to grow with time — where we can take individuals all with the same diagnosis,
all the same disease, but we can stratify them in advance by testing their DNA — testing
their genome — and figure out which people won’t respond to this medicine and which
will respond poorly to it, and not give them the medicine. Give them a different medicine, and only give
the medicine to those individuals who are going to respond without having side effects. And this isn’t science fiction. This is real. It’s real for a handful of medicines now. And increasingly it’s going to grow [inaudible]. Many of us have had the experience of getting a
medication that makes us sicker or has no effect, and a lot of that response is due
to genetics. So these are the kinds of new initiatives
that we are now putting into play. I just told you about pharmacogenomics. We have other pilot efforts for which we, in particular,
are setting up research projects. Some of these are actually taking clinical
care and using genome sequencing as part of that care, but doing this in a research setting
to figure out how we’re actually going to carry that out. We have various demonstration projects that
we’re now embarking on to try to test some of these things out and help different groups
learn how to do this from others. Another area where we’re certainly now doing
research is newborn screening. Every baby born in America, and actually in
most developed countries, gets a little heel stick within a day of birth; a little
blood is taken, and a battery of about 25 to 50 genetic tests is
done. But what might the future look like when
we now know the genetic basis of four, five, six, seven, eight thousand rare genetic diseases? And maybe it will just be cheaper to sequence
their genomes than to test them for twenty-five or fifty. But there are lots of issues there: Who should
have that information, what do we do with that information, how do we actually operationalize
that? So we now have research studies that are
starting to investigate: “What might a future with newborn sequencing look like? Do we want to go there? What might we learn, and how? How would we actually use the information?” And similarly, there’s a lot
that we need to learn about all these variants and which ones are relevant clinically. I showed you the cartoon earlier of the person
sitting there at the bedside of the patient, not knowing what any of this means. And I will tell you, as I’ve traveled around,
that we truly need a better clinical system for assimilating all this information, especially
with the onset of better and better electronic health records. We simply need to provide practicing healthcare
professionals much more information about genetic variants so that in their brief encounter
with a patient, they can quickly figure out which genetic variant might be relevant and
which one might not be. And we need to have that be curated so they
can look that up and immediately see what to do with that information. So let me just tell you one other really cool
example before I truly do wind down, because some people think this is also a point I probably
need to make. I don’t mean to imply for a minute that the
only relevant thing in human disease is genetics. I fully appreciate that there is a role for
both the genome and also the environment in understanding human disease. And I showed that in this graph,
where I clearly show the environment. The fact of the matter is, the main reason
I’ve emphasized genomics is, one, it’s what I do, but two, there’s been
this remarkable advance in genome analysis technologies. And the fact is that
technologies for monitoring the environment have lagged considerably behind, although those are improving. And I think over the course of the next decade
we’ll get better and better ways of monitoring what we eat and what we’re exposed to and
so forth. But let me just quickly tell you a story that
I think is so cool because it’s using genomic technologies to actually also monitor the
environment. And in fact, it’s monitoring the most intimate
environmental exposure that we have, because it’s in us and on us: this thing called the
microbiome. You know, when we’re born, we’re born sterile. We have no microbes living in us and on us. We are born essentially sterile. But within a matter of weeks, we are quickly
colonized by many, many, many microbes: bacteria, fungi, viruses, and so forth. And in fact, each and every one of us has
about a hundred trillion microbes living in us and on us. Remember, we’re only about ten trillion human cells. We are outnumbered at the cellular level ten
to one by microbes. You are a minority owner of your little ecosystem. Okay? You probably never realized it and you probably
all want to go wash your hands right now. But get used to it. Collectively, that community of microbes that
outnumbers us, our most intimate environmental exposure, is called the microbiome. One other statistic and then I’ll tell you
the way cool part. Ninety percent of those microbes that are
living in you and outnumbering you have never been isolated, never been studied; we don’t
even know what their names are. We’re blind to them because we don’t have
a laboratory method to isolate them. We only know that about ten percent of all those microbes
even exist. And when you get cultures done, they only grow
out about ten percent. But now, with these new sequencing methods,
we can actually study that other genome. Because now, instead of actually isolating
those organisms, we can just isolate material from you, purify the DNA, sequence it, and
then inventory what we see at the DNA level without actually having grown those organisms. This has led to a big project called the Human
Microbiome Project, which is just winding down. But it is a project to catalogue these microbes
that live in us and on us at different sites — in your mouth; in your ear; in your nose;
on your skin; and so forth — and then be in a position to figure out how those communities
of microorganisms change in health and disease. Because most of the time we get along great
with our microbes, but sometimes that little equilibrium gets off and the microbiome could
actually be detrimental. And we want to be able to understand that
and study it. It’s absolutely going to be a major part of
biomedical research in the future, and it’s an area that is going to change many aspects
of clinical microbiology. Once again, I’ll take you to that pathologist,
who right now might look at Petri plates, and might look under a microscope at the
organisms, but he’s only seeing ten percent of what’s there. But in the future, he’ll still culture those
ten percent and study those ten percent, but he’ll also sequence what’s there. And with that, he’ll get much better views
of the full inventory of microbes, and how those communities are shifting with and without
disease and what role they might play in disease. So it’s just sort of a way cool way of thinking
of it, that we’re using genomics to also monitor our environment. And it’s a growth area in biomedical research
I wanted to tell you about. Okay. Let’s take a step back. I’ve gone through a lot of material, I know. The truth of the matter is when the Genome
Project started, we had a vague idea that some day, some day, some day this might be
relevant for medicine. I think by the time the Genome Project ended,
we started to get some feel that there were good uses of genomics for health and healthcare,
but it was still pretty fuzzy. I would say that when we published our strategic
plan in 2011, it started to come into focus. And while it’s not totally in focus quite
yet, I really do believe by the end of this decade many aspects of it are going to be
crystal clear. And we won’t have all the answers, but we’ll
have a lot of the knowledge of what we really need to do. With that said, there are lots of other things
I’m sure we could talk about. There are all sorts of issues at the intersection
between the genomic advances I’ve described and the ethical, legal, and social implications
of what we do. There are many societal issues. And I wouldn’t be surprised if some of the
questions you ask me — as is often the case with audiences like this — will relate
to that. And there really are many great questions. We actually study a lot of these. And we give grants to study a lot of these
ethical, legal, and social implications. And I will just tell you quite candidly, while
I’m extremely enthusiastic and I’m extremely excited about the future, I don’t pretend
to know all the answers. I don’t think the genomics community knows
all the answers. I can’t help but quote Albert Einstein. This is actually a sign I keep immediately
above my door in my office, and it really is true about genomics. I mean, truly, if we knew what we were doing
we couldn’t call it “research.” And we still — everything I’m describing
to you is still research. It’s exciting, but nonetheless, so much of
operationalizing it is going to require a research agenda. So let me just leave you with a quote before
I just tell you about two more things. And that quote really reflects my view of
the field of genomics. It’s a Winston Churchill quote. Winston Churchill said that “A pessimist sees
the difficulty in every opportunity.” Genomicists — we are not pessimists. You know, we see opportunities, and we did
the Human Genome Project. We see opportunities. We’re getting close to a thousand-dollar genome. We are optimists. And we see an opportunity in every one of
those difficulties, every difficulty we encounter. And I think that very much flavors the kind
of field that I get to represent. So just in the last two minutes, I’m just
going to tell you two other things. You might say, “Wow, you are very enthusiastic. What fuels your excitement about genomics?” I can come up with lots of examples, but I
just can’t help but tell you how proud I was that Time magazine at the end of 2012 put
out a whole set of top ten lists: Top ten stories in politics, top ten stories in business,
blah, blah, blah. They put out their top ten medical breakthroughs
of 2012, across the whole world of biomedicine. We got five of them. They talked about cancer genomics in a project
that one of our big funded centers is involved with. Just exactly what I told you about: Sequencing,
in this case, pediatric tumors. Number seven or six on their list was a story
about breast cancer genomics. The exact project I told you about — the
Cancer Genome Atlas. Number seven, speedy DNA-based diagnostics
for newborns, related to newborn sequencing using these new technologies, remarkably
incredible. Again, high on the list: the Human Microbiome Project I told you about,
number two. The number one medical breakthrough for 2012, according to Time,
was the ENCODE Project, a genome interpretation effort developing a catalogue of how the human genome
actually works. Five of the top ten were genomics, and if
you think that doesn’t make someone like me or other members of the genomics community proud,
you’re missing something. Lastly, let me just tell you about — that
was last year — what about this year? I want to extend an invitation to all of you. This is going to be an incredibly exciting
year for genomics. A celebratory year: the 60th anniversary of Watson and Crick’s
discovery of the double-helix structure of DNA. Pretty remarkable. The tenth anniversary of the completion of the
Human Genome Project, April of this year. We’re doing lots of things to commemorate
the tenth anniversary of the Human Genome Project. If you’re interested in following some of
those things, go to this URL. There’s going to be a symposium and a lecture series,
all of which we will put up on the web. We actually have a YouTube channel called
“Genome TV,” where we post lots and lots of videos, and all of our major events
described here will be posted. So any of you who thought what I talked about
was cool, you can get even way cooler stuff if you listen in on some of the things
that will be happening throughout this year. And here’s my invitation. My invitation is: I hope all of you will come
down to Washington D.C. because we have a partnership with the Smithsonian’s Natural
History Museum where we are jointly putting out a new exhibition on human genomics. That exhibition will open in June of 2013. It will be in Hall 23 of the Smithsonian Natural
History Museum, right next to the Hope Diamond. You can’t miss it. Just go to the Hope Diamond and take a left. Okay? This is the exhibition: Genome: Unlocking
Life’s Code. I just reviewed the 95 percent design. Co-designing an exhibition has been one of the most fun things I’ve
done in my life. It will be at the Smithsonian for twelve months
starting in June, after which it will tour North America for four to six years. And it’s something we’ve done in partnership
with them. And we think this is just wonderfully exciting. And with that, I will end. I thank you for your attention. I’m happy to answer any questions. [ Applause ] >> Hold up your questions, please. And let’s save your applause. We have about ten minutes for questions and
answers. So please hold your questions up so they can
be collected. >> We can turn the lights up if we’re going
to ask questions. >> Right. I will work on that. Okay. Lights. Okay. They will come up. Okay. Dr. Green, what are your thoughts about privacy
issues and other dangers that may result from wide sharing of an individual person’s complete
genome sequence? >> So let me repeat the question, which, as
anticipated, is both a good question and a question related to some of the ethical issues
around privacy. There is obviously quite a lot of intimate
information buried within our genome sequence, especially as we learn more and more about
genome variants each of us might have and what it might mean. And what do I think about these things in
terms of risk to privacy? And what I would immediately say is this is
something we are actively studying, both with respect to what the privacy issues are, what
people’s tolerance of this might be, ways to protect individuals, and so forth. Also related to this are people who are involved
in research studies: making sure they have been appropriately consented, that they have full
understanding of what the implications might be. And for many, many of our studies where
we are getting information about individuals’ genomes and information about their clinical
characteristics, those sorts of data are put in databases that are protected in the sense
that only scientists who have applied for access to them can get them, and they have to
abide by confidentiality requirements. But this is going to be an ongoing issue, and
many of the issues I talked about around newborn screening and newborn sequencing immediately
come to the fore: “Well, who has access to that information? If all newborns have genome sequences available,
how do we protect individuals?” This is sort of the classic two-edged sword:
On the one hand powerful opportunities, on the other hand we don’t want to have this
used in a detrimental way. It’s an ongoing — we have to constantly be
looking at this, and constantly be aware, and always give people the protections that
they request. And we certainly find, as we study this, a
wide range of people’s concerns about this. And I will tell you that in today’s world,
the younger generation especially, who are heavy Facebook users (I am amazed at how
much personal information they share on Facebook), tend to be much less concerned about
some of these issues of privacy around their genetic information. And there are people, obviously, at the other end of
the spectrum who are very concerned. And we have to honor that, as well. So it’s really trying to define this. We are very aware of this, and that also includes
investing money to research this to understand it. We can’t do this in an insensitive way, and
we won’t. I got a lot of questions there. >> Yeah. >> Sort them all out. Go ahead. >> We will get through them. I’m trying to get through as many as I can. Are you concerned that recombinant DNA technology
could result in serious problems, such as the development of a superbug or other organism
that could drastically alter the ecological balance of the earth? >> Wow, that’s a heavy one. So, by the way, there’s probably very little
that I have spoken about that relates to that; that question mostly relates to recombinant DNA concerns
that have been in place ever since the molecular biology revolution of the, you know, 1980s. And probably much of what’s being
described could be genetic and genomic manipulations of microorganisms, or of selected animals such
as, you know, livestock. Or are you thinking about, you know, genetically
modified fish and so forth? I’m not personally as concerned about some
of these things as others might be. These have been studied and monitored extensively
for many years by the NIH and continue to be subject to many, many steps of oversight.
There was certainly, this year, an interesting
flurry of concern around some studies that were being done on H5N1 flu, and concerns about
some implications of that. And at the end of the day, again, oversight
is being put into place to properly ensure that we don’t have detrimental consequences. I think it’s something for us to be aware
of, and we can’t ignore this. But in the grand scheme
of things, I think some of these concerns are far greater than what the scientific basis
of them would indicate. >> If a person has been diagnosed with a rare
genetic disease — specifically a deletion of PMP-22, which results in a rare form of
muscular dystrophy — where does one go for medical advice and consulting? Most physicians are uneducated in this area
and can offer little help. >> So I’m not familiar with the particular
gene you’re talking about there. But this is a very common question about rare
genetic diseases, and you’re absolutely right: with these extremely rare diseases, the
typical practicing physician may not know about them. There are various resources that are available,
including some that we have at NIH. If you sent me — whoever asked the question
— an email, I could easily refer you to staff that will point you to all sorts of places
that you could look at. We have [inaudible] called the Office of Rare
Diseases Research, which deals with exactly these issues, both for referrals and patient
groups and connecting people and scientists and physicians who might have expertise in
those diseases. The real answer to the question is that almost
certainly there’s somebody out there. If the gene has a name, there’s somebody
out there who’s an expert, and it’s just a matter of connecting with individuals. That said, just because
we know the genetic basis doesn’t always mean there’s a cure or a treatment. In some cases, sadly, we may not have that
yet. But there are ways to network to experts. >> What safeguards do you recommend to protect
individual rights and individuals’ privacy rights? >> So the first thing I would say is that
nobody is forcing anybody to participate in genomics research, and nobody is forcing you
as part of your clinical care to have your genome interrogated or sequenced. So the first thing is you can decide what
you want to do. And so I think that’s first and foremost
the way to do it. And there are laws that exist in some cases. There’s a thing called the Genetic Information
Nondiscrimination Act that Congress passed a number of years ago that at least protects
you from being discriminated against in certain areas. We’re certainly working to try to have even
stronger protections. And there really are, actually, very few examples
where this has been a problem; in fact, some people believe there are almost no examples where this
has really materialized in a serious way. There is concern, and we
continue to study this to figure out whether the laws are adequate, to see what
else — what other safeguards might be put into place and so forth. >> A couple of related questions. “Whose genome was originally sequenced?” And related to that is, “Since everyone’s
genome is different, how did the Human Genome Project choose whose genome to sequence, and
how accurate is it as a reference sequence for the diversity we see in human populations?” And also, “Are there different reference sequences
for different populations for regions of the genome that do vary significantly between
populations?” >> So I think some scientist wrote that one. [Laughter] That was a great question. So there are lots of subelements of that. Let me try to dissect them if — I’ll try
to remember them all. But they allude to one very fundamental truth,
which is that, you know, when I said we sequenced the first human genome, I sort of lied. It really wasn’t a human’s genome that we sequenced. What we sequenced was a reference of three
billion letters. Each of us is really six billion letters. So the first thing is we didn’t really sequence
a human’s genome; we sequenced a human genome, if you see the difference. The other part of the answer is that we didn’t
just sequence one person. It ended up that for a variety of purely
logistical reasons as much as anything else, we sequenced stretches that were from one person,
then the next stretch was from another person, and maybe that next stretch was from that
first person, and the next stretch — it was sort of a quilt, a patchwork, if you will. And so at the end of the day when you say,
“Here is the sequence of the human genome produced by the Human Genome Project,” it
is just a hypothetical person patched together out of little bits and pieces from many, many
different people. But it’s three billion letters as a reference,
which is roughly 99.9 percent identical to everybody else’s genome. So for the purposes of having a reference
it was good enough. One of the subquestions in there was, “Well,
is there more than one reference sequence?” In fact, one of the things being discussed
now is that what we really need is not just a single hypothetical reference; we need a set of really, really
good reference sequences. Some of these are being generated. And as the question also alluded to, probably
for different individuals from different geographical origins, they’re going to vary a little bit
more, a little bit less depending upon exactly what comparisons you do. And you probably need to have that representation
in the set of references, which is exactly why some of these projects, like the 1000 Genomes Project
and the HapMap Project I told you about, sampled across the world, to try to capture that diversity
and represent that appropriately. So it is an oversimplification to think that
the Human Genome Project produced a sequence of a person. It didn’t. It was just a hypothetical reference of three
billion letters that’s 99-plus percent the same as all humans. >> I’m trying to bundle questions so we get
as many answered as we can in the next — it looks like, two minutes. “Within the NIH and your specific domain,
what is your plan for maintaining continued funding, given the current fiscal climate?” And related to that is “How does the NIH partner
with universities, for-profit businesses, and other organizations to achieve your research
institute’s goals for knowledge and use of the human genome?” >> So both great questions. What do I do to try to ensure funding? First, I vote. Second of all, I tell all of you to vote. And actually, in all seriousness, if you think
this is way cool, why don’t you go write your Congressperson and your Senators, and tell
them how incredibly important you think this is for the future of human health and well-being? Because at the end of the day we’re completely
dependent upon Congress in particular and the White House making decisions about our
levels of funding. It is a very precarious time. I hope I have convinced you that
we are at a very bizarre point in our history, where some of
what I’m describing scientifically is among the most exciting things ever going on in terms
of opportunities to improve human health, at the very same time that I spend the majority
of my time worrying about our funding. You would think we should be pushing the accelerator
incredibly hard and funding this work at an incredibly fast pace, but in fact, I’m worried
about a fiscal cliff and various other things. So it is a strange time to be living in. But, you know, we’re a democracy here. We need to influence this. So I — you know, I hope all of you could
become excited about this and try to influence the people that make these decisions. The other part of the question was “How do
we interact with universities? How do we interact with companies?” We give grants. There’s, sitting right here, some major grantees
of our institute, some very prominent genomic scientists that work at Penn State. They get money for us to do their work. There’s some of our scientific leaders of
this field. Similarly, companies — for-profit companies
— we give grants to as well. We would not be close to a thousand-dollar
genome right now if we didn’t develop partnerships, and in some case, give grants to for-profit
companies to help operationalize these technologies so we could sequence DNA better and better. So this is all very collaborative, both with
academia and with for-profit companies as well. >> You never let us down with the really
great questions that you write. We didn’t have time to answer all
these questions, so I will copy them and give them to our speaker. Let’s give
a big, warm thank you to Dr. Eric Green. [ Applause ]