Technology Assisted Reading Acquisition: Children Acquiring Literacy Naturally

DOM MASSARO: Today, I appreciate
all of you coming to the talk. And I’ll try to go through
fairly quickly, so you’ll have time for questions. As Shuman said, I was trained
as an experimental psychologist, majoring in
mathematical psychology. And I developed an information
processing model of perception, memory,
and learning. And I wanted to apply it
to real world domains. So I got interested
in language. And so I’ve been working on
speech perception and reading for the last 40 years. And I want to give you this
background primarily as a road map to where I ended up, in the
idea that kids can learn to acquire reading naturally,
without instruction. So I’m going to talk a little
bit about our work on multimodal speech perception,
and then a project we did on embellishing face-to-face
communication. And then talk about naturally
acquired literacy. So as you know, language has
been thought of as being special, a view led particularly by Noam Chomsky: that it doesn’t follow the rules of perception, memory, and learning. And we took the opposite view, that indeed, speech perception and reading can be understood as prototypical pattern recognition, how we make sense of the world around us. And so what we got interested
in is how much the face contributes to speech
understanding. So here’s a little film excerpt that you can take a look at. OK. So it’s particularly important
that the men in the audience picked that up. As I tell my students, if you guys missed that, see me after class, because it’s important for the survival of our species. So to study this problem of
speech perception, we used synthetic auditory speech. We wanted to also vary the visible speech, so that we could control it exactly. So we developed a computer
animated talking head called Baldi, spelled with an i, rather
than a y, because he’s from California. So here I’ll let Baldi describe
himself to you. BALDI: I am Baldi, and I am
proud there is very little [INAUDIBLE] my attractive
exterior. See, there is only a wire
frame under me. I live through computer
animation and text-to-speech synthesis. My visible speech is accurate. DOM MASSARO: So we approached
the problem as speech scientists. We wanted to make the
visible speech as accurate as possible. And we developed Baldi so that he could be aligned with either synthetic speech, or
natural speech. We also made Baldi multilingual,
by looking at the unique characteristics of
different languages and programming the appropriate
mouth and face movements in Baldi. BALDI: [SPEAKING MANDARIN] DOM MASSARO: How was
that, Shuman? SHUMAN: Good. DOM MASSARO: So Baldi
can also be aligned with natural speech. BALDI: That’s one small
step for man, one giant leap for mankind. DOM MASSARO: OK. So basically, you can
think of Baldi as a puppet on a set of strings. And we’re manipulating those strings moment by moment, to produce accurate visible speech, with controls such as jaw rotation, mouth opening, lip rounding, lip zipping, and so on. BALDI: My movements are
controlled by adjusting the wire frame model at
each time period. My speech is accurate because it
is based on real speakers. And I have a lazy tongue
just like they do. I can be texture mapped with
the image of a real person. Hey, who said I’m not real? See you around. DOM MASSARO: So we also modeled
the tongue, because the tongue is very important. For example, if you have
Japanese speakers trying to learn r and l in English, you don’t see much on the outside of the face, but the tongue makes very different movements. So we used electropalatography, in which you put an artificial palate on the roof of the mouth, with sensors that pick up where the tongue hits the roof of the mouth. And also ultrasound, picking
up how the tongue moves. BALDI: I have a lovely tongue,
and a groovy palate, with [INAUDIBLE] with tongue and palate. DOM MASSARO: So we know that
in speech production people speak with a lazy tongue. The way we articulate one
segment is influenced by the segments that precede
it or follow it. So you can’t really just use a
key frame method, where you have prototypical mouth movements, and then you just interpolate between those. So we developed a dominance
constraint. So each control of Baldi has a
certain amount of dominance. And if they have equal
dominance, here, as shown in the upper panel, then you do
a simple interpolation. But if you take some
characteristic of production, like lip protrusion, if you say
the word stew, you notice the lip protrusion of the ew
comes much earlier in time, and influences how you
say the s and the t. So the s and the t in stew is
very different than it is in steep, for example. And so we assign higher
dominance for the protrusion for ew, and therefore, when we
interpolate, we see that the lip protrusion comes forward into the s and t. This is a coarticulation algorithm that’s been tested in many different experiments, and has been shown to hold up pretty well, in the sense that it produces accurate visible speech.
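To make that blending concrete, here is a minimal sketch of dominance-weighted interpolation in the spirit of this algorithm. The exponential dominance shape, the parameter names, and all of the numbers are illustrative assumptions for the "stew" example, not Baldi’s actual values.

```python
import math

def dominance(t, center, alpha, theta):
    # Negative-exponential dominance: a segment's influence decays
    # with distance in time from the segment's center.
    return alpha * math.exp(-theta * abs(t - center))

def control_value(t, segments):
    # Dominance-weighted blend of segment targets for one control
    # parameter (here, lip protrusion).
    num = sum(dominance(t, s["center"], s["alpha"], s["theta"]) * s["target"]
              for s in segments)
    den = sum(dominance(t, s["center"], s["alpha"], s["theta"])
              for s in segments)
    return num / den

# "stew": the ew vowel gets higher, wider dominance for protrusion,
# so its target pulls the track forward into the s and the t.
stew = [
    {"center": 0.05, "target": 0.1, "alpha": 1.0, "theta": 30.0},  # s
    {"center": 0.15, "target": 0.1, "alpha": 1.0, "theta": 30.0},  # t
    {"center": 0.30, "target": 0.9, "alpha": 3.0, "theta": 10.0},  # ew
]
for ms in range(0, 351, 50):
    print(f"{ms:3d} ms  protrusion = {control_value(ms / 1000.0, stew):.2f}")
```

With equal dominance you get simple interpolation; raising the ew segment’s dominance is what drags the protrusion earlier in time, just as described above.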
Our gold standard here is to get Baldi to give the same kind of quality of visible speech that a real talker gives. Now some of you have
experienced the McGurk effect, anybody? OK. So what you want to do here is
you want to look at Baldi and he’s going to say a set of
syllables, like bah, gah, vah, tha, dah, mah. Simply watch Baldi, and keep track of what you hear. And then we can talk about it. BALDI: Bah, bah, bah, bah. DOM MASSARO: OK. So Baldi said four syllables. Did you hear this syllable
changing from syllable to syllable? Some of you? Some of you not. It’s pretty small, but in fact,
for those of you that did hear it, the auditory
syllable was always bah, but the mouth was going bah,
vah, tha, dah. So if you were hearing different
syllables, in fact, the visible speech was
influencing what you hear. I can play it one more time. You can either look at it again,
or close your eyes. BALDI: Bah, bah, bah, bah. DOM MASSARO: Now I’ve been
looking at that for much longer than I want to remember,
and I still get the illusion that the visible speech has an impact. And so based on this, we developed a multisensory pattern recognition scheme called the Fuzzy Logical Model of Perception, which has a very simple stage model of processing: you evaluate the two sources of information in terms of how much they support the different alternatives, you integrate them together to get an overall metric of support, and then you make a decision. And learning occurs with feedback, so that you can modify the evaluation values, given the feedback that you get. And the Fuzzy Logical Model of Perception, it turns out, is mathematically equivalent to Bayes’ theorem, which is an optimal method for combining multiple sources of information to make a decision.
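As a concrete illustration, here is a minimal sketch of that evaluate-integrate-decide scheme. The support values are invented for the example; in the real model they are estimated from data.

```python
def flmp(auditory, visual):
    # Evaluation has already produced a degree of support (0..1) from
    # each source for each alternative. Integration multiplies them;
    # the decision is the relative goodness of each alternative.
    # Multiplying independent supports and normalizing matches Bayes'
    # theorem with conditionally independent sources.
    integrated = {alt: auditory[alt] * visual[alt] for alt in auditory}
    total = sum(integrated.values())
    return {alt: round(v / total, 3) for alt, v in integrated.items()}

# The voice weakly favors "bah" while the face clearly says "dah":
auditory = {"bah": 0.6, "dah": 0.4}
visual   = {"bah": 0.1, "dah": 0.9}
print(flmp(auditory, visual))  # {'bah': 0.143, 'dah': 0.857}
```

The clear visual source dominates the ambiguous auditory one, which is the McGurk-style pattern in the demonstration above.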
I should mention, if anybody has any questions, please feel free to interrupt for clarifications and so on. And if I’m going too fast, or too slow, let me know. OK. So given that we had support for
Baldi having good visible speech, again, compared to our
gold standard, Baldi does almost as well– I won’t show you those data– we thought that there would
be value in Baldi being a virtual tutor. So who better than a bunch of
deaf and hard of hearing kids, that are behind in vocabulary,
simply because they have degraded hearing? And we thought that Baldi
might be a good tutor. So here’s a little part of
a Primetime special that evaluated Baldi’s instruction of
vocabulary for these kids. [VIDEO PLAYBACK] -To see how fast Baldi can work,
Primetime had teachers create several new vocabulary
lessons for Timothy. Words he had never
spoken before. -Let’s talk about
what you see. -First Baldi checks
his knowledge. -Click on the bowling balls. -Then, Baldi shows Timothy
each item correctly. -These are bowling balls. -Followed by a drill of speaking
the words for the very first time. -What is this? -Ola balls. -It’s not easy. -Click on the tennis rackets. No. That’s not right. -Timothy mispronounced
and misidentified nine out of ten objects. But just three weeks later,
we re-tested him. -OK, Timothy, you’re ready
for the final test. What is this? -Soccer ball. -This time, he got nine out
of ten correct, and his pronunciation improved
dramatically. -What is this? -Baseball. [END VIDEO PLAYBACK] OK. So it looked like the kids were
learning vocabulary, but we wanted to make sure that
indeed it was the Baldi intervention that
was responsible. So we did some experiments in
which the kids are learning three sets of words. We’re testing on all three sets
every time, but we’re only training on one set. So the idea is that, if it’s the Baldi intervention that’s important, then they’ll only learn that set of words, and not the other sets. This is called a multiple baseline procedure, sketched below.
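Here is a minimal sketch of that design, just to make the schedule explicit; the set names and session counts are placeholders, not the actual protocol.

```python
SETS = ["Set 1", "Set 2", "Set 3"]

def schedule(sessions_per_phase=5):
    # Every session tests all three sets, but only one set is trained.
    # Gains confined to the currently trained set implicate the
    # intervention rather than general exposure or maturation.
    for trained in SETS:
        for _ in range(sessions_per_phase):
            yield {"train": trained, "test": list(SETS)}

for session in schedule(sessions_per_phase=1):
    print(session)
```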
And so, sure enough, the dark squares are comprehension, and the open ones are production. And comprehension is always a
little easier than production. But you can see that, sure
enough, this particular student learns the
Set 1 words, but not the Set 2 words. When they start training on Set 2, they learn those items. And then they learn it for Set 3. So this is a procedure that is accepted by peer-reviewed journals, that says that, yes, your intervention was responsible. So we were happy with Baldi
being an effective virtual tutor for learning vocabulary. And we looked to the autistic
kid community, and thought that Baldi might help there, too. Autistic kids like constancy,
and Baldi is always constant. His emotion can be controlled
exactly. And we can also teach grammar. Here, we’re teaching singular
versus plural with these autistic kids. And I’ll just give you
a quick look at Tony working with Baldi. [VIDEO PLAYBACK] -Let’s practice. Where are the ladybugs? -Ladybugs. DOM MASSARO: So he’s learning
singular versus plural. And he’s clicking on the right
answer, but he is actually saying them, too. The kids really resonate
to Baldi. And they come in, hi, Baldi. I love you, Baldi, and so on. Sure enough, Baldi was
also effective with the autistic kids. And so more recently,
we decided to port Baldi to a tablet. And here we have Taryn working
with Baldi on a tablet, in a simple tile matching game, where
you match two tiles so they disappear. MALE SPEAKER: Question? DOM MASSARO: Yeah. MALE SPEAKER: Is it true that
autistic kids have a harder time parsing human expression? DOM MASSARO: Yes. Autistic kids tend not
to look at the face. And in fact, we taught kids to
look at the face, and found that they could use that
information and integrate it with the voice, in the same
way that normally developing kids do. So that’s a good point. But they tend not to
look at the face. But we were able to get them
to learn how to lip read better with Baldi than they had previously. And sure enough, they could look
like normally developing kids, in terms of integrating
the two sources of information. One of our mantras is that
people naturally integrate multiple sources
of information, even autistic kids. So thanks for that. So here’s Taryn working with
the tile matching game. [VIDEO PLAYBACK] I got the ball. -Wonderful. -You see? I found it. -You did, that’s great. -He’s gone. [END VIDEO PLAYBACK] DOM MASSARO: So you
can see that the little game can be engaging. And the nice thing about a
virtual tutor, it’s available all the time. But there is a downside. These programs are expensive to write. They require some maintenance. And one could argue, it’s 2D
media, they’re not getting personal interaction. So one of the reasons I talk
about this is because, this kind of intervention is
coming kind of late in the child’s life. And one of my mantra’s is that
we need early intervention– just to anticipate some of
things I’m going to say. OK. So now we’re going to switch
gears to a second project, in terms of embellishing
conversations. And we have one out of ten
people, across the world, deaf and hard of hearing, and we
all lose our hearing, particularly the men, as we
become chronologically gifted. And therefore, we depend
more on the face. The face gives some information, but it doesn’t give complete information. So for example, you
can get place of articulation from the face. You can see the difference between thuh and duh, for example. But you can’t really get things
like voicing so well, like the difference between
buh and puh. So what we thought is that, if
we had hard of hearing people that were getting
degraded hearing, maybe we could embellish the signal, by
giving visual cues about those things that aren’t seen
so easily on the face. And this would be a wearable
appliance that people would wear, like a pair of glasses. And it wouldn’t take much sophisticated technology. You could just have a couple
of LEDs on the corner of your glasses. You’re looking at the person,
lip reading, getting some degraded hearing, if it’s
available, and then integrating these
cues with it. So we decided to develop
a scheme where we would represent these characteristics of the speech visually. So voicing would be indicated by
a blue dot, frication by a white dot, and nasality
by a red dot. So what we did is, we developed
a neural network model that tracks your speech. So you’d have a microphone
on your eyeglasses. You would be tracking the speech
of the interlocutor you’re having a conversation
with. And then these cues would be
showing on your eyeglasses. And the idea is that you would
be integrating these cues with the face and the voice, to
understand the message. And so, you can see, remember red is nasal, like men. So you saw the red and the blue, which is voicing. And then you can say things like sys, where you get the voicing and the frication. Or men, where you get the nasality. OK? Or church. So the neural network actually does a pretty good job. It’s doing it in real time. These are 10-millisecond steps, and we’re only lagging the speech signal by about 50 milliseconds. And it does about 90% correct.
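To show the shape of that pipeline, here is a minimal sketch: a detector (the neural network in the real system, abstracted away here) emits per-feature probabilities every 10 milliseconds, and a short smoothing window, one plausible source of the roughly 50-millisecond lag, drives the three LED cues. The probabilities and threshold are invented for the example.

```python
from collections import deque

# Cue mapping from the talk: blue = voicing, white = frication,
# red = nasality.
CUES = [("voicing", "blue"), ("frication", "white"), ("nasality", "red")]

def led_stream(frames, window=5, threshold=0.5):
    # `frames` yields one dict of feature probabilities per 10 ms step.
    # Averaging over a 5-frame window gives roughly a 50 ms lag.
    history = deque(maxlen=window)
    for probs in frames:
        history.append(probs)
        avg = {feat: sum(h[feat] for h in history) / len(history)
               for feat, _ in CUES}
        yield {color: avg[feat] > threshold for feat, color in CUES}

# Invented frames for a voiced nasal, like the m in "men":
frames = [{"voicing": 0.9, "frication": 0.05, "nasality": 0.8}] * 6
for leds in led_stream(frames):
    print(leds)  # blue and red come on; white stays off
```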
And so, that’s one half of the problem. The other half of the problem
is for people to learn the cues. So if you see fan, OK, if you
just see it on the lips, then that could be van because you
don’t see the voicing so well. But if you get the cue that it’s voiced– let’s see if this works, live demos– so if you don’t get the white cue then you’re OK. Fan. So I saw it went white, blue, white. Whereas if I say van, then you see, you don’t get the white cue. So that could tell you
the difference between fan and van. So you just have to put those
two things together. So what we want to do
is teach people. And we didn’t have a whole
population of people, deaf and hard of hearing people, so we
depended on university kids that came in every day, or maybe
three or four times a week, for an hour or so a day,
in which they practiced integrating these cues
with the face. So Baldi would mouth a
word with the cues. The student would write and
indicate what they perceived, and then Baldi would
give feedback. All right, so these kids
were very heroic. They came in for over a year,
every day, and were learning the cues. And they did a pretty
good job. But what we found is, they did
pretty well with single words, or maybe even two or three word
phrases, but they were lost in continuous
conversation. Think of trying to track
what I’m saying with all these cues. That’s a tough one. So we also made the observation
that it’s hard to change behavior. Now we know, first of all,
people that require hearing aids, they have very
little patience. Most of them throw them away. Don’t really use them. You probably have experience
with people who have done this. And similarly, we thought,
there’s no way we can convince people to spend a year
learning these cues that might help them. And the other thing is, as you
can see, when we don’t hear something so well, our natural
tendency is to move our ear to the source and lose the
visual information. And so this woman who has
hearing aids is gaining about three decibel of understanding,
but in fact, if she were looking at
the face she would gain about 12 decimals. So how do you teach
people to do that? It’s hard to change behavior. So this is another lesson for where I’m going, in terms of naturally acquired reading. So what we did in this project
was that we thought, well OK, why don’t we just do full blown
speech recognition, and then communicate that way. So the hard of hearing person
can get all of the cues of you talking, and then also
get the words. So they can get the
paralinguistic information, and the linguistic information,
and then have a conversation. [VIDEO PLAYBACK] -Do you have any plans to
enjoy the nice weather? -Not today. [END VIDEO PLAYBACK] OK. So the idea then, again, if I’m
talking to someone that’s hard of hearing, I can ask them
a question, and we all know about speech recognition,
open ended alternatives and so on, but do you know if there’s
a Starbucks nearby in this neighborhood? So, not bad. Did you see the frost on
your roof last night? So you can do a pretty
good job. Now this recognizer
only runs locally. If you had access to the
internet it could do a lot better, and it could be faster
too, simply because, if you know about speech recognition,
it has a much bigger database, and more computational
power, and so on. So this is where we ended up on
this project, where we’re doing the full blown
speech recognition. OK. So we’re making pretty
good time. In fact, maybe I’m
going too fast. OK. So here’s what you all came
for, I guess, and that is, again, I wanted to tell these
stories because I want to show you how this took me
in this direction. I should have shown some
research, also, that we’ve done in reading, just in the
same way that we did in speech perception. That people integrate multiple
sources of information in reading, like putting together
information about the letters themselves, the orthographic
structure, that is how the letters go together in words,
and syntactic and semantic constraints, for example. And so about three years ago,
I arrived at this idea that kids are immersed in spoken
language at birth, why can’t we immerse them in written
language at birth? This has never happened because
we haven’t had the technology, but the technology
is getting there. Why don’t we immerse the kids in
written language at birth? And the idea is that they’ll
learn to read naturally, without instruction. And this has huge implications
for the way society is structured, today. What are we up against here? Well the current belief is that
speech and language, as I said, are very special things. And that they’re more or
less like instincts. Whereas reading is artificial. It’s artificial because it was created a couple thousand years ago, and we created
it, rather than some extraterrestrial influence,
or something. And Maryanne Wolf, here, represents the neuroscience community, when she says in her book, here, “Unlike its component parts such as vision
and speech, which are genetically organized, reading
has no direct genetic program passing it on to future
generations.” So there’s something special
about reading that is artificial. And speech is natural. And so that’s what
we’re up against. Maybe I can have someone from
the audience, would you be willing to participate? So what I want you to do
is, there are going to be a set of pictures. I want you to name the object
in the picture. OK? And just go from left to right,
across the two rows. FEMALE SPEAKER: You want
me to name the picture? DOM MASSARO: Name the picture. FEMALE SPEAKER: Tree, book,
shoe, nest, eggs, baby, rabbit, ring. DOM MASSARO: OK. Now you can do the next one. FEMALE SPEAKER: Tree,
book, shoes. DOM MASSARO: This is like the
Stroop Effect, right? Where you’re trying to name the
color of the print when it’s spelled in words of
a different color. Right? So it makes a point that we did
learn to read, but we do, once you learn how to read,
you can’t help but read. That’s why advertising
is so effective. And that is just one more
similarity with speech. We can’t help but hear. If someone mentions our name,
we can’t help but orient our attention to it. So the way I thought about the
problem is, what’s needed for a child to acquire
spoken language? And does that same child have
that same stuff to acquire written language? And so for spoken language,
the child’s got to do some kind of signal analysis. They have to hear the syllables,
combine the syllables in different orders,
form categories that are associated with meaning, and
most importantly, they need an early exposure. We know from those few sad cases
where kids haven’t had language until adolescence or
even six or seven years old, they can’t acquire language. And what’s really impressive, and you may not know this (maybe this is why you’re tired in the evening if you have kids): in a given year, say from one to two years old, a child hears about 1,000 hours of speech, which is about a million words. So that’s a lot. Right? So they need that early
exposure, and they need a lot of it. Now what’s not needed? Well some of you may have heard
of the Theory of Mind. This is something that kids
don’t get till about 3 years old, and that is that they
have some kind of understanding that they are
them, you are you, and you can engage in something
like a dialogue. And you might have different
beliefs than they have. And that you can change
each other’s beliefs. Well that’s not necessary
to learn language. Because by age 3 kids are
incredibly sophisticated language users, even
though they don’t have a Theory of Mind. As the Car Talk guys might say,
it’s unencumbered by the thought process. So the kids acquire the language
without thought. Now one convincing piece of data
was, you all have heard of Kanzi, who can do amazing
things with language. In fact, how did Kanzi learn
to manipulate these symbols that were associated with
speech, and eventually learn how to understand speech? Well here’s Kanzi on his mother’s knee, Matata, while they were teaching Matata
these symbols. So they spent a year or so doing
very formal instruction of the adult, Matata, to learn
these symbols, and Kanzi was just there nursing and having
a good time, and so on. Well it turns out that Matata
never learned, and then simply one day, some serendipitous
discovery, they found out that Kanzi had learned all
of the symbols. So that’s pretty wild. Not much is made of that in that literature, but I see a real importance here. And this agrees with the idea
that there’s this incredibly explosive brain development that
occurs in the first few years of life. That the brain is getting
bigger, it’s making more connections, pruning connections
that aren’t meaningful. And this is another support
for the idea that it’s important to get in
there early, when the brain is plastic. Now all these companies are
telling us that our brain is plastic through our lifetime,
and we can learn, and lower our chronological age, and so on. But I’m kind of skeptical
of that. It’s a little plastic, but not
like what the young kids bring to the table. OK, so that’s more or less
what’s needed for speech perception. And we can ask similar questions
about reading. So for reading, we have to do
some kind of signal analysis. We have to learn letters. Combine the letters. Associate those combinations
with particular categories. And, like speech, we need
early exposure. And we need some kind of
language and reading immersion in the same way that we have
it in spoken language. Well how many hours
do we need? We don’t know. But we know, right
now, the kids are getting next to nothing. And I’ll expand on that. First of all, in terms of the
signal analysis, babies come equipped with vision
that’s very sophisticated, very early on. Here’s a baby at 3 weeks old,
where you can see her making saccadic eye movements,
tracking the toy. That’s at just 3 weeks. A week earlier she
couldn’t do it. Also their visual acuity
is very good. So that at one month, they’re
real good at arm’s distance, looking at you, but by eight
months, they pretty much have the vision that we have. So they can do about as well
on the eye chart as we can. So the babies seem to be
equipped to process visual information. And sure enough, they
can form categories. So at one-month-old, infants can
see the difference between a square and a triangle. Whereas, at one-month-old they
can’t see the difference between the square and the
triangle if they’re embedded in a circle. But at two months they’re able
to make that distinction. So again, there’s this rapid
development of what’s necessary to form categories. And it turns out
that babies are incredible association engines. They learn statistical
constraints like nothing. And developmental psychologists
have made a cottage industry of this
behavior, showing that kids can very quickly learn
associations in speech, in music, in objects. So a typical experiment might be that the experimenter sets up probabilistic constraints among objects. Here you can think of these as three pairs of objects. Within a pair, one object always follows the other object. But between pairs, they follow each other only a third of the time. So there’s a set of constraints there, as in the sketch below.
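A minimal sketch of how such a sequence can be generated; the object labels and lengths are placeholders, not the stimuli from any particular study.

```python
import random

# Three pairs: within a pair, the second object always follows the
# first (probability 1.0). Which pair comes next is chosen at random,
# so each between-pair transition happens only about a third of the time.
PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]

def make_sequence(n_pairs=30, seed=0):
    rng = random.Random(seed)
    seq = []
    for _ in range(n_pairs):
        first, second = rng.choice(PAIRS)
        seq += [first, second]
    return seq

print(" ".join(make_sequence(10)))
```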
If you take a seven-month-old infant, and you give them two minutes of this sequence, they get bored out of their minds. They habituate. You change the statistical properties of that sequence, the objects are still the same, and the infants wake up. This is how we can tell what infants know about the world around them. They get bored very easily, we change the world, they wake up, and we assume that they noticed the difference. So indeed, they can do this. Pretty impressive. And what’s so interesting,
again, even though the developmental psychologists
have made this a cottage industry, they never thought
of studying letters, and letter combinations. Somehow reading seems to be
off the map for them. So we propose an experiment
of the following kind. That we use the same kind of
constraints, but now we do it in letters, rather
than in objects. And the hypothesis would
be that indeed, kids would learn this. So maybe someday someone gets
some funding to do it, so we start learning about how kids
process written language. Now we think that kids are going
to be able to do this, it’s going to be a no-brainer. Why? It turns out, it seems that,
if you do a topographical analysis of the alphabet and
alphabets of the world, they have the same characteristics
of the world around us, whether it’s a geometric
architectural world, or whether it’s a pastoral world,
they seem to have the same properties. So the argument by Changizi
there, is that the alphabets actually developed, not because
they were easy to write, but rather, to make it
easy for the visual system that’s already prepared for that
kind of information in the environment. So as I mentioned, then, we
want to have this critical period of development. We see it’s true in the auditory
system, the visual system, speech and
sign language. And we’re saying that in reading
it’s the same thing. That we want to have reading
being learned at this critical period of development. Obviously, it doesn’t have
sharp boundaries. But the fact is, the earlier
you can get in, the better. So how do we immerse kids
in written language? As I said, this has never been
done because we really haven’t had the technology. And we still don’t have it,
but we can have successive approximations. So we can think of
picture books. Picture books are really
important for kids, and their acculturation in the world. And we all read picture
books to our kids. And it turns out that you
think, oh picture books, there’s writing in
picture books. Right? Look at the writing. Well obviously, when you record eye movements of kids in picture book reading, 95% of
the time they’re looking at the pictures and
not the words. And as you can see by the graph
there that the artist makes the fonts real funny,
and so on, to please the adult’s reading, not for
the kids to read. So we developed this app where
we put in all of the popular books that we could find. We put them into our app. And so the caregiver can choose
a book that they’re reading, and then supplement it
with nice, visual letters that the kids can
really easily. So here, I chose Barbapapa,
Barbapapa at the Zoo, and you could be reading along
and the kid would be looking at the book. When they were having a party
after the fire, Barbapapa heard cries for help. And then a fierce leopard had
escaped from the zoo. And so what you can do is, you
can then dictate this, a fierce leopard had escaped
from the zoo. [COMPUTER REPLAY] A fierce leopard had escaped
from the zoo. DOM MASSARO: So now the child’s
getting nice, written language, that supplements the
picture book reading, and they should then get some of this
written language that they need for the development
of learning to read. When we have all the books in
memory, and we know what book the caregiver is reading,
it makes speech recognition a lot easier.
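Here is a minimal sketch of why having the book in memory helps: recognition narrows to choosing which stored sentence the caregiver just read aloud. The simple string-similarity matcher and the sentences (taken from the Barbapapa example) are stand-ins for the actual recognizer.

```python
import difflib

# With the book's text in memory, recognition reduces to picking the
# stored sentence closest to the noisy recognizer output.
BOOK_SENTENCES = [
    "When they were having a party after the fire",
    "Barbapapa heard cries for help",
    "A fierce leopard had escaped from the zoo",
]

def match_sentence(recognized):
    # Pick the stored sentence most similar to the recognizer output.
    return max(BOOK_SENTENCES,
               key=lambda s: difflib.SequenceMatcher(
                   None, recognized.lower(), s.lower()).ratio())

print(match_sentence("a fierce lepard had escape from the zoo"))
```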
But someone can’t really read our book without having the real book, so we’re not really hurting any copyright. So here’s another little one. So you can see, it was kind of
difficult for me to negotiate holding the book,
and the iPad. So there’s these nice T-shirts
you can buy, where you can put your tablet right in here. And then it’s easier
to negotiate. So there’s Keegan looking
at the words. And you might wonder about this
format that we’re using, where words appear one at a time in the same place. This is called Rapid Serial Visual Presentation, and psychologists use it a lot in experiments. It turns out that kids, even third graders, do better with this Rapid Serial Visual Presentation format than they do with a page format. And again, we grew up reading at our own pace, and so we wouldn’t like that. But in fact, just like we’re pushed in listening to speech, we can be pushed in reading. And we can actually then read faster, more words per minute, with better comprehension, in this kind of presentation than in the standard presentation.
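Here is a minimal terminal sketch of the pacing idea; the real app draws big, child-friendly words on a tablet, and 120 words per minute is just an illustrative rate.

```python
import sys
import time

def rsvp(text, wpm=120):
    # Present one word at a time in the same spot; the carriage
    # return overwrites the previous word, as on the tablet display.
    delay = 60.0 / wpm
    for word in text.split():
        sys.stdout.write(f"\r{word:<20}")
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")

rsvp("A fierce leopard had escaped from the zoo", wpm=120)
```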
Of course, as you can see, this gives the visual quality much more value for the kids, because they get nice big words, and it wouldn’t work very well in a page format. So here is Nathaniel’s mom,
reading Goodnight Moon. He’s– what is he four-months-old,
or is he eight months? I’m sorry. [VIDEO PLAYBACK] -Goodnight kittens and
goodnight mittens. [END VIDEO PLAYBACK] You see he’s really looking. And he does pick up on it. And the thing that we don’t
realize is that we’re talking to our kids all the time,
and they don’t understand a damned thing. MALE SPEAKER: That
keeps going on. DOM MASSARO: And
it gets worse. It gets worse. So we’re looking at other ways
to embellish the written world for the child. And one is Write My World, where
you can then write what the child is experiencing. [VIDEO PLAYBACK] -Car racing. See? Vroom, vroom, vroom. Car racing. See Keegan, look. Keegan, car racing. [END VIDEO PLAYBACK] So people, you might think
that’s kind of artificial, but in kids learning sign language,
the caregivers are faced with the same problem. They have to get the attention
of the kids– it’s a little different
from speech– to watch the signs. So the caregivers might actually
sign over the object, or get their attention, look
at me, and do the signing. And kids learn sign just
as easily as they learn spoken language. So this other application we
have here, is that the child would carry around a camera
and zero in on the world around them. And we have bar codes, so here
we can distinguish 24 different things with
these bar codes. And then we can present
whatever we want. We can talk about it, we can
show the visual information, and we can also present it. So it’s always right on top
of the object, taking into account the camera angle. So that’s another way,
the child could walk around and do this. Well so here at Google, you’ve
got your ideas about Interactive Digital Signage, so
that eventually we’re going to be surrounded with
a literate world. So there’s no reason that that
digital world can’t be available to young kids, as it
is available to our adults. So that’s certainly indicating
that the technology is getting there, in terms of
having the child immersed in written language. And of course kids
can have robots. The European community is– looks like they’re going to fund
a $10 billion project on robots as companions. And there’s no reason why these
companions can’t provide written language, as well as
spoken language to kids. So here’s a little concept video
that was done in 2009, about a Danish farmer that
goes out with these intelligent glasses, goes
out to his barn. And he looks around the barn. And of course, it’s doing
object recognition. And he sees that the roof needs
repair here, and this cow has to stay on medicine
for another two weeks, and so on. And so these are very
valuable things. And he comes in the house and
the recognition system, of course, recognizes his spouse
and tells him, hey, tomorrow’s your anniversary, you better
get a present. So in our patent application
I proposed this heads up display, where there are two
things that you want to do. You need to understand the
experience of the child and represent that in written
language. So there’s two basic
ways to do that. One is to do speech
recognition. So the speech that’s being said
is very predictive of the child’s experience. And second, is to do object and
action recognition, where recognizing what the child’s
doing is another idea of what they’re experiencing. And both of those would
be associated with written language. So the caregiver says, you did a fine job. The heads up display might
just say fine job. So it would read it. So you all know about Google
Glass, and the point here is that the wrong person
has it on. The baby should be wearing
the glasses that give the information about the mother. So although she put some toy
glasses on the baby here, to take a picture, it’s really the
baby that should be seeing what’s going on in the world. Eventually it’ll all be on a
contact lens, so that’ll make it even easier. But for now, we just have these
portable tablets that kids seem to be attracted
to, naturally. They love the touch system. And I was telling Shuman that
they could do something like a swipe on the tablet, to interact
with the written language that way. So we’re winding down here. What are the benefits
of early reading? Well the idea would be that illiteracy would be no more frequent than speech impairment is now. Whereas now, it’s obviously
much more frequent. It would reduce the cost of
reading instruction, and there are advantages of written
language. We did an analysis of the kind
of language you find in picture books, versus the kind
of language adults have when they talk to each other. And we found that the language
in picture books was much richer in being at
a higher pitch. A more unique vocabulary,
more complicated grammar, and so on. So it’s good to get kids
reading as quickly as possible, because they’re going
to be faced with more difficult language, that
only can be beneficial. So if we’re successful, then
this is going to change how we allocate resources. It’s going to have incredible
implications for the deaf and hard of hearing community,
and it allows us to rethink schooling. So right now, if you look at the
public spending on kids as they go through them birth to
adulthood, versus their brain growth, you see this
inverse function. So we spend all our money on
kids after they go to school. Whereas, the brain growth
all occurred before they go to school. So we would need to have a
realignment of some public spending, before schooling,
rather than after schooling. And sure enough, our Nobel Prize
winner, James Heckman showed that the return on
investment is much greater when you invest in preschool
kids, relative to school-age kids or adults. So you get a good return
on investment there. For deaf kids, most deaf kids
read at a fourth grade level. That’s primarily because 19 out
of 20 kids that are born deaf are born to
hearing adults. And the adults want to hear them
talk, so these kids don’t learn sign language
very quickly. With our method, these kids
could be bootstrapped with written language, so that
would be their second language, in addition
to either oral language or sign language. So the written language could
be a real great modality to bootstrap kids that are deaf
and hard of hearing. And then finally, we can
envision schooling after kids are ready for reading at the
age at which they would normally go to school. And so rather than we spend
something, the way I figured it out, we seem to spend
about $10,000 per kid, for a year of schooling. So we could save
a lot of money. And we have the three R’s
reading and writing, the children would have
both of those. So it would be something that
could help the economy a lot. But it would also allow us to
think of schools as Dewey did, where we would have communities
of scholars with certain interests, and they
would congregate together and pursue their interests, rather
than sitting in a desk and learning a litany. So this is the conclusion I
usually give to the groups I talk to, because behavioral and
social sciences is still pretty conservative. So I don’t have to give that
conclusion here, that it’s clear that science and
technology impact life. And that we have to be open
to disruptive ideas. So I’m happy to entertain
questions now. And we can open floor
to you all. Thank you. [APPLAUSE] MALE SPEAKER: When kids are
small, you showed the brain growth and importance
of speech. The– DOM MASSARO: Wrong time
to get a call. MALE SPEAKER: I’ve got a brand
new phone and it’s [INAUDIBLE] or anything. So I have my son who didn’t
speak until he was three. So there was a lot of time
he didn’t speak. So we really couldn’t
communicate. So the other side of the reading
early thing, for me, would be to be able to
communicate effectively with my son, through reading,
as opposed to speech. Hopefully bootstrap the speech,
or maybe get at the early thing. So have you thought of it, in
sort of the reverse way? DOM MASSARO: That’s
a nice idea. Certainly, one of the reasons
that we try to make a behavioral science case for
learning to read, in the same way that you learn to understand
spoken language, and one of the things you see
are kids that are late talkers, so-called
late talkers. Your son was one of these, could
understand hundreds of words but didn’t speak. MALE SPEAKER: We have
no idea if he understood a single word. DOM MASSARO: But once he started
speaking, did he? MALE SPEAKER: Yeah, but it took
2 and 1/2 to 2 and 3/4 years to get him to speak. DOM MASSARO: To get
him to speak. But the point is, he
did understand. MALE SPEAKER: We actually
believe that he hadn’t figured out that that crazy noise
you hear, actually had meaning to it. He just thought it
was just noise. DOM MASSARO: I can talk more
about that lately. There’s a scale you can fill out
that says how many words your kids comprehend, versus
how many they produce. And kids always comprehend much
more than they produce. And that’s a normal
trajectory. And so the point would
be for reading, that could be also the case. That kids could be reading
much more than they are able to write. But one alternative that people
have chosen, in similar situations, are baby signs. That babies are able to make
signs that they use to communicate. So we can talk more
about that, later. FEMALE SPEAKER: While far less
sophisticated or consistent, Sesame Street would frequently
show the words flashing with the object in precisely
how you have it. As the word is spoken you see
the word flashed, and the picture at the same time. Are there any studies that
showed the effectiveness of that on literacy, for children
who watched Sesame Street, versus those who didn’t? DOM MASSARO: Yeah. That’s a good question. I don’t know of any,
but that would be something to look into. So we had that question mark. We don’t know how much written
language the kids need. And so the few times that, today
the show is brought to you by the letter L, that’s
probably not enough. But it’s a start. And it would be nice
to determine that. There are a couple of anecdotes. The gentleman mentioned that in India they’re showing the written language, subtitles, with the spoken language, to help with literacy. When I visited Denmark, I was
impressed that they show Sesame Street without dubbing. So the kids are hearing
English, but they show Danish subtitles. So my idea was, hey,
these kids want to learn to read Danish. Because they want to understand
what the hell’s going on in Sesame Street. So that’s kind of neat thing. I don’t know of anything
systematic that’s been done with that. People have looked at picture
books, and seen what effect that has. Picture books have a big effect on the language, but not on literacy. Because again, the kids simply
aren’t looking at the words. And there’s some controversy
about to what extent kids can learn from 2D media. But obviously, reading is 2D
media, so I think that’s a no-brainer. Of course they can learn. Yeah? MALE SPEAKER: Have you
experimented with teaching kids phonics using
the flash method? DOM MASSARO: OK, so many years
ago, we wrote a paper that said the trick about, how do
you teach a kid to read by today’s schooling? Well what you do is teach
them how to decode. What does decode mean? It means that they’re able to
map the written language into spoken language. And the way you do that is that
you teach them phonics. And so a colleague of mine just
shared an anecdote with his granddaughter
the other day. She had a picture of a cat,
and the word was written underneath. And she went cah, uh,
ca, ca, t, cat. And then the next picture came
on, it was an insect. And she went buh uh ug, bug. Except it said ant. OK? So that’s a great anecdote. One of the things we wrote, in 1979, was that one of the benefits of phonics might not be with respect to decoding, but drawing the kids’ attention to the orthographic structure. That means the spelling
constraints in the language. What letters follow other
letters, and where they occur in words. And we showed, that indeed,
people are sensitive to these constraints, even though
they have nothing to do with spoken language. There’s just certain
constraints in the written language. Some are dependent on spoken
language, and some aren’t. But we’re also sensitive
to those. So our idea is that written language has the same constraints, and a deaf child
could learn written language independently of spoken
language. With the picture book reading, we have an option: you can either have the voice on or not. You heard it on. And that could help bootstrap
the child if she already knows the speech, but it doesn’t
have to be on. And the idea is that the deaf
child could learn it that way. So to get back to the original question– I’m sorry about the long answer– the usual view is that the child can decode into spoken language, and then it’s a no-brainer. Right? They can understand the
language, and then it’s a no-brainer. Right? They can understand the
spoken language. But in fact, I’m not so
sure that’s the case. Even if they do decode
successfully, it could be the case that that decoding takes
so much attention and cognitive processing it
distracts them away from the understanding of the message. And therefore, if we had the
early literacy, they would comprehend the written language
directly, without going through the
spoken language. And not needing the
decoding process. MALE SPEAKER: Does that mean if
you start with very young infants you want to
start showing them words, not letters? DOM MASSARO: That’s right. So you’re not going to teach
them the alphabet. MALE SPEAKER: Just skip
the alphabet? DOM MASSARO: Yeah. And it will be learned
naturally. With the idea that, as we saw
here, and let me tell you about the baboon study. These investigators in France
just did this study. They had these baboons
in an open play area. They could go up to the computer
anytime they wanted. They probably were a little
hungry, that they could get this food. And they saw four
letter strings. And half of those letter strings were words, and the other half
were non-words that were composed of the same letters,
but they were not words, so they had different orthographic
structure. These baboons were able to
classify these items as words versus non-words. They didn’t know the meaning,
but the point is, they were using the structure to
discriminate those two. What letter combinations occur in what positions, and so on. So they’re picking up
these constraints. So the idea is, that when kids
see whole words, they’re going to learn about the individual
letters, because they are separate objects, and to learn about the constraints among them. FEMALE SPEAKER: In that example,
did the whole words bring food, but the
non-words didn’t? DOM MASSARO: No, the correct
answer brought food. They have two categories. I’m sorry, I didn’t explain
it clearly. They said word or non-word,
by two levers. And when they were right
they got food. Yes, sorry, you were first. MALE SPEAKER: So there’s
obviously differences in languages. For example, in Chinese you
learn symbol in some sense anyway, rather than
the pronunciation. Right? DOM MASSARO: Well, as Shuman will tell you, with Chinese you’re really teaching them something like Pinyin first, where they’re getting something closer to an alphabet. OK? So again, in my system, I don’t
think I’d want to throw characters at the kids. But rather, maybe something
like Pinyin. But like Shuman said earlier,
kids can learn anything. And maybe they could learn
the characters too. But there’s no reason why we
would have to give them the characters. We could give them the Pinyin, and eventually they would learn the characters. MALE SPEAKER: That’s what I’m
saying: in other languages, of course, it works better than learning the characters. For example, in Spanish, or
somewhere, pronunciation and the actual letters are much
better matched than in English, where a u can be
pronounced very differently. DOM MASSARO: Yeah, so that’s a whole other dimension of the reading wars: whether a nice direct mapping between the written language and the spoken language makes you a better reader. And there’s not much evidence
that it really does. And again, my argument would be
that you’re going to pick up the structure of the written
language regardless of how it’s mapped into the
spoken language. So it could be a deep
orthography or a surface orthography. It doesn’t matter. Good question. MALE SPEAKER: So I know that– my son had trouble
reading, too. But the big thing about reading
is, once you get it, you get it. And you might get it a couple
years late, but once you know how to read, you know
how to read. And you’ve made your vocabulary,
but you actually know how to read. Is that somehow, true
for speech, too? Once you know how to speak,
you know how to speak? There’s actually, for adult
readers, there’s a point where you say, you’re an adult reader. And, bang! You can be 40 years old, or 15, and you haven’t changed. DOM MASSARO: Yeah. That’s actually a
good question. Because, I don’t know how
easily we can test it. Because, again, most of us learn
to speak very quickly. And so it’s hard to know, and
say, now you can understand the language. Right? And reading, again,
you’re waiting until they go to school. And then they learn
the decoding. And by fourth grade, then, the
teachers are saying, yeah, they decode, but they can’t
really comprehend. So again, I don’t know
when you say that someone is a reader. That might be they
do the decoding. MALE SPEAKER: It’s just like
adult readers, you can actually read things. So they did [INAUDIBLE] and
try to make sure that you actually achieve
adult reading. DOM MASSARO: I think, again,
it’s a gradual process, in the sense that the material
you’re reading. So as I mentioned, the reading material’s much more demanding than spoken language material. And so that plays a
big role in it. There are websites where you
can put in a passage and it will tell you the
reading level. And you can see that reading
levels can differ a lot. I’m sorry, we’re going to
have to close here, and I can talk offline. SHUMAN: Let’s thank Dom
for his [INAUDIBLE]. [APPLAUSE]

2 comments

  1. Naturally… with an iPad. How come children "acquired" literacy for thousands of years without having to pay money to a company that promotes slave labor in China and now, for some unclear reason, are supposed to need it? Please help protect young people from psychologists greed.

  2. Excellent talk, and good questions, answers, and ideas. I found that the presence of flash cards, early reading, and content was very good when I was young. Reading makes you remember things in your brain structures, giving you more to ponder, and you end up coming to conclusions more easily, learning to separate things, and learning about differences of opinion. Funding this should be prioritized for American schools, and then a math and science focus with more reading.
