Share this video
In this age of abundance, algorithms are emerging as the great enabler and tool for discovery. Searching for the right movie, piece of music or headline news is a mathematical equation away – one that is already sophisticated enough to feed off your mood, activity and time of day. This session explores both the opportunity and the pitfalls of algorithms in helping us find the needles in the proverbial haystack.
How do algorithms shape our world? It turns out that you are actually training the algorithm…or is it training you? With current advances in machine learning, both are true. The best deep-learning algorithms now combine content filtering and collaborative filtering in a hybrid method that gives companies a massive advantage in understanding the consumer. For instance, when you consistently click on the second item in a list rather than the first, you’re training the algorithm, in a process called collaborative filtering. In this video, information and communication technology experts discuss the intersections between humans, algorithms, and regulations. The relationship between algorithms and policy – such as Canadian content policy – now depends on ‘filter bubbles’ in social networks. Is there an increasing loss of diversity as a result of the ownership of content-distribution platforms? Maybe there’s a role for the algorithm, and a role for government and corporations, in promoting Canadian content.
“[We need] to be specific about the influence of algorithms…and try to understand how humans and algorithms relate, both in terms of teams working to design what’s trending and in terms of how humans and the public interact with these algorithms…What exactly is the role of algorithms in an age of discoverability?”
“When [we’re] talking about algorithms, there are a lot of different approaches. It’s not just one-size-fits-all. Each has their own flavor, and I think that has a big impact on the kinds of content that we’re going to be receiving, taking in in the future.”
“You ended up getting a fascinating…fish-net effect, where aggregate human behavior was actually shaping the statistics and shaping which books were actually going to be seen and presented [in a recommender system]. It’s a two-way street…there’s the intent of the data scientist and the computer scientist in constructing the algorithm, and then there’s what all of you folks do to them…Some of the best algorithms learn, and they learn [from] measurements that you’re putting in. When you choose to click on the second item in a list, as opposed to the third item or the first item, you’re training it.”
“A lot of these [algorithms] are working from correlation rather than causation. Just looking at a bunch of data, you can find weird connections, like: three days before a tornado hits, Target sells more strawberry Pop Tarts.”
“We’re still stuck in the bubble, except that instead of [the trend] being decided by the news agencies, it’s algorithms that do it.”
“Is there a role for the government or big corporation in Canada to play in promoting Canadian content? I think this is a very important question, and…if we leave it up to algorithms, we may not like the result in the end.”
Fenwick: Okay, everybody, if we could begin. I’d like to thank you all today for attending our panel, Algorithms or How Content Finds You. Before we begin, we have a short video so if we could just dim the lights in this comfortable auditorium, we can transition to the movie.
Video: We live in a world of choice, content is available everywhere, on so many platforms…
Video: In this digital era, we are overwhelmed with content on a variety of platforms…
Video: Being able to find this content is a challenge… not just here, but around the world.
Video: The challenge is to find content in a digital environment that is constantly changing…
Video: …and more often than not, they’re doing it through access versus ownership.
Video: Users are now faster than TV…
Video: The age of abundance turned our world upside-down…
Video: How did we get here?
Video: The post-TV era is made of disruption…
Video: And innovation…
Video: Millennials… are commanding change in the digital space.
Video: Forcing change in the digital space.
Video: What can analytics tell creators? In the digital era…
Video: Find your content…
Video: Or let it find you?
Video: Future is now…
Video: And now, how do we move from discoverability… To discovered?
Fenwick: Okay. Welcome everybody to our panel on Algorithms or How Content Discovers You. I’m the moderator for this panel. My name is Fenwick McKelvey. I’m an assistant professor at Concordia University in the Communication Studies Department where I study, quite broadly, information and communication technology policy. I’m joined by three very esteemed panelists. I’m very happy to be able to be in conversation with them today. On my immediate left, is Christopher Berry, senior manager in Product Intelligence at CBC. To benefit Canadians, his team turns data into product and enables others to do so. Previously he co-founded Authintic.
Fenwick: Okay, I’m very bad at pronunciation. Authintic is a social authentication analytics company. Christopher also developed product at Synapsis and co-founded the marketing science department at the digital agency Critical Mass.
Next, we have Daniel Lemire, who is a computer science professor at the University of Quebec. He has a BSc, an MSc, and a PhD in engineering mathematics from the École Polytechnique and the Université de Montréal. He has written over 45 peer-reviewed publications, including 25 journal articles. He has held competitive research grants, and he has been invited as an expert to appear on many panels, including for NSERC and the FQRNT. His open-source software has been used by major corporations such as Google and Facebook. His research interests include databases, information retrieval, and high-performance programming. Clearly, someone else who has much expertise to bring to this conversation. Then finally, we have Dylan Reibling?
Fenwick: Reibling. I was always going to do that. He’s a filmmaker who lives and works in Toronto. He works across several genres, from documentary to drama. His documentary work focuses on science, technology, and investigation. His most recent film, Looking for Mike, recently premiered on CBC and led to the solving of a 23-year-old missing person’s case. In his spare time, Dylan works in interactive art, using digital projection technologies to explore cinematic forms in non-cinematic contexts.
I’d like you to join me in welcoming our panelists. I’ll begin our conversation by speaking broadly about the relationship between algorithms and discoverability. Those who’ve read the CMF report will know that the algorithm is one important industrial lever influencing how content is discovered by Canadians.
It’s an interesting time for this panel on algorithms, because with the recent Facebook trending-news controversy in the United States, we’ve suddenly been told that algorithms aren’t really important; it’s humans we should be paying attention to. With that, I think we can adjourn our panel and leave it at that.
The truth of it is, I felt like this was a great beginning to our conversation, because it encourages us to be specific about the influence of algorithms, about what particularly they’re doing, and to try to understand how humans and algorithms relate, both in terms of teams working to design what’s trending and in terms of how humans and the public interact with these algorithms.
I felt we could kick it off by trying to qualify what, particularly, is the role of algorithms in an age of discoverability. When we were prepping a little earlier, Dylan, you mentioned a story about the difference between Netflix and Amazon.
Dylan: Amazon. In terms of-
Fenwick: Yeah. You could begin, because I thought that might be a good introduction to some of these themes here.
Dylan: Yeah. Just to give a little bit more context: I’m a filmmaker, so I work within the cultural industry. I wear that hat, but for the past year I’ve also been developing a project with the NFB, an interactive documentary about algorithms. I’ve been doing lots of deep dives into the literature and talking to people about algorithms. I’m examining the ethics of algorithms, trying to get the flavour, the soul of the machine.
One of the things that really stuck out while prepping for this panel is the challenge of how you describe an algorithm, because it’s a series of mathematical or procedural steps. Someone likened recommendation engines to bookstores. I think it was Pedro Domingos, in The Master Algorithm, who talked about the difference between the Amazon algorithm for recommendation and Netflix’s. If you think of Amazon like a bookstore, they want to welcome you into the store and then show you the bestsellers: “Hey, you like that book? Well, here, let me show you the second shelf.” They want to put up front all the new titles, all the popular titles.
Whereas Netflix, their system is a lot different. They’ve got their big flashy titles at the storefront and they welcome you into the store, but their model works more like: “Hey, come check out our big flashy titles up front, and while you’re here, come check out these old weird books we’ve got in the back. I think I’ve got just the right book for you.”
When talking about algorithms, there are a lot of different approaches. It’s not just one-size-fits-all. Each has its own flavour, and I think that has a big impact on the kinds of content that we’re going to be receiving, taking in, in the future.
Fenwick: Daniel, before the panel, I tweeted one of the pieces that Daniel wrote as an introduction to some of these themes. If you haven’t had the chance to read it, I really encourage it. Daniel, you also picked up on some of the ways that Amazon works. I was wondering if you could elaborate a little bit more on the Amazon algorithm and its relationship to collaborative filtering.
Daniel: Right. You described recommender systems as being procedural, the idea being that there’s a clear logic, but I think you could describe it a little bit differently. Amazon came up with its famous recommender system, the one that says, “If you like this book, you’re going to like these books.” This was patented in 1998 by Greg Linden and others. The algorithm itself is very easy to understand, but it’s eating a lot of data about our behavior. It’s really more like statistics than an algorithm. We call this a panel about algorithms, but it could be called a statistics panel, too.
It’s a lot harder, then, to understand what’s going on, because you’ve got a massive amount of data to sift through. If you don’t have this data and you just look at the algorithm, you cannot predict what it’s going to do.
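The item-to-item idea Daniel describes, “people who bought this also bought that”, can be sketched in a few lines. This is an illustrative toy, not Amazon’s actual system: the purchase data, item names, and the choice of cosine similarity over binary purchase vectors are all assumptions.

```python
from collections import defaultdict
from math import sqrt

# Toy purchase history: user -> set of items bought.
purchases = {
    "alice": {"book_a", "book_b"},
    "bob":   {"book_a", "book_b", "book_c"},
    "carol": {"book_b", "book_c"},
}

def item_vectors(purchases):
    """Invert the data: item -> set of users who bought it."""
    vecs = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            vecs[item].add(user)
    return vecs

def cosine(a, b):
    """Cosine similarity between two binary user sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))

def similar_items(item, purchases, k=2):
    """Rank other items by how often the same users bought them."""
    vecs = item_vectors(purchases)
    scores = {
        other: cosine(vecs[item], vecs[other])
        for other in vecs if other != item
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(similar_items("book_a", purchases))  # → ['book_b', 'book_c']
```

As Daniel notes, the procedure itself is trivial; everything interesting lives in the `purchases` data it eats, which is why the output can’t be predicted from the code alone.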
Christopher: In fact, in a 2009 paper on Amazon, a marketing scientist looked at how people were using those recommendations and how people were actually training that algorithm. It turns out that one of the biggest factors in buying a recommended book was trying to find a price point that brought you just slightly over the free-shipping limit.
You ended up getting what the author called a fascinating “fish-net effect,” where aggregate human behavior was actually shaping the statistics and shaping which books were actually going to be seen and presented. It’s a two-way street: there’s the intent of the data scientist and the computer scientist in constructing the algorithm, and then there’s what all of you folks do to them. We’ve seen some pretty horrible things that you’ve done in aggregate. We’ve also seen some beautiful things that you’ve done in aggregate.
Fenwick: It’s interesting, because the term algorithm itself has, lo and behold, become a popular phrase that captures, I think, a dynamic we might describe as more interactive media: people make decisions, generating large amounts of data, and that data is then analyzed and computed in certain ways, in accessible language, by something like a statistical algorithm, to make inferences.
You can then watch how this is both used intuitively and also gamed and manipulated. That’s where I think the term discoverability is actually helpful, because it’s in this environment, or milieu, that you’re trying to make decisions about what content you’re interested in. It’s about trying to be attentive to the dynamic processes at work.
I think that’s partially what was interesting about the Facebook case: in some ways it emphasizes some of the traditional aspects of broadcasting, where Facebook was making decisions about what was important, while some of the themes of this panel are more subtle, in the sense that content is being recommended in ways that really depend upon your interactions and your personalization.
Chris, you’ve mentioned some of the data side to it too. I was wondering if you want to elaborate more on what’s that link between algorithms and data being generated?
Christopher: Yeah, 100%. Some of the best algorithms learn and they learn off of measurements that you’re putting in. When you choose to click on the second item in a list, as opposed to the third item or the first item, you’re training it. You all, in aggregate, train that up. In specific types of algorithms, specific types of recommendation systems, it’s something that’s called collaborative filtering. Some of the best algorithms in the valley actually combine both content filtering and collaborative filtering. When you put those together, you end up getting a hybrid method.
It’s said that some companies have a massive advantage because they understand you personally: when you type something into a search box at YouTube or at Google, your identity and everything that you’ve done in the past form a part of that query. The reason they outperform in terms of relevance is that they understand you, and you’re a fundamental part of the overall query. That’s how the pieces, your behavior, your intent at a given moment, and what is available to serve up, all get combined, mashed together into something that’s so much more than its individual parts.
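The hybrid method Christopher mentions, blending content filtering with collaborative filtering, can be sketched as a weighted combination of the two score lists. The titles, the scores, and the equal weighting below are invented for illustration; real systems learn the blend rather than hard-coding it.

```python
# Hypothetical normalized scores in [0, 1] from two recommenders
# for the same candidate titles. All names are invented.
collaborative = {"title_x": 0.9, "title_y": 0.4, "title_z": 0.7}
content_based = {"title_x": 0.2, "title_y": 0.8, "title_z": 0.6}

def hybrid_rank(collab, content, alpha=0.5):
    """Blend the two signals; alpha weights the collaborative side."""
    scores = {
        t: alpha * collab.get(t, 0.0) + (1 - alpha) * content.get(t, 0.0)
        for t in set(collab) | set(content)
    }
    # Highest blended score first.
    return sorted(scores, key=scores.get, reverse=True)

print(hybrid_rank(collaborative, content_based))
# → ['title_z', 'title_y', 'title_x']
```

Note how the blend changes the winner: neither recommender alone ranks `title_z` first, but its balanced scores put it on top once the signals are mashed together, which is the point Christopher makes about the hybrid being more than its pieces.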
Fenwick: Daniel, in your piece you also mentioned the Canadian context in an important way, in relation to PIPEDA, the Personal Information Protection and Electronic Documents Act, Canada’s private-sector privacy law. I think that points to one of the interesting tensions here: a lot of this data is being collected to help algorithms make decisions.
One of the ways that we could understand that link, and part of the interest of why we’re here today is kind of the intersections between algorithms and the regulatory context. I thought that that might be … Daniel, I just was wondering if you’re comfortable just talking a little bit more about the link between data and PIPEDA.
Daniel: Right. I’m not an expert on privacy, but what’s absolutely clear… Yesterday, there was a talk about how teenagers feel about privacy. They were very concerned about it, but they were okay with a lot of people having access to their information; they just didn’t want, say, Facebook to know much about them. Of course, they would be very surprised by how much Facebook can tell about them. We’ve got this entire industry that’s basically following you whether you like it or not.
What studies show is that they can tell a lot about you. They can tell your sexual orientation, your gender, your income level, fairly easily. They can tell a lot, and they use this data in ways that are interesting. For example, a recent study showed that if you self-identify online as a woman, you’re less likely to receive ads for high-paying jobs, whereas if you self-identify as a man, you’re going to be presented with jobs that pay more. We don’t know exactly why that is, but one theory I have is that possibly men are more likely to click on ads for jobs that pay more, whereas women, being maybe a little more cautious, don’t click. The system possibly learns that and then says, “You’re a woman; we’ve got this ad for low-paying jobs. Here it is.”
It’s interesting. As far as privacy goes, yeah, it’s a really big change, because of course companies like YouTube and Facebook don’t really abide by our rules. It’s difficult to see how it could all be enforced, because they don’t actually need to know you by name. They just need to know that you’re someone, and then they can add all this information without tying it to your name, which makes the legal issues much fuzzier.
Fenwick: That’s certainly one of the challenges of using the term privacy to frame some of these debates and issues, because a lot of these phenomena are occurring in the aggregate. Some aspects of PIPEDA address data collection and data storage rather than emphasizing individual privacy, and it’s in the aggregate that some of these practices become problematic.
I think what’s also interesting, in a theme that I’m noticing from a few people, is the unpleasantness of having some of our behaviors reflected back to us through algorithms, and some of the consequences of that. I’m reminded of Microsoft’s recent chatbot, which was on the internet for all of 24 hours before it said a lot of things that don’t bear repeating. It was certainly targeted by trolls on the internet to say very provocative things, but it also speaks to how the consequences of the algorithms will reflect the data.
In part, that’s not necessarily how we hope our systems will behave, but in some ways, it is how they do. You talked about the myth of the algorithm, but in some ways we want them to meet a better standard, because that captures the imaginary we have that our media can make us better.
On these themes, Dylan, you’re working on this project. I was wondering if you could talk about any other ways this link between the data and the algorithm can be problematic.
Dylan: Just to build off what Daniel was saying, if you’re interested in cases of profiling, there’s the case of Latanya Sweeney, a very senior computer scientist at Harvard. She found that, compared to her colleagues, she was being delivered a lot of ads for bail bondsmen, like, “Find ways to get your partner or yourself out of jail,” just because she had a black-sounding name. She is black, and just because of the way her name was spelled, she was being delivered these ads. It wasn’t a self-identification thing. It was just that her data was pulled based on the cookies she carried along with her and any identification she had to put into websites. Look up Latanya Sweeney, because it’s a really fascinating case.
She went in and reverse-engineered the algorithm to find where these racial biases were, and it wasn’t someone saying, “Oh, if black name, then deliver these ads.” That’s what’s problematic about the way algorithms are being built. Especially in a machine-learning context, a lot of these things are working from correlation rather than causation. Just looking at a bunch of data, you can find weird connections, like: three days before a tornado hits, Target sells more strawberry Pop Tarts.
This is a correlation, and it’s agnostic of someone sitting down and reasoning, “Okay, well, people are getting into emergency-preparedness kits,” or whatever. It’s just a straight correlative result coming out of the statistics, and that’s problematic, because if it’s a person sitting down writing a formula, you can ask, “Okay, well, why did you put that there?” You can question a person. But there’s this sort of agnostic attitude of, “Oh, it’s data. There are no biases in data, and correlation is correlation.” All those checks and balances we have in place when a person is running a system, we don’t necessarily have for algorithms. It’s foolish to say, “Oh, it’s just a computer doing it, so we have no control and it’s unbiased,” because there are assumptions baked in.
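Dylan’s point is easy to demonstrate: a plain Pearson correlation will happily report a perfect relationship between two series with no causal link whatsoever. The weekly figures below are invented to make the point; nothing in the number itself says why the two move together.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented weekly figures: Pop-Tart sales vs. tornado warnings.
pop_tart_sales = [120, 135, 150, 180, 210]
tornado_warnings = [2, 3, 4, 6, 8]

print(round(pearson(pop_tart_sales, tornado_warnings), 3))  # → 1.0
```

The coefficient says the series move in lockstep; it says nothing about whether storms drive sales, sales drive storms, or both track some third factor, which is exactly the gap between correlation and causation.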
Fenwick: Yeah. Dylan, that’s so nice, because it speaks to why this panel is significant to me as someone interested in the relationship between algorithms and policy. It emphasizes the particular ways these questions will be answered by Canadians. One thing the Latanya Sweeney example raises is: do we have similar issues in Canada? Has research been done on whether First Nations people in Canada face similar discrimination in online advertisements? It would be really great to replicate those kinds of studies, to understand the consequences taking place here, because if we’re not doing the research, or because it’s difficult, we don’t actually know about some of these phenomena. While I have the mic for a second, I just want to emphasize how important it is that some of us are trying to bring these matters into our existing regulatory discussions.
I want to bring it back a little to discoverability, too, because I think one of the relationships here is this question of correlation. I was joking with a friend that mostly what I listen to on Spotify now is kids’ music, but at no point does kids’ music appear in my Discover playlist. At some point, the algorithm figured out that I don’t need more recommendations for Sharon, Lois and Bram.
The question I had was about personalization. In the CMF report, there was a concern about a loss of diversity, and certainly one of the things the Facebook trending story tells us is that Facebook was saying, “No, these are the important stories that need to be trending,” treating what the algorithm surfaced as a malfunction rather than necessarily something we think should be trending.
I’m curious in this moment where we often talk about the relationship between algorithms and filter bubbles. I was wondering if one of our panelists would be brave enough to introduce this concept of a filter bubble and then whether we could link it back to some of these ways that filter bubbles might influence how we are exposed to Canadian content.
Christopher: Yeah. In the Facebook context, a filter bubble occurs when you curate your social network, when you curate your own graph and populate your feed. In some more recent studies that were very, very, very good to Facebook, they quantified the differences between liberals and conservatives and observed the click paths. Are liberals more likely to click on challenging conservative content? Are conservatives more likely to click on challenging liberal content? They discovered a whole bunch of biases in the way people actually filtered the content they saw, based on a social network they had curated. That’s a social network.
We do the same things with interest networks as well. We curate Twitter profiles to follow. Specific types of algorithms can be far more prone to filter bubbling, where challenging content is deliberately left out, because we enable human beings to actually do it, and it appears to be a natural tendency of some human beings to maximize the utility they’re getting for their time in a given digital experience. That would be my crack at a definition of the filter bubble.
Daniel: Yeah. That definition is basically right. I think Facebook recently announced, I don’t have the exact quote, that they would try to encourage diversity, because one of the effects of the filter bubble is that people don’t interact with content as much when it’s all stuff they’ve seen before. When you’re not challenged anymore, they see that you don’t interact with the content and you let it slide. This actually causes them a problem, and they’re trying to get back in the game.
In the early days of the web, in the early days of recommender systems, there was a big hope, and there was talk about the long tail. I don’t know if people remember that. We were leaving the era of the blockbusters, when TV and a few key people made all the decisions; now we were going to make the decisions, and the small producers would finally have a chance, because they could break through even if, say, the big agencies didn’t want them. What’s interesting is that because of effects like the filter bubbles and so on, diversity is still not there, and the studies show that these effects tend to favor the blockbusters.
For example, just for myself, because I’ve decided to design my own Twitter feed: I’m getting a lot of buzz about Donald Trump for some reason. It’s all over, and it’s getting on my nerves, frankly.
Male: That’s because he’s all over.
Daniel: Because he’s all over. But it’s not very different from where we were 30 years ago. We’re still stuck in the bubble, except that instead of it being decided by the news agencies, it’s algorithms that do it.
Fenwick: It’s certainly important, as you say, that this is a question emerging out of broadcasting. What we might think of as the traditional complex of attention that was Canadian broadcasting is changing, in terms of what people are encouraged to pay attention to and the ways those systems function.
I think what’s interesting, Daniel, to your point about where we can situate algorithms, is that if you look at the front page of Netflix recently, Netflix’s own content takes up 50% of it. Who knows whether we even have a similar television screen anymore; that’s part of the challenge. But 50% of the screen is taken up by Netflix’s own content. You can think about how ownership of content-distribution platforms is able to be influential in ways that are much more tangible than traditional forms of broadcasting.
Then, secondly, there are the algorithmic recommendations of what appears and what I would like based on my viewing history. There are also the ways the web has changed how we find and interact with things, and I often think it’s important to situate discoverability technologies like Reddit and Wikipedia, which provide different pathways of knowledge through which you’re able to discover things. I think it’s important to put them in dialogue, because you’re then able to ask: what does this allow me to discover in terms of content? That gets back to Dylan’s comment about the difference between Amazon and Netflix.
Contrast that with how you can spend three hours on Wikipedia, clicking through from what seemed like a very relevant article into a very strange corner of the internet. Part of that question also becomes the loss of diversity the CMF report describes. One of the concerns is that only popular content, or content that seems likely to be popular, and thereby marketable and click-generating, is being recommended. What are the consequences for some of the different things we’ve expected media to do, like recommending important content in the news, or recommending Canadian content in the cultural industry?
I’m wondering whether this might be a convergence of algorithms around metrics of recommendation. Dylan, I want to make sure you can weigh in, but I also just want to ask about these filter bubbles: some of the consequences come down to what is being filtered, and for what reason. In the same way that we might talk about the problem of only popular content being surfaced, I was wondering if you’ve encountered some of these problems with filter bubbles.
Dylan: I’m just going to leap into the meat of what I think. We’re here talking about algorithms at our CRTC panel, and I can’t help but think that the meat of the conversation, for me, is this: most people are getting their content through algorithmic means, recommendation engines, and we’re in this world where Netflix doesn’t pay Canadian taxes. To date, it has only supported one co-production, a show called Between, which is a Canadian co-production between Netflix and Rogers, and I think the terms of the deal are that… Crave is Rogers, right?
Dylan: Show me. That maybe it’s not Rogers but anyway, I think it’s-
Female: It’s Rogers.
Dylan: Oh, man. Chorus. Anybody?
Fenwick: It’s not MA.
Dylan: Anyway, there’s this one Canadian TV show that Netflix is putting production funding into. I have to say, as a Canadian media producer, the way I have made money previously is through Canadian content quotas on channels like History or CTV. If we’re shifting into a world where Netflix does not care about Canadian content, and I don’t think they do, is there a role for the CRTC to say, “Hey, listen. Here’s your algorithm. It recommends these TV shows. I don’t know, slap a 30% Canadian content requirement on it”? I’m being self-serving here, but it’s up for debate. I think it needs to be a discussion. But as we were talking about earlier, is that even possible? Can you just slap a 30% CanCon filter on an algorithm and still have it work the same way? It may not even be possible.
Christopher: I mean, the optimization objective of the Netflix algorithm is to minimize churn, to minimize subscriber churn and thereby maximize earnings per share. The optimization objective of the Netflix algorithm is not to ensure 30% CanCon, right? Moreover, given the number of countries Netflix is in and engaged in, I’d be very surprised if they’d be amenable to actually modifying the algorithm; the probability is fairly low. If CanCon contributed to Netflix’s earnings per share, they’d put far more of it in; it would be 99% CanCon that would be filtered up.
In general, wouldn’t it be nice if we had Canadian companies with very robust algorithms to promote Canadian content? But this is part and parcel a result of the fantastic success of American civilization at being able to push its technology all the way around the world.
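On Dylan’s question of whether you could just “slap a 30% CanCon filter” on a recommender: mechanically, a quota can be imposed as a post-processing re-rank over an already-scored list, though, as Christopher notes, it would cut against the system’s own optimization objective. A hypothetical sketch, with invented titles and a made-up flag convention:

```python
from math import ceil

def rerank_with_quota(ranked, is_canadian, n=10, quota=0.30):
    """Keep the relevance ordering, but reserve ceil(quota * n) of the
    top-n slots for Canadian titles (as many as exist)."""
    need = ceil(quota * n)
    canadian = [t for t in ranked if is_canadian(t)][:need]
    rest = [t for t in ranked if t not in canadian]
    chosen = set(canadian + rest[: n - len(canadian)])
    # Emit the chosen titles in their original relevance order.
    return [t for t in ranked if t in chosen]

# Toy relevance-ranked list; titles prefixed "ca_" are Canadian.
titles = ["us_1", "us_2", "ca_1", "us_3", "us_4", "ca_2",
          "us_5", "ca_3", "us_6", "us_7", "ca_4", "us_8"]
print(rerank_with_quota(titles, lambda t: t.startswith("ca_"), n=10))
```

The mechanics are easy; the hard parts the panel raises remain: the platform has no incentive to do this, and, as Daniel points out next, a regulator would struggle to verify a personalized ranking it cannot observe.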
Fenwick: I would say that’s an old, very Canadian question, and it’s one of the things we’re trying to understand about this challenge. We’re here in this room today largely because these questions were considered long ago, and part of this conversation is adapting to that. Dylan, “Should we have CanCon algorithms?” would be a great title for the panel, and the way it was raised poses the question in a way I think is functional. We used to have a system in which quotas were attentive to how media was influential and essential to the schedule; you targeted the specific discoverability function of the television schedule in ways that fit cultural policy objectives. That’s one of the challenges, and the exciting thing about this summit is that we don’t need to be only reactive; we can also be proactive in the ways we think about dealing with these questions of discoverability.
Daniel, I wanted to say two things. One, I wanted to get you to respond to this, but two, one of the reasons I’m very excited to have Daniel participate in this panel is that long ago, I was familiar with his InDiscover program, a project he can say a little more about, which was a way of trying to develop a collaborative filtering system to recommend Canadian musicians — a way for Canadian content to be discovered. It’s an early attempt that I’m familiar with, and Daniel, I just wanted to pass it to you.
Daniel: Yeah. At the time — this was 2002, before YouTube — putting an mp3 file online and having people download it was high tech, and there was this issue that people would not have the bandwidth to get the song. I’m serious. At the time, I was working with the National Research Council and we thought it would be really, really nice to have a site where all these independent Canadian musicians could upload content and have people discover it. We thought that using interesting algorithms would be one way to do it. Technology is nice.
Female: The humans did it.
Daniel: I’ll come back to InDiscover, but one thing I wanted to say is: yeah, you’re right. One problem is that it’s hard to check, as you said. You have to realize — people sometimes don’t realize this — that if you go to Google and enter some keywords, there’s absolutely no guarantee that you’re going to get the same answer as I do. You have to understand that, and so it’s fairly hard to know what’s going on. Even if the government said to Netflix that people have to watch 30% Canadian content, how would the government check it? Of course, the government could require the company to provide all the user data, but do we actually want the government to be monitoring us at that level? Probably not.
It’s hard, but I would say one thing: we can ask for more transparency, I think. You were alluding to recommender systems that recommend something but we don’t know why. Amazon, in its book recommender system, can actually tell you why it’s recommending a given book. It says, “Well, we recommend this book because you bought this other book.” There are certain things we can do, and Netflix, I think, is being a little bit of an extremist here. For example — I’m French-speaking — if you want to find French content on Netflix, believe it or not, it’s hard. There’s no way to know, and if you want to find Canadian content, good luck. Unless you’re an industry insider, you won’t be able to tell. Clearly, this would be very, very easy for them to implement, and they don’t implement it on purpose.
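The “because you bought this other book” style of explanation Daniel describes can be illustrated with a toy item-to-item co-purchase recommender. The users, titles and purchase data below are entirely invented; real systems use far richer signals, but the explainability idea is the same:

```python
from collections import Counter

# Toy co-purchase data, invented for illustration.
purchases = {
    "alice": {"anne of green gables", "the handmaid's tale"},
    "bob":   {"anne of green gables", "barney's version"},
    "carol": {"the handmaid's tale", "barney's version"},
    "dave":  {"anne of green gables", "barney's version"},
}

def recommend_with_reason(user):
    """Recommend the item most co-purchased alongside the user's items,
    with an Amazon-style 'because you bought ...' explanation."""
    owned = purchases[user]
    scores, reasons = Counter(), {}
    for other, items in purchases.items():
        if other == user:
            continue
        shared = owned & items
        if not shared:
            continue
        for candidate in items - owned:
            scores[candidate] += len(shared)
            reasons[candidate] = next(iter(shared))
    if not scores:
        return None
    best, _ = scores.most_common(1)[0]
    return best, f"because you bought {reasons[best]}"
```

Because the recommendation is computed from visible co-purchases, the system can always point back at the item that triggered it — the transparency Daniel is asking for falls out of the algorithm’s structure.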
Now, going back briefly to InDiscover: yeah, it eventually died out because in the era of YouTube and so on, big players come in and it becomes very difficult for small teams to compete and control the platform. But I think the question of whether there is a role for the government or big corporations in Canada to play in promoting Canadian content is a very important one, and I think that if we leave it up to the algorithms, we may not like the result in the end.
Fenwick: I want to make sure we have time for questions. I just wanted to give everyone a chance to touch on Daniel’s comment about accountability for algorithms, because I think that’s actually one of the key takeaways from this panel for me: we live within very complicated technical systems and they impact us on an everyday basis. One of the challenges is that they no longer appear public in the same way — we don’t witness or experience them all the same way. In fact, what we need to be attentive to is the way algorithms are constantly personalizing content, and it’s only in comparing and understanding the differences between us that we’re sometimes able to see the pattern.
I think one of the important questions about discoverability is about trying to find ways of renewing democratic oversight of the media system at a time when it’s defined largely by its close relationship with algorithmic personalization, data-driven systems and largely globalized players. I’m very fond of an interview with a member of OkCupid, the online dating site, describing how if you use the site, you’re being experimented on. I think one of the challenges for Canadians is that we are part of a global experiment, and at universities, we have an ethics review board — there’s no ethics review board for some of the global experiments we know take place on the web.
I’m wondering if the panelists might be able to talk through, in different ways and facets, this relationship between algorithms and accountability. Dylan, I know you have been thinking about that. I wonder if you want to start.
Dylan: Yeah. That’s one of the main things I’m trying to figure out in my research, and I’ve been posing this question to a lot of people. Algorithms are used for parole hearings in Pennsylvania. They’re used for predictive policing. These have major impacts on our lives, and how do we come to terms with them? There are a bunch of different layers. I forwarded to the panel a really good article by Andrew Tutt, “An FDA for Algorithms,” which says maybe there should be government rules around transparency. I think transparency is the most important step we can take right now to make sure that we’re disclosing what exactly we’re doing. Take the VW emissions test.
The reason that algorithm snuck by everyone is that it was held under intellectual property rights: this is our intellectual property, this is the algorithm that helps us run our engine, so you can’t touch it. But we need to have better oversight. I think we need to start demanding transparency. Then also, on our side in the media, we need to develop better algorithmic literacy.
Fenwick: Chris, do you want to …
Christopher: Transparency is really, really great because the more trust you have with the algorithm, with the utility, the better it can be and the more incremental aspects of yourself you may be willing to share with it. That all comes down to trustability of the brand and the trustability of the company.
There is a slight dark side on the transparency front. There was a great golden age for text analytics and search. Basically, you could go into a typical piece of web analytics software and extract all of the keywords people were using to find your content. You could use scoring. It was a very, very beautiful piece of innovation — a beautiful era of science during those years.
Google had several angry meetings about how they were making people way smarter. They were making webmasters way smarter, they were making the SEOs way smarter, and they ended up clamping down on it. We actually have less data today about how people use Google and how it interacts with your own media site than we did before. It was because some people were attempting to manipulate the search rankings in an effort to bend them to their own will, and Google regards that as a form of spamming. Whenever you’re overriding the relevance of the engine, you are in many ways actually harming Google’s revenue base. If there’s anything Google is very passionate about, it’s defending their revenue base and their profit share.
I’m all for transparency, so long as the negative effects are contained — so long as the commons don’t get completely destroyed and ruined — but by and large, the individual should be massively empowered to understand how their information is used. That’s just really, really great design.
Daniel: Yeah. The panel cannot do this job now, but we need to define transparency, because even if you have access to YouTube’s source code — I mean all of it, I dump all of it on your hard drive — and I say, you know …
Dylan: The flip side of that is, we have to improve our literacy, but one of the ways … Nick Diakopoulos talks a lot about this. He’s like, “You know what, even if we can’t understand every line of code — and maybe lines of code don’t matter in the age of machine learning — at least we could say we know what inputs are being put in and we know what’s coming out, and we can measure those things.” Those are useful stakes in the ground that we can orient ourselves around.
Fenwick: I think there are lots of ways to be attentive. Chris, your point about search engine optimization is really, really important, because it’s also something we should be aware is already going on. Click-bait and going viral are actually examples of the next generation of search engine optimization. I mean, sure, they talk about it that way, but in a certain sense, there are new affordances and a new elite — new ways that knowledge is being concentrated — and that means certain people are experts at gaming and manipulating algorithmic systems.
There’s a tension between trying to understand that this knowledge exists and figuring out how we publicize aspects of it in ways that encourage the cultural and policy objectives we’ve set forth, whether that’s in the Broadcasting Act or in future policy.
I think there are also really legitimate concerns taking place in Canada right now. Certainly, the Trans-Pacific Partnership has provisions around the disclosure of source code, and so this might put us in a more precarious situation where it would be more difficult to hold some of these systems accountable, even if we were able to in the first place.
One of the things — and this is something that Nick and other people have talked about — is this emphasis on studying running code. I actually got into this from the inverse direction: one of the reasons I got into internet measurement and concerns about how we study the operation of the internet was, in some ways, trying to develop mechanisms of accountability for complex technical systems.
I’d like to transition to questions, but one important thing to say is that I really encourage — and maybe I’ll tweet out the links — Nick’s work and ProPublica, an organization in the States that’s been really interesting in reverse engineering how the Obama campaign tailored messages to different demographics. I think there are really tangible ways we can start bringing more accountability to algorithms, and one thing that could easily be done is to start — not reverse engineering exactly, but studying and being attentive to — the ways content is being recommended on the different systems.
That, I think, is important research to be done and be attentive to, but we have about 15 minutes. Do we have about 15 minutes for questions? I just wanted to make sure that since you’re all here on a beautiful sunny day at 9:00 in the morning, you also have a chance to ask some questions of our very, very knowledgeable panelists. We have a hand up there. How do I do this? Yeah, you’re the quickest. You’re quick. You want a jeopardy round.
Male: For those in the room who are content creators, content owners, content contributors, do you — We’re recording. Okay. Hi, guys. For those in the room who are content creators, content owners, content distributors, do you guys have any tips or tricks we can use to make sure our content shows up in those algorithms — the Facebooks, the YouTubes, the Netflixes — other than spam? Spam’s already been mentioned, and anyone following discoverability as a hashtag on Twitter is, I’m sure, seeing all that spam and how it can work, or the click-bait. I need better hands-on tips from anyone.
Dylan: Oh, I’ve got no suggestions.
Christopher: RTFM with respect, that means Read The Fracking Manual. Google is very, very explicit in how to format your meta tags for it to be crawled. Incredibly explicit in fact and there’s quite the little industry that we have here in Canada with people that go around and repeat what’s in that document.
Yeah, that was a nice shot, right? By and large, the way I like to think of it is: if you’re writing really, really relevant content, most engines are incented to sort that content and put it in front of the most relevant audiences, because the incentives — with certain recent exceptions in the press — are typically very much aligned.
If you focus more on making human-readable content, as opposed to spending massive amounts of energy trying to game the system, I think you’ll find a higher overall sustainable competitive advantage in pursuing that policy.
Daniel: Yesterday, they were talking about the workshops they did with the teenagers a couple of days ago — and I don’t mean to say that content creators are teenagers — but they had a workshop on teaching kids to control their brand. Basically, we’re telling them, “Well, create a site or a blog or whatever and have all your activities link back to it.” The thinking behind it was probably PageRank, which may or may not actually be used, but the idea is that if a lot of things link to something, it’s going to rank higher.
I’ve been teaching college students about blogging for some time as a computer science prof, and one thing I found very interesting is that a lot of people don’t actually have any strategy for their own brand online. When they do something, they post it, but they never think about it. A lot of people, even people trying to break into some industry, often don’t like marketing, or they only want to market their product but not themselves, and I see that as a problem because online, you have to be someone. You have to take control of your brand, I think.
Christopher: I have nothing to add. I was talking to someone who does social media for Vice last week, and we were talking about where your best money and time is spent. She was just like, “You know what? Buying promoted Facebook ads and buying promoted Twitter ads. I think we’re reaching the end of organic, viral explosions.” This is what Facebook and Twitter have been scheming toward for the last decade: finding ways to monetize their products, and the way to do that is to promote our tweets, because we’re at the end — I think, maybe I’m wrong — of just organic breakouts.
Fenwick: Certainly, I think that links back: advertising, typically, was again something tied to the schedule, and now we might be seeing these platforms finally figure it out — it wasn’t clear initially how advertising was going to work in early radio either. Certainly, this might be an example of the platforms figuring out that they’re actually exceptionally more influential and, as a result, knowing how to better monetize their influence.
One of the other comments I want to make is that there’s also a lot of talk about the difficulty and precarity of being a producer on online platforms. Certainly, there are recent criticisms about how difficult it actually is to be sustainable on YouTube, for example. I think one of the themes is that it’s not only about discoverability but actually about creating sustainable jobs.
Given the difficulties of being a successful content producer on some of these platforms, I think there are opportunities to create new platforms, and I’ve been very attentive to what’s going on in the States around this idea of platform cooperativism, which is trying to create more horizontal affinities between content producers and create platforms that work for them. I think one of the other responses is that this might be an opportunity for Canadians to start taking seriously the idea of platform cooperativism as the future of public media and as a way of addressing some of these issues you’re going to face increasingly on platforms that might be very disinterested in making you successful unless you’re willing to pay for it.
Oh, how’d it go?
Fenwick: Then I’ll make sure you … We’ll have one more question.
Drew: Drew Robinson, Corus Entertainment. We’ve been talking about Facebook and things like that, and products — we’ve been talking about recommending a blender because you have another blender. That’s a little bit easier and more clear-cut, I think, in my mind, but content is hard to define, and so I might define something as a drama series and you might define it as a comedy.
When you put that into an algorithm, I guess there are two sides in terms of who controls and defines that content. I mean, the algorithm is only as good as these meta tags you’re referring to. In this new world, can you give examples of how content producers and owners have maintained or kept some of that control, and whether or not that is playing a big part — whether it’s Netflix or another platform?
Daniel: Okay. When the web was emerging, a lot of people spent a lot of time on taxonomies and on manually categorizing things, and this did not scale very well. With the Facebook issue, where they pay people to decide what’s important, part of the problem is that that also does not scale very well. The reason we use machine learning is that it’s the only way we can actually provide personalized content to millions of people without paying millions of dollars.
I’m not sure how much weight Google, for example, puts on manually tagged things. It’s probably less than people might think.
Christopher: I think there are so many additional applications of really, really rich tagging — of a really, really rich taxonomy. Say, for instance, I once wanted to characterize an audience that was interested in AHL hockey, the American Hockey League. In order to build that taxonomy, I had to actually search long and hard for the name of every single player and the name of every single coach and put them in, and it was incredibly painful.
Yesterday, it was mentioned that companies should have a VP of taxonomy to make sure the pipeline is actually properly populated with metadata. There are so many really good additional uses for that metadata on the content side. From an information retrieval perspective — from a Google perspective — likely less so, but from a Canadian media, owned-content perspective, significantly more so. I understand that’s a major pain point, and I understand that I’m rather demanding when I expect the content producer, who has already spent an amazing amount of labor-intensive time getting all the way up to the one-yard line, to invest the incremental effort to put in rich metadata and tagging.
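The kind of hand-built taxonomy Christopher describes — a controlled vocabulary of terms matched against content — can be sketched in a few lines. The labels and terms below are invented examples, and a real pipeline would need smarter matching than naive substring search (word boundaries, stemming, up-to-date entity lists for every player and coach):

```python
# Invented example of a hand-built taxonomy: a controlled vocabulary of
# terms mapped to tag labels.
taxonomy = {
    "ahl-hockey": {"american hockey league", "toronto marlies", "ahl"},
    "canadian-drama": {"drama series", "cbc original"},
}

def tag_content(text):
    """Return the sorted taxonomy labels whose terms appear in the text.
    Naive substring matching: fine for a sketch, brittle in production."""
    lowered = text.lower()
    return sorted(
        label
        for label, terms in taxonomy.items()
        if any(term in lowered for term in terms)
    )
```

The painful part Christopher describes is not this loop — it’s curating the term sets, which is exactly why a “VP of taxonomy” (or machine help) keeps coming up.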
It’s yet one more thing on their plate, which is why I think machine learning — and there are no brakes on the AI hype train — well, it would be nice if we got some help from our machine friends for sure, to make it more robust.
Fenwick: Okay. We have two more questions and I just wanted to say, we have 10 minutes left. I see three hands now. Yes? I also just want to raise an interesting question about Upworthy. I have a graduate student who’s studying Upworthy, and in particular the turn towards attention minutes as a different way of valuing what is measurable and what informs an algorithmic system of recommendation. One potential outcome of this summit could be a greater Canadian consultation on what tangible analytics would be. Could there be a national standard for attention minutes? What are the ways broadcasters and content producers might agree on common standards for metrics, so that we could have a common system that might in turn inform how these algorithms work?
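The attention-minutes idea boils down to ranking content by engaged time rather than raw views. A toy illustration with invented session data shows why the choice of metric matters so much to what an algorithm ends up recommending:

```python
# Invented session data: Show A gets more raw views, Show B holds
# viewers far longer.
sessions = [
    {"title": "Show A", "views": 1000, "engaged_seconds": 45_000},
    {"title": "Show B", "views": 200,  "engaged_seconds": 90_000},
]

def attention_minutes(session):
    """Total minutes of active engagement, rather than a one-shot view count."""
    return session["engaged_seconds"] / 60

# The two metrics crown different winners, which is the whole point:
# what you choose to measure shapes what a recommender optimizes.
by_views = max(sessions, key=lambda s: s["views"])
by_attention = max(sessions, key=attention_minutes)
```

A national standard for something like `engaged_seconds` — what counts as “engaged,” how it’s reported — is precisely the kind of common metric the consultation question is pointing at.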
One, two, three. I’m just telling my panelists we have about 10 minutes. I might not be able to make sure everybody answers each one, but we’ll make sure we answer all these questions. Apologies for the time.
Solange: No problem. My name is Solange Drouin. I represent the independent producers in Quebec, so I’m in the music industry. We’ve been dealing with these kinds of issues for many years, and we all know them. I would like to congratulate the CRTC for having this kind of panel, because I think it’s a key panel in this discoverability summit.
We all know by now, in the music industry, that it’s not sufficient to be good in order to be found, to be heard. It’s important that we are put forward, and we all know that we have quotas in the Canadian radio industry and that they created a vibrant Canadian industry. It’s important that we keep that in mind for the future, because it was not a minus, it was a plus to have that kind of rule. Now we can tell, in the music industry, that the long tail is the long fail. It’s a long fail. It doesn’t work. You have to be put forward.
I don’t buy the argument that if we put in place a Canadian music platform, it will solve all the problems. We don’t buy it. In Quebec, we don’t buy it, and I know my friends in the music industry in the rest of Canada have the same opinion. It’s important that we are on all platforms, even foreign platforms. What I hear from you is that there’s no technological problem in making sure all these platforms could put our Canadian content forward. It could be done. For me, that is the question.
After that, we have to ask ourselves: how do we do that? How could the CRTC do that kind of regulation? Because we all know that if we leave it to the Netflixes and Spotifys and Pandoras of the earth, they won’t do it. They don’t do it because they have no incentive to do so. I would like to hear it very clearly from you, the experts: would it be possible, in terms of technology, for Netflix or Spotify to put in place in their algorithms some tagging referring to the origin of the content, so that the CRTC, let’s say, could ask each platform, at some frequency, to put that content forward for the Canadian public — not all the public of the earth, the Canadian public — if it’s embedded in their algorithm and tagged properly?
Fenwick: With all due respect, I just wanted to make sure we have the last two questions and then we’ll respond to all three because we’re running short on time and then we have this gentleman in the front. Just state the question.
Female: Yeah. Thank you, Daniel. My question is a statement that connects to a lot of others. I’ve just returned from Singularity University, on the Google campus, where I’ve been exposed to the world’s leading technologies and where they’re going, and I’ve been trying to relate this to our Canadian media problems, or challenges. I think one of the issues is that while there are three billion people online right now, there are up to five billion more coming online by 2020. It seems to me that there is no algorithm that can make people want to watch anything, and — going back to the attention statement — isn’t it true that one of the keys is to adjust our system so that we create content that up to a billion people want to watch, and winning, so to speak, will solve all those problems?
Fenwick: Last question here on the front?
Male: As I watch this conference, it strikes me that technological forces are perhaps changing the role of government from regulation toward facilitation. You commented that, at least in terms of algorithms, regulation is rather difficult. I wonder if you could comment on the role of government taking that facilitation approach?
Fenwick: Yeah. Last comments from our panelists in response — who would like to …
Dylan: I’m just going to weigh in on the CMF report. They said there’s no magic algorithm to make Canadians like Canadian content, which I thought was pretty — that’s some harsh shade. But I also think we talk about content as though American content is better for whatever reason, and we’re ignoring the fact that taste is subjective. Taste is a social construct as well. There are a lot of great Canadian breakout hits, like Kenny vs. Spenny — that wasn’t on any American radar, it kind of came out of nowhere — or Degrassi, that whole franchise, especially in the first couple of iterations.
I just think the CRTC needs to have some more consultations about what values we want to preserve. Maybe we don’t want to preserve Canadian content from this position of “oh, the American product is better.” Maybe it’s more about diversification of content. Letterkenny is another good example of something that’s been huge, that’s weird and left-field and can travel in strange ways. Yeah, I think it’s a more nuanced conversation and a little bit more messy, which maybe we should embrace.
Daniel: Yeah. I wanted to comment on the issue of regulation. In the software industry, we do a lot of testing — half the time we spend developing software is spent on testing — and testing does not require you to have access to the source code. It doesn’t even require the people involved to be willing participants. A famous case was during the browser wars: Microsoft claimed that Internet Explorer could not be split off from Windows because it was tightly embedded, and an independent tester proved them wrong.
Certainly, it would be possible — I’m not saying the CRTC should do it, but the CRTC could test Netflix and see what happens when you’re a given customer with a given profile. What is the result? You can be almost sure there’s a lot of testing going on inside Netflix, and they do that kind of stuff. They say, “Well, what if I’m this type of user? What happens?”
Certainly, a regulatory agency could come in and test this. They could try to see what happens without necessarily spying on us and without having access to the company’s source code. I’m just saying that regulation may have to take a different form, but it does not need to be impossible.
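Daniel’s black-box auditing point — probing a service with synthetic user profiles and measuring what comes back, no source code required — might look like this in miniature. `fake_recommender` is an invented stand-in for the real, opaque service; the audit code only ever looks at its outputs:

```python
# Invented stand-in for an opaque recommendation service. An auditor
# would call the real API here instead.
def fake_recommender(profile):
    if profile["language"] == "fr":
        return [("Série X", True), ("Show Y", False), ("Série Z", True)]
    return [("Show Y", False), ("Show Q", False), ("Série X", True)]

def canadian_share(profile, recommender=fake_recommender):
    """Probe the recommender with a synthetic profile and measure the
    fraction of Canadian titles (second tuple element) in the results."""
    results = recommender(profile)
    return sum(1 for _, is_canadian in results if is_canadian) / len(results)
```

Run across many synthetic profiles, this is exactly the kind of test a regulator could use to check a quota claim without monitoring real users or opening the source code.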
Christopher: To touch directly on that point about five billion Canadians coming on-
Female: They’re not five billion.
Christopher: Or not Canadians — hopefully, five million Canadians. We’ve got a very, very, very big country that for the past 455 years we’ve been trying to fill up, and unlike Britain, which got full a long time ago and began exporting its culture, what is so exciting about the globalization of these platforms and these algorithms is that Canadians can actually compete — actually have an equal opportunity to compete. Being in that mindset — accelerating the way we build content, the way we measure how that content is performing on those international platforms, and then learning from it, actively incorporating those lessons back in — could actually open up and expand the pie way bigger.
I suppose I view it less from a protectionist perspective and more as: how can we collaborate with government, get the right policy mix in there that enables us to overcome some of the natural geographical challenges we have in this country, and be able to go out and really, really kick some A.
Christopher: Yes, yeah.
Female: That’s great.
Fenwick: Just to conclude, I wanted to say that one of the things I hope you’ve taken away from this panel is, one, that we should think of algorithms as having a focused role in the larger question of discoverability, and hopefully you’ve seen some of the interactions and connections. The second point is that there’s also an important role for regulatory oversight, and one of the challenges is that we’re faced with a lot of increasingly global companies: how do we have democratic transparency in Canada?
I think that, as Canadian citizens, that’s one of the levers we might pull: globalization has taken place, but these questions of democracy haven’t been solved globally, and so they continue to be solved locally. Then finally, I think this is also important to remember: in the past, the CBC experimented with new distribution techniques like BitTorrent, and that’s a real reminder that public media in Canada should embrace being experimental — thinking through the ways it’s historically been experimental and the ways it can be experimental in the future with these new audiences, platforms and opportunities.
With that, I would like to thank my panelists for this fantastic panel. Thanks everyone for the questions and again, have a great day.