Algorithm & APIs News

These are the news items I've curated in my monitoring of the API space that have some relevance to the algorithm conversation and that I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring to understand the details of each request and response.

I Do Not Fear AI, I Fear The People Doing AI

There is a lot of FUD out there when it comes to artificial intelligence (AI) and machine learning (ML). The tech press enjoys yanking people’s chains when it comes to the dangers of artificial intelligence. AI is coming for your jobs. AI is racist, sexist, and biased. AI will lead to World War III. AI will secure and protect us from the bad out there. AI will be the source of all of our worries, and the solution to all of our worries. I’m interested in the storytelling around all of this, and I’m fascinated by the distracting quality of technology when it comes to absolving the humans behind it of doing bad things.

We have the technology to make these black boxes more observable and accountable. The algorithms feeding us news, judging us in courtrooms, and deciding if we are insurable or a risk can all be wrapped with APIs and made more accountable. However, there are many human reasons why we don’t do this. Every AI out there can be held accountable; it isn’t rocket science. The technology exists to keep AI from hurting us, judging us, and impacting our lives in negative ways. However, it is the people behind it who do not want this, otherwise their stories won’t work. Their stories won’t have the desired effect and control over our lives.

APIs are the layer being wielded for good and for bad on the Internet. Facebook, Twitter, and Reddit, all leverage APIs to be available on our mobile phones. APIs are how people automate, advertise, and fund their activities on their platforms. APIs are how AI and ML are being exposed, wielded, and leveraged. The technology is already there to make them more accountable, we just don’t have the human will to use the technology we have. There is more money to be made in telling wild stories about what is possible. Telling stories that make folks afraid, and in awe of what is possible with technology. APIs are used to tell you the stories, while also making the fire shoot from the stage, and the smoke and the mirrors operate, instead of helping us see, understand, and verify what is going on behind the scenes.

We rarely discuss the fact that AI isn’t coming for our jobs. It is the people behind the AI, at the companies developing, deploying, and operating AI, who are coming for our jobs. AI, like APIs, is neither good, nor bad, nor neutral; it is a tool. It is technology, and anything it does is because of us humans. I don’t fear AI. I only fear the people doing AI. The people who tell the stories. The people who are believers. I don’t fear technology because I know we have the tools to do what is right, and hold the people who are using technology in bad ways accountable. I’m afraid because we don’t seem to have the will to look behind the curtain. We hold up many of the people telling stories about AI as visionaries, leaders, and truth tellers. I don’t fear AI, I only fear its followers.


Specialized Collections Of Machine Learning APIs Could Be Interesting

I was learning more about CODEX, from Algorithmia, their enterprise platform for deploying machine learning API collections on premise or in the cloud. Algorithmia is taking the platform on which their algorithmic marketplace is deployed and making it so you can deploy it anywhere. I feel like this is where algorithm-centered API deployment is heading, potentially creating some very interesting, and hopefully specialized, collections of machine learning APIs.

I talked about how the economics of what Algorithmia is doing interests me. I see the potential when it comes to supporting machine learning APIs that service an image or video processing pipeline–something I’ve enjoyed thinking about with my drone prototype. My drone work is just one example of how specialized collections of machine learning APIs could become pretty valuable when they are deployed exactly where they are needed, either on-premise or in any of the top cloud platforms.

Machine learning marketplaces operated by the cloud giants will ultimately do fine because of their scale, but I think the best action will be in delivering curated, specialized machine learning models, tailored to exactly what people need, right where they need them–no searching necessary. I think recent moves by Google to put TensorFlow on mobile phones, and similar moves by Apple, show signs of a future where our machine learning APIs are portable, operating on-premise, on-device, and on-network.

I see Algorithmia having two significant advantages right now: 1) they can deploy their marketplace anywhere, and 2) they have the economics, as well as the scaling, figured out. This allows specialized collections of machine learning APIs to have metering and revenue generation engines built into them. Imagine a future where you can deploy a machine learning and algorithmic API stack within any company or institution, on the factory floor in an industrial setting, or out in the field in an agricultural or mining situation–processing environmental data, images, or video.
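
To make "metering built into the stack" a little more concrete, here is a minimal Python sketch of a per-customer meter wrapped around an arbitrary algorithm. It is not Algorithmia's implementation; the `Meter` class, its per-call and per-second prices, and the customer IDs are hypothetical, chosen only to mirror a "charge per call plus compute by the second" model.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Dict


class Meter:
    """Hypothetical meter: counts calls and compute seconds per customer."""

    def __init__(self, price_per_call: float, price_per_second: float):
        self.price_per_call = price_per_call
        self.price_per_second = price_per_second
        self.calls: Dict[str, int] = defaultdict(int)
        self.seconds: Dict[str, float] = defaultdict(float)

    def wrap(self, algorithm: Callable[..., Any]) -> Callable[..., Any]:
        """Return a version of `algorithm` that bills the named customer."""
        def metered(customer: str, *args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return algorithm(*args, **kwargs)
            finally:
                # Record usage even if the algorithm raises.
                self.calls[customer] += 1
                self.seconds[customer] += time.perf_counter() - start
        return metered

    def invoice(self, customer: str) -> float:
        """Amount owed: flat fee per call plus metered compute time."""
        return (self.calls.get(customer, 0) * self.price_per_call
                + self.seconds.get(customer, 0.0) * self.price_per_second)
```

Wrapping a function such as `meter.wrap(my_model)` and calling it as `metered("farm-co", frame)` is all a deployment on a factory floor or in a field would need to start producing billable usage records.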

Exploring the possibilities with real world use cases of machine learning is something I enjoy doing. I’m thinking I will expand on my drone prototype and brainstorm other interesting use cases beyond just my drone video. I’m thinking about how I can develop prototype machine learning API collections that could be used for a variety of my content, data, image, or video side-projects. I think when it comes to machine learning I’m more interested in specialty collections than in the general machine learning hype I’m seeing peddled in the mainstream right now.


Algorithmic Observability In Predictive Policing

As I study the world of APIs I am always on the lookout for good examples of APIs in action so that I can tell stories about them, and help influence the way folks do APIs. This is what I do each day. As part of this work, I am investing as much time as I can into better understanding how APIs can be used to help with algorithmic transparency, and helping us see into the black boxes that often are algorithms.

Algorithms are increasingly driving vital aspects of our world from what we see in our Facebook timelines, to whether or not we would commit a crime in the eyes of the legal system. I am reading about algorithms being used in policing in the Washington Monthly, and I learned about an important example of algorithmic transparency that I would like to highlight and learn more about. A classic argument regarding why algorithms should remain closed is centered around intellectual property and protecting the work that gives you your competitive advantage–if you share your secret algorithm, your competitors will just steal it. While discussing the predictive policing algorithm, Rebecca Wexler explores the competitive landscape:

But Perlin’s more transparent competitors appear to be doing just fine. TrueAllele’s main rival, a program called STRmix, which claims a 54 percent U.S. market share, has an official policy of providing defendants access to its source code, subject to a protective order. Its developer, John Buckleton, said that the key to his business success is not the code, but rather the training and support services the company provides for customers. “I’m committed to meaningful defense access,” he told me. He acknowledged the risk of leaks. “But we’re not going to reverse that policy because of it,” he said. “We’re just going to live with the consequences.”

And remember PredPol, the secretive developer of predictive policing software? HunchLab, one of PredPol’s key competitors, uses only open-source algorithms and code, reveals all of its input variables, and has shared models and training data with independent researchers. Jeremy Heffner, a HunchLab product manager, and data scientist explained why this makes business sense: only a tiny amount of the company’s time goes into its predictive model. The real value, he said, lies in gathering data and creating a secure, user-friendly interface.

In my experience, the folks who want to keep their algorithms closed simply want to hide incompetence and shady things going on behind the scenes. If you listen to individual companies like PredPol, it is critical that algorithms stay closed, but if you look at the wider landscape you quickly realize this is not a requirement to stay competitive. There is no reason that all your algorithms can’t be wrapped with APIs, providing access to the inputs and outputs of all the active parts. Then, using modern API management approaches, these APIs can be opened up in a secure way to researchers, law enforcement, government, journalists, and even end-users who are being impacted by algorithmic results.
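
A minimal Python sketch of what wrapping an algorithm so its inputs and outputs stay observable might look like. The `ObservableAlgorithm` class, the toy scoring rule, and the hard-coded `AUDITOR_KEYS` are all hypothetical; in practice the key checks would live in an API management layer, and the scoring logic is a stand-in, not PredPol's or HunchLab's.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Hypothetical API keys for outside reviewers; a real deployment would issue
# and check these through its API management layer.
AUDITOR_KEYS = {"key-researcher": "researcher", "key-journalist": "journalist"}


class ObservableAlgorithm:
    """Wraps a decision function so every input and output is recorded."""

    def __init__(self, name: str, fn: Callable[[Dict[str, Any]], Any]):
        self.name = name
        self.fn = fn
        self.log: List[Dict[str, Any]] = []

    def run(self, inputs: Dict[str, Any]) -> Any:
        output = self.fn(inputs)
        self.log.append({
            "algorithm": self.name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "inputs": dict(inputs),  # shallow copy so later mutation can't rewrite history
            "output": output,
        })
        return output

    def audit(self, api_key: str) -> str:
        """Return the full input/output log as JSON for recognized auditor keys."""
        if api_key not in AUDITOR_KEYS:
            raise PermissionError("API key is not authorized to audit")
        return json.dumps(self.log)


# Toy scoring rule, purely illustrative.
risk = ObservableAlgorithm("risk-score", lambda i: i["prior_incidents"] * 10)
```

The design choice is that the log is a side effect of every production call, so researchers or journalists audit what the algorithm actually did rather than a separate demo.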

I will be continuing to profile the algorithms being applied as part of predictive policing, and the digital legal system that surrounds it. As with other sectors where algorithms are being applied, and APIs are being put to work, I will work to find positive examples of algorithmic transparency like we are seeing from STRmix and HunchLab. I’d like to learn more about their approach to ensuring observability around these algorithms, and help showcase the benefits of transparency and observability for these types of algorithms that are impacting our world–helping make sure everyone knows that black box algorithms are a thing of the past, and the preferred approach of snake oil salesmen.


API Wrappers To Help Bring Machine Learning Into Focus

I was taking a look at the TensorFlow Object Detection API, and while I am interested in the object detection, the use of the term API is something I find more intriguing. It is yet another example of how diverse APIs can be. This is not a web API, but an API on top of a single dimension of the machine learning platform TensorFlow.

“The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.” It is just a specialized code base helping abstract away the complexity of one aspect of using TensorFlow, specifically detecting objects in images. You could actually wrap this API with a web API and run it on any server, or within a single container, as a proper object recognition API.
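
Here is a minimal Python sketch of that kind of web wrapper using only the standard library's `http.server`. The `detect_objects` function is a hypothetical placeholder where real inference code for a model built with the TensorFlow Object Detection API would go (none of that library's calls are shown), and the `/detect` route and JSON response shape are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List, Tuple


def detect_objects(image_bytes: bytes) -> List[Dict[str, Any]]:
    """Hypothetical placeholder: real inference with a model built using the
    TensorFlow Object Detection API would go here."""
    return [{"label": "placeholder", "score": 0.0, "box": [0.0, 0.0, 1.0, 1.0]}]


def handle_detect(body: bytes) -> Tuple[int, bytes]:
    """Map raw image bytes to an HTTP status and JSON payload."""
    if not body:
        return 400, json.dumps({"error": "empty request body"}).encode("utf-8")
    return 200, json.dumps({"detections": detect_objects(body)}).encode("utf-8")


class DetectHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/detect":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_detect(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the wrapper until interrupted (blocks the calling thread)."""
    HTTPServer(("", port), DetectHandler).serve_forever()
```

After calling `serve()`, something like `curl --data-binary @photo.jpg http://localhost:8080/detect` would return the placeholder JSON until real inference is plugged in. Keeping the request handling in `handle_detect` means the same logic could sit behind any other server or container runtime.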

For me, it demonstrates one possible way of wrapping a single slice or cross section of a machine learning implementation to abstract away the complexity, helping you train and deploy ML models in this particular area. This approach to deploying an API on top of ML shows that you can use APIs to help simplify and abstract ML for developers. This can be done to help satisfy business, regulatory, privacy, security, and other real or perceived concerns when it comes to artificial intelligence, machine learning, or any other digital voodoo that resembles magic.

No matter how complex the inputs and outputs of an algorithm are, you can always craft an API wrapper, or series of API wrappers that help others make sense of those inputs, from a technical, business, or even political perspective. I just wanted to highlight this example of ML being wrapped with an API, even if it isn’t for all the same reasons that I would be doing it. It’s just part of a larger toolbox I’m looking to create to help me make the argument for more algorithmic transparency in the machine learning platforms we are developing.


Algorithmia Invests More Resources Into Machine Learning APIs For Working With Video

I got my regular email from Algorithmia this last week and I like where they are going with some of their machine learning APIs. They have been heavily investing in machine learning applied to video, allowing for the extraction of information from video, as well as applying interesting transformations to your videos.

Here are some of the video tools they have been working on:

These are all things I’m interested in using as part of the drone and other video work I’ve been doing as a hobby. I’m interested in the video pipeline aspect because it’s fun to work with the video I capture, but I also see the potential when it comes to drones in agriculture and mining, and I am also curious about the business models associated with this type of video pipeline. I think video, images, plus APIs, coupled with the API monetization strategy Algorithmia already has in place, is their formula for success.

I’m keeping an eye on what Amazon, Google, and Microsoft are up to, but I think Algorithmia has a first mover advantage when it comes to the economics of all of this. I’m glad they are investing more into their video resources. I think there are endless uses for API-driven pipelines that process images and video, apply machine learning models using APIs, and are then metered and made available via an algorithmic catalog like the one Algorithmia offers.


Exploring The Economics of Wholesale and Retail Algorithmic APIs

I got sucked into a month-long project applying machine learning filters to video over the holidays. The project began with me doing research on the economics behind Algorithmia's machine learning services, specifically the DeepFilter algorithm in their catalog. My algorithmic rotoscope work applying Algorithmia's DeepFilter to images and drone videos has given me a hands-on view of Algorithmia's approach to algorithms and APIs, and the opportunity to think pretty deeply about the economics of all of this. I think Algorithmia's vision has a lot of potential for not just image filters, but any sort of algorithmic and machine learning API.

Retail Algorithmic and Machine Learning APIs
Using Algorithmia is pretty straightforward. With their API or CLI you can make calls to a variety of algorithms in their catalog, in this case their DeepFilter solution. All I do is pass them the URL of an image, what I want the new filtered image to be called, and the name of the filter that I want to be applied. Algorithmia provides an API explorer you can copy & paste the required JSON into, or they also provide a demo application for you to use--no JSON required. 
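
For reference, here is a minimal Python sketch of that request using only the standard library. It assumes Algorithmia's REST conventions (a `v1/algo/<owner>/<algorithm>/<version>` path and an `Authorization: Simple <key>` header) and DeepFilter input fields named `images`, `savePaths`, and `filterName`; the version number, file paths, and the `gan_vogh` filter name are illustrative, and the hosted service may no longer be available, so treat every identifier here as an assumption. The function only builds the request; sending it would be `urllib.request.urlopen(req)`.

```python
import json
import urllib.request

# Assumptions: base URL, "Simple" auth scheme, algorithm path/version, and
# DeepFilter field names; verify against current documentation before use.
API_BASE = "https://api.algorithmia.com/v1/algo"
ALGORITHM = "deeplearning/DeepFilter/0.6.0"


def build_deepfilter_request(api_key: str, image_url: str,
                             save_path: str, filter_name: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking DeepFilter to style one image."""
    payload = {
        "images": [image_url],      # source image to filter
        "savePaths": [save_path],   # what the filtered copy should be called
        "filterName": filter_name,  # which style transfer model to apply
    }
    return urllib.request.Request(
        f"{API_BASE}/{ALGORITHM}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Simple {api_key}"},
        method="POST",
    )
```

The three payload fields line up with the three things described above: the image URL, the name of the new filtered image, and the filter to apply.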

Training Your Own Style Transfer Models Using Their AWS AMI
The first "rabbit hole" concept I fell into when doing the research on Algorithmia's model was their story on creating your own style transfer models, which provides step by step details on how to train them, including a ready-to-go AWS AMI that you can run as a GPU instance. At first, I thought they were just cannibalizing their own service, but then I realized it was much savvier than that. They were offloading much of the costly compute needed to create the models, but the end product still resulted in using their DeepFilter APIs.

Developing My Own API Layer For Working With Images and Videos
Once I had experience using Algorithmia's DeepFilter via their API, and had produced a handful of my own style transfer models, I got to work designing my own process for uploading and applying the filters to images, then eventually separating videos into individual images, applying the filters, and reassembling them into videos. The entire process, start to finish, is a set of APIs, with a couple of them simply acting as a facade for Algorithmia's file upload, download, and DeepFilter APIs. It provided me with a perfect hypothetical business for thinking through the economics of building on top of Algorithmia's platform.
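
The split, filter, and reassemble steps can be sketched in Python by shelling out to `ffmpeg`. The `-vf fps=N` frame extraction and `-framerate` / `libx264` re-encoding flags are standard ffmpeg usage, but the directory layout, the `frame_%05d.png` naming, and the `apply_filter` callable (which might call a DeepFilter facade) are hypothetical, and this sketch drops the audio track.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

FRAME_PATTERN = "frame_%05d.png"  # hypothetical naming scheme for extracted frames


def split_command(video: str, frames_dir: str, fps: int) -> List[str]:
    """ffmpeg command that samples `fps` frames per second into numbered PNGs."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
            str(Path(frames_dir) / FRAME_PATTERN)]


def join_command(frames_dir: str, output: str, fps: int) -> List[str]:
    """ffmpeg command that stitches numbered PNGs back into an H.264 video (no audio)."""
    return ["ffmpeg", "-framerate", str(fps), "-i", str(Path(frames_dir) / FRAME_PATTERN),
            "-c:v", "libx264", "-pix_fmt", "yuv420p", output]


def rotoscope(video: str, output: str, fps: int,
              apply_filter: Callable[[Path, Path], None], workdir: str = "work") -> None:
    """Split, filter every frame via `apply_filter(src, dst)`, then reassemble."""
    raw, filtered = Path(workdir) / "raw", Path(workdir) / "filtered"
    raw.mkdir(parents=True, exist_ok=True)
    filtered.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_command(video, str(raw), fps), check=True)
    for frame in sorted(raw.glob("frame_*.png")):
        # Keep the same file name so the numbered pattern still lines up.
        apply_filter(frame, filtered / frame.name)
    subprocess.run(join_command(str(filtered), output, fps), check=True)
```

Each `apply_filter` call corresponds to one filtered image, so in an Algorithmia-backed pipeline it is also one metered API call, which is why per-frame pricing dominates the economics.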

Defining My Hard Costs of Algorithmia's Service and the AWS Compute Needed
Algorithmia provides a pricing calculator along with each of their algorithms, allowing you to easily predict your costs. They charge you per API call, plus compute usage by the second. Each API has its own calculator and average runtime duration costs, so I'm easily able to calculate a per image cost to apply filters--something that multiplies quickly when you are applying filters to 60 frames (images) for every second of video. Similarly, when it comes to training filter models using an AWS EC2 GPU instance, I have a per hour charge for compute, storage costs, and (now) a pretty good idea of how many hours it takes to make a single filter.
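
The arithmetic behind those hard costs is simple enough to sketch. This is a minimal Python sketch with made-up rates (not Algorithmia's or AWS's actual prices) and hypothetical function names; it shows that because every frame is its own API call, the cost of a clip scales linearly with duration times frame rate.

```python
def frame_cost(price_per_call: float, price_per_second: float, avg_runtime_s: float) -> float:
    """Cost of filtering one image: flat per-call fee plus metered compute time."""
    return price_per_call + price_per_second * avg_runtime_s


def video_cost(duration_s: float, fps: int, cost_per_frame: float) -> float:
    """Every frame is a separate API call, so cost scales with duration * fps."""
    return duration_s * fps * cost_per_frame


def model_training_cost(gpu_hours: float, gpu_hourly_rate: float, storage_cost: float) -> float:
    """One-off cost of training a single style transfer model on a GPU instance."""
    return gpu_hours * gpu_hourly_rate + storage_cost


# Illustrative, made-up rates.
per_frame = frame_cost(price_per_call=0.001, price_per_second=0.0001, avg_runtime_s=2.0)
one_minute_clip = video_cost(duration_s=60, fps=60, cost_per_frame=per_frame)
one_filter = model_training_cost(gpu_hours=8, gpu_hourly_rate=0.90, storage_cost=1.00)
```

At these invented rates a frame costs $0.0012, and a one-minute clip at 60 fps is 3,600 calls, or about $4.32.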

All of this gives me some pretty solid numbers to work with when trying to build a viable business on top of Algorithmia. In theory, when my customers use my algorithmic rotoscope image or video interface, as well as the API, I can cover my operating costs and generate a healthy profit by charging a per image cost for applying a machine learning texture filter. What I really think is innovative about Algorithmia's approach is that they provide an AWS AMI to offload much of the "heavy compute lifting", with all roads still leading back to using their service. It is a model that could quickly shift algorithmic API consumers from being just retail-level API consumers to being wholesale / volume consumers.
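
Turning those costs into a per image price can be sketched the same way. This is a minimal, hypothetical cost-plus model in Python: `fixed_monthly` stands in for hosting plus amortized model training, `markup` is an assumed margin, and none of the numbers come from Algorithmia.

```python
import math


def price_per_image(cost_per_image: float, fixed_monthly: float,
                    expected_images: int, markup: float) -> float:
    """Cost-plus price: variable cost, plus fixed costs spread over volume, plus markup."""
    unit_cost = cost_per_image + fixed_monthly / expected_images
    return unit_cost * (1 + markup)


def break_even_images(price: float, cost_per_image: float, fixed_monthly: float) -> int:
    """Images per month needed before per-image margin covers fixed costs."""
    margin_per_image = price - cost_per_image
    if margin_per_image <= 0:
        raise ValueError("price must exceed the per-image cost")
    return math.ceil(fixed_monthly / margin_per_image)
```

Raising `expected_images`, the wholesale or volume scenario, lowers the share of fixed cost each image carries, which is one way to see why volume consumers are attractive on both sides of the transaction.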

My example of this focuses on images and video, but this model can be applied to any type of algorithmically fueled API. It provides me with a model of how you can safely open source the process behind your algorithms as an AWS AMI and actually drive more business to your APIs by evolving your API consumers into wholesale API consumers. In my experience, many API providers are very concerned with malicious users reverse engineering their algorithms via their APIs, when in reality, in true API fashion, there are ways you can actually open up your algorithms and make them more accessible and deployable, while still contributing significantly to your bottom line.


Pushing For More Algorithmic Transparency Using APIs

I saw the potential for collaboration when it came to using web APIs back around 2004 and 2005. I was seeing innovative companies opening up their digital assets to the world using low-cost, efficient Internet technology like HTTP, opening things up for potentially interesting approaches to collaboration around the development of web and mobile applications on top of valuable digital resources. This approach has brought us valuable platforms like Amazon Web Services and Salesforce.

Common API discussions tend to focus on providing APIs to an ecosystem of developers and encouraging the development of web and mobile applications, widgets, visualizations, and other integrations that benefit the platform. In the course of these operations, it is also customary to gather feedback from the community and work to evolve the API's design, available resources, and even the underlying data model--extending collaboration to also be about the APIs, and underlying resources, in addition to just building things on top of the API.

This approach to designing, defining, and deploying APIs, and then also web and mobile applications on top of these APIs, is nothing new, and is something that I have been tracking for the last six years. The transparency that APIs can inject into the evolution of data, content, and potentially the "algorithms behind" is significant, which is how it became such a big part of my professional mission, fueling my drive to spread the "gospel" whenever and wherever I can.

Ok, so how can APIs contribute to algorithmic transparency? To fully grasp where I am taking this, you need to understand that APIs can be used as an input and output for data, content, as well as algorithms. Let's use Twitter as an example. Using Twitter and the Twitter API I can read and write data about myself, or any user, using the /account and /users API endpoints--providing the content and data portion of what I am talking about.

When it comes to the algorithm portion, the Twitter API has several methods, such as GET statuses/user_timeline, GET statuses/home_timeline, and GET search/tweets, which return a "timeline of Tweet data". In 2006 this timeline was just the latest Tweets from the users you follow, in sequential order. In 2016, you will get "content powered by a variety of signals". In short, the algorithm that drives the Twitter timeline is pretty complicated, with a number of things to consider:

  • Your home timeline displays a stream of Tweets from accounts you have chosen to follow on Twitter. 
  • You may see suggested content powered by a variety of signals. 
  • Tweets you are likely to care about most will show up first in your timeline. 
  • You may see a summary of the most interesting Tweets you received since your last visit.
  • You may also see content such as promoted Tweets or Retweets in your timeline.
  • Additionally, when we identify a Tweet, an account to follow, or other content that's popular or relevant, we may add it to your timeline.

There are a number of considerations that go into any one timeline response: this is Twitter's algorithm. While I technically have access to this algorithm via three separate API endpoints, there really isn't much algorithmic transparency present beyond their overview in the support section. Most companies are going to claim this is their secret sauce and their intellectual property. That is fine, I don't have a problem with y'all being secretive about this, even though I will always push you to be more open, as well as to leave the API layer out of the patents you use to protect your algorithms.

Algorithmic transparency with APIs is not something that should be applied to all APIs in my opinion, but for regulated industries, and truly open API solutions, transparency can go a long way and bring a number of benefits. All Twitter (and any other API provider) has to do is add parameters, and corresponding that open up the variables of the underlying algorithm for each endpoint. What goes into deciding "what I care about", what constitutes "interesting", and what makes things "popular or relevant"? Twitter will never do this, but other API providers can.
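
What exposing those variables as parameters might look like, as a minimal Python sketch: the caller can override signal weights, and the response echoes the weights used plus each item's per-signal contribution. The signal names, default weights, and `home_timeline` function are hypothetical and are not Twitter's actual ranking algorithm.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical signals and default weights; NOT Twitter's real algorithm.
DEFAULT_WEIGHTS: Dict[str, float] = {"recency": 1.0, "engagement": 0.5, "affinity": 2.0}


@dataclass
class Tweet:
    id: str
    signals: Dict[str, float] = field(default_factory=dict)  # each normalized to 0..1


def home_timeline(tweets: List[Tweet],
                  weights: Optional[Dict[str, float]] = None) -> Dict[str, Any]:
    """Rank tweets, returning the weights used and why each item scored as it did."""
    unknown = set(weights or {}) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    ranked = []
    for tweet in tweets:
        contributions = {name: w[name] * tweet.signals.get(name, 0.0) for name in w}
        ranked.append({"id": tweet.id,
                       "score": sum(contributions.values()),
                       "contributions": contributions})
    ranked.sort(key=lambda item: item["score"], reverse=True)
    return {"weights": w, "timeline": ranked}
```

Echoing the weights and contributions in every response is the transparency part: a consumer can see that a Tweet ranked first because of "affinity", and can turn that signal down to test the claim.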

It is up to each API provider to decide how transparent they are going to be with their algorithms. The ideal solution when it comes to transparency is that the algorithm is documented and shared along with supporting code on GitHub, like Chicago did for its food inspection algorithm. This opens up the algorithm, and the code behind it, for evaluation by 3rd parties, potentially improving upon it, as well as validating the logic behind it--potentially opening up a conversation about the life of the algorithm.

There are a number of common reasons I have seen for companies and developers not opening up their algorithms:

  • It truly is secret sauce, and too much was invested to just share with the world.
  • It is crap, and the creator doesn't want anyone to know there is nothing behind it.
  • There are malicious things going on behind the scenes that they do not want to be public.
  • Insecurities about coding abilities, security practices and logic applied to the algorithms.
  • It exists in a competitive space with lots of bad actors, and they may want to limit this behavior.
  • What is accomplished isn't really that defensible, and the only advantage is to keep hidden.

I have no problem making an argument for algorithmic transparency when it comes to regulated industries, like financial, healthcare, and education. I think it should be the default in all civic, non-profit, and other similar scenarios where the whole stack should just be open sourced and available on GitHub. You won't find me pushing back too hard on the startups unless I see some wild claims about the magic behind them, or I see evidence of exploitation; then you will hear me rant about this some more.

Algorithmic transparency can help limit algorithmic exploitation and the other shady shit that is going on behind the scenes on a regular basis these days. I have added an algorithm section to my research, and as I see more talk about the magic of algorithms, and how these amazing creations are changing our world--I am going to be poking around a bit, and probably asking to see more algorithmic transparency when I think it makes sense.


The Opportunity For API-Driven Algorithmic Transparency At The Mobile Data Plan Level

API Evangelist is focused on helping push for sensible API-driven transparency wherever I can get it. When done in sensible ways an API can crack open the often black box that is the algorithm, giving us access and more control over our online experience.

One of the most significant algorithmic bottlenecks that govern our daily lives is our mobile data plans. All of our mobile phones are governed at the data plan level--this is where the telecom companies make their money, throttling the bits and bytes we depend on each day. 

The mobile data plan is a great place to discuss the algorithmic and data transparency that APIs can assist with, and one example of this in action is the Google Mobile Data Plan API. Google wants more access at this level to improve the quality of experience for end-users when using mobile applications like YouTube, which can be severely impacted by data plan limits, while also significantly impacting your data plan consumption if not optimized.
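
To show how an app might put data plan information to work, here is a minimal Python sketch that picks a video quality from a plan status. The `plan` dictionary fields (`remaining_bytes`, `days_left`, `unlimited`), the bitrate ladder, and the one-hour-per-day viewing default are hypothetical; they are not the schema of Google's Mobile Data Plan API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical bitrate ladder, best quality first, in kilobits per second.
LADDER_KBPS: List[Tuple[str, int]] = [("1080p", 5000), ("720p", 2500), ("480p", 1000), ("240p", 300)]


def bytes_per_hour(kbps: int) -> float:
    """Convert a bitrate in kilobits per second to bytes per hour of playback."""
    return kbps * 1000 / 8 * 3600


def pick_quality(plan: Dict[str, Any], viewing_hours_per_day: float = 1.0) -> str:
    """Choose the best rung whose expected daily use fits the remaining daily budget."""
    if plan.get("unlimited"):
        return LADDER_KBPS[0][0]
    days_left = max(plan["days_left"], 1)
    daily_budget = plan["remaining_bytes"] / days_left
    for label, kbps in LADDER_KBPS:
        if bytes_per_hour(kbps) * viewing_hours_per_day <= daily_budget:
            return label
    # Nothing fits: fall back to the lowest rung rather than refusing to play.
    return LADDER_KBPS[-1][0]
```

With 3 GB left and two days to go, the daily budget is 1.5 GB; an hour of 1080p at 5,000 kbps is 2.25 GB, so the sketch steps down to 720p (about 1.125 GB per hour).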

There is so much opportunity for discussion between mobile network operators, API providers, developers, and consumers at the data plan level. I know mobile network operators would rather keep this a black box so they can maximize their revenue, but when you crack the network layer open with a publicly available API, there will be a number of new revenue opportunities.

Data plans affect all of us, every single day. We need more transparency into the algorithms that meter, limit, and charge us at the mobile data plan layer. We need the platforms we depend on each day to have more tools to optimize how applications consume (or do not consume) this extremely (seemingly) finite, and valuable resource (thanks, telcos!). 

I've added an algorithms area to my research to keep an eye on this topic, curate stories I find, and share my own thoughts when it comes to algorithmic transparency using APIs.


Keeping A Window Open Into How Power Flows Within Algorithms Using APIs

I just read The Pill versus the Bomb: What Digital Technologists Need to Know About Power, by Tom Steinberg (@steiny), and I'm reminded of the important role APIs will (hopefully) continue to play in helping provide a transparent window into some of the power structures being coded into the algorithms we are increasingly relying on in this digital world we are crafting.

In this century, we are seeing a huge shift in how power flows, and despite the rhetoric of some of the Silicon Valley believers, this power isn't always being democratized along the way. Many of the older power structures are just being re-inscribed into the algorithms that drive network switches, decide pricing when we purchase online, run our online banking, and shape virtually every other aspect of our personal and business worlds.

APIs give us a window into how these algorithms work, providing access to 3rd party developers, government regulators, journalists, and many other essential actors across our society and economy. Don't get me wrong, APIs are no magic pill, or nuclear bomb, when it comes to making algorithmic power flows more transparent and equitable, but when they are done right, they can have a significant effect.

If APIs are a complete (or near complete) representation of the algorithms that are driving platforms, they can be used to better understand how decisions behind the algorithmic curtain are made, and exactly how power is flowing (or not) on web, mobile, and increasingly connected device platforms. An API does not equal perfect transparency, but it will help prevent all algorithms from being black boxes.

We may not fully understand Uber's business motivations, but through their API we can test our assumptions. We may not always trust Facebook's advertising algorithm, but using the API we can develop models for better understanding why they serve the ads they do. Drone operators may not always have the best intentions, but through mandatory device APIs, we can log flight times and locations. These are just a handful of examples of how APIs can be used to map out digital power.
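
Testing assumptions through an API often takes the form of paired queries: send inputs that differ in only one attribute and compare what comes back. Here is a minimal Python sketch of that audit pattern; `query` stands in for a call to a real API such as a price estimate endpoint, and the attribute names and toy pricing function are hypothetical.

```python
from typing import Any, Callable, Dict, List


def paired_audit(query: Callable[[Dict[str, Any]], float],
                 base_profiles: List[Dict[str, Any]],
                 attribute: str, value_a: Any, value_b: Any) -> Dict[str, Any]:
    """Query each profile twice, changing only `attribute`, and summarize the gaps."""
    if not base_profiles:
        raise ValueError("need at least one base profile")
    gaps = []
    for profile in base_profiles:
        result_a = query({**profile, attribute: value_a})
        result_b = query({**profile, attribute: value_b})
        gaps.append(result_b - result_a)
    return {
        "pairs": len(gaps),
        "mean_gap": sum(gaps) / len(gaps),
        "largest_gap": max(gaps, key=abs),
    }


# Toy stand-in for a remote pricing API (hypothetical logic).
def fake_price(profile: Dict[str, Any]) -> float:
    surcharge = 2.0 if profile["neighborhood"] == "downtown" else 0.0
    return 10.0 + surcharge + profile["distance_km"]


report = paired_audit(fake_price, [{"distance_km": 1.0}, {"distance_km": 5.0}],
                      "neighborhood", "suburb", "downtown")
```

Because every pair holds distance constant, the consistent 2.0 gap in `report` isolates the effect of the neighborhood attribute, which is the kind of assumption an outside observer could only test if the API exists.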

All of this is one of the main reasons that I do API Evangelist. I feel like we have a narrow window of opportunity to help ensure APIs act as this essential transparent layer for ALL API operations across industries. As the established power structures (eye of Sauron) turn their attention to the web, and increasingly to APIs, the transparency that APIs provide is being diminished. It is up to us API Evangelists to help make sure APIs stay publicly available to 3rd party developers, government, journalists, end users, and other key players--providing much needed transparency into how algorithms work, and how power is flowing on the web and mobile Internet.


If you think there is a link I should have listed here, feel free to tweet it at me or submit it as a GitHub issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.