RSS

Algorithm & APIs News

These are the news items I've curated in my monitoring of the API space that have some relevance to the algorithm conversation and I wanted to include in my research. I'm using all of these links to better understand how the space is testing their APIs, going beyond just monitoring and understand the details of each request and response.

APIs And Other Ways Of Serving Up Machine Learning Models

As with most areas of the tech sector, behind the hype there are real world things going on, and machine learning is one area I’ve been studying, learning, and playing withd what is actually possible when it comes to APIs. I’ve been studying the approach across each of the major cloud platforms, including AWS, Azure, and Google t push forward my understanding of the ML landscape. Recently the Google Tensorflow team released an interesting overview of how they are serving up Tensorflow models, making machine learning accessible across a wide variety of use cases. Not all of these are API specific, but I do think they are should be considered equally as part of the wider machine learning (ML) application programming interface (API) delivery approach.

Over the past year and half, with the help of our users and partners inside and outside of Google, TensorFlow Serving has advanced performance, best practices, and standards for ML delivery:

  • Out-of-the-box Optimized Serving and Customizability - We now offer a pre-built canonical serving binary, optimized for modern CPUs with AVX, so developers don’t need to assemble their own binary from our libraries unless they have exotic needs. At the same time, we added a registry-based framework, allowing our libraries to be used for custom (or even non-TensorFlow) serving scenarios.
  • Multi-model Serving - Going from one model to multiple concurrently-served models presents several performance obstacles. We serve multiple models smoothly by (1) loading in isolated thread pools to avoid incurring latency spikes on other models taking traffic; (2) accelerating initial loading of all models in parallel upon server start-up; (3) multi-model batch interleaving to multiplex hardware accelerators (GPUs/TPUs).
  • Standardized Model Format - We added SavedModel to TensorFlow 1.0, giving the community a single standard model format that works across training and serving.
  • Easy-to-use Inference APIs - We released easy-to-use APIs for common inference tasks (classification, regression) that we know work for a wide swathe of our applications. To support more advanced use-cases we support a lower-level tensor-based API (predict) and a new multi-inference API that enables multi-task modeling.

I like that easy-to-use APIs are on the list. I predict that ease of use, and a sensible business model will provide to be just as important as the ML model itself. Beyond these core approaches, the Tensorflow team is also exploring two experimental areas of ML delivery:

  • Granular Batching - A key technique we employ to achieve high throughput on specialized hardware (GPUs and TPUs) is “batching”: processing multiple examples jointly for efficiency. We are developing technology and best practices to improve batching to: (a) enable batching to target just the GPU/TPU portion of the computation, for maximum efficiency; (b) enable batching within recursive neural networks, used to process sequence data e.g. text and event sequences. We are experimenting with batching arbitrary sub-graphs using the Batch/Unbatch op pair.
  • Distributed Model Serving - We are looking at model sharding techniques as a means of handling models that are too large to fit on one server node or sharing sub-models in a memory-efficient way. We recently launched a 1TB+ model in production with good results, and hope to open-source this capability soon.

I’m working to develop my own ML models, and actively using a variety of them available over at Algorithmia. I’m pushing forward a project to help me quantify and categorize ML solutions that I come across, and when they are delivered using an API, I am generating an OpenAPI, and applying a set of common tags to help me quantify, categorize, and index all the ML APIs. While I am focused on the technology, business, and politics of ML APIs, I’m interested in understanding how ML models are being delivered, so that I can help keep pace with exactly how APIs can augment these evolving practices, making ML more accessible, as well as observable.

We are in the middle of the algorithmic evolution of the API space, where we move beyond just data and content APIs, and where ML, AI, and other algorithmic voodoo is becoming more mainstream. I feel pretty strongly that APIs are an import piece of this puzzle, helping us quantify and understand what ML algorithms do, or don’t do. No matter how the delivery model for ML evolves, batches, distributes, or any other mutation, there should be an API layer for accessing, auditing, and shining a light on what is going on. This is why I regularly keep an eye on what teams like Tensorflow are up to, so that I can speak intelligently to what is happening on this evolving ML front.


Algorithmic Observability Should Work Like Machine Readable Food Labels

I’ve been doing a lot of thinking about algorithmic transparency, as well as a more evolved version of it I’ve labeled as algorithmic observability. Many algorithmic developers feel their algorithms should remain black boxes, usually due to intellectual property concerns, but in reality the reasons will vary. My stance is that algorithms should be open source, or at the very least have some mechanisms for auditing, assessing, and verifying that algorithms are doing what they promise, and that algorithms aren’t doing harm behind the scenes.

This is a concept I know algorithm owners and creators will resist, but algorithms observability should work like food labels, but work in a more machine readable way, allowing them to be validated by other external (or internal) systems. Similar to food you buy in the store, you shouldn’t have to give away the whole recipe and secret sauce behind your algorithm, but there should be all the relevant data points, inputs, outputs, and other “ingredients” or “nutrients” that go into the resulting algorithm. I talked about algorithm attribution before, and I think there should be some sort of algorithmic observability manifest, which provides the “label” for an algorithm in a machine readable format. It should give all the relevant sources, attribution, as well as input and outputs for an algorithm–with different schema for different industries.

In addition to there being an algorithmic observability “label” available for all algorithms, there should be live, or at least virtualized, sandboxed instances of the algorithm for verification, and auditing of what is provided on the label. As we saw with the Volkswagen emissions scandal, algorithm owners could cheat, but it would provide an important next step for helping us understand the performance, or lack of performance when it comes to the algorithms we are depending on. Why I call this algorithmic observability, instead of algorithmic transparency, is each algorithm should be observable using it’s existing inputs and outputs (API), and not just be a “window” you can look through. It should be machine readable, and audit-able by other systems in real time, and at scale. Going beyond just being able to see into the black box, but also be able to assess, and audit what is occurring in real time.

Algorithmic observability regulations would work similar to what we see with food and drugs, where if you make claims about your algorithms, they should have to stand up to scrutiny. Meaning there should be standardized algorithmic observability controls for government regulators, industry analysts, and even the media to step up and assess whether or not an algorithm lives up to the hype, or is doing some shady things behind the scenes. Ideally this would be something that technology companies would do on their own, but based upon my current understanding of the landscape, I’m guessing that is highly unlikely, and will be something that has to get mandated by government in a variety of critical industries incrementally. If algorithms are going to impacting our financial markets, elections, legal systems, education, healthcare, and other essential industries, we are going to have to begin the hard work of establishing some sort of framework to ensure they are doing what is being sold, and not hurting or exploiting people along the way.


Algorithmic Observability Should Work Like Machine Readable Food Labels

I’ve been doing a lot of thinking about algorithmic transparency, as well as a more evolved version of it I’ve labeled as algorithmic observability. Many algorithmic developers feel their algorithms should remain black boxes, usually due to intellectual property concerns, but in reality the reasons will vary. My stance is that algorithms should be open source, or at the very least have some mechanisms for auditing, assessing, and verifying that algorithms are doing what they promise, and that algorithms aren’t doing harm behind the scenes.

This is a concept I know algorithm owners and creators will resist, but algorithms observability should work like food labels, but work in a more machine readable way, allowing them to be validated by other external (or internal) systems. Similar to food you buy in the store, you shouldn’t have to give away the whole recipe and secret sauce behind your algorithm, but there should be all the relevant data points, inputs, outputs, and other “ingredients” or “nutrietns” that go into the resulting algorithm. I talked about algorithm attribution before, and I think there should be some sort of algorithmic observability manifest, which provides the “label” for an algorithm in a machine readable format. It should give all the relevant sources, attribution, as well as input and outputs for an algorithm–with different schema for different industries.

In addition to there being an algorithmic observability “label” available for all algorithms, there should be live, or at least virtualized, sandboxed instances of the algorithm for verification, and auditing of what is provided on the label. As we saw with the Volkswagen emissions scandal, algorithm owners could cheat, but it would provide an important next step for helping us understand the performance, or lack of performance when it comes to the algorithms we are depending on. Why I call this algorithmic observability, instead of algorithmic transparency, is each algorithm should be observable using it’s exsiting inputs and outputs (API), and not just be a “window” you can look through. It should be machine readable, and auditable by other systems in real time, and at scale. Going beyond just being able to see into the black box, but also be able to assess, and audit what is occurring in real time.

Algorthmic observability regulations would work similar to what we see with food and drugs, where if you make claims about your algorithms, they should have to stand up to scrutiny. Meaning there should be standardized algorithmic observability controls for government regulators, industry analysts, and even the media to step up and assess whether or not an algorithm lives up to the hype, or is doing some shady things behind the scenes. Ideally this would be something that technology companies would do on their own, but based upon my current understanding of the landscape, I’m guessing that is highly unlikely, and will be something that has to get mandated by government in a variety of critical industries incrementally. If algorithms are going to impacting our financial markets, elections, legal systems, education, healthcare, and other essential industries, we are going to have to begin the hardwork of establishing some sort of framework to ensure they are doing what is being sold, and not hurting or exploiting people along the way.


Publish, Share, Monetize Machine Learning APIs

I’ve been playing with Tensor Flow for over a year now, specifically when it comes to working with images and video, but it has been something that has helped me understand what things looks like behind the algorithmic curtain that seems to be part of a growing number of tech marketing strategies right now. Part of this learning is exploring beyond Google’s approach, who is behind Tensor Flow, and understand what is going on at AWS, as well as Azure. I’m stil getting my feet wet learning about what Microsoft is up to with their platform, but I did notice one aspect of the Azure Machine Learning Studio emphasized developers to, “publish, share, monetize” their ML models. While I’m sure there will be a lot of useless vapor ware being sold within this realm, I’m simply seeing it as the next step in API monetization, and specifically the algorithmic evolution of being an API provider.

As the label says in the three ML models for sale in the picture, this is all experimental. Nobody knows what will actually work, or even what the market will bear. However, this is something APIs, and the business of APIs excel at. Making a digital resource available to consumers in a retail, or even wholesale way via marketplaces like Azure and AWS, then playing around with features, pricing, and other elements, until you find the sweet spot. This is how Amazon figured out the whole cloud computing game, and became the leader. It is how Twilio, Stipe and other API as a product companies figured out what developers needed, and what these markets would bear. This will play out in marketplaces like Azure and Google, as well as startup players like Algorithmia–which is where I’ve been cutting my teeth, and learning about ML.

The challenge for ML API entrepreneurs will be helping consumers understand what their models do, or do not do. I see it as an opportunity, because there will be endless amounts of vapor ware, ML voodoo, and smoke and mirrors trying to trick consumers into buying something, as well as endless traps when it comes to keeping them locked in. If you are actually doing something interesting with ML, and it actually provides value in the business world, and you provide clear, concise, no BS language about what it does–you are going to do well. The challenge for you will be getting found in the mountains of crap that is emerging, and differentiating yourself from the smoke and mirrors that we are already seeing so much of. Another challenge you’ll face is navigating the vendor platform course set up by AWS, Google, and Azure as they battle it out for dominance–a game that many of us little guys will have very little power to change or steer.

It is a game that I will keep a close eye on. I’m even pondering publishing a handful of image manipulation models I’ve been working on. IDK. I do not think they are quite ready, and I’m not even entirely sure they are something I want widely used. I’m kind of enjoying using them in my own work, providing me with images I can use in my storytelling. I don’t think the ROI is there yet in the ML API game, and I’ll probably just keep being a bystander, and analyst on the sideline until I see just the right opportunity, or develop just the right model I think will stand out. After seven years of doing API Evangelist I’m pretty good at seeing through the BS, and I’m thinking this experience is going to come in handy in this algorithmic evolution of the API universe, where the magic of AI and ML put so many people under their spell.


Temporal Logic of Actions For APIs

I’m evolving forward my thoughts on algorithmic observability and transparency using APIs, and I was recently introduced to TLA+, or the Temporal Logic of Actions. It is the closest I’ve come to what I’m seeing in my head when I think about how we can provide observability into algorithms through existing external outputs (APIs). As I do with all my work here on API I want to process TLA+ as part of my API research, and see how I can layer it in with what I already know.

TLA+ is a formal specification language developed by Leslie Lamport, which can be used to design, model, document, and verify concurrent systems. It has been described as exhaustively-testable pseudocode which can provide a blueprint for software systems. In the context of design and documentation, TLA+ can be viewed as informal technical specifications. However, since TLA+ specifications are written in a formal language of logic and mathematics it can be used to uncover design flaws before system implementation is underway, and are amenable to model checking for finding all possible system behaviours up to some number of execution steps, and examines them for violations. TLA+ specifications use basic set theory to define safety (bad things won’t happen) and temporal logic to define liveness (good things eventually happen).

TLA+ specifications are organized into modules.Although the TLA+ standard is specified in typeset mathematical symbols, existing TLA+ tools use symbol definitions in ASCII, using several terms which require further definition:

  • State - an assignment of values to variables
  • Behaviour - a sequence of states
  • Step - a pair of successive states in a behavior
  • Stuttering Step - a step during which variables are unchanged
  • Next-State Rlation - a relation describing how variables can change in any step
  • State Function - an expression containing variables and constants that is not a next-state relation
  • State Predicate - a Boolean-valued state function
  • Invariant - a state predicate true in all reachable states
  • Temporal Formula - an expression containing statements in temporal logic

TLA+ is concerned with defining the correct system behavior, providing with a set of operators for working through what is going on, as well as working with data structures. There is tooling that has been developed to support TLA+ including an IDE, model checker, and proof system. It is all still substantially over my head, but I get what is going on enough to warrant moving forward, and hopefully absorbing more on the subject. As with most languages and specifications I come across it will just take some playing with, and absorbing the concepts at play, before things will come into focus.

I’m going to pick up some of my previous work around behavior driven assertions, and how assertions can be though of in terms of the business contracts APIs put forward, and see where TLA+ fits in. It’s all still fuzzy, but API assertions and TLA+ feels like where I want to go with this. I’m thinking about how we can wrap algorithms in APIs, write assertions for them, and validate across the entire surface area of an algorithm, or stack of API exposed algorithms using TLA+. Maybe I’m barking up the wrong tree, but if nothing else it will get me thinking more about this side of my API research, which will push forward my thoughts on algorithmic transparency, and audit-able observability.


The API Stack For Disrupting The World

I know people don’t understand why I’m so obsessed with APIs. Sometimes I ask the same question. When I began in 2010, it was 75% about my belief in the good that APIs can do, and 25% about pushing back on the bad things being done with APIs. In 2017, it is 15% about the good, and 85% about pushing back on the bad things that APIs can do. API driven platforms are being used for some pretty shady things these days, and increasingly they are a force for disruption, and not about making the world a better place.

With this in mind, I wanted to take a moment to highlight the API stack right now that is being used to disrupt the world around us. These are the APIs that have shifted the political landscape in the U.S., and are being used to replicate, automate, and scale this disruption around the world.

  • Facebook - The network effect is what brings the troublemakers to Facebook.They are on pace to have 2 billion active users. Something that has the potential to create quite a network effect when sharing stories and links, and when you seed that, target it, and grow it using the Facebook advertising engine–it makes for an excellent engine for disruption.
  • Twitter - Twitter is a different beast. Less of the mainstream population than Facebook enjoys, but still a sizable, and very public audience. You can use the Twitter engine to spin things up, get people sharing, do some of the same sharing of stories and links, seeding, targeting, and growing with advertising. Often times the viral nature will spread to Facebook on take on a life of its own.
  • Reddit - Now Reddit is entirely just an organic engine for disseminating information, which makes it great for propaganda, everything fake, and stoking the haters. The network effect that is Reddit, works very, very well will Twitter and Facebook, making for a perfect storm of virality that can spread like wildfire.
  • WordPress - WordPress is where the news, and other websites get published. Because WordPress is an open source solution, it can be installed anywhere. It can be installed as many times as you want, with no costs beyond your hosting. Each of those installations have an API, which allow you to easily publish across hundreds or thousands of installations. When you slap advertising on these beasts, and plant your seeds across Facebook, Twitter, and Reddit, you make for a pretty efficient propaganda machine.
  • Google - Google is the advertising engine for use on the WordPress sites. Google Adwords and Adsense are where disruptors buy the ads they need to plant seeds, but more importantly, it is how they generate revenue from the click throughs, and page views generated from the Facebook, Twitter, and Reddit network effects. Beyond advertising, the Google index provides another great way for the message to spread. All you have to do is play the SEO game, which is something that is greatly aided, and gamed, by the inbound traffic received from Facebook, Twitter, and Reddit.

This is a pretty simplistic snapshot of the API surface area that is being used to mess with our realities right now, but if you break down the number API resources available for each of these platforms, you begin to see all the knobs and dials that can be turned at scale, to disrupt the world. I’ll write another post that walks through all the paths and endpoints, but I want to look at the problem from the 100K view, and point the finger at APIs. Without them, such a small group of people wouldn’t be able to do such a huge amount of damage.

The key driver of disruption here is the advertising engine brought to the table by Google, Facebook, and Twitter, which allows troublemakers to sew the seeds of destruction, as well as target, and grow revenue that supports their efforts. Secondarily, the network effect of Facebook, Twitter, and Reddit allow for the larger demographics to be stirred up, and mobilized, and potentially completing the loop for very disruptive memes. When advertising alone can’t drum up the attention needed, the API driven bot armies on Twitter, Facebook, and Reddit pick up the slack, get to work voting up, liking, sharing, until each platforms algorithms get triggered, and do the rest of the work. Not every message will enjoy the virality necessary to do damage, but every once in awhile the disruptors will hit the big time, and there message will take on a life of its own, with each platform doing the rest of the work for them.

All of this would be possible without APIs. However, it would take armies of people to do. APIs are the bullhorn, the amplification, the automation needed to scale this type of disruption. In my next post I will publish a complete list of the knobs and dials that can be turned by the disruptors, to crank up the volume on their campaigns, and to direct their bot armies in support of a specific message, or to attack a target. All of the advertising and targeting engines for the platforms above have APIs, allow campaigns to be automated, and scaled for both spending, and generating revenue. WordPress APIs make the open source platform into a kind of printing press for the propaganda machine, with Facebook, Twitter, Google, and Reddit APIs acting as the messenger boys. APIs are the key element, that makes all of this seem so deafening at the moment.

APIs are not evil, nor are they good, or even neutral. They are just a tool. They only do what the platform operators design them to do, and what the API consumer decides to put it to work doing. This is what has made APIs so interesting to me. When you set this stage, the possibilities for innovation have been great. However, recent history has shown, when advertising is the main revenue engine for both the platforms and consumers, the API providers are more than willing to ignore and look the other way at what is going on. This has made for a perfect storm of disruption at a scale we’ve never seen before, and because of the technical complexity, it is something that most people don’t even see happening. They can’t see the strings. They can’t see how their world is being disrupted. They don’t see their role in it. They don’t understand that things are being amplified and algorithmically distorted–they just think that is the way the world is.


I Do Not Distort My Images With Machine Learning Because They Look Better

I have been playing with machine learning since the election. I started a side project I have called algorotoscope, which I started applying texture transfer machine learning algorithms to videos. I don’t have the budget I did around the holidays, so it has been reduced to just photos, but it is something I dedicate regular time to each week. Many of the photos I apply the filters to actually look better than the filtered images, but yet I keep doing it. Not because they look better or worse, but because I want to show how our world is increasingly being distorted with algorithms.

Historically, I often used Noun Project images in my stories, because it reflected the minimalist look of my website. After the 2016 presidential election things changed for me radically. It has been a build over the last several years, but during this election it became clear that we were going to be living in a permanent state of algorithmic distortion from here on out. Now, I am a poor to mediocre photographer, but I love taking photos, and playing with my drone and other video cameras. I enjoy using these photos in my storytelling, but I feel that the algorithmic filters I can apply to them add another dimension to my storytelling.

Most of the time the meaning behind is only something I will understand, but other times I will tell the story behind. Regardless of my approach I feel like algorithmically distorted images go well with my API storytelling. Not only are APIs being used to apply artificial intelligence and machine learning, but they are being used to algorithmically distort almost everything else in our lives, from our Twitter and Facebook walls, to the videos and images we see on Instagram, Snapchat, and Youtube. Even if my algorithmic distortion doesn’t convey the direct meaning I intended with each story I tell, and image I include, I think the regular reminder that algorithmic distortion exists is an important reminder for us all, and something that should be recognizable throughout our online experiences.

One thing that is different with my image texture transfers from the majority of platforms you are seeing is I am using Algorithmia’s Style Thief, which allows me to choose both the source of the filter, as well as the image I’m applying to. This gives me a wider range of which textures I’m transferring, and in my opinion, more control over what meaning gets extracted, transferred, and applied to each images. Also, 98% the images I’m filtering are my own, taken either on my iPhone, my Canon, Drone, or Osmo equipment. I’m slowly working to get my image act together so I can more efficient recall images. I’m also working to build a library of art, propaganda, and other work that I can borrow (steal) the textures from and apply to my work. I’m also working to maintain some level of attribution in this work, allowing me to cite where I derive filters, and recreate distortion that works for me.

Not sure where this is all going, but it is something I’ll keep playing with alongside my regular storytelling. For me, it is a vehicle for pushing my storytelling forward, while also providing a constant reminder for myself, and my readers about how APIs and algorithms are distorting everything we know today. It is something we have to remember, other wise I’m afraid we won’t be able to even tread water in this new environment we’ve created for ourselves.


I Do Not Fear AI, I Fear The People Doing AI

There is a lot of FUD out there when it comes to artificial intelligence (AI) and machine learning (ML). The tech press enjoy yanking people’s chain when it comes to the dangers of artificial intelligence. AI is coming for your jobs. AI is racist, sexist, and biased. AI will be lead to World War III. AI will secure and protect us from the bad out there. AI will be the source of all of our worries, and the solution to all of our worries. I’m interested in the storytelling around all of this, and I’m fascinated by the distracting quality of technology when it comes to absolving the humans behind of doing bad things.

We have the technology to make this black boxes more observability and accountable. The algorithms feeding us news, judging us in courtrooms, and deciding if we are insurable or a risk, can all be wrapped with APIs, and made more accountable. However, there are many human reasons why we don’t do this. Every AI out there can be held accountable, it isn’t rocket science. The technology exists to keep AI from hurting us, judging us, and impacting our lives in negative ways. However, it is the people behind who do not want it, otherwise their stories won’t work. Their stories won’t have the desired effect and control over our lives.

APIs are the layer being wielded for good and for bad on the Internet. Facebook, Twitter, and Reddit, all leverage APIs to be available on our mobile phones. APIs are how people automate, advertise, and fund their activities on their platforms. APIs are how AI and ML are being exposed, wielded, and leveraged. The technology is already there to make them more accountable, we just don’t have the human will to use the technology we have. There is more money to be made in telling wild stories about what is possible. Telling stories that make folks afraid, and in awe of what is possible with technology. APIs are used to tell you the stories, while also making the fire shoot from the stage, and the smoke and the mirrors operate, instead of helping us see, understand, and verify what is going on behind the scenes.

We rarely discuss the fact that AI isn’t coming for our jobs. It is the people behind the AI, at the companies developing, deploying, and operating AI that are coming for our jobs. AI, like APIs, are neither good, nor bad, nor neutral–they are a tool. They are technology, and anything they do is because of us humans. I don’t fear AI. I only fear the people doing AI. The people who tell the stories. The people who are believers. I don’t fear technology because I know we have the tools to do what is right, and hold the people who are using technology in bad ways accountable. I’m afraid because we don’t seem to have the will to look behind the curtain. We hold up many of the people telling stories about AI as visionaries, leaders, and truth tellers. I don’t fear AI, I only fear its followers.


Specialized Collections Of Machine Learning APIs Could Be Interesting

I was learning more about CODEX, from Algorithmia, their enterprise platform for deploying machine learning API collections on premise or in the cloud. Algorithmia is taking the platform in which their algorithmic marketplace is deployed on and making it so you can deploy it anywhere. I feel like this is where the algorithmic-centered API deployment is heading, potentially creating some very interesting, and hopefully specialized collections of machine learning APIs.

I talked about how the economics of what Algorithmia is doing interests me. I see the potential when it comes to supporting machine learning APIs that service an image or video processing pipeline–something I’ve enjoyed thinking about with my drone prototype. Drone is just one example of how specialized collections of machine learning APIs could become pretty valuable when they are deployed exactly where they are needed, either on-premise or in any of the top cloud platforms.

Machine learning marketplaces operated by the cloud giants will ultimately do fine because of their scale, but I think where the best action will be at is delivering curated, specialized machine learning models, tailored to exactly what people need, right where they need them–no searching necessary. I think recent moves by Google to put TensorFlow on mobile phones, and Apple making similar moves show signs of a future where our machine learning APIs are portable, operating on-premise, on-device, and on-network.

I see Algorithmia having two significant advantages right now. 1) they can deploy their marketplace anywhere, and 2) they have the economics, as well as the scaling of it figured out. Allowing for specialized collections of machine learning APIs to have the metering, and revenue generation engines built into them. Imagine a future where you can deploy and machine learning and algorithmic API stack within any company or institution, or the factory floor in an industrial setting, and out in the field in an agricultural or mining situation–processing environmental data, images, or video.

Exploring the possibilities with real world use cases of machine learning is something I enjoy doing. I’m thinking I will expand on my drone prototype and brainstorm other interesting use cases beyond just my drone video. Thinking about how I can develop prototype machine learning API collections, that could be used for a variety my content, data, image, or video side-projects. I think when it comes to machine learning I’m more interested in specialty collections over the general machine learning hype I”m seeing peddled in the mainstream right now.


Algorithmic Observability In Predictive Policing

As I study the world of APIs I am always on the lookout for good examples of APIs in action so that I can tell stories about them, and help influence the way folks do APIs. This is what I do each day. As part of this work, I am investing as much time as I can into better understanding how APIs can be used to help with algorithmic transparency, and helping us see into the black boxes that often are algorithms.

Algorithms are increasingly driving vital aspects of our world from what we see in our Facebook timelines, to whether or not we would commit a crime in the eyes of the legal system. I am reading about algorithms being used in policing in the Washington Monthly, and I learned about an important example of algorithmic transparency that I would like to highlight and learn more about. A classic argument regarding why algorithms should remain closed is centered around intellectual property and protecting the work that gives you your competitive advantage–if you share your secret algorithm, your competitors will just steal it. While discussing the predictive policing algorithm, Rebecca Wexler explores the competitive landscape:

But Perlin’s more transparent competitors appear to be doing just fine. TrueAllele’s main rival, a program called STRmix, which claims a 54 percent U.S. market share, has an official policy of providing defendants access to its source code, subject to a protective order. Its developer, John Buckleton, said that the key to his business success is not the code, but rather the training and support services the company provides for customers. “I’m committed to meaningful defense access,” he told me. He acknowledged the risk of leaks. “But we’re not going to reverse that policy because of it,” he said. “We’re just going to live with the consequences.”

And remember PredPol, the secretive developer of predictive policing software? HunchLab, one of PredPol’s key competitors, uses only open-source algorithms and code, reveals all of its input variables, and has shared models and training data with independent researchers. Jeremy Heffner, a HunchLab product manager, and data scientist explained why this makes business sense: only a tiny amount of the company’s time goes into its predictive model. The real value, he said, lies in gathering data and creating a secure, user-friendly interface.

In my experience, the folks who want to keep their algorithms closed are simply wanting to hide incompetence and shady things going on behind the scenes. If you listen to individual companies like Predpol, it is critical that algorithms stay closed, but if you look at the wider landscape you quickly realize this is not a requirement to stay competitive. There is no reason that all your algorithms can’t be wrapped with APIs, providing access to the inputs and outputs of all the active parts. Then using modern API management approaches these APIs can be opened up to researchers, law enforcement, government, journalists, and even end-users who are being impacted by algorithmic results, in a secure way.

I will be continuing to profile the algorithms being applied as part of predictive policing, and the digital legal system that surrounds it. As with other sectors where algorithms are being applied, and APIs are being put to work, I will work to find positive examples of algorithmic transparency like we are seeing from STRmix and HunchLab. I’d like to learn more about their approach to ensuring observability around these algorithms, and help showcase the benefits of transparency and observability of these types of algorithms that are impacting our worlds–helping make sure everyone knows that black box algorithms are a thing of the past, and the preferred approach of snake oil salesman.


Algorithmic Observability In Predictive Policing

As I study the world of APIs I am always on the lookout for good examples of APIs in action so that I can tell stories about them, and help influence the way folks do APIs. This is what I do each day. As part of this work, I am investing as much time as I can into better understanding how APIs can be used to help with algorithmic transparency, and helping us see into the black boxes that often are algorithms.

Algorithms are increasingly driving vital aspects of our world from what we see in our Facebook timelines, to whether or not we would commit a crime in the eyes of the legal system. I am reading about algorithms being used in policing in the Washington Monthly, and I learned about an important example of algorithmic transparency that I would like to highlight and learn more about. A classic argument regarding why algorithms should remain closed is centered around intellectual property and protecting the work that gives you your competitive advantage–if you share your secret algorithm, your competitors will just steal it. While discussing the predictive policing algorithm, Rebecca Wexler explores the competitive landscape:

But Perlin’s more transparent competitors appear to be doing just fine. TrueAllele’s main rival, a program called STRmix, which claims a 54 percent U.S. market share, has an official policy of providing defendants access to its source code, subject to a protective order. Its developer, John Buckleton, said that the key to his business success is not the code, but rather the training and support services the company provides for customers. “I’m committed to meaningful defense access,” he told me. He acknowledged the risk of leaks. “But we’re not going to reverse that policy because of it,” he said. “We’re just going to live with the consequences.”

And remember PredPol, the secretive developer of predictive policing software? HunchLab, one of PredPol’s key competitors, uses only open-source algorithms and code, reveals all of its input variables, and has shared models and training data with independent researchers. Jeremy Heffner, a HunchLab product manager, and data scientist explained why this makes business sense: only a tiny amount of the company’s time goes into its predictive model. The real value, he said, lies in gathering data and creating a secure, user-friendly interface.

In my experience, the folks who want to keep their algorithms closed are simply wanting to hide incompetence and shady things going on behind the scenes. If you listen to individual companies like Predpol, it is critical that algorithms stay closed, but if you look at the wider landscape you quickly realize this is not a requirement to stay competitive. There is no reason that all your algorithms can’t be wrapped with APIs, providing access to the inputs and outputs of all the active parts. Then using modern API management approaches these APIs can be opened up to researchers, law enforcement, government, journalists, and even end-users who are being impacted by algorithmic results, in a secure way.

I will be continuing to profile the algorithms being applied as part of predictive policing, and the digital legal system that surrounds it. As with other sectors where algorithms are being applied, and APIs are being put to work, I will work to find positive examples of algorithmic transparency like we are seeing from STRmix and HunchLab. I’d like to learn more about their approach to ensuring observability around these algorithms, and help showcase the benefits of transparency and observability of these types of algorithms that are impacting our worlds–helping make sure everyone knows that black box algorithms are a thing of the past, and the preferred approach of snake oil salesman.


API Wrappers To Help Bring Machine Learning Into Focus

I was taking a look at the Tensorflow Object Detection API, and while I am interested in the object detection, the usage of API is something I find more intriguing. It is yet another example of how diverse APIs can be. This is not a web API, but an API on top of a single dimension of the machine learning platform TensorFlow.

“The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.” It is just a specialized code base helping abstract away the complexity of one aspect of using TensorFlow, specifically for detecting objects in images. You could actually wrap this API with another web API and run on any server or within a single container as a proper object recognition API.

For me, it demonstrates one possible way of wrapping a single or cross section of a machine learning implementation to abstract away the complexity and helping you train and deploy ML models in this particular area. This approach to deploying an API on top of ML shows that you can use to APIs to help simplify and abstract ML for developers. This can be done to help satisfy business, regulatory, privacy, security, and other real or perceived concerns when it comes to artificial intelligence, machine learning, or any other digital voodoo that resembles magic.

No matter how complex the inputs and outputs of an algorithm are, you can always craft an API wrapper, or series of API wrappers that help others make sense of those inputs, from a technical, business, or even political perspective. I just wanted to highlight this example of ML being wrapped with an API, even if it isn’t for all the same reasons that I would be doing it. It’s just part of a larger toolbox I’m looking to create to help me make the argument for more algorithmic transparency in the machine learning platforms we are developing.


Algorithmia Invests More Resources Into Machine Learning APIs For Working With Video

I got my regular email from Algorithmia this last week and I like where they are going with some of their machine learning APIs. They have been heavily investing in machine learning applied to video, allowing for the extraction of information from video, as well as applying interesting transformations to your videos.

Here are some of the video tools they have been working on:

These are all things I’m interested in using as part of my drone and other video work that I’ve been working on as a hobby. I’m interested in the video pipeline aspect because it’s fun to work with the video I capture, but I also see the potential when it comes to drones in agriculture and mining, and I am also curious the business models associated with this type of a video pipeline. I think video, images, plus APIs, coupled with the API monetization strategy Algorithmia already has in place is their formula for success.

I’m keeping an eye on what Amazon, Google, and Microsoft are up to, but I think Algorithmia has a first mover advantage when it comes to the economic of all of this. I’m glad they are investing more into their video resources. I think there are endless uses for API-driven pipelines that process images and video and apply machine learning models using APIs, then metered, and made available via an algorithmic catalog like Algorithmia offers.


Exploring The Economics of Wholesale and Retail Algorithmic APIs

I got sucked into a month long project applying machine learning filters to video over the holidays. The project began with me doing the research on the economics behind Algorithmia's machine learning services, specifically the DeepFilter algorithm in their catalog. My algorithmic rotoscope work applying Algorithmia's Deep Filters to images and drone videos has given me a hands-on view of Algorithmia's approach to algorithms, and APIs, and the opportunity to think pretty deeply about the economics of all of this. I think Algorithmia's vision of all of this has a lot of potential for not just image filters, but any sort of algorithmic and machine learning API.

Retail Algorithmic and Machine Learning APIs
Using Algorithmia is pretty straightforward. With their API or CLI you can make calls to a variety of algorithms in their catalog, in this case their DeepFilter solution. All I do is pass them the URL of an image, what I want the new filtered image to be called, and the name of the filter that I want to be applied. Algorithmia provides an API explorer you can copy & paste the required JSON into, or they also provide a demo application for you to use--no JSON required. 

Training Your Own Style Transfer Models Using Their AWS AMI
The first "rabbit hole" concept I fell into when doing the research on Algorithmia's model was their story on creating your own style transfer models, providing you step by step details on how to train them, including a ready to go AWS AMI that you can run as a GPU instance. At first, I thought they were just cannibalizing their own service, but then I realized it was much more savvier than that. They were offloading much of the costly compute resources needed to create the models, but the end product still resulted in using their Deep Filter APIs. 

Developing My Own API Layer For Working With Images and Videos
Once I had experience using Algorithmia's deep filter via their API, and had produced a handful of my own style transfer models, I got to work designing my own process for uploading and applying the filters to images, then eventually separating out videos into individual images, applying the filters, then reassembling them into videos. The entire process, start to finish is a set of APIs, with a couple of them simply acting as a facade for Algorithmia's file upload, download, and DeepFilter APIs. It provided me with a perfect hypothetical business for thinking through the economics of building on top of Algorithmia's platform.

Defining My Hard Costs of Algorithmia's Service and the AWS Compute Needed
Algorithmia provides a pricing calculator along with each of their algorithms, allowing you to easily predict your costs. They charge you per API call, and the compute usage by the second. Each API has its own calculator, and average runtime duration costs, so I'm easily able to calculate a per image cost to apply filters--something that exponentially grows when you are applying to 60 frames (images) per second of video. Similarly, when it comes to training filter models using AWS EC2 GUP instance, I have a per hour charge for compute, storage costs, and (now) a pretty good idea of how many hours it takes to make a single filter. 

All of this gives me some pretty solid numbers to work with when trying to build a viable business built on top of Algorithmia. In theory, when my customers use my algorithmic rotoscope image or video interface, as well as the API, I can cover my operating costs, and generate a healthy profit by charging a per image cost for applying a machine learning texture filter. What I really think is innovative about Algorithmia's approach is that they are providing an AWS AMI to offload much of the "heavy compute lifting", with all roads still leading back to using their service. It is a model that could quickly shift algorithmic API consumers to be more wholesale / volume consumers, from being just a retail level API consumer.

My example of this focuses on images and video, but this model can be applied to any type of algorithmically fueled APIs. It provides me with a model of how you can safely open source the process behind your algorithms as AWS AMI and actually drive more business to your APIs by evolving your API consumers into wholesale API consumers. In my experience, many API providers are very concerned with malicious users reverse engineering their algorithms via their APIs, when in reality, in true API fashion, there are ways you can actually open up your algorithms, make them more accessible, and deployable, while still helping contribute significantly to your bottom line.


Pushing For More Algorithmic Transparency Using APIs

I saw the potential for collaboration when it came to using web APIs back around 2004 and 2005. I was seeing innovative companies opening up their digital assets to the world using low-cost, efficient Internet technology like HTTP, opening things up for potentially interesting approaches to collaboration around the development of web and mobile applications on top of valuable digital resources. This approach has brought us valuable platforms like Amazon Web Services and SalesForce. 

Common API discussions tend to focus on providing APIs to an ecosystem of developers and encouraging the development of web and mobile applications, widgets, visualizations, and other integrations that benefit the platform. In the course of these operations, it is also customary to gather feedback from the community and work to evolve the APIs design, available resources, and even the underlying data model--extending collaboration to also be about the APIs, and underlying resources, in addition to just building things on top of the API.

This approach to designing, defining, and deploying APIs, and then also web and mobile applications on top of these APIs is nothing new, and is something that I have been tracking on for over the last six years. The transparency that can be injected into the evolution of data, content, and potentially the "algorithms behind" with APIs is significant, which is how it became such a big part of my professional mission, and fueling my drive to spread the "gospel" whenever and wherever I can. 

Ok, so how can APIs contribute to algorithmic transparency? To fully grasp where I am taking this, you need to understand that APIs can be used as an input and output for data, content, as well as algorithms. Let's use Twitter as an example. Using Twitter and the Twitter API I can read and write data about myself, or any user, using the /account and /users API endpoints--providing the content and data portion of what I am talking about.

When it comes to the algorithm portion, Twitter API has several methods, such as GET statuses/user_timelineGET statuses/home_timeline and GET search/tweets, which return a "timeline of Tweet data". In 2006 this timeline was just the latest Tweets from the users you follow, in sequential order. In 2016, you will get "content powered by a variety of signals". In short, the algorithm that drives the Twitter timeline is pretty complicated, with a number of things to consider:

  • Your home timeline displays a stream of Tweets from accounts you have chosen to follow on Twitter. 
  • You may see suggested content powered by a variety of signals. 
  • Tweets you are likely to care about most will show up first in your timeline. 
  • You may see a summary of the most interesting Tweets you received since your last visit
  • You may also see content such as promoted Tweets or Retweets in your timeline.
  • Additionally, when we identify a Tweet, an account to follow, or other content that's popular or relevant, we may add it to your timeline.

There are a number of considerations that would go into any one timeline response--this is Twitter's algorithm. While I technically have access to this algorithm via three separate API endpoints, there really isn't much algorithmic transparency present, beyond their overview in the support section. Most companies are going to claim this is their secret sauce and their intellectual property. That is fine, I don't have a problem with y'all being secretive about this, even though I will always push you to be more open, as well as leave the API layer out of your patents you use to pretect your algorithms.

Algorithmic transparency with APIs is not something that should be applied to all APIs in my opinion, but for regulated industries, and truly open API solutions, transparency can go a long way, and bring a number of benefits. All Twitter (and any other API provider) has to do is add parameters, and corresponding that open up the variables of the underlying algorithm for each endpoint. What goes into considerations about "what I care about", constitutes "interesting", and what makes things "popular or relevant"? Twitter will never do this, but other API providers can.

It is up to each API provider to decide how transparent they are going to be with their algorithms. The ideal solution when it comes to transparency is that the algorithm is documented and shared along with supporting code on Github, like Chicago, did for their food inspection algorithm. This opens up the algorithm, and the code behind for evaluation by 3rd parties, potentially improving upon it, as well as validating the logic behind--potentially opening up a conversation about the life of the algorithm.

There are a number of common reasons I have seen for companies and developers not opening up their algorithms:

  • It truly is secret sauce, and too much was invested to just share with the world.
  • It is crap, and the creator doesn't want anyone to know there is nothing behind.
  • There are malicious things going on behind the scenes that they do not want to be public.
  • Insecurities about coding abilities, security practices and logic applied to the algorithms.
  • Exist in competitive space with lots of bad actors, and may want to limit this behavior.
  • What is accomplished isn't really that defensible, and the only advantage is to keep hidden.

I have no problem making an argument for algorithmic transparency when it comes to regulated industries, like financial, healthcare, and education. I think it should be default in all civic, non-profit, and other similar scenarios where the whole stack should just be open sourced, and available on Github. You won't find me pushing back to hard on the startups unless I see some wild claims about the magic behind, or I see evidence of exploitation, then you will hear me rant about this some more.

Algorithmic transparency can help limit algorithmic exploitation and the other shady shit that is going on behind the scenes on a regular basis these days. I have added an algorithm section to my research, and as I see more talk about the magic of algorithms, and how these amazing creations are changing our world--I am going to be poking around a bit, and probably asking to see more algorithmic transparency when I think it makes sense.


The Opportunity For API-Driven Algorithmic Transparency At The Mobile Data Plan Level

API Evangelist is focused on helping push for sensible API-driven transparency wherever I can get it. When done in sensible ways an API can crack open the often black box that is the algorithm, giving us access and more control over our online experience.

One of the most significant algorithmic bottlenecks that govern our daily lives is our mobile data plans. All of our mobile phones are governed at the data plan level--this is where the telecom companies make their money, throttling the bits and bytes we depend on each day. 

The mobile data plan is a great place to discuss the algorithmic and data transparency that APIs can assist with, and one example of this in action is with the Google Mobile Data Plan API. Google wants more access at this level to improve the quality of experience for end-users when using mobile applications like Youtube, which can be severely impacted by data plan limits, while also significantly impacting your data plan consumption if not optimized.

There is so much opportunity for discussion between mobile network operator, API providers, developers, consumers at the data plan level. I know mobile network operators would rather keep this a black box, so they can maximize their revenue, but when you crack the network layer open with a publicly available API, there will be a number of new revenue opportunities.

Data plans affect all of us, every single day. We need more transparency into the algorithms that meter, limit, and charge us at the mobile data plan layer. We need the platforms we depend on each day to have more tools to optimize how applications consume (or do not consume) this extremely (seemingly) finite, and valuable resource (thanks, telcos!). 

I've added an algorithms area to my research to keep an eye on this topic, curate stories I find, and share my own thoughts when it comes to algorithmic transparency using APIs.


Keeping A Window Open Into How Power Flows Within Algorithms Using APIs

I just read The Pill versus the Bomb: What Digital Technologists Need to Know About Power, by Tom Steinberg (@steiny), and I'm reminded of the important role APIs will (hopefully) continue to play in helping provide a transparent window into some of the power structures being coded into the algorithms we are increasingly relying on in this digital world we are crafting.

In this century, we are seeing a huge shift in how power flows, and despite the rhetoric of some of the Silicon Valley believers, this power isn't always being democratized along the way. Much of the older power structures is just being re-inscribed into the algorithms that drive network switches, decide pricing when purchasing online, via our online banking, and virtually ever other aspect of our personal and business worlds.

APIs give us a window into how these algorithms work, providing access to 3rd party developers, government regulators, journalists, and many other essential actors across our society and economy. Don't get me wrong, APIs are no magic pill, or nuclear bomb, when it comes to making algorithmic power flows more transparent and equitable, but when they are done right, they can have a significant effect.

If APIs are a complete (or near complete) representation of the algorithms that are driving platforms, they can be used to better understand how decisions behind the algorithmic curtain are made, and exactly how power is flowing (or not) on web, mobile, and increasingly connected device platforms--API does not equal perfect transparency, but will help prevent all algorithms from being black boxes.

We may not fully understand Uber's business motivations, but through their API we can test our assumptions. We may not always trust Facebook's advertising algorithm, but using the API we can develop models for better understanding why they serve the ads they do. Drone operators may not always have the best intentions, but through mandatory device APIs, we can log flight times and locations. These are just a handful of examples that APIs can be used to map out digital power.

All of this is one of the main reasons that I do API Evangelist. I feel like we have a narrow window of opportunity to help ensure APIs act as this essential transparent layer for ALL API operations across industries. As the established power structures (eye of Sauron) turn their attention to the web, and increasingly APIs, their powers of transparency are becoming more diminished. It is up to us API Evangelists, to help make sure APIs stay publicly available to 3rd party developers, government, journalists, end users, and other key players--providing much needed transparency into how algorithms work, and how power is flowing on the web and mobile Internet.


If you think there is a link I should have listed here feel free to tweet it at me, or submit as a Github issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.