Monday, August 6, 2012

Just How Do Those $&%*# Amazon Algos Work Anyway?

(Yes, I'm still around -- thank you to those of you who have emailed your concern. I'm just concentrating on other social media areas and posting to a few private and public groups, which is cutting into personal blog time.)

Warning: This (very long) post contains math.

Authors are a funny lot. Since a core group of us introduced the idea that Amazon's algorithms for its popularity lists contain a price bias and that freebies now seem to count about 1/10 as much as paid sales on those lists, the amount of misunderstanding regarding the findings has been, to say the least, staggering. Misinformation and disinformation propagating through the interweebs and the various forums point to a serious non-grasping of the underlying concepts.

And the authors reacting to just plain wrong information are understandably questioning why they aren't seeing results in line with what they're reading.

So let's quickly revisit what those findings are and what they aren't. Then we'll take a look at how the algorithm for determining rank on the popularity lists probably works at its most basic structure. Mind you, the actual algorithm is likely much more complicated than what I'll present, but the simple form you'll see here should help you to understand how it determines the playing field.

Popularity List Findings

First, the popularity list is NOT the bestseller list. They are two different beasts. On the Amazon webpage and on Kindle devices, you have to actually navigate to the bestseller list to find it. The BS list will show the paid bestsellers in one column on the left and the free bestsellers in a column to the right. If you're not seeing BOTH paid and free books on the page, you are not looking at the actual bestsellers. If at the bottom of the page you don't see links to specifically scroll through the Top 100 books and ONLY the Top 100, you're not looking at the actual bestseller list.

What you're looking at is the popularity list. And it's THIS list we'll be discussing.

Differences between the popularity and bestseller lists:

  • The popularity list figures in freebies. The bestseller list does not.
  • The popularity list does not figure in borrows. The bestseller list does.
  • The popularity list has a price bias. The bestseller list does not.
  • The popularity list influences the bestseller list more than the bestseller list influences popularity.
  • The popularity list figures in sales (and sales-equivalents) over the last 30 days. The bestseller list weights sales history, but not to the extent the pop list does.
  • The popularity list recrunches about once per day. The bestseller list recrunches hourly.
  • The popularity list has a lag time of about 2 days. The bestseller list has about an hour lag time.
  • The popularity list rank does not display anywhere except in the list itself. The bestseller rank is the rank you find on a book's product page.

The only way to know where your book ranks on the popularity list in any given category is to tediously scroll through the list to find it. (If you're pretty sure you're several pages in and don't want to scroll through every page, you can change the page number in the URL in your address bar; but this option is only for advanced users who can find the page number designation in the URL.)

So, to make this as clear as possible:

  • The number of freebies you give away during a free run does not in any way affect your bestseller ranking. 
  • The price of your book does not in any way affect your bestseller ranking.

These variables are used only for determining a book's rank in the popularity lists.

The reason the popularity list rankings are important is that your book's visibility in those lists seems to be a huge sales driver. YOU may not personally find books by browsing that list, but a lot of folk apparently do. Also, some of the recommendation emails Amazon sends out display the top 6 or 7 books in a category, then provide a link to the pop list to discover more books in that category.

Watching your pop list numbers is as important as -- and for some, even more important than -- watching your bestseller numbers.

Algorithm for Determining Popularity List Rank

The popularity list algorithm has undergone at least 2 major changes since it came under scrutiny in January. Back then, freebies appeared to be weighted 100% of a sale and borrows appeared to be counted in as well. Because of these weightings, books in Select that went on a successful free run with 2000 or so downloads would wind up at the top of the pop lists after the 2-day lag to get there. That resulted in the famous 3-day bump when browsers would start seeing a book on the first page of a pop list and hit Buy, catapulting a lot of indie books into the stratosphere. That was the Golden Age.

In March, Amazon started doing split marketing, testing different algorithms to create its popularity lists. Between late March and early May, there appeared to be 3 separate lists being tested, and predicting the popularity of a freebie following its free run was difficult because of the multiple lists.

In early May, Amazon apparently settled on a single algorithm to display to the majority of its customers. (Caveat: the list for the Fire seems to be out of sync with the rest -- either Fire readers are being presented a different list entirely or else the servers sending out the data to Fires are delayed.) There are umpteen possibilities as to WHY Amazon settled on the algorithm it did. I've speculated elsewhere about the why, as have others, and this post won't rehash those speculations. We're simply accepting that Amazon wanted to elevate certain classes of books and decelerate the meteoric rise of others. It's how they're accomplishing this that we'll look at today.

Remember, we're working on best-guess speculation here, figured out from watching how the books on the list perform against each other. It's reverse-engineering -- and subject to a lot of variables that those of us outside of Amazon are simply not privy to. There will always be outliers, and there will always be minor differences in rank performance due to those other variables. For the most part, though, this simple formula seems to be the base for the current popularity list algorithm.

[(.1 x A) + B] x C / 30 = number of sales equivalents

A = the number of freebies given away in the past 30 days (notice it gets multiplied by 0.1 or 1/10);
B = the number of actual sales in the past 30 days; and
C = the weighting given for pricing.
30 = the number of days in a month (hence, the 30-day cliff that's talked about in conjunction with the pop list)
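
For readers who prefer code to algebra, here is a minimal Python sketch of the guessed formula above. The function and parameter names are made up for illustration, and the 0.1 freebie weight and 30-day window are the post's reverse-engineered guesses, not anything Amazon has published.

```python
def daily_sales_equivalents(freebies, paid_sales, price_weight,
                            freebie_weight=0.1, window_days=30):
    """Average daily sales equivalents under the guessed pop list formula.

    freebies     -- A: free downloads in the window
    paid_sales   -- B: paid sales in the window
    price_weight -- C: guessed multiplier for the book's price range
    """
    # Freebies count for only a fraction of a paid sale (guess: 1/10).
    weighted_units = freebie_weight * freebies + paid_sales
    # Apply the price bias, then average over the window.
    return weighted_units * price_weight / window_days


# Hypothetical book: 1000 freebies, 200 paid sales, no price bonus.
print(daily_sales_equivalents(freebies=1000, paid_sales=200, price_weight=1.0))
```

Keeping the freebie weight and the window as parameters makes it easy to try other guesses if the list behavior changes again.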

C is guess work since it's hard to figure exactly how Amazon is weighting price. It's a big enough variable to be noticeable, but not so big that it skews the results in a truly huge way. It also seems that the weighting of price goes by ranges of price, so a 2.99 book might be weighted the same as a 3.99 book. As a guess, the following matrix might be reasonably close:

$0.99 - $2.98 = 1.0
$2.99 - $3.99 = 1.1
$4.00 - $5.99 = 1.2
$6.00 - $7.99 = 1.3
$8.00 - $9.99 = 1.4
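
If you want to look up C programmatically, the matrix above could be sketched as a simple range table in Python. The band edges and weights are the guesses from the matrix, the names PRICE_BANDS and price_weight are hypothetical, and prices outside the listed ranges return None because the post makes no guess for them.

```python
# Guessed price bands from the matrix above: (low, high, weight).
PRICE_BANDS = [
    (0.99, 2.98, 1.0),
    (2.99, 3.99, 1.1),
    (4.00, 5.99, 1.2),
    (6.00, 7.99, 1.3),
    (8.00, 9.99, 1.4),
]


def price_weight(price):
    """Return the guessed weight for a list price, or None if no band applies."""
    for low, high, weight in PRICE_BANDS:
        if low <= price <= high:
            return weight
    return None  # outside the guessed matrix; the post offers no estimate


print(price_weight(2.99), price_weight(3.99))  # same band, same weight
```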

So let's put some real numbers in there to see how this works. I'll use SECTOR C's past 30 days as an example since it had only a modest free run the last time out and its overall July sales were modest as well.

So for SECTOR C,
A = 3325 (number of freebies given away on the US site)
B = 328 (number of US sales from July 4 - Aug 3)
C = 1.2 ($4.39 is the book's typical list price)

Plugging the numbers into the equation, and showing our work, we get:

[(.1 x 3325) + 328] x 1.2 / 30 =
(332 + 328) x 1.2 / 30 =
660 x 1.2 / 30 =
792 / 30 = 26.4

So, 26.4 is the average daily sales equivalent for the past 30 days. Because of a healthy number of freebies being figured in, that means that SECTOR C is going to enjoy a better popularity rank than another book that has sold 328 copies over the last 30 days -- even if that other book currently has a better bestseller rank.
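
The same arithmetic can be checked in a few lines of Python. This is only a sketch using the SECTOR C figures quoted above; the variable names are mine. Note that 0.1 x 3325 is really 332.5, which the worked example truncates to 332, and the result rounds to 26.4 either way. The last line shows a hypothetical book with the same paid sales and price but no free run, for comparison.

```python
freebies = 3325      # A: US freebies in the 30-day window
paid_sales = 328     # B: US paid sales in the window
price_weight = 1.2   # C: guessed weight for a $4.39 book
window_days = 30

freebie_equivalents = 0.1 * freebies  # about 332.5 sales equivalents

# With the fractional freebie equivalent kept.
exact = (freebie_equivalents + paid_sales) * price_weight / window_days

# With the freebie equivalent truncated to 332, as in the worked example.
truncated = (int(freebie_equivalents) + paid_sales) * price_weight / window_days

# Same paid sales and price, but no freebies at all.
no_free_run = paid_sales * price_weight / window_days

print(round(exact, 1), round(truncated, 1), round(no_free_run, 1))
```

Under these guesses, the free run roughly doubles the book's daily sales equivalents compared with paid sales alone.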

26.4 books is equivalent to a bestseller sales rank of around #3500. On Aug 3, SECTOR C's actual bestseller rank was between #5565 and #6930.

Now, because we don't know the exact number of books other authors are selling, we have to look at current ranks to make some best guesses to see why SECTOR C is at #29 on the popularity list for Technothrillers. And because books that have been on free runs are more volatile in the ranks, it's best to compare books that are not in Select (who is this Tom Clancy that has books on either side of mine on that list?!).

Here are the ranks and prices of the non-Select books closest to mine at #29:
#25 - 3178 - $3.99
#26 - 6030 - $8.99
#27 - 5380 - $8.99
#28 - 8720 - $4.95
#29 - 3500 (equivalent) - $4.39 (This is SECTOR C)
#30 - 24,365 - $3.99
#34 - 3931 - $3.99

"Aha!" you say. "A flaw in the calculations! Look at the 24,000+ rank on the book at #30!" Well, yes, I did look at that book and I found through Google that it had been free on at least July 25, so it was either price matched during the last 30 days or left Select in the past week. Variables like this are what make reverse-engineering difficult -- and likely what makes many folk looking at a single snapshot question the accuracy of the findings. It has taken several snapshots over an extended period of time and deep research to come up with the guesstimations that we have.

While we can never draw conclusions from such limited data, we can look at the data above and see a couple of things:

At about the same price, the books at #25, #29 and #34 line up in the rank right where we would expect them to in relation to one another. We've already determined that #30 is skewed by an earlier free run. As I can only see today's rank for #28 (it's not listed on any of the tracker sites), it could well have had a better rank 2 days ago (another reason it's important to look at all this stuff over time as well). The 2 $8.99 books at #26 and #27 are Tom Clancy books that have been selling steadily at those ranks and are a demonstration of the price bias in action.

So, Realistically, What Can You Do With This Information?

Honestly? Not a whole lot. A higher price will give you a slight advantage, but only if you're selling well enough to be near the top of the pop lists anyway. It's not like a $4.99 book is going to rank dozens of ranks better than one that sells the same number of copies at $3.99. And a 99c book that sells 1000 copies will still rank higher than a $2.99 book that sells only 300. Simply pricing your book higher is not going to automatically boost your ranking.

Giving away a LOT of books during a free run can certainly help. Even so, the 3325 copies of SECTOR C given away last month only equaled about 332 sales equivalents. Depending on the category your book is in, that could be a drop in the bucket. In categories where the top books are selling 1000 copies a day, you'd have to give away 300,000 books to compete for first-page visibility. If you only gave away 200,000 books, you'd have to make up the difference with 10,000 paid sales. For most of us, it ain't gonna happen.
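
To play with those giveaway numbers yourself, here is a rough Python sketch of the same back-of-the-envelope math. The function names are hypothetical, it assumes the guessed 10-freebies-per-sale ratio and 30-day window, and like the paragraph above it ignores the price weighting.

```python
FREEBIES_PER_SALE = 10  # guessed: a freebie counts about 1/10 of a sale
WINDOW_DAYS = 30        # guessed: the pop list looks back 30 days


def freebies_needed(target_daily_sales, paid_sales=0):
    """Freebies required to match a competitor's daily sales over the window."""
    target_units = target_daily_sales * WINDOW_DAYS
    shortfall = max(target_units - paid_sales, 0)
    return shortfall * FREEBIES_PER_SALE


def paid_sales_needed(target_daily_sales, freebies):
    """Paid sales required on top of a given number of freebies."""
    target_units = target_daily_sales * WINDOW_DAYS
    return max(target_units - freebies / FREEBIES_PER_SALE, 0)


print(freebies_needed(1000))             # freebies alone vs. 1000 sales a day
print(paid_sales_needed(1000, 200_000))  # paid sales still needed after 200,000 freebies
```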

So if you're looking to the algorithms to help you sell, understand exactly what the algos are doing for you -- and how they work against you. There's no magic to them. It's all pure math. And Amazon may choose to change the math that feeds them tomorrow. Just maybe, the next changes will be in our favor...


Lisa Grace said...

Thanks for explaining the algos so well and for taking the time to do the math. May the algos always be in your favor.

I have eBooks at several different price points on the five novels I have out. That way, I can at least guestimate when a change has been made.

Jan Strnad said...

This was a great explanation! Thank you so much for doing the heavy lifting.

J. R. Tomlin said...

Excellent explanation and thanks for the list of what does and does not affect which. I am embarrassed to admit it but I forget sometimes which is which. Information like this is a huge help in making decisions about what to do. Thanks for doing all that work. I frankly would have been at a loss how to even start.

JGreen20 said...

Brilliant post. Very clear. Thanks for taking the time to explain it.

B. Justin Shier said...

What, no love for:

price.weight = 2^(0.12*price) - 0.9


price.weight = sqrt(0.2*price + 0.01)

; )

I want to commend you for putting an actual formula out for review. It's nice to have something that can be objectively assessed. I'm a bit unclear about your methods, though.

About how big was your sample size? Did you conduct a regression analysis on a large data set to obtain that price.weight matrix? Were any stats tests applied to assess the quality of the fit? (I'm not too great at running non-linear regressions, but if I remember correctly, SAS and R can do some of that stuff if you're interested.)

Also, can you provide more detail about the outliers referenced in the article? How often did you observe them? Did they balloon the variance? I don't expect you to be able to explain them all; I'd just like to get a sense of their rate and severity.


Mike McIntyre said...

Awesome piece of investigation and explanation!

Anonymous said...

Thanks for the research and sharing! I've noticed that even with stellar giveaway days, the boost has been rather disappointing. Out of Select and on to other things ...

Christina Garner said...

Holy moly! That's more math than I've seen since high school, but I sure do appreciate you taking time to figure it out and post it.

Calee @ Xist Publishing said...

Here's one more thing to add to the puzzle -- beginning on May 1st, Amazon completely changed the way it serves up the popularity list of children's books. Instead of it being browseable for all children's book categories, it has effectively disappeared from the website for the 2 most important categories to our books. The popularity list still exists, but has really lost its power for certain categories (yet is still effective for the ones that have not been hidden). Amazon has also diverted traffic away from the main Children's Books category on the Fire and on the Web to a new category called Kindle Store › Children's Color Picture Books. The only way to be present in this category, as far as I can tell, is to produce a KF8 fixed layout book that includes the programming for pop up text, thus limiting the book to only work on the Fire and Android devices. We track daily/weekly/monthly sales of our 100+ children's titles and the category change was more disruptive to our sales than the algorithm change (though that was rough too!)
It's great to get data from another source and I really appreciate what you continue to share!

Phoenix Sullivan said...

Thanks, all! If it helps to take any of the mystery away, it was worth the time to post :o).

B, I've responded over on Kindleboards. Thanks for your update as well!

Calee: Thanks for this information about the Children's Books categories. Children's and erotica are areas we haven't spent much time on. But I can see where interactivity in kids' books may be a harbinger of things to come on the adult side too. Definitely something to explore further and keep an eye on!

Michelle said...

Impressive explanation!

Does this work the same with normal non-book items listed on Amazon?

Phoenix Sullivan said...

Thanks, Michelle!

I'm afraid I have no idea if it correlates with non-book items at all. My guess would be no, that Amazon has different teams working different departments and adjusting the algorithms to return the specific results they need/want for their specific department's goals. The basic principle would be the same, but the equations would be far different, I suspect.

Jo Antareau said...

Excellent post Phoenix.

And... did I spy a new cover for Sector C?
Has that influenced sales figures?

Phoenix Sullivan said...

Thanks for spotting the new cover for SECTOR C, Jo! Yes, it has influenced sales - in an unexpected way. I'll have deets on that in my next post ;o).

Peter Dudley said...

@Lisa, ROFL (May the algos be always in your favor.)

Phoenix, you may know that I'm pretty sharp and can handle some reasonably complex analysis (I have an EE degree from Berkeley after all), but for some reason this boggles my mind. (Maybe it's because I always read this kind of thing after my work week and after a couple of glasses of wine. Think there's a correlation? Nah, probably not.)

In any case, just knowing some of the more squishy stuff, even without the math, is incredibly valuable to a nube in this area. I appreciate you.

David Biddle said...

Excellent work PS. Gonna plug my own numbers in and see what they yield. That said, I had 10,108 free downloads of my mystery over a 3 day weekend shot (Aug 17-19). I had 55 sales and 16 borrows in the first 6 days after.

I been tracking popularity and it fell from #12 to where I can't find it in two days after the frees in the Mystery category. It was up to #5 in "Metaphysical" for Kindle and was only down to #10 last night. I haven't had a sale in three days now.

The whole system is so wanky. Your awesome work is damned important. Keep it up.

King Samuel Benson said...

A very useful post, Phoenix. Now I know what to expect. Thanks for sharing.

Greg Hamerton said...

Great post Phoenix!

This is what I've noticed.

The Poplist seems to recrunch twice daily at roughly 5am and 4pm EST.

The Poplist reacts to your browsing history and location - to get a clean/true poplist you must LOG OUT. When I did this my book dropped from #49 to #128! Amazon knows I often browse my own title to check its rank, and so it is shown higher up MY poplist when I'm logged in.

I experimented with raising prices to $7.69 around the latest free run, then dropped back to $3.99 after a few days when sales paused, which helped to increase profit, but I have no idea if this damaged sales or helped to promote the book at higher perceived value. Now that Amazon can discount trade titles, the days of high priced ebooks are numbered. My poplist ranking didn't change when I lowered the price.

Lady T. L. Jennings said...

Thank you for this most excellent blog post!
Informative, accurate, and well written!
If I could I would give it a Five Star review.

/ Yours sincerely Lady T. L. Jennings