Sequential Rank Ordering with Pandas

A frequent need in data analysis is to understand the order that a given event ocurred in for a given user or account. An example question that lends itself to this is "What is the avg 1st order value of people who have purchased 2 items from us?"

When I first started in analytics, I typically did this with SQL, however I thought I'd share how to do this in Pandas because I've been really enjoying how brief and concise the code is.

df['order_rank'] = df.sort_values('order_date', ascending=True).groupby(by=['user_id'])['order_date'].transform(lambda x: x.rank())

Above we create a new column, order_rank in our dataframe. We sort our dataframe of orders by date and group the dataframe by user_id and apply the transform method to the date series to come up with a sequential rank from low to high.

We go from this
image-2

to this
image-3

The above screenshots give you an example of the use of this. Now you can filter the dataframe by order number 2, create a new dataframe of user_ids that have more than one order and countless other things.