Sequential Rank Ordering with Pandas

A frequent need in data analysis is to understand the order that a given event ocurred in for a given user or account. An example question that lends itself to this is "What is the avg 1st order value of people who have purchased 2 items from us?"

When I first started in analytics, I typically did this with SQL, however I thought I'd share how to do this in Pandas because I've been really enjoying how brief and concise the code is.

df['order_rank'] = df.sort_values('order_date', ascending=True).groupby(by=['user_id'])['order_date'].transform(lambda x: x.rank())

Above we create a new column, order_rank in our dataframe. We sort our dataframe of orders by date and group the dataframe by user_id and apply the transform method to the date series to come up with a sequential rank from low to high.

We go from this

to this

The above screenshots give you an example of the use of this. Now you can filter the dataframe by order number 2, create a new dataframe of user_ids that have more than one order and countless other things.