# Why Einstein Discovery is an amazing product

## Motivation

After earning the Salesforce Einstein certification, I still wanted to learn more about Einstein Discovery, and I had an idea: using only Python, to what extent could I reproduce the whole story? How would the result differ from Einstein Discovery? How accurate would the predictive model be? How many insights could I uncover without Einstein Discovery?

That is how this blog post came about. Multivariable regression (one dependent variable, multiple independent variables) and ANOVA are not easy, and I am not very proficient in data science, so all of the Python analysis below is very basic and I did not use any sophisticated algorithms in this attempt.

## Process

I used the ‘RetailSales’ dataset provided by Salesforce; you can download it from a newly created EA org. After downloading and reading it, the dataset looks like this:

While reading the dataset, I converted ‘Epoch of Days’ to a date format and dropped the ‘Record ID’ column. The target variable is ‘DailyQuantity’ (to be maximized). Next, I drew two plots to get an intuitive picture of the dataset:
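The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the exact code I ran: the column names `RecordID` and `EpochDays` and the tiny in-memory frame are assumptions standing in for the real CSV export.

```python
import pandas as pd

def prepare_retail_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the epoch-day column to dates and drop the record identifier."""
    out = df.copy()
    # Days since 1970-01-01 -> calendar date (column name is an assumption).
    out["Date"] = pd.to_datetime(out["EpochDays"], unit="D")
    # The record ID is only an identifier, so it is removed before analysis.
    out = out.drop(columns=["RecordID"])
    return out

# Made-up rows standing in for the real dataset.
sample = pd.DataFrame({
    "RecordID": [1, 2, 3],
    "EpochDays": [17167, 17168, 17532],
    "City": ["Boston", "Austin", "Boston"],
    "DailyQuantity": [120, 95, 3100],
})
prepared = prepare_retail_sales(sample)
print(prepared)
```

With the real file, `sample` would be replaced by the frame returned from `pd.read_csv` on the downloaded CSV.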

From the plots above, we can see that there are few records where ‘DailyQuantity’ is greater than 3000, so I removed these outliers before training. After normalization, the dataset looks like this:
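A small sketch of the outlier filter and the normalization step is shown below. The post does not record which normalization was used, so min-max scaling to [0, 1] is an assumption here, and the helper name `remove_outliers_and_scale` is hypothetical.

```python
import pandas as pd

def remove_outliers_and_scale(df, target="DailyQuantity", cap=3000):
    """Drop rows whose target exceeds `cap`, then min-max scale numeric columns."""
    # Keep only rows whose target is not greater than the cap.
    kept = df[df[target] <= cap].copy()
    # Min-max scaling is an assumed choice; constant columns are set to 0.
    for col in kept.select_dtypes(include="number").columns:
        lo, hi = kept[col].min(), kept[col].max()
        span = hi - lo
        kept[col] = 0.0 if span == 0 else (kept[col] - lo) / span
    return kept

# Made-up rows: the last one is an outlier above 3000.
demo = pd.DataFrame({
    "Discount": [0.0, 0.1, 0.2, 0.1],
    "DailyQuantity": [100, 300, 200, 3500],
})
cleaned = remove_outliers_and_scale(demo)
print(cleaned)
```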

The next step is to split the dataset: 80% of the data is used for training and the remaining 20% is held out.
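One common way to do an 80/20 split is scikit-learn's `train_test_split`. The sketch below uses random placeholder arrays rather than the real features, and the `random_state` value is an arbitrary assumption to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 3 features (not the real RetailSales columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# test_size=0.2 keeps 80% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```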

After data preprocessing, I drew a correlation heat map (at this point the categorical columns were still kept as labels). The result indicates that City is strongly related to ‘DailyQuantity’.
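A correlation matrix over label-encoded categorical columns can be computed along these lines. This is only a sketch with made-up rows; the helper `label_encoded_corr` is hypothetical and the integer codes it assigns are arbitrary.

```python
import pandas as pd

def label_encoded_corr(df: pd.DataFrame) -> pd.DataFrame:
    """Replace non-numeric columns with integer label codes, then correlate."""
    encoded = df.copy()
    for col in encoded.select_dtypes(exclude="number").columns:
        # Each distinct label gets an integer code (alphabetical order here).
        encoded[col] = encoded[col].astype("category").cat.codes
    # Pearson correlation between every pair of columns.
    return encoded.corr()

# Made-up rows, not the real dataset.
frame = pd.DataFrame({
    "City": ["Boston", "Boston", "Austin", "Austin"],
    "Promotion": [1, 0, 1, 0],
    "DailyQuantity": [200, 180, 90, 80],
})
corr = label_encoded_corr(frame)
print(corr)
```

The resulting matrix can be passed to a plotting function such as `seaborn.heatmap` to draw the heat map. Because the label codes are arbitrary, the correlation for a categorical column with more than two values depends on how the labels happen to be ordered, so it should be read as a rough hint only.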

If we open the story created by Einstein, a similar conclusion is listed at the top of “What’s happened”, which shows “City explains 35.3% of the variation in DailyQuantity__c”.
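A statement like “City explains 35.3% of the variation” reads like a share of variance explained by one categorical variable. How Einstein Discovery actually computes its figure is not something I can confirm; the sketch below uses the between-group sum of squares divided by the total sum of squares (the eta-squared statistic from one-way ANOVA) as an assumed analogue, with made-up rows.

```python
import pandas as pd

def variance_explained(df: pd.DataFrame, factor: str, target: str) -> float:
    """Share of the target's variance explained by group means of `factor`."""
    grand_mean = df[target].mean()
    # Total sum of squares around the overall mean.
    total_ss = ((df[target] - grand_mean) ** 2).sum()
    groups = df.groupby(factor)[target]
    # Between-group sum of squares: group size times squared mean offset.
    between_ss = (groups.count() * (groups.mean() - grand_mean) ** 2).sum()
    return between_ss / total_ss

# Made-up rows, not the real dataset.
sales = pd.DataFrame({
    "City": ["Boston", "Boston", "Austin", "Austin"],
    "DailyQuantity": [200, 180, 90, 80],
})
share = variance_explained(sales, "City", "DailyQuantity")
print(f"City explains {share:.1%} of the variation in DailyQuantity")
```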

If I use the normalized dataframe instead, the resulting heat map looks like this:

We can see that ‘DailyQuantity’ has some relationship with the independent variables ‘Promotion’ and ‘EpochDays’. Let’s take a look at what Discovery tells us:

This looks good: Einstein Discovery gives a direct, quantified result, which is much better than what I did by hand in Python. At this point I clearly understood how hard the reproduction work would be. The more details we want to dig into, the more hands-on exploration we have to do, from both the feature correlation and the regression perspective.

○ For example, draw a heat map to see the correlation between ‘Discount’ and ‘DailyQuantity’.

○ Another example is the correlation between independent variables (the color depends on the value of ‘DailyQuantity’):

The detailed results of the examples above are all included in the Discovery results; all we need to do is pick them from the list or the menu. They are more comprehensive, more accurate, and easier to understand:

The cool thing here is that we can also get a deeper view into the data and surface details that are valuable but could easily be overlooked. For example, if we click on the waterfall bar where ‘Discount’ is 0.1 and City is Boston, the detail is:

Equipped with this powerful tool, we can detect potential **unusual patterns** much more easily and save a lot of time.

Lastly, check the R-squared value in Einstein Discovery:

If I use only Ridge regression (from sklearn.linear_model), the result is:

The difference between these two models is significant. I also drew a plot that traces how the coefficients of the independent variables change as alpha increases in ridge regression.
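The ridge fit and the coefficient trace can be sketched as below. The data is synthetic (three made-up features with known weights), and the alpha grid is an arbitrary assumption, so the numbers are not comparable to the RetailSales results.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Synthetic data: y is a linear mix of three features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Fit one ridge model per alpha and record its coefficients.
alphas = np.logspace(-2, 4, 7)
coef_path = np.array([Ridge(alpha=a).fit(X, y).coef_ for a in alphas])

# R-squared of a single ridge model, evaluated on the same data it was fit on.
r2 = r2_score(y, Ridge(alpha=1.0).fit(X, y).predict(X))

for a, coefs in zip(alphas, coef_path):
    print(f"alpha={a:g}: {np.round(coefs, 3)}")
print(f"R^2 at alpha=1.0: {r2:.3f}")
```

Plotting each column of `coef_path` against `alphas` on a log-scaled x-axis gives the coefficient trace: as alpha grows, the penalty pulls the coefficients toward zero.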

## One last thing: why do we need to segment before regression?

Because purely linear regression is not enough. Here I changed the model and used a Keras Sequential model (no branching; every layer has one input and one output, and the output of one layer is the input of the next) to demonstrate how the predicted and real data are scattered.

```
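from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import InverseTimeDecay

def create_mlp(dim, regress=False):
    # define our MLP network: two hidden ReLU layers with 8 and 4 units
    model = Sequential()
    model.add(Dense(8, input_dim=dim, activation="relu"))
    model.add(Dense(4, activation="relu"))
    # check to see if the regression node should be added
    if regress:
        model.add(Dense(1, activation="linear"))
    return model

# X_train and y_train come from the 80% training split described earlier
model = create_mlp(X_train.shape[1], regress=True)
# lr = 1e-3 / (1 + step * 1e-3 / 200): the schedule the old `decay` argument applied
schedule = InverseTimeDecay(1e-3, decay_steps=1, decay_rate=1e-3 / 200)
opt = Adam(learning_rate=schedule)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)
# note: validation_data is the training set itself, so val_loss is not a held-out score
model.fit(X_train, y_train, validation_data=(X_train, y_train),
          epochs=200, batch_size=8)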
```
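To illustrate why segmenting can matter before fitting a linear model, here is a small sketch on synthetic data. It is an assumption-laden toy, not the RetailSales data: two hypothetical segments have opposite slopes, so one global straight line explains almost nothing while one line per segment fits well.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=400)
segment = rng.integers(0, 2, size=400)  # two hypothetical segments, 0 and 1
# Segment 0 rises with x, segment 1 falls with x.
y = np.where(segment == 0, 5.0 * x, 5.0 - 5.0 * x)
y = y + rng.normal(scale=0.2, size=400)

# One straight line for all rows.
slope, intercept = np.polyfit(x, y, 1)
global_r2 = r_squared(y, slope * x + intercept)

# One straight line per segment.
pred = np.empty_like(y)
for s in (0, 1):
    mask = segment == s
    m, b = np.polyfit(x[mask], y[mask], 1)
    pred[mask] = m * x[mask] + b
segmented_r2 = r_squared(y, pred)

print(f"global R^2:    {global_r2:.3f}")
print(f"segmented R^2: {segmented_r2:.3f}")
```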

I hope you enjoyed this post.